PyTorch RuntimeError: DataLoader worker (pid(s) 15332) exited unexpectedly
Problem description
I am a beginner at PyTorch and I am just trying out some examples on this webpage. But I can't seem to get the 'super_resolution' program running due to this error:
RuntimeError: DataLoader worker (pid(s) 15332) exited unexpectedly
I searched the Internet and found that some people suggest setting num_workers to 0. But if I do that, the program tells me that I am running out of memory (either with CPU or GPU):
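RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 9663676416 bytes. Buy new RAM!
or
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 4.00 GiB total capacity; 2.03 GiB already allocated; 0 bytes free; 2.03 GiB reserved in total by PyTorch)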
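A quick calculation shows where allocation sizes like these can come from. The sketch below assumes (these values are not stated in the question) a float32 activation of shape 64 × 64 × 256 × 256, i.e. a batch of 64 crops of 256×256 pixels passing through a 64-channel convolution; the helper name tensor_bytes is hypothetical. Under those assumptions a single such tensor is exactly 1024 MiB, and the CPU request of 9663676416 bytes works out to exactly 9 GiB.

```python
# Back-of-the-envelope check of the allocation sizes in the error messages.
# ASSUMPTION: the tensor shape below is illustrative, not taken from the question.

BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2
GIB = 1024 ** 3


def tensor_bytes(batch, channels, height, width, bytes_per_elem=BYTES_PER_FLOAT32):
    """Memory needed to hold one dense tensor of the given NCHW shape."""
    return batch * channels * height * width * bytes_per_elem


# Hypothetical conv output: 64 images x 64 channels x 256 x 256 pixels, float32.
conv_out = tensor_bytes(batch=64, channels=64, height=256, width=256)
print(f"conv output: {conv_out / MIB:.2f} MiB")  # conv output: 1024.00 MiB

# The CPU allocator's request from the first message, converted to GiB.
cpu_request = 9663676416
print(f"CPU request: {cpu_request / GIB:.2f} GiB")  # CPU request: 9.00 GiB
```

Because intermediate activations of this size are kept for the backward pass during training, a few of them can exceed a 4 GiB GPU.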
How do I fix this?
I am using Python 3.8 on Windows 10 (64-bit) and PyTorch 1.4.0.
More complete error messages (--cuda means using the GPU, and --threads x means passing x to the num_workers parameter):
- With command-line arguments --upscale_factor 1 --cuda
File "E:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 761, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "E:\Python38\lib\multiprocessing\queues.py", line 108, in get
raise Empty
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "Z:\super_resolution\main.py", line 81, in <module>
train(epoch)
File "Z:\super_resolution\main.py", line 48, in train
for iteration, batch in enumerate(training_data_loader, 1):
File "E:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
data = self._next_data()
File "E:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 841, in _next_data
idx, data = self._get_data()
File "E:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 808, in _get_data
success, data = self._try_get_data()
File "E:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 774, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 16596, 9376, 12756, 9844) exited unexpectedly
- With command-line arguments --upscale_factor 1 --cuda --threads 0
File "Z:\super_resolution\main.py", line 81, in <module>
train(epoch)
File "Z:\super_resolution\main.py", line 52, in train
loss = criterion(model(input), target)
File "E:\Python38\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "Z:\super_resolution\model.py", line 21, in forward
x = self.relu(self.conv2(x))
File "E:\Python38\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "E:\Python38\lib\site-packages\torch\nn\modules\conv.py", line 345, in forward
return self.conv2d_forward(input, self.weight)
File "E:\Python38\lib\site-packages\torch\nn\modules\conv.py", line 341, in conv2d_forward
return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 4.00 GiB total capacity; 2.03 GiB already allocated; 954.35 MiB free; 2.03 GiB reserved in total by PyTorch)
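In the first run the loader used four worker processes (four PIDs in the error), while --threads 0 makes it load batches in the main process. The sketch below shows, under assumptions, how a --threads flag might be passed to DataLoader's num_workers argument; the dataset, the batch size, the function name build_loader and the default of 4 (chosen only to match the four PIDs) are placeholders, not the example's actual code. On Windows, DataLoader workers are started with the spawn method and re-import the main module, so the PyTorch documentation recommends putting the code that iterates the loader under an if __name__ == "__main__": guard.

```python
# Minimal sketch: how a --threads flag might feed DataLoader(num_workers=...).
# ASSUMPTION: dataset, batch size and defaults are placeholders, not the example's.
import argparse

import torch
from torch.utils.data import DataLoader, TensorDataset


def build_loader(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--threads", type=int, default=4,
                        help="number of DataLoader worker processes")
    parser.add_argument("--batchSize", type=int, default=4)
    opt = parser.parse_args(argv)

    # Placeholder dataset: 16 single-channel 32x32 "images" and targets.
    dataset = TensorDataset(torch.randn(16, 1, 32, 32), torch.randn(16, 1, 32, 32))

    # num_workers=0 loads batches in the main process (no worker subprocesses).
    return DataLoader(dataset, batch_size=opt.batchSize, shuffle=True,
                      num_workers=opt.threads)


# On Windows, worker processes are spawned and re-import this module,
# so the code that creates and iterates the loader is guarded.
if __name__ == "__main__":
    loader = build_loader(["--threads", "0"])
    for inputs, targets in loader:
        print(inputs.shape)  # torch.Size([4, 1, 32, 32])
```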
Recommended answer
There is no "complete" solution for GPU out-of-memory errors, but there are quite a few things you can do to reduce the memory demand. Also, make sure that you are not moving the training set and the test set to the GPU at the same time!
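- Reduce the batch size to 1
- Reduce the dimensions of the fully connected layers (they are often the most memory-intensive layers)
- (Image data) Apply a center crop
- (Image data) Convert RGB data to grayscale
- (Text data) Truncate the input at n characters (this probably won't help much)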
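The effect of the image-data suggestions can be estimated with simple arithmetic. Assuming (hypothetically) a baseline input batch of 64 RGB images at 256×256 in float32, the sketch compares the size of that input tensor after each change; the helper tensor_mib is an illustrative name. Intermediate activations scale the same way with batch size and crop size, though not with the input channel count.

```python
# Rough effect of the suggestions above on the size of one float32 input tensor.
# ASSUMPTION: the baseline shape (64 RGB images, 256x256) is hypothetical.

MIB = 1024 ** 2


def tensor_mib(batch, channels, height, width, bytes_per_elem=4):
    """Size in MiB of a dense NCHW tensor."""
    return batch * channels * height * width * bytes_per_elem / MIB


baseline = tensor_mib(64, 3, 256, 256)       # 64 RGB images, 256x256
smaller_batch = tensor_mib(1, 3, 256, 256)   # batch size reduced to 1
center_crop = tensor_mib(64, 3, 128, 128)    # center crop to 128x128
grayscale = tensor_mib(64, 1, 256, 256)      # RGB converted to grayscale

for name, mib in [("baseline", baseline), ("batch size 1", smaller_batch),
                  ("128x128 crop", center_crop), ("grayscale", grayscale)]:
    print(f"{name:>13}: {mib:8.2f} MiB")
```

Tensor size grows linearly with batch size and channel count and with the square of the crop side, so halving the crop side cuts that tensor to a quarter of its size.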
Alternatively, you can try running on Google Colaboratory (12-hour usage limit on a K80 GPU) or Nextjournal, both of which provide up to 12 GB free of charge. In the worst case, you might have to train on your CPU. Hope this helps!
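Two points from this answer, keeping only the current batch on the GPU instead of whole datasets, and falling back to the CPU when no GPU is available, can be sketched as follows. The model, the data and the evaluate helper are placeholders invented for illustration, and torch.no_grad() is an addition not mentioned in the answer, used here because evaluation does not need gradients.

```python
# Sketch: fall back to the CPU when no GPU is available, and move data to the
# device one batch at a time instead of transferring whole datasets up front.
# ASSUMPTION: the tiny model and random data below are placeholders.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Conv2d(1, 1, kernel_size=3, padding=1).to(device)
criterion = nn.MSELoss()

# The dataset itself stays in CPU memory; only batches are moved to the device.
test_set = TensorDataset(torch.randn(8, 1, 32, 32), torch.randn(8, 1, 32, 32))
test_loader = DataLoader(test_set, batch_size=2)


def evaluate(model, loader):
    model.eval()
    total = 0.0
    # No autograd graph is recorded, so activations are not kept for backward.
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)  # this batch only
            total += criterion(model(inputs), targets).item()
    return total / len(loader)


print(f"mean test loss on {device}: {evaluate(model, test_loader):.4f}")
```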