Difference between "detach()" and "with torch.no_grad()" in PyTorch?
Problem Description
I know about two ways to exclude elements of a computation from the gradient calculation in backward:
Method 1: using with torch.no_grad()
with torch.no_grad():
    y = reward + gamma * torch.max(net.forward(x))
loss = criterion(net.forward(torch.from_numpy(o)), y)
loss.backward()
Method 2: using .detach()
y = reward + gamma * torch.max(net.forward(x))
loss = criterion(net.forward(torch.from_numpy(o)), y.detach())
loss.backward()
Is there a difference between these two? Are there benefits/downsides to either?
Recommended Answer
tensor.detach()
creates a tensor that shares storage with the original tensor but does not require grad. It detaches the output from the computational graph, so no gradient will be backpropagated along this variable.
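To make that concrete, here is a minimal sketch with toy tensors (a, b, c are placeholders chosen for illustration, not taken from the question): gradients flow back to a through b, but the detached view c is treated as a constant.

import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)
b = a * 2
c = b.detach()          # shares data with b, but is cut off from the graph
out = (b + c).sum()     # c acts as a constant in this expression
out.backward()
print(a.grad)           # tensor([2., 2.]) -- only the path through b contributes
print(c.requires_grad)  # False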
The wrapper with torch.no_grad() temporarily sets all the requires_grad flags to false. torch.no_grad says that no operation should build the graph.
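A minimal sketch of that behaviour, assuming a toy tensor x: inside the block, the result of an operation does not require grad even though its input does.

import torch

x = torch.ones(3, requires_grad=True)
with torch.no_grad():
    y = x * 2            # no graph is recorded for this operation
print(y.requires_grad)   # False
z = x * 2                # outside the block autograd tracks the op as usual
print(z.requires_grad)   # True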
The difference is that detach() refers only to the given tensor on which it is called, while the other affects all operations taking place within the with statement. Also, torch.no_grad will use less memory because it knows from the beginning that no gradients are needed, so it doesn't need to keep intermediate results.
From here.
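A small sketch contrasting the two scopes described above (a toy tensor w is assumed for illustration, not the question's network): detach() cuts only the tensor it is called on, while everything inside the no_grad() block skips graph building, which is also why the intermediate results needed for backward are never stored.

import torch

w = torch.randn(5, requires_grad=True)

# .detach() acts on one tensor only; other operations still build a graph.
a = (w * 2).detach()     # a is cut out of the graph
b = w * 3                # b still tracks gradients
print(a.requires_grad, b.requires_grad)   # False True

# no_grad() acts on everything inside the block; since no backward pass is
# expected, autograd skips saving intermediate results for these operations.
with torch.no_grad():
    c = w * 2
    d = w * 3
print(c.requires_grad, d.requires_grad)   # False False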