Similarities and Differences in Computing Gradients w.r.t. an Image
Motivation
I have recently been working on adversarial patch attacks and became curious about the difference between updating the patch with Adam and updating it directly with the gradient, and in particular whether the two rely on the same gradient. This post records a small test and its conclusion.
Conclusion first: the gradient that Adam uses to update the patch is the same as the gradient obtained via torch.autograd.grad().
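Before the patch experiment below, here is a minimal, self-contained sketch of that claim on a toy problem (the tiny quadratic loss is my own illustration, not part of the original test): the .grad field populated by loss.backward(), which Adam reads, matches the output of torch.autograd.grad().

import torch

w = torch.randn(5, requires_grad=True)   # toy parameter
x = torch.randn(5)
loss = (w @ x - 1.0) ** 2                # toy scalar loss

# Functional API; keep the graph so backward() can still be called afterwards.
g_functional = torch.autograd.grad(loss, w, retain_graph=True)[0]

# backward() fills w.grad, which is exactly what optimizers such as Adam consume.
loss.backward()
print(torch.allclose(g_functional, w.grad))  # True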
Verification
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(pretrained=True)  # any pretrained ImageNet classifier works; resnet50 is used here
model.eval()                              # fixed BatchNorm statistics, no training-mode side effects
criterion = nn.CrossEntropyLoss()
patch = torch.rand((1, 3, 50, 50))        # the adversarial patch we want gradients for
patch.requires_grad = True
# ---------------- 1. Optimize with Adam ----------------
optimizer = torch.optim.Adam([patch], lr=0.01)
mask = torch.zeros((1, 3, 224, 224))
mask[:, :, 87:137, 87:137] = 1  # the 50x50 region where the patch is pasted
inputs = torch.ones((1, 3, 224, 224))  # dummy clean image
label = torch.tensor([1])              # dummy target class
patch_resized = F.interpolate(patch, (224, 224))          # upsample the 50x50 patch to the input resolution
input_patch = inputs * (1 - mask) + patch_resized * mask  # paste the patch into the masked region
output = model(input_patch)
loss = criterion(output, label)
# Gradient of the loss w.r.t. the patch itself; retain the graph so loss.backward() can reuse it below.
grad_1 = torch.autograd.grad(loss, patch, create_graph=False, retain_graph=True)[0]  # shape: [1, 3, 50, 50]
optimizer.zero_grad()
loss.backward()   # fills patch.grad, which is the gradient Adam consumes
optimizer.step()  # Adam update of the 50x50 patch
# ---------------- 2. Optimize with the raw input gradient ----------------
# Reuse mask, inputs and label from above. patch_resized still holds the pre-update patch values,
# so both gradients are computed at the same point.
input_patch = (inputs * (1 - mask) + patch_resized * mask).detach()
input_patch.requires_grad = True  # input_patch is now a leaf we can differentiate w.r.t.
output = model(input_patch)
loss = criterion(output, label)
grad_2 = torch.autograd.grad(loss, input_patch, create_graph=False, retain_graph=False)[0]  # shape: [1, 3, 224, 224]
grad_2_patch = grad_2[:, :, 87:137, 87:137]  # gradient over the patch region, shape: [1, 3, 50, 50]
# Check whether the two gradients agree (here: by counting positive entries)
print("Adam 优化的梯度", (grad_1.sign() == 1).sum().item())
print("操作优化的梯度", (grad_2_patch.sign() == 1).sum().item())
# Conclusion: the two counts above match, so both update paths see the same gradient.
# This is expected: inside the masked region, input_patch depends on the resized patch with derivative 1,
# so the cropped input gradient equals the gradient w.r.t. the resized patch; backpropagating it through
# F.interpolate is what yields grad_1 for the 50x50 patch.
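For a stricter test than counting positive signs, one could also compare, elementwise, the gradient buffer that Adam actually consumed with grad_1 (this extra check is my addition, not part of the original test):

# patch.grad was filled by loss.backward() above and is what optimizer.step() used;
# it should coincide with the gradient returned by torch.autograd.grad().
print(torch.allclose(patch.grad, grad_1))  # expected: True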