Notes on Perturbed-Attention Guidance (PAG)

news 2026/2/9 8:12:41

Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
Github

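Abstract

Recent studies show that diffusion models can generate high-quality samples, but that this quality depends heavily on sampling guidance techniques such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable to unconditional generation or to downstream tasks such as image restoration. In this paper, we propose a novel sampling guidance method, Perturbed-Attention Guidance (PAG), which improves diffusion sample quality in both unconditional and conditional settings without additional training or external modules. PAG is designed to progressively enhance the structure of samples throughout the denoising process. Because self-attention captures structural information, PAG generates intermediate samples with degraded structure by replacing the self-attention maps in the UNet with an identity matrix, and then guides the denoising process away from these degraded samples. In both ADM and Stable Diffusion, PAG significantly improves sample quality in conditional and even unconditional scenarios. PAG also significantly improves baseline performance on downstream tasks where existing guidance such as CG or CFG cannot be fully used, including ControlNet with empty prompts and image restoration such as inpainting and deblurring.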
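Prior work shows that in the self-attention modules of the diffusion U-Net, the query-key product mainly determines structure, while the values mainly determine appearance.

Perturbing the values Vt directly would push the result out of distribution (OOD), so the authors instead perturb the query-key part by replacing it with an identity matrix.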
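The effect of swapping the attention map for an identity matrix can be sketched in a few lines of PyTorch. This is a minimal illustration under simplifying assumptions (single head, no projections, random toy tensors), and the function names `self_attention` and `perturbed_self_attention` are hypothetical, not taken from the paper or from Diffusers.

```python
import torch


def self_attention(q, k, v):
    """Standard scaled dot-product self-attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    attn_map = torch.softmax(q @ k.transpose(-1, -2) / d**0.5, dim=-1)
    return attn_map @ v


def perturbed_self_attention(q, k, v):
    """PAG-style perturbation: the attention map is replaced by the identity.

    q and k are accepted only to keep the same signature; they are ignored,
    so each token attends to itself alone and the output equals V.
    """
    seq_len = v.shape[-2]
    identity_map = torch.eye(seq_len, dtype=v.dtype, device=v.device)
    return identity_map @ v


torch.manual_seed(0)
# Toy tensors of shape (batch, tokens, channels).
q, k, v = (torch.randn(2, 5, 8) for _ in range(3))

out_sa = self_attention(q, k, v)             # tokens are mixed according to Q K^T
out_psa = perturbed_self_attention(q, k, v)  # no mixing between tokens
```

Since the identity map leaves V untouched, the perturbed branch keeps the appearance information carried by the values while discarding the token-to-token structure encoded by the query-key product.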

Which part of the U-Net should be perturbed? The authors evaluated 5k samples with PAG guidance scale s = 2.5 and 25 DDIM steps, and the mid-block layer "m0" performed best.

Code

Diffusers already supports PAG for a variety of tasks, and it can be used together with ControlNet and IP-Adapter.

import os

import torch
from diffusers import AutoPipelineForText2Image

# Expand "~" explicitly so the string resolves to the local model directory.
model_path = os.path.expanduser(
    "~/.cache/modelscope/hub/AI-ModelScope/stable-diffusion-xl-base-1___0"
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    model_path,
    enable_pag=True,  # added to enable PAG
    pag_applied_layers=["mid"],  # added: perturb the mid block
    torch_dtype=torch.float16,
)
pipeline.enable_model_cpu_offload()

prompt = "an insect robot preparing a delicious meal, anime style"
generator = torch.Generator(device="cpu").manual_seed(0)
images = pipeline(
    prompt=prompt,
    num_inference_steps=25,
    guidance_scale=7.0,
    generator=generator,
    pag_scale=2.5,
).images
images[0].save("pag.jpg")

PAG code details

When PAG and CFG are used together, the prompt_embeds fed into the U-Net are built as follows, which gives the batch layout [uncond, cond, cond].

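    def _prepare_perturbed_attention_guidance(self, cond, uncond, do_classifier_free_guidance):
        # Duplicate cond: one copy for the normal path, one for the perturbed path.
        cond = torch.cat([cond] * 2, dim=0)

        if do_classifier_free_guidance:
            # Prepend uncond so the batch becomes [uncond, cond, cond].
            cond = torch.cat([uncond, cond], dim=0)
        return cond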
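To make the resulting batch layout concrete, here is a small standalone sketch. The free function `prepare_pag_batch` is a hypothetical stand-in that mirrors the method above, and the tensor shapes and fill values are toy assumptions chosen only so the three slots are easy to tell apart.

```python
import torch


def prepare_pag_batch(cond, uncond, do_classifier_free_guidance):
    """Standalone mirror of the batching logic: [cond, cond] or [uncond, cond, cond]."""
    cond = torch.cat([cond] * 2, dim=0)
    if do_classifier_free_guidance:
        cond = torch.cat([uncond, cond], dim=0)
    return cond


# Toy embeddings: ones mark the conditional prompt, zeros the unconditional one.
cond = torch.ones(1, 77, 16)
uncond = torch.zeros(1, 77, 16)

batch_cfg = prepare_pag_batch(cond, uncond, do_classifier_free_guidance=True)
batch_no_cfg = prepare_pag_batch(cond, uncond, do_classifier_free_guidance=False)

print(batch_cfg.shape[0])     # batch size with CFG + PAG
print(batch_no_cfg.shape[0])  # batch size with PAG only
```

With CFG enabled the batch is three times the prompt batch; without CFG it is twice the prompt batch, with the last slot always reserved for the perturbed pass.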

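PAGCFGIdentitySelfAttnProcessor2_0 then computes attention: the [uncond, cond] part goes through normal self-attention (SA), while the second cond goes through perturbed self-attention (PSA).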

from typing import Optional

import torch
import torch.nn.functional as F
from diffusers.models.attention_processor import Attention


class PAGCFGIdentitySelfAttnProcessor2_0:
    r"""
    Processor for implementing PAG using scaled dot-product attention (enabled by default if you're using PyTorch 2.0).
    PAG reference: https://arxiv.org/abs/2403.17377
    """

    def __init__(self):
        if not hasattr(F, "scaled_dot_product_attention"):
            raise ImportError(
                "PAGCFGIdentitySelfAttnProcessor2_0 requires PyTorch 2.0, to use it, please upgrade PyTorch to 2.0."
            )

    def __call__(
        self,
        attn: Attention,
        hidden_states: torch.FloatTensor,
        encoder_hidden_states: Optional[torch.FloatTensor] = None,
        attention_mask: Optional[torch.FloatTensor] = None,
        temb: Optional[torch.FloatTensor] = None,
    ) -> torch.Tensor:
        residual = hidden_states
        if attn.spatial_norm is not None:
            hidden_states = attn.spatial_norm(hidden_states, temb)

        input_ndim = hidden_states.ndim
        if input_ndim == 4:
            batch_size, channel, height, width = hidden_states.shape
            hidden_states = hidden_states.view(batch_size, channel, height * width).transpose(1, 2)

        # chunk: split the batch into [uncond, cond, perturbed cond]
        hidden_states_uncond, hidden_states_org, hidden_states_ptb = hidden_states.chunk(3)
        hidden_states_org = torch.cat([hidden_states_uncond, hidden_states_org])

        # original path
        batch_size, sequence_length, _ = hidden_states_org.shape

        if attention_mask is not None:
            attention_mask = attn.prepare_attention_mask(attention_mask, sequence_length, batch_size)
            # scaled_dot_product_attention expects attention_mask shape to be
            # (batch, heads, source_length, target_length)
            attention_mask = attention_mask.view(batch_size, attn.heads, -1, attention_mask.shape[-1])

        if attn.group_norm is not None:
            hidden_states_org = attn.group_norm(hidden_states_org.transpose(1, 2)).transpose(1, 2)

        query = attn.to_q(hidden_states_org)
        key = attn.to_k(hidden_states_org)
        value = attn.to_v(hidden_states_org)

        inner_dim = key.shape[-1]
        head_dim = inner_dim // attn.heads

        query = query.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
        key = key.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
        value = value.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)

        # the output of sdp = (batch, num_heads, seq_len, head_dim)
        # TODO: add support for attn.scale when we move to Torch 2.1
        hidden_states_org = F.scaled_dot_product_attention(
            query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
        )

        hidden_states_org = hidden_states_org.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim)
        hidden_states_org = hidden_states_org.to(query.dtype)

        # linear proj
        hidden_states_org = attn.to_out[0](hidden_states_org)
        # dropout
        hidden_states_org = attn.to_out[1](hidden_states_org)

        if input_ndim == 4:
            hidden_states_org = hidden_states_org.transpose(-1, -2).reshape(batch_size, channel, height, width)

        # perturbed path (identity attention)
        batch_size, sequence_length, _ = hidden_states_ptb.shape

        if attn.group_norm is not None:
            hidden_states_ptb = attn.group_norm(hidden_states_ptb.transpose(1, 2)).transpose(1, 2)

        # With an identity attention map, the attention output is just the value projection.
        value = attn.to_v(hidden_states_ptb)
        hidden_states_ptb = value
        hidden_states_ptb = hidden_states_ptb.to(query.dtype)

        # linear proj
        hidden_states_ptb = attn.to_out[0](hidden_states_ptb)
        # dropout
        hidden_states_ptb = attn.to_out[1](hidden_states_ptb)

        if input_ndim == 4:
            hidden_states_ptb = hidden_states_ptb.transpose(-1, -2).reshape(batch_size, channel, height, width)

        # cat: restore the [uncond, cond, perturbed cond] batch layout
        hidden_states = torch.cat([hidden_states_org, hidden_states_ptb])

        if attn.residual_connection:
            hidden_states = hidden_states + residual

        hidden_states = hidden_states / attn.rescale_output_factor

        return hidden_states

After the U-Net forward pass, noise_pred is computed as follows.

    def _apply_perturbed_attention_guidance(
        self, noise_pred, do_classifier_free_guidance, guidance_scale, t, return_pred_text=False
    ):
        r"""
        Apply perturbed attention guidance to the noise prediction.

        Args:
            noise_pred (torch.Tensor): The noise prediction tensor.
            do_classifier_free_guidance (bool): Whether to apply classifier-free guidance.
            guidance_scale (float): The scale factor for the guidance term.
            t (int): The current time step.
            return_pred_text (bool): Whether to return the text noise prediction.

        Returns:
            Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]: The updated noise prediction tensor after applying
            perturbed attention guidance and the text noise prediction.
        """
        pag_scale = self._get_pag_scale(t)
        if do_classifier_free_guidance:
            noise_pred_uncond, noise_pred_text, noise_pred_perturb = noise_pred.chunk(3)
            noise_pred = (
                noise_pred_uncond
                + guidance_scale * (noise_pred_text - noise_pred_uncond)
                + pag_scale * (noise_pred_text - noise_pred_perturb)
            )
        else:
            noise_pred_text, noise_pred_perturb = noise_pred.chunk(2)
            noise_pred = noise_pred_text + pag_scale * (noise_pred_text - noise_pred_perturb)
        if return_pred_text:
            return noise_pred, noise_pred_text
        return noise_pred
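A tiny numeric example may help in reading the combination rule. The sketch below is a standalone re-implementation of the same arithmetic; the function name `apply_pag` and the constant-valued toy predictions are assumptions made for illustration, and the scales reuse the values from the pipeline call earlier in these notes (guidance_scale = 7.0, pag_scale = 2.5).

```python
import torch


def apply_pag(noise_pred, guidance_scale, pag_scale, do_classifier_free_guidance):
    """Combine the batched noise predictions using the CFG and PAG terms."""
    if do_classifier_free_guidance:
        uncond, text, perturb = noise_pred.chunk(3)
        return uncond + guidance_scale * (text - uncond) + pag_scale * (text - perturb)
    text, perturb = noise_pred.chunk(2)
    return text + pag_scale * (text - perturb)


# Toy predictions with constant values so the arithmetic is easy to follow.
uncond = torch.full((1, 4), 0.0)
text = torch.full((1, 4), 1.0)
perturb = torch.full((1, 4), 0.5)

# CFG + PAG: 0 + 7.0 * (1 - 0) + 2.5 * (1 - 0.5) = 8.25
with_cfg = apply_pag(torch.cat([uncond, text, perturb]), 7.0, 2.5, True)

# PAG only: 1 + 2.5 * (1 - 0.5) = 2.25
without_cfg = apply_pag(torch.cat([text, perturb]), 7.0, 2.5, False)
```

In both cases the PAG term pushes the prediction along the direction from the perturbed output toward the normal conditional output, which is what steers sampling away from the structurally degraded result.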
