
Paper Reading Notes: Denoising Diffusion Implicit Models (5)

0. Quick links

Paper Reading Notes: Denoising Diffusion Implicit Models (1)
Paper Reading Notes: Denoising Diffusion Implicit Models (2)
Paper Reading Notes: Denoising Diffusion Implicit Models (3)
Paper Reading Notes: Denoising Diffusion Implicit Models (4)
Paper Reading Notes: Denoising Diffusion Implicit Models (5)

5. Continuing from Paper Reading Notes: Denoising Diffusion Implicit Models (4)

Here $\sigma_t$ is a quantity we are free to choose. Two special cases are of interest:

1. $\sigma_t^2=0$. In this case $x_{t-1}$ satisfies Eq. (3):

$$
\begin{equation}
\begin{split}
x_{t-1}&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}+\sqrt{1-\alpha_{t-1}-\sigma_t^2}\cdot z_t + \sigma_t \epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot x_0+\sqrt{1-\alpha_{t-1}}\cdot z_t
\end{split}
\end{equation}
$$

and $x_{t-n}$ satisfies

$$
\begin{equation}
\begin{split}
x_{t-n}&=\sqrt{\alpha_{t-n}}\cdot\frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}+\sqrt{1-\alpha_{t-n}-\sigma_t^2}\cdot z_t + \sigma_t \epsilon_t \\
&=\sqrt{\alpha_{t-n}}\cdot x_0+\sqrt{1-\alpha_{t-n}}\cdot z_t
\end{split}
\end{equation}
$$

In other words, both $x_{t-1}$ and $x_{t-n}$ reduce to the form of Lemma 1 from Paper Reading Notes: Denoising Diffusion Implicit Models (2).
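This degenerate case is easy to verify numerically. The sketch below is a toy check (a made-up linear β schedule, with the true forward noise standing in for the network's prediction, so all values are illustrative): plugging the exact noise into the $\sigma_t=0$ update lands precisely on the marginal form of Lemma 1, even when jumping several timesteps at once.

```python
import numpy as np

# Cumulative alphas (this series' alpha_t is DDPM's alpha-bar); values are made up.
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.normal(size=4)   # toy "clean image"
z = rng.normal(size=4)    # the true noise epsilon

t, t_prev = 980, 960      # jump directly from timestep 980 to 960

# Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * z
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * z

# Deterministic DDIM step (sigma_t = 0), using the exact noise z as the prediction
pred_x0 = (x_t - np.sqrt(1 - alpha_bar[t]) * z) / np.sqrt(alpha_bar[t])
x_prev = np.sqrt(alpha_bar[t_prev]) * pred_x0 + np.sqrt(1 - alpha_bar[t_prev]) * z

# The result is exactly the marginal at t_prev, i.e. the Lemma 1 form
expected = np.sqrt(alpha_bar[t_prev]) * x0 + np.sqrt(1 - alpha_bar[t_prev]) * z
print(np.allclose(x_prev, expected))  # True
```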

2. $\sigma_t^2=\frac{1-\alpha_{t-1}}{1-\alpha_t}\cdot\left(1-\frac{\alpha_t}{\alpha_{t-1}}\right)$. In this case $x_{t-1}$ satisfies Eq. (4):

$$
\begin{equation}
\begin{split}
x_{t-1}&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}+\sqrt{1-\alpha_{t-1}-\sigma_t^2}\cdot z_t + \sigma_t \epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}+\sqrt{1-\alpha_{t-1}-\frac{1-\alpha_{t-1}}{1-\alpha_t}\cdot\left(1-\frac{\alpha_t}{\alpha_{t-1}}\right)}\cdot z_t + \sigma_t \epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}+\sqrt{(1-\alpha_{t-1})\left(1-\frac{1}{1-\alpha_t}\cdot\frac{\alpha_{t-1}-\alpha_t}{\alpha_{t-1}}\right)}\cdot z_t + \sigma_t \epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}+\sqrt{(1-\alpha_{t-1})\cdot\frac{\alpha_{t-1}-\alpha_{t-1}\alpha_t-\alpha_{t-1}+\alpha_t}{\alpha_{t-1}(1-\alpha_t)}}\cdot z_t + \sigma_t \epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}+\sqrt{(1-\alpha_{t-1})\cdot\frac{\alpha_t-\alpha_{t-1}\alpha_t}{\alpha_{t-1}(1-\alpha_t)}}\cdot z_t + \sigma_t \epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t-\sqrt{1-\alpha_t}\cdot z_t}{\sqrt{\alpha_t}}+(1-\alpha_{t-1})\sqrt{\frac{\alpha_t}{\alpha_{t-1}(1-\alpha_t)}}\cdot z_t + \sigma_t \epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t}{\sqrt{\alpha_t}}-\frac{\sqrt{\alpha_{t-1}}\sqrt{1-\alpha_t}}{\sqrt{\alpha_t}}\cdot z_t+(1-\alpha_{t-1})\sqrt{\frac{\alpha_t}{\alpha_{t-1}(1-\alpha_t)}}\cdot z_t + \sigma_t \epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t}{\sqrt{\alpha_t}}-\frac{\alpha_{t-1}(1-\alpha_t)-(1-\alpha_{t-1})\alpha_t}{\sqrt{\alpha_t}\cdot\sqrt{\alpha_{t-1}(1-\alpha_t)}}\cdot z_t + \sigma_t \epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t}{\sqrt{\alpha_t}}-\frac{\alpha_{t-1}-\cancel{\alpha_t\alpha_{t-1}}-\alpha_t+\cancel{\alpha_t\alpha_{t-1}}}{\sqrt{\alpha_t}\cdot\sqrt{\alpha_{t-1}(1-\alpha_t)}}\cdot z_t + \sigma_t \epsilon_t \\
&=\sqrt{\alpha_{t-1}}\cdot\frac{x_t}{\sqrt{\alpha_t}}-\frac{\alpha_{t-1}-\alpha_t}{\sqrt{\alpha_t}\cdot\sqrt{\alpha_{t-1}(1-\alpha_t)}}\cdot z_t + \sigma_t \epsilon_t \\
&=\frac{\sqrt{\alpha_{t-1}}}{\sqrt{\alpha_t}}\left(x_t-\frac{\alpha_{t-1}-\alpha_t}{\alpha_{t-1}\sqrt{1-\alpha_t}}\cdot z_t\right) + \sigma_t \epsilon_t \\
&=\frac{\sqrt{\alpha_{t-1}}}{\sqrt{\alpha_t}}\left(x_t-\frac{1}{\sqrt{1-\alpha_t}}\cdot\left(1-\frac{\alpha_t}{\alpha_{t-1}}\right)\cdot z_t\right) + \sigma_t \epsilon_t \\
&=\frac{1}{\sqrt{\alpha_t}}\left(x_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\cdot z_t\right) + \sigma_t \epsilon_t \quad\text{(switching to DDPM notation)}
\end{split}
\end{equation}
$$

where in the last line the $\alpha_t$ of this series is DDPM's $\bar\alpha_t$, so $\alpha_t/\alpha_{t-1}$ becomes DDPM's per-step $\alpha_t$ and $1-\alpha_t/\alpha_{t-1}=\beta_t$. In other words, with this choice of $\sigma_t^2$, DDIM degenerates to DDPM.
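The equivalence can also be checked numerically. The following sketch (toy values with a made-up linear β schedule, not the paper's code) confirms that the DDIM mean under this choice of $\sigma_t^2$ coincides with the DDPM posterior mean $\frac{1}{\sqrt{\alpha_t}}\big(x_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}z_t\big)$:

```python
import numpy as np

# Toy schedule; alpha_bar is the cumulative product (this series' alpha_t).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

t = 500
abar_t, abar_prev = alpha_bar[t], alpha_bar[t - 1]
alpha_t = abar_t / abar_prev   # DDPM's per-step alpha_t
beta_t = 1.0 - alpha_t         # DDPM's beta_t

# sigma_t^2 from case 2
sigma2 = (1 - abar_prev) / (1 - abar_t) * (1 - alpha_t)

rng = np.random.default_rng(1)
x_t = rng.normal(size=4)
z_t = rng.normal(size=4)  # predicted noise

# DDIM mean (the sigma_t * epsilon noise term is omitted on both sides)
ddim_mean = (
    np.sqrt(abar_prev) * (x_t - np.sqrt(1 - abar_t) * z_t) / np.sqrt(abar_t)
    + np.sqrt(1 - abar_prev - sigma2) * z_t
)

# DDPM posterior mean in the usual notation
ddpm_mean = (x_t - beta_t / np.sqrt(1 - abar_t) * z_t) / np.sqrt(alpha_t)

print(np.allclose(ddim_mean, ddpm_mean))  # True
```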
The paper further considers $\sigma_t^2=\eta\cdot\frac{1-\alpha_{t-1}}{1-\alpha_t}\cdot\left(1-\frac{\alpha_t}{\alpha_{t-1}}\right)$ with $\eta\in[0,1]$, i.e. interpolating between 0 (deterministic DDIM) and the DDPM value. The performance for different $\eta$ and different numbers of skipped steps is shown in the figure below.
[Figure: sample quality for different values of $\eta$ and different numbers of sampling steps]

6. Code

```python
class DDIMPipeline(DiffusionPipeline):
    model_cpu_offload_seq = "unet"

    def __init__(self, unet, scheduler):
        super().__init__()
        # make sure scheduler can always be converted to DDIM
        scheduler = DDIMScheduler.from_config(scheduler.config)
        self.register_modules(unet=unet, scheduler=scheduler)

    @torch.no_grad()
    def __call__(
        self,
        batch_size: int = 1,
        generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
        eta: float = 0.0,
        num_inference_steps: int = 50,
        use_clipped_model_output: Optional[bool] = None,
        output_type: Optional[str] = "pil",
        return_dict: bool = True,
    ) -> Union[ImagePipelineOutput, Tuple]:
        # Sample gaussian noise to begin loop
        if isinstance(self.unet.config.sample_size, int):
            image_shape = (
                batch_size,
                self.unet.config.in_channels,
                self.unet.config.sample_size,
                self.unet.config.sample_size,
            )
        else:
            image_shape = (batch_size, self.unet.config.in_channels, *self.unet.config.sample_size)

        if isinstance(generator, list) and len(generator) != batch_size:
            raise ValueError(
                f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
                f" size of {batch_size}. Make sure the batch size matches the length of the generators."
            )

        # Randomly sample the initial noise
        image = randn_tensor(image_shape, generator=generator, device=self._execution_device, dtype=self.unet.dtype)

        # Set the step spacing. E.g. with num_inference_steps = 50 over 1000 training steps,
        # the sampler jumps 20 steps at a time: at timestep=980, prev_timestep=960.
        self.scheduler.set_timesteps(num_inference_steps)

        for t in self.progress_bar(self.scheduler.timesteps):
            # 1. predict the noise at the current timestep (e.g. timestep=980)
            model_output = self.unet(image, t).sample

            # 2. call the scheduler's step method to apply the update rule above and
            # obtain the image at prev_timestep (e.g. 960)
            image = self.scheduler.step(
                model_output, t, image, eta=eta, use_clipped_model_output=use_clipped_model_output, generator=generator
            ).prev_sample

        image = (image / 2 + 0.5).clamp(0, 1)
        image = image.cpu().permute(0, 2, 3, 1).numpy()
        if output_type == "pil":
            image = self.numpy_to_pil(image)

        if not return_dict:
            return (image,)

        return ImagePipelineOutput(images=image)


class DDIMScheduler(SchedulerMixin, ConfigMixin):
    _compatibles = [e.name for e in KarrasDiffusionSchedulers]
    order = 1

    @register_to_config
    def __init__(
        self,
        num_train_timesteps: int = 1000,
        beta_start: float = 0.0001,
        beta_end: float = 0.02,
        beta_schedule: str = "linear",
        trained_betas: Optional[Union[np.ndarray, List[float]]] = None,
        clip_sample: bool = True,
        set_alpha_to_one: bool = True,
        steps_offset: int = 0,
        prediction_type: str = "epsilon",
        thresholding: bool = False,
        dynamic_thresholding_ratio: float = 0.995,
        clip_sample_range: float = 1.0,
        sample_max_value: float = 1.0,
        timestep_spacing: str = "leading",
        rescale_betas_zero_snr: bool = False,
    ):
        if trained_betas is not None:
            self.betas = torch.tensor(trained_betas, dtype=torch.float32)
        elif beta_schedule == "linear":
            self.betas = torch.linspace(beta_start, beta_end, num_train_timesteps, dtype=torch.float32)
        elif beta_schedule == "scaled_linear":
            # this schedule is very specific to the latent diffusion model.
            self.betas = torch.linspace(beta_start**0.5, beta_end**0.5, num_train_timesteps, dtype=torch.float32) ** 2
        elif beta_schedule == "squaredcos_cap_v2":
            # Glide cosine schedule
            self.betas = betas_for_alpha_bar(num_train_timesteps)
        else:
            raise NotImplementedError(f"{beta_schedule} is not implemented for {self.__class__}")

        # Rescale for zero SNR
        if rescale_betas_zero_snr:
            self.betas = rescale_zero_terminal_snr(self.betas)

        self.alphas = 1.0 - self.betas
        self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)

        # At every step in ddim, we are looking into the previous alphas_cumprod
        # For the final step, there is no previous alphas_cumprod because we are already at 0
        # `set_alpha_to_one` decides whether we set this parameter simply to one or
        # whether we use the final alpha of the "non-previous" one.
        self.final_alpha_cumprod = torch.tensor(1.0) if set_alpha_to_one else self.alphas_cumprod[0]

        # standard deviation of the initial noise distribution
        self.init_noise_sigma = 1.0

        # setable values
        self.num_inference_steps = None
        self.timesteps = torch.from_numpy(np.arange(0, num_train_timesteps)[::-1].copy().astype(np.int64))

    def scale_model_input(self, sample: torch.Tensor, timestep: Optional[int] = None) -> torch.Tensor:
        """
        Ensures interchangeability with schedulers that need to scale the denoising model input depending on the
        current timestep.

        Args:
            sample (`torch.Tensor`):
                The input sample.
            timestep (`int`, *optional*):
                The current timestep in the diffusion chain.

        Returns:
            `torch.Tensor`:
                A scaled input sample.
        """
        return sample

    def _get_variance(self, timestep, prev_timestep):
        alpha_prod_t = self.alphas_cumprod[timestep]
        alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.final_alpha_cumprod
        beta_prod_t = 1 - alpha_prod_t
        beta_prod_t_prev = 1 - alpha_prod_t_prev

        variance = (beta_prod_t_prev / beta_prod_t) * (1 - alpha_prod_t / alpha_prod_t_prev)

        return variance

    # Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler._threshold_sample
    def _threshold_sample(self, sample: torch.Tensor) -> torch.Tensor:
        """
        "Dynamic thresholding: At each sampling step we set s to a certain percentile absolute pixel value in xt0 (the
        prediction of x_0 at timestep t), and if s > 1, then we threshold xt0 to the range [-s, s] and then divide by
        s. Dynamic thresholding pushes saturated pixels (those near -1 and 1) inwards, thereby actively preventing
        pixels from saturation at each step. We find that dynamic thresholding results in significantly better
        photorealism as well as better image-text alignment, especially when using very large guidance weights."

        https://arxiv.org/abs/2205.11487
        """
        dtype = sample.dtype
        batch_size, channels, *remaining_dims = sample.shape

        if dtype not in (torch.float32, torch.float64):
            sample = sample.float()  # upcast for quantile calculation, and clamp not implemented for cpu half

        # Flatten sample for doing quantile calculation along each image
        sample = sample.reshape(batch_size, channels * np.prod(remaining_dims))

        abs_sample = sample.abs()  # "a certain percentile absolute pixel value"

        s = torch.quantile(abs_sample, self.config.dynamic_thresholding_ratio, dim=1)
        s = torch.clamp(
            s, min=1, max=self.config.sample_max_value
        )  # When clamped to min=1, equivalent to standard clipping to [-1, 1]
        s = s.unsqueeze(1)  # (batch_size, 1) because clamp will broadcast along dim=0
        sample = torch.clamp(sample, -s, s) / s  # "we threshold xt0 to the range [-s, s] and then divide by s"

        sample = sample.reshape(batch_size, channels, *remaining_dims)
        sample = sample.to(dtype)

        return sample

    def set_timesteps(self, num_inference_steps: int, device: Union[str, torch.device] = None):
        """
        Sets the discrete timesteps used for the diffusion chain (to be run before inference).

        Args:
            num_inference_steps (`int`):
                The number of diffusion steps used when generating samples with a pre-trained model.
        """

        if num_inference_steps > self.config.num_train_timesteps:
            raise ValueError(
                f"`num_inference_steps`: {num_inference_steps} cannot be larger than `self.config.train_timesteps`:"
                f" {self.config.num_train_timesteps} as the unet model trained with this scheduler can only handle"
                f" maximal {self.config.num_train_timesteps} timesteps."
            )

        self.num_inference_steps = num_inference_steps

        # "linspace", "leading", "trailing" corresponds to annotation of Table 2. of https://arxiv.org/abs/2305.08891
        if self.config.timestep_spacing == "linspace":
            timesteps = (
                np.linspace(0, self.config.num_train_timesteps - 1, num_inference_steps)
                .round()[::-1]
                .copy()
                .astype(np.int64)
            )
        elif self.config.timestep_spacing == "leading":
            step_ratio = self.config.num_train_timesteps // self.num_inference_steps
            # creates integer timesteps by multiplying by ratio
            # casting to int to avoid issues when num_inference_step is power of 3
            timesteps = (np.arange(0, num_inference_steps) * step_ratio).round()[::-1].copy().astype(np.int64)
            timesteps += self.config.steps_offset
        elif self.config.timestep_spacing == "trailing":
            step_ratio = self.config.num_train_timesteps / self.num_inference_steps
            # creates integer timesteps by multiplying by ratio
            # casting to int to avoid issues when num_inference_step is power of 3
            timesteps = np.round(np.arange(self.config.num_train_timesteps, 0, -step_ratio)).astype(np.int64)
            timesteps -= 1
        else:
            raise ValueError(
                f"{self.config.timestep_spacing} is not supported. Please make sure to choose one of 'leading' or 'trailing'."
            )

        self.timesteps = torch.from_numpy(timesteps).to(device)

    def step(
        self,
        model_output: torch.Tensor,
        timestep: int,
        sample: torch.Tensor,
        eta: float = 0.0,
        use_clipped_model_output: bool = False,
        generator=None,
        variance_noise: Optional[torch.Tensor] = None,
        return_dict: bool = True,
    ) -> Union[DDIMSchedulerOutput, Tuple]:
        if self.num_inference_steps is None:
            raise ValueError(
                "Number of inference steps is 'None', you need to run 'set_timesteps' after creating the scheduler"
            )

        # 1. get previous step value (=t-1);
        # e.g. timestep=980, self.config.num_train_timesteps=1000, self.num_inference_steps=50
        # -> prev_timestep = 960, i.e. the step interval is 20
        prev_timestep = timestep - self.config.num_train_timesteps // self.num_inference_steps

        # 2. compute alphas, betas
        alpha_prod_t = self.alphas_cumprod[timestep]
        alpha_prod_t_prev = self.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else self.final_alpha_cumprod

        beta_prod_t = 1 - alpha_prod_t

        # 3. compute predicted original sample from predicted noise also called
        # "predicted x_0" of formula (12) from https://arxiv.org/pdf/2010.02502.pdf
        if self.config.prediction_type == "epsilon":
            pred_original_sample = (sample - beta_prod_t ** (0.5) * model_output) / alpha_prod_t ** (0.5)
            pred_epsilon = model_output
        elif self.config.prediction_type == "sample":
            pred_original_sample = model_output
            pred_epsilon = (sample - alpha_prod_t ** (0.5) * pred_original_sample) / beta_prod_t ** (0.5)
        elif self.config.prediction_type == "v_prediction":
            pred_original_sample = (alpha_prod_t**0.5) * sample - (beta_prod_t**0.5) * model_output
            pred_epsilon = (alpha_prod_t**0.5) * model_output + (beta_prod_t**0.5) * sample
        else:
            raise ValueError(
                f"prediction_type given as {self.config.prediction_type} must be one of `epsilon`, `sample`, or"
                " `v_prediction`"
            )

        # 4. Clip or threshold "predicted x_0"
        if self.config.thresholding:
            pred_original_sample = self._threshold_sample(pred_original_sample)
        elif self.config.clip_sample:
            pred_original_sample = pred_original_sample.clamp(
                -self.config.clip_sample_range, self.config.clip_sample_range
            )

        # 5. compute variance: "sigma_t(η)" -> see formula (16)
        # σ_t = sqrt((1 − α_t−1)/(1 − α_t)) * sqrt(1 − α_t/α_t−1)
        variance = self._get_variance(timestep, prev_timestep)
        std_dev_t = eta * variance ** (0.5)

        if use_clipped_model_output:
            # the pred_epsilon is always re-derived from the clipped x_0 in Glide
            pred_epsilon = (sample - alpha_prod_t ** (0.5) * pred_original_sample) / beta_prod_t ** (0.5)

        # 6. compute "direction pointing to x_t" of formula (12) from https://arxiv.org/pdf/2010.02502.pdf
        pred_sample_direction = (1 - alpha_prod_t_prev - std_dev_t**2) ** (0.5) * pred_epsilon

        # 7. compute x_t without "random noise" of formula (12) from https://arxiv.org/pdf/2010.02502.pdf
        prev_sample = alpha_prod_t_prev ** (0.5) * pred_original_sample + pred_sample_direction

        if eta > 0:
            if variance_noise is not None and generator is not None:
                raise ValueError(
                    "Cannot pass both generator and variance_noise. Please make sure that either `generator` or"
                    " `variance_noise` stays `None`."
                )

            if variance_noise is None:
                variance_noise = randn_tensor(
                    model_output.shape, generator=generator, device=model_output.device, dtype=model_output.dtype
                )
            variance = std_dev_t * variance_noise

            prev_sample = prev_sample + variance

        if not return_dict:
            return (
                prev_sample,
                pred_original_sample,
            )

        return DDIMSchedulerOutput(prev_sample=prev_sample, pred_original_sample=pred_original_sample)

    # Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler.add_noise
    def add_noise(
        self,
        original_samples: torch.Tensor,
        noise: torch.Tensor,
        timesteps: torch.IntTensor,
    ) -> torch.Tensor:
        # Make sure alphas_cumprod and timestep have same device and dtype as original_samples
        # Move the self.alphas_cumprod to device to avoid redundant CPU to GPU data movement
        # for the subsequent add_noise calls
        self.alphas_cumprod = self.alphas_cumprod.to(device=original_samples.device)
        alphas_cumprod = self.alphas_cumprod.to(dtype=original_samples.dtype)
        timesteps = timesteps.to(original_samples.device)

        sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5
        sqrt_alpha_prod = sqrt_alpha_prod.flatten()
        while len(sqrt_alpha_prod.shape) < len(original_samples.shape):
            sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)

        sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5
        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.flatten()
        while len(sqrt_one_minus_alpha_prod.shape) < len(original_samples.shape):
            sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)

        noisy_samples = sqrt_alpha_prod * original_samples + sqrt_one_minus_alpha_prod * noise
        return noisy_samples

    # Copied from diffusers.schedulers.scheduling_ddpm.DDPMScheduler.get_velocity
    def get_velocity(self, sample: torch.Tensor, noise: torch.Tensor, timesteps: torch.IntTensor) -> torch.Tensor:
        # Make sure alphas_cumprod and timestep have same device and dtype as sample
        self.alphas_cumprod = self.alphas_cumprod.to(device=sample.device)
        alphas_cumprod = self.alphas_cumprod.to(dtype=sample.dtype)
        timesteps = timesteps.to(sample.device)

        sqrt_alpha_prod = alphas_cumprod[timesteps] ** 0.5
        sqrt_alpha_prod = sqrt_alpha_prod.flatten()
        while len(sqrt_alpha_prod.shape) < len(sample.shape):
            sqrt_alpha_prod = sqrt_alpha_prod.unsqueeze(-1)

        sqrt_one_minus_alpha_prod = (1 - alphas_cumprod[timesteps]) ** 0.5
        sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.flatten()
        while len(sqrt_one_minus_alpha_prod.shape) < len(sample.shape):
            sqrt_one_minus_alpha_prod = sqrt_one_minus_alpha_prod.unsqueeze(-1)

        velocity = sqrt_alpha_prod * noise - sqrt_one_minus_alpha_prod * sample
        return velocity

    def __len__(self):
        return self.config.num_train_timesteps
```
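To make the logic of `step` and the timestep-skipping loop concrete without the full diffusers machinery, here is a minimal NumPy sketch of the η-parameterized update (an illustrative re-implementation, not the diffusers API; the schedule is made up and an oracle noise predictor stands in for the UNet). With η = 0 and exact noise, 50 strided steps walk the noisy sample all the way back to x₀:

```python
import numpy as np

def ddim_step(x_t, eps_pred, abar_t, abar_prev, eta, rng):
    # mirrors DDIMScheduler.step: predict x0, form sigma_t, take the update
    pred_x0 = (x_t - np.sqrt(1 - abar_t) * eps_pred) / np.sqrt(abar_t)
    variance = (1 - abar_prev) / (1 - abar_t) * (1 - abar_t / abar_prev)
    std_dev = eta * np.sqrt(variance)
    direction = np.sqrt(1 - abar_prev - std_dev**2) * eps_pred
    x_prev = np.sqrt(abar_prev) * pred_x0 + direction
    if eta > 0:
        x_prev = x_prev + std_dev * rng.normal(size=x_t.shape)
    return x_prev

betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.normal(size=8)
eps = rng.normal(size=8)

# 50 inference steps over 1000 training steps -> stride 20 ("leading" spacing)
timesteps = np.arange(0, 1000, 20)[::-1]  # 980, 960, ..., 20, 0

# start from the noisy marginal at the first timestep
x = np.sqrt(alpha_bar[timesteps[0]]) * x0 + np.sqrt(1 - alpha_bar[timesteps[0]]) * eps

for i, t in enumerate(timesteps):
    t_prev = timesteps[i + 1] if i + 1 < len(timesteps) else -1
    abar_prev = alpha_bar[t_prev] if t_prev >= 0 else 1.0  # final_alpha_cumprod = 1.0
    # oracle "prediction": the true eps used in the forward process
    x = ddim_step(x, eps, alpha_bar[t], abar_prev, eta=0.0, rng=rng)

print(np.allclose(x, x0))  # True
```

Setting `eta=1.0` turns the same loop into a stochastic DDPM-style sampler, which is exactly the knob exposed by the pipeline's `eta` argument.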
