Diffusion Models
Denoising Diffusion Probabilistic Models
🌊 What are Diffusion Models?
Diffusion Models are generative models that learn to generate data by reversing a gradual noise-addition process.
Core Idea:
- Forward: gradually add noise (x₀ → x_T)
- Reverse: learn to remove the noise (x_T → x₀)
🎨 Why Diffusion Models?
Gradually add Gaussian noise
x₀ (clean) → x₁ → x₂ → ... → x_T (pure noise)
Learn to denoise step-by-step
x_T (noise) → ... → x₁ → x₀ (generated image)
📊 Comparison: GAN vs VAE vs Diffusion
| Model | Pros | Cons |
|---|---|---|
| GAN | High quality, fast sampling | Training instability, mode collapse |
| VAE | Stable training, good latent space | Blurry outputs |
| Diffusion | Excellent quality, stable training, diverse outputs | Slow sampling (many steps) |
🎯 What You Will Learn
- Forward Process: noise schedule β_t
- Reverse Process: denoising with a U-Net
- DDPM: training objective
- DDIM: fast sampling
Forward Diffusion Process
Gradually Adding Noise
➡️ Forward Process
Forward diffusion is a Markov chain that gradually adds Gaussian noise to clean data x₀.
Each step adds a small amount of Gaussian noise: q(x_t | x_{t-1}) = N(x_t; √(1-β_t) x_{t-1}, β_t I)
🔍 Noise Schedule β_t
β_t controls how much noise is added at step t:
- 📈 Linear schedule: β_t increases linearly from β₁ to β_T
- 📉 Cosine schedule: Smoother transition, better performance
- 📊 Custom schedules: Optimized for a specific dataset
Common Values:
β₁ = 10⁻⁴ (very small noise)
β_T = 0.02 (substantial noise)
T = 1000 steps
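With these common values, almost no signal survives by step T; a quick numerical check (my own sketch, using the linear schedule above):

```python
import torch

# Linear schedule with the common values above
beta = torch.linspace(1e-4, 0.02, 1000)
# alpha_hat_T = prod(1 - beta_t): fraction of x_0's variance remaining at step T
alpha_hat = torch.cumprod(1 - beta, dim=0)
print(float(alpha_hat[-1]))  # ≈ 4e-5, so x_T is essentially pure noise
```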
⚡ Direct Sampling: q(x_t | x₀)
We don't have to iterate T times; there is a closed-form solution:
Reparameterization: x_t = √(ᾱ_t)x₀ + √(1-ᾱ_t)ε, where ε ~ N(0,I) and ᾱ_t = ∏_{s=1}^t (1-β_s)
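The closed form can be checked numerically against the step-by-step process; a minimal sketch (my own, using a linear schedule with T = 200 and scalar "images"):

```python
import torch

T = 200
beta = torch.linspace(1e-4, 0.02, T)   # linear noise schedule
alpha_hat = torch.cumprod(1 - beta, dim=0)

torch.manual_seed(0)
n = 100_000
x = torch.ones(n)  # many independent chains, each starting from x_0 = 1

# Iterative forward process: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps
for t in range(T):
    x = torch.sqrt(1 - beta[t]) * x + torch.sqrt(beta[t]) * torch.randn(n)

# The closed form predicts mean sqrt(alpha_hat_T) and variance 1 - alpha_hat_T
print(float(x.mean()), float(torch.sqrt(alpha_hat[-1])))  # nearly equal
print(float(x.var()), float(1 - alpha_hat[-1]))           # nearly equal
```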
📅 Diffusion Timeline
Reverse Diffusion Process
Learning to Denoise
⬅️ Reverse Process
Reverse diffusion is a learned process that removes noise step by step, starting from pure noise x_T and ending at clean data x₀.
Goal: Learn to reverse the forward process!
🏗️ U-Net Noise Predictor
Diffusion models use a U-Net to predict the noise ε_θ(x_t, t):
- 📥 Input: Noisy image x_t + timestep embedding t
- 🎯 Output: Predicted noise ε_θ that was added
- 🏛️ Architecture: Encoder-decoder with skip connections
- ⏰ Time embedding: Sinusoidal positional encoding
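The sinusoidal time embedding mentioned above can be sketched as follows (a minimal version; the frequency base 10000 follows the Transformer convention, and the function name is my own):

```python
import math
import torch

def timestep_embedding(t, dim):
    """Map integer timesteps (batch,) to sinusoidal embeddings (batch, dim)."""
    half = dim // 2
    # Geometrically spaced frequencies, as in Transformer positional encoding
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = t[:, None].float() * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

emb = timestep_embedding(torch.tensor([0, 500, 999]), 128)
print(emb.shape)  # torch.Size([3, 128])
```

The embedding is typically passed through a small MLP and added to the U-Net's intermediate feature maps.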
Key Insight:
Instead of predicting x₀ directly, predict the noise that was added!
Then compute: x₀ = (x_t - √(1-ᾱ_t)ε_θ) / √(ᾱ_t)
🔄 Denoising Step
Given x_t, to obtain x_{t-1}:
x_{t-1} = (1/√(α_t)) (x_t - ((1-α_t)/√(1-ᾱ_t)) ε_θ(x_t, t)) + σ_t z, where z ~ N(0,I) and σ_t = √(β_t)
For t = 1, set z = 0 (deterministic final step)
DDPM
Denoising Diffusion Probabilistic Models
📐 DDPM Training Objective
DDPM (Ho et al., 2020) learns the reverse process by maximizing a variational lower bound on the data log-likelihood.
This is complex! Ho et al. showed a simpler equivalent objective...
⚡ Simplified Loss (Actually Used)
In practice, we use the simplified objective:
L_simple = E_{t, x₀, ε} [ ‖ε - ε_θ(x_t, t)‖² ]
Simply: predict the noise that was added!
Training Algorithm:
- Sample x₀ from data
- Sample timestep t uniformly
- Sample noise ε ~ N(0,I)
- Compute noisy x_t using reparameterization
- Predict noise: ε_pred = ε_θ(x_t, t)
- Loss = ||ε - ε_pred||²
- Backprop and update θ
🎲 Sampling Algorithm (DDPM)
💡 Key Insights
- 🎯 Noise prediction easier than direct x₀ prediction
- 🔄 Reparameterization allows direct sampling of any x_t
- 📊 Simple MSE loss works better than complex VLB
- ⏱️ T=1000 steps typical for high-quality generation
DDIM & Improvements
Faster Sampling & Enhancements
⚡ DDIM (Denoising Diffusion Implicit Models)
The problem with DDPM: sampling requires T=1000 steps (slow!)
Solution: DDIM (Song et al., 2020), which enables deterministic sampling with far fewer steps.
🚀 Faster Sampling
DDIM allows skipping timesteps without retraining:
- 🏃 10-50 steps instead of 1000 (20-100x faster!)
- 🎯 Deterministic: same noise → same image
- 🔄 Interpolation: smooth latent space
- ✏️ Inversion: encode real images to latent
DDIM Sampling (S=50 steps):
Use subset of timesteps: {1000, 980, 960, ..., 40, 20}
Much faster while maintaining quality!
📊 Noise Schedules
Various noise schedules for better performance:
- 📈 Linear: Original DDPM, simple
- 📉 Cosine: Better for high-res, less noise at extremes
- 📊 Custom: Learned or hand-tuned for specific data
🎨 Classifier-Free Guidance
For conditional generation (e.g., text-to-image):
Train the conditional and unconditional models jointly (by randomly dropping the condition during training), then combine their noise predictions at sampling time:
ε̃ = ε_θ(x_t, ∅) + w · (ε_θ(x_t, c) - ε_θ(x_t, ∅))
Example (Text-to-Image):
Condition: "a cat wearing a hat"
w=1.0 → no extra guidance (plain conditional)
w=7.5 → strong text adherence
w=15.0 → very strong, may sacrifice quality
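The guidance weights above enter through the combination ε̃ = ε_uncond + w·(ε_cond - ε_uncond); a minimal sketch (eps_uncond and eps_cond stand in for the model's two noise predictions):

```python
import torch

def cfg_noise(eps_uncond, eps_cond, w):
    """Guided noise estimate: push w times along the conditional direction."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Dummy predictions, just to show the arithmetic
eps_u = torch.zeros(1, 3, 8, 8)
eps_c = torch.ones(1, 3, 8, 8)
guided = cfg_noise(eps_u, eps_c, 7.5)
print(float(guided.mean()))  # 7.5
```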
Score-Based Models
Score Matching Perspective
🎯 Score Matching
Alternative perspective: learn the score function ∇_x log p(x).
Connection: since q(x_t | x₀) = N(√(ᾱ_t)x₀, (1-ᾱ_t)I),
∇_{x_t} log q(x_t | x₀) = -(x_t - √(ᾱ_t)x₀) / (1-ᾱ_t) = -ε / √(1-ᾱ_t)
so predicting the noise is equivalent (up to scale) to estimating the score.
🌊 Langevin Dynamics
Sampling with Langevin dynamics:
x_{i+1} = x_i + (η/2) ∇_x log p(x_i) + √η z_i, where z_i ~ N(0, I)
Move toward high density + random walk
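A toy illustration (my own sketch): Langevin sampling from a 1-D standard Gaussian, whose score ∇_x log p(x) = -x is known in closed form; the step size and iteration count are arbitrary choices:

```python
import math
import torch

torch.manual_seed(0)
eta = 0.01                      # step size
x = 5.0 * torch.ones(50_000)    # many chains, all starting far from the mode

for _ in range(2000):
    score = -x                                               # exact score of N(0, 1)
    x = x + 0.5 * eta * score + math.sqrt(eta) * torch.randn_like(x)

print(float(x.mean()), float(x.std()))  # ≈ 0 and ≈ 1: samples match N(0, 1)
```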
📐 SDE Formulation
Song et al.'s unified view: diffusion as a Stochastic Differential Equation (SDE):
Forward SDE: dx = f(x, t) dt + g(t) dw
Reverse SDE: dx = [f(x, t) - g(t)² ∇_x log p_t(x)] dt + g(t) dw̄
Learn the score, solve the reverse SDE → generate samples
🔗 Connection to Diffusion
- 🎯 Score-based: ∇_x log p(x) perspective
- 🌊 Diffusion: Forward/reverse process perspective
- 🔄 Equivalent: Different views of same model!
- ⚡ Unified: SDE framework combines both
Key Insight:
Denoising score matching ≈ Diffusion model training
Both learn to predict noise/score at different noise levels
Implementation
PyTorch Code
💻 DDPM PyTorch Implementation
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffusionModel(nn.Module):
    def __init__(self, unet, noise_steps=1000, beta_start=1e-4, beta_end=0.02):
        super().__init__()
        self.unet = unet  # Noise prediction network
        self.noise_steps = noise_steps
        # Noise schedule (linear); buffers so they move with .to(device)
        self.register_buffer("beta", torch.linspace(beta_start, beta_end, noise_steps))
        self.register_buffer("alpha", 1 - self.beta)
        self.register_buffer("alpha_hat", torch.cumprod(self.alpha, dim=0))

    def add_noise(self, x_0, t, noise):
        """
        Add noise to x_0 to get x_t.
        Args:
            x_0: clean images (batch, C, H, W)
            t: timesteps (batch,)
            noise: Gaussian noise (batch, C, H, W)
        Returns:
            x_t: noisy images
        """
        sqrt_alpha_hat = torch.sqrt(self.alpha_hat[t])[:, None, None, None]
        sqrt_one_minus_alpha_hat = torch.sqrt(1 - self.alpha_hat[t])[:, None, None, None]
        return sqrt_alpha_hat * x_0 + sqrt_one_minus_alpha_hat * noise

    def forward(self, x, t):
        """Predict the noise added at timestep t."""
        return self.unet(x, t)
```
🎓 Training Loop
```python
def train_diffusion(model, dataloader, optimizer, device, epochs=100):
    """Train diffusion model."""
    model.train()
    for epoch in range(epochs):
        for x_0 in dataloader:
            x_0 = x_0.to(device)
            batch_size = x_0.shape[0]
            # Sample random timesteps
            t = torch.randint(0, model.noise_steps, (batch_size,), device=device)
            # Sample noise
            noise = torch.randn_like(x_0)
            # Add noise to get x_t
            x_t = model.add_noise(x_0, t, noise)
            # Predict noise
            noise_pred = model(x_t, t)
            # Simplified loss
            loss = F.mse_loss(noise_pred, noise)
            # Backward
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")
```
🎲 Sampling (DDPM)
```python
@torch.no_grad()
def sample_ddpm(model, n_samples, img_size, device):
    """
    Sample images using the DDPM algorithm.
    Args:
        model: trained diffusion model
        n_samples: number of images to generate
        img_size: (C, H, W)
        device: cuda or cpu
    Returns:
        generated images
    """
    model.eval()
    C, H, W = img_size
    # Start from pure noise
    x = torch.randn(n_samples, C, H, W, device=device)
    # Iteratively denoise
    for t in reversed(range(model.noise_steps)):
        # Create timestep tensor
        t_tensor = torch.full((n_samples,), t, device=device, dtype=torch.long)
        # Predict noise
        noise_pred = model(x, t_tensor)
        # Get schedule values
        alpha = model.alpha[t]
        alpha_hat = model.alpha_hat[t]
        beta = model.beta[t]
        # Add fresh noise except at the final step (t = 0)
        if t > 0:
            noise = torch.randn_like(x)
        else:
            noise = torch.zeros_like(x)
        # Denoising step: x_{t-1} from x_t
        x = (1 / torch.sqrt(alpha)) * (x - ((1 - alpha) / torch.sqrt(1 - alpha_hat)) * noise_pred) + torch.sqrt(beta) * noise
    return x
```
⚡ DDIM Sampling
```python
@torch.no_grad()
def sample_ddim(model, n_samples, img_size, device, ddim_steps=50):
    """Fast deterministic sampling with DDIM."""
    model.eval()
    C, H, W = img_size
    # Evenly spaced subset of timesteps, from T-1 down to 0
    timesteps = torch.linspace(model.noise_steps - 1, 0, ddim_steps).long().tolist()
    # Start from pure noise
    x = torch.randn(n_samples, C, H, W, device=device)
    for i, t in enumerate(timesteps):
        t_tensor = torch.full((n_samples,), t, device=device, dtype=torch.long)
        # Predict noise
        noise_pred = model(x, t_tensor)
        # Predict x_0 from x_t and the predicted noise
        alpha_hat = model.alpha_hat[t]
        pred_x0 = (x - torch.sqrt(1 - alpha_hat) * noise_pred) / torch.sqrt(alpha_hat)
        if i < len(timesteps) - 1:
            t_prev = timesteps[i + 1]
            alpha_hat_prev = model.alpha_hat[t_prev]
            # DDIM update (eta = 0, deterministic)
            x = torch.sqrt(alpha_hat_prev) * pred_x0 + torch.sqrt(1 - alpha_hat_prev) * noise_pred
        else:
            x = pred_x0
    return x
```
Applications
Diffusion Models in the Wild
🚀 Diffusion Model Applications
Diffusion models have become the state of the art for a wide range of generative tasks:
🎨 Stable Diffusion
Latent Diffusion Models
Diffusion in compressed latent space (VAE encoder)
🌟 DALL-E 2
CLIP + Diffusion
Text → CLIP embedding → Diffusion decoder
🖼️ Image Generation
Unconditional/Class-conditional
Generate realistic images from noise
✏️ Image Editing
Inpainting & Outpainting
Fill missing regions or extend images
🔍 Super-Resolution
SR3, Imagen
Upscale low-res images to high-res
🎬 Video Generation
Temporal Diffusion
Generate coherent video sequences
🎵 Audio Synthesis
WaveGrad, DiffWave
Generate high-quality audio waveforms
🧬 Molecular Design
Protein/Drug Generation
Generate novel molecular structures
💡 Why Diffusion Models Excel
- ✅ High Quality: State-of-the-art image/video generation
- ✅ Stable Training: No mode collapse like GANs
- ✅ Diverse Outputs: Stochastic sampling
- ✅ Flexible Conditioning: Text, class, layout, etc.
- ✅ Principled Framework: Strong theoretical foundation
🔮 Future Directions
- ⚡ Faster Sampling: 1-step diffusion models
- 📹 Longer Videos: Temporal consistency
- 🎮 3D Generation: NeRF + diffusion
- 🧠 Efficiency: Smaller models, edge deployment
- 🎨 Control: Better user control over generation
✅ Congratulations!
🎉 Tutorial Complete!
You have learned:
- ✅ Forward diffusion process (noise addition)
- ✅ Reverse diffusion process (denoising)
- ✅ DDPM training & sampling
- ✅ DDIM for faster generation
- ✅ Score-based perspective
- ✅ PyTorch implementation
- ✅ Real-world applications