InfoNCE
Noise Contrastive Estimation
What is InfoNCE?
InfoNCE (Info Noise Contrastive Estimation) is a loss function for contrastive learning that maximizes the mutual information between positive pairs while minimizing similarity with negative samples.
Core Idea:
Pull positive pairs closer together,
push negative pairs farther apart!
History
InfoNCE was introduced and popularized through several influential papers:
- CPC (2018): Contrastive Predictive Coding - predictive representation learning
- MoCo (2019): Momentum Contrast - self-supervised vision
- SimCLR (2020): Simple framework for contrastive learning
- CLIP (2021): Vision-language alignment with InfoNCE
Why is InfoNCE Powerful?
The anchor & positive sample should have high similarity
Example: two augmentations of the same image
The anchor & negative samples should have low similarity
Example: images from different classes
What You Will Learn
Mathematical Foundation
InfoNCE derivation
Contrastive Mechanics
Pull/push forces
Temperature τ
Controlling sharpness
PyTorch Code
Implementation
Mathematical Foundation
InfoNCE Derivation
InfoNCE Formula
InfoNCE is a loss function derived from Noise Contrastive Estimation:

    L = -log [ exp(sim(q, k⁺)/τ) / ( exp(sim(q, k⁺)/τ) + Σᵢ exp(sim(q, kᵢ⁻)/τ) ) ]

Minimizing the loss ⇒ maximizing similarity with the positive, minimizing it with the negatives
Component Breakdown
- q: query/anchor embedding
- k⁺: positive key embedding (the matched pair)
- kᵢ⁻: negative key embeddings (i = 1...N-1)
- sim(·,·): similarity function (usually cosine)
- τ: temperature (controls distribution sharpness)
Intuition: Softmax Classification
InfoNCE can be viewed as an N-way classification problem:
The model must "classify" which of the N candidates is the positive pair!
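To make this concrete, here is a minimal plain-Python sketch (the similarity values are made up for illustration) that computes InfoNCE as softmax cross-entropy over N candidate similarities:

```python
import math

def info_nce(sims, pos_index=0, tau=0.07):
    """InfoNCE as N-way softmax classification over similarity scores.

    sims: list of similarities [sim(q, k) for each candidate k]
    pos_index: which candidate is the positive pair
    """
    logits = [s / tau for s in sims]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    # Cross-entropy: -log P(positive | q, candidates)
    return -math.log(exps[pos_index] / sum(exps))

# Positive is very similar (0.9); negatives are not.
loss_easy = info_nce([0.9, 0.1, 0.0, -0.2], pos_index=0)
# A hard negative (0.8) makes the classification harder -> higher loss.
loss_hard = info_nce([0.9, 0.8, 0.0, -0.2], pos_index=0)
print(loss_easy, loss_hard)
```

A hard negative close to the positive raises the loss, which is exactly the signal that drives the embeddings apart.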
Mutual Information Perspective
InfoNCE maximizes a lower bound on the mutual information I(q; k⁺):
Goal: Maximize I(q; k⁺)
InfoNCE loss = -log P(positive | q, {k₁, ..., k_N})
Minimizing InfoNCE ⇔ Maximizing a lower bound on MI
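In the CPC paper (Oord et al., 2018) this bound is stated as follows, where N is the number of candidates (one positive plus N-1 negatives):

```latex
I(q; k^{+}) \;\ge\; \log N \;-\; \mathcal{L}_{\text{InfoNCE}}
```

So as the batch (and hence N) grows, minimizing the loss pushes up an increasingly tight lower bound on the mutual information.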
Contrastive Mechanics
Pull Positive, Push Negative
How Contrastive Learning Works
Contrastive learning uses pull and push forces during training:
Positive pairs are pulled closer together in embedding space
sim(anchor, positive) → 1.0
Negative pairs are pushed farther apart
sim(anchor, negative) → 0.0
Batch Construction
For a batch of size N, we have:
- N positive pairs: (anchor₁, positive₁), ..., (anchor_N, positive_N)
- N×(N-1) negative pairs: all other combinations within the batch!
Efficient Negatives:
With batch size N = 256, each sample has:
- 1 positive pair
- 255 negative pairs (in-batch negatives)
No explicit negative sampling needed!
Example: SimCLR Augmentation
In self-supervised vision learning (SimCLR):
- Take a batch of N images
- Create 2 augmented views per image → 2N total
- Positive: two views of the same image
- Negative: views of different images

    ↓
    [Augment 1: crop+flip] & [Augment 2: color+rotate]
    ↓
    These are positive pairs!
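The 2N-view setup above can be sketched in a few lines of PyTorch. This is a hedged, NT-Xent-style illustration: the random tensors stand in for encoder outputs, and the temperature 0.5 is just an example value.

```python
import torch
import torch.nn.functional as F

N, dim = 4, 8
# Two augmented views per image (random stand-ins for encoder outputs).
z1 = F.normalize(torch.randn(N, dim), dim=-1)
z2 = F.normalize(torch.randn(N, dim), dim=-1)
z = torch.cat([z1, z2], dim=0)        # (2N, dim): all views in one batch

sim = z @ z.T / 0.5                   # (2N, 2N) similarity logits
sim.fill_diagonal_(float("-inf"))     # a view is never its own negative

# The positive for view i is the other view of the same image:
# i < N pairs with i + N, and i >= N pairs with i - N.
targets = torch.cat([torch.arange(N) + N, torch.arange(N)])
loss = F.cross_entropy(sim, targets)
print(loss.item())
```

Each view thus classifies its sibling view against the 2N-2 other views in the batch.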
Temperature Parameter
Controlling Distribution Sharpness
What is Temperature τ?
Temperature τ (tau) is a parameter that controls the "sharpness" of the softmax distribution in the InfoNCE loss.
Small τ → sharp distribution
Large τ → smooth distribution
Temperature Effects
Low Temperature (τ = 0.07)
- ✓ Sharp distribution
- ✓ Very confident predictions
- ✓ Better differentiation
- ✗ Harder optimization
High Temperature (τ = 1.0)
- ✓ Smooth distribution
- ✓ Easier optimization
- ✗ Gradients spread out
- ✗ Less confident
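The effect is easy to see numerically. A small plain-Python sketch (the similarity values are illustrative) applying both temperatures to the same similarities:

```python
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Same cosine similarities, scaled by two different temperatures.
sims = [0.9, 0.5, 0.3, 0.1]

sharp = softmax([s / 0.07 for s in sims])   # low tau: near one-hot
smooth = softmax([s / 1.0 for s in sims])   # high tau: spread out
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in smooth])
```

With τ = 0.07 nearly all probability mass lands on the top candidate; with τ = 1.0 it stays spread across all four.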
Learnable vs Fixed Temperature
The temperature can be set as either:
- Fixed: τ = 0.07 (common in SimCLR, CLIP)
- Learnable: τ as an nn.Parameter (CLIP approach)
CLIP's Approach:
A learnable log-scale parameter; note that its exponential is the logit scale 1/τ, not τ itself:
    self.logit_scale = nn.Parameter(torch.ones([]) * np.log(1 / 0.07))
    scale = self.logit_scale.exp()  # = 1/τ, multiplied onto the logits
Symmetric Loss
Bidirectional Alignment
Why Symmetric?
In multimodal learning (e.g., CLIP), we have two modalities: image & text. A symmetric loss ensures alignment in both directions!
Similarity Matrix
For a batch of N = 4, the similarity matrix S (4×4) contains sim(imageᵢ, textⱼ) for every image-text pair; the diagonal entries are the matched (positive) pairs.
Row-wise Loss (Image → Text)
For each image, classify which text is the match:
Softmax across each row
Column-wise Loss (Text → Image)
For each text, classify which image is the match:
Softmax across each column
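A hedged numeric sketch of the two directions, using a made-up 4×4 similarity matrix whose diagonal holds the matched pairs:

```python
import torch
import torch.nn.functional as F

# Illustrative 4x4 image-text similarity matrix (made-up values);
# entry S[i][j] = sim(image_i, text_j), diagonal = matched pairs.
S = torch.tensor([
    [0.9, 0.1, 0.2, 0.0],
    [0.2, 0.8, 0.1, 0.3],
    [0.0, 0.2, 0.7, 0.1],
    [0.1, 0.0, 0.2, 0.9],
]) / 0.07

labels = torch.arange(4)
loss_i2t = F.cross_entropy(S, labels)    # softmax across each row
loss_t2i = F.cross_entropy(S.T, labels)  # softmax across each column
loss = (loss_i2t + loss_t2i) / 2         # symmetric average
print(loss_i2t.item(), loss_t2i.item(), loss.item())
```

Transposing the matrix swaps the classification direction, so the same cross-entropy call serves both losses.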
Implementation
PyTorch Code
InfoNCE PyTorch Implementation
import torch
import torch.nn as nn
import torch.nn.functional as F

class InfoNCE(nn.Module):
    def __init__(self, temperature=0.07):
        super().__init__()
        self.temperature = temperature

    def forward(self, query, keys):
        """
        Compute InfoNCE loss.

        Args:
            query: (batch, dim) - anchor embeddings
            keys: (batch, dim) - key embeddings

        Returns:
            loss: scalar InfoNCE loss
        """
        # Normalize embeddings
        query = F.normalize(query, dim=-1)
        keys = F.normalize(keys, dim=-1)

        # Compute cosine similarity matrix
        logits = query @ keys.T / self.temperature  # (batch, batch)

        # Diagonal entries are the positive pairs
        labels = torch.arange(len(query), device=query.device)

        # Cross-entropy loss
        loss = F.cross_entropy(logits, labels)
        return loss
Symmetric InfoNCE (CLIP-style)
class SymmetricInfoNCE(nn.Module):
    def __init__(self, temperature=0.07):
        super().__init__()
        self.temperature = temperature

    def forward(self, image_embeds, text_embeds):
        """
        Symmetric contrastive loss.

        Args:
            image_embeds: (N, dim)
            text_embeds: (N, dim)

        Returns:
            loss: symmetric InfoNCE loss
        """
        # Normalize
        image_embeds = F.normalize(image_embeds, dim=-1)
        text_embeds = F.normalize(text_embeds, dim=-1)

        # Similarity matrix
        logits = image_embeds @ text_embeds.T / self.temperature  # (N, N)

        # Labels: diagonal indices
        labels = torch.arange(len(image_embeds), device=image_embeds.device)

        # Row-wise (image → text)
        loss_i2t = F.cross_entropy(logits, labels)

        # Column-wise (text → image)
        loss_t2i = F.cross_entropy(logits.T, labels)

        # Symmetric average
        loss = (loss_i2t + loss_t2i) / 2
        return loss
Training Loop Example
def train_contrastive(model, dataloader, optimizer, device):
    """Training loop with InfoNCE."""
    model.train()
    criterion = SymmetricInfoNCE(temperature=0.07).to(device)

    for images, texts in dataloader:
        images = images.to(device)
        texts = texts.to(device)

        # Forward: get embeddings
        image_embeds = model.encode_image(images)  # (N, dim)
        text_embeds = model.encode_text(texts)     # (N, dim)

        # Compute symmetric InfoNCE loss
        loss = criterion(image_embeds, text_embeds)

        # Backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        print(f"Loss: {loss.item():.4f}")

    return loss.item()
Learnable Temperature
import numpy as np

class LearnableTemperature(nn.Module):
    def __init__(self, init_temp=0.07):
        super().__init__()
        # Log-scale learnable parameter (CLIP approach)
        self.logit_scale = nn.Parameter(
            torch.ones([]) * np.log(1 / init_temp)
        )

    def forward(self, query, keys):
        # Normalize
        query = F.normalize(query, dim=-1)
        keys = F.normalize(keys, dim=-1)

        # Learnable logit scale: exp of the log-scale parameter = 1/τ
        scale = self.logit_scale.exp()
        logits = query @ keys.T * scale

        # InfoNCE
        labels = torch.arange(len(query), device=query.device)
        loss = F.cross_entropy(logits, labels)
        return loss, scale.item()
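As a quick sanity check of the implementations above (a hedged sketch with random embeddings and illustrative sizes): with unrelated random pairs the in-batch InfoNCE loss is high, while perfectly aligned pairs drive it toward zero.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, dim = 8, 64

# Untrained encoders: random, unrelated query/key embeddings.
q = F.normalize(torch.randn(N, dim), dim=-1)
k = F.normalize(torch.randn(N, dim), dim=-1)
labels = torch.arange(N)
loss_random = F.cross_entropy(q @ k.T / 0.07, labels)

# Perfectly aligned pairs: keys equal the queries.
loss_aligned = F.cross_entropy(q @ q.T / 0.07, labels)

# Random pairs hover around log N; aligned pairs score far lower.
print(loss_random.item(), loss_aligned.item())
```

This kind of check (random ≈ log N, aligned ≈ 0) is a cheap way to catch bugs like missing normalization or wrong labels.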
Applications
InfoNCE in the Wild
InfoNCE Applications
InfoNCE has become the foundational loss for many breakthrough models:
CLIP
Vision-Language Alignment
Trained on 400M (image, text) pairs with a symmetric InfoNCE loss
SimCLR
Self-Supervised Vision
Learns representations from augmented views with InfoNCE
MoCo
Momentum Contrast
Large negative queue + momentum encoder
CPC
Contrastive Predictive Coding
Predicts future representations with InfoNCE
Audio SSL
wav2vec 2.0
Learns speech representations without transcripts
Video Understanding
VideoMoCo
Temporal contrastive learning for video
Why InfoNCE Works So Well
- ✓ Simple: easy to implement (just softmax + cross-entropy)
- ✓ Scalable: efficient with large batches
- ✓ Effective: strong performance across domains
- ✓ Flexible: works for unimodal & multimodal data
- ✓ No labels needed: self-supervised learning
Key Takeaways
InfoNCE Core Principles:
- Maximize similarity for positive pairs
- Minimize similarity for negative pairs
- Temperature controls sharpness
- Symmetric loss for bidirectional alignment
- The softmax framework makes it easy to implement
Congratulations!
Tutorial Complete!
You have learned:
- ✓ InfoNCE mathematical foundation
- ✓ Contrastive mechanics (pull/push)
- ✓ Temperature parameter effects
- ✓ Symmetric loss for multimodal learning
- ✓ PyTorch implementation
- ✓ Real-world applications