Aliens School
Cinematic Knowledge Experience
Aliens School · HIEN

🎯 Topic 41: DPO (Direct Preference Optimization)

Course: LLM Engineering · Pair 41/80 · Section 5: Fine-Tuning · Level: ⭐⭐⭐⭐⭐ Expert · Prev: 40-RLHF.md…

Overview

🎯 Topic 41: DPO (Direct Preference Optimization) · Quick Facts

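$$
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\Big(\beta \big(\log \pi_\theta(y_w \mid x) - \log \pi_{\mathrm{ref}}(y_w \mid x)\big) - \beta \big(\log \pi_\theta(y_l \mid x) - \log \pi_{\mathrm{ref}}(y_l \mid x)\big)\Big)
$$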
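To make the objective concrete, here is a minimal Python sketch that evaluates the DPO loss for a single preference pair. It assumes each `logp` argument is the summed token log-probability of a full response under the policy π_θ or the frozen reference π_ref; the function names and example numbers are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss_single(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one (x, y_w, y_l) pair, given sequence log-probabilities."""
    # log pi_theta(y_w|x) - log pi_ref(y_w|x)
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    # log pi_theta(y_l|x) - log pi_ref(y_l|x)
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    # beta * (chosen log-ratio) - beta * (rejected log-ratio)
    margin = beta * chosen_logratio - beta * rejected_logratio
    return -log_sigmoid(margin)


# Policy identical to the reference: margin = 0, loss = log 2 ≈ 0.6931
print(dpo_loss_single(-10.0, -12.0, -10.0, -12.0))
# Policy moved toward the chosen answer and away from the rejected one: loss ≈ 0.598
print(dpo_loss_single(-9.0, -13.0, -10.0, -12.0))
```

At initialization the policy usually equals the reference, so the loss starts at log 2 and falls only as the chosen response gains probability relative to the rejected one.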


🎯 What You Will Learn in This Topic

📚 ✅ What DPO is: RLHF…

📚 1. DPO vs RLHF: Core Idea

💡 RLHF (Complex)…
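One way to see the core difference: RLHF first fits a separate reward model and then optimizes the policy against it with RL, whereas DPO reads an implicit reward off the policy itself, r̂(x, y) = β · (log π_θ(y|x) - log π_ref(y|x)), and trains it with the Bradley-Terry preference likelihood. The sketch below assumes precomputed sequence log-probabilities and uses hypothetical names (`PairLogps`, `implicit_reward`, `preference_prob`, `reward_accuracy`); it computes that implicit reward and a "reward accuracy" metric commonly tracked during DPO training.

```python
import math
from dataclasses import dataclass


@dataclass
class PairLogps:
    """Sequence log-probabilities for one preference pair (hypothetical container)."""
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def implicit_reward(policy_logp: float, ref_logp: float, beta: float) -> float:
    """DPO's implicit reward: beta * log(pi_theta(y|x) / pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)


def preference_prob(pair: PairLogps, beta: float = 0.1) -> float:
    """Bradley-Terry P(y_w preferred over y_l), computed from implicit rewards.

    No separate reward model is trained. The beta * log Z(x) term that the
    implicit reward omits is identical for y_w and y_l, so it cancels here.
    """
    r_w = implicit_reward(pair.policy_chosen, pair.ref_chosen, beta)
    r_l = implicit_reward(pair.policy_rejected, pair.ref_rejected, beta)
    return sigmoid(r_w - r_l)


def reward_accuracy(pairs: list[PairLogps], beta: float = 0.1) -> float:
    """Fraction of pairs where the chosen response gets the higher implicit reward."""
    wins = sum(
        implicit_reward(p.policy_chosen, p.ref_chosen, beta)
        > implicit_reward(p.policy_rejected, p.ref_rejected, beta)
        for p in pairs
    )
    return wins / len(pairs)


batch = [
    PairLogps(policy_chosen=-9.0, policy_rejected=-13.0, ref_chosen=-10.0, ref_rejected=-12.0),
    PairLogps(policy_chosen=-11.0, policy_rejected=-11.5, ref_chosen=-10.0, ref_rejected=-12.0),
]
print([round(preference_prob(p), 3) for p in batch])
print(reward_accuracy(batch))  # 0.5: only the first pair is ranked correctly
```

Because the reward is implicit in the policy and reference log-probabilities, the only models needed at training time are π_θ and π_ref; there is no reward-model training stage and no RL sampling loop.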

📊 2. DPO Variants Comparison

🎯 Method | Full…

📐 3. DPO Loss Function

💡 β * (log π_θ(y_l|x) - log…

🔑 Increase probability of chosen…

⚡ Decrease probability of rejected…

🎯 β controls strength (higher β =…
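The effects listed in these cards (push the chosen response up, push the rejected one down, with β scaling the update) fall out of the gradient. Writing m = β · (chosen log-ratio - rejected log-ratio), the loss is -log σ(m), so ∂L/∂(chosen log-ratio) = -β · σ(-m) and ∂L/∂(rejected log-ratio) = +β · σ(-m). The sketch below computes both; treating the two log-ratios as free scalars is a simplification for illustration, since in real training they depend on the model parameters. The function name is hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dpo_loss_and_grads(chosen_logratio: float, rejected_logratio: float, beta: float):
    """Return (loss, dL/d chosen_logratio, dL/d rejected_logratio) for one pair.

    chosen_logratio   = log pi_theta(y_w|x) - log pi_ref(y_w|x)
    rejected_logratio = log pi_theta(y_l|x) - log pi_ref(y_l|x)
    """
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) written stably as softplus(-m)
    if margin >= 0:
        loss = math.log1p(math.exp(-margin))
    else:
        loss = -margin + math.log1p(math.exp(margin))
    # sigmoid(-m) approaches 1 when the pair is ranked the wrong way round,
    # so badly ranked pairs receive the largest updates
    weight = sigmoid(-margin)
    grad_chosen = -beta * weight   # gradient descent raises the chosen log-ratio
    grad_rejected = beta * weight  # gradient descent lowers the rejected log-ratio
    return loss, grad_chosen, grad_rejected


# A pair the model already ranks correctly vs. one it ranks the wrong way round
for name, (c, r) in {"correct": (2.0, -2.0), "wrong": (-2.0, 2.0)}.items():
    loss, gc, gr = dpo_loss_and_grads(c, r, beta=0.1)
    print(f"{name:>7}: loss={loss:.3f} grad_chosen={gc:+.4f} grad_rejected={gr:+.4f}")
```

The two gradients always have equal magnitude and opposite sign, and both are proportional to β, which is one concrete sense in which β sets the strength of the update.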


💻 4. Complete Python Implementation

💡 Standard DPO with beta tuning

🔑 IPO (Identity Preference…

⚡ KTO (unpaired preferences)

🎯 ORPO (no reference model needed)
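The variants listed here differ mainly in the per-pair loss. The sketch below puts the standard sigmoid DPO loss next to IPO and ORPO, following their published formulations as commonly stated rather than this course's own implementation. IPO replaces -log σ(·) with a squared error that pulls the log-ratio difference h toward 1/(2τ) (here τ plays the role of β). ORPO drops π_ref entirely and adds λ · (-log σ(log-odds(y_w) - log-odds(y_l))) to the ordinary NLL on the chosen response, with odds computed from length-normalised (average per-token) probabilities. KTO is left out because it works on unpaired examples with a batch-estimated reference point, which does not fit a per-pair function. All function names and numbers are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def logratio_gap(pc: float, pr: float, rc: float, rr: float) -> float:
    """h = [log pi(y_w) - log pi_ref(y_w)] - [log pi(y_l) - log pi_ref(y_l)]."""
    return (pc - rc) - (pr - rr)


def dpo_loss(pc: float, pr: float, rc: float, rr: float, beta: float = 0.1) -> float:
    """Standard (sigmoid) DPO: -log sigmoid(beta * h)."""
    return -log_sigmoid(beta * logratio_gap(pc, pr, rc, rr))


def ipo_loss(pc: float, pr: float, rc: float, rr: float, tau: float = 0.1) -> float:
    """IPO: squared distance of h from the target 1 / (2 * tau)."""
    return (logratio_gap(pc, pr, rc, rr) - 1.0 / (2.0 * tau)) ** 2


def log_odds(avg_logp: float) -> float:
    """log(P / (1 - P)) where P = exp(avg_logp) is a length-normalised probability."""
    # log(1 - P) is computed as log1p(-exp(avg_logp)); requires avg_logp < 0
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float, lam: float = 0.1) -> float:
    """ORPO: NLL on the chosen response plus lam * odds-ratio penalty; no reference model."""
    nll = -avg_logp_chosen
    ratio_term = -log_sigmoid(log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected))
    return nll + lam * ratio_term


# Hypothetical summed log-probs: policy chosen, policy rejected, ref chosen, ref rejected
pc, pr, rc, rr = -9.0, -13.0, -10.0, -12.0
print("DPO :", round(dpo_loss(pc, pr, rc, rr), 4))
print("IPO :", round(ipo_loss(pc, pr, rc, rr), 4))
# ORPO uses only the policy's average per-token log-probs (hypothetical values)
print("ORPO:", round(orpo_loss(-0.8, -1.4), 4))
```

The design difference shows up in the signatures: DPO and IPO need four log-probabilities per pair (two of them from π_ref), while ORPO needs only two policy quantities, which is why it can skip the reference model.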


🧠 5. Quiz Time!

💡 A) Yes, it is mandatory

🔑 B) No, DPO works directly from preference…

⚡ C) It is optional

🎯 D) It is only needed at inference time


🔗 Navigation

🌟 ⬅️ Previous: 40-RLHF.md · ➡️ Next: 42-Fine-Tuning-Evaluation.md

🎯 DPO = the power of RLHF, without…
Quick Quiz

Question 1

🎯 What is the most accurate definition of DPO (Direct Preference Optimization)?
