Aliens School
Cinematic Knowledge Experience
Aliens School · HIEN
โŒจ๏ธ Keyboard Shortcuts
โ†’Next slide โ†Previous slide SpacePlay / Pause MNarration on/off FFullscreen ?Show/hide this
Press any key to close
Skill Topic · Cinematic

๐Ÿ† Topic 40: RLHF โ€” Reinforcement Learning from Human Feedback

Course: LLM Engineering · Pair 40/80 · Section: 5 (Fine-Tuning) · Level: ⭐⭐⭐⭐⭐ Expert · Prev:…

Topic 1

🎯 What Will You Learn in This Topic?

📚 ✅ What RLHF is: …
Topic 2

📚 1. RLHF Pipeline: 3 Stages

💡 RLHF PIPELINE …
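The three stages run in a fixed order: supervised fine-tuning (SFT) produces the starting policy and the reference model, the reward model is fit on human preference pairs, and the RL stage optimizes the policy against that reward while a KL penalty keeps it close to the SFT reference. A minimal toy sketch of that data flow follows, assuming a single prompt with three hypothetical candidate responses ("helpful", "vague", "rude") and made-up demonstration and preference data. Stage 3 deliberately swaps PPO for the closed-form optimum of the KL-regularized objective, which only exists in a tabular toy like this one; on real models PPO approximates the same objective with gradient steps. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy setup: one prompt, a small fixed set of candidate responses.
RESPONSES = ["helpful", "vague", "rude"]


def sft_stage(demonstrations):
    """Stage 1 (SFT): maximum-likelihood fit of a categorical policy to demonstrations.

    Add-one (Laplace) smoothing keeps every response reachable.
    """
    counts = Counter(demonstrations)
    total = len(demonstrations) + len(RESPONSES)
    return {y: (counts[y] + 1) / total for y in RESPONSES}


def reward_stage(preferences, steps=500, lr=0.1):
    """Stage 2 (reward model): fit one scalar score per response by gradient
    descent on the Bradley-Terry loss -log sigmoid(r(chosen) - r(rejected))."""
    r = {y: 0.0 for y in RESPONSES}
    for _ in range(steps):
        grad = {y: 0.0 for y in RESPONSES}
        for chosen, rejected in preferences:
            # d/d(margin) of -log sigmoid(margin) is -(1 - sigmoid(margin))
            p = 1.0 / (1.0 + math.exp(-(r[chosen] - r[rejected])))
            grad[chosen] -= 1.0 - p
            grad[rejected] += 1.0 - p
        for y in RESPONSES:
            r[y] -= lr * grad[y] / len(preferences)
    return r


def rl_stage(reference, reward, beta=0.5):
    """Stage 3 (RL): maximize E[r] - beta * KL(pi || pi_ref).

    In this tabular toy the optimum is pi*(y) proportional to
    pi_ref(y) * exp(r(y) / beta); PPO approximates it when no closed form exists.
    """
    weights = {y: reference[y] * math.exp(reward[y] / beta) for y in RESPONSES}
    z = sum(weights.values())
    return {y: w / z for y, w in weights.items()}


# Made-up data: demonstrations for SFT, (chosen, rejected) pairs for the reward model.
demos = ["helpful", "helpful", "vague", "rude"]
prefs = [("helpful", "vague"), ("helpful", "rude"), ("vague", "rude")]

pi_sft = sft_stage(demos)             # Stage 1
rewards = reward_stage(prefs)         # Stage 2
pi_rlhf = rl_stage(pi_sft, rewards)   # Stage 3
print(pi_sft)
print(pi_rlhf)
```

Raising `beta` keeps `pi_rlhf` closer to `pi_sft`; lowering it lets the policy chase the learned reward harder, which is the usual motivation for keeping the KL term in the objective.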
Topic 3

📊 2. RLHF Components Comparison

🎯 Component | Input…
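The comparison table is truncated in this copy, so the block below only states the objective each component is commonly trained on in the standard RLHF formulation. The notation is assumed for illustration: D_demo holds demonstrations, D_pref holds preference triples with a preferred response y_w and a rejected response y_l, σ is the logistic sigmoid, β is the KL coefficient, and π_ref is the frozen reference policy (usually the SFT model). The first two losses are minimized; the RL objective is maximized, with PPO as the optimizer.

```latex
\begin{aligned}
\mathcal{L}_{\text{SFT}}(\theta) &= -\,\mathbb{E}_{(x,\,y)\sim\mathcal{D}_{\text{demo}}}\big[\log \pi_\theta(y \mid x)\big] \\
\mathcal{L}_{\text{RM}}(\phi) &= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}_{\text{pref}}}\big[\log \sigma\big(r_\phi(x, y_w) - r_\phi(x, y_l)\big)\big] \\
J_{\text{RL}}(\theta) &= \mathbb{E}_{x\sim\mathcal{D}}\Big[\,\mathbb{E}_{y\sim\pi_\theta(\cdot \mid x)}\big[r_\phi(x, y)\big] - \beta\,\mathbb{D}_{\text{KL}}\big(\pi_\theta(\cdot \mid x)\,\big\|\,\pi_{\text{ref}}(\cdot \mid x)\big)\Big]
\end{aligned}
```

The reward-model loss depends only on the score difference r_φ(x, y_w) - r_φ(x, y_l), so adding any constant to all rewards for a given prompt leaves it unchanged; reward scales are identified only up to that per-prompt offset.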
Topic 4

💻 3. Complete Python Implementation

💡 Full 3-stage pipeline
🔑 Reward model with preference…
⚡ PPO with KL penalty
🎯 Value function estimation
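The list above names the moving parts of the implementation. A framework-free sketch of each piece for a single sampled response follows, using plain Python lists instead of tensors. Every function name and default hyperparameter (beta=0.1, gamma=1.0, lam=0.95, clip_eps=0.2) is an illustrative assumption; real trainers batch these computations and differ in details such as whether the KL penalty is applied per token or per sequence.

```python
import math
from typing import List, Tuple


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def preference_loss(r_chosen: List[float], r_rejected: List[float]) -> float:
    """Reward model: mean Bradley-Terry loss -log sigmoid(r_chosen - r_rejected).

    Uses the identity -log sigmoid(m) == softplus(-m) to avoid overflow.
    """
    losses = [softplus(-(rc - rr)) for rc, rr in zip(r_chosen, r_rejected)]
    return sum(losses) / len(losses)


def kl_shaped_rewards(rm_score: float, logp_policy: List[float],
                      logp_ref: List[float], beta: float = 0.1) -> List[float]:
    """Per-token rewards: a KL penalty -beta * (log pi - log pi_ref) on every
    token, with the reward-model score added on the final token only."""
    rewards = [-beta * (lp - lr) for lp, lr in zip(logp_policy, logp_ref)]
    rewards[-1] += rm_score
    return rewards


def gae(rewards: List[float], values: List[float], gamma: float = 1.0,
        lam: float = 0.95) -> Tuple[List[float], List[float]]:
    """Generalized Advantage Estimation for one response.

    values[t] is the critic's estimate V(s_t); the state after the last token
    is treated as terminal with value 0. Returns (advantages, returns), where
    returns = advantages + values are the regression targets for the critic.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def value_loss(values: List[float], returns: List[float]) -> float:
    """Critic: mean squared error between V(s_t) and the GAE returns."""
    return sum((v - g) ** 2 for v, g in zip(values, returns)) / len(values)


def ppo_clip_loss(logp_new: List[float], logp_old: List[float],
                  advantages: List[float], clip_eps: float = 0.2) -> float:
    """Policy: PPO clipped surrogate objective, negated so lower is better."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# Toy usage for one 3-token response (all numbers are made up).
rewards = kl_shaped_rewards(rm_score=1.0, logp_policy=[-1.0, -0.5, -0.2],
                            logp_ref=[-1.2, -0.5, -0.4], beta=0.1)
values = [0.5, 0.6, 0.8]
advantages, returns = gae(rewards, values)
print(value_loss(values, returns))
print(ppo_clip_loss([-1.0, -0.5, -0.2], [-1.0, -0.5, -0.2], advantages))
```

With gamma=lam=1 the returns collapse to plain reward-to-go, which makes a convenient sanity check; lam below 1 trades some bias for lower variance by leaning on the critic's value estimates.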

Topic 5

🧠 5. Quiz Time!

💡 A) PPO → SFT → Reward
🔑 B) Reward → PPO → SFT
⚡ C) SFT → Reward Model → PPO ✅
🎯 D) SFT → PPO → Reward

Topic 6

🔗 Navigation

โœจ โฌ…๏ธ Previous: 39-HuggingFace-Pipeline.md โžก๏ธ Next: 41-DPO.md ๐Ÿ† RLHF = ChatGPT jaisi qualityโ€ฆ
Quick Quiz

Quiz: Question 1

🏆 What is the most accurate definition of RLHF (Reinforcement Learning from Human Feedback)?
