Aliens School
Cinematic Knowledge Experience

Topic 06: Pre-Training - How an LLM Learns

Course: LLM Engineering - Hinglish
Section: 1 - LLM Foundations
Level: Beginner → Intermediate…

Overview

Topic 06: Pre-Training - How an LLM Learns: Quick Facts

Feature: CLM (GPT)
Direction: Left-to-right only
Task: Next-token prediction
Best for: Generation (chat, code)

Topic 1

Objectives

The pre-training process in detail…
Training objectives (CLM, MLM)…
Training data, compute, and…
Training pipeline architecture…

Topic 2

1. What Is Pre-Training?

Pre-training = School + College…
Fine-tuning = Job training…
RLHF = Performance review (behave…

Topic 3

2. Training Objectives

2.1 CLM: Causal Language Modeling (GPT Style) …
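To make the causal language modeling objective concrete, here is a minimal sketch in plain Python. The toy vocabulary, the `clm_loss` helper, and the stand-in uniform "model" are assumptions made for illustration and are not the course's own code. It shows the two essentials: the model sees only the tokens to the left, and the loss is the average negative log-probability it assigns to each true next token.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def clm_loss(token_ids, predict_logits):
    """Average next-token cross-entropy over one sequence.

    predict_logits(prefix) is a hypothetical model call: it receives only
    the tokens to the left (the causal constraint) and returns one score
    per vocabulary entry.
    """
    losses = []
    for t in range(1, len(token_ids)):
        prefix = token_ids[:t]          # left context only
        target = token_ids[t]           # the token the model must predict
        probs = softmax(predict_logits(prefix))
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Toy setup (illustrative): a 4-word vocabulary and an untrained model
# that scores every token equally.
vocab = ["the", "cat", "sat", "down"]
sequence = [0, 1, 2, 3]                 # "the cat sat down"

def uniform_model(prefix):
    return [0.0] * len(vocab)

loss = clm_loss(sequence, uniform_model)
print(round(loss, 4))                   # equals ln(4) for a uniform guess
```

Training lowers this number by shifting probability toward the correct next token; a model that guesses uniformly over a vocabulary of size V starts at ln(V).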
Topic 4

3. Training Data: Knowledge from the Internet

[Diagram: Training Data Sources…]
Topic 5

4. Training Compute: The Cost

[Diagram: Training Compute & Cost…]
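The cost diagram is truncated in this copy, so the sketch below does not reproduce its figures. It uses a widely cited rule of thumb instead: training takes roughly 6 FLOPs per parameter per token. The model size, token count, accelerator peak speed, and utilization are hypothetical inputs chosen only to show the arithmetic.

```python
def training_flops(n_params, n_tokens):
    """Rule-of-thumb training cost: about 6 FLOPs per parameter per token
    (roughly 2 for the forward pass and 4 for the backward pass)."""
    return 6 * n_params * n_tokens

def gpu_days(total_flops, peak_flops_per_gpu, utilization):
    """Days of single-accelerator time at a given sustained utilization."""
    sustained = peak_flops_per_gpu * utilization
    return total_flops / sustained / 86_400   # 86,400 seconds per day

# Illustrative inputs (assumptions, not values from the course):
n_params = 1e9          # a 1B-parameter model
n_tokens = 20e9         # trained on 20B tokens
peak = 1e15             # hypothetical accelerator: 1 PFLOP/s peak
util = 0.4              # assumed 40% sustained utilization

flops = training_flops(n_params, n_tokens)
days = gpu_days(flops, peak, util)
print(f"{flops:.2e} FLOPs, about {days:.1f} GPU-days")
```

Dividing the GPU-days by the number of accelerators gives a rough wall-clock estimate, and multiplying by an hourly rental price gives a rough cost; real runs also pay for restarts, evaluation, and communication overhead.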
Topic 6

5. Scaling Laws: Chinchilla

[Diagram: Scaling Laws…]
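The Chinchilla result is commonly summarized as a rule of thumb of roughly 20 training tokens per model parameter for compute-optimal training. The sketch below combines that ratio with the approximation C ≈ 6·N·D (compute, parameters, tokens) to split a FLOP budget. Both are approximations, and the function name and the example budget are hypothetical.

```python
import math

TOKENS_PER_PARAM = 20   # common rule-of-thumb reading of Chinchilla

def compute_optimal(flop_budget, tokens_per_param=TOKENS_PER_PARAM):
    """Split a FLOP budget C into model size N and token count D,
    assuming C = 6 * N * D and D = tokens_per_param * N."""
    n_params = math.sqrt(flop_budget / (6 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

budget = 1.2e20          # illustrative budget in FLOPs
n, d = compute_optimal(budget)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

Under these assumptions both N and D grow with the square root of the budget, so quadrupling compute doubles the recommended model size and the recommended token count.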
Topic 7

6. Training Pipeline Architecture

[Diagram: Distributed Training…]
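The distributed training diagram is truncated in this copy, so the following is only a generic sketch of one common strategy, data parallelism, and not a reconstruction of the diagram. Each simulated worker computes a gradient on its own shard of the batch, the gradients are averaged (the job an all-reduce performs on real hardware), and every replica applies the same update. The one-parameter model and all names are hypothetical.

```python
def grad_mse(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr):
    """One synchronous step: every worker computes a local gradient,
    the gradients are averaged (the role of all-reduce), and all
    replicas apply the same update."""
    local_grads = [grad_mse(w, shard) for shard in shards]
    avg_grad = sum(local_grads) / len(local_grads)
    return w - lr * avg_grad, avg_grad

# Toy data for y = 3x, split evenly across two simulated workers.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
shards = [data[:2], data[2:]]

w = 0.0
for step in range(50):
    w, avg = data_parallel_step(w, shards, lr=0.05)
print(round(w, 3))
```

With equal-sized shards the averaged gradient matches the gradient of the full batch, which is why data parallelism can scale the batch across devices without changing the update rule.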
Topic 8

7. Data Quality: The Secret Sauce

[Diagram: Data Quality Pipeline…]
Topic 9

8. Python Code: Pre-Training Simulator

Remove very short docs
Remove duplicates
Basic quality filter
Loss computation
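The simulator's own code is not included in this copy. As a rough sketch of the first three steps listed above, the following cleans a toy corpus. The thresholds, helper names, and the letters-and-spaces heuristic are assumptions for illustration; the loss computation step is left out here.

```python
import hashlib

def remove_short(docs, min_words=5):
    """Drop documents with fewer than min_words words."""
    return [d for d in docs if len(d.split()) >= min_words]

def remove_duplicates(docs):
    """Exact dedup after light normalization (case and whitespace)."""
    seen, unique = set(), []
    for d in docs:
        normalized = " ".join(d.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique

def quality_filter(docs, min_alpha_ratio=0.6):
    """Keep non-empty documents whose characters are mostly letters or spaces."""
    kept = []
    for d in docs:
        if not d:
            continue
        good = sum(ch.isalpha() or ch.isspace() for ch in d)
        if good / len(d) >= min_alpha_ratio:
            kept.append(d)
    return kept

raw_docs = [
    "Too short",
    "Language models learn by predicting the next token.",
    "language models learn by   predicting the next token.",
    "$$$ 1234 #### 5678 @@@@ 9999 !!!! 0000 ****",
    "Pre-training uses very large amounts of text data.",
]

clean = quality_filter(remove_duplicates(remove_short(raw_docs)))
for doc in clean:
    print(doc)
```

Running the cheapest filter first keeps the more expensive steps working on less data. Production pipelines usually add near-duplicate detection and learned quality classifiers, which this sketch does not attempt.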

Topic 10

9. Quiz: 5 MCQs

a) A model for a specific task…
b) From massive text data, general…
c) Making the model fast
d) Making the model small

Topic 11

Navigation

| Previous | Index | Next |
|---|---|---|
| 05-Embeddings.md | 00-Index.md | … |
Quick Quiz

Quiz - Question 1

What is the most accurate definition of Topic 06: Pre-Training - How an LLM Learns?

Quiz - Question 2

What is the 'Direction' of Topic 06: Pre-Training - How an LLM Learns?
