Breaking down how Large Language Models work

Instead of sponsored ad reads, these lessons are funded directly by viewers: https://3b1b.co/support

---

Here are a few other relevant resources:

Build a GPT from scratch, by Andrej Karpathy
• Let's build GPT: from scratch, in cod...

If you want a conceptual understanding of language models from the ground up, @vcubingx just started a short series of videos on the topic:
• What does it mean for computers to un...

If you're interested in the herculean task of interpreting what these large networks might actually be doing, the Transformer Circuits posts by Anthropic are great. In particular, it was only after reading one of these that I started thinking of the value and output matrices together as a combined low-rank map from the embedding space to itself, which, at least in my mind, made things much clearer than other sources did.
https://transformer-circuits.pub/2021...

History of language models by Brit Cruise, @ArtOfTheProblem
• The 35 Year History of ChatGPT

An early paper on how directions in embedding spaces have meaning:
https://arxiv.org/pdf/1301.3781.pdf

Russian-language audio track: Влад Бурмистров.

---

Timestamps
0:00 - Predict, sample, repeat
3:03 - Inside a transformer
6:36 - Chapter layout
7:20 - The premise of Deep Learning
12:27 - Word embeddings
18:25 - Embeddings beyond words
20:22 - Unembedding
22:22 - Softmax with temperature
26:03 - Up next
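To make the remark about the value and output matrices concrete, here is a minimal sketch in Python. It assumes toy sizes (an embedding dimension of 4 and a head dimension of 2) and made-up integer weights; the names W_V, W_O, W_OV, matmul, matvec and rank are hypothetical helpers chosen for illustration, not taken from any library or from the video. Because W_V maps an embedding down to the small head dimension and W_O maps it back up, their product W_OV = W_O W_V is a map from the embedding space to itself whose rank can be at most the head dimension. The sketch uses the column-vector convention; some codebases multiply row vectors on the other side instead.

```python
from fractions import Fraction

# Toy sizes (assumptions for illustration only).
D_MODEL, D_HEAD = 4, 2

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for row in a]

def matvec(m, v):
    """Apply matrix m to a column vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def rank(m):
    """Rank via Gaussian elimination over exact rationals (no float tolerance)."""
    rows = [[Fraction(x) for x in row] for row in m]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Zero out column c in every other row.
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r

# Hypothetical value matrix: 4-dim embedding -> 2-dim value vector.
W_V = [[1, 0, 2, -1],
       [0, 1, 1, 3]]
# Hypothetical output matrix: 2-dim value vector -> back to 4-dim embedding space.
W_O = [[1, 2],
       [0, 1],
       [3, -1],
       [2, 0]]

# Combined map from the embedding space to itself.
W_OV = matmul(W_O, W_V)
assert len(W_OV) == D_MODEL and len(W_OV[0]) == D_MODEL

x = [2, -1, 0, 1]  # an arbitrary embedding vector

print(W_OV)  # [[1, 2, 4, 5], [0, 1, 1, 3], [3, -1, 5, -6], [2, 0, 4, -2]]
print("rank:", rank(W_OV), "<= d_head =", D_HEAD)  # rank: 2 <= d_head = 2
# Passing through the small value space and applying W_OV directly agree.
print(matvec(W_O, matvec(W_V, x)) == matvec(W_OV, x))  # True
```

Even though W_OV is a full 4 x 4 matrix, it only ever moves an embedding within a 2-dimensional subspace. In real models the head dimension is typically far smaller than the embedding dimension, so the combined map is strongly low-rank. Exact rational arithmetic via `fractions.Fraction` is used here only so the rank computation needs no floating-point tolerance.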

Neural networks

3Blue1Brown

Transformers (how LLMs work) explained visually | DL5
