It's not magic. It's math. An LLM is a Next-Token Prediction Engine. It reads the text you wrote and calculates the statistical probability of what comes next.
LLMs don't read words; they read "Tokens". A token averages about 0.75 of an English word: it can be a whole word ("the", "apple") or a fragment ("ing"). GPT-4's vocabulary contains about 100,000 unique tokens.
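To make "tokens are chunks, not words" concrete, here is a toy greedy tokenizer over a tiny hand-made vocabulary. Real tokenizers (like GPT-4's) learn their ~100,000 chunks from data; this hand-picked vocabulary is purely illustrative.

```python
# Toy vocabulary: a few whole words and sub-word fragments (hand-picked,
# not learned -- real tokenizers learn ~100k of these from training data).
vocab = ["walk", "talk", "ing", "ed", "the", "apple", " "]

def tokenize(text):
    """Greedily take the longest vocab entry that matches at each position."""
    tokens, i = [], 0
    while i < len(text):
        match = max((v for v in vocab if text.startswith(v, i)),
                    key=len, default=text[i])  # unknown char: emit it alone
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("the apple walking"))
# ['the', ' ', 'apple', ' ', 'walk', 'ing'] -- "walking" splits into two tokens
```

Notice that "walking" is not in the vocabulary, so it becomes "walk" + "ing": the model sees fragments, never the word itself.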
It doesn't "know" facts. It knows patterns. It knows that after "The capital of France is", the token "Paris" appears 99% of the time in its training data.
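The "patterns, not facts" idea can be sketched in a few lines: count which token follows a given prefix in a (made-up, three-line) training corpus, and turn the counts into probabilities. The corpus and the "lyon" noise line are invented for illustration.

```python
import collections

# Toy "training data": the model never stores facts, only token statistics.
corpus = [
    "the capital of france is paris",
    "the capital of france is paris",
    "the capital of france is lyon",   # rare noise in the data
]

# Count which token follows the prefix, then normalize into probabilities.
prefix = "the capital of france is"
counts = collections.Counter(
    line[len(prefix):].strip() for line in corpus if line.startswith(prefix)
)
total = sum(counts.values())
probs = {tok: n / total for tok, n in counts.items()}
print(probs)  # 'paris' gets the highest probability, but 'lyon' is never zero
```

"Paris" wins not because the model understands geography, but because it was the most frequent continuation.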
Temperature: the "Creativity" setting.
Temp 0: Always picks the most likely token (Deterministic/Robotic).
Temp 1: Sometimes picks unlikely tokens (Creative, but more Hallucinations).
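The two settings above can be sketched as one sampling function: divide the model's raw scores (logits) by the temperature before the softmax, then roll the dice. The logits below are hypothetical numbers, not real model output.

```python
import math, random

def sample(logits, temperature):
    """Pick the next token. Temp 0 is greedy; higher temps are adventurous."""
    if temperature == 0:                        # Temp 0: always the top token
        return max(logits, key=logits.get)
    scaled = {t: l / temperature for t, l in logits.items()}
    z = max(scaled.values())                    # subtract max for stability
    exps = {t: math.exp(v - z) for t, v in scaled.items()}
    total = sum(exps.values())
    r, cum = random.random(), 0.0
    for tok, e in exps.items():                 # roll the dice over the
        cum += e / total                        # softmax probabilities
        if r < cum:
            return tok
    return tok                                  # guard against float rounding

# Hypothetical logits for the token after "The capital of France is"
logits = {"Paris": 5.0, "Lyon": 1.0, "cheese": 0.2}
print(sample(logits, 0))    # 'Paris' every single time
```

At temperature 0 the same prompt always yields "Paris"; raise the temperature and "Lyon" (or even "cheese") starts slipping through.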
Mission: You are the GPU. Based on the current sentence, choose the next word.
Adjust Temperature to see how it changes the "Dice Roll".
Imagine the model is predicting: "The first person on Mars was..."
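To see how the temperature knob reshapes the "dice roll" for this prompt, here is a minimal sketch. The candidate tokens and their scores are invented for the exercise; no one has been to Mars, so the model can only guess from patterns.

```python
import math

# Hypothetical raw scores for the token after "The first person on Mars was"
logits = {"Neil": 2.0, "Elon": 1.5, "a": 1.0, "Zhang": 0.5}

def dist(logits, temperature):
    """Softmax over temperature-scaled scores: the loaded dice you roll."""
    exps = {t: math.exp(l / temperature) for t, l in logits.items()}
    z = sum(exps.values())
    return {t: round(e / z, 2) for t, e in exps.items()}

print(dist(logits, 0.5))  # peaky: the top token dominates the dice roll
print(dist(logits, 2.0))  # flat: unlikely tokens get a real chance
```

Low temperature sharpens the distribution toward the favorite; high temperature flattens it, which is exactly where creative answers and hallucinations both come from.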