AI Literacy

What Is a Large Language Model, Really?

Strip away the hype and you're left with a remarkably elegant statistical machine. Here's what's actually happening inside GPT-4 and its siblings.

8 min read

The Prediction Machine

At its core, a large language model is a system trained to predict the next token in a sequence. That's it. Every capability that emerges from that — coding, reasoning, storytelling, translation — is a consequence of doing this one thing at extraordinary scale across extraordinary amounts of text.

When you type a prompt into ChatGPT, you're not querying a database of facts. You're feeding a sequence of tokens to a neural network that returns a probability distribution over what token should come next. The model samples from that distribution, appends the token, and repeats.
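That generation loop can be sketched in a few lines. The "model" below is a hypothetical toy lookup table mapping a token to a probability distribution over the next token; a real LLM computes those probabilities with a neural network over the whole context, but the sample-append-repeat loop looks much the same:

```python
import random

# Toy stand-in for an LLM: token -> distribution over next tokens.
# (Hand-made for illustration; real models learn these probabilities.)
TOY_MODEL = {
    "the": {"capital": 0.5, "cat": 0.5},
    "capital": {"of": 1.0},
    "of": {"france": 1.0},
    "france": {"is": 1.0},
    "is": {"paris": 0.9, "lyon": 0.1},
}

def generate(prompt_tokens, steps, rng):
    tokens = list(prompt_tokens)
    for _ in range(steps):
        dist = TOY_MODEL.get(tokens[-1])  # distribution over next token
        if dist is None:
            break
        # Sample a token from the distribution, append it, repeat.
        next_token = rng.choices(list(dist), weights=list(dist.values()))[0]
        tokens.append(next_token)
    return tokens

print(generate(["the", "capital"], steps=4, rng=random.Random(0)))
```

The sampling step is also why the same prompt can yield different outputs: the model returns a distribution, and the loop draws from it.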

What "Training" Actually Means

Training an LLM means adjusting hundreds of billions of numerical parameters — weights in a neural network — so that the model becomes better at predicting held-out text from its training corpus. The optimisation algorithm (typically some variant of stochastic gradient descent) nudges these weights in directions that lower prediction error.

After enough nudging across enough data, something remarkable happens: the network develops internal representations that correspond to grammar, facts, logic, and eventually what looks like reasoning.
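The nudging itself can be shown at miniature scale. Here the "model" is a single weight `w` predicting `y = w * x`, the loss is squared error, and each step moves `w` against the gradient of that loss; this is a minimal sketch of the same idea scaled down from hundreds of billions of parameters to one:

```python
# Minimal gradient descent: repeatedly nudge a parameter in the
# direction that lowers prediction error.
def train(pairs, lr=0.1, steps=100):
    w = 0.0
    for _ in range(steps):
        for x, y in pairs:
            pred = w * x
            grad = 2 * (pred - y) * x  # d/dw of (pred - y)**2
            w -= lr * grad             # nudge w downhill
    return w

# Data generated by the "true" rule y = 3x; training recovers w close to 3.
w = train([(1.0, 3.0), (2.0, 6.0)])
print(round(w, 3))  # → 3.0
```

Real training differs in loss (cross-entropy over next-token predictions), scale, and optimiser details, but the mechanic — measure error, compute gradient, nudge weights — is the same.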

The Transformer Architecture

Modern LLMs are built on the Transformer architecture, introduced by Google researchers in 2017. The key innovation is the attention mechanism — a way for the model to dynamically decide which parts of its context are relevant when generating each token.

When the model reads "The capital of France is ___", attention lets it weight "France" and "capital" heavily, pulling the right associations across the full context.
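A minimal sketch of that weighting, assuming tiny hand-made vectors in place of the high-dimensional learned ones: each context token gets a "key" vector, the position being generated gets a "query" vector, and scaled dot-product similarity followed by a softmax yields the attention weights.

```python
import math

def attention_weights(query, keys):
    d = len(query)
    # Dot-product similarity between the query and each key,
    # scaled by sqrt(d) as in the original Transformer paper.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Softmax turns scores into a probability distribution:
    # these are the attention weights.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative key vectors for the context tokens; "capital" and
# "France" are placed near the query so they draw the most weight.
keys = {"The": [0.1, 0.0], "capital": [0.9, 0.2],
        "of": [0.0, 0.1], "France": [0.8, 0.9]}
weights = attention_weights([1.0, 1.0], list(keys.values()))
for tok, w in zip(keys, weights):
    print(f"{tok}: {w:.2f}")
```

In a real model these vectors are learned, attention runs across many heads and layers, and the weights are used to mix the tokens' value vectors — but the core computation is this similarity-then-softmax step.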

What It Cannot Do

Understanding the limits is as important as understanding the capabilities:

  • LLMs have no persistent memory between conversations unless given tools
  • They cannot access the internet without a plugin or tool call
  • They don't "know" things the way humans do — they have statistical associations
  • They can be confidently wrong, especially about niche or recent topics

Why This Matters for Literacy

You don't need to understand backpropagation to use these tools well — but you do need a mental model of what's happening. When you know you're working with a next-token predictor, you understand why prompting matters, why hallucinations happen, and why the model is neither magic nor malevolent.

That mental model is the foundation of AI literacy.