How LLMs Work: Tokens & Context

What is an LLM?

An LLM, or large language model, is software trained on huge amounts of text to predict and generate useful language. The practical way to think about it: you give it a task, it turns your text into tokens, estimates likely next tokens and returns a draft. It can write, summarize and classify because many language patterns are compressed into its parameters. It is not a database and it does not automatically know whether a fresh fact is true.

Example: A SaaS support lead asks: “Summarize this complaint and propose a calm refund reply.” The model does not “feel” the complaint; it recognizes the pattern of complaint, refund policy and professional tone.

What is a prompt?

A prompt is the instruction package you send to an AI system. Good prompts usually include the role, the situation, the exact task, constraints, examples and the desired output format. A weak prompt asks for “ideas”. A strong prompt says who the ideas are for, what they must achieve and how they should be judged.

Example: Weak: “Write a product description.” Better: “Act as an e-commerce copywriter for an online store. Write a 90-word product description for a reusable coffee cup. Audience: commuters. Tone: practical, not hype. Include one headline and three bullets.”

What actually happens when you send a prompt?

The normal flow is not: “the AI understands the request, writes a Python program, then starts working.” A large language model usually receives text, turns it into tokens, runs those tokens through neural-network layers and generates the next token again and again. Python or other tools only enter the process when the product around the model explicitly gives it tool access, for example a calculator, code interpreter, browser, database connector or function call.

Step-by-step diagram of what happens when a user sends a prompt to an LLM. — A prompt is tokenized, processed by the model and decoded into a response step by step.

1. Input

You type a prompt. The application may add hidden system instructions, safety policies, chat history or selected documents before the model sees the final request.

2. Tokens

The text is split into token IDs. The model does not receive words as a person reads them; it receives numerical token identifiers.

3. Inference

The model computes probabilities for possible next tokens based on the full context. This is where attention, embeddings and model weights matter.

4. Optional tool

If the application supports tools, the model may request a function such as search, calculator or code execution. The external result is then inserted back into the conversation as more context.

5. Answer

The final text is decoded from generated tokens. Better prompts help because they change the context the model conditions on before choosing each next token.

Prompting lesson: the model is highly sensitive to the information it receives before generation. Clear context, examples, constraints and source material improve the probability that the next tokens follow your intent.

Try the ideas on your own prompts

Use a simple before-and-after exercise before continuing with tokens, context, and model limitations.

Better than a mini test: build a prompt before-and-after gallery

Instead of running a generic test, collect five real prompts from your own work and save the weak version, the improved version and the final edited result. This becomes a reusable prompt gallery for your team.

Pick one recurring task, such as support replies, product descriptions or lesson summaries.
Save the original prompt and output.
Add context, format rules and one example.
Compare how much editing the second output needs.
Turn the winner into a template.

What are tokens?

Tokens are the small text units a model reads and writes. A token can be a whole word, part of a word, punctuation or a character, depending on the tokenizer and language. Token limits matter because input plus output must fit inside the model context window.

Example: “PromptingEasy helps teams” might become tokens similar to “Prompt”, “ing”, “Easy”, “helps”, “teams”. This is why long documents and many examples increase cost and may crowd out important instructions.

Diagram showing text converted into colored token pieces and converted back into the original sentence. — Text is split into tokens and then assembled back into readable text.

Why does AI sometimes invent things?

AI can hallucinate when it generates a plausible-sounding continuation without enough reliable grounding. The model is optimized to produce likely text, not to guarantee truth by default. Hallucinations become more likely when the question asks for obscure facts, fresh information, hidden data or citations that were not provided.

Example: A user asks for “the exact 2026 price of a niche API plan” without browsing or source text. The model may produce a confident-looking price because that shape of answer is common, even if the number is wrong.

The same mechanism can invent a work status, not just a fact. In one of our own workflows an AI assistant was asked to look up the matching LinkedIn profile for every row of a spreadsheet and write each URL into one column — dozens of rows, not a single lookup. It reported the task as done and even gave counts — “14 profiles, 3 emails” — but the column was empty and the returned file was a renamed copy. A confident summary is simply the most plausible next words, so it is no proof that the work happened.

Asking again does not necessarily break the pattern. Depending on the product, a retry may resend the visible conversation, use server-side conversation state, trim or summarise older turns, or reuse cached input. It still generates a new answer and may consume additional tokens or plan limits, even when no usable file is produced.

An AI assistant admitting it reported a spreadsheet task as complete without having written any data into the file. — Real case from our own workflow (screenshot, 2026): the assistant reported counts and a finished file — the target column was still empty.

Reason: is the AI caught in a loop?

In a practical sense, yes — but not the way it feels. The model has no persistent self-awareness of repeating a failure; it can only use the context and state that the application supplies. A new answer may be generated from the full visible history, a truncated or summarised history, or server-managed state. If the workflow does not require the assistant to open the file and verify the saved values, then “done” may remain a plausible response. The loop is not stubbornness; it is missing verification and state checks.

That is why repeating the same question rarely helps, while it keeps burning tokens. What breaks the loop is changing the task, not repeating it: make the result verifiable, ask the assistant to read the saved file back and show the actual cell values, and stop the run when that evidence is missing. See the most common prompting mistakes for how to define a success criterion the model can check.

Why does context help?

Context narrows the search space. If the model knows the audience, goal, constraints, examples and source material, it can produce an answer that fits your situation instead of a generic answer. More context is not always better: irrelevant context can distract the model and increase cost.

Example: A marketing team gets better ad copy when it includes the product positioning, target customer, banned claims and two successful past ads instead of only saying “write ads”.

Search engine vs. LLM

A search engine retrieves pages and ranks links. An LLM generates an answer from patterns and provided context. Search is better when you need current sources, official pages or multiple perspectives. LLMs are better when you need synthesis, rewriting, reasoning over supplied information or structured drafts. Many strong AI products combine both.

Example: For “latest tax deadline in Zurich”, use search or official sources. For “turn these notes into a polite customer email”, an LLM is the better interface.

Comparison diagram showing how a search engine and an LLM answer the same user question differently. — Search engines return links and sources; LLMs generate direct natural-language answers.

What does “AI understands language” mean?

In everyday language, “understands” means the model can respond appropriately to meaning, tone and structure. Technically, it learned statistical representations that map text patterns to useful outputs. It does not understand like a person with lived experience, intentions or common-sense accountability.

Example: If you write “make this less salesy,” the model can often adjust tone because it has learned patterns of salesy vs. neutral language. That is useful linguistic competence, not human understanding.