What you will learn
Transformer explained
The Transformer is the architecture behind many modern LLMs. Its key idea is to process tokens with attention layers so the model can weigh relationships across the context. This made training more parallelizable than older recurrent approaches and became a foundation for modern language models.
Embeddings explained
Embeddings are numeric vectors that represent text, images or other data in a space where similar meanings tend to be closer. They are useful for semantic search, clustering, recommendations and retrieval. The important detail: embeddings are not truth; they are similarity signals.
Self-Attention explained
Self-attention lets each token weigh other tokens in the same context. The model computes relationships repeatedly across layers, creating richer representations of words, phrases and dependencies. It is one reason LLMs can use long instructions and examples.
Tokenization deep dive
Tokenization converts text into model-readable units. Different languages and scripts can use tokens differently, which affects cost and context length. Tokenization also explains why exact character limits, code snippets and rare words can be tricky.
RAG architecture
A production RAG system has ingestion, chunking, embeddings, vector index, retrieval, reranking, prompt assembly, answer generation and evaluation. Most failures happen before generation: poor chunks, stale documents, bad ranking or missing source filtering.
Prompt Injection
Prompt injection happens when untrusted text tries to override system or developer instructions. It is common in RAG and agent workflows because retrieved pages can contain hidden instructions. Treat external content as data, not authority.
LLM Evaluation
LLM evaluation measures whether outputs are correct, useful, safe and consistent. Use a mix of automated checks, model-graded rubrics, human review and task-specific tests. Track regressions over time, especially after prompt, model or retrieval changes.
Function Calling
Function calling lets a model return structured arguments for a tool instead of free text. The application then decides whether to call the tool, validate arguments and handle errors. The model should not be the security boundary.
Agents explained
An agent is an LLM-driven system that can plan steps, call tools, observe results and continue. Agents are useful for multi-step workflows but harder to evaluate and secure than single prompts. Start narrow before building a general agent.
Hallucinations technically explained
Hallucinations arise from the gap between fluent generation and grounded verification. The model can produce likely text even when it lacks evidence, misreads retrieved context or overgeneralizes from training patterns. Mitigation needs retrieval quality, uncertainty handling, validation and evaluation — not just a better wording of the prompt.
Mini test you can run
Pick five real tasks from your own workflow. Run one short prompt, one structured prompt and one prompt with examples or source context. Score each output from 1 to 5 for usefulness, factual risk and edit time. Keep the winning prompt as your baseline and retest after every major change.
| Variant | Usefulness | Factual risk | Edit time |
|---|---|---|---|
| Short prompt | Medium | Higher | High |
| Structured prompt | High | Medium | Medium |
| Context + examples | Highest for repeat tasks | Lower if sources are good | Low |