Talk to AI
Explore the mechanics of Large Language Models and the foundational techniques of prompt engineering. This short guide breaks down tokenization, prompting techniques, and common AI failure modes, culminating in a machine learning pipeline that evaluates prompt quality.
AI Prompt Mastery
1. Introduction: What Is an LLM and How Does It Work?
Think of an LLM (Large Language Model) like an incredibly well-read system that has processed trillions of words from the internet, books, and code. When you ask it something, it doesn’t “look it up.” It predicts what the most likely next word should be, based on everything it has ingested.
At a technical level, input text is tokenized, embedded into high-dimensional vectors, and passed through Transformer blocks utilizing self-attention mechanisms. The model outputs a probability distribution over a vocabulary to generate the final response token by token.
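To make the last step concrete, here is a minimal sketch of how raw model scores become a probability distribution over a vocabulary. The four-word vocabulary and the logit values are invented for illustration; a real model scores tens of thousands of tokens and usually samples rather than always taking the top one.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and scores for the context "The cat sat on the"
vocab = ["mat", "roof", "moon", "banana"]
logits = [3.2, 2.1, 0.3, -1.0]

probs = softmax(logits)

# Greedy decoding: pick the single most likely token.
next_token = vocab[probs.index(max(probs))]
```

Generation repeats this step: the chosen token is appended to the input and the model scores the vocabulary again.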
Tech Stack
The concepts detailed below were implemented in a scoring application built on the following stack:
- AI/ML Engineering: Python, scikit-learn, ONNX export, feature engineering, SHAP analysis.
- Frontend Development: Svelte SPA, Apache ECharts, Tailwind CSS.
- Backend/Cloud: Firebase Firestore, Firebase Hosting, serverless architecture.
2. Tokens — The Currency of AI
A token is the smallest unit of text the model processes. It is not quite a word and not quite a character. As a rule of thumb, 1 token is approximately 0.75 words in English.
Tokens govern the context window (the maximum data the model can process at once), execution cost, and response efficiency. Vague prompts waste tokens on hedging and fillers, whereas precise prompts yield higher quality output per token.
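The rule of thumb above can be turned into a rough budget check. This is only an approximation under the 0.75 words-per-token assumption; actual counts depend on the model's tokenizer, and the function names here are hypothetical.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb for English; real tokenizers vary

def estimate_tokens(text: str) -> int:
    """Roughly estimate the token count from a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)

def fits_context(text: str, context_window: int, reserved_for_output: int = 0) -> bool:
    """Check whether a prompt plus a reserved output budget fits the window."""
    return estimate_tokens(text) + reserved_for_output <= context_window
```

For example, `estimate_tokens("one two three")` returns 4, since 3 words divided by 0.75 is 4 tokens.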
3. What Is a Prompt?
A prompt is the instruction, question, or context given to an AI model. For most users it is the main lever available to control output quality. In this guide, prompt quality is assessed across five dimensions: Clarity, Specificity, Context, Constraints, and Role/Persona.
4. Types of Prompting
Different tasks require distinct prompting methodologies:
- Zero-Shot Prompting: No examples given. Best for simple, well-defined tasks the model already understands.
- Few-Shot Prompting: Providing 2–5 examples before the task. Ideal for domain-specific classification and format matching.
- Chain-of-Thought (CoT): Forcing the model to reason step-by-step before answering. Highly effective for logic and multi-step problems.
- Role / Persona Prompting: Assigning a character or expertise to the model to dictate tone and domain-specific framing.
- Tree of Thought (ToT): Asking the model to explore multiple reasoning paths simultaneously before committing to a decision.
- ReAct (Reasoning + Acting): Combining reasoning with external tool use (web search, API calls).
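The first three styles differ mainly in how the prompt text is assembled. The helpers below are a minimal sketch; the function names and the exact wording of the templates are illustrative assumptions, not a standard.

```python
def zero_shot(task: str) -> str:
    """Zero-shot: send the task with no examples."""
    return task

def few_shot(task: str, examples: list[tuple[str, str]]) -> str:
    """Few-shot: show input/output pairs, then leave the final output open."""
    shots = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return f"{shots}\nInput: {task}\nOutput:"

def chain_of_thought(task: str) -> str:
    """Chain-of-thought: ask for step-by-step reasoning before the answer."""
    return f"{task}\nThink through the problem step by step, then give the final answer."

# Hypothetical sentiment-classification examples
prompt = few_shot(
    "loved it",
    [("great movie", "positive"), ("awful plot", "negative")],
)
```

The few-shot prompt ends with an open `Output:` line so the model continues the established pattern.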
5. Failure Modes — What Bad Prompting Causes
Without clear instructions and constraints, LLMs exhibit several known failure modes:
- Hallucination: The model confidently states something factually wrong by filling knowledge gaps with statistically probable text.
- Prompt Injection: Malicious input tricks the AI into ignoring its original instructions.
- Sycophancy: The model tells the user what they want to hear instead of objective truth, a byproduct of RLHF training.
- Overconfidence: Generating precise but fabricated numbers or claims with no indication of how uncertain the model actually is.
- Context Drift: In long conversations, the model loses track of early context as the conversation approaches or exceeds the context window.
- Ambiguity Misfire: Underspecified tasks leading to unexpected output formats or scopes.
6. Mitigating Poor Prompting & Verifying Output
Developers use Reinforcement Learning from Human Feedback (RLHF) to align model behavior, system prompt hardening to resist prompt injection, and Retrieval-Augmented Generation (RAG) to ground responses in factual documents and reduce hallucinations.
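As a rough illustration of the RAG idea, the sketch below retrieves the most relevant document by simple word overlap and places it in the prompt. This is a toy stand-in: production systems typically use embedding-based search, and the function names and documents here are hypothetical.

```python
import re

def _words(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the query."""
    query_words = _words(query)
    ranked = sorted(documents, key=lambda d: len(query_words & _words(d)), reverse=True)
    return ranked[:k]

def grounded_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Build a prompt that restricts the model to the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below. "
        "If the answer is not there, say you do not know.\n"
        f"Context:\n{context}\nQuestion: {query}"
    )

# Hypothetical knowledge base
docs = ["The refund window is 30 days.", "Shipping takes 5 business days."]
prompt = grounded_prompt("What is the refund window?", docs)
```

Telling the model to admit when the context lacks an answer is what discourages it from filling the gap with plausible-sounding text.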
Users must independently verify outputs using the SIFT Framework:
- Stop: Pause before acting on AI output.
- Investigate: Check who or what is behind the cited source.
- Find: Look for better, authoritative coverage of the same claim.
- Trace: Follow claims, quotes, and data back to their original context.
7. Effective Prompting: The CRAFT Framework
The CRAFT framework helps produce specific, controlled AI outputs by covering five elements in every prompt.
- C — Context: Set the scene. (e.g., “We are a startup serving 50,000 users…”)
- R — Role: Tell the model who to be. (e.g., “Act as a senior analyst…”)
- A — Action: The specific task, verb-first. (e.g., “Analyze the attached data…”)
- F — Format: Exactly how you want the output. (e.g., “Return a markdown table…”)
- T — Tone + Constraints: Guardrails on style and length. (e.g., “Under 300 words. Plain English.”)
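The five elements can be assembled into one prompt programmatically. The `CraftPrompt` class below is a hypothetical helper and the field values are illustrative, loosely based on the examples above.

```python
from dataclasses import dataclass

@dataclass
class CraftPrompt:
    context: str  # C: set the scene
    role: str     # R: who the model should be
    action: str   # A: the task, verb-first
    format: str   # F: the shape of the output
    tone: str     # T: tone and constraints

    def render(self) -> str:
        """Join the five parts in CRAFT order, one per line."""
        return "\n".join([self.context, self.role, self.action, self.format, self.tone])

# Illustrative values only
prompt = CraftPrompt(
    context="We are a startup serving 50,000 users.",
    role="Act as a senior analyst.",
    action="Analyze the attached data for churn trends.",
    format="Return a markdown table.",
    tone="Under 300 words. Plain English.",
).render()
```

Keeping each element as a separate field makes it easy to spot which part of a prompt is missing before it is sent.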
8. Methods: Pipeline Setup
To quantify prompt quality against the CRAFT framework, a machine learning pipeline was built. The system extracts heuristic features from raw prompt text and utilizes a Gradient Boosting Regressor to assign dimension-specific scores.
Feature Extraction Snippet:
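As a minimal sketch of what heuristic feature extraction for CRAFT could look like (the feature names, keyword patterns, and `extract_features` function are illustrative assumptions, not the application's actual code):

```python
import re

# Hypothetical keyword heuristics, one per CRAFT-related signal
ROLE_PATTERN = re.compile(r"\b(act as|you are)\b", re.IGNORECASE)
FORMAT_PATTERN = re.compile(r"\b(table|json|bullet|markdown|list|csv)\b", re.IGNORECASE)
CONSTRAINT_PATTERN = re.compile(r"\b(under|at most|no more than|exactly|only|must|avoid)\b", re.IGNORECASE)
ACTION_VERBS = {"analyze", "summarize", "write", "compare", "explain", "generate", "translate"}

def extract_features(prompt: str) -> dict[str, float]:
    """Map raw prompt text to numeric features a regressor can consume."""
    words = re.findall(r"[A-Za-z']+", prompt)
    first_word = words[0].lower() if words else ""
    return {
        "word_count": float(len(words)),
        "has_role": float(bool(ROLE_PATTERN.search(prompt))),
        "has_format": float(bool(FORMAT_PATTERN.search(prompt))),
        "has_constraint": float(bool(CONSTRAINT_PATTERN.search(prompt))),
        "starts_with_verb": float(first_word in ACTION_VERBS),
        "digit_count": float(sum(ch.isdigit() for ch in prompt)),
    }

features = extract_features(
    "Act as a senior analyst. Analyze the data and return a markdown table under 300 words."
)
```

A feature dictionary like this could then be converted to a numeric vector and passed to a Gradient Boosting Regressor trained on labeled prompts, one model (or one output) per scored dimension.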