AI Terms Everyone Is Using — Simplified and Explained
Like most people, I struggle to keep up with the endless AI jargon I come across online every day. There’s always a new term, a new framework, or a new concept that everyone suddenly starts using.
So, I’m starting a list: a simple, easy-to-follow compilation of all the new AI terms I come across.
MMLU — Massive Multitask Language Understanding
Explanation: A standardized exam for LLMs, used to compare performance and see which model is the most capable. It was created by researchers at UC Berkeley and covers 57 diverse subjects, including elementary mathematics, U.S. history, computer science, and law.
There are variants called 1-shot MMLU and 5-shot MMLU, which refer to how many examples (question + answer pairs) the model is given as reference before it attempts the test questions. This setup evaluates the model’s ability to generalize from a minimal amount of context.
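Here’s a minimal sketch (in Python, with hypothetical variable names) of how a 5-shot prompt is typically assembled: the model sees five solved question + answer pairs before the unsolved test question.

```python
# Hypothetical sketch of a few-shot MMLU-style prompt, assuming
# `examples` is a list of (question, answer) pairs from one subject.
def build_few_shot_prompt(examples, test_question, n_shots=5):
    parts = [f"Q: {q}\nA: {a}" for q, a in examples[:n_shots]]
    # The model sees n_shots solved questions, then the unsolved one.
    parts.append(f"Q: {test_question}\nA:")
    return "\n\n".join(parts)
```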
Where I heard it? https://arxiv.org/abs/2009.03300
Direct Preference Optimization (DPO)
Explanation: Direct Preference Optimization (DPO) is a machine learning technique that fine-tunes AI models to align with human preferences without requiring Reinforcement Learning (RL). It simplifies preference learning by directly optimizing a model on a dataset of ranked responses. In simple words, there is a dataset that shows which responses humans prefer, and the AI learns from it. Imagine showing an AI a dataset from Amazon or YouTube, where it can find correlations about which products humans like more based on likes, comments, watch time, and other metrics.
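If you’re curious what “directly optimizing” means, the core of DPO is a single loss function. Here’s a minimal PyTorch-style sketch, assuming the per-response log-probabilities have already been computed for both the policy being trained and a frozen reference model:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Push the policy to prefer the human-chosen response over the
    rejected one, measured relative to a frozen reference model."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between chosen and rejected log-ratios.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```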
Where I heard it? Direct Preference Optimization (DPO) on both off- and on-policy preference data https://allenai.org/blog/tulu-3-405B
Supervised finetuning (SFT)
Explanation: Supervised learning means training a machine learning model using labeled data (example: Image A = cat, Image B = person), where each input has a corresponding correct output. Supervised fine-tuning (SFT) is a machine learning technique where a pre-trained model (like Llama or DeepSeek) is further trained, or “fine-tuned”, on a labeled dataset to specialize it for a specific task (like face detection or customer support).
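Here’s a minimal sketch of a single SFT training step in PyTorch, assuming `model` is a Hugging Face-style causal language model and `batch` holds tokenized prompt + completion pairs, with prompt tokens masked out of the labels as -100:

```python
import torch.nn.functional as F

def sft_step(model, batch, optimizer):
    logits = model(batch["input_ids"]).logits        # (batch, seq, vocab)
    # Next-token cross-entropy against the labeled completions; tokens
    # marked -100 (the prompt) are ignored, so only answers are scored.
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        batch["labels"][:, 1:].reshape(-1),
        ignore_index=-100,
    )
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```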
Where I heard it? Supervised finetuning (SFT) on our carefully selected mix of prompts and their completions. https://allenai.org/blog/tulu-3-405B
RLVR — Reinforcement Learning from Verifiable Rewards
Explanation: For a mathematical problem like “What is 15 multiplied by 7?”, the model generates an answer. A verification function checks whether the answer matches the correct result (105). If it does, the model receives a positive reward; otherwise, it doesn’t. These rewards tell the model, through updates to the transformer’s weights, that it is doing the right job.
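The “verifiable” part is just a deterministic check rather than a learned reward model. A toy Python sketch for the example above:

```python
def verifiable_reward(model_answer: str, expected: str = "105") -> float:
    """Return 1.0 only when the model's final answer exactly matches
    the verifiable ground truth; there is no partial credit."""
    return 1.0 if model_answer.strip() == expected else 0.0
```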
Where I heard it? Interestingly, the TULU 3 team found that their Reinforcement Learning from Verifiable Rewards (RLVR) framework improved MATH performance more significantly at a larger scale, i.e., 405B compared to 70B and 8B, similar to the findings in the DeepSeek-R1 report. https://allenai.org/blog/tulu-3-405B
Reinforcement Learning from Human Feedback (RLHF)
Explanation: Imagine training an AI chatbot by showing it multiple responses to a question. Humans rank which response is the best, and the AI learns to improve its future answers based on those rankings. This is how models like ChatGPT were fine-tuned to be more aligned with human preferences.
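A key ingredient of RLHF is a reward model trained on those human rankings; the chatbot is then optimized with RL to score highly under it. Here’s a minimal PyTorch-style sketch of one reward-model update, assuming a hypothetical `reward_model` that maps a tokenized response to a scalar score:

```python
import torch.nn.functional as F

def reward_model_step(reward_model, better_ids, worse_ids, optimizer):
    r_better = reward_model(better_ids)   # scalar score per response
    r_worse = reward_model(worse_ids)
    # Pairwise ranking loss: the human-preferred response should score higher.
    loss = -F.logsigmoid(r_better - r_worse).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```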
Where I heard it? https://arxiv.org/pdf/2204.05862
Retrieval-Augmented Generation (RAG)
Explanation: Think of it as an AI with an open-book test. Instead of answering questions only based on what it has memorized, it can look up extra details before responding, making its answers more reliable. This is especially useful for real-time applications like customer support, legal research, and medical assistance.
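Conceptually, RAG is three steps: retrieve, augment, generate. A minimal Python sketch, assuming hypothetical `search` and `llm` helpers:

```python
def answer_with_rag(question, search, llm, k=3):
    # 1. Retrieve: look up the top-k most relevant documents.
    docs = search(question, top_k=k)
    # 2. Augment: put the retrieved context into the prompt.
    context = "\n\n".join(docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # 3. Generate: the model answers with the context in front of it.
    return llm(prompt)
```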
Mixture of Experts (MoE)
Explanation: MoE is an AI model architecture that uses multiple specialized networks (called “experts”) to process different types of data or tasks. Instead of a single model handling everything, it routes inputs to the most relevant expert, improving efficiency and performance.
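Here’s a toy PyTorch sketch of the routing idea, assuming four small expert networks and a simple top-1 router (real MoE layers use more sophisticated, load-balanced routing):

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=4):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                         # x: (tokens, dim)
        weights = self.router(x).softmax(dim=-1)  # routing probabilities
        best = weights.argmax(dim=-1)             # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = best == i                      # tokens routed to expert i
            if mask.any():
                out[mask] = expert(x[mask]) * weights[mask, i].unsqueeze(-1)
        return out
```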
Note:
This is a live document, and I’ll keep adding new terms over time.