What Are Embeddings?

Article summary

An embedding is a list of numbers (a vector) that represents a piece of text, positioned so that texts with similar meaning end up close together in the vector space. You compare two embeddings with cosine similarity to get a single "how related are these" score. That one idea powers semantic search, retrieval for RAG, clustering, deduplication, and classification. You do not read the numbers; you measure distances between them.

An embedding is the trick that lets a computer treat "meaning" as something it can measure. You hand a piece of text to an embedding model, and it hands back a list of numbers, usually a few hundred to a few thousand of them. That list is the embedding. On its own a single embedding tells you nothing. The magic shows up when you compare two of them.

A map of meaning

The cleanest way to picture an embedding is as a point on a map. Imagine a giant map where every possible sentence has a location. The embedding model decides where to drop each pin. Its one job is to place texts with similar meaning close together and texts with unrelated meaning far apart.

On that map, "the cat sat on the mat" lands near "a feline rested on the rug" even though they share almost no words. "king" lands near "queen" and "monarch." "I need to reset my password" lands near "how do I recover my account" but far from "what time does the store close." The model learned these positions from huge amounts of text, so it captures meaning rather than spelling.

The map is not two-dimensional, of course. It has as many dimensions as the model outputs. You cannot picture 1,024 dimensions, and you do not need to. The intuition holds: nearby means similar.

Measuring "near"

To turn "near" into a number, you compare two embeddings with cosine similarity. It returns a score between -1 and 1, where 1 means the two vectors point in the same direction (very similar texts), 0 means unrelated, and negative values mean opposed. In practice for text you mostly see scores between roughly 0 and 1.

The computation is the dot product of the two vectors divided by the product of their lengths. You almost never write it by hand. In NumPy it is one line:

import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths (L2 norms).
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

If your embedding model returns unit-length vectors (many do; check the model's documentation), the denominator is 1 and cosine similarity is just the dot product. That is why vector databases optimize so hard for fast dot products.
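A minimal sketch of the score range and the unit-length shortcut. The vectors are hand-picked toy values, not real model output, and `normalize` is a hypothetical helper introduced only for this illustration:

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def normalize(v):
    """Scale a vector to unit length (hypothetical helper for illustration)."""
    return v / np.linalg.norm(v)

# Toy 3-dimensional vectors chosen by hand; real embeddings have hundreds of dimensions.
a = np.array([1.0, 2.0, 3.0])
same_direction = 2.0 * a                 # longer, but pointing the same way
orthogonal = np.array([3.0, 0.0, -1.0])  # dot product with a is 0
opposite = -a

print(cosine_similarity(a, same_direction))  # approximately 1
print(cosine_similarity(a, orthogonal))      # 0
print(cosine_similarity(a, opposite))        # approximately -1

# After normalizing, the plain dot product matches cosine similarity.
b = np.array([2.0, 1.0, 0.5])
dot_of_units = np.dot(normalize(a), normalize(b))
print(np.isclose(dot_of_units, cosine_similarity(a, b)))  # True
```

Doubling the length of `a` leaves the score unchanged, which is the sense in which cosine similarity looks at direction and ignores magnitude.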

What embeddings power

Once you can score how related any two texts are, a stack of useful tools falls out:

Semantic search. Embed the query, embed every document, return the documents with the highest similarity. This finds matches by meaning, not by keyword, so "car won't start" surfaces a doc titled "engine ignition failure."
Retrieval for RAG. The retrieval half of retrieval-augmented generation is semantic search over your chunked documents. Embeddings are how the system decides which chunks to feed the chat model.
Clustering. Group similar texts automatically. Useful for sorting thousands of support tickets into themes without predefining the themes.
Deduplication. Near-duplicate detection. Two records with similarity above a threshold are probably the same thing said two ways.
Classification. Embed labeled examples, embed a new input, assign the label of the nearest examples. A cheap classifier with no training loop.
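
Semantic search and nearest-neighbor classification from the list above can be sketched in a few lines. This is an illustration under stated assumptions, not a production design: the vectors are hand-made stand-ins for real embeddings, and `normalize_rows`, `top_k`, the titles, and the labels are all hypothetical names made up for the example.

```python
import numpy as np

def normalize_rows(m):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)

def top_k(query_vec, doc_vecs, k=2):
    """Return (indices, scores) of the k rows of doc_vecs most similar to query_vec."""
    docs = normalize_rows(doc_vecs)
    query = query_vec / np.linalg.norm(query_vec)
    scores = docs @ query                  # one cosine score per document
    order = np.argsort(scores)[::-1][:k]   # highest score first
    return order, scores[order]

# Hand-made stand-ins; a real system would get these vectors from an embedding model.
doc_titles = ["engine ignition failure", "store opening hours", "battery replacement guide"]
doc_labels = ["automotive", "retail", "automotive"]
doc_vecs = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9],
    [0.7, 0.6, 0.1],
])

# Pretend this is the embedding of the query "car won't start".
query_vec = np.array([0.8, 0.2, 0.1])

# Semantic search: rank documents by similarity to the query.
indices, scores = top_k(query_vec, doc_vecs, k=2)
for i, s in zip(indices, scores):
    print(f"{doc_titles[i]}: {s:.3f}")

# Nearest-neighbor classification: reuse the label of the single closest example.
nearest, _ = top_k(query_vec, doc_vecs, k=1)
print(doc_labels[nearest[0]])
```

Deduplication follows the same pattern: score pairs of records and flag those above a threshold you choose by inspecting real examples.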

What an embedding is not

An embedding does not store the original text and you cannot read it back out reliably. It is a lossy fingerprint of meaning, not a compression of the words. It also reflects whatever the model learned, including its blind spots: a model trained mostly on English will place non-English text less precisely, and a general model will blur fine distinctions in a specialized domain like law or medicine.

That is the whole core idea. Text in, vector out, compare by distance. Everything else in this cluster, picking a model, storing the vectors, and chunking the text before you embed it, is engineering on top of this one foundation.

Frequently asked questions

Are embeddings the same thing as the model that generates text?

No. A text-generation model (the chat model) produces words. An embedding model produces a fixed-length vector that represents meaning. They are usually separate models with separate APIs, even when the same company ships both. You embed text to compare or search it; you generate text to answer with it. A RAG system uses both: an embedding model to find relevant chunks, then a chat model to answer using them.
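
The division of labor in a RAG system can be sketched as a function that takes the two models as arguments. Everything here is an assumption made for illustration: `answer_with_rag`, `toy_embed`, and `toy_generate` are hypothetical names, and the toy stand-ins only exist so the sketch runs without any external service. In a real system you would pass in calls to your embedding API and your chat API.

```python
import numpy as np

def answer_with_rag(question, chunks, embed, generate, k=2):
    """Retrieve the k most similar chunks, then ask the chat model to answer with them.

    `embed` and `generate` are placeholders for whatever embedding and chat
    APIs you use; they are not real library functions.
    """
    chunk_vecs = np.array([embed(c) for c in chunks], dtype=float)
    query_vec = np.asarray(embed(question), dtype=float)

    # Cosine similarity between the question and every chunk.
    scores = (chunk_vecs @ query_vec) / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    best = np.argsort(scores)[::-1][:k]
    context = "\n".join(chunks[i] for i in best)

    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

# Toy stand-ins so the sketch is runnable.
VOCAB = ["password", "reset", "store", "hours"]

def toy_embed(text):
    # Counts a few keywords; a real embedding model captures meaning, not keywords.
    words = text.lower().split()
    return [words.count(w) + 0.01 for w in VOCAB]  # small offset avoids zero-length vectors

def toy_generate(prompt):
    # A real chat model would write an answer; this stand-in just echoes the prompt.
    return prompt

chunks = [
    "To reset your password open account settings",
    "The store hours are nine to five",
]
result = answer_with_rag("How do I reset my password", chunks, toy_embed, toy_generate, k=1)
print(result)
```

The embedding step only decides which chunk reaches the prompt; the wording of the final answer is entirely the chat model's job.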

Why cosine similarity and not plain distance?

Cosine similarity measures the angle between two vectors and ignores their length. For text embeddings, direction carries the meaning and magnitude mostly does not, so cosine is the standard. Euclidean distance also works, and it gives the same rankings when vectors are normalized to unit length, because for unit vectors the squared Euclidean distance equals 2 minus twice the cosine similarity. Many embedding models already return normalized vectors. If yours does, or if you normalize the vectors yourself, cosine similarity and Euclidean distance are interchangeable for ranking.
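
The relationship between the two measures on unit-length vectors can be checked numerically. A minimal sketch, assuming random vectors as stand-ins for real embeddings (the sizes 50 and 8 are arbitrary choices for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed so the run is repeatable
docs = rng.normal(size=(50, 8))   # random stand-ins for 50 document embeddings
query = rng.normal(size=8)

# Normalize everything to unit length.
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query /= np.linalg.norm(query)

cosine = docs @ query                             # higher means closer
euclidean = np.linalg.norm(docs - query, axis=1)  # lower means closer

# For unit vectors: squared distance = 2 - 2 * cosine similarity.
print(np.allclose(euclidean ** 2, 2 - 2 * cosine))  # True

# Compare the ranking by descending cosine with the ranking by ascending distance.
same_order = np.array_equal(np.argsort(-cosine), np.argsort(euclidean))
print(same_order)
```

Without the normalization step the two rankings can differ, because Euclidean distance is then also affected by vector length.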

How many numbers are in an embedding?

It depends on the model. Common dimensions range from 384 up to a few thousand. More dimensions can capture more nuance but cost more memory and compute to store and search. The dimension is fixed by the model: a 1,024-dimension model always returns 1,024 numbers per input, whether the input is one word or one page, as long as it fits within the model's input limit.
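
To make the memory cost concrete, here is a back-of-the-envelope sketch. It assumes each number is stored as a 32-bit float (4 bytes) and counts only the raw vectors, ignoring index structures and metadata; `index_size_bytes` is a hypothetical helper, and one million vectors is an arbitrary example size.

```python
def index_size_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming float32 (4 bytes per number)."""
    return num_vectors * dimensions * bytes_per_value

for dims in (384, 1024, 3072):
    size_gb = index_size_bytes(1_000_000, dims) / 1e9
    print(f"{dims} dimensions x 1,000,000 vectors: {size_gb:.2f} GB")
```

Under these assumptions, a million 1,024-dimension vectors take about 4.1 GB, and the cost grows linearly with the dimension.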