Embeddings & Vector Search

The numbers that turn meaning into geometry. What embeddings are, how to pick an embedding model, how vector databases store and search them, and how to chunk text so the vectors actually capture what a passage is about.

An embedding is the trick that lets a computer treat "meaning" as something it can measure. You hand a piece of text to an embedding model, and it hands back a list of numbers, usually a few hundred to a few thousand of them. That list is the embedding. On its own a single embedding tells you nothing. The magic shows up when you compare two of them.

A map of meaning

The cleanest way to picture an embedding is as a point on a map. Imagine a giant map where every possible sentence has a location. The embedding model decides where to drop each pin. Its one job is to place texts with similar meaning close together and texts with unrelated meaning far apart.

On that map, "the cat sat on the mat" lands near "a feline rested on the rug" even though they share almost no words. "king" lands near "queen" and "monarch." "I need to reset my password" lands near "how do I recover my account" but far from "what time does the store close." The model learned these positions from huge amounts of text, so it captures meaning rather than spelling.

The map is not two-dimensional, of course. It has as many dimensions as the model outputs. You cannot picture 1,024 dimensions, and you do not need to. The intuition holds: nearby means similar.
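
To make the map picture concrete, here is a minimal sketch that uses a flat, two-dimensional map. The coordinates are invented by hand for illustration and do not come from any real embedding model; the helper name nearest is also hypothetical.

```python
import numpy as np

# Hand-placed, made-up 2-D coordinates. A real model would output
# hundreds or thousands of numbers per text, not two.
toy_map = {
    "the cat sat on the mat": np.array([1.0, 1.0]),
    "a feline rested on the rug": np.array([1.2, 0.9]),
    "what time does the store close": np.array([-3.0, 4.0]),
}

def nearest(text, points):
    """Return the other text whose pin is closest by straight-line distance."""
    origin = points[text]
    others = [t for t in points if t != text]
    return min(others, key=lambda t: np.linalg.norm(points[t] - origin))

closest = nearest("the cat sat on the mat", toy_map)
print(closest)
```

With these invented coordinates the cat sentence's nearest neighbor is the feline sentence, which is the behavior a real model aims for at much higher dimension.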

Measuring "near"

To turn "near" into a number, you compare two embeddings with cosine similarity. It returns a score between -1 and 1, where 1 means the two vectors point in the same direction (very similar), 0 means they are at right angles (unrelated), and -1 means they point in opposite directions. In practice, for text you mostly see scores between roughly 0 and 1.

The computation is a dot product divided by the two vectors' lengths. You almost never write it by hand. In NumPy it is one line:

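import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths (L2 norms).
    # Undefined (division by zero) if either vector is all zeros.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))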
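
As a quick sanity check, the sketch below runs the same function on a few hand-made two-dimensional vectors. The vectors and their names are made up for illustration; they stand in for embeddings only in the sense that direction, not length, decides the score.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy vectors, not real embeddings.
east = np.array([1.0, 0.0])
also_east = np.array([3.0, 0.0])  # same direction, different length
north = np.array([0.0, 2.0])      # at right angles to east
west = np.array([-2.0, 0.0])      # opposite direction

same = cosine_similarity(east, also_east)   # 1.0
unrelated = cosine_similarity(east, north)  # 0.0
opposed = cosine_similarity(east, west)     # -1.0
print(same, unrelated, opposed)
```

Note that east and also_east score 1.0 even though one is three times longer: cosine similarity ignores length and looks only at direction.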

If your embedding model returns unit-length vectors (many do, and the rest can be normalized once up front), both lengths are 1, so the denominator is 1 and cosine similarity is just the dot product. That is why vector databases optimize so hard for fast dot products.
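
The equivalence is easy to check numerically. This is a minimal sketch on two arbitrary toy vectors; normalize is a hypothetical helper name, not part of any library.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    return v / np.linalg.norm(v)

# Arbitrary toy vectors, not real embeddings.
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

# Full cosine similarity on the raw vectors.
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Plain dot product after normalizing each vector once.
dot_of_units = np.dot(normalize(a), normalize(b))

print(cosine, dot_of_units)  # both are 0.96, up to floating-point rounding
```

Normalizing once at write time means every later comparison skips the two norm computations, which adds up when a query is scored against many stored vectors.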

What embeddings power

Once you can score how related any two texts are, a stack of useful tools falls out:

Semantic search. Embed the query, embed every document, return the documents with the highest similarity. This finds matches by meaning, not by keyword, so "car won't start" surfaces a doc titled "engine ignition failure."
Retrieval for RAG. The retrieval half of retrieval-augmented generation is semantic search over your chunked documents. Embeddings are how the system decides which chunks to feed the chat model.
Clustering. Group similar texts automatically. Useful for sorting thousands of support tickets into themes without predefining the themes.
Deduplication. Near-duplicate detection. Two records with similarity above a threshold are probably the same thing said two ways.
Classification. Embed labeled examples, embed a new input, assign the label of the nearest examples. A cheap classifier with no training loop.
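
Semantic search and nearest-neighbor classification from the list above can be sketched in a few lines. Everything here is an assumption for illustration: the three-dimensional vectors are invented rather than produced by a model, the labels are made up, and normalize_rows and top_k are hypothetical helper names. A real system would get the vectors from an embedding model and usually delegate the search to a vector database.

```python
import numpy as np

def normalize_rows(matrix):
    """Scale every row to unit length so a dot product equals cosine similarity."""
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)

def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k documents most similar to the query."""
    scores = normalize_rows(doc_vecs) @ (query_vec / np.linalg.norm(query_vec))
    best = np.argsort(-scores)[:k]  # highest scores first
    return [(int(i), float(scores[i])) for i in best]

# Made-up 3-D vectors standing in for real model output.
docs = ["engine ignition failure", "battery replacement guide", "store opening hours"]
labels = ["car", "car", "store"]  # hypothetical labels for the classification case
doc_vecs = np.array([
    [0.9, 0.1, 0.0],
    [0.7, 0.3, 0.1],
    [0.0, 0.1, 0.9],
])
query_vec = np.array([0.8, 0.2, 0.0])  # pretend embedding of "car won't start"

# Semantic search: rank documents by similarity to the query.
hits = top_k(query_vec, doc_vecs, k=2)
for index, score in hits:
    print(f"{score:.3f}  {docs[index]}")

# Classification: borrow the label of the single nearest example.
predicted_label = labels[top_k(query_vec, doc_vecs, k=1)[0][0]]
print(predicted_label)
```

This brute-force version scores the query against every stored vector, which is fine for small collections. Clustering and deduplication build on the same similarity scores, computed between stored items rather than against a query.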

What an embedding is not

An embedding does not store the original text and you cannot read it back out reliably. It is a lossy fingerprint of meaning, not a compression of the words. It also reflects whatever the model learned, including its blind spots: a model trained mostly on English will place non-English text less precisely, and a general model will blur fine distinctions in a specialized domain like law or medicine.

That is the whole core idea. Text in, vector out, compare by distance. Everything else in this cluster, picking a model, storing the vectors, and chunking the text before you embed it, is engineering on top of this one foundation.