As of 2026-05-31
A vector database does two jobs. It stores your embeddings, and it builds an index that finds the nearest vectors to a query without scanning all of them. The second job is the reason these systems exist. Comparing your query embedding against ten thousand stored vectors is trivial; doing it against fifty million on every request is not, and a brute-force scan would be far too slow.
The core job: store plus ANN index
Finding the closest vectors by checking every one is called exact nearest-neighbor search. It is accurate and it does not scale. Vector databases instead build an approximate-nearest-neighbor (ANN) index, most commonly HNSW (Hierarchical Navigable Small World graphs). HNSW arranges vectors into a navigable graph so a query hops toward its neighbors and only inspects a small slice of the data.
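To make the cost of exact search concrete, here is a minimal sketch in plain Python. The function names and the tiny two-dimensional vectors are made up for illustration, and cosine similarity is an assumed metric; a real system would use whatever metric matches its embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_top_k(query, vectors, k):
    """Score every stored vector against the query, then keep the best k.

    Each request touches all n vectors, so the work grows linearly
    with the size of the collection.
    """
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

# Hypothetical toy data: four stored vectors and one query.
vectors = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
]
top = exact_top_k([1.0, 0.05], vectors, k=2)
print(top)  # [0, 1]
```

The loop over every stored vector is what an ANN index avoids. The same scan is still useful offline, because it produces the ground truth against which an index's recall is measured.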
The "approximate" part means the index can occasionally miss a true nearest neighbor, and how often that happens is something you tune. Index parameters trade recall against speed and memory. This is the central knob a vector database gives you, and it is why a real comparison is about your recall and latency targets, not about a single throughput number from someone else's benchmark.
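Recall here has a simple definition: of the true top-k neighbors, what fraction did the index return? A minimal sketch follows; the ID lists are invented for illustration, and in practice the exact list would come from a brute-force scan over the same data.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the ANN index also returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0  # nothing to find, so nothing was missed
    return len(exact & set(approx_ids)) / len(exact)

# Hypothetical results for one query at k=5.
exact_ids = [12, 7, 33, 4, 19]   # ground truth from an exact scan
approx_ids = [12, 7, 4, 19, 50]  # what the ANN index returned
recall = recall_at_k(approx_ids, exact_ids)
print(recall)  # 0.8
```

Averaging this over a representative set of queries gives the number to hold fixed while you compare latency and memory across index settings.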
The axes that actually decide it
Forget the leaderboard framing. Pick on these.
Managed versus self-hosted. A managed service (you send vectors, they run the cluster) removes operational load and is the fastest way to ship. Self-hosting keeps data in your environment and removes per-vector pricing, at the cost of running and scaling the system yourself. This is usually the first fork in the road.
Standalone engine versus Postgres extension. Standalone vector engines such as Pinecone, Weaviate, Qdrant, and Milvus are purpose-built for vectors and tend to scale further with more index tuning. The pgvector extension adds vector columns and ANN indexes to Postgres, so your vectors live next to your relational data with one backup and one transaction story. If you already run Postgres at moderate scale, that simplicity is worth a lot.
Metadata filtering. Real queries are rarely pure similarity. You want "the nearest chunks from this user's documents, published after January, tagged 'billing.'" How a system combines metadata filters with the ANN search matters enormously. Some apply the filter after retrieval, which can return fewer results than you asked for; others build filtering into the index. If your access patterns lean on filters, test this specifically.
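The "too few results" failure is easy to reproduce with a toy model. The sketch below is not how any particular engine works: the `Chunk` type, the field names, and the precomputed `score` are assumptions made for illustration, and a real engine that filters inside the index does so during graph traversal rather than by sorting a list.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    id: int
    owner: str
    tag: str
    score: float  # similarity to the query, precomputed for this sketch

def post_filter(chunks, k, predicate):
    """Take the k most similar chunks first, then drop those failing the filter."""
    top = sorted(chunks, key=lambda c: c.score, reverse=True)[:k]
    return [c.id for c in top if predicate(c)]

def pre_filter(chunks, k, predicate):
    """Restrict to chunks passing the filter, then take the k most similar."""
    allowed = [c for c in chunks if predicate(c)]
    top = sorted(allowed, key=lambda c: c.score, reverse=True)[:k]
    return [c.id for c in top]

def wanted(chunk):
    """Hypothetical filter: this user's chunks tagged 'billing'."""
    return chunk.owner == "alice" and chunk.tag == "billing"

chunks = [
    Chunk(1, "alice", "billing", 0.95),
    Chunk(2, "bob", "billing", 0.93),
    Chunk(3, "bob", "support", 0.90),
    Chunk(4, "alice", "support", 0.70),
    Chunk(5, "alice", "billing", 0.60),
    Chunk(6, "alice", "billing", 0.55),
]

post_ids = post_filter(chunks, 3, wanted)
pre_ids = pre_filter(chunks, 3, wanted)
print(post_ids)  # [1]
print(pre_ids)   # [1, 5, 6]
```

Both calls ask for three results. Filtering afterwards returns one, because two of the three nearest chunks belong to someone else. The more selective the filter, the worse this gets, which is why it deserves its own test on your data.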
Scale and operations. The numbers that matter are vector count, write throughput, p99 query latency, and whether you need horizontal sharding. A system that is delightful at one million vectors can fall over at one billion. Match the tool to the scale you actually have, plus a year of growth, not to a hypothetical.
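A rough sizing pass helps put a number on "the scale you actually have, plus a year of growth." The sketch below is back-of-the-envelope arithmetic only. The 768 dimensions, the 10% monthly growth rate, and the float32 storage are assumptions chosen for illustration, and the result covers raw vectors without index or metadata overhead.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Bytes for the raw vectors alone (float32 by default), excluding index overhead."""
    return num_vectors * dimensions * bytes_per_value

def projected_count(current, monthly_growth_rate, months=12):
    """Compound the current vector count forward by a monthly growth rate."""
    return round(current * (1 + monthly_growth_rate) ** months)

# Assumed inputs: one million vectors today, 768 dimensions, 10% growth per month.
current = 1_000_000
future = projected_count(current, monthly_growth_rate=0.10)

gib_now = raw_vector_bytes(current, 768) / 2**30
gib_future = raw_vector_bytes(future, 768) / 2**30
print(f"today: {current:,} vectors, about {gib_now:.2f} GiB raw")
print(f"in 12 months: {future:,} vectors, about {gib_future:.2f} GiB raw")
```

Treat the output as a lower bound. The index itself adds memory on top of the raw vectors, and how much depends on the engine and its parameters.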
A starting recommendation
The lowest-regret move is to start with whatever is closest to your existing stack:
- Already on Postgres, moderate scale: pgvector.
- Want zero operations, fast to ship: a managed vector service.
- Need a self-hosted engine that scales and tunes deeply: a standalone engine you run.
Then prototype with real data. Load a representative slice, run your actual query mix with metadata filters attached, and measure recall and latency. The product that wins on your workload is the one to keep. A vendor benchmark run on someone else's data and someone else's filters tells you almost nothing about how the system behaves on yours.
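A prototype harness for this can be small. The sketch below assumes a hypothetical `search_fn(query, k)` that wraps whichever client you are testing and returns a list of IDs, plus a ground-truth list per query from an exact scan. None of these names come from a real library.

```python
import math
import statistics
import time

def benchmark(search_fn, queries, ground_truth, k):
    """Run each query through search_fn, recording latency and recall@k.

    Assumes every ground-truth list is non-empty.
    """
    latencies_ms = []
    recalls = []
    for query, truth in zip(queries, ground_truth):
        start = time.perf_counter()
        result_ids = search_fn(query, k)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        expected = set(truth[:k])
        recalls.append(len(expected & set(result_ids)) / len(expected))

    latencies_ms.sort()
    # Nearest-rank percentile: the smallest value with at least 99% at or below it.
    p99_index = max(0, math.ceil(0.99 * len(latencies_ms)) - 1)
    return {
        "mean_recall": statistics.mean(recalls),
        "p50_ms": statistics.median(latencies_ms),
        "p99_ms": latencies_ms[p99_index],
    }

def fake_search(query, k):
    """Stand-in for a real client call; ignores the query and returns ids 0..k-1."""
    return list(range(k))

# Hypothetical queries and their exact top-3 neighbors.
queries = [[0.1, 0.2], [0.3, 0.4]]
ground_truth = [[0, 1, 2], [0, 1, 9]]
report = benchmark(fake_search, queries, ground_truth, k=3)
print(report)
```

Swap `fake_search` for a function that sends your real query, filters included, to each candidate system. Two queries are far too few for a meaningful p99, so run this over enough of your actual query mix for the tail to show up.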