Are Pinecone, Weaviate, and Qdrant dying?

No, but the "easy RAG for chat-with-docs" segment that drove their early growth is being absorbed by file search inside OpenAI and contextual retrieval / Files API inside Anthropic. The standalone vector DBs are surviving on the use cases the providers do not cover well: very large corpora, complex filtered retrieval, multi-model deployments, on-prem or sovereign hosting, governance requirements. The category narrows; it does not disappear.

Why has nobody absorbed re-ranking yet?

They have, partially. Provider-side reranking is starting to appear, but dedicated cross-encoder rerankers (Cohere Rerank, BGE rerankers, Voyage rerankers) still beat the native options for production RAG. Reranking rewards small models trained specifically for the task: foundation models are good at many things, but a specialized reranker still wins on this one job.
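
The two-stage shape behind this answer (cheap first-pass retrieval, then a specialized scorer over the shortlist) can be sketched as below. This is a minimal illustration, not any vendor's API: `rerank`, `overlap_score`, and the sample documents are hypothetical, and the token-overlap scorer is only a stand-in for a trained cross-encoder.

```python
from typing import Callable, List, Tuple


def overlap_score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query tokens found in the document.

    Stand-in for a cross-encoder, which would score the (query, doc) pair
    jointly with a trained model instead of counting shared tokens.
    """
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    if not q_tokens:
        return 0.0
    return len(q_tokens & d_tokens) / len(q_tokens)


def rerank(
    query: str,
    candidates: List[str],
    scorer: Callable[[str, str], float] = overlap_score,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every first-pass candidate and keep the best top_k."""
    scored = [(doc, scorer(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical shortlist returned by a first-pass retriever.
candidates = [
    "Billing cycles and invoices",
    "How to reset your password",
    "Password policy for admins",
]
top = rerank("reset password", candidates, top_k=2)
```

Swapping `scorer` for a call to a hosted or self-hosted reranker leaves the rest of the pipeline unchanged, which is part of why this layer stays separable from the model provider.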

Is fine-tuning getting absorbed?

Half-yes. Provider-hosted fine-tuning (OpenAI, Anthropic, Google) has gotten much better — it is now first-class for most teams. What the providers do not absorb is fine-tuning your own model on your own infrastructure with your own data and your own deployment story. That part is more durable, especially for regulated industries and for teams using open-weight models.

What Foundation Models Keep Absorbing

As of 2026-05-23

The absorption pattern is easier to see category by category. What follows is the 2026 snapshot of which LLM tooling has been absorbed by providers, which is still independent, and where each category sits on the "durable vs absorbed" axis.

1. Function calling and tool use

Absorbed by: OpenAI (function calling, June 2023 → tools in Responses API), Anthropic (tool use), Google Gemini (function calling). All three providers have first-class APIs for this.

State: Fully absorbed. The early LangChain-style "we provide a tool layer over the LLM" value proposition no longer holds — the provider APIs handle this natively, with better latency and tighter model integration.

What survives: Multi-provider orchestration (run the same agent on Claude or GPT or Gemini), complex multi-step agent state management (LangGraph and similar), specialized tool catalogs (MCP servers).
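
To make the multi-provider point concrete, here is a small sketch of one tool description rendered into two provider request shapes. The neutral `WEATHER_TOOL` format and both converter functions are hypothetical; the output shapes follow the OpenAI Chat Completions and Anthropic Messages tool formats as commonly documented, and should be checked against current provider docs (the OpenAI Responses API uses a different, flatter shape).

```python
from typing import Any, Dict

# Hypothetical provider-neutral tool description used only in this sketch.
WEATHER_TOOL: Dict[str, Any] = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def to_openai_chat_tool(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Entry for the `tools` list of an OpenAI Chat Completions request."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["schema"],  # JSON Schema for the arguments
        },
    }


def to_anthropic_tool(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Entry for the `tools` list of an Anthropic Messages request."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["schema"],  # same JSON Schema, different key
    }


openai_tool = to_openai_chat_tool(WEATHER_TOOL)
anthropic_tool = to_anthropic_tool(WEATHER_TOOL)
```

The translation itself is trivial, which is the point: the durable work is in routing, state, and retries across providers, not in the tool-definition layer.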

2. Code execution / code interpreter

Absorbed by: OpenAI Code Interpreter / Advanced Data Analysis (2023, now native in Assistants/Responses). Anthropic Computer Use and code execution tools. Google Gemini code execution.

State: Fully absorbed for general "model can run code in a sandbox" use cases. The standalone "give the LLM a sandboxed Python" category is gone.

What survives: Specialized execution environments for specific domains (R, Julia, specific data science stacks), execution with custom infrastructure dependencies (your code running against your private services), and very-secure sandboxing for adversarial workloads.

3. RAG and file search

Absorbed by: OpenAI File Search (Assistants/Responses), Anthropic Contextual Retrieval + Files API, Google Gemini grounding with Files / Search. All three providers now have first-class "attach documents, retrieve, answer" features.

State: The "easy RAG, chat with my docs" use case is absorbed for most teams. The native versions are good enough that building your own ingestion + chunking + embeddings + vector DB + retrieval for that specific shape no longer pays.

What survives: Production RAG at scale (millions of documents), hybrid retrieval (vector + BM25 + reranking), corpus-aware chunking, multi-tenant retrieval, retrieval over data the providers do not see (regulated, on-prem), and any RAG architecture that needs to work the same across providers. See how-rag-actually-works for the full picture; the provider-native versions are now table stakes for simple cases but most production RAG still benefits from a deliberate stack.
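
As one illustration of hybrid retrieval, the sketch below merges a vector ranking and a BM25 ranking with reciprocal rank fusion (RRF). RRF is a common fusion choice rather than something this article prescribes; the document IDs are made up, and `k = 60` is the conventional default, assumed here.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by multiple retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two independent retrievers.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

A reranker would typically run over the top of `fused`, giving the vector + BM25 + reranking stack named above.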

4. Vector databases

Absorbed by: The "vector search built into a general database" trend — pgvector for Postgres, MongoDB Atlas Vector Search, Redis vectors, Elasticsearch vector search, SQL Server vector columns. Foundation providers also offer their own vector storage as part of file-search features.

State: Partial absorption. The "we are a vector database" startup category has narrowed substantially. For corpora under roughly a million documents, pgvector or the equivalent in your existing database is usually sufficient.
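
For a sense of how little is involved in the pgvector route, here is a sketch of the SQL held in Python strings. The `docs` table, its columns, and the 3-dimensional vectors are hypothetical, and the `%s` placeholders assume a psycopg-style driver; `<=>` is pgvector's cosine-distance operator.

```python
from typing import Sequence

# Hypothetical schema; requires the pgvector extension in Postgres.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS docs (
    id bigserial PRIMARY KEY,
    body text NOT NULL,
    embedding vector(3)  -- toy dimension; match your embedding model
);
"""

# Smaller cosine distance means more similar, so ascending order is correct.
SEARCH_SQL = """
SELECT id, body
FROM docs
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(values: Sequence[float]) -> str:
    """Format a sequence as pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


query_param = to_vector_literal([0.1, 0.2, 0.3])
```

With a driver such as psycopg, the lookup would be along the lines of `cursor.execute(SEARCH_SQL, (query_param, 5))`. Indexing, filtering, and tenancy are where the dedicated vector databases start to earn their keep.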

What survives: Pinecone, Weaviate, and Qdrant continue to win at very large scale and wherever filtered search, hybrid retrieval, multi-tenant features, and operational maturity (managed service, monitoring, replication) go beyond what general databases offer. The category has not died; the addressable market is smaller than it looked in 2023.

5. Embeddings

Absorbed by: Most major providers ship embedding endpoints (OpenAI text-embedding-3, Gemini gemini-embedding); Anthropic does not ship its own and instead points users to Voyage AI, which MongoDB acquired in 2025. Native models keep getting better.

State: Largely absorbed for general-purpose embeddings. The "we are an embeddings-as-a-service company" pure-play has narrowed.

What survives: Specialized embeddings (legal, medical, code), fine-tuned embeddings for specific corpora, very-multilingual embeddings, ultra-low-latency embeddings, on-prem embeddings. Cohere, Voyage, Jina, and BGE on Hugging Face all still ship; the differentiation has moved up the stack from "embeddings" to "embeddings for X."

6. OCR and document understanding

Absorbed by: Anthropic Claude PDF (page-level visual reading), Gemini multimodal document processing, OpenAI vision-on-files. For clean printed documents and standard layout extraction, the vision LLMs in the major APIs now work end-to-end.

State: Partial absorption. For routine documents (slides, screenshots, simple PDFs, contracts), provider-native vision wins on cost and developer experience.

What survives: Bulk text extraction at scale (Tesseract, Amazon Textract, and Azure AI Document Intelligence remain cheaper per page), specialized form processing (templates, structured extraction), handwriting OCR for archives, very-low-resolution scans, OCR with confidence scores for compliance. Mistral OCR and similar dedicated services still ship because the volume/cost economics differ from general LLM calls.

7. Audio transcription

Absorbed by: OpenAI's GPT-4o/5.x native audio input, Anthropic's audio support via tool integrations, Google Gemini native audio + Live API for streaming. Realtime voice APIs are now first-class.

State: Partial absorption. For interactive voice (a chatbot you talk to), native multimodal models often beat the older "Whisper + LLM" chain. For batch transcription of archives, Whisper is still the cost-effective default.

What survives: Specialized transcription (medical, legal, multilingual edge cases, diarization), large-scale batch transcription where price-per-minute matters, transcription that integrates into broader audio infrastructure. Deepgram, AssemblyAI, Speechmatics continue to ship and grow.

8. Content moderation and safety classification

Absorbed by: OpenAI native moderation, Anthropic safety classifiers built into the model, Google Vertex AI safety filters, Llama Guard for open-weight stacks. AWS Bedrock Guardrails and Azure AI Content Safety provide cloud-native versions.

State: Largely absorbed. The standalone "we are a content moderation company for AI" category has narrowed.

What survives: Domain-specific moderation (medical, legal, financial nuance), policy-customization at scale, region-specific compliance, observability and audit logging across moderation decisions. Perspective API, Hive, and similar still serve large customers; the easy general-content-classification case has been eaten.

9. Eval and observability

Absorbed by: Provider-side traces and evals (OpenAI Logs/Evals, Anthropic Workbench, Google Vertex AI evaluation), each covering only that provider's own models.

State: Not very absorbed. Eval frameworks (Promptfoo, Braintrust, LangSmith, Inspect, Phoenix, Weave) are still healthy because the value proposition is multi-model, multi-environment, in your existing CI, and tied to your specific eval set.

What survives: Almost all of it. Eval and observability are durable categories.
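
The multi-model value proposition is easy to see in miniature. The sketch below runs one eval set against several model callables and reports a pass rate for each; `run_evals`, the substring check, and the stub models are all hypothetical stand-ins for real provider clients and real graders.

```python
from typing import Callable, Dict, List, Tuple

EvalCase = Tuple[str, str]  # (prompt, substring the answer must contain)


def run_evals(
    models: Dict[str, Callable[[str], str]],
    cases: List[EvalCase],
) -> Dict[str, float]:
    """Return the pass rate per model over the same eval set."""
    results: Dict[str, float] = {}
    for name, call_model in models.items():
        passed = sum(
            1
            for prompt, expected in cases
            if expected.lower() in call_model(prompt).lower()
        )
        results[name] = passed / len(cases) if cases else 0.0
    return results


# Stub callables stand in for real provider clients.
def stub_model_a(prompt: str) -> str:
    return "Paris" if "France" in prompt else "I am not sure."


def stub_model_b(prompt: str) -> str:
    return "The answer is Paris." if "France" in prompt else "Berlin"


cases = [("Capital of France?", "paris"), ("Capital of Germany?", "berlin")]
pass_rates = run_evals({"model_a": stub_model_a, "model_b": stub_model_b}, cases)
```

Because the harness only depends on a `str -> str` callable, the same eval set can run in CI against any provider, which is exactly the part a single provider's console cannot offer.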

How to read the map

The pattern across the nine categories above is consistent: the easy 70% of each category gets absorbed; the hard 30% becomes the new market for the third parties. The categories that fully die — like "we provide a tool-calling abstraction over the bare LLM" — are the ones where the easy case is the whole market.

The implication for your own stack: walk through each layer and ask, "is the value here in solving the easy case or the hard case?" If it is the easy case, expect the provider to absorb it in the next 12–18 months. If it is the hard case, the dependency is probably durable.
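
That walk-through can be written down as a tiny audit. The sketch below is only a way to record the judgment call per layer; the `Layer` type, the example layers, and the outlook strings are hypothetical, and deciding whether a layer solves the hard case is still a human call.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Layer:
    name: str
    solves_hard_case: bool  # your own assessment, not something measurable here


def outlook(layer: Layer) -> str:
    """Apply the article's heuristic to a single stack layer."""
    if layer.solves_hard_case:
        return "probably durable"
    return "expect absorption in 12-18 months"


# Hypothetical stack used only for illustration.
stack: List[Layer] = [
    Layer("chat-with-docs RAG", solves_hard_case=False),
    Layer("hybrid retrieval at scale", solves_hard_case=True),
]
report: Dict[str, str] = {layer.name: outlook(layer) for layer in stack}
```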

This is also the framing the picking-ai-tools-that-age-well article goes deeper on. Recognizing absorption is the first half; making good tech bets in the face of it is the second.

Comments 2

u/absorb_a · 1 month ago

the 'your feature is one model release away from being free' point is terrifying if you're building exactly that thing. plan your moat accordingly

u/future_watch · 1 month ago

Good framing — every wrapper feature is basically one model release away from being absorbed. Sobering if you happen to be building in exactly that gap.