An 18-staff legal practice builds a retrieval system over twelve years of client matters using OpenAI’s text-embedding-3-small. The system works. Lawyers ask in plain English and the system returns the relevant precedents, contract clauses, and case notes from the firm’s own archive. Twelve months later, on data-residency grounds, the firm decides to move its primary AI provider to Claude. The legal team assumes the vector database transfers across. It does not. Embeddings are tied to the model that produced them: vectors from one provider cannot be meaningfully compared with vectors from another, so the stored index is useless to the new stack.
Re-embedding the corpus costs around £8,000 in compute and three weeks of engineering time to redesign the retrieval pipeline. The technology works; the economics of switching were never priced. The owner recognises, on the way out, what was never named on the way in: the embedding choice was a point of vendor lock-in that nobody flagged at procurement, and the bill is real.
What is an embedding, and what is a vector database?
An embedding is a list of numbers, typically 768 to 3,072 decimal values, that represents the meaning of a piece of text, an image, or audio. Each number captures one dimension of meaning. Similar concepts land near each other in that mathematical space, so “blue sofa” and “azure couch” cluster together, while “blue sofa” and “tax return” sit nowhere near each other. Closeness is measured with cosine similarity, and that geometry is what powers semantic search.
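To make the geometry concrete, here is a minimal sketch of cosine similarity. The three-dimensional vectors are made-up stand-ins rather than real embeddings; a text-embedding-3-small vector has 1,536 values, but the calculation is the same.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between two vectors: close to 1.0 means the same
    # direction (similar meaning); values near 0 mean unrelated meaning.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Tiny illustrative vectors, not real embeddings
blue_sofa = np.array([0.82, 0.31, 0.05])
azure_couch = np.array([0.79, 0.35, 0.08])
tax_return = np.array([0.02, 0.11, 0.95])

print(cosine_similarity(blue_sofa, azure_couch))  # high: the phrases cluster together
print(cosine_similarity(blue_sofa, tax_return))   # low: the phrases sit far apart
```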
A vector database stores those embeddings and returns the closest matches to a query vector in milliseconds. Traditional relational databases like PostgreSQL or MySQL cannot search millions of high-dimensional vectors efficiently, so vector databases use approximate nearest-neighbour indexing algorithms (HNSW, IVF) that trade a sliver of exactness for search that stays fast at scale. The pair sits underneath every modern AI system that searches your own data.
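As a sketch of what the query side looks like, here is the pgvector pattern from Python. The table name, column names, and connection details are assumptions for illustration, and the query vector is a placeholder where the embedding of the user’s question would go.

```python
import psycopg2

conn = psycopg2.connect("dbname=firm_kb")  # illustrative connection details
cur = conn.cursor()

# In practice this is the 1,536-float embedding of the user's question,
# returned by the embedding API; a zero vector stands in here.
query_embedding = [0.0] * 1536
vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"

# pgvector's <=> operator is cosine distance; an HNSW index on the column
# keeps this fast as the table grows into the millions of rows.
cur.execute(
    """
    SELECT id, body
    FROM documents
    ORDER BY embedding <=> %s::vector
    LIMIT 5
    """,
    (vector_literal,),
)
for doc_id, body in cur.fetchall():
    print(doc_id, body[:80])
```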
Why this matters for your business
It matters because keyword search loses the question and semantic search keeps it. Traditional full-text search needs the exact words, so a customer typing “I cannot upload files larger than 5MB” finds nothing if the article is titled “file size limitations”. Semantic search embeds the question and the article, finds them close in meaning, and returns the right answer. Industry analysis reports support-ticket deflection of 50 to 70 per cent when the approach is implemented with discipline.
It also matters because retrieval-augmented generation runs on this layer. RAG pairs a vector database with a large language model: the system retrieves the most relevant passages from your data, hands them to the model alongside the question, and the model answers grounded in your actual policies rather than guessing. Without embeddings there is no retrieval. Without retrieval, deploying language models against your own knowledge base risks confident-sounding hallucinations, which is why this architecture has become standard.
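A minimal sketch of that loop, assuming an OpenAI embedding model for retrieval and a chat model for generation; `search_vector_db` is a hypothetical helper standing in for whichever vector database you run, and the model names are current examples rather than recommendations.

```python
from openai import OpenAI

client = OpenAI()

def answer(question: str) -> str:
    # 1. Embed the question with the same model used to index the corpus
    q_vec = client.embeddings.create(
        model="text-embedding-3-small",
        input=question,
    ).data[0].embedding

    # 2. Retrieve the closest passages from your own data
    passages = search_vector_db(q_vec, top_k=5)  # hypothetical retrieval helper

    # 3. Hand the passages to the model alongside the question
    prompt = (
        "Answer using only the context below. If the context does not contain "
        "the answer, say you do not know.\n\nContext:\n"
        + "\n---\n".join(passages)
        + f"\n\nQuestion: {question}"
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model; see the Anthropic note further down
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content
```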
Where you will meet embeddings in practice
You will meet them in three places. The first is internal knowledge bases. Embed twelve years of contracts, policies, and case notes, store the vectors, and ask questions in plain English. A new project manager can query “what is our onboarding timeline for mid-market clients” and get a sourced answer in seconds. The pattern slows institutional knowledge loss when senior people leave and speeds up onboarding for the people replacing them.
The second is customer-support semantic search. The system understands that “restart”, “reboot”, “reset”, and “reinitialise” all point to the same workflow, regardless of which one the customer typed. Industry data on properly implemented systems shows a 50 to 70 per cent reduction in support tickets, which for a 25-person team running at £15 to £21 per ticket pencils out to substantial annual savings. The third is product or service catalogue similarity, where “project management for distributed teams” surfaces semantically related offerings that a keyword filter would have missed entirely.
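To put illustrative numbers on the ticket saving above: the monthly volume below is a made-up assumption, and the point is the shape of the calculation rather than the result.

```python
# Only the £15–£21 range and the 50–70 per cent band come from the paragraph
# above; everything else is an assumption to replace with your own numbers.
monthly_tickets = 1_200            # hypothetical volume for a 25-person support team
cost_per_ticket = (15 + 21) / 2    # midpoint of the £15–£21 range
deflection_rate = 0.50             # lower end of the 50–70 per cent band

annual_saving = monthly_tickets * 12 * cost_per_ticket * deflection_rate
print(f"£{annual_saving:,.0f} a year at {deflection_rate:.0%} deflection")
# ≈ £129,600 with these assumptions
```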
Each pattern needs three choices. An embedding model (OpenAI text-embedding-3-small is the SME default for cost and accuracy). A vector database (Pinecone for managed simplicity, Qdrant for cost control, pgvector if you already run PostgreSQL). And a chunking strategy, the rule for how documents get cut into searchable pieces, which directly affects retrieval quality and is its own design problem.
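Chunking is the choice most often underestimated. Here is a minimal sketch of the simplest strategy, fixed-size chunks with an overlap so a sentence cut at a boundary still survives whole in at least one piece; the 800-character size and 100-character overlap are illustrative defaults, not recommendations.

```python
def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    # Fixed-size chunks with overlap: naive, but a reasonable starting point
    # before moving to sentence- or heading-aware splitting.
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

document = "…twelve years of contracts, policies, and case notes…"  # your source text
pieces = chunk(document)
print(len(pieces), "chunks ready to embed")
```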
When to mitigate the lock-in versus accept it
Mitigate when the corpus is large, the data is sensitive, or the budget for re-embedding two years out would be genuinely uncomfortable. Three habits do the work. Store the original source text alongside the vectors. Tag every batch with the model that generated it. Budget for a periodic re-embedding pass when prices change or a meaningfully better model arrives. Cheap at indexing time, painful to retrofit later.
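A sketch of what the first two habits look like at indexing time, using Qdrant’s Python client as one example; the collection name, payload fields, and sample clause are all illustrative.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="matters",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

original_text = "Clause 12.3: either party may terminate on 30 days' written notice."
embedding = [0.0] * 1536  # placeholder; in practice the vector from the embedding API

client.upsert(
    collection_name="matters",
    points=[
        PointStruct(
            id=42,
            vector=embedding,
            payload={
                "source_text": original_text,                 # habit 1: keep the original text
                "embedding_model": "text-embedding-3-small",  # habit 2: tag the generating model
                "embedded_at": "2026-01-15",                  # scopes a future re-embedding pass
            },
        )
    ],
)
```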
Accept the lock-in when the corpus is small (under a few hundred thousand documents), the use case is stable, and the cost of running a parallel system during migration is comparable to the engineering cost of designing for portability. For a 5,000-article support library on OpenAI text-embedding-3-small, a future re-embed is a one-day job and a few pounds in compute. For a 500,000-document legal archive, the same operation is the £8,000 surprise the practice in the opener absorbed, and the lost engineering weeks alongside it. The decision rule is to size the future re-embed in pounds and weeks before you choose the model, not after the bill arrives. That single calculation, done at procurement, would have changed the choice for the legal practice at the top of this post.
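A sketch of that sizing rule as a function; every parameter is yours to supply from the provider’s current rate card and your own corpus, and nothing here is a recommended default.

```python
def size_the_re_embed(
    documents: int,
    avg_tokens_per_doc: int,
    price_per_million_tokens: float,  # from the embedding provider's current rate card
    engineering_days: float,          # pipeline redesign, re-indexing, regression testing
    day_rate: float,                  # blended £ per engineering day
) -> dict[str, float]:
    # Two lines on the future bill: the tokens and the weeks.
    compute = documents * avg_tokens_per_doc / 1_000_000 * price_per_million_tokens
    engineering = engineering_days * day_rate
    return {"compute": compute, "engineering": engineering, "total": compute + engineering}
```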
Related concepts you should know about
Retrieval-augmented generation is the architecture that uses embeddings to ground a language model in your own data. Most “AI over your documents” pitches in 2026 are RAG underneath, and the headline performance depends as much on the chunking and retrieval design as on the model choice. The full picture lives in the what is RAG post.
The vector database market in 2026 has consolidated around five platforms for SME work. Pinecone is fully managed and trades cost for operational simplicity at roughly £200 to £400 a month for 10 million vectors. Qdrant is open source, written in Rust, and the cost-conscious choice at scale, from £25 a month managed or £500 to £1,000 self-hosted at 50 million vectors. Weaviate covers hybrid keyword-plus-semantic search. pgvector adds vector search to your existing PostgreSQL. Milvus, with its managed Zilliz service, handles the billion-vector range.
The Anthropic exception is worth knowing. Claude is a generation model, not an embedding model. If you build RAG on Claude, you pair it with a separate embedding API from OpenAI, Cohere, Voyage, or Google, and that decoupling has become standard production architecture in 2026. The flexibility is genuinely useful. Your generation provider and your embedding provider can be different vendors, which is one of the few cracks in the broader lock-in story and a useful question to put to anyone selling you a single-vendor stack.
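A sketch of that decoupling in practice, assuming the OpenAI and Anthropic Python SDKs; the model names are current examples rather than recommendations, and the retrieved-passages string is a placeholder for what the vector database returns.

```python
from openai import OpenAI
from anthropic import Anthropic

embedder = OpenAI()       # embeddings from one vendor...
generator = Anthropic()   # ...generation from another; the two are independent choices

q_vec = embedder.embeddings.create(
    model="text-embedding-3-small",
    input="What is our notice period for mid-market clients?",
).data[0].embedding

# q_vec goes to the vector database; the passages it returns then go to Claude.
retrieved_passages = "…passages returned by the vector database…"  # placeholder

reply = generator.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=500,
    messages=[{"role": "user", "content": f"Answer using only this context:\n{retrieved_passages}"}],
)
print(reply.content[0].text)
```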



