What is an embedding (and a vector database)? The lock-in nobody named

[Image: A person at a desk in a small office reviewing a search interface on a laptop next to an open bound contract]
TL;DR

An embedding is a numerical representation of meaning, a list of hundreds or thousands of decimal numbers where similar concepts cluster in mathematical space. A vector database stores and searches these embeddings at scale and powers semantic search, retrieval-augmented generation, and similarity matching. The technology is mature and the steady-state cost is small. The structural catch is that embeddings from different vendors live in different mathematical spaces, so switching providers later commonly costs five-figure sums in re-embedding compute and engineering time.

Key takeaways

- An embedding is a list of 768 to 3,072 decimal numbers that captures the meaning of a piece of text. Similar concepts land near each other in that mathematical space, which is what makes semantic search work.
- Vector databases store embeddings and return the closest matches in milliseconds. Pinecone, Qdrant, Weaviate, pgvector, and Milvus cover typical SME workloads at £25 to £400 a month for common scales.
- Embeddings from different vendors are not interchangeable. An index built with OpenAI vectors cannot be queried with Cohere or Voyage embeddings, so a provider switch means re-embedding everything you previously indexed.
- Indexing is cheap, around £1 to £6 to embed 100,000 documents. Querying scales with traffic and totals £18 to £300 a month at typical SME volumes. Vector database hosting adds £25 to £400 on top.
- Anthropic does not offer an embedding model. If you build retrieval-augmented generation with Claude, you pair it with a separate embedding API from OpenAI, Cohere, Voyage, or Google, which is now standard production architecture.

An 18-staff legal practice builds a retrieval system over twelve years of client matters using OpenAI’s text-embedding-3-small. The system works. Lawyers ask in plain English and the system returns the relevant precedents, contract clauses, and case notes from the firm’s own archive. Twelve months later, on data-residency grounds, the firm decides to move its primary AI provider to Claude. The legal team assumes the vector database transfers across. It does not: the OpenAI vectors occupy their own mathematical space, and no other provider’s embedding model can query them.

Re-embedding the corpus with a new provider and redesigning the retrieval pipeline costs around £8,000 and three weeks of engineering time. The technology had worked. The economics of switching had not been priced. The owner recognises, on the way out, what was never named on the way in: the embedding choice was a vendor lock-in vector that nobody flagged at procurement, and the bill was real.

What is an embedding, and what is a vector database?

An embedding is a list of numbers, typically 768 to 3,072 decimal values, that represents the meaning of a piece of text, an image, or audio. Each number captures one dimension of meaning. Similar concepts land near each other in that mathematical space, so “blue sofa” and “azure couch” cluster together, while “blue sofa” and “tax return” sit nowhere near. Closeness is measured with cosine similarity, and that geometry is what powers semantic search.
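The geometry can be shown with toy vectors. The four-dimensional values below are invented for illustration (real models produce 768 to 3,072 dimensions), but the cosine-similarity arithmetic is the real thing:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means the
    directions (and, for embeddings, the meanings) align."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Invented 4-dimensional "embeddings", for illustration only.
blue_sofa   = [0.81, 0.52, 0.10, 0.05]
azure_couch = [0.78, 0.55, 0.12, 0.08]
tax_return  = [0.02, 0.07, 0.91, 0.40]

print(cosine_similarity(blue_sofa, azure_couch))  # high: near-synonyms
print(cosine_similarity(blue_sofa, tax_return))   # low: unrelated
```

The near-synonyms score close to 1.0 and the unrelated pair close to 0, which is all "closeness in mathematical space" means in practice.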

A vector database stores those embeddings and returns the closest matches to a query vector in milliseconds. Traditional relational databases like PostgreSQL or MySQL cannot search millions of high-dimensional vectors efficiently, so vector databases use indexing algorithms (HNSW, IVF) to make the search feasible at scale. The pair sits underneath every modern AI system that searches your own data.
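At its simplest, vector search is a scored scan. The sketch below does that scan by brute force; a real vector database produces the same ranking from an approximate index (HNSW, IVF) so it stays fast at millions of vectors. The document ids and vectors are toy values for illustration:

```python
import math

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def nearest(query_vec, corpus, k=2):
    """Brute-force k-nearest-neighbour search: score every stored
    vector against the query, return the top-k document ids."""
    ranked = sorted(corpus,
                    key=lambda doc_id: cos_sim(query_vec, corpus[doc_id]),
                    reverse=True)
    return ranked[:k]

# Toy 3-dimensional vectors standing in for real embeddings.
corpus = {
    "refund-policy":  [0.90, 0.10, 0.20],
    "upload-limits":  [0.10, 0.90, 0.30],
    "delivery-times": [0.20, 0.20, 0.90],
}
query = [0.15, 0.85, 0.25]  # imagine: "why won't my file upload?"
print(nearest(query, corpus, k=1))
```

The full scan is O(n) per query; the indexing algorithms exist precisely because that linear cost becomes unworkable at scale.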

Why this matters for your business

It matters because keyword search loses the question and semantic search keeps it. Traditional full-text search needs the exact words, so a customer typing “I cannot upload files larger than 5MB” finds nothing if the article is titled “file size limitations”. Semantic search embeds the question and the article, finds them close in meaning, and returns the right answer. Industry analysis reports support-ticket deflection of 50 to 70 per cent when implemented with discipline.

It also matters because retrieval-augmented generation runs on this layer. RAG pairs a vector database with a large language model: the system retrieves the most relevant passages from your data, hands them to the model alongside the question, and the model answers grounded in your actual policies rather than guessing. Without embeddings there is no retrieval. Without retrieval, deploying language models against your own knowledge base risks confident-sounding hallucinations, which is why this architecture has become standard.
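The RAG loop is short enough to sketch end to end. Here embed() and build_prompt() are deliberately crude stubs standing in for real API calls (an embedding endpoint and a generation model such as Claude); only the retrieve-then-prompt shape is the point:

```python
def embed(text):
    # Stub: a real system calls an embedding API here.
    vocab = ["upload", "refund", "delivery"]
    return [float(word in text.lower()) for word in vocab]

def retrieve(question, documents, k=1):
    """Embed the question, score it against each document's
    embedding, and return the k closest passages."""
    q = embed(question)
    ranked = sorted(documents,
                    key=lambda d: sum(a * b for a, b in zip(q, embed(d))),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    # Stub: a real system sends this prompt to the generation model.
    return ("Answer using only these passages:\n"
            + "\n".join(passages) + "\nQuestion: " + question)

documents = [
    "Files larger than 5MB cannot be uploaded on the starter plan.",
    "Refunds are processed within 14 days of a return.",
]
passages = retrieve("Why won't my upload work?", documents)
print(build_prompt("Why won't my upload work?", passages))
```

Everything the model sees comes from the retrieved passages, which is what "grounded in your actual policies" means mechanically.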

Where you will meet embeddings in practice

You will meet them in three places. The first is internal knowledge bases. Embed twelve years of contracts, policies, and case notes, store the vectors, and ask questions in plain English. A new project manager can query “what is our onboarding timeline for mid-market clients” and get a sourced answer in seconds. The pattern slows institutional knowledge loss when senior people leave and speeds up onboarding for the people replacing them.

The second is customer-support semantic search. The system understands that “restart”, “reboot”, “reset”, and “reinitialise” all point to the same workflow, regardless of which one the customer typed. Industry data on properly implemented systems shows a 50 to 70 per cent reduction in support tickets, which for a 25-person team running at £15 to £21 per ticket pencils out to substantial annual savings. The third is product or service catalogue similarity, where “project management for distributed teams” surfaces semantically related offerings that a keyword filter would have missed entirely.
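The ticket arithmetic is worth doing explicitly. The monthly volume below is an assumed figure for illustration (substitute your own); the £15 to £21 per-ticket range is the one quoted above:

```python
tickets_per_month = 500          # assumed volume for a 25-person team
cost_per_ticket = (15 + 21) / 2  # midpoint of the £15-£21 range
for deflection in (0.50, 0.70):
    annual_saving = tickets_per_month * deflection * cost_per_ticket * 12
    print(f"{deflection:.0%} deflection: £{annual_saving:,.0f} a year")
```

At those assumptions the range lands in the tens of thousands of pounds a year, which is the scale the word "substantial" is carrying.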

Each pattern needs three choices. An embedding model (OpenAI text-embedding-3-small is the SME default for cost and accuracy). A vector database (Pinecone for managed simplicity, Qdrant for cost control, pgvector if you already run PostgreSQL). And a chunking strategy, the rule for how documents get cut into searchable pieces, which directly affects retrieval quality and is its own design problem.
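Chunking is easy to state and fiddly to get right. A minimal fixed-size strategy with overlap looks like this; the word counts are illustrative defaults, and production pipelines often split on headings or paragraphs instead:

```python
def chunk(text, max_words=120, overlap=20):
    """Fixed-size chunking with overlap, so a sentence cut at a
    chunk boundary still appears whole in the neighbouring chunk."""
    words = text.split()
    step = max_words - overlap
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), step)]

document = "word " * 300  # stand-in for a 300-word contract clause
print([len(c.split()) for c in chunk(document)])
```

Too-small chunks lose context, too-large chunks dilute the match, and the overlap parameter trades storage for boundary safety, which is why this counts as its own design problem.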

When to mitigate the lock-in versus accept it

Mitigate when the corpus is large, the data is sensitive, or the budget for re-embedding two years out would be genuinely uncomfortable. Three habits do the work. Store the original source text alongside the vectors. Tag every batch with the model that generated it. Budget for a periodic re-embedding pass when prices change or a meaningfully better model arrives. Cheap at indexing time, painful to retrofit later.
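In storage terms the first two habits are one record shape. The field names below are illustrative, not any particular database's schema:

```python
# Keep the source text and the model name next to every vector so a
# future re-embed never needs the original documents re-extracted.
record = {
    "id": "contract-2019-114-chunk-03",          # invented example id
    "vector": [0.012, -0.431, 0.288],            # truncated toy vector
    "source_text": "Either party may terminate on 30 days' notice ...",
    "embedding_model": "text-embedding-3-small",
    "embedded_at": "2026-02-11",
}

def needs_reembedding(rec, current_model):
    """With the model tagged per record, a partial or staged
    migration is just a filter on that tag."""
    return rec["embedding_model"] != current_model

print(needs_reembedding(record, "text-embedding-3-small"))  # False
```

The extra fields cost pennies in storage at indexing time; reconstructing them later means re-running document extraction across the whole archive.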

Accept the lock-in when the corpus is small (under a few hundred thousand documents), the use case is stable, and the cost of running a parallel system during migration is comparable to the engineering cost of designing for portability. For a 5,000-article support library on OpenAI text-embedding-3-small, a future re-embed is a one-day job and a few pounds in compute. For a 500,000-document legal archive, the same operation is the £8,000 surprise the practice in the opener absorbed, and the lost engineering weeks alongside it. The decision rule is to size the future re-embed in pounds and weeks before you choose the model, not after the bill arrives. That single calculation, done at procurement, would have changed the choice for the legal practice at the top of this post.
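That sizing exercise is a few lines of arithmetic. The token count and per-token rate below are assumptions for illustration (check your provider's current pricing); the pattern they show is that embedding compute stays small while engineering time dominates at scale:

```python
avg_tokens_per_doc = 800            # assumed average chunk length
price_per_million_tokens = 0.016    # assumed £ rate; check current pricing

for documents in (5_000, 100_000, 500_000):
    tokens = documents * avg_tokens_per_doc
    compute = tokens / 1_000_000 * price_per_million_tokens
    print(f"{documents:>7,} docs: £{compute:,.2f} embedding compute")
# The pounds above are small; the weeks of retrieval-pipeline redesign
# are the real cost, so estimate those alongside the compute figure.
```

Running this before procurement, with your own corpus statistics, is the single calculation the opening case study skipped.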

Retrieval-augmented generation is the architecture that uses embeddings to ground a language model in your own data. Most “AI over your documents” pitches in 2026 are RAG underneath, and the headline performance depends as much on the chunking and retrieval design as on the model choice. The full picture lives in the “what is RAG” post.

The vector database market in 2026 has consolidated around five platforms for SME work. Pinecone is fully managed and trades cost for operational simplicity at roughly £200 to £400 a month for 10 million vectors. Qdrant is open-source, written in Rust, and the cost-conscious choice at scale, from £25 a month managed or £500 to £1,000 a month self-hosted at 50 million vectors. Weaviate covers hybrid keyword-plus-semantic search. pgvector adds vector search to your existing PostgreSQL. Milvus, with managed Zilliz, handles the billion-vector range.

The Anthropic exception is worth knowing. Claude is a generation model, not an embedding model. If you build RAG on Claude, you pair it with a separate embedding API from OpenAI, Cohere, Voyage, or Google, and that decoupling has become standard production architecture in 2026. The flexibility is genuinely useful. Your generation provider and your embedding provider can be different vendors, which is one of the few cracks in the broader lock-in story and a useful question to put to anyone selling you a single-vendor stack.

Sources

- IBM (2024). What are vector embeddings? The canonical introduction to embeddings as numerical representations of meaning. https://www.ibm.com/think/topics/vector-embedding
- OpenAI (2024). Text-embedding-3-large model card and pricing, the default reference for the OpenAI embedding family used in this post. https://developers.openai.com/api/docs/models/text-embedding-3-large
- Cohere (2025). Embed v4 pricing and dimensionality, the comparison point for OpenAI in commercial embedding pricing. https://cohere.com/pricing
- Voyage AI (2026). Voyage 4 release post, the source for shared embedding spaces and mixture-of-experts architecture in the embedding layer. https://blog.voyageai.com/2026/01/15/voyage-4/
- AWS (2024). What is retrieval-augmented generation? The vendor reference used for the RAG pattern that pairs embeddings with a generation model. https://aws.amazon.com/what-is/retrieval-augmented-generation/
- ITNext (2024). Vendor lock-in in the embedding layer, a migration story. The named case study behind the lock-in framing in this post. https://itnext.io/vendor-lock-in-in-the-embedding-layer-a-migration-story-183ea58e3668
- TensorBlue (2025). Vector database comparison: Pinecone, Weaviate, Qdrant, Milvus. The benchmark and cost source for the five-platform market summary. https://tensorblue.com/blog/vector-database-comparison-pinecone-weaviate-qdrant-milvus-2025
- Pinecone (2024). An opinionated checklist to choose a vector database. The procurement checklist used as the spine of the vendor-question section. https://www.pinecone.io/learn/an-opinionated-checklist-to-choose-a-vector-database/
- Google Developers (2024). Embedding space in machine learning. The reference for cosine similarity and the geometry of embedding vectors. https://developers.google.com/machine-learning/crash-course/embeddings/embedding-space
- Milvus (2024). How to version and manage changes in embedding models. The source for embedding-model lifecycle and versioning practice. https://milvus.io/ai-quick-reference/how-do-you-version-and-manage-changes-in-embedding-models

Frequently asked questions

Why does switching embedding providers cost so much?

Each embedding model is trained differently and produces vectors in its own mathematical space. The numbers from OpenAI and the numbers from Cohere are not comparable, so a query embedded with one model cannot meaningfully search a corpus embedded with the other. Switching means re-embedding every document you previously indexed. For 500,000 documents that is real compute and several weeks of engineering time on retrieval pipelines.

Do I need a separate vector database, or can I use what I already have?

If you already run PostgreSQL and your corpus is up to the low tens of millions of vectors, the pgvector extension turns your current database into a vector store and removes the need for a second system. For larger scales or higher query volumes, a purpose-built vector database performs better. Pinecone trades cost for simplicity, Qdrant trades operational work for cost control, Weaviate covers hybrid keyword-plus-semantic search, and Milvus handles billion-vector deployments.

How do I avoid embedding lock-in without rebuilding everything?

Three habits keep your options open. Store the original source text alongside every vector so you can re-embed if you change models. Tag every batch with which embedding model produced it, so a partial migration is traceable. Budget for a periodic re-embedding pass as new models or prices appear. None of these is expensive at the start. All of them are painful to retrofit two years in.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
