The managing director of a 40-staff accountancy firm sat across the table from me with three vendor quotes spread out in front of her. All three pitched the same problem: triaging incoming client emails. All three used the phrase “NLP-powered AI” in their decks. The cheapest was a thirty-pound Microsoft 365 add-on with built-in classification. The middle option was a seven-hundred-pound Intercom Fin seat that ran a GPT-class model on every message. The third was a six-figure custom build pitched by a London consultancy promising a bespoke NLP engine.
She opened her own inbox while we talked. The bulk of the messages were routine, structured, and could be sorted by sender domain and a handful of keywords. A smaller slice was genuinely ambiguous and would benefit from a model that holds context. She had been quoted three different generations of NLP technology, at three very different prices, for the same task. None of the vendors had a commercial reason to tell her which generation she actually needed.
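Her inbox observation is itself first-generation NLP: sorting by sender domain and a handful of keywords is a few rules, not a model. A minimal sketch of that tier, with invented domains, keywords, and queue names standing in for her real ones:

```python
# Minimal rule-based email triage: first-generation NLP.
# Domains, keywords, and queue names are invented for illustration.

ROUTINE_DOMAINS = {"hmrc.gov.uk", "companieshouse.gov.uk"}
KEYWORD_QUEUES = {
    "vat": "vat-returns",
    "payroll": "payroll",
    "invoice": "bookkeeping",
}

def triage(sender: str, subject: str) -> str:
    """Return a queue name, or 'human-review' when no rule fires."""
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain in ROUTINE_DOMAINS:
        return "statutory-mail"
    subject_lower = subject.lower()
    for keyword, queue in KEYWORD_QUEUES.items():
        if keyword in subject_lower:
            return queue
    # The genuinely ambiguous slice: the only part of the inbox
    # where a context-holding model might earn its price.
    return "human-review"

print(triage("noreply@hmrc.gov.uk", "VAT deadline"))     # statutory-mail
print(triage("client@example.com", "Payroll question"))  # payroll
print(triage("client@example.com", "Something odd"))     # human-review
```

Everything the rules catch costs effectively nothing per message; only the final fall-through bucket is a candidate for the more expensive generations.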
What is natural language processing?
Natural language processing, or NLP, is the set of computational techniques that let software read, classify, or generate human language. It is the umbrella term, not a product. It covers eight core tasks: classification, named entity recognition, sentiment analysis, summarisation, entity linking, information extraction, translation, and generation. Every business application you encounter, from spam filters to chatbots to contract review, is one or more of these tasks wired together.
The field has been shipping in commercial software since the 1990s. What changed between 2017 and 2023 was the arrival of transformer architecture and then large language models. Those advances did not replace older NLP. They added a new generation that sits alongside the previous ones. Four generations now coexist in production: rule-based systems, statistical machine learning, BERT-style transformers, and the LLM family that includes GPT, Claude, and Gemini. Each has its own cost, accuracy, and explainability profile.
The mistake vendors quietly rely on is the assumption that the newest generation is always the right one. It rarely is.
Why does it matter for your business?
It matters because vendor pricing for the same NLP task can vary by two orders of magnitude depending on which generation sits underneath. A rule-based filter classifying spam costs fractions of a penny per email. A statistical sentiment endpoint on AWS Comprehend or Azure Text Analytics runs at roughly fifty to five hundred pounds a month for typical small-business volumes. An LLM running on every message charges per token and climbs fast.
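The gap opens up because the generations price differently against volume. A back-of-envelope sketch, where every figure is an illustrative assumption rather than a vendor quote: the add-on is a flat fee, while per-unit and per-token pricing grow linearly with message count.

```python
# Back-of-envelope monthly cost of classifying 20,000 emails.
# All prices below are illustrative assumptions, not vendor quotes.

emails_per_month = 20_000

# Flat-fee add-on (e.g. a Microsoft 365 plug-in): volume-independent.
rule_based = 30.0

# Statistical API priced per 1,000 records, in pounds (assumed rate).
statistical_per_1k = 2.50
statistical = emails_per_month / 1000 * statistical_per_1k

# LLM priced per 1,000 tokens (assumed rate and token count per email).
tokens_per_email = 800          # prompt + message + response, assumed
llm_price_per_1k_tokens = 0.02  # pounds, assumed GPT-class rate
llm = emails_per_month * tokens_per_email / 1000 * llm_price_per_1k_tokens

for name, cost in [("rule-based", rule_based),
                   ("statistical", statistical),
                   ("LLM per-message", llm)]:
    print(f"{name:16s} £{cost:,.2f}/month")
```

The flat fee stays at thirty pounds whether the firm receives five thousand emails or fifty thousand; the per-token line quadruples when volume quadruples, which is how per-message LLM pricing outruns the alternatives at scale.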
The cost is not the only axis. Rule-based and statistical systems are fast and explainable, which matters when an auditor or the ICO asks how a decision was made. BERT-style transformers are stronger on messy or domain-specific text but need training data. LLMs handle novel and open-ended language well but are opaque, hallucinate at non-trivial rates, and require human review on anything consequential. A vendor who says “our NLP” without naming the generation is hiding the trade-off that determines what you actually pay and what you can actually defend.
The owner’s question is not “should we use AI?” It is “which generation of NLP fits this specific task, given my accuracy needs, my data privacy constraints, and my budget?”
Where will you actually meet it?
You meet NLP every day, usually without knowing it. The spam filter on your email runs rule-based and statistical NLP. The transcription in Teams or Zoom is neural NLP turning speech into text. Your accounting software’s invoice OCR runs NLP on the extracted text to pull line items into structured fields. None of these are sold to you as “AI”. They are sold as features.
You also meet NLP in vendor sales conversations, dressed up as sophistication. The categories you will see most often in 2026 are customer support automation (Intercom Fin, Zendesk, Salesforce Einstein), contract review (LawGeex, Kira Systems), invoice and document processing (Rossum and similar), social listening, lead qualification, and meeting transcription. Each of these is a wrapper around one or more of the eight core NLP tasks, sitting on one of the four generations.
The most useful place to meet the term is at the procurement stage, where naming the underlying generation puts you back in control of the conversation. Ask which generation, what the per-transaction cost is at your volume, and what the vendor does when the underlying model gets deprecated. The answer pattern tells you who built the wrapper themselves and who is reselling someone else’s API with a margin on top.
When to ask, when to ignore
Ask hard questions when the NLP touches money, regulated decisions, customer-facing communication, or anything an auditor wants to inspect. Article 22 of UK GDPR restricts solely automated decisions with legal or similarly significant effects on individuals; the ICO expects such decisions to be transparent, explainable, and subject to human review. The FCA expects regulated firms to document algorithmic accountability. If your NLP feeds any of these, the questions of which generation, what training data, and what audit trail stop being optional.
Ignore the term when the work is low-stakes and easily checked. A staffer using transcription in Teams to capture meeting notes does not need to interrogate the model. The question worth asking is whether the output is good enough at the price, not which generation produced it. The same goes for spam filtering, autosuggest, and routine document search. The vendor’s branding around NLP is irrelevant if the feature works.
There is one trap worth naming. “NLP” is sometimes used by vendors to imply that a product can solve any text problem because it is built on a language model. It cannot. Smaller fine-tuned models often beat general-purpose LLMs on narrow classification tasks for accuracy, speed, and cost. Modern NLP also needs training data, hundreds to thousands of in-domain examples, to reach genuine accuracy on your terminology. A vendor promising overnight deployment with no data preparation is overselling.
Related concepts
A large language model is the newest and largest generation of NLP. Every LLM is an NLP system. Not every NLP system is an LLM, and treating the two as synonyms is the central confusion this post is built to clear up.
A foundation model is the broader category of pre-trained models that includes LLMs, vision models, and multimodal systems. The transformer architecture underpins most modern foundation models, including the BERT-style classifiers your contract review tool probably uses.
Prompt engineering is the technique for steering LLM-based NLP towards useful outputs. Retrieval-augmented generation is how vendors ground LLM output in your own documents to reduce hallucination. Fine-tuning is how a smaller transformer is taught your specific terminology, often the cheaper and more accurate route for narrow tasks.
Conversational AI is one application surface of NLP rather than a separate technology, the same way email triage and invoice extraction are application surfaces. Structured output is the technique vendors use to make NLP results reliable enough to feed into your accounting or CRM systems without manual review.
The point of carrying this vocabulary into your next vendor meeting is to ask which generation of NLP they are charging you for, and to recognise when the answer makes the price reasonable and when it makes the price absurd. Sounding technical is not the goal. Pricing the work properly is.



