An owner of a services firm spent a Tuesday afternoon trying to retrieve every piece of work the firm had done for automotive manufacturers over the previous three years. The CRM had a “sector” field on some accounts but not others. The shared drive held some projects in folders named by client, some by year, some by neither. A few proposals had the word “automotive” in the file name. A few mentioned it only in the meeting notes that lived in someone’s inbox. By the end of the afternoon she had reconstructed perhaps two-thirds of the work, and she had no way to know what she had missed.
She did not have a technology problem. She had a metadata problem, and nobody had ever named it as one.
What is metadata, in plain English?
Metadata is data about data. In a book it is the title, author, chapter headings, and page numbers that make the text findable. In a business it is the contextual information attached to a document, customer record, or project that lets it be retrieved by category later. Descriptive metadata answers “what is this?”; administrative metadata covers permissions and retention. For a small firm, those two are where the payback sits.
The reason a typical SME ends up with a search problem is not that the tools are deficient. A CRM has custom fields. A shared drive supports folders. A document management tool enforces controlled vocabularies. Every one of these systems could carry consistent metadata. None of them does, because the business never collectively agreed what a “segment” field should contain, or what values would appear in a “project type” tag, or whether the client industry lived in the folder name, the file name, or the CRM field. The infrastructure is present. The agreement is not.
Why does this matter for your business?
Employees in firms without a shared tagging discipline spend an average of 1.8 hours per working day searching for information, roughly a quarter of staff cost going on retrieval friction. IDC research puts the productivity loss from document-related problems at around 21 per cent per information worker. For a ten-person firm that is several thousand pounds a year in direct waste, before counting the slower decisions and the recreated work.
The deeper cost is invisibility. The owner who cannot retrieve three years of automotive work does not just lose the afternoon. She walks into the next sector conversation without the precedent she actually built, gives the prospect a generic answer, and competes on price against a larger firm that can pull its precedent in thirty seconds. Tagging discipline is the difference between a firm whose history is a usable asset and a firm whose history sits locked in the heads of people who used to work there.
Where do tags actually live in a small business?
In practice, tags live in the systems the firm already uses, each playing a different retrieval role. The CRM is where client segment and project status sit, because mainstream platforms allow custom fields restricted to a fixed value list. File names carry year, client, and project type in a format like 2026_ClientName_ProjectType.docx. The shared drive folder hierarchy embeds project type and year in the folder structure, which makes browsing as fast as searching.
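A convention like the file-name format above only pays off if it is checked rather than trusted. A minimal sketch in Python of what that check might look like, assuming the YYYY_ClientName_ProjectType.ext shape described; the pattern and function name are illustrative, not a prescription:

```python
import re

# Pattern for the agreed convention: YYYY_ClientName_ProjectType.ext
# The allowed characters here are an assumption for illustration.
FILENAME_PATTERN = re.compile(
    r"^(?P<year>\d{4})_(?P<client>[A-Za-z0-9]+)_(?P<project_type>[A-Za-z0-9]+)\.\w+$"
)

def parse_filename(name: str):
    """Return the metadata embedded in a file name, or None if the
    name does not follow the convention."""
    match = FILENAME_PATTERN.match(name)
    return match.groupdict() if match else None

# A conforming name yields structured metadata for free.
print(parse_filename("2026_AcmeMotors_Proposal.docx"))
# A non-conforming name is flagged rather than silently accepted.
print(parse_filename("final_version_v3.docx"))
```

Run across a shared drive, a script like this turns "do people follow the convention?" from a guess into a weekly count of conforming and non-conforming files.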
For firms running Microsoft 365 or a dedicated document management tool, controlled vocabularies can be enforced at the point of upload. SharePoint’s managed metadata feature lets an administrator define term sets, then attach them as required columns on document libraries. This is the most rigorous approach and the highest process overhead, which is why it is right for firms already running the tool and wrong for firms that would have to buy it for this reason alone.
Email and meeting notes are the hardest tier. A typical small firm runs Gmail or Outlook, neither of which enforces consistent tagging by default. The practical workaround is to adopt a single convention for client-related threads (client name or project code in the subject line) and use a small number of labels or categories to mark threads as “client-facing”, “proposal”, or “completed”. For meeting notes, storing them in a central knowledge base with a consistent template and tagging the stored notes is more reliable than trying to enforce tagging during the meeting itself.
When does light tagging pay back and when does heavy tagging fail?
Light tagging pays back when the scheme runs to ten to fifteen tags total across four or five categories, each owned by one person, with values agreed once and reviewed quarterly. A worked example for a services firm: five client segments, five project types, five statuses, four confidentiality levels, and five years. Around twenty-four values across five categories. Every active client record carries at minimum a segment, a project type, and a status.
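A scheme this size fits in a single controlled-vocabulary table that anyone can read in a minute. A minimal sketch in Python; the category and value names below are hypothetical placeholders, not a recommended vocabulary:

```python
# Controlled vocabulary for a hypothetical services firm:
# five categories, around 24 values, each category owned by one person.
VOCABULARY = {
    "segment": {"manufacturing", "retail", "finance", "public", "health"},
    "project_type": {"audit", "proposal", "delivery", "retainer", "training"},
    "status": {"prospect", "active", "paused", "completed", "archived"},
    "confidentiality": {"public", "internal", "client", "restricted"},
    "year": {"2022", "2023", "2024", "2025", "2026"},
}

# The minimum tags every active client record must carry.
REQUIRED = {"segment", "project_type", "status"}

def validate(record_tags: dict) -> list:
    """Return a list of problems with a record's tags (empty means valid)."""
    problems = [f"missing required tag: {cat}"
                for cat in REQUIRED - record_tags.keys()]
    for cat, value in record_tags.items():
        if cat not in VOCABULARY:
            problems.append(f"unknown category: {cat}")
        elif value not in VOCABULARY[cat]:
            problems.append(f"'{value}' is not an agreed value for {cat}")
    return problems

# A conforming record passes; a record with an unofficial value
# and missing required tags is listed for correction.
print(validate({"segment": "manufacturing", "project_type": "proposal",
                "status": "active"}))
print(validate({"segment": "automotive"}))
```

The point of writing it down this way is less the code than the artefact: the whole agreed scheme is visible in one place, which is exactly what the firm in the opening anecdote never had.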
Heavy tagging fails for predictable reasons. Tagging every document with the initials of who worked on it generates dozens of values that shift every time staff change. Tagging by specific task name rather than broader category creates an unbounded list nobody can choose from consistently. Retroactively tagging the historical archive runs out of momentum before completion and leaves the firm with a half-tagged corpus that is worse than no tagging at all because users cannot tell which records were reviewed and which were skipped. The principle that holds is: tag forward on the materials that matter, accept that some documents will not be tagged, and let semantic search forgive the gaps.
The quarterly review cadence is what stops the scheme from drifting. Once a quarter the owner of each category audits what new values have crept in, whether existing values are still used, and whether any merging is needed. Fifteen minutes per category, four times a year. Without this discipline, tag lists grow organically until they become unusable. With it, the scheme stays small and survives staff turnover.
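The audit itself is mechanical enough to script if tag values can be exported from the CRM or document store. A minimal sketch, assuming the export arrives as a plain list of values for one category; the export format and the example values are assumptions:

```python
from collections import Counter

# The agreed values for one category (illustrative).
AGREED_SEGMENTS = {"manufacturing", "retail", "finance", "public", "health"}

def audit_category(values_in_use: list, agreed: set) -> dict:
    """Summarise drift in one tag category: values that crept in
    outside the agreed list, and agreed values no longer used."""
    counts = Counter(values_in_use)
    return {
        "crept_in": sorted(set(counts) - agreed),
        "unused": sorted(agreed - set(counts)),
        "usage": dict(counts),
    }

# Example export: two unofficial values have crept in (including a
# casing variant), and one agreed value has fallen out of use.
report = audit_category(
    ["manufacturing", "automotive", "retail", "Retail", "finance", "public"],
    AGREED_SEGMENTS,
)
print(report["crept_in"])  # ['Retail', 'automotive']
print(report["unused"])    # ['health']
```

Fifteen minutes per category becomes five when the reviewer starts from a drift report like this rather than from a raw export.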
How does AI change the calculation?
Semantic search, the kind AI tools now bundle by default, understands meaning rather than just matching keywords. A search for “automotive projects from 2024” can retrieve results from a document tagged “segment: manufacturing” even if the word “automotive” never appears, because the system infers the relationship. A firm where one invoice is tagged “segment: finance” and another is not will often see both retrieved when a user searches for finance-sector work.
It does not eliminate the need for tagging. Semantic search has well-documented failure modes on exact and temporal queries: retrieving every completed project in Q3 2024 requires precise categorisation and date filtering, which semantic similarity does not reliably provide. Research on AI-driven metadata standardisation found that combining large language models with structured metadata templates improved recall from around eighteen per cent baseline to over sixty per cent. The AI is far more useful when given a small structured frame to rank within.
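The division of labour described here, exact structured filters for the precise parts of a query and similarity ranking for the rest, can be sketched without any AI tooling at all. In the sketch below, difflib's string similarity stands in for a real embedding model purely to show the shape of the pipeline; the documents, field names, and function are all hypothetical:

```python
from difflib import SequenceMatcher

# Toy document index. In practice these records would come from the
# CRM or document store, and a real semantic model would replace difflib.
DOCUMENTS = [
    {"title": "Engine supplier audit", "segment": "manufacturing",
     "status": "completed", "year": 2024},
    {"title": "Branch network review", "segment": "retail",
     "status": "completed", "year": 2024},
    {"title": "Plant layout proposal", "segment": "manufacturing",
     "status": "active", "year": 2025},
]

def search(query: str, **filters) -> list:
    """Apply exact structured filters first, then rank the survivors
    by a crude text-similarity score (a stand-in for semantic ranking)."""
    survivors = [d for d in DOCUMENTS
                 if all(d.get(k) == v for k, v in filters.items())]
    return sorted(
        survivors,
        key=lambda d: SequenceMatcher(None, query.lower(),
                                      d["title"].lower()).ratio(),
        reverse=True,
    )

# "Completed in 2024" is handled by exact metadata; the free text
# only decides the ordering of what survives the filters.
for doc in search("supplier audit", status="completed", year=2024):
    print(doc["title"])
```

The design point survives the toy scale: the temporal and status constraints that semantic similarity handles badly are answered by the tags, and the fuzzy part of the query only has to rank a small, already-correct candidate set.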
The practical shift is that a firm can tolerate a lighter scheme than a decade ago and still get good retrieval. Five categories of ten to fifteen tags, applied where the high-value retrieval happens, with semantic search filling in the rest. That is the proportionate fix. No new software, no retroactive tagging marathon, and a half-day planning conversation to define the categories. The owner who spent her Tuesday afternoon reconstructing three years of automotive work would have spent eight minutes if the discipline had been agreed five years earlier.
If you want help defining a proportionate tagging scheme for your firm, book a conversation.



