Tagging and metadata: the SME search problem nobody named

[Image: a small business owner at a desk reviewing a printed sheet of tag categories alongside a CRM contact list on a laptop]
TL;DR

Many SMEs cannot retrieve three years of work for a specific industry segment because nothing was consistently tagged across the CRM, the file system, and the shared drive. The proportionate fix is a small shared tagging discipline, ten to fifteen tags across four or five categories, agreed once, applied where the high-value retrieval happens, and reviewed quarterly. AI semantic search forgives some missing tags but still rewards a small shared vocabulary.

Key takeaways

- Every SME has a metadata problem and almost none has named it as one. The systems are there, the shared vocabulary is not.
- A proportionate scheme runs to ten to fifteen tags total across the business, grouped into four or five categories, each owned by one person, reviewed quarterly.
- Tags live in different places depending on the system: client segment and project status in the CRM, project type and year in folder structure and file names, confidentiality in document management tools.
- Semantic search forgives some missed tags but amplifies the cost of having none. AI tools work better when given a small structured frame to rank by.
- Implementation is one planning conversation to define categories, one owner per category, and fifteen minutes per quarter to audit value creep. No new software required.

An owner of a services firm spent a Tuesday afternoon trying to retrieve every piece of work the firm had done for automotive manufacturers over the previous three years. The CRM had a “sector” field on some accounts but not others. The shared drive held some projects in folders named by client, some by year, some by neither. A few proposals had the word “automotive” in the file name. A few mentioned it only in the meeting notes that lived in someone’s inbox. By the end of the afternoon she had reconstructed perhaps two-thirds of the work, and she had no way to know what she had missed.

She did not have a technology problem. She had a metadata problem and nobody had ever named it as one.

What is metadata, in plain English?

Metadata is data about data. In a book it is the title, author, chapter headings, and page numbers that make the text findable. In a business it is the contextual information attached to a document, customer record, or project that lets it be retrieved by category later. Descriptive metadata answers "what is this"; administrative metadata covers permissions and retention; structural metadata describes how the parts relate, such as folders within a drive. For a small firm the first two are where the payback sits.

The reason a typical SME ends up with a search problem is not that the tools are deficient. A CRM has custom fields. A shared drive supports folders. A document management tool enforces controlled vocabularies. Every one of these systems could carry consistent metadata. None of them does, because the business never collectively agreed what a “segment” field should contain, or what values would appear in a “project type” tag, or whether the client industry lived in the folder name, the file name, or the CRM field. The infrastructure is present. The agreement is not.

Why does this matter for your business?

Studies of knowledge workers consistently put time spent searching for information at around 1.8 hours per working day, roughly a quarter of staff cost going on retrieval friction. IDC research puts the productivity loss from document-related problems at around 21 per cent per information worker. For a ten-person firm that is several thousand pounds a year in direct waste, before counting the slower decisions and the recreated work.

The deeper cost is invisibility. The owner who cannot retrieve three years of automotive work does not just lose the afternoon. She walks into the next sector conversation without the precedent she actually built, gives the prospect a generic answer, and competes on price against a larger firm that can pull its precedent in thirty seconds. Tagging discipline is the difference between a firm whose history is a usable asset and a firm whose history sits locked in the heads of people who used to work there.

Where do tags actually live in a small business?

Tags are distributed across the systems the firm already uses, each system playing a different retrieval role. The CRM is where client segment and project status sit, because mainstream platforms allow custom fields restricted to a fixed value list. File names carry year, client, and project type in a format like 2026_ClientName_ProjectType.docx. The shared drive folder hierarchy embeds project type and year in the folder structure, which makes browsing as fast as searching.
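A naming convention like this can even be checked by script rather than by eye. A minimal sketch in Python, assuming the Year_Client_ProjectType format above; the pattern and file names are illustrative, not a prescribed standard:

```python
import re

# Expected shape: YYYY_ClientName_ProjectType.ext, e.g. 2026_AcmeLtd_Proposal.docx
NAME_PATTERN = re.compile(
    r"^(?P<year>\d{4})_(?P<client>[A-Za-z0-9]+)_(?P<ptype>[A-Za-z0-9]+)\.\w+$"
)

def check_file_name(name):
    """Return the metadata embedded in a file name, or None if it breaks the convention."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None

print(check_file_name("2026_AcmeLtd_Proposal.docx"))  # year, client and type all recovered
print(check_file_name("final_version_v2.docx"))       # None: no metadata to recover
```

Run over a folder listing, a check like this shows immediately which files carry retrievable metadata and which do not.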

For firms running Microsoft 365 or a dedicated document management tool, controlled vocabularies can be enforced at the point of upload. SharePoint’s managed metadata feature lets an administrator define term sets, then attach them as required columns on document libraries. This is the most rigorous approach and the highest process overhead, which is why it is right for firms already running the tool and wrong for firms who would have to buy it for this reason alone.

Email and meeting notes are the hardest tier. A typical small firm runs Gmail or Outlook, neither of which enforces consistent tagging by default. The practical workaround is to adopt a single convention for client-related threads (client name or project code in the subject line) and use a small number of labels or categories to mark threads as “client-facing”, “proposal”, or “completed”. For meeting notes, storing them in a central knowledge base with a consistent template and tagging the stored notes is more reliable than trying to enforce tagging during the meeting itself.

When does light tagging pay back and when does heavy tagging fail?

Light tagging pays back when the scheme runs to ten to fifteen working tags across four or five categories, each owned by one person, with values agreed once and reviewed quarterly. A worked example for a services firm: five client segments, five project types, five statuses, four confidentiality levels, and five years. That is around twenty-four values across five categories, but only the first three, fifteen values, demand active judgement; confidentiality and year are close to mechanical. Every active client record carries at minimum a segment, a project type, and a status.
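A scheme this small fits on one screen as plain data, which is worth writing down even if nothing else is automated: it becomes the single agreed source for what the values are. A sketch in Python, where the specific values are illustrative rather than a recommendation for any particular firm:

```python
# One owner per category; values agreed once, reviewed quarterly.
TAG_SCHEME = {
    "segment":         {"manufacturing", "finance", "retail", "public_sector", "technology"},
    "project_type":    {"audit", "advisory", "implementation", "training", "retainer"},
    "status":          {"prospect", "proposal", "active", "delivered", "closed"},
    "confidentiality": {"public", "internal", "client", "restricted"},
    "year":            {"2022", "2023", "2024", "2025", "2026"},
}

def validate_record(tags):
    """Return a list of problems: unknown categories, or values outside the agreed list."""
    problems = []
    for category, value in tags.items():
        if category not in TAG_SCHEME:
            problems.append(f"unknown category: {category}")
        elif value not in TAG_SCHEME[category]:
            problems.append(f"'{value}' is not an agreed value for {category}")
    return problems

print(validate_record({"segment": "automotive", "status": "active"}))
```

Anything the check flags is either a typo to fix now or a candidate for the category owner to consider at the quarterly review.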

Heavy tagging fails for predictable reasons. Tagging every document with the initials of who worked on it generates dozens of values that shift every time staff change. Tagging by specific task name rather than broader category creates an unbounded list nobody can choose from consistently. Retroactively tagging the historical archive runs out of momentum before completion and leaves the firm with a half-tagged corpus that is worse than no tagging at all because users cannot tell which records were reviewed and which were skipped. The principle that holds is: tag forward on the materials that matter, accept that some documents will not be tagged, and let semantic search forgive the gaps.

The quarterly review cadence is what stops the scheme from drifting. Once a quarter the owner of each category audits what new values have crept in, whether existing values are still used, and whether any merging is needed. Fifteen minutes per category, four times a year. Without this discipline, tag lists grow organically until they become unusable. With it, the scheme stays small and survives staff turnover.
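If the tag values are exportable (most CRMs and drives can dump a CSV), the audit itself can be partly automated. A minimal sketch, assuming you can get a plain list of the values actually in use for one category; the example values are illustrative:

```python
from collections import Counter

def audit_category(agreed, values_in_use):
    """Compare the agreed value list against what records actually carry."""
    counts = Counter(values_in_use)
    return {
        "crept_in": sorted(v for v in counts if v not in agreed),  # merge or formally adopt
        "unused": sorted(v for v in agreed if v not in counts),    # candidates for retirement
    }

segments = {"manufacturing", "finance", "retail", "public_sector", "technology"}
in_use = ["manufacturing", "automotive", "finance", "finance", "Manufacturing "]
print(audit_category(segments, in_use))
```

The trailing-space "Manufacturing " in the example is deliberate: case and whitespace drift is exactly the kind of creep a fifteen-minute audit catches before it compounds.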

How does AI change the calculation?

Semantic search, the kind AI tools now bundle by default, understands meaning rather than just matching keywords. A search for “automotive projects from 2024” can retrieve results from a document tagged “segment: manufacturing” even if the word “automotive” never appears, because the system infers the relationship. A firm where one invoice is tagged “segment: finance” and another is not will often see both retrieved when a user searches for finance-sector work.

It does not eliminate the need for tagging. Semantic search has well-documented failure modes on exact and temporal queries: retrieving every completed project in Q3 2024 requires precise categorisation and date filtering, which semantic similarity does not reliably provide. Research on AI-driven metadata standardisation found that combining large language models with structured metadata templates improved recall from around eighteen per cent baseline to over sixty per cent. The AI is far more useful when given a small structured frame to rank within.
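The "small structured frame" point can be made concrete. The pattern most AI retrieval tools apply internally is filter-then-rank: hard-filter on the structured tags, then let semantic similarity order what survives. A toy sketch, with a word-overlap function standing in for a real embedding model; all names and data here are illustrative:

```python
def retrieve(documents, filters, query, similarity):
    """Hard-filter on structured tags, then rank the survivors by similarity to the query."""
    survivors = [
        doc for doc in documents
        if all(doc.get("tags", {}).get(key) == value for key, value in filters.items())
    ]
    return sorted(survivors, key=lambda d: similarity(query, d["text"]), reverse=True)

# Toy stand-in for semantic similarity: count of shared words.
def word_overlap(a, b):
    return len(set(a.lower().split()) & set(b.lower().split()))

docs = [
    {"text": "Automotive supply chain review", "tags": {"status": "delivered", "year": "2024"}},
    {"text": "Retail pricing workshop", "tags": {"status": "delivered", "year": "2023"}},
]
print(retrieve(docs, {"year": "2024"}, "automotive projects", word_overlap))
```

The tag filter handles the exact and temporal part of the query, which is precisely where semantic similarity on its own is unreliable; the similarity score only has to rank what the tags let through.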

The practical shift is that a firm can tolerate a lighter scheme than it could a decade ago and still get good retrieval. Ten to fifteen tags across four or five categories, applied where the high-value retrieval happens, with semantic search filling in the rest. That is the proportionate fix: no new software, no retroactive tagging marathon, and a half-day planning conversation to define the categories. The owner who spent her Tuesday afternoon reconstructing three years of automotive work would have spent eight minutes if the discipline had been agreed five years earlier.

If you want help defining a proportionate tagging scheme for your firm, book a conversation.

Sources

- BizTech Magazine (2026). What SMBs Should Know About Metadata Management. Explainer covering the three main metadata types (descriptive, administrative, structural) and where each pays back in small business retrieval. https://biztechmagazine.com/article/2026/01/what-smbs-should-know-about-metadata-management
- Glean (2024). How to Implement Effective Tagging Strategies for Enterprise Data. Research finding that small strict tagging schemas outperform large expressive ones because enforcement becomes impossible at scale. https://www.glean.com/perspectives/how-to-implement-effective-tagging-strategies-for-enterprise-data
- Wingenious AI (2024). Automate Document Tagging for SMEs. Industry analysis showing that between seventy and eighty per cent of organisational information in SMEs remains unstructured, scattered across email, scans, and bulk imports. https://www.wingenious.ai/blog-posts/automate-document-tagging-smes
- Salesforce Help (2024). Records Overview and Custom Fields. Technical documentation on configuring custom fields with restricted value lists in the CRM, the highest-yield location for SME tagging discipline. https://help.salesforce.com/s/articleView?id=xcloud.basics_topics_records_overview.htm
- Microsoft Learn (2024). Managed Metadata in SharePoint. Technical guide to term sets and controlled vocabulary columns in document libraries, the most rigorous tagging approach for businesses already on Microsoft 365. https://learn.microsoft.com/en-us/sharepoint/managed-metadata
- Harvard Medical School Data Management (2024). File Naming Conventions. Practical guidance on embedding metadata in file names through consistent format (date, identifier, category, document type) for searchable file listings. https://datamanagement.hms.harvard.edu/plan-design/file-naming-conventions
- Google Workspace Support (2024). Best Practices for Shared Drives. Documentation recommending folder hierarchy as structural metadata, with status indicators in folder names like [Archive] or [In Progress]. https://support.google.com/a/users/answer/13015138
- Milvus AI Reference (2024). Common Failure Modes in Semantic Search Systems. Technical analysis of where semantic search breaks down, particularly on exact temporal queries and domain-specific terminology that does not match training data. https://milvus.io/ai-quick-reference/what-are-common-failure-modes-in-semantic-search-systems
- GigaScience (2024). AI-Driven Metadata Standardisation. Peer-reviewed research finding that combining large language models with structured metadata templates improved dataset recall from 17.65 per cent baseline to 62.87 per cent. https://academic.oup.com/gigascience/article/doi/10.1093/gigascience/giag019/8504119
- Virtual Cabinet (2024). File Naming Conventions for Businesses. Practitioner guide to consistent naming formats that put metadata in front of the user without requiring them to open files. https://blog.virtualcabinet.com/file-naming-conventions-for-businesses

Frequently asked questions

How many tags does a small business actually need?

Ten to fifteen working tags across four or five categories, and fewer than twenty-five values in total once near-mechanical categories like year and confidentiality are counted. A typical services firm runs five values for client segment, five for project type, five for status, four for confidentiality, and five for year. Research on enterprise tagging strategies finds that small strict schemas beat large expressive ones: users cannot consistently choose between dozens of options, and enforcement becomes impossible at scale.

Do we still need consistent tagging if our AI tools do semantic search?

Yes, but less of it. Semantic search forgives some missed tags because the system understands meaning rather than just keywords. It still fails on exact and temporal queries, like retrieving every finance-sector project from Q3 2024. A small shared vocabulary on the categories you actually search by gives the AI a frame to rank within, which improves accuracy and cuts irrelevant results.

Should we go back and retroactively tag everything we already have?

No. Tag forward on the materials that matter (active client records, current proposals, delivery documents) and let the historical archive sit untagged unless a specific search need surfaces. Retroactive tagging is the project that kills proportionate metadata initiatives because it generates weeks of low-value administrative work and runs out of momentum before completion.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
