How to keep a data dictionary accurate and current

Three years ago, a consulting practice with 14 staff switched its CRM. The migration was clean enough. Client records came across, the old system was decommissioned, and the team moved on. What did not transfer was the meaning of the fields. Within six months, “engagement status” meant something different in the CRM than it did in the practice management tool, and something different again in the BI dashboard the director used to run quarterly reviews. By the time someone noticed, the reporting had been wrong for two quarters.

They had a data dictionary from the migration project. Nobody had updated it.

What is a data dictionary?

A data dictionary is a catalogue of your key data fields: their names, definitions, formats, allowed values, owning system, and who is responsible for them. For a services firm with 5 to 50 staff, the critical set typically covers client identifiers, engagement status, fees, utilisation, work in progress, and personal data fields that need documenting for GDPR compliance.

A complete entry does not need to be elaborate. The minimum useful record contains the human-readable field name, the technical name as it appears in the system, a plain-English definition, the data type and format (for example, date as YYYY-MM-DD), valid values or business rules (such as “Engagement_Status can only be Active, On Hold, or Closed”), the system where the field lives, the named owner, and when it was last reviewed. One row in a Google Sheet or a Notion page per field is enough to start with.
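If the dictionary ever moves beyond a spreadsheet, the minimum record described above could be sketched as a small Python structure. This is an illustration only: the class name, attribute names, and sample values (including the review date and owner label) are assumptions chosen for the example, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DictionaryEntry:
    """One row of the data dictionary (attribute names are illustrative)."""
    field_name: str          # human-readable name
    technical_name: str      # name as it appears in the system
    definition: str          # plain-English definition
    data_type: str           # type and format, e.g. "date (YYYY-MM-DD)"
    valid_values: list[str]  # allowed values; empty list means free-form
    system: str              # system where the field lives
    owner: str               # named owner
    last_reviewed: date      # date of the most recent review

    def missing_parts(self) -> list[str]:
        """Return the names of required text attributes that are blank."""
        required = ["field_name", "technical_name", "definition",
                    "data_type", "system", "owner"]
        return [name for name in required if not getattr(self, name).strip()]

    def accepts(self, value: str) -> bool:
        """True if the value is allowed; free-form fields accept anything."""
        return not self.valid_values or value in self.valid_values


# Sample entry; every value here is made up for illustration.
engagement_status = DictionaryEntry(
    field_name="Engagement status",
    technical_name="Engagement_Status",
    definition="Current state of a client engagement.",
    data_type="text",
    valid_values=["Active", "On Hold", "Closed"],
    system="CRM",
    owner="Clients domain owner",
    last_reviewed=date(2024, 1, 15),
)
```

The same eight columns work equally well as headers in a Google Sheet; the structure matters more than the tool.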

For a services firm at this scale, 20 to 50 entries covers the fields that actually drive business decisions. Trying to document every field in every system is the wrong instinct. The right instinct is to work back from the business questions you cannot currently answer reliably, then document the fields those answers depend on.

Why does it matter for your business?

A data dictionary without an owner goes stale within months, and the cost is predictable. Management reports require manual reconciliation before every board meeting, GDPR records become hard to defend, and AI tools run on input data nobody has formally defined. The fields your business uses to make decisions are worth protecting with a single source of meaning.

The ICO’s accountability framework expects organisations to understand what personal data they hold, where it is stored, and how it is used. A data dictionary that maps personal data fields to their categories, purposes, and lawful bases gives you a structured index for your Article 30 records of processing activities. That means less preparation time and fewer gaps when the ICO asks questions.

The Cabinet Office’s Digital, Data and Technology Playbook makes the same point for reporting and analytics. Agreed definitions should be embedded into the systems and products people use daily, not left as standalone documentation that nobody reads. For an owner-managed business, the practical version of that is simple. When “revenue” means the same thing in Xero, your CRM, and your BI tool, your management reports stop requiring manual reconciliation before every quarterly review.

Where will you actually meet it in your business?

Three contexts make a data dictionary earn its keep in a services firm. Dashboards and management reporting come first. Every KPI in Power BI or Looker Studio should link to a dictionary entry, so anyone querying a number can see exactly how it is calculated and which fields feed it. GDPR records of processing come second. AI tools come third, and this is where the stakes are rising fastest.

For GDPR, the ICO expects a record of processing activities that documents what personal data you hold, why you hold it, who can access it, and how long you keep it. A dictionary that already maps each personal data field to its category, purpose, and lawful basis means you are not rebuilding this information from scratch each time you review your compliance position. The entries already exist; you are referencing them rather than recreating them.
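As a rough sketch of how dictionary entries could be referenced rather than recreated, the snippet below groups personal data fields by purpose and lists any field still missing its category, purpose, or lawful basis. The field names, systems, and sample values are hypothetical, and the output is an index to support your records of processing, not the record itself.

```python
from collections import defaultdict

# Attributes every personal data field is expected to carry.
REQUIRED = ("category", "purpose", "lawful_basis")

# Hypothetical dictionary rows, e.g. exported from a spreadsheet.
entries = [
    {"field": "Client_Email", "system": "CRM", "personal": True,
     "category": "contact details", "purpose": "client communication",
     "lawful_basis": "contract"},
    {"field": "Staff_Salary", "system": "Payroll", "personal": True,
     "category": "financial", "purpose": "payroll",
     "lawful_basis": "legal obligation"},
    {"field": "Engagement_Status", "system": "CRM", "personal": False},
]


def personal_data_index(rows):
    """Group personal data fields by purpose."""
    index = defaultdict(list)
    for row in rows:
        if row.get("personal"):
            purpose = row.get("purpose") or "UNDOCUMENTED"
            index[purpose].append({
                "field": row["field"],
                "system": row.get("system", ""),
                "category": row.get("category", ""),
                "lawful_basis": row.get("lawful_basis", ""),
            })
    return dict(index)


def gaps(rows):
    """Personal data fields missing category, purpose, or lawful basis."""
    return [row["field"] for row in rows
            if row.get("personal") and not all(row.get(k) for k in REQUIRED)]
```

Running `gaps(entries)` before a compliance review gives a short list of fields to fix first.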

For AI tools, the NCSC’s guidance on managing AI security risks advises organisations to understand data lineage and quality for AI training and inference, and to maintain documentation of data flows. If your team uses a language model to draft proposals from CRM data, or runs AI-generated summaries of client notes, a maintained dictionary is where you document which fields feed those processes. The ICO’s AI and data protection risk toolkit makes the same point for any AI application that touches personal data.
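One way to document which fields feed each AI process is a simple mapping checked against the dictionary. The sketch below assumes hypothetical process names and field names; it flags fields an AI process reads that have no dictionary entry, and shows which of the fields it reads are personal data.

```python
# Hypothetical field names already defined in the dictionary.
dictionary_fields = {"Client_Name", "Engagement_Status", "Fee_Amount"}

# Hypothetical subset of those fields that hold personal data.
personal_fields = {"Client_Name"}

# Hypothetical AI processes and the fields each one reads.
ai_data_flows = {
    "proposal_drafting_prompt": ["Client_Name", "Engagement_Status"],
    "client_notes_summary": ["Client_Name", "Meeting_Notes"],
}


def audit_ai_flows(flows, known, personal):
    """For each process, list undefined fields and personal data fields."""
    report = {}
    for process, fields in flows.items():
        used = set(fields)
        report[process] = {
            "undefined": sorted(used - known),
            "personal": sorted(used & personal),
        }
    return report


report = audit_ai_flows(ai_data_flows, dictionary_fields, personal_fields)
```

In this made-up data, `Meeting_Notes` is read by the summary process but has no dictionary entry, so it appears under `undefined` for that process.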

When should you update it, and when can you leave it alone?

Reviewing your 20 to 50 key fields once a quarter is the right cadence for a services firm running one system per domain. Set aside one hour per quarter with whoever owns each data domain. The review asks three questions each time: are all definitions still accurate, are there new fields to add, and are there obsolete entries to remove? That is four hours a year to keep your reporting reconciliation-free.
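A small check can make the quarterly cadence visible by listing entries whose last review is older than about one quarter. This is a minimal sketch: the 92-day interval, the field names, the owners, and the dates are all assumptions for illustration.

```python
from datetime import date, timedelta

# Assumption: roughly one quarter between reviews.
REVIEW_INTERVAL = timedelta(days=92)


def overdue_for_review(entries, today):
    """Return entries whose last review is older than the review interval."""
    return [e for e in entries if today - e["last_reviewed"] > REVIEW_INTERVAL]


# Hypothetical dictionary rows with illustrative dates.
entries = [
    {"field": "Engagement_Status", "owner": "clients",
     "last_reviewed": date(2024, 1, 10)},
    {"field": "Fee_Amount", "owner": "finance",
     "last_reviewed": date(2024, 5, 2)},
]

for entry in overdue_for_review(entries, today=date(2024, 6, 1)):
    print(f"{entry['field']}: review overdue, owner is {entry['owner']}")
```

The same rule can be expressed as conditional formatting on the "last reviewed" column of a spreadsheet if the dictionary lives there.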

Unscheduled updates should happen whenever a source system changes. A new field added to the CRM, a modified calculation on the management dashboard, and a new AI prompt that draws on client records are each a trigger for an immediate update. The Acceldata guidance on data dictionaries recommends a straightforward change-control loop: the request is logged, a steward approves it, and the dictionary is updated before the change goes live rather than after. That sequence prevents the divergence that happens when systems change and documentation follows weeks later, if at all.
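The loop above (log the request, get steward approval, update the dictionary, then go live) can be sketched in a few lines. This is an illustration of the sequence under stated assumptions, not Acceldata's implementation: the class names, method names, and the rule that only a field's own steward can approve are all hypothetical. A brand-new field would need a steward assigned before it could be approved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeRequest:
    field: str
    new_definition: str
    requested_by: str
    approved_by: Optional[str] = None
    applied: bool = False


class Dictionary:
    def __init__(self, definitions, stewards):
        self.definitions = dict(definitions)  # field -> definition
        self.stewards = dict(stewards)        # field -> steward name
        self.log = []                         # every request, in order

    def request(self, field, new_definition, requested_by):
        """Step 1: log the request."""
        change = ChangeRequest(field, new_definition, requested_by)
        self.log.append(change)
        return change

    def approve(self, change, steward):
        """Step 2: only the field's steward may approve."""
        if self.stewards.get(change.field) != steward:
            raise PermissionError(
                f"{steward} is not the steward for {change.field}")
        change.approved_by = steward

    def apply(self, change):
        """Step 3: update the dictionary; refuse unapproved changes."""
        if change.approved_by is None:
            raise RuntimeError("change has not been approved")
        self.definitions[change.field] = change.new_definition
        change.applied = True

    def ready_to_go_live(self, change):
        """The system change may ship only after the dictionary is updated."""
        return change.applied
```

In a spreadsheet-based setup, the equivalent is a change log tab with columns for requester, approver, and the date the dictionary row was updated.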

Ownership per domain makes the whole cycle work. A working structure for a 5 to 50 person services firm is one named data owner per domain: clients, finance, and people. Three people, three domains, a shared calendar event for the quarterly review. The DAMA UK Data Management Body of Knowledge identifies ownership as the foundational control for data accuracy. Without a named steward, no one notices when a definition becomes outdated, because no one is looking.

What sits alongside a data dictionary in a working data setup?

Three things determine whether a data dictionary stays a live tool over time. The first is a direct connection to real work. The dictionary should be the answer to “how is that number calculated?”, not a document people remember only during compliance reviews. Named ownership per domain is the second. A scheduled review date that people actually keep is the third.

Two concepts sit alongside the dictionary in a complete working setup. An information asset register maps what personal data you hold and where it sits across your systems, aligned with ICO accountability requirements. Documentation of AI data flows records which datasets and fields feed each model or prompt your team uses, an obligation that the EU AI Act is beginning to formalise for organisations serving EU clients or deploying higher-risk AI applications. Neither of these is a separate project from the dictionary. A well-maintained dictionary, extended to include personal data categories and AI data sources, spans all three.

For tooling, a structured Google Sheet or a Notion table is sufficient for a firm with one to three systems. Enterprise catalogue platforms such as Alation and Dataedo automate metadata extraction from databases and SaaS tools, but they are sized for larger organisations and are not the right starting point at this scale. Leadership Services’ Business Leaders Playbook for Data recommends Confluence, Notion, SharePoint, or even a structured spreadsheet as practical first-line options for owner-managed businesses.

The NHS Digital experience confirms that inconsistent field definitions undermine management reporting regardless of organisation size. A working dictionary earns its place by being current, owned, and tied to the numbers people actually use.

The quickest starting point is your most recent reconciliation problem. Trace it back to the field that caused it. That field is your first dictionary entry. Build the 20 that matter around it, name the three domain owners, set a review date, and link each KPI in your dashboard to its entry before the next quarterly review.

