How to keep a data dictionary accurate, useful, and current

TL;DR

A data dictionary catalogues your key data fields, their definitions, and who owns them across your systems. For owner-managed services firms, the practical scope is 20 to 50 critical fields, reviewed quarterly with named domain owners. It only stays useful if it links directly to your management dashboards, your GDPR records of processing, and the AI tools your team uses.

Key takeaways

- A data dictionary catalogues your key data fields, their definitions, formats, allowed values, and who owns them across each system your business uses.
- The most common failure mode is absent ownership: the dictionary is created once, then left to diverge from reality as systems and metrics change.
- For a services firm with 5 to 50 staff, 20 to 50 critical fields with one named owner per data domain is a sufficient starting structure.
- Link every KPI in your management dashboards to a dictionary entry, and document which data fields feed any AI tools your team uses.
- A one-hour quarterly review with your data domain owners is enough to catch definition drift before it reaches your management reporting.

Three years ago, a consulting practice with 14 staff switched its CRM. The migration was clean enough: client records came across, the old system was decommissioned, and the team moved on. What did not transfer was the meaning of the fields. Within six months, “engagement status” meant something different in the CRM than it did in the practice management tool, and something different again in the BI dashboard the director used to run quarterly reviews. By the time someone noticed, the reporting had been quietly wrong for two quarters.

They had a data dictionary from the migration project. Nobody had updated it.

What is a data dictionary?

A data dictionary is a catalogue of your key data fields: their names, definitions, formats, allowed values, owning system, and who is responsible for them. For a services firm with 5 to 50 staff, the critical set typically covers client identifiers, engagement status, fees, utilisation, work in progress, and personal data fields that need documenting for GDPR compliance.

A complete entry does not need to be elaborate. The minimum useful record contains the human-readable field name, the technical name as it appears in the system, a plain-English definition, the data type and format (for example, date as YYYY-MM-DD), valid values or business rules (such as “Engagement_Status can only be Active, On Hold, or Closed”), the system where the field lives, the named owner, and when it was last reviewed. One row in a Google Sheet or a Notion page per field is enough to start with.
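
As a rough illustration, the minimum record described above could be sketched in Python as follows. The class name, the attribute names, and the example values for definition, system, owner, and review date are assumptions made for this sketch; only the Engagement_Status rule comes from the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DictionaryEntry:
    """One row of the data dictionary (attribute names are illustrative)."""
    field_name: str            # human-readable name
    technical_name: str        # name as it appears in the system
    definition: str            # plain-English definition
    data_type: str             # e.g. "text" or "date (YYYY-MM-DD)"
    allowed_values: list[str]  # empty list means no fixed list of values
    system: str                # system where the field lives
    owner: str                 # named owner
    last_reviewed: date

    def is_valid(self, value: str) -> bool:
        """Return True if value satisfies the allowed-values rule."""
        return not self.allowed_values or value in self.allowed_values


# Hypothetical example entry; everything except the allowed values is made up.
engagement_status = DictionaryEntry(
    field_name="Engagement status",
    technical_name="Engagement_Status",
    definition="Where a client engagement currently stands.",
    data_type="text",
    allowed_values=["Active", "On Hold", "Closed"],
    system="CRM",
    owner="Clients domain owner",
    last_reviewed=date(2025, 1, 15),
)
```

The same columns work equally well as headers in a Google Sheet; the value of writing the rule down is that a value such as "Paused" can be rejected against the entry rather than argued about.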

For a services firm at this scale, 20 to 50 entries covers the fields that actually drive business decisions. Trying to document every field in every system is the wrong instinct. The right instinct is to work back from the business questions you cannot currently answer reliably, then document the fields those answers depend on.

Why does it matter for your business?

A data dictionary without an owner goes stale within months, and stale definitions have a predictable cost: management reports that require manual reconciliation before every board meeting, GDPR records that become hard to defend, and AI tools running on input data nobody has formally defined. The fields your business uses to make decisions are worth protecting with a single source of meaning.

The ICO’s accountability framework expects organisations to understand what personal data they hold, where it is stored, and how it is used. A data dictionary that maps personal data fields to their categories, purposes, and lawful bases gives you a structured index for your Article 30 records of processing activities. That means less preparation time and fewer gaps when the ICO asks questions.

The Cabinet Office’s Digital, Data and Technology Playbook makes the same point for reporting and analytics: agreed definitions should be embedded into the systems and products people use daily, not left as standalone documentation that nobody reads. For an owner-managed business, the practical version of that is simple. When “revenue” means the same thing in Xero, your CRM, and your BI tool, your management reports stop requiring manual reconciliation before every quarterly review.

Where will you actually meet it in your business?

Three contexts make a data dictionary earn its keep in a services firm. Dashboards and management reporting come first: every KPI in Power BI or Looker Studio should link to a dictionary entry, so anyone querying a number can see exactly how it is calculated and which fields feed it. GDPR records of processing come second. AI tools come third, and this is where the stakes are rising fastest.

For GDPR, the ICO expects a record of processing activities that documents what personal data you hold, why you hold it, who can access it, and how long you keep it. A dictionary that already maps each personal data field to its category, purpose, and lawful basis means you are not rebuilding this information from scratch each time you review your compliance position. The entries already exist; you are referencing them rather than recreating them.
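
One way to picture "referencing rather than recreating" is a small sketch that pulls the personal data rows out of the dictionary and flags any that are missing a category, purpose, or lawful basis. The row layout, field names, and values below are hypothetical, and the output is only an index into your records of processing, not the record itself.

```python
# Hypothetical dictionary rows; field names and values are illustrative only.
entries = [
    {"field": "Client_Email", "personal_data": True,
     "category": "Contact details", "purpose": "Client communication",
     "lawful_basis": "Contract"},
    {"field": "Client_Phone", "personal_data": True,
     "category": "Contact details", "purpose": "", "lawful_basis": ""},
    {"field": "Engagement_Status", "personal_data": False,
     "category": "", "purpose": "", "lawful_basis": ""},
]

GDPR_COLUMNS = ("category", "purpose", "lawful_basis")


def processing_index(rows):
    """Personal data fields only, reduced to the GDPR-relevant columns."""
    return [
        {"field": r["field"], **{c: r[c] for c in GDPR_COLUMNS}}
        for r in rows
        if r["personal_data"]
    ]


def incomplete_fields(rows):
    """Names of personal data fields with any GDPR column left blank."""
    return [
        r["field"]
        for r in rows
        if r["personal_data"] and not all(r[c] for c in GDPR_COLUMNS)
    ]


print(processing_index(entries))
print(incomplete_fields(entries))
```

In this made-up data, Client_Phone would be flagged because its purpose and lawful basis are blank, which is the kind of gap a compliance review should surface.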

For AI tools, the NCSC’s guidance on managing AI security risks advises organisations to understand data lineage and quality for AI training and inference, and to maintain documentation of data flows. If your team uses a language model to draft proposals from CRM data, or runs AI-generated summaries of client notes, a maintained dictionary is where you document which fields feed those processes. The ICO’s AI and data protection risk toolkit makes the same point for any AI application that touches personal data.
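
A minimal sketch of documenting which fields feed which AI process, assuming the dictionary is available as a set of technical field names. The process names and field names are invented for illustration; the check simply lists any AI input that has no dictionary entry yet.

```python
# Hypothetical set of fields that already have a dictionary entry.
dictionary_fields = {"Client_Name", "Engagement_Status", "Fee_Total"}

# Hypothetical record of which fields each AI process reads.
ai_data_flows = {
    "Proposal drafting prompt": ["Client_Name", "Engagement_Status", "Fee_Total"],
    "Client notes summary": ["Client_Name", "Meeting_Notes"],
}


def undefined_inputs(flows, defined):
    """Map each AI process to the input fields missing from the dictionary."""
    gaps = {}
    for process, fields in flows.items():
        missing = [f for f in fields if f not in defined]
        if missing:
            gaps[process] = missing
    return gaps


print(undefined_inputs(ai_data_flows, dictionary_fields))
```

Here the notes summary would be reported as drawing on Meeting_Notes, a field nobody has formally defined, which is the situation the guidance warns about.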

When should you update it, and when can you leave it alone?

A quarterly review of your 20 to 50 most critical fields is the right cadence for a services firm running one system per domain. Set aside one hour per quarter with whoever owns each data domain. The review asks three questions each time: are all definitions still accurate, are there new fields to add, and are there obsolete entries to remove? That is four hours a year to keep your reporting reconciliation-free.
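
If the dictionary records a last-reviewed date per field, the first review question can be prepared in advance with a short check like the one below. The 92-day interval is an assumption standing in for "quarterly", and the field names and dates are made up.

```python
from datetime import date, timedelta

# Assumption: roughly one quarter between reviews.
REVIEW_INTERVAL = timedelta(days=92)


def overdue_fields(last_reviewed, today):
    """Return the fields whose last review is older than the interval."""
    return [
        name
        for name, reviewed_on in last_reviewed.items()
        if today - reviewed_on > REVIEW_INTERVAL
    ]


# Hypothetical review dates taken from the dictionary.
last_reviewed = {
    "Engagement_Status": date(2025, 1, 10),
    "Fee_Total": date(2024, 9, 1),
}

print(overdue_fields(last_reviewed, today=date(2025, 3, 31)))
```

A list like this gives the domain owners a starting agenda for the hour instead of a blank page.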

Unscheduled updates should happen whenever a source system changes. A new field added to the CRM, a modified calculation on the management dashboard, or a new AI prompt that draws on client records are each a trigger for an immediate update. The Acceldata guidance on data dictionaries recommends a straightforward change-control loop: the request is logged, a steward approves it, and the dictionary is updated before the change goes live rather than after. That sequence prevents the quiet divergence that happens when systems change and documentation follows weeks later, if at all.
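
The order of that loop (logged, approved, dictionary updated, then live) can be sketched as a small class that refuses to skip a step. This is an illustration of the sequence described above, not Acceldata's implementation; the class, method names, and example request are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeRequest:
    """A logged request to change a field (names are illustrative)."""
    field: str
    description: str
    approved_by: Optional[str] = None
    dictionary_updated: bool = False

    def approve(self, steward: str) -> None:
        """Record which steward approved the request."""
        self.approved_by = steward

    def update_dictionary(self) -> None:
        """Mark the dictionary as updated; only allowed after approval."""
        if self.approved_by is None:
            raise ValueError("A steward must approve the request first")
        self.dictionary_updated = True

    def can_go_live(self) -> bool:
        """The system change may ship only after approval and update."""
        return self.approved_by is not None and self.dictionary_updated


# Hypothetical request: a new allowed value for an existing field.
request = ChangeRequest("Engagement_Status", "Add 'Paused' as an allowed value")
request.approve("Clients domain owner")
request.update_dictionary()
print(request.can_go_live())
```

In a spreadsheet the same control is three columns on a change log: requested, approved by, and dictionary updated on.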

Ownership per domain makes the whole cycle work. A working structure for a 5 to 50 person services firm is one named data owner per domain: clients, finance, and people. Three people, three domains, a shared calendar event for the quarterly review. The DAMA UK Data Management Body of Knowledge identifies ownership as the foundational control for data accuracy: without a named steward, no one notices when a definition becomes outdated, because no one is looking.

What sits alongside a data dictionary in a working data setup?

Three things determine whether a data dictionary stays a live tool over time. A direct connection to real work is the first: the dictionary should be the answer to “how is that number calculated?”, not a document people remember only during compliance reviews. Named ownership per domain is the second. A scheduled review date that people actually keep is the third.

Two concepts sit alongside the dictionary in a complete working setup. An information asset register maps what personal data you hold and where it sits across your systems, aligned with ICO accountability requirements. Documentation of AI data flows records which datasets and fields feed each model or prompt your team uses, an obligation that the EU AI Act is beginning to formalise for organisations serving EU clients or deploying higher-risk AI applications. Neither of these is a separate project from the dictionary. A well-maintained dictionary, extended to include personal data categories and AI data sources, spans all three.

For tooling, a structured Google Sheet or a Notion table is sufficient for a firm with one to three systems. Enterprise catalogue platforms such as Alation and Dataedo automate metadata extraction from databases and SaaS tools, but they are sized for larger organisations and are not the right starting point at this scale. Leadership Services’ Business Leaders Playbook for Data recommends Confluence, Notion, SharePoint, or even a structured spreadsheet as practical first-line options for owner-managed businesses.

NHS Digital’s Data Quality Maturity Index shows the same pattern at scale: inconsistent field definitions undermine management reporting regardless of organisation size. A working dictionary earns its place by being current, owned, and tied to the numbers people actually use.

The quickest starting point is whichever reconciliation problem you have had most recently. Trace it back to the field that caused it. That field is your first dictionary entry. Build the 20 that matter most around it, name the three domain owners, set a review date, and link each KPI in your dashboard to its entry before the next quarterly review.

Sources

- ICO (2023). Accountability Framework. Covers the ICO's expectations around information asset registers, records of processing, and assigned ownership of data. https://ico.org.uk/for-organisations/accountability-framework/
- ICO (2023). Records of processing activities guidance (Article 30 UK GDPR). Sets out what organisations must document under UK GDPR accountability obligations. https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/guide-to-uk-gdpr/accountability-and-governance/documenting-processing-activities/
- NCSC (2024). Managing the security risks of artificial intelligence. Advises organisations to document data lineage and quality for AI training and inference, and to maintain documentation of data flows. https://www.ncsc.gov.uk/collection/managing-ai-security-risks
- Cabinet Office / CDDO (2023). The Digital, Data and Technology Playbook. UK government guidance recommending data standards, ownership assignment, and prioritisation of high-value data. https://www.gov.uk/government/publications/the-digital-data-and-technology-playbook
- Leadership Services (2026). The Business Leaders Playbook for Data. UK-focused guide for business leaders recommending data dictionaries, named data owners, and GDPR documentation practices. https://leadership-services.co.uk/wp-content/uploads/2026/04/The-Business-Leaders-Playbook-for-Data-2.pdf
- DAMA UK (n.d.). Data Management Body of Knowledge overview. Covers data governance, ownership structures, and stewardship as foundational controls for data accuracy. https://www.dama-uk.org/content.aspx?page_id=22&club_id=339913&module_id=410349
- NHS Digital (ongoing). Data Quality Maturity Index. Identifies inconsistent field definitions and coding practices across systems as a primary source of data quality problems. https://digital.nhs.uk/data-and-information/publications/ci-hub/data-quality-maturity-index
- Acceldata (2024). Why a Data Dictionary Is Critical for Data Accuracy and Control. Covers naming conventions, stewardship, automated metadata extraction, and review cycles for accurate data dictionaries. https://www.acceldata.io/blog/why-a-data-dictionary-is-critical-for-data-accuracy-and-control
- European Parliament (2023). EU AI Act: political agreement and key obligations. Requires providers of high-risk AI systems to establish data governance and management practices, including dataset documentation. https://www.europarl.europa.eu/news/en/press-room/20231206IPR15330/artificial-intelligence-act-deal-on-comprehensive-rules-for-trustworthy-ai

Frequently asked questions

How many fields should a data dictionary cover for a small services firm?

Start with the 20 to 50 fields that drive your key business questions and compliance obligations. For a services firm, that typically means client identifiers, engagement status, fees, utilisation rates, work in progress, and the personal data categories you need for GDPR records of processing. Keeping it focused on fields that actually drive decisions makes the dictionary sustainable to maintain.

How often should a data dictionary be updated?

A quarterly one-hour review with your data domain owners is enough for a services firm running one system per domain. Trigger unscheduled updates whenever a source system changes: a new CRM field, a modified dashboard metric, or a new AI prompt that draws on client records. A simple change-request log, where a steward approves updates before they go live, prevents silent drift.

Can a data dictionary satisfy GDPR Article 30 requirements?

A data dictionary that includes personal data categories, lawful bases, and purposes of processing can form the structured index for your Article 30 records of processing activities. The ICO expects organisations to document what personal data they hold, where it is stored, and how it is used. A well-structured dictionary supports that record rather than duplicating it, reducing preparation time and keeping the information in one place.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
