Testing AI inside the SaaS you already use

TL;DR

Mainstream SME SaaS platforms now ship generative AI features by default, which means owners are relying on outputs they did not configure and cannot directly inspect. The fix is a three-tier sampling protocol, sharper contract questions covering training data and audit logs, and an explicit veto list for actions that should never be auto-actioned regardless of what the vendor claims.

Key takeaways

- Embedded AI is now the default in CRM, accounting, HR and ATS platforms, so your AI evaluation surface is much wider than the tools you bought as AI tools.
- Treat vendor-embedded AI as in scope for governance, with the same contract scrutiny you would apply to any third-party processor.
- Run a three-tier sampling protocol: high-stakes outputs get checked every time, medium-stakes outputs are randomly sampled, and low-stakes outputs rely on user feedback signals.
- Build a written veto list of actions that must never be auto-actioned: payment release, contract execution, regulated client communications and hiring decisions.
- Ask vendors six contract questions before renewal, covering training data use, output ownership, audit log access, opt-out mechanics, model version notifications and indemnification scope.

A managing partner at a small accountancy practice noticed something odd last month. Her platform had pre-filled a “smart suggestion” on a client’s quarterly invoice: a recommended adjustment to a recurring fee line. It looked sensible, and she nearly clicked accept. Then she paused and asked the obvious question: where did that suggestion come from? The platform’s release notes from six months earlier had mentioned a new AI feature in passing. Nobody on her team had ever turned it off, because nobody had ever turned it on. When she dug into the help centre, she found three more “smart” features she had not realised were AI-generated, each one quietly nudging her team in directions she had never authorised.

That is the surface a meaningful share of owner-managers have not yet started evaluating. The AI you knowingly bought (ChatGPT, Copilot, a domain-specific tool) went through a careful procurement process. The AI you did not knowingly buy (the suggestions and predictions now baked into your CRM, accounting platform, ATS and HR system) shows up in outputs your team treats as normal software behaviour. The evaluation question has changed shape, and the playbook has not caught up. Worse, the surface keeps growing without your sign-off, because each quarterly product release tends to add another AI-flavoured feature that is switched on by default.

What does embedded AI inside your SaaS stack actually mean?

It means generative or predictive model outputs are being produced inside platforms you bought for other reasons, often with no obvious flag in the user interface. Examples include Salesforce Einstein, HubSpot Breeze, Xero’s analytics, Sage’s auto-categorisation, BambooHR’s writing assistants and Greenhouse’s screening predictions. Typically the vendor ships the AI as a toggle that defaults to on, and the user sees a suggestion that looks like a rule but is actually a model output.

The inventory question matters because you cannot evaluate what you have not catalogued. Walk through each platform in your stack and ask the same three questions of every feature labelled smart, suggested, predicted, auto or assisted. Is this output generated by a model? Where does its training data come from? Can it be turned off without breaking the workflow my team depends on?
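If you keep the inventory in a spreadsheet export or a small script, the three questions map naturally onto one record per feature. The sketch below is a minimal illustration, assuming you fill in each answer by hand after checking with the vendor. The `FeatureRecord` fields, the keyword hints and the example platform and feature names are all hypothetical, not any vendor’s API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Words the inventory treats as a hint that a feature may be model-driven.
AI_LABEL_HINTS = ("smart", "suggested", "predicted", "auto", "assisted")


@dataclass
class FeatureRecord:
    platform: str
    feature: str
    is_model_output: Optional[bool]      # None = not yet confirmed with the vendor
    training_data_source: Optional[str]  # None = vendor has not told us
    can_disable_safely: Optional[bool]   # None = not yet tested


def looks_like_ai(label: str) -> bool:
    """Flag a feature label containing one of the hint words (case-insensitive)."""
    lowered = label.lower()
    return any(hint in lowered for hint in AI_LABEL_HINTS)


def open_questions(record: FeatureRecord) -> List[str]:
    """Return the inventory questions still unanswered for this feature."""
    gaps = []
    if record.is_model_output is None:
        gaps.append("Is this output generated by a model?")
    if record.training_data_source is None:
        gaps.append("Where does the training data come from?")
    if record.can_disable_safely is None:
        gaps.append("Can it be turned off without breaking the workflow?")
    return gaps


if __name__ == "__main__":
    # Hypothetical entries for illustration only.
    inventory = [
        FeatureRecord("Accounting platform", "Smart invoice suggestions", True, None, None),
        FeatureRecord("CRM", "Predicted deal score", None, None, True),
        FeatureRecord("CRM", "Invoice export", False, "n/a", True),
    ]
    for rec in inventory:
        if looks_like_ai(rec.feature):
            print(f"{rec.platform}: {rec.feature} -> {open_questions(rec)}")
```

Anything still returning open questions is the list to take to the vendor before the next renewal.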

Why is vendor-embedded AI now a governance problem?

Because regulators, frameworks and your own clients have started treating it as one. JD Supra’s 2026 analysis of third-party AI risk is direct: vendor due diligence frameworks now include AI-specific questions, ongoing monitoring obligations and supply chain provenance checks. The EU AI Act adds transparency duties for certain AI systems (Article 50) and places monitoring and log-keeping obligations on businesses that deploy high-risk systems (Article 26), so the obligations do not stop with the vendor.

California’s AB 2013 requires developers of generative AI systems offered to Californians to publish documentation about their training data, and those provenance questions travel down the supply chain to UK SMEs serving California customers, which describes a meaningful share of UK-based services firms. The Information Commissioner’s Office reinforced the direction of travel in its 2026 guidance, noting that documented calibration and monitoring processes are required for AI systems handling personal data, regardless of whether you built the AI or your vendor did.

A common assumption is that responsibility sits with the vendor once you have signed the licence agreement. It does not. If the output influenced a decision affecting one of your customers, employees or candidates, you are accountable for the decision quality, the data inputs and the audit trail behind it. The contract you signed gave you the licence to use the feature. It did not transfer the regulatory or fiduciary exposure that comes with acting on its outputs.

Where will you actually meet this in your business?

In four places, predictably:

- Finance and accounting platforms, where AI now suggests categorisations, flags anomalies and drafts commentary on management accounts.
- Customer-facing tools, where CRMs draft email replies, helpdesks auto-summarise tickets and sales platforms predict deal probability.
- HR and hiring stacks, where ATS systems score candidates and engagement tools produce sentiment analysis.
- Operations tools, where scheduling systems optimise routes and inventory platforms predict reorder points.

The pattern in each case is the same. The vendor positions the feature as a productivity gain. The user adopts it as normal software. The decision-quality question (was that output correct in this case?) gets compressed into a one-click accept or reject, and the reject path is usually slower than the accept path. That asymmetry lets errors accumulate silently. The Alan Turing Institute’s 2025 SME benchmarking work recommends establishing an accuracy baseline on 50 representative outputs per feature before treating that feature as operationally trusted.
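To make the 50-output baseline concrete: review each sampled output by hand, mark it correct or not, and compute the accuracy. With only 50 samples the raw percentage is a rough estimate, so the sketch below also reports the lower end of a Wilson score interval, a standard way of expressing uncertainty in a proportion. This is an illustration under stated assumptions, not the Turing Institute’s method; the 90% trust threshold is a hypothetical figure you would set yourself.

```python
import math
from typing import Dict, List


def wilson_lower_bound(correct: int, total: int, z: float = 1.96) -> float:
    """Lower end of the (roughly 95%) Wilson score interval for a proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


def baseline_report(review_results: List[bool], trust_threshold: float = 0.90) -> Dict[str, object]:
    """Summarise a manual review of sampled outputs (True = output was correct).

    trust_threshold is a hypothetical bar; choose your own per feature.
    """
    total = len(review_results)
    correct = sum(review_results)
    lower = wilson_lower_bound(correct, total)  # raises if nothing was reviewed
    return {
        "sampled": total,
        "accuracy": correct / total,
        "accuracy_lower_bound": round(lower, 3),
        # Conservative rule: trust only if even the pessimistic estimate clears the bar.
        "operationally_trusted": lower >= trust_threshold,
    }


if __name__ == "__main__":
    # Example: 47 of 50 reviewed outputs were correct.
    print(baseline_report([True] * 47 + [False] * 3))
```

With 47 of 50 outputs correct, the raw accuracy is 94% but the lower bound is roughly 84%, so under a 90% bar the feature would stay in the checked tier until more evidence accumulates.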

When should you push back versus when should you let it run?

Run a three-tier sampling protocol calibrated to the stakes of each output. Tier one is high-stakes: anything touching money going out, regulated client communications, hiring decisions or contractual commitments. Every tier-one output gets a human check before anyone acts on it. Tier two is medium-stakes: meeting summaries, draft replies and expense categorisations under a set threshold, sampled at a rate that matches your tolerance for error. Tier three is low-stakes, and user feedback signals do the monitoring.
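For teams that route AI outputs through their own automation or a review queue, the three tiers reduce to a small routing rule. A minimal sketch, assuming each output arrives tagged with a category and an optional amount; the category labels, the 500 expense threshold and the rule that larger expenses move up to tier one are hypothetical choices, not anything the platforms named above provide.

```python
import random
from enum import Enum


class Tier(Enum):
    HIGH = 1    # money out, regulated client comms, hiring, contractual commitments
    MEDIUM = 2  # meeting summaries, draft replies, small expense categorisations
    LOW = 3     # everything else, monitored through user feedback


# Hypothetical category labels; map your own platform's output types onto these.
HIGH_STAKES = {"payment", "regulated_communication", "hiring", "contract"}
MEDIUM_STAKES = {"meeting_summary", "draft_reply", "expense_category"}
EXPENSE_THRESHOLD = 500.0  # hypothetical: pick the value that matches your tolerance


def classify(category: str, amount: float = 0.0) -> Tier:
    """Assign an AI output to a tier based on what it touches."""
    if category in HIGH_STAKES:
        return Tier.HIGH
    if category == "expense_category" and amount >= EXPENSE_THRESHOLD:
        return Tier.HIGH  # assumption: expenses at or above the threshold count as high stakes
    if category in MEDIUM_STAKES:
        return Tier.MEDIUM
    return Tier.LOW


def needs_human_check(tier: Tier, medium_sample_rate: float, rng: random.Random) -> bool:
    """Decide whether one output goes to a person before anyone acts on it."""
    if tier is Tier.HIGH:
        return True  # every high-stakes output is checked
    if tier is Tier.MEDIUM:
        return rng.random() < medium_sample_rate  # random sample at the chosen rate
    return False  # low stakes: no pre-check, rely on feedback signals


if __name__ == "__main__":
    rng = random.Random(2026)  # seeded so the sample is reproducible
    outputs = [("payment", 1200.0), ("draft_reply", 0.0), ("expense_category", 40.0), ("ticket_tag", 0.0)]
    for category, amount in outputs:
        tier = classify(category, amount)
        print(category, tier.name, needs_human_check(tier, 0.1, rng))
```

Keeping the sample rate as a parameter rather than a constant makes it easy to raise it for a new feature and lower it once a baseline has been established.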

Layer a written veto list on top of the sampling protocol. The list names actions that an embedded feature must never auto-action, regardless of what the vendor claims. Typical entries:

- Payment release above a defined value.
- Contract execution.
- Regulated client communications, including anything going to professional bodies, HMRC, the FCA or their equivalents.
- Hiring decisions, including automatic rejection.
- Anything where reversal is expensive, slow or impossible.

The veto list is short, written, and known to everyone with admin rights on the platform. It is the cheapest piece of governance you can put in place and the one many owner-managers do not have. Review it every six months as your vendors release new features, because the veto list ages faster than the contract underneath it.
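Where your platforms expose rule or workflow settings, or you route actions through your own automation, the veto list can also be written down as data rather than prose. A minimal sketch, assuming each proposed action arrives with a kind, an amount and a couple of flags; the action names, the flags and the payment limit of 1,000 (in your own currency) are hypothetical placeholders, not settings from any real platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProposedAction:
    kind: str                   # hypothetical labels, e.g. "payment_release"
    amount: float = 0.0         # monetary value, where relevant
    to_regulator: bool = False  # True if addressed to HMRC, the FCA, a professional body, etc.
    reversible: bool = True     # False if undoing it is expensive, slow or impossible


PAYMENT_LIMIT = 1_000.0  # hypothetical "defined value"; set your own
ALWAYS_VETOED = {"contract_execution", "hiring_decision", "candidate_rejection"}


def veto_reason(action: ProposedAction) -> Optional[str]:
    """Return why an action must not be auto-actioned, or None if it may proceed."""
    if action.kind == "payment_release" and action.amount > PAYMENT_LIMIT:
        return f"payment above {PAYMENT_LIMIT:,.0f} needs a human"
    if action.kind in ALWAYS_VETOED:
        return f"{action.kind} is on the veto list"
    if action.to_regulator:
        return "regulated communication needs a human"
    if not action.reversible:
        return "irreversible action needs a human"
    return None


if __name__ == "__main__":
    for action in (
        ProposedAction("payment_release", amount=5_000.0),
        ProposedAction("candidate_rejection"),
        ProposedAction("draft_reply"),
    ):
        print(action.kind, "->", veto_reason(action) or "may proceed")
```

Returning a reason rather than a bare yes or no keeps the audit trail readable: whoever reviews a blocked action can see which rule fired.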

Six contract questions catch a meaningful share of the risk, and they are worth asking at your next renewal whether or not the vendor invites them:

- How is our data used to train your models?
- Who owns the outputs generated from our prompts?
- Can we access full audit logs, with timestamps and model versions?
- What is the opt-out mechanism?
- How are we notified of model changes?
- What is your indemnification scope if an AI output harms us?

The related governance topics worth working through alongside this one are buying AI with a sharper diligence sheet, writing an AI use policy your team will actually read, and deciding when shadow AI inside your team should be tolerated, surfaced or shut down. Embedded vendor AI sits at the intersection of all three.

A practical first step is the stack inventory itself. Walk the seven or eight SaaS tools your firm runs day to day, name the AI features inside each one, and write a two-line statement against each describing what the feature does and whether you have changed any settings since the day it appeared. Half the work of governance is having a list. The other half is reading it twice a year.

If you want a sounding board on how to inventory your stack and write the veto list for your own business, book a conversation.

Sources

- JD Supra (2026). Third-party AI risk: why vendor due diligence is different. Used to anchor the claim that vendor-embedded AI is now in scope for third-party risk frameworks and ongoing monitoring. https://www.jdsupra.com/legalnews/third-party-ai-risk-vendor-due-diligence/
- Zyte (2026). AI data compliance and provenance under the EU AI Act and California AB 2013. Source for the cascading data provenance obligations point in the contract questions section. https://www.zyte.com/blog/ai-data-compliance-2026/
- European Commission (2024). EU AI Act (Regulation (EU) 2024/1689), Article 50 on transparency obligations and Article 26 on obligations of deployers of high-risk systems. Anchors the regulatory backdrop for audit log and disclosure obligations on businesses using embedded AI features. https://digital-strategy.ec.europa.eu/en/library/proposal-ai-act
- Information Commissioner's Office (2026). Guidance on AI and data protection. Cited for the requirement to document calibration and monitoring processes for AI systems processing personal data. https://ico.org.uk/for-organisations/ai-and-data-protection/
- National Cyber Security Centre (2025). AI security guidelines v3.1. Source for the three-phase verification framework that informs the sampling protocol section. https://www.ncsc.gov.uk/collection/ai-security
- Information Commissioner's Office (2026). Enforcement notice EN-2026-045 on e-commerce pricing agents. Used to illustrate the consequences of running automated decisions without logged audit trails. https://ico.org.uk/action-weve-taken/enforcement-notices/2026/en-2026-045/
- UK AI Safety Institute (2026). Annual risk assessment report and incident database, SME sector reports. Source for the prevalence of agentic and embedded AI failure patterns in owner-managed businesses. https://www.gov.uk/government/organisations/ai-safety-institute
- Chartered Institute of Management Accountants (2026). AI output verification guide. Anchors the tiered verification protocol structure adapted in the sampling section. https://www.cimaglobal.com/Research/Reports/AI-output-verification-guide/
- Alan Turing Institute (2025). SME AI evaluation benchmarks. Used to support the recommendation that owners establish baselines through 50 representative outputs before operational use. https://www.turing.ac.uk/sme-ai-benchmarking-2025

Frequently asked questions

How do I find out which features in my SaaS stack are actually using AI?

Start with your vendor's release notes and product changelog for the last 12 months, then check the help centre for any feature described as "smart", "suggested", "predicted" or "AI assisted". Cross-reference that with what your team actually clicks. The gap between what vendors disclose and what users notice is often where the risk sits, particularly for nudges that get accepted without a second thought.

Do I really need contract changes for AI features I did not specifically opt into?

Yes, if the outputs influence client work, financial decisions or hiring. Third-party risk frameworks now treat vendor-embedded AI as in scope for due diligence, and the EU AI Act, together with rules like California's AB 2013, creates data provenance expectations that travel down the supply chain. If you cannot answer basic questions about training data or audit logs today, you will not be able to answer them when a regulator or client asks.

What is the single highest-risk pattern with embedded AI?

Silent automation of actions the user thinks they triggered. A "smart suggestion" that auto-fills an invoice line, a "predicted" cost code on an expense, an automated screening rejection in an ATS. The user clicks accept, the action carries their authority, but the reasoning came from a model they cannot inspect. Build a written veto list for actions that must never be one-click regardless of vendor claim.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30-minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
