Testing AI inside your SaaS stack

A managing partner at a small accountancy practice noticed something odd last month. Her platform had pre-filled a “smart suggestion” on a client’s quarterly invoice: a recommended adjustment to a recurring fee line. It looked sensible. She nearly clicked accept. Then she paused and asked the obvious question: where did that suggestion come from? The platform’s release notes from six months earlier mentioned a new AI feature in passing. Nobody on her team had ever turned it off, because nobody had ever turned it on. When she dug into the help centre, she found three more “smart” features she had not realised were AI-generated, each one nudging her team in directions she had never authorised.

That is the surface a meaningful share of owner-managers have not yet started evaluating. The AI you knowingly bought (ChatGPT, Copilot, a domain-specific tool) sits inside a careful procurement process. The AI you did not knowingly buy (the suggestions and predictions now baked into your CRM, accounting platform, ATS and HR system) sits inside outputs your team treats as normal software behaviour. The evaluation question has changed shape and the playbook has not caught up. Worse, the surface keeps growing without your sign-off, because every quarterly product release tends to add another AI-flavoured feature that defaults on.

What does embedded AI inside your SaaS stack actually mean?

It means generative or predictive model outputs are being produced inside platforms you bought for other reasons, often with no obvious flag in the user interface. Examples include Salesforce Einstein, HubSpot Breeze, Xero’s analytics, Sage’s auto-categorisation, BambooHR’s writing assistants and Greenhouse’s screening predictions. The vendor shipped AI as a toggle that defaulted on, and the user sees a suggestion that looks like a rule but is actually a model output.

You cannot evaluate what you have not catalogued. Walk through each platform in your stack and ask the same three questions for every feature labelled smart, suggested, predicted, auto or assisted. Is this output generated by a model? Where does the training data come from? Can it be turned off without breaking the workflow my team depends on?
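As a rough illustration, the catalogue can start as a very small script. The sketch below assumes you can export or type up a list of feature names per platform. The platform names, feature names and the `FeatureReview` record are hypothetical, and the keyword match is deliberately crude: it only surfaces candidates for a human to check.

```python
from dataclasses import dataclass
from typing import Optional

# Label keywords taken from the paragraph above.
AI_LABEL_KEYWORDS = ("smart", "suggested", "predicted", "auto", "assisted")


def looks_ai_labelled(feature_name: str) -> bool:
    """Return True if the feature name contains one of the label keywords.

    This is a substring match, so "auto" also matches "automatic".
    """
    lowered = feature_name.lower()
    return any(keyword in lowered for keyword in AI_LABEL_KEYWORDS)


@dataclass
class FeatureReview:
    """One catalogue entry holding the three questions (hypothetical layout)."""
    platform: str
    feature: str
    model_generated: Optional[bool] = None       # None means not yet answered
    training_data_source: Optional[str] = None
    can_disable_safely: Optional[bool] = None

    def is_complete(self) -> bool:
        """True once all three questions have an answer."""
        answers = (self.model_generated, self.training_data_source,
                   self.can_disable_safely)
        return all(answer is not None for answer in answers)


def build_catalogue(features: dict[str, list[str]]) -> list[FeatureReview]:
    """Create an unanswered review entry for every AI-labelled feature."""
    return [
        FeatureReview(platform, name)
        for platform, names in features.items()
        for name in names
        if looks_ai_labelled(name)
    ]


# Hypothetical stack used only for illustration.
stack = {
    "Accounting platform": ["Smart suggestions", "Bank feeds", "Auto-categorisation"],
    "CRM": ["Contact export", "Predicted deal score"],
}
catalogue = build_catalogue(stack)
for entry in catalogue:
    print(entry.platform, "|", entry.feature, "| answered:", entry.is_complete())
```

A feature with no telltale label will not be caught by this, which is why the release notes and help centre still need reading by hand.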

Why is vendor-embedded AI now a governance problem?

Because regulators, frameworks and your own clients have started treating it as one. JD Supra’s 2026 analysis on third-party AI risk is direct: vendor due diligence frameworks now include AI-specific questions, ongoing monitoring obligations, and supply chain provenance checks. The EU AI Act’s Articles 50 and 55 push monitoring and logging requirements down through high-risk applications.

California’s AB 2013 creates training data disclosure obligations that cascade to UK SMEs serving California customers, which is a meaningful share of UK-based services firms. The Information Commissioner’s Office reinforced the direction of travel in its 2026 guidance, noting that documented calibration and monitoring processes are required for AI systems handling personal data, regardless of whether you built the AI or your vendor did.

A common assumption is that responsibility sits with the vendor once you have signed the licence agreement. It does not. If the output influenced a decision affecting one of your customers, employees or candidates, you are accountable for the decision quality, the data inputs and the audit trail behind it. The contract you signed gave you the licence to use the feature. It did not transfer the regulatory or fiduciary exposure that comes with acting on its outputs.

Where will you actually meet this in your business?

In four places, predictably. Finance and accounting platforms, where AI now suggests categorisation, flags anomalies and drafts commentary on management accounts. Customer-facing tools: CRMs drafting email replies, helpdesks auto-summarising tickets, sales platforms predicting deal probability. HR and hiring stacks: applicant tracking systems scoring candidates and engagement tools producing sentiment analysis. Operations tools: scheduling systems optimising routes and inventory platforms predicting reorder points.

The pattern in each case is the same. The vendor positions the feature as a productivity gain. The user adopts it as normal software. The decision-quality question (was that output correct in this case?) gets compressed into a one-click accept or reject, and the reject path is usually slower than the accept path. That asymmetry produces silent error accumulation. The Alan Turing Institute’s 2025 SME benchmarking work recommends establishing accuracy baselines on 50 representative outputs per feature before treating it as operationally trusted.
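A baseline of that kind needs little more than a random draw and a tally. The sketch below is one possible way to do it, assuming you can export a feature's past outputs and have a person mark each sampled output as correct or not. The function names and the fixed seed are illustrative choices, not part of any cited method.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")

BASELINE_SAMPLE_SIZE = 50  # the per-feature figure cited in the paragraph above


def draw_baseline_sample(outputs: Sequence[T],
                         size: int = BASELINE_SAMPLE_SIZE,
                         seed: int = 0) -> list[T]:
    """Pick `size` outputs at random, without repeats, for human review.

    A fixed seed makes the draw repeatable so a second reviewer
    can check the same items.
    """
    if len(outputs) < size:
        raise ValueError(f"need at least {size} outputs, got {len(outputs)}")
    return random.Random(seed).sample(list(outputs), size)


def baseline_accuracy(judgements: Sequence[bool]) -> float:
    """Share of reviewed outputs that the human reviewer marked correct."""
    if not judgements:
        raise ValueError("no judgements recorded")
    return sum(judgements) / len(judgements)


# Illustration with made-up data: 200 exported output IDs, 50 reviewed,
# of which the reviewer marked 45 correct.
exported_ids = [f"output-{n}" for n in range(200)]
to_review = draw_baseline_sample(exported_ids)
judgements = [True] * 45 + [False] * 5
print(len(to_review), baseline_accuracy(judgements))
```

Random selection matters here: reviewing only the outputs someone happened to reject would understate accuracy, and reviewing only accepted ones would hide exactly the silent errors the baseline is meant to expose.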

When should you push back versus when should you let it run?

Run a three-tier sampling protocol calibrated to the stakes of the output. Tier one is high-stakes: anything touching money out, regulated client communications, hiring decisions or contractual commitments, where every output gets human-checked before action. Tier two is medium-stakes: meeting summaries, draft replies and expense categorisations under a threshold, sampled at a rate matching your tolerance. Tier three is low-stakes, where user feedback signals do the monitoring.
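Expressed as logic, the protocol is a single decision per output. The following is a minimal sketch, assuming each output has already been assigned a tier. The 10% medium-tier rate is a placeholder rather than a recommendation, and the names are hypothetical.

```python
import random
from enum import Enum


class Tier(Enum):
    HIGH = "high"      # money out, regulated comms, hiring, contracts
    MEDIUM = "medium"  # summaries, draft replies, small expense categorisations
    LOW = "low"        # monitored through user feedback only


# Placeholder sampling rate for tier two; set this to your own tolerance.
MEDIUM_SAMPLE_RATE = 0.10


def needs_human_check(tier: Tier,
                      rng: random.Random,
                      medium_rate: float = MEDIUM_SAMPLE_RATE) -> bool:
    """Decide whether one output must be reviewed by a person before action."""
    if tier is Tier.HIGH:
        return True                          # every high-stakes output is checked
    if tier is Tier.MEDIUM:
        return rng.random() < medium_rate    # random sample at the chosen rate
    return False                             # low-stakes: rely on feedback signals


rng = random.Random(42)
checked = sum(needs_human_check(Tier.MEDIUM, rng) for _ in range(1000))
print(f"{checked} of 1000 medium-stakes outputs sent for review")
```

The tier assignment itself is the judgement call. The code only makes sure the chosen rate is applied consistently rather than whenever someone remembers.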

Layer a written veto list on top of the sampling protocol. The list names actions that must never be auto-actioned by an embedded feature regardless of vendor claim. Payment release above a defined value. Contract execution. Regulated client communications going to professional bodies, HMRC, FCA or equivalent. Hiring decisions including automatic rejection. Anything where reversal is expensive, slow or impossible.

The veto list is short, written, and known to everyone with admin rights on the platform. It is the cheapest piece of governance you can put in place and the one many owner-managers do not have. Review it every six months as your vendors release new features, because the veto list ages faster than the contract underneath it.
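If you want the veto list in a form a script or an administrator's checklist can use, it can be as simple as the sketch below. The action names, the payment limit and the 182-day review interval are all assumptions made for illustration; your own list and thresholds will differ.

```python
from datetime import date

# Hypothetical action names standing in for the veto list described above.
VETOED_ACTIONS = frozenset({
    "contract_execution",
    "regulated_client_communication",
    "hiring_decision",
})

PAYMENT_LIMIT = 1000.0       # placeholder for "a defined value"
REVIEW_INTERVAL_DAYS = 182   # roughly six months


def may_auto_action(action: str, amount: float = 0.0) -> bool:
    """Return False for anything an embedded feature must never do unattended."""
    if action == "payment_release":
        return amount <= PAYMENT_LIMIT
    return action not in VETOED_ACTIONS


def review_overdue(last_reviewed: date, today: date) -> bool:
    """True if the veto list has not been reviewed within the interval."""
    return (today - last_reviewed).days > REVIEW_INTERVAL_DAYS


print(may_auto_action("draft_reply"))
print(may_auto_action("payment_release", amount=5000.0))
print(review_overdue(date(2025, 1, 1), date(2025, 8, 1)))
```

The catch-all item, anything where reversal is expensive, slow or impossible, does not reduce to a lookup. It stays a human judgement made each time the list is reviewed.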

Related concepts and what to ask your vendor next

Six contract questions catch a meaningful share of the risk, and they are worth asking at the next renewal regardless of whether the vendor invites them. How is our data used to train your models? Who owns the outputs from our prompts? Can we access full audit logs with timestamps and model version? What is the opt-out mechanism? How are we notified of model changes? What is your indemnification scope if an AI output harms us?

The related governance topics worth working alongside this one are buying AI with a sharper diligence sheet, building a written AI use policy your team will actually read, and the question of when shadow AI inside your team should be tolerated, surfaced or shut down. Embedded vendor AI sits at the intersection of those three.

A practical first step is the stack inventory itself. Walk the seven or eight SaaS tools your firm runs day to day, name the AI features inside each one, and write a two-line statement against each describing what the feature does and whether you have changed any settings since the day it appeared. Half the work of governance is having a list. The other half is reading it twice a year.

If you want a sounding board on how to inventory your stack and write the veto list for your own business, book a conversation.
