A 30-staff marketing agency I spoke to recently had built an AI customer-support tool that could call lookup_account, check_subscription, and process_refund. The vendor demo had been clean. A customer asks for a refund, the system checks the policy, processes it, and confirms. Six weeks in, an internal audit showed the AI had quietly processed fourteen refunds without human approval because the function permissions were never properly scoped.
The refunds were small, the customers were genuine, the loss was modest. The shock was not the money. It was the realisation that function calling without governance is just unattended automation with a polite tone of voice. The owner now had to redesign the permission model retrospectively, with live customers, while the function-calling pipeline was already in production. Every “agentic AI” pitch on the desk in 2026 has function calling underneath it. The conversation worth having is about which functions, against which data, with what oversight.
What is function calling?
Function calling lets an AI model recognise it needs external data or an action, output a structured request for that action, and incorporate the result into its response. Your application defines the available functions, the model picks which to call, your code runs it and returns the result, and the model then composes a final answer for the user. The same pattern holds across OpenAI, Anthropic, and Google Gemini.
A function definition has a name like lookup_customer, a short description, and a list of parameters the model needs to fill in. The detail that catches owners off guard is that the AI does not execute the function itself. It writes a JSON request and hands it back to your application. Your code is what touches the database, the payment processor, or the email service. That separation is where governance lives.
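That separation is easier to see in code than in prose. The sketch below walks the loop end to end, with a hypothetical lookup_customer function and a hand-written stand-in for the model's reply; real providers differ in field names but share this shape.

```python
import json

# 1. Your application registers a function definition: a name, a short
#    description, and a JSON schema of parameters the model must fill in.
LOOKUP_CUSTOMER = {
    "name": "lookup_customer",
    "description": "Fetch a customer record by email address.",
    "parameters": {
        "type": "object",
        "properties": {"email": {"type": "string"}},
        "required": ["email"],
    },
}

# 2. The model executes nothing. It emits a structured request and hands
#    it back to your application. (Hand-written here for illustration.)
model_output = json.dumps({"name": "lookup_customer",
                           "arguments": {"email": "jo@example.com"}})

# 3. Your code decides whether and how to run it. Only this code ever
#    touches the database, the payment processor, or the email service.
def execute(call_json: str) -> dict:
    call = json.loads(call_json)
    if call["name"] != LOOKUP_CUSTOMER["name"]:
        raise ValueError(f"Unknown function: {call['name']}")
    # Database lookup stubbed out for the sketch.
    return {"email": call["arguments"]["email"], "plan": "pro"}

result = execute(model_output)  # fed back to the model for the final answer
```

Step 3 is where governance lives: because your code sits between the model's request and the real system, it is the natural place to refuse, rewrite, or escalate a call.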
Why does it matter for your business?
Without function calling, an LLM is trapped in its training data. With function calling, a customer-support agent can resolve “where is my order” end-to-end in seconds rather than minutes, with no human in the loop for routine queries. Vendors deploying function calling for support commonly report thirty to forty per cent reductions in ticket volume and seventy to eighty per cent of routine queries handled entirely by the system.
The hidden cost is the token tax. Every function definition you register is sent to the model on every call: name, description, parameters, examples. A single function consumes 550 to 1,400 tokens of overhead, so registering twenty functions means shipping 11,000 to 28,000 tokens per request before the user's question is processed. Worse, accuracy drops as the registry grows: with five to ten tools, selection accuracy sits above ninety per cent; past fifty tools it falls to roughly forty-nine per cent, essentially a coin flip. Semantic tool routing, where a lightweight classifier picks the relevant five to ten tools per query, is the standard mitigation in 2026.
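To make the routing idea concrete, here is a toy version. Production routers score tools by embedding similarity; plain word overlap stands in for that here, and the tool names and descriptions are invented for illustration.

```python
# Candidate tools and one-line descriptions (illustrative names only).
TOOLS = {
    "get_order_status": "look up order dispatch date and tracking link",
    "initiate_return": "start a return or refund for an order",
    "check_subscription": "check a customer subscription plan and renewal date",
    "draft_invoice": "create a draft invoice for a customer",
}

def route(query: str, k: int = 2) -> list[str]:
    # Score each tool by word overlap with the query; a real router would
    # use an embedding model here. Return only the top-k tool names.
    q = set(query.lower().split())
    scored = sorted(TOOLS, key=lambda t: -len(q & set(TOOLS[t].split())))
    return scored[:k]

# Only the top-k definitions travel with the request, cutting the
# per-call token overhead and keeping selection accuracy high.
print(route("where is my order and when was it dispatched"))
```

With twenty registered functions but only two shipped per call, the overhead drops from 11,000-plus tokens to roughly a thousand, and the model chooses among two candidates instead of twenty.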
Where will you actually meet it?
You will meet function calling first in vendor pitches, often without the term being used. Any demo where the AI looks something up in a CRM, checks an order, drafts an invoice, or triggers a workflow is function calling under the bonnet. Salesforce Einstein, Google Cloud Vertex AI Agent Builder, Microsoft Copilot Studio, Zapier Agents, and Make all expose it as the foundation of their agentic features.
You will meet it inside your own operations the moment your team starts building an internal tool that reads from Slack and acts on a stock level, an invoice, or a customer record. The customer-support flow is the shape an SME meets first: a customer asks where their order is, the agent calls get_order_status, receives the dispatch date and tracking link, and answers in plain English. If the customer asks for a refund, the agent calls check_return_policy, then initiate_return, and confirms. The same pattern shows up in billing dispute handling, subscription management, and internal knowledge retrieval.
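The refund flow above can be sketched as a chain of calls. The handlers and the order number are hypothetical, and a real agent chooses the sequence itself rather than following a hard-coded script; the point is only the shape of the chain.

```python
# Stubbed handlers standing in for real API calls.
def check_return_policy(order_id: str) -> dict:
    return {"order_id": order_id, "returnable": True, "window_days": 30}

def initiate_return(order_id: str) -> dict:
    return {"order_id": order_id, "status": "return_started"}

def handle_refund_request(order_id: str) -> str:
    # Step 1: policy check first, action second.
    policy = check_return_policy(order_id)
    if not policy["returnable"]:
        return "Sorry, this order is outside the return window."
    # Step 2: only then trigger the state-changing call.
    result = initiate_return(order_id)
    return f"Your return for order {result['order_id']} is underway."

print(handle_refund_request("A-1042"))
```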
The procurement question to put on the table is not “do you support function calling” but “how do you control which functions the AI can call, against which data, and with what oversight.” If the vendor cannot answer that cleanly, the demo is the demo and the audit is yours to write later.
When to authorise it, when to defer
Function calling earns its keep on high-volume repetitive workflows where you have a structured data source you can safely expose. If half your support tickets are “where is my order” or “what is my invoice total,” and your data lives in a system you control, the case is straightforward. If you handle hundreds or thousands of interactions a day, the per-call overhead amortises. Below that, setup and governance probably outweigh the saving.
Defer function calling, or keep it read-only, in three situations. The first is workflows that need human judgement. An agent can look up a refund policy, but deciding whether to make an exception is still a person’s call. The second is messy or distributed data, where there is no single source of truth. Function calling will amplify the inconsistency rather than resolve it. The third is when your team cannot resource the authorisation layer. Every function call should pass through a check that asks: does the agent have permission for this action, does the user behind the agent have permission, and are we within operational limits like a refund cap or a time window. The layer returns DENY, REQUIRE_APPROVAL, or MASK before the function executes. Without that layer, you are running unattended automation with a chatty interface, which is how the agency in the opening anecdote ended up paying out fourteen unapproved refunds.
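A minimal sketch of that authorisation layer follows. The specific rules, a £50 refund cap and a business-hours window, and the function names are assumptions for illustration, not a recommended policy.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    REQUIRE_APPROVAL = "REQUIRE_APPROVAL"
    MASK = "MASK"

@dataclass
class CallContext:
    function: str
    args: dict
    agent_scopes: set   # what the agent is allowed to do
    user_scopes: set    # what the human behind the session may do
    hour: int           # for the operational time window

def authorise(ctx: CallContext) -> Decision:
    # Check 1: does the agent have permission for this action?
    if ctx.function not in ctx.agent_scopes:
        return Decision.DENY
    # Check 2: does the user behind the agent have permission?
    if ctx.function not in ctx.user_scopes:
        return Decision.MASK  # hide the capability rather than erroring
    # Check 3: are we within operational limits?
    if ctx.function == "process_refund":
        if not (9 <= ctx.hour < 17):
            return Decision.REQUIRE_APPROVAL
        if ctx.args.get("amount", 0) > 50:
            return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW

ctx = CallContext("process_refund", {"amount": 120},
                  {"process_refund"}, {"process_refund"}, hour=11)
print(authorise(ctx))  # a £120 refund escalates to a human
```

The layer is perhaps thirty lines of code; the hard work is agreeing the scopes and limits with the people who own the money, which is exactly the step the agency in the opening anecdote skipped.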
The pragmatic starting point for a typical services-led SME in the £1m to £10m range is an orchestration platform like Zapier or Make: low engineering overhead, vendor-maintained connectors, and a quick first proof. Once a workflow has earned it, move to a custom integration or to an MCP server.
Related concepts
An AI agent is the consuming layer above function calling. The agent decides which function to call, in what order, and when to stop. Function calling is the primitive; the agent is the system that uses the primitive to get something done. An agent without function calling is trapped in a text interface. An agent with function calling can act on your live systems, which is exactly why the governance maths is different.
The Model Context Protocol is the cross-vendor standard for tool integration that has converged through 2026. Define a tool once in MCP and it works with Claude, Gemini, OpenAI models, and open-source frameworks. AWS Bedrock, Azure AI, and Google Cloud Vertex now offer managed MCP endpoints. Asking a vendor “do you support MCP” is fair procurement language because it signals the vendor expects you might switch model providers without rewriting integrations.
Tool poisoning is the security category to know about. If an attacker controls the description or output of a function the AI calls, they can embed hidden instructions that manipulate behaviour. The mitigation is to treat tool metadata as low-trust unless it comes from a signed source, and to sanitise descriptions and outputs from any external tool before they reach the model. The wider category, prompt injection, shows up in tool outputs as well as in user input, so the same threat model applies.
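As a rough sketch of treating tool metadata as low-trust, the check below flags descriptions that contain instruction-like phrasing before they reach the model. The patterns are illustrative and nowhere near a complete defence against prompt injection; a production system would combine pattern checks with signed sources and human review.

```python
import re

# Phrases that have no business appearing in a tool description.
SUSPECT = re.compile(
    r"(ignore (all|previous|prior) instructions"
    r"|do not tell the user"
    r"|system prompt"
    r"|instead,? (call|run|execute))",
    re.IGNORECASE,
)

def vet_description(tool_name: str, description: str) -> str:
    # Reject outright rather than silently stripping: a poisoned tool
    # should be quarantined and investigated, not cleaned and trusted.
    if SUSPECT.search(description):
        raise ValueError(f"Suspicious metadata in tool {tool_name!r}")
    return description

vet_description("get_order_status", "Look up an order's dispatch date.")
```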
An API is the underlying contract any two systems use to talk to each other. A function-calling system typically calls one or more APIs underneath each function. The MCP server, where you have one, sits between the AI client and those APIs and exposes them in the shape an AI client expects.
The honest version of an agentic AI pitch in 2026 names the function registry, shows you the authorisation layer, and offers an audit trail. The marketing version skips all three.



