What is function calling? The primitive under agentic AI

TL;DR

Function calling, sometimes called tool use, is the mechanism that lets an AI model request actions from your real systems instead of just generating text. Your application defines the available functions, the model decides which to call, your code executes them, and the result feeds back into the conversation. The procurement question is how you control which functions the AI can call, with what oversight.

Key takeaways

- Function calling lets an AI model output a structured request that your application executes against your live systems, then feeds the result back into the conversation. The AI does not run the function itself.
- Every function you register is a permission you have granted. Without authorisation, scope control, and an audit trail, function calling is unattended automation with a chatty interface.
- Function definitions consume tokens on every call. Twenty registered functions can ship 11,000 to 28,000 tokens of overhead per request before the user's question is processed, and tool-selection accuracy collapses past about ten registered tools.
- The Model Context Protocol has become the de facto standard for tool integration in 2026. "Do you support MCP?" is a fair procurement question because it signals the vendor expects you might switch model providers without rewriting integrations.
- Start narrow. One workflow, a small set of functions, a clear success metric. Prove the model on a high-volume repetitive task with a structured data source you can safely expose, then expand.

A 30-staff marketing agency I spoke to recently had built an AI customer-support tool that could call lookup_account, check_subscription, and process_refund. The vendor demo had been clean. A customer asks for a refund, the system checks the policy, processes it, and confirms. Six weeks in, an internal audit showed the AI had quietly processed fourteen refunds without human approval because the function permissions were never properly scoped.

The refunds were small, the customers were genuine, the loss was modest. The shock was not the money. It was the realisation that function calling without governance is just unattended automation with a polite tone of voice. The owner now had to redesign the permission model retrospectively, with live customers, while the function-calling pipeline was already in production. Every “agentic AI” pitch on the desk in 2026 has function calling underneath it. The conversation worth having is about which functions, against which data, with what oversight.

What is function calling?

Function calling lets an AI model recognise it needs external data or an action, output a structured request for that action, and incorporate the result into its response. Your application defines the available functions, the model picks which to call, your code runs it and returns the result, and the model then composes a final answer for the user. Same pattern across OpenAI, Anthropic, and Google Gemini.

A function definition has a name like lookup_customer, a short description, and a list of parameters the model needs to fill in. The detail that catches owners off guard is that the AI does not execute the function itself. It writes a JSON request and hands it back to your application. Your code is what touches the database, the payment processor, or the email service. That separation is where governance lives.
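To make that separation concrete, here is a minimal sketch of the round trip. The definition follows the JSON-schema style the major providers share, but the names, fields, and handler are illustrative, not any vendor's exact wire format.

```python
import json

# Illustrative tool definition: a name, a description, and the
# parameters the model must fill in. Field names follow the common
# JSON-schema convention, not any one vendor's exact format.
lookup_customer_tool = {
    "name": "lookup_customer",
    "description": "Fetch a customer record by email address.",
    "parameters": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "Customer email"},
        },
        "required": ["email"],
    },
}

# What the model hands back is just structured data, not an execution.
model_request = json.dumps(
    {"name": "lookup_customer", "arguments": {"email": "jo@example.com"}}
)

def execute_tool_call(raw: str, registry: dict) -> dict:
    """Your application, not the model, resolves and runs the function."""
    call = json.loads(raw)
    handler = registry[call["name"]]  # governance checks belong here
    return handler(**call["arguments"])

# Hypothetical registry mapping function names to your own code.
registry = {"lookup_customer": lambda email: {"email": email, "plan": "pro"}}
result = execute_tool_call(model_request, registry)
```

The point of the sketch is the boundary: everything between the model's JSON and the database call is code you own, which is exactly where permission checks and audit logging sit.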

Why does it matter for your business?

Without function calling, an LLM is trapped in its training data. With function calling, a customer-support agent can resolve “where is my order” end-to-end in seconds rather than minutes, with no human in the loop for routine queries. Vendors deploying function calling for support commonly report thirty to forty per cent reductions in ticket volume and seventy to eighty per cent of routine queries handled entirely by the system.

The hidden cost is the token tax. Every function definition you register is sent to the model on every call: name, description, parameters, examples. A single function consumes 550 to 1,400 tokens of overhead. Register twenty functions and you are shipping 11,000 to 28,000 tokens of overhead per request before the user’s question is processed. Worse, accuracy drops as the registry grows. Five to ten tools gives over ninety per cent selection accuracy. Past fifty tools it falls to roughly forty-nine per cent, which is essentially a coin flip. Semantic tool routing, where a lightweight classifier picks the relevant five to ten tools per query, is the standard mitigation in 2026.
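The routing idea can be shown in a few lines. This is a deliberately crude sketch: production routers score tools with embeddings, whereas keyword overlap here just keeps the mechanism visible. All tool names and descriptions are invented for illustration.

```python
# Semantic tool routing, sketched with keyword overlap instead of
# embeddings: score every registered tool against the query and send
# only the top few definitions to the model, cutting the token tax.
TOOLS = {
    "get_order_status": "look up the dispatch and tracking status of an order",
    "check_return_policy": "check the refund and return policy for a product",
    "initiate_return": "start a return or refund for a customer order",
    "lookup_invoice": "fetch an invoice total and payment status",
    "update_address": "change the delivery address on a customer account",
}

def route_tools(query: str, k: int = 2) -> list[str]:
    """Return the k tool names whose descriptions best match the query."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(desc.split())), name)
              for name, desc in TOOLS.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:k] if score > 0]

selected = route_tools("where is my order and when will it be dispatched")
```

With routing in place, a request carries two tool definitions instead of twenty, and the model chooses from a shortlist it can actually handle.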

Where will you actually meet it?

You will meet function calling first in vendor pitches, often without the term being used. Any demo where the AI looks something up in a CRM, checks an order, drafts an invoice, or triggers a workflow is function calling under the bonnet. Salesforce Einstein, Google Cloud Vertex AI Agent Builder, Microsoft Copilot Studio, Zapier Agents, and Make all expose it as the foundation of their agentic features.

You will meet it inside your own operations the moment your team starts building an internal tool that reads from Slack and acts on a stock level, an invoice, or a customer record. The customer-support flow is the shape an SME meets first: a customer asks where their order is, the agent calls get_order_status, receives the dispatch date and tracking link, and answers in plain English. If the customer asks for a refund, the agent calls check_return_policy, then initiate_return, and confirms. The same pattern shows up in billing dispute handling, subscription management, and internal knowledge retrieval.
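The support flow above can be sketched as a dispatch loop. The handlers are stubs, and in production the list of tool calls comes from the provider's API rather than being hard-coded; everything here is illustrative.

```python
# Stub handlers standing in for real order-system lookups.
def get_order_status(order_id: str) -> dict:
    return {"order_id": order_id, "dispatched": "2026-05-01",
            "tracking": "https://example.com/track/123"}

def check_return_policy(order_id: str) -> dict:
    return {"order_id": order_id, "returnable": True, "window_days": 30}

HANDLERS = {"get_order_status": get_order_status,
            "check_return_policy": check_return_policy}

def run_turn(tool_calls: list[dict]) -> list[dict]:
    """Execute each structured request the model emitted and collect
    the results that feed back into the conversation."""
    return [HANDLERS[c["name"]](**c["args"]) for c in tool_calls]

# A "where is my order" turn: the model emits one call, your code runs it,
# and the result is what the model turns into a plain-English answer.
results = run_turn([{"name": "get_order_status",
                     "args": {"order_id": "A-42"}}])
```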

The procurement question to put on the table is not “do you support function calling” but “how do you control which functions the AI can call, against which data, and with what oversight.” If the vendor cannot answer that cleanly, the demo is the demo and the audit is yours to write later.

When to authorise it, when to defer

Function calling earns its keep on high-volume repetitive workflows where you have a structured data source you can safely expose. If half your support tickets are “where is my order” or “what is my invoice total,” and your data lives in a system you control, the case is straightforward. If you handle hundreds or thousands of interactions a day, the per-call overhead amortises. Below that, setup and governance probably outweigh the saving.

Defer function calling, or keep it read-only, in three situations. The first is workflows that need human judgement. An agent can look up a refund policy, but deciding whether to make an exception is still a person’s call. The second is messy or distributed data, where there is no single source of truth. Function calling will amplify the inconsistency rather than resolve it. The third is when your team cannot resource the authorisation layer. Every function call should pass through a check that asks: does the agent have permission for this action, does the user behind the agent have permission, and are we within operational limits like a refund cap or a time window. The layer returns DENY, REQUIRE_APPROVAL, or MASK before the function executes. Without that layer, you are running unattended automation with a chatty interface, which is how the agency in the opening anecdote ended up paying out fourteen unapproved refunds.
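The authorisation check described above can be sketched as a single gate in front of every execution. The scope names, refund cap, and field names here are all illustrative assumptions; the shape to copy is the three-question check and the DENY / REQUIRE_APPROVAL / MASK decisions.

```python
# Pre-execution authorisation gate. Every tool call passes through this
# before any handler runs. Thresholds and scope names are illustrative.
REFUND_CAP = 50.00  # operational limit: refunds above this need a human

def authorise(call: dict, agent_scopes: set, user_scopes: set) -> str:
    name, args = call["name"], call.get("args", {})
    # 1. Does the agent have permission? 2. Does the user behind it?
    if name not in agent_scopes or name not in user_scopes:
        return "DENY"
    # 3. Are we within operational limits?
    if name == "process_refund" and args.get("amount", 0) > REFUND_CAP:
        return "REQUIRE_APPROVAL"
    # Some reads execute but redact sensitive fields in the result.
    if name == "lookup_account":
        return "MASK"
    return "ALLOW"

decision = authorise(
    {"name": "process_refund", "args": {"amount": 120.0}},
    agent_scopes={"lookup_account", "process_refund"},
    user_scopes={"lookup_account", "process_refund"},
)
```

A £120 refund request against a £50 cap comes back as REQUIRE_APPROVAL rather than executing silently, which is precisely the check the agency in the opening anecdote was missing.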

The pragmatic starting point for a typical services-led SME in the £1m to £10m range is an orchestration platform like Zapier or Make: low engineering overhead, vendor-maintained connectors, and a quick first proof. Once a workflow has earned it, move to a custom integration or to an MCP server.

An AI agent is the consuming layer above function calling. The agent decides which function to call, in what order, and when to stop. Function calling is the primitive; the agent is the system that uses the primitive to get something done. An agent without function calling is trapped in a text interface. An agent with function calling can act on your live systems, which is exactly why the governance maths is different.

The Model Context Protocol is the cross-vendor standard for tool integration that has converged through 2026. Define a tool once in MCP and it works with Claude, Gemini, OpenAI models, and open-source frameworks. AWS Bedrock, Azure AI, and Google Cloud Vertex now offer managed MCP endpoints. Asking a vendor “do you support MCP” is fair procurement language because it signals the vendor expects you might switch model providers without rewriting integrations.

Tool poisoning is the security category to know about. If an attacker controls the description or output of a function the AI calls, they can embed hidden instructions that manipulate behaviour. The mitigation is to treat tool metadata as low-trust unless it comes from a signed source, and to sanitise descriptions and outputs from any external tool before they reach the model. The wider category, prompt injection, shows up in tool outputs as well as in user input, so the same threat model applies.
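One low-trust handling pattern can be sketched as a screening step on tool descriptions before they ever reach the model. The patterns below are a tiny illustrative sample, not a real denylist; genuine defences are broader and layered.

```python
import re

# Treat tool metadata as low-trust: reject description text that looks
# like embedded instructions unless it comes from a signed source.
# The patterns here are illustrative, not an exhaustive defence.
SUSPICIOUS = re.compile(
    r"(ignore (all )?previous instructions|system prompt|do not tell the user)",
    re.IGNORECASE,
)

def sanitise_description(desc: str, signed: bool = False) -> str:
    """Return the description if it passes screening, else raise."""
    if signed:
        return desc  # metadata from a signed source keeps full trust
    if SUSPICIOUS.search(desc):
        raise ValueError("tool description rejected: possible injection")
    return desc

clean = sanitise_description("Fetch a customer record by email address.")
```

The same screening applies to tool outputs, since prompt injection arrives through results as readily as through descriptions.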

An API is the underlying contract any two systems use to talk to each other. A function-calling system typically calls one or more APIs underneath each function. The MCP server, where you have one, sits between the AI client and those APIs and exposes them in the shape an AI client expects.

The honest version of an agentic AI pitch in 2026 names the function registry, shows you the authorisation layer, and offers an audit trail. The marketing version skips all three.

Sources

OpenAI (2026). Function calling guide. The canonical reference for how OpenAI implements tool use, including parallel tool calls and the JSON schema format. https://developers.openai.com/api/docs/guides/function-calling

Anthropic (2026). Claude tool use overview. The reference for Anthropic's approach, including strict tool use and auditable decision-making. https://platform.claude.com/docs/en/agents-and-tools/tool-use/overview

Google (2026). Gemini function calling documentation. The reference for Google's function declaration format, closely modelled on OpenAPI schema. https://ai.google.dev/gemini-api/docs/function-calling

Model Context Protocol (2025). Specification 2025-03-26. The open standard for connecting AI models to tools and data, used as the cross-vendor integration layer through 2026. https://modelcontextprotocol.io/specification/2025-03-26

Microsoft Security (2026). Authorisation and governance for AI agents: runtime authorisation beyond identity. The reference for the DENY, REQUIRE_APPROVAL, and MASK pattern at the function-call boundary. https://techcommunity.microsoft.com/blog/microsoft-security-blog/authorization-and-governance-for-ai-agents-runtime-authorization-beyond-identity/4509161

Microsoft Developer Blog (2026). Protecting against indirect prompt injection in MCP. Cited for tool poisoning and the case for treating tool metadata as untrusted unless signed. https://developer.microsoft.com/blog/protecting-against-indirect-injection-attacks-mcp

Tian Pan (2026). The hidden token tax in production LLM pipelines. Source for the per-function token overhead, the 5-to-10 versus 50+ tools accuracy curve, and semantic tool routing as the 2026 mitigation. https://tianpan.co/blog/2026-04-11-hidden-token-tax-production-llm-pipelines

IBM (2025). What is tool calling? Definition and architecture. Cited for the plain-English five-stage workflow shared across vendor implementations. https://www.ibm.com/think/topics/tool-calling

OWASP (2025). LLM Prompt Injection Prevention Cheat Sheet. Reference for the wider prompt-injection threat model, including indirect injection through tool descriptions and outputs. https://cheatsheetseries.owasp.org/cheatsheets/LLM_Prompt_Injection_Prevention_Cheat_Sheet.html

Frequently asked questions

How is function calling different from a chatbot?

A chatbot answers each message using only what it learned during training. Function calling lets the model recognise it needs live information or a real action, output a structured request to your application, and incorporate the result into its response. The chatbot tells the customer it cannot see their order. The function-calling system looks the order up and answers the question.

Do I need to write the functions myself?

Sometimes, often not. Vendor-built tools through Salesforce Agentforce, Zapier, and Make expose pre-built connectors as functions the AI can call. Custom functions over your own database or line-of-business application give you more control but cost engineering time. The pragmatic starting point for a typical services-led SME is an orchestration platform with vendor connectors, then move to custom or to an MCP server once a use case has earned it.

What is the biggest risk with function calling?

Misconfigured authorisation. The model decides which function to call based on the conversation, but your application is what executes it. If the authorisation layer does not check whether the agent has permission, whether the user behind the agent has permission, and whether the call is within operational limits like a refund cap, the system can act in ways nobody approved. Treat the authorisation layer as the actual product, not the AI.

This post is general information and education only, not legal, regulatory, financial, or other professional advice. Regulations evolve, fee benchmarks shift, and every situation is different, so please take qualified professional advice before acting on anything you read here. See the Terms of Use for the full position.

Ready to talk it through?

Book a free 30 minute conversation. No pitch, no pressure, just a useful chat about where AI fits in your business.

Book a conversation
