Section 23.1: Function Calling Across Providers

"Give a model a tool and it will use it. Give it the wrong JSON schema and it will use it creatively."
Pip, Schema-Validated AI Agent

Big Picture

Function calling is the bridge between language and action. Without it, an LLM can only describe what tools to use in natural language, forcing brittle regex parsing on the application side. With function calling, the model produces structured JSON that specifies the exact function, arguments, and types, turning unreliable text parsing into reliable API dispatch. This section compares function calling implementations across OpenAI, Anthropic, Google, and open-source providers, covering schema design, multi-tool handling, and streaming behavior. The agent loop from Chapter 22 depends entirely on this mechanism for its action step.

Prerequisites

This section builds on agent foundations from Chapter 22 and LLM API basics from Chapter 10.

Figure 23.1.1: An AI agent selects from a structured toolkit rather than improvising. Each tool comes with a schema that defines its interface, enabling reliable function calling.

1. The Function Calling Interface

Function calling is the mechanism that transforms an LLM from a text generator into a tool-using agent. Rather than generating natural language descriptions of what tools to use, the model produces structured JSON that specifies which function to call and with what arguments. The application code then executes the function, returns the result to the model, and the model incorporates the result into its next response. This structured interface eliminates the fragile parsing of natural language tool invocations that plagued early agent systems.

Every major LLM provider now supports function calling, but the implementations differ in important ways. Understanding these differences is essential for building portable agent systems and for selecting the right provider for your use case. The core pattern is the same across all providers: define tool schemas, pass them to the model alongside the user message, handle tool call responses, execute the tools, and return results. The details of schema format, multi-tool handling, and streaming behavior vary.

OpenAI was the first major provider to ship function calling (June 2023), and its format has become the de facto standard that many open-source frameworks adopt. Anthropic's tool use returns tool calls as tool_use content blocks, and Claude often writes out its reasoning in text (or, with extended thinking enabled, in thinking blocks) before calling a tool. Google's Gemini API supports function calling, and its Python SDK can optionally execute Python functions passed as tools automatically. Open-source models served through frameworks like Ollama or vLLM increasingly support function calling, though reliability varies with model size and training data.

Common Misconception: The Model "Calls" Functions

The name "function calling" is misleading. The model does not execute any code or call any API directly. It produces a structured JSON object that requests a function call. Your application code is responsible for parsing that request, executing the actual function, and returning the result. The model has no access to the network, no ability to run code, and no awareness of whether its requested function call was actually executed. This means all security boundaries, input validation, and rate limiting must be implemented in your application layer, not delegated to the model. Treating the model's output as a suggestion to be validated, rather than a command to be blindly executed, is the foundation of safe tool use.
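Because the model only proposes a call, the application decides what actually runs. The sketch below is a minimal illustration that assumes tools are kept in a plain dictionary. TOOL_REGISTRY, dispatch_tool_call, and the stub get_weather are hypothetical names, not part of any provider SDK. Unknown tool names and malformed arguments are rejected and reported back as text instead of being executed.

```python
import json
from typing import Any, Callable


def get_weather(location: str, unit: str = "celsius") -> str:
    # Hypothetical stub standing in for a real weather API call
    return f"22 degrees {unit} in {location}"


# Allowlist: only functions registered here can ever be executed
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Execute a model-requested tool call only if it passes basic checks."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        # Never fall back to eval() or getattr() on a model-supplied name
        return f"Error: unknown tool '{name}'"
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return f"Error: arguments are not valid JSON ({exc.msg})"
    if not isinstance(args, dict):
        return "Error: arguments must be a JSON object"
    try:
        return str(func(**args))
    except TypeError as exc:
        # e.g. missing or unexpected keyword arguments
        return f"Error: bad arguments for '{name}': {exc}"
```

Returning an error string instead of raising lets the application send the failure back to the model as the tool result. The model can then correct itself on the next turn.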

Figure 23.1.2: The function calling loop in action. The model sends a structured request, the application executes the tool, and the result feeds back into the next model turn. The cycle repeats until the model has enough information to respond.

Algorithm: Function Calling Loop


Input: user message M, tool schemas {T₁, ..., T_n}, LLM model, max iterations K
Output: final text response

1. messages = [system_prompt, M]
2. for i = 1 to K:
     a. response = LLM(messages, tools={T₁, ..., T_n})
     b. if response has no tool_calls:
          return response.content        // final answer
     c. Append assistant message (with tool_calls) to messages
     d. for each tool_call in response.tool_calls:
          i.   name = tool_call.function.name
          ii.  args = parse_json(tool_call.function.arguments)
          iii. result = execute(name, args)
          iv.  Append tool result message (role="tool", id=tool_call.id) to messages
3. return "Max iterations reached"

Pseudocode 23.1.1: Function calling loop
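The pseudocode translates almost line for line into Python. The sketch below is provider-agnostic. It assumes llm is any callable that takes the message list and returns a dict with content and tool_calls keys; this OpenAI-like shape is a hypothetical stand-in, not a real SDK object. A scripted fake model replaces the real API so the loop runs offline, and error handling is omitted for brevity.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]


def run_agent(user_message: str,
              llm: Callable[[list[Message]], Message],
              tools: dict[str, Callable[..., Any]],
              max_iterations: int = 5) -> str:
    """Function calling loop from Pseudocode 23.1.1 (provider-agnostic sketch)."""
    messages: list[Message] = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_message},
    ]
    for _ in range(max_iterations):
        # A real llm callable would also send the tool schemas to the provider
        response = llm(messages)
        tool_calls = response.get("tool_calls") or []
        if not tool_calls:
            return response["content"]  # final answer
        # Record the assistant turn that requested the tools
        messages.append({"role": "assistant", "content": response.get("content"),
                         "tool_calls": tool_calls})
        for call in tool_calls:
            name = call["function"]["name"]
            args = json.loads(call["function"]["arguments"])
            result = tools[name](**args)
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": str(result)})
    return "Max iterations reached"


def fake_llm(messages: list[Message]) -> Message:
    # Scripted stand-in for a real model: request a tool after the user turn,
    # then answer using the tool result found in the last message.
    if messages[-1]["role"] == "user":
        return {"content": None, "tool_calls": [{
            "id": "call_1",
            "function": {"name": "get_weather",
                         "arguments": json.dumps({"location": "Paris"})}}]}
    return {"content": f"Weather report: {messages[-1]['content']}", "tool_calls": []}


def get_weather(location: str) -> str:
    return f"22C and sunny in {location}"  # hypothetical stub


print(run_agent("What is the weather in Paris?", fake_llm, {"get_weather": get_weather}))
# → Weather report: 22C and sunny in Paris
```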

OpenAI Function Calling

This snippet defines a tool schema and processes function-call responses using the OpenAI function calling API.

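import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g. San Francisco, CA",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit",
                    },
                },
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call a tool
)

# Handle the tool call(s); tool_calls is None if the model answered directly
message = response.choices[0].message
for tool_call in message.tool_calls or []:
    args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {args}")

Code Fragment 23.1.2: OpenAI function calling with the Chat Completions API. The get_weather tool is declared under the function key with a JSON Schema in parameters, and each tool call's arguments arrive as a JSON string that must be parsed.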
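Fragment 23.1.2 stops after reading the tool call. To complete the loop, the Chat Completions format expects you to append the assistant message that carries tool_calls, followed by one role="tool" message per call whose tool_call_id matches that call's id. The sketch below is minimal. build_openai_followup is a hypothetical helper, and SimpleNamespace objects mimic the SDK's response attributes so the example runs offline.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable


def build_openai_followup(message: Any,
                          tools: dict[str, Callable[..., Any]]) -> list[dict]:
    """Turn an assistant message with tool_calls into the messages to append."""
    followup: list[dict] = [{
        "role": "assistant",
        "content": message.content,
        "tool_calls": [
            {"id": tc.id, "type": "function",
             "function": {"name": tc.function.name,
                          "arguments": tc.function.arguments}}
            for tc in message.tool_calls
        ],
    }]
    for tc in message.tool_calls:
        args = json.loads(tc.function.arguments)
        result = tools[tc.function.name](**args)
        # One role="tool" message per call, linked back by tool_call_id;
        # structured results are serialized as JSON text
        followup.append({"role": "tool", "tool_call_id": tc.id,
                         "content": json.dumps(result)})
    return followup


def weather(location: str, unit: str = "celsius") -> dict:
    return {"location": location, "temp": 22, "unit": unit}  # hypothetical stub


# Offline stand-in for response.choices[0].message
fake_message = SimpleNamespace(content=None, tool_calls=[SimpleNamespace(
    id="call_abc", function=SimpleNamespace(
        name="get_weather", arguments='{"location": "Paris, France"}'))])

for m in build_openai_followup(fake_message, {"get_weather": weather}):
    print(m["role"], m.get("tool_call_id", ""))
```

In real code you would extend the conversation's messages list with these dicts and call client.chat.completions.create again with the same tools argument.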

Library Shortcut: PydanticAI in Practice

Skip the manual JSON schema with PydanticAI (pip install pydantic-ai), which infers tool schemas from type hints:


from pydantic_ai import Agent

agent = Agent("openai:gpt-4o")

@agent.tool_plain
def get_weather(location: str, unit: str = "celsius") -> str:
    """Get the current weather for a location."""
    # Stub response; replace with a call to a real weather API
    temp = "22C" if unit == "celsius" else "72F"
    return f"{temp}, partly cloudy in {location}"

result = agent.run_sync("What is the weather in Paris?")
print(result.output)  # the model's final answer (older PydanticAI releases: result.data)

Code Fragment 23.1.3: The get_weather tool registered with PydanticAI, which derives the tool's JSON schema from the function signature and docstring.
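PydanticAI's real schema generation builds on Pydantic and is far more thorough. The core idea of inferring a schema from type hints can still be sketched with the standard library alone. function_to_tool_schema below is a hypothetical helper and an illustration only, not how PydanticAI is implemented. It maps a few Python annotations to JSON Schema types and marks parameters without defaults as required.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Minimal mapping from Python annotations to JSON Schema types
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool_schema(func: Callable[..., Any]) -> dict[str, Any]:
    """Build an OpenAI-style tool definition from a function's signature."""
    hints = get_type_hints(func)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unsupported types fall back to "string"
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value -> required argument
        else:
            properties[name]["default"] = param.default
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {"type": "object", "properties": properties,
                           "required": required},
        },
    }


def get_weather(location: str, unit: str = "celsius") -> str:
    """Get the current weather for a location."""
    return f"22C in {location}"


print(function_to_tool_schema(get_weather)["function"]["parameters"])
```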

Anthropic Tool Use

This snippet defines tools and handles tool-use responses using the Anthropic messages API.

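import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and state, e.g. San Francisco, CA",
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit (default: celsius)",
                },
            },
            "required": ["location"],
        },
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
)

# Anthropic returns tool_use blocks within the content array
# (response.stop_reason is "tool_use" when Claude wants a tool executed)
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")
# Example output (model-dependent):
# Tool: get_weather, Input: {'location': 'Paris, France', 'unit': 'celsius'}

Code Fragment 23.1.4: Anthropic tool use with the Messages API. The same get_weather tool is registered using Anthropic's input_schema format, and the response is parsed by iterating over content blocks to find tool_use entries containing the tool name and structured input.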
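To finish the round trip with Anthropic, you append the assistant's content blocks unchanged. You then send a user message whose content is a list of tool_result blocks. Each block references its originating tool_use block through tool_use_id, and a failure can be flagged with is_error. The sketch below is minimal. build_anthropic_tool_results is a hypothetical helper, and SimpleNamespace objects stand in for the SDK's content blocks.

```python
from types import SimpleNamespace
from typing import Any, Callable


def build_anthropic_tool_results(content_blocks: list[Any],
                                 tools: dict[str, Callable[..., Any]]) -> dict:
    """Execute every tool_use block and package the results as one user message."""
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue  # skip text blocks that may precede the tool call
        try:
            output = str(tools[block.name](**block.input))  # input is already a dict
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": output})
        except Exception as exc:  # report failures to the model instead of crashing
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": f"Error: {exc}", "is_error": True})
    return {"role": "user", "content": results}


def get_weather(location: str, unit: str = "celsius") -> str:
    return f"22 {unit} in {location}"  # hypothetical stub


# Offline stand-in for response.content
blocks = [
    SimpleNamespace(type="text", text="Let me check the weather."),
    SimpleNamespace(type="tool_use", id="toolu_01", name="get_weather",
                    input={"location": "Paris, France"}),
]

messages = [{"role": "user", "content": "What is the weather in Paris?"}]
messages.append({"role": "assistant", "content": blocks})  # real code: response.content
messages.append(build_anthropic_tool_results(blocks, {"get_weather": get_weather}))
```

The extended messages list is then passed to client.messages.create again with the same tools, and Claude continues from the tool results.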

Library Shortcut: PydanticAI in Practice

Swap providers with one string using PydanticAI, which abstracts away the per-provider schema differences:


from pydantic_ai import Agent

# Same code works with any supported provider; just change the model string
agent = Agent("anthropic:claude-sonnet-4-20250514")  # or "openai:gpt-4o"

@agent.tool_plain
def get_weather(location: str, unit: str = "celsius") -> str:
    """Get the current weather for a location."""
    # Stub response; replace with a call to a real weather API
    return f"22C, sunny in {location}"

result = agent.run_sync("What is the weather in Paris?")
print(result.output)

Code Fragment 23.1.5: Provider-agnostic tool registration with PydanticAI. Only the model string changes; PydanticAI translates the get_weather tool into each provider's tool format.

Key Insight

The quality of your tool descriptions matters more than the schema structure. Models select tools based on the description field, not just the function name. A description that says "Get weather" will be called less reliably than one that says "Get the current temperature, humidity, and conditions for a specific city. Returns real-time data from a weather API." Include when to use the tool, what it returns, and common parameter values in the description.

To understand why function calling is architecturally significant (and not just a convenience feature), consider what happens without it. Before structured function calling, agents had to embed tool invocations in natural language ("I should search for 'weather Paris'"), and application code had to parse these free-form strings with regex or heuristics. This was fragile: minor rephrasing broke the parser, and the model had no formal contract specifying valid actions. Structured function calling introduces a type system for agent actions. The JSON schema defines exactly what the model can do, with what parameters, and in what format. This transforms agent tool use from "string parsing with fingers crossed" into a well-defined API contract, enabling reliable orchestration, automatic validation, and composable tool chains as explored in Section 23.2 (MCP).

2. Multi-Tool Orchestration

Real agents need multiple tools working together. A research agent might search the web, extract content from URLs, store findings in a database, and generate a report. The model must decide not only which tool to call but in what order, and it must handle the data flow between tool calls. Modern APIs support parallel tool calling, where the model can request multiple tool executions in a single response, significantly reducing the number of round trips for independent operations.

The agent loop for multi-tool orchestration follows a standard pattern. Send the user message with all tool definitions, then receive the model's response, which may contain one or more tool calls. Execute all requested tools and return every result: OpenAI's format uses one role="tool" message per call, while Anthropic's expects tool_result blocks in a single user message. Repeat until the model produces a final text response without tool calls. Managing this loop correctly, especially handling errors from individual tool calls without derailing the entire conversation, is a core engineering challenge.

Real-World Scenario: Travel Planning with Multi-Tool Coordination

Who: A product engineer at an online travel agency building an AI trip-planning assistant.

Situation: The assistant needed to handle complex requests like "Plan a 3-day trip to Tokyo, find flights from SFO, and recommend hotels near Shibuya under $200/night." Each request required data from multiple independent APIs (flights, hotels, attractions, itinerary builder).

Problem: The initial implementation called tools sequentially: search flights, then search hotels, then get attractions, then build itinerary. This took 12 seconds per request because each API call waited for the previous one to complete, even when the calls had no data dependencies.

Decision: The team enabled parallel function calling in the API configuration and wrote clear tool descriptions that specified input/output dependencies. Guided by these descriptions, the model called search_flights and search_hotels in parallel (independent), then get_attractions (depends on flight dates), then create_itinerary (depends on all prior results).

Result: Average request latency dropped from 12 seconds to 6 seconds. The model correctly identified parallelizable calls in 94% of requests without any explicit orchestration logic.

Lesson: Clear tool descriptions with explicit input/output specifications let models discover parallelism naturally, often eliminating the need for hand-coded orchestration graphs.

3. Open-Source Function Calling

Open-source models have rapidly closed the gap in function calling capability. Models like Llama 3.1 (with tool use training), Mistral's function calling models, and Qwen 2.5 support structured tool interactions. These models can be served through vLLM, Ollama, or TGI with OpenAI-compatible API endpoints, making them drop-in replacements for many use cases. The trade-off is typically reliability: frontier models handle complex multi-tool scenarios more robustly, while open-source models may require more careful prompt engineering and schema design.

For teams that need to keep data on-premises or require custom fine-tuning for domain-specific tools, open-source function calling models provide a viable path. Fine-tuning on examples of your specific tool schemas and usage patterns can bring open-source models to near-frontier reliability for a constrained tool set. The ToolACE and Gorilla projects have demonstrated that targeted training on tool-use data can produce highly capable tool-using models from relatively small base models.

Warning

Not all "function calling" implementations are equal. Some open-source models format tool calls as JSON within their text output rather than as structured API responses. This means you need a reliable JSON parser that handles malformed output, partial responses, and edge cases like nested quotes. Always test your tool calling pipeline with adversarial inputs that are likely to produce malformed JSON.
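When a model emits its tool call as text, a tolerant extractor helps. The sketch below is minimal. Both the extract_tool_call helper and the assumption that a tool call is a JSON object with a "name" key are hypothetical; adapt them to your model's prompt format. The extractor unwraps a Markdown code fence if one is present, then tries to decode a JSON object at each opening brace. It returns None instead of raising when nothing valid is found, which covers truncated output.

```python
import json
import re
from typing import Any, Optional

# Matches a fenced code block (three backticks, optional "json" tag)
_FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_tool_call(text: str) -> Optional[dict[str, Any]]:
    """Best-effort extraction of a JSON tool call embedded in model text."""
    fenced = _FENCE_RE.search(text)
    candidate = fenced.group(1) if fenced else text
    decoder = json.JSONDecoder()
    # Try to decode a complete JSON object starting at each '{' in turn
    for start in (i for i, ch in enumerate(candidate) if ch == "{"):
        try:
            obj, _ = decoder.raw_decode(candidate, start)
        except json.JSONDecodeError:
            continue  # malformed or truncated from this position; try the next brace
        if isinstance(obj, dict) and "name" in obj:
            return obj
    return None  # caller should treat this as "no valid tool call"
```

Using the JSON decoder instead of regular expressions for the object itself means nested objects and escaped quotes are handled by a real parser.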

Exercises

Exercise 23.1.1: Function Schema Design Conceptual

Write a JSON schema for a search_products function that takes a query string, an optional category filter, and a maximum number of results (default 10). Follow the OpenAI function calling format.

Answer Sketch

Use {"name": "search_products", "parameters": {"type": "object", "properties": {"query": {"type": "string"}, "category": {"type": "string"}, "max_results": {"type": "integer", "default": 10}}, "required": ["query"]}}. The description field should clearly explain what the function does so the model can decide when to call it.
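A fuller version of the answer, written as a Python dict in the OpenAI Chat Completions tool format, might look like the sketch below. The description wording and the effective_max_results helper are illustrative assumptions. Note that "default" in JSON Schema is an annotation rather than an enforced value, so the application should apply it when the model omits the argument.

```python
search_products_tool = {
    "type": "function",
    "function": {
        "name": "search_products",
        "description": (
            "Search the product catalog by keyword. Use this when the user asks "
            "to find, browse, or compare products. Returns up to max_results matches."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string",
                          "description": "Search keywords, e.g. 'wireless headphones'"},
                "category": {"type": "string",
                             "description": "Optional category filter, e.g. 'electronics'"},
                "max_results": {"type": "integer",
                                "description": "Maximum number of results to return",
                                "default": 10},
            },
            "required": ["query"],
        },
    },
}


def effective_max_results(args: dict) -> int:
    # The schema default is only a hint to the model; apply it in application code
    return int(args.get("max_results", 10))
```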

Exercise 23.1.2: Multi-Provider Function Calling Coding

Implement the same tool (a weather lookup) using both the OpenAI and Anthropic function calling APIs. Compare the request/response formats and identify the key differences.

Answer Sketch

OpenAI uses tools with function type in the request and returns tool_calls in the response. Anthropic uses tools with input_schema and returns tool_use content blocks. Key differences: Anthropic returns tool calls as content blocks within the message; OpenAI uses a separate tool_calls field. Both require sending tool results back in subsequent messages.
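Both providers describe arguments with JSON Schema, so converting a tool definition is mostly a matter of moving keys. The sketch below is minimal: openai_tool_to_anthropic is a hypothetical helper, and provider-specific extras inside the OpenAI function object (such as a strict flag) are simply dropped.

```python
from typing import Any


def openai_tool_to_anthropic(tool: dict[str, Any]) -> dict[str, Any]:
    """Convert an OpenAI Chat Completions tool definition to Anthropic's format."""
    fn = tool["function"]  # OpenAI nests the definition under "function"
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # Both providers use JSON Schema; only the key name differs
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }


weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {"type": "object",
                       "properties": {"location": {"type": "string"}},
                       "required": ["location"]},
    },
}
print(openai_tool_to_anthropic(weather_tool))
```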

Exercise 23.1.3: Parallel Tool Calls Coding

Write code that handles parallel tool calls from an LLM response. The model returns three tool calls simultaneously; your code should execute all three concurrently using asyncio.gather() and return the results.

Answer Sketch

Parse all tool calls from the response. Create async wrapper functions for each tool execution. Use results = await asyncio.gather(*[execute_tool(tc) for tc in tool_calls]). Map results back to their tool call IDs and send them all in the next message as separate tool result entries.
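The sketch below is a minimal version of that answer. It assumes async tool functions and OpenAI-style tool call objects (mocked here with SimpleNamespace); get_weather, get_time, ASYNC_TOOLS, and make_call are hypothetical names. Because the simulated network delays overlap, the batch takes roughly as long as the slowest call rather than the sum of all three.

```python
import asyncio
import json
from types import SimpleNamespace
from typing import Any


async def get_weather(location: str) -> str:
    await asyncio.sleep(0.1)  # simulate network latency (hypothetical tool)
    return f"22C in {location}"


async def get_time(timezone: str) -> str:
    await asyncio.sleep(0.1)
    return f"14:00 in {timezone}"


ASYNC_TOOLS = {"get_weather": get_weather, "get_time": get_time}


async def execute_tool(tool_call: Any) -> dict[str, str]:
    try:
        func = ASYNC_TOOLS[tool_call.function.name]
        args = json.loads(tool_call.function.arguments)
        content = await func(**args)
    except Exception as exc:  # one failing call should not sink the whole batch
        content = f"Error: {exc}"
    # Pair each result with the id of the call that requested it
    return {"role": "tool", "tool_call_id": tool_call.id, "content": content}


async def execute_parallel(tool_calls: list[Any]) -> list[dict[str, str]]:
    # gather runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*(execute_tool(tc) for tc in tool_calls))


def make_call(call_id: str, name: str, **kwargs: Any) -> SimpleNamespace:
    """Offline stand-in for an SDK tool call object."""
    return SimpleNamespace(id=call_id, function=SimpleNamespace(
        name=name, arguments=json.dumps(kwargs)))


calls = [make_call("c1", "get_weather", location="Paris"),
         make_call("c2", "get_weather", location="Tokyo"),
         make_call("c3", "get_time", timezone="Asia/Tokyo")]
results = asyncio.run(execute_parallel(calls))
```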

Exercise 23.1.4: Open-Source Function Calling Conceptual

Compare function calling capabilities between proprietary models (GPT-4, Claude) and open-source models (Llama, Mistral). What are the main challenges when using open-source models for tool use?

Answer Sketch

Open-source models may not natively support structured tool call output, requiring custom prompt formatting and output parsing. They may hallucinate tool names or produce malformed JSON arguments. Fine-tuned variants (e.g., Gorilla, NexusRaven) improve reliability but may lag behind proprietary models in handling complex multi-tool scenarios. Testing and validation are more important with open-source models.

Exercise 23.1.5: Tool Call Error Handling Conceptual

An agent calls a tool and receives an error response. Describe two strategies for handling this: one where the agent retries, and one where it adapts its approach. When is each strategy appropriate?

Answer Sketch

Retry: appropriate for transient errors (network timeouts, rate limits). The agent waits and retries with the same arguments. Adapt: appropriate for semantic errors (invalid arguments, resource not found). The agent interprets the error message, adjusts its approach (e.g., tries a different search query), and calls a different tool or the same tool with modified arguments.
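Both strategies can live in one wrapper, as the minimal sketch below shows. It assumes tools signal retryable failures with a dedicated exception and semantic failures with ValueError or KeyError. TransientToolError and the demo tools are hypothetical; real code would map timeouts or HTTP 429 responses onto the retryable type.

```python
import time
from typing import Any, Callable


class TransientToolError(Exception):
    """Hypothetical marker for retryable failures (timeouts, rate limits)."""


def call_with_retry(func: Callable[..., Any], args: dict[str, Any],
                    max_attempts: int = 3, base_delay: float = 0.5) -> str:
    """Retry transient errors with exponential backoff; surface others to the model."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return str(func(**args))
        except TransientToolError as exc:
            if attempt == max_attempts:
                return f"Error: tool unavailable after {max_attempts} attempts ({exc})"
            time.sleep(base_delay * 2 ** (attempt - 1))  # base_delay, 2x, 4x, ...
        except (ValueError, KeyError) as exc:
            # Semantic error: retrying identical arguments will not help, so
            # return the message and let the model adapt its next call
            return f"Error: {exc}. Try different arguments or another tool."
    return "Error: retries exhausted"  # not reached; keeps the return path explicit


attempts = {"n": 0}


def flaky_lookup(city: str) -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientToolError("HTTP 429 rate limited")
    return f"22C in {city}"


def strict_lookup(city: str) -> str:
    raise ValueError(f"unknown city '{city}'")


print(call_with_retry(flaky_lookup, {"city": "Paris"}, base_delay=0.01))
print(call_with_retry(strict_lookup, {"city": "Pariss"}))
```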

Tip: Validate Tool Inputs Before Execution

Always validate the parameters the model generates before calling the actual tool. Check types, ranges, and required fields. Models frequently produce plausible but invalid inputs (wrong date formats, out-of-range values, missing required fields).
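As a minimal sketch, consider a hypothetical book_hotel tool with city, check_in, and nights arguments; the tool and its limits are illustrative assumptions. The validator collects every problem so that the full list can be returned to the model as the tool result instead of executing the call. Libraries such as Pydantic or jsonschema can express the same checks declaratively.

```python
from datetime import date
from typing import Any


def validate_booking_args(args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    errors: list[str] = []
    for field in ("city", "check_in", "nights"):
        if field not in args:
            errors.append(f"missing required field '{field}'")
    city = args.get("city")
    if city is not None and (not isinstance(city, str) or not city.strip()):
        errors.append("city must be a non-empty string")
    nights = args.get("nights")
    # bool is a subclass of int in Python, so exclude it explicitly
    if nights is not None and (not isinstance(nights, int) or isinstance(nights, bool)
                               or not 1 <= nights <= 30):
        errors.append("nights must be an integer between 1 and 30")
    check_in = args.get("check_in")
    if check_in is not None:
        try:
            if date.fromisoformat(check_in) < date.today():
                errors.append("check_in must not be in the past")
        except (TypeError, ValueError):
            errors.append("check_in must be an ISO date (YYYY-MM-DD)")
    return errors
```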

Key Takeaways

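Function calling returns tool invocations in a dedicated, structured response field, which is far more reliable than parsing JSON out of free text. Strict schema adherence additionally requires provider-side constrained decoding (for example, OpenAI's strict mode).
OpenAI and Anthropic implement the same concept (structured tool invocation) with different API shapes.
Always validate tool call arguments in your application before execution, even when the provider enforces the schema. A schema-valid argument can still be out of range or unsafe, so defense in depth applies.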

Self-Check

Q1: What is function calling, and how does it differ from asking the LLM to output JSON directly?


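Function calling is a provider-native mechanism in which the model emits a structured tool invocation (a tool name plus arguments) in a dedicated response field, based on tool schemas you declare in the request. Unlike raw JSON in the text output, tool calls are kept separate from ordinary text, and the API defines how results are sent back. Schema conformance is guaranteed only when the provider applies constrained decoding (for example, OpenAI's strict mode); otherwise the arguments can still be malformed or invalid. In every case, your application executes the tool and returns the result in the next request; nothing is integrated automatically.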

Q2: What is the main architectural difference between OpenAI's function calling and Anthropic's tool use?


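OpenAI declares tools as objects of type 'function' and returns tool calls in a tool_calls field on the assistant message. Results go back as separate messages with role 'tool' and a matching tool_call_id. Anthropic declares tools with an 'input_schema' and returns tool_use blocks inside the assistant message's content array. It expects tool_result blocks, each with a matching tool_use_id, in the next user message. The core concept is identical, but the API shapes differ.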

What Comes Next

In the next section, Model Context Protocol (MCP), we examine how MCP standardizes the connection between agents and tools, enabling a shared ecosystem of tool servers that any agent can use.

References and Further Reading

Function Calling and Tool Use

Schick, T., Dwivedi-Yu, J., Dessi, R., et al. (2023). "Toolformer: Language Models Can Teach Themselves to Use Tools." NeurIPS 2023.

The foundational paper on self-supervised tool use, where an LLM learns to decide when and how to call external APIs by generating tool-call tokens during text generation.

Paper

Qin, Y., Liang, S., Ye, Y., et al. (2024). "ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs." ICLR 2024.

Introduces a framework for training LLMs to use a massive collection of real-world APIs, including the ToolBench benchmark for evaluating tool-use capability at scale.

Paper

Patil, S.G., Zhang, T., Wang, X., et al. (2023). "Gorilla: Large Language Model Connected with Massive APIs." arXiv preprint.

Demonstrates a fine-tuned LLM for accurate API call generation with retrieval-augmented training, achieving high accuracy across cloud provider APIs.

Paper

Multi-Tool Orchestration

OpenAI (2024). "Function Calling Guide." OpenAI Platform Documentation.

The official OpenAI documentation for function calling, covering parallel tool calls, structured outputs, and best practices for tool schema design.

Documentation

Anthropic (2024). "Tool Use (Function Calling)." Anthropic Documentation.

Anthropic's guide to implementing tool use with Claude, covering JSON schema definitions, tool choice modes, and error handling patterns.

Documentation

Shen, Y., Song, K., Tan, X., et al. (2023). "HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face." NeurIPS 2023.

Uses an LLM as a controller to orchestrate multiple specialized models from Hugging Face, demonstrating multi-tool orchestration for complex AI task pipelines.

Paper