LlamaIndex is not only a retrieval framework. It also provides a full-featured
agent layer that lets LLMs reason, use tools, and execute multi-step plans.
This section covers the ReActAgent for tool-using agents, the
FunctionTool and QueryEngineTool abstractions for wrapping
capabilities, and the Workflows API for building event-driven, multi-step
pipelines with explicit control flow. Together, these components let you move from simple
question-answering to sophisticated agentic applications.
O.5.1 The Agent Abstraction
An agent in LlamaIndex is an LLM that can decide which tools to call, interpret the results, and iterate until it has enough information to produce a final answer. Unlike a query engine (which follows a fixed retrieve-then-synthesize pipeline), an agent has autonomy: it chooses its own actions based on the task at hand. This makes agents well-suited for open-ended tasks that require multiple steps, conditional logic, or interaction with external systems.
LlamaIndex implements agents using the ReAct (Reasoning + Acting) paradigm, where the LLM alternates between a "thought" step (reasoning about what to do next) and an "action" step (calling a tool). The loop continues until the agent decides it has a final answer.
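The shape of this loop can be sketched in plain Python. The sketch below is purely illustrative and is not LlamaIndex's actual implementation: the fake_llm function stands in for the real model's decision-making, and get_weather is a canned toy tool.

```python
# Toy ReAct loop: alternate thought/action until the "model" produces an answer.
# Illustrative only -- a stand-in for LlamaIndex's real ReAct machinery.

def get_weather(city: str) -> str:
    """Toy tool: return a canned weather summary."""
    return {"Tokyo": "80F, sunny"}.get(city, "unknown")

TOOLS = {"get_weather": get_weather}

def fake_llm(task: str, observations: list) -> dict:
    """Stand-in for the LLM: decide the next action, or emit a final answer."""
    if not observations:
        return {"thought": "I need the weather first.",
                "action": "get_weather", "input": "Tokyo"}
    return {"thought": "I have enough information.",
            "answer": f"Tokyo weather: {observations[-1]}"}

def react_loop(task: str, max_iterations: int = 5) -> str:
    observations = []
    for _ in range(max_iterations):
        step = fake_llm(task, observations)       # "thought" phase
        if "answer" in step:                      # agent decides it is done
            return step["answer"]
        tool = TOOLS[step["action"]]              # "action" phase
        observations.append(tool(step["input"]))  # observation feeds next thought
    return "Could not complete the task."

print(react_loop("What's the weather in Tokyo?"))
```

The max_iterations guard mirrors the safety limit discussed later in this section: without it, a loop that never reaches an answer would run forever.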
O.5.2 FunctionTool: Wrapping Python Functions
The simplest way to give an agent a capability is to wrap a Python function with
FunctionTool. The agent sees the function's name, docstring, and parameter types,
and can call it with appropriate arguments. This is ideal for integrating APIs, databases,
calculators, and any other programmatic capability.
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
def get_weather(city: str) -> str:
    """Get the current weather for a given city. Returns a weather summary string."""
    # In production, this would call a real weather API
    weather_data = {
        "New York": "72F, partly cloudy",
        "London": "58F, light rain",
        "Tokyo": "80F, sunny",
    }
    return weather_data.get(city, f"Weather data not available for {city}")

def convert_temperature(fahrenheit: float) -> float:
    """Convert a temperature from Fahrenheit to Celsius."""
    return round((fahrenheit - 32) * 5 / 9, 1)

# Wrap functions as tools
weather_tool = FunctionTool.from_defaults(fn=get_weather)
temp_tool = FunctionTool.from_defaults(fn=convert_temperature)

# Create a ReAct agent with these tools
agent = ReActAgent.from_tools(
    tools=[weather_tool, temp_tool],
    llm=OpenAI(model="gpt-4o"),
    verbose=True,  # print the reasoning trace
)
response = agent.chat("What's the weather in Tokyo? Give me the temperature in Celsius.")
print(response)
With verbose=True, you can observe the agent's reasoning trace. For the query above,
you would see the agent first call get_weather("Tokyo"), parse the result to extract
80F, then call convert_temperature(80.0) to get 26.7C, and finally compose a
natural language response combining both pieces of information.
Write clear, specific docstrings for your tool functions. The agent relies on these descriptions to decide when and how to use each tool. A vague docstring like "does stuff with data" will lead to poor tool selection. Include the parameter types, return format, and any constraints.
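To make the contrast concrete, here is a vague docstring next to a specific one (the function names and order-id format are hypothetical, chosen only for illustration):

```python
# Vague: the agent cannot tell when this tool applies or what it returns.
def lookup(q: str) -> str:
    """Does stuff with data."""
    ...

# Specific: states purpose, input format, return format, and failure behavior.
def lookup_order_status(order_id: str) -> str:
    """Look up the shipping status of a customer order.

    Args:
        order_id: An order identifier of the form 'ORD-12345'.

    Returns:
        A one-line status summary such as 'shipped on 2024-03-01',
        or 'order not found' if the id does not exist.
    """
    ...
```

An agent given the second docstring knows which queries the tool answers, how to format its argument, and what a failure looks like; the first gives it nothing to reason with.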
O.5.3 QueryEngineTool: RAG as a Tool
One of LlamaIndex's most powerful patterns is wrapping a query engine as an agent tool. This lets the agent decide when to search your knowledge base and what query to send. Combined with other tools (calculators, APIs, code execution), this creates agents that can both retrieve knowledge and take actions.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata, FunctionTool
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
# Build a knowledge base
documents = SimpleDirectoryReader("./data/company_policies").load_data()
index = VectorStoreIndex.from_documents(documents)
# Wrap the query engine as an agent tool
policy_tool = QueryEngineTool(
    query_engine=index.as_query_engine(similarity_top_k=3),
    metadata=ToolMetadata(
        name="company_policies",
        description=(
            "Search the company policy database. Use this for questions about "
            "HR policies, vacation, benefits, expense reports, and compliance."
        ),
    ),
)

# Add a calculator tool for numerical questions
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression and return the result as a string."""
    try:
        # Restricted eval (no builtins); still not safe for untrusted input
        result = eval(expression, {"__builtins__": {}})
        return str(result)
    except Exception as e:
        return f"Error: {e}"

calc_tool = FunctionTool.from_defaults(fn=calculate)

# Create an agent with both tools
agent = ReActAgent.from_tools(
    tools=[policy_tool, calc_tool],
    llm=OpenAI(model="gpt-4o"),
    verbose=True,
)
# The agent decides which tool to use based on the query
response = agent.chat("How many vacation days do I get? If I use 5, how many remain?")
print(response)
The agent will first query the policy knowledge base to find the vacation day allowance, then use the calculator to subtract 5 from that number. This combination of retrieval and computation is a common pattern in real-world agentic applications.
O.5.4 Agent Configuration and Memory
Agents support conversational memory out of the box. Each call to agent.chat()
appends the exchange to a chat history, so follow-up questions work naturally. You can also
configure the agent's system prompt, maximum iterations (to prevent runaway loops), and the
underlying LLM.
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
agent = ReActAgent.from_tools(
    tools=[policy_tool, calc_tool],
    llm=OpenAI(model="gpt-4o", temperature=0),
    system_prompt=(
        "You are an HR assistant for Acme Corp. Always cite the specific policy "
        "section when answering questions. If you are unsure, say so."
    ),
    max_iterations=10,  # safety limit on reasoning loops
    verbose=True,
)
# Multi-turn conversation
response1 = agent.chat("What is the parental leave policy?")
print(response1)
response2 = agent.chat("Does that apply to adoptive parents as well?")
print(response2)
# Reset memory if needed
agent.reset()
Always set max_iterations on production agents. Without a limit, a confused
agent can enter an infinite loop of tool calls, consuming API credits rapidly. A value between
5 and 15 is reasonable for most use cases. If the agent hits the limit, it returns a message
indicating it could not complete the task.
O.5.5 The Workflows API
While agents are flexible, their autonomous nature makes them unpredictable. For production
systems that require deterministic control flow with clear error handling, LlamaIndex provides
the Workflows API. A workflow is a directed graph of steps
connected by events. Each step is a Python function decorated with
@step, and it receives an event, performs some work, and emits a new event that
triggers the next step.
from llama_index.core.workflow import Workflow, StartEvent, StopEvent, step, Event
class QueryEvent(Event):
    """Carries the user's query to the retrieval step."""
    query: str

class RetrievalResultEvent(Event):
    """Carries retrieved context to the synthesis step."""
    query: str
    context: str

class RAGWorkflow(Workflow):
    @step
    async def classify(self, ev: StartEvent) -> QueryEvent:
        """Classify and validate the incoming query."""
        query = ev.get("query", "")
        # Add any validation or preprocessing here
        return QueryEvent(query=query)

    @step
    async def retrieve(self, ev: QueryEvent) -> RetrievalResultEvent:
        """Retrieve relevant documents for the query."""
        # In practice, call your index retriever here
        context = f"[Retrieved context for: {ev.query}]"
        return RetrievalResultEvent(query=ev.query, context=context)

    @step
    async def synthesize(self, ev: RetrievalResultEvent) -> StopEvent:
        """Synthesize a final response from the query and context."""
        response = f"Answer to '{ev.query}' based on: {ev.context}"
        return StopEvent(result=response)
Each step declares its input and output event types in the function signature. The workflow engine uses these type annotations to wire the steps together automatically. This creates an explicit, inspectable data flow that is easier to test and debug than an autonomous agent.
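To see why type annotations alone are enough to wire a graph, here is a minimal stdlib-only sketch of the mechanism (not LlamaIndex's actual engine): inspect each step's signature, build a routing table from input event type to step, and dispatch events until a stop event appears.

```python
# Sketch of annotation-driven wiring, in the spirit of the Workflows engine.
# Illustrative only -- event classes and steps are simplified stand-ins.
import inspect

class Event: ...
class StartEvent(Event):
    def __init__(self, query: str): self.query = query
class QueryEvent(Event):
    def __init__(self, query: str): self.query = query
class StopEvent(Event):
    def __init__(self, result: str): self.result = result

def classify(ev: StartEvent) -> QueryEvent:
    return QueryEvent(query=ev.query.strip())

def synthesize(ev: QueryEvent) -> StopEvent:
    return StopEvent(result=f"Answer to '{ev.query}'")

def run_workflow(steps, start: StartEvent) -> str:
    # Build the routing table from each step's input annotation.
    routes = {}
    for fn in steps:
        first_param = next(iter(inspect.signature(fn).parameters.values()))
        routes[first_param.annotation] = fn
    # Dispatch events until a StopEvent is produced.
    ev = start
    while not isinstance(ev, StopEvent):
        ev = routes[type(ev)](ev)
    return ev.result

print(run_workflow([classify, synthesize], StartEvent(query="  return policy ")))
```

Because the graph is derived from signatures, adding a step or rerouting an event is a purely local change, which is what makes the data flow inspectable and testable.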
Running a Workflow
Workflows are asynchronous by design, which makes them efficient for I/O-bound operations
like API calls and database queries. You run a workflow by calling its .run()
method with the initial parameters.
import asyncio
async def main():
    workflow = RAGWorkflow()
    result = await workflow.run(query="What is the return policy?")
    print(result)
asyncio.run(main())
O.5.6 Workflows with Shared Context
Steps in a workflow often need to share state, such as accumulated results, configuration, or
references to indexes and LLMs. The workflow Context object provides a shared
key-value store that persists across all steps within a single workflow run.
from llama_index.core.workflow import (
Workflow, StartEvent, StopEvent, step, Event, Context
)
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
class ValidatedQueryEvent(Event):
    query: str
    is_valid: bool

class AnswerEvent(Event):
    answer: str

class ProductionRAGWorkflow(Workflow):
    @step
    async def setup(self, ctx: Context, ev: StartEvent) -> ValidatedQueryEvent:
        """Initialize shared resources and validate the query."""
        # Store the index in context for later steps
        documents = SimpleDirectoryReader("./data").load_data()
        index = VectorStoreIndex.from_documents(documents)
        await ctx.set("index", index)
        await ctx.set("query_count", 0)
        query = ev.get("query", "")
        is_valid = len(query.strip()) > 0
        return ValidatedQueryEvent(query=query, is_valid=is_valid)

    @step
    async def answer(self, ctx: Context, ev: ValidatedQueryEvent) -> StopEvent:
        """Retrieve and synthesize the answer."""
        if not ev.is_valid:
            return StopEvent(result="Please provide a valid query.")
        index = await ctx.get("index")
        engine = index.as_query_engine()
        # Use the async query API so the step does not block the event loop
        response = await engine.aquery(ev.query)
        count = await ctx.get("query_count")
        await ctx.set("query_count", count + 1)
        return StopEvent(result=str(response))
The Context is scoped to a single workflow run. Each invocation of
workflow.run() creates a fresh context. For state that must persist across
multiple runs (such as conversation history), store it externally and pass it in via
the StartEvent.
O.5.7 Multi-Agent Workflows
For complex applications, you can compose multiple agents into a single workflow. Each agent
specializes in a different domain or task, and the workflow coordinates their interactions.
LlamaIndex supports this through its AgentWorkflow class, which manages
handoffs between agents based on their declared capabilities.
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.llms.openai import OpenAI
# Create an agent workflow from a set of tools (here, a single
# multi-tool agent; multiple specialized agents can also be composed)
agent_workflow = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[weather_tool, policy_tool, calc_tool],
    llm=OpenAI(model="gpt-4o"),
    system_prompt=(
        "You are a helpful assistant that can check weather, "
        "look up company policies, and perform calculations."
    ),
)

# Run the multi-tool agent workflow
import asyncio

async def main():
    response = await agent_workflow.run(
        user_msg="What's the weather in London, and how many vacation days "
        "do I have left if I started with 20 and used 7?"
    )
    print(response)

asyncio.run(main())
Use agents when the task requires autonomous decision-making and you cannot predict the sequence of steps in advance. Use workflows when you need deterministic, auditable pipelines with explicit error handling. For many production systems, the best approach is to embed agents as steps within a workflow: the workflow provides the overall structure, while individual steps may use agents for sub-tasks that require flexible reasoning.
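Structurally, the hybrid pattern looks like this. The sketch below is plain Python, not LlamaIndex API: in a real system the deterministic steps would be @step functions in a Workflow, and the research_agent stand-in would be a call into a LlamaIndex agent.

```python
# Hybrid pattern sketch: a deterministic pipeline whose middle step
# delegates to an agent-like loop for the open-ended part of the task.

def validate(query: str) -> str:
    """Deterministic step: reject empty input before any LLM work."""
    if not query.strip():
        raise ValueError("empty query")
    return query.strip()

def research_agent(query: str, max_iterations: int = 5) -> str:
    """Stand-in for an agent step: iterates until it 'decides' it is done."""
    findings = []
    for i in range(max_iterations):
        findings.append(f"fact {i} about {query}")  # a tool call would go here
        if len(findings) >= 2:                      # the agent's stopping decision
            break
    return "; ".join(findings)

def format_report(query: str, findings: str) -> str:
    """Deterministic step: the workflow controls the output shape."""
    return f"Report on {query}: {findings}"

def hybrid_pipeline(query: str) -> str:
    q = validate(query)              # workflow provides structure...
    findings = research_agent(q)     # ...agent handles flexible reasoning...
    return format_report(q, findings)  # ...workflow controls the final format

print(hybrid_pipeline("parental leave"))
```

The outer pipeline stays auditable (fixed steps, predictable output shape) while the unpredictable reasoning is confined to one bounded step.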
Build a research assistant workflow. Create a Workflow with
three steps: (1) a planning step that decomposes a research question into sub-questions,
(2) a retrieval step that queries a VectorStoreIndex for each sub-question, and
(3) a synthesis step that combines the results into a structured research brief. Use the
Context to pass the sub-questions and intermediate results between steps. Test
with a complex, multi-faceted question and verify that each sub-question receives relevant
context.