Building Conversational AI with LLMs and Agents
Appendix O: LlamaIndex: Data Indexing and Query Engines

Agents, Tools, and Workflows in LlamaIndex

Big Picture

LlamaIndex is not just a retrieval framework: it also provides a full-featured agent layer that lets LLMs reason, use tools, and execute multi-step plans. This section covers the ReActAgent for tool-using agents, the FunctionTool and QueryEngineTool abstractions for wrapping capabilities, and the Workflows API for building event-driven, multi-step pipelines with explicit control flow. Together, these components let you move from simple question answering to sophisticated agentic applications.

O.5.1 The Agent Abstraction

An agent in LlamaIndex is an LLM that can decide which tools to call, interpret the results, and iterate until it has enough information to produce a final answer. Unlike a query engine (which follows a fixed retrieve-then-synthesize pipeline), an agent has autonomy: it chooses its own actions based on the task at hand. This makes agents well-suited for open-ended tasks that require multiple steps, conditional logic, or interaction with external systems.

LlamaIndex implements agents using the ReAct (Reasoning + Acting) paradigm, where the LLM alternates between a "thought" step (reasoning about what to do next) and an "action" step (calling a tool). The loop continues until the agent decides it has a final answer.
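In essence, the loop looks like the following framework-free sketch. The `choose_action` policy and the tool registry are hypothetical placeholders standing in for the LLM and the real tool layer, not LlamaIndex APIs:

```python
# A minimal, framework-free sketch of the ReAct loop. `choose_action`
# stands in for the LLM's "thought" step.

def react_loop(question, tools, choose_action, max_iterations=5):
    """Alternate thought and action steps until a final answer emerges."""
    observations = []
    for _ in range(max_iterations):
        # Thought: pick the next action given the question and what we know so far
        action = choose_action(question, observations)
        if action["type"] == "final_answer":
            return action["content"]
        # Action: call the chosen tool and record the observation
        result = tools[action["tool"]](**action["args"])
        observations.append(result)
    return "Stopped: iteration limit reached without a final answer."

# Toy policy: look up the weather once, then answer.
def scripted_policy(question, observations):
    if not observations:
        return {"type": "tool", "tool": "get_weather", "args": {"city": "Tokyo"}}
    return {"type": "final_answer", "content": f"Tokyo is {observations[0]}."}

tools = {"get_weather": lambda city: "80F and sunny"}
print(react_loop("What's the weather in Tokyo?", tools, scripted_policy))
# → Tokyo is 80F and sunny.
```

The real agent replaces `scripted_policy` with LLM calls, but the control flow (and the importance of an iteration cap) is the same.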

O.5.2 FunctionTool: Wrapping Python Functions

The simplest way to give an agent a capability is to wrap a Python function with FunctionTool. The agent sees the function's name, docstring, and parameter types, and can call it with appropriate arguments. This is ideal for integrating APIs, databases, calculators, and any other programmatic capability.

from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

def get_weather(city: str) -> str:
    """Get the current weather for a given city. Returns a weather summary string."""
    # In production, this would call a real weather API
    weather_data = {
        "New York": "72F, partly cloudy",
        "London": "58F, light rain",
        "Tokyo": "80F, sunny",
    }
    return weather_data.get(city, f"Weather data not available for {city}")

def convert_temperature(fahrenheit: float) -> float:
    """Convert a temperature from Fahrenheit to Celsius."""
    return round((fahrenheit - 32) * 5 / 9, 1)

# Wrap functions as tools
weather_tool = FunctionTool.from_defaults(fn=get_weather)
temp_tool = FunctionTool.from_defaults(fn=convert_temperature)

# Create a ReAct agent with these tools
agent = ReActAgent.from_tools(
    tools=[weather_tool, temp_tool],
    llm=OpenAI(model="gpt-4o"),
    verbose=True,  # print the reasoning trace
)

response = agent.chat("What's the weather in Tokyo? Give me the temperature in Celsius.")
print(response)

With verbose=True, you can observe the agent's reasoning trace. For the query above, you would see the agent first call get_weather("Tokyo"), parse the result to extract 80F, then call convert_temperature(80.0) to get 26.7C, and finally compose a natural language response combining both pieces of information.

Tip

Write clear, specific docstrings for your tool functions. The agent relies on these descriptions to decide when and how to use each tool. A vague docstring like "does stuff with data" will lead to poor tool selection. Include the parameter types, return format, and any constraints.
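For instance, a tool description written to this standard might look like the following (the function and its currency table are a hypothetical illustration with a stubbed body):

```python
def get_exchange_rate(base: str, target: str) -> str:
    """Get the latest exchange rate between two currencies.

    Args:
        base: Three-letter ISO 4217 currency code, e.g. "USD".
        target: Three-letter ISO 4217 currency code, e.g. "EUR".

    Returns:
        A string such as "1 USD = 0.93 EUR", or an error message if
        either code is not recognized.
    """
    # Stub: a real implementation would call an FX rate API here.
    rates = {("USD", "EUR"): 0.93}
    rate = rates.get((base.upper(), target.upper()))
    if rate is None:
        return f"Unknown currency pair: {base}/{target}"
    return f"1 {base.upper()} = {rate} {target.upper()}"
```

The docstring names the expected input format, the return shape, and the failure behavior, which is exactly the information the agent needs to call the tool correctly.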

O.5.3 QueryEngineTool: RAG as a Tool

One of LlamaIndex's most powerful patterns is wrapping a query engine as an agent tool. This lets the agent decide when to search your knowledge base and what query to send. Combined with other tools (calculators, APIs, code execution), this creates agents that can both retrieve knowledge and take actions.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata, FunctionTool
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

# Build a knowledge base
documents = SimpleDirectoryReader("./data/company_policies").load_data()
index = VectorStoreIndex.from_documents(documents)

# Wrap the query engine as an agent tool
policy_tool = QueryEngineTool(
    query_engine=index.as_query_engine(similarity_top_k=3),
    metadata=ToolMetadata(
        name="company_policies",
        description=(
            "Search the company policy database. Use this for questions about "
            "HR policies, vacation, benefits, expense reports, and compliance."
        ),
    ),
)

# Add a calculator tool for numerical questions
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression and return the result as a string."""
    try:
        # Caution: eval is risky even with empty builtins when the input
        # comes from an LLM; prefer a restricted parser in production.
        result = eval(expression, {"__builtins__": {}})
        return str(result)
    except Exception as e:
        return f"Error: {e}"

calc_tool = FunctionTool.from_defaults(fn=calculate)

# Create an agent with both tools
agent = ReActAgent.from_tools(
    tools=[policy_tool, calc_tool],
    llm=OpenAI(model="gpt-4o"),
    verbose=True,
)

# The agent decides which tool to use based on the query
response = agent.chat("How many vacation days do I get? If I use 5, how many remain?")
print(response)

The agent will first query the policy knowledge base to find the vacation day allowance, then use the calculator to subtract 5 from that number. This combination of retrieval and computation is a common pattern in real-world agentic applications.
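One caveat: the `eval`-based calculator above is fine for a demo, but evaluating LLM-generated strings is risky in production. A safer sketch walks the expression's syntax tree with the standard-library `ast` module and permits only arithmetic nodes:

```python
import ast
import operator

# Allowed operators for the restricted arithmetic evaluator.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calculate(expression: str) -> str:
    """Evaluate arithmetic only; reject names, calls, and attribute access."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")
    try:
        return str(_eval(ast.parse(expression, mode="eval").body))
    except Exception as e:
        return f"Error: {e}"

print(safe_calculate("20 - 7"))           # → 13
print(safe_calculate("__import__('os')")) # → Error: unsupported expression
```

This drop-in replacement keeps the same string-in, string-out contract as the `calculate` tool, so it can be wrapped with `FunctionTool.from_defaults` in exactly the same way.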

O.5.4 Agent Configuration and Memory

Agents support conversational memory out of the box. Each call to agent.chat() appends the exchange to a chat history, so follow-up questions work naturally. You can also configure the agent's system prompt, maximum iterations (to prevent runaway loops), and the underlying LLM.

from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

agent = ReActAgent.from_tools(
    tools=[policy_tool, calc_tool],
    llm=OpenAI(model="gpt-4o", temperature=0),
    system_prompt=(
        "You are an HR assistant for Acme Corp. Always cite the specific policy "
        "section when answering questions. If you are unsure, say so."
    ),
    max_iterations=10,  # safety limit on reasoning loops
    verbose=True,
)

# Multi-turn conversation
response1 = agent.chat("What is the parental leave policy?")
print(response1)

response2 = agent.chat("Does that apply to adoptive parents as well?")
print(response2)

# Reset memory if needed
agent.reset()
Warning

Always set max_iterations on production agents. Without a limit, a confused agent can enter an infinite loop of tool calls, consuming API credits rapidly. A value between 5 and 15 is reasonable for most use cases. If the agent hits the limit, it returns a message indicating it could not complete the task.

O.5.5 The Workflows API

While agents are flexible, their autonomous nature makes them unpredictable. For production systems that require deterministic control flow with clear error handling, LlamaIndex provides the Workflows API. A workflow is a directed graph of steps connected by events. Each step is a Python function decorated with @step; it receives an event, performs some work, and emits a new event that triggers the next step.

from llama_index.core.workflow import Workflow, StartEvent, StopEvent, step, Event

class QueryEvent(Event):
    """Carries the user's query to the retrieval step."""
    query: str

class RetrievalResultEvent(Event):
    """Carries retrieved context to the synthesis step."""
    query: str
    context: str

class RAGWorkflow(Workflow):
    @step
    async def classify(self, ev: StartEvent) -> QueryEvent:
        """Classify and validate the incoming query."""
        query = ev.get("query", "")
        # Add any validation or preprocessing here
        return QueryEvent(query=query)

    @step
    async def retrieve(self, ev: QueryEvent) -> RetrievalResultEvent:
        """Retrieve relevant documents for the query."""
        # In practice, call your index retriever here
        context = f"[Retrieved context for: {ev.query}]"
        return RetrievalResultEvent(query=ev.query, context=context)

    @step
    async def synthesize(self, ev: RetrievalResultEvent) -> StopEvent:
        """Synthesize a final response from the query and context."""
        response = f"Answer to '{ev.query}' based on: {ev.context}"
        return StopEvent(result=response)

Each step declares its input and output event types in the function signature. The workflow engine uses these type annotations to wire the steps together automatically. This creates an explicit, inspectable data flow that is easier to test and debug than an autonomous agent.
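To see how annotation-driven wiring works in principle, here is a standard-library-only sketch that recovers the step graph from the signatures alone. The event classes here are local stand-ins for illustration, not the LlamaIndex ones:

```python
import inspect
from typing import get_type_hints

# Local stand-in event classes, used only for this illustration.
class StartEvent: ...
class QueryEvent: ...
class RetrievalResultEvent: ...
class StopEvent: ...

class RAGWorkflowSketch:
    async def classify(self, ev: StartEvent) -> QueryEvent: ...
    async def retrieve(self, ev: QueryEvent) -> RetrievalResultEvent: ...
    async def synthesize(self, ev: RetrievalResultEvent) -> StopEvent: ...

def event_graph(workflow_cls):
    """Map each input event type to the step that consumes it and its output."""
    graph = {}
    for name, fn in inspect.getmembers(workflow_cls, inspect.isfunction):
        hints = get_type_hints(fn)
        if "ev" in hints and "return" in hints:
            graph[hints["ev"].__name__] = (name, hints["return"].__name__)
    return graph

print(event_graph(RAGWorkflowSketch))
# {'StartEvent': ('classify', 'QueryEvent'),
#  'QueryEvent': ('retrieve', 'RetrievalResultEvent'),
#  'RetrievalResultEvent': ('synthesize', 'StopEvent')}
```

The real engine does considerably more (validation, branching on union return types, concurrency), but the core idea is the same: the type annotations are the wiring.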

Running a Workflow

Workflows are asynchronous by design, which makes them efficient for I/O-bound operations like API calls and database queries. You run a workflow by calling its .run() method with the initial parameters.

import asyncio

async def main():
    workflow = RAGWorkflow()
    result = await workflow.run(query="What is the return policy?")
    print(result)

asyncio.run(main())

O.5.6 Workflows with Shared Context

Steps in a workflow often need to share state, such as accumulated results, configuration, or references to indexes and LLMs. The workflow Context object provides a shared, typed store that persists across all steps within a single workflow execution.

from llama_index.core.workflow import (
    Workflow, StartEvent, StopEvent, step, Event, Context
)
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

class ValidatedQueryEvent(Event):
    query: str
    is_valid: bool

class AnswerEvent(Event):
    answer: str

class ProductionRAGWorkflow(Workflow):
    @step
    async def setup(self, ctx: Context, ev: StartEvent) -> ValidatedQueryEvent:
        """Initialize shared resources and validate the query."""
        # Store the index in context for later steps
        documents = SimpleDirectoryReader("./data").load_data()
        index = VectorStoreIndex.from_documents(documents)
        await ctx.set("index", index)
        await ctx.set("query_count", 0)

        query = ev.get("query", "")
        is_valid = len(query.strip()) > 0
        return ValidatedQueryEvent(query=query, is_valid=is_valid)

    @step
    async def answer(self, ctx: Context, ev: ValidatedQueryEvent) -> StopEvent:
        """Retrieve and synthesize the answer."""
        if not ev.is_valid:
            return StopEvent(result="Please provide a valid query.")

        index = await ctx.get("index")
        engine = index.as_query_engine()
        response = engine.query(ev.query)

        count = await ctx.get("query_count")
        await ctx.set("query_count", count + 1)

        return StopEvent(result=str(response))
Note

The Context is scoped to a single workflow run. Each invocation of workflow.run() creates a fresh context. For state that must persist across multiple runs (such as conversation history), store it externally and pass it in via the StartEvent.

O.5.7 Multi-Agent Workflows

For complex applications, you can compose multiple agents into a single workflow. Each agent specializes in a different domain or task, and the workflow coordinates their interactions. LlamaIndex supports this through its AgentWorkflow class, which manages handoffs between agents based on their declared capabilities.

from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.llms.openai import OpenAI

# Define specialized agents within a workflow
agent_workflow = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[weather_tool, policy_tool, calc_tool],
    llm=OpenAI(model="gpt-4o"),
    system_prompt=(
        "You are a helpful assistant that can check weather, "
        "look up company policies, and perform calculations."
    ),
)

# Run the multi-tool agent workflow
import asyncio

async def main():
    response = await agent_workflow.run(
        user_msg="What's the weather in London, and how many vacation days do I have left if I started with 20 and used 7?"
    )
    print(response)

asyncio.run(main())
Tip

Use agents when the task requires autonomous decision-making and you cannot predict the sequence of steps in advance. Use workflows when you need deterministic, auditable pipelines with explicit error handling. For many production systems, the best approach is to embed agents as steps within a workflow: the workflow provides the overall structure, while individual steps may use agents for sub-tasks that require flexible reasoning.

Exercise O.5

Build a research assistant workflow. Create a Workflow with three steps: (1) a planning step that decomposes a research question into sub-questions, (2) a retrieval step that queries a VectorStoreIndex for each sub-question, and (3) a synthesis step that combines the results into a structured research brief. Use the Context to pass the sub-questions and intermediate results between steps. Test with a complex, multi-faceted question and verify that each sub-question receives relevant context.