Beyond the basics of agents, tasks, and crews lies a rich set of advanced patterns. This section covers delegation mechanics (how agents ask each other for help), the three memory systems (short-term, long-term, and entity), callback hooks for observability and control, task guardrails for output validation, and strategies for testing crews. These patterns are what separate toy demos from production-grade multi-agent systems.
1. Agent Delegation in Depth
Delegation allows an agent to ask another agent in the crew to handle a subtask. When `allow_delegation=True` is set, CrewAI adds two special tools to the agent's toolkit: Delegate Work and Ask Question. The agent can use "Delegate Work" to hand off a complete subtask to a coworker, or "Ask Question" to get a specific piece of information from another agent without fully delegating the task.
The following example shows a lead researcher who delegates specific research areas to a specialist.
```python
from crewai import Agent, Task, Crew

specialist = Agent(
    role="AI Safety Researcher",
    goal="Provide expert analysis on AI safety topics",
    backstory=(
        "You are a published AI safety researcher with expertise in "
        "alignment, interpretability, and red-teaming."
    ),
    allow_delegation=False,  # Leaf agent: does not delegate further
)

lead = Agent(
    role="Lead Research Coordinator",
    goal="Produce a comprehensive AI industry report",
    backstory=(
        "You coordinate a team of specialists. You identify which "
        "questions require domain expertise and delegate accordingly."
    ),
    allow_delegation=True,  # Can delegate to specialist
)

report_task = Task(
    description=(
        "Write a report on the state of AI in 2025, covering capabilities, "
        "safety concerns, and regulatory developments. Delegate the safety "
        "analysis to the appropriate team member."
    ),
    expected_output="A 3-section report covering capabilities, safety, and regulation.",
    agent=lead,
)

crew = Crew(agents=[lead, specialist], tasks=[report_task])

# During execution, 'lead' may call:
#   Delegate Work to "AI Safety Researcher": "Analyze current AI safety challenges..."
#   or Ask Question to "AI Safety Researcher": "What are the top 3 alignment approaches?"
```
Delegation chains can grow deep if multiple agents delegate to each other. To prevent runaway delegation, set `allow_delegation=False` on "leaf" agents that should do their own work, and enable delegation only on coordinator or manager agents.
2. Short-Term Memory
Short-term memory stores information within a single crew execution. It allows agents to recall earlier observations, tool results, and reasoning steps from the current run. This is implemented using RAG (retrieval-augmented generation): recent interactions are embedded and stored in a vector store, then retrieved when relevant to the current step. Short-term memory is enabled automatically when you set `memory=True` on the crew.
```python
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Iterative Researcher",
    goal="Build comprehensive knowledge through multiple search rounds",
    backstory="You remember what you have found and avoid redundant searches.",
)

# Three tasks that benefit from short-term memory:
# the agent remembers results from earlier tasks
task1 = Task(
    description="Find the top 3 open-source LLM frameworks by GitHub stars.",
    expected_output="A list of 3 frameworks with star counts.",
    agent=researcher,
)
task2 = Task(
    description="For each framework found earlier, identify the primary programming language.",
    expected_output="Framework names with their primary languages.",
    agent=researcher,
    context=[task1],
)
task3 = Task(
    description="Compare the frameworks on ease of use, documentation quality, and community size.",
    expected_output="A comparison table.",
    agent=researcher,
    context=[task1, task2],
)

crew = Crew(
    agents=[researcher],
    tasks=[task1, task2, task3],
    memory=True,  # Enables short-term (and other) memory types
)
```
3. Long-Term Memory
Long-term memory persists across multiple crew executions, stored on disk. When enabled, CrewAI saves task results and agent interactions to a local database. On subsequent runs, agents can recall what they learned previously, improving over time. This is particularly useful for crews that run repeatedly on similar topics (e.g., a daily news analysis crew).
```python
from crewai import Crew

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    memory=True,  # Enables all memory types including long-term
    verbose=True,
)

# Run 1: First execution; no prior memory
result1 = crew.kickoff(inputs={"topic": "transformer architectures"})

# Run 2: Agents can recall insights from Run 1
result2 = crew.kickoff(inputs={"topic": "attention mechanisms"})

# The researcher may recall: "In my previous research on transformer architectures,
# I found that attention mechanisms are the core innovation..."
```
Long-term memory is stored locally by default. For production deployments where multiple instances share state, configure an external vector store as the memory backend. Check the CrewAI documentation for supported backends.
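To relocate the default local storage, CrewAI reads the `CREWAI_STORAGE_DIR` environment variable to decide where its memory databases live; treat the exact variable name as version-dependent and verify it against your installed release. A minimal sketch (the path is illustrative):

```python
import os

# Illustrative path: point memory storage at a directory your deployment
# controls instead of the platform-specific app-data default.
os.environ["CREWAI_STORAGE_DIR"] = "./crew_storage"

# Set this BEFORE constructing the crew so the memory backends pick it up:
# from crewai import Crew
# crew = Crew(agents=[researcher, writer], tasks=[research_task], memory=True)

print(os.environ["CREWAI_STORAGE_DIR"])
```

Pointing all instances at a shared volume is not a substitute for a proper shared backend, but it makes local memory easy to inspect, back up, or wipe between test runs.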
4. Entity Memory
Entity memory tracks specific entities (people, companies, products, concepts) that agents encounter during execution. CrewAI automatically extracts entity information from agent interactions and stores it in a structured format. When the same entity appears in a later task, the agent can recall what it already knows about that entity.
```python
from crewai import Agent, Task, Crew

analyst = Agent(
    role="Company Analyst",
    goal="Track and analyze technology companies",
    backstory="You maintain detailed profiles of companies you research.",
)

# Task 1: Research a company
profile_task = Task(
    description="Create a detailed profile of Anthropic, including founding date, key products, and funding.",
    expected_output="A company profile with at least 5 data points.",
    agent=analyst,
)

# Task 2: Later, the agent recalls Anthropic from entity memory
comparison_task = Task(
    description="Compare Anthropic with OpenAI on product strategy and safety approach.",
    expected_output="A side-by-side comparison table.",
    agent=analyst,
    context=[profile_task],
)

crew = Crew(
    agents=[analyst],
    tasks=[profile_task, comparison_task],
    memory=True,  # Entity memory is part of the memory system
)
```
5. Crew Callbacks and Step Callbacks
Callbacks provide hooks into the crew execution lifecycle. CrewAI supports several callback types: task callbacks (covered in Section N.2), step callbacks that fire after every agent reasoning step, and crew-level callbacks that fire on overall completion. These are essential for logging, monitoring dashboards, cost tracking, and integration with external systems.
```python
from crewai import Agent, Task, Crew

def on_step(step_output):
    """Called after every agent reasoning step."""
    print(f"[Step] Agent: {step_output.agent}")
    print(f"[Step] Output preview: {str(step_output.output)[:100]}")
    # Track token usage, log to monitoring system, etc.

def on_task_complete(task_output):
    """Called when a task finishes."""
    print(f"[Task Complete] Length: {len(task_output.raw)} chars")

researcher = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="You are a researcher.",
    step_callback=on_step,  # Fires after each reasoning step
)

task = Task(
    description="Research the latest trends in edge AI deployment.",
    expected_output="A summary of 3 key trends.",
    agent=researcher,
    callback=on_task_complete,  # Fires when this task completes
)

crew = Crew(agents=[researcher], tasks=[task])
```
Step callbacks are powerful for cost monitoring. Track the cumulative token count in a step callback and raise an exception (or gracefully stop the crew) if costs exceed a budget threshold. This prevents runaway spending during development.
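The budget pattern just described can be sketched as a small callable class that you would pass as an agent's `step_callback`. The chars-per-token heuristic and the `BudgetExceeded` name are illustrative choices, not CrewAI APIs; swap in a real tokenizer for accurate accounting:

```python
class BudgetExceeded(RuntimeError):
    """Raised when estimated spend crosses the configured ceiling."""

class BudgetGuard:
    """Step callback that halts execution once an estimated token budget is spent."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def __call__(self, step_output) -> None:
        # Rough estimate: ~4 characters per token. Replace with a real
        # tokenizer (e.g., tiktoken) for production accounting.
        self.used += len(str(step_output)) // 4
        if self.used > self.max_tokens:
            raise BudgetExceeded(
                f"Estimated {self.used} tokens used; budget is {self.max_tokens}."
            )

# Wiring (commented out so the sketch runs standalone):
# researcher = Agent(..., step_callback=BudgetGuard(max_tokens=50_000))

guard = BudgetGuard(max_tokens=10)
guard("a short step output")  # well under the tiny test budget
try:
    guard("another step output that pushes us over the small test budget")
except BudgetExceeded as exc:
    print(f"Stopped: {exc}")
```

Raising from a step callback is a blunt instrument; for a gentler stop, have the callback set a flag that your surrounding application checks between `kickoff()` calls.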
6. Task Guardrails
Task guardrails are validation functions that check an agent's output before it is accepted. If the guardrail rejects the output, CrewAI sends the agent feedback and asks it to try again. Guardrails are useful for enforcing format requirements, checking for prohibited content, or validating that structured output conforms to business rules.
```python
import re

from crewai import Agent, Task
from crewai.tasks.task_output import TaskOutput

def validate_word_count(output: TaskOutput) -> tuple[bool, str]:
    """Guardrail: ensure the output is at least 200 words."""
    word_count = len(output.raw.split())
    if word_count < 200:
        return (False, f"Output is only {word_count} words. Please expand to at least 200 words.")
    return (True, output.raw)  # On success, return the validated output

def validate_no_pii(output: TaskOutput) -> tuple[bool, str]:
    """Guardrail: reject output containing email addresses."""
    if re.search(r'[\w.+-]+@[\w-]+\.[\w.-]+', output.raw):
        return (False, "Output contains an email address. Please remove all PII.")
    return (True, output.raw)  # On success, return the validated output

writer = Agent(
    role="Content Writer",
    goal="Write detailed, privacy-safe content",
    backstory="You write thorough content and never include personal information.",
)

guarded_task = Task(
    description="Write a product description for our new AI analytics platform.",
    expected_output="A product description of at least 200 words, free of any PII.",
    agent=writer,
    guardrails=[validate_word_count, validate_no_pii],  # Both must pass
)
```
7. Testing Crews
Testing multi-agent systems is challenging because outputs are non-deterministic and execution involves external API calls. CrewAI provides a built-in testing facility through the `crew.test()` method, which runs the crew multiple times and collects performance metrics. You can also test crews by mocking LLM responses and tool outputs.
```python
from unittest.mock import patch, MagicMock

from crewai import Agent, Task, Crew

# Strategy 1: Use crew.test() for end-to-end evaluation
crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
# crew.test(n_iterations=3, openai_model_name="gpt-4o")
# This runs the crew 3 times and reports average execution time,
# token usage, and output quality metrics.

# Strategy 2: Unit test with mocked LLM
def test_research_task():
    """Test that the research task produces structured output."""
    mock_result = MagicMock()
    mock_result.raw = "1. Framework A (50k stars)\n2. Framework B (40k stars)"
    with patch.object(researcher, 'execute_task', return_value=mock_result):
        result = researcher.execute_task(research_task)
        assert "Framework A" in result.raw
        assert len(result.raw.split("\n")) >= 2

# Strategy 3: Validate crew output structure
def test_crew_output_format():
    """Validate that the crew produces the expected output format."""
    result = crew.kickoff(inputs={"topic": "test topic"})
    assert result.raw is not None
    assert len(result.raw) > 100  # Minimum length check
    assert result.token_usage.total_tokens > 0
```
End-to-end tests with real LLM calls are slow and costly. Use them sparingly (e.g., nightly CI runs). For development iteration, mock LLM responses and focus unit tests on your custom tools, guardrails, and callback logic.
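Guardrails are a good target for the fast unit tests just described: they are plain functions that only read `output.raw`, so `types.SimpleNamespace` can stand in for `TaskOutput` and no crew, LLM, or API key is needed. A self-contained sketch, re-defining the email guardrail from Section 6 inline:

```python
import re
from types import SimpleNamespace

def validate_no_pii(output) -> tuple[bool, str]:
    """Guardrail: reject output containing email addresses."""
    if re.search(r'[\w.+-]+@[\w-]+\.[\w.-]+', output.raw):
        return (False, "Output contains an email address. Please remove all PII.")
    return (True, output.raw)

def test_guardrail_rejects_email():
    fake = SimpleNamespace(raw="Contact us at sales@example.com for pricing.")
    ok, message = validate_no_pii(fake)
    assert ok is False
    assert "email" in message

def test_guardrail_accepts_clean_text():
    fake = SimpleNamespace(raw="Our platform analyzes telemetry in real time.")
    ok, result = validate_no_pii(fake)
    assert ok is True
    assert result == fake.raw

test_guardrail_rejects_email()
test_guardrail_accepts_clean_text()
print("guardrail unit tests passed")
```

Because these tests exercise only your own validation logic, they run in milliseconds and belong in the inner development loop, unlike `crew.test()`.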
8. CrewAI Enterprise Features
CrewAI offers an enterprise platform (CrewAI+) with additional features beyond the open-source library. These include a visual crew builder, deployment infrastructure, observability dashboards, and managed memory backends. While the open-source library covers all the core orchestration features, the enterprise platform simplifies deployment and monitoring at scale.
| Feature | Open Source | CrewAI Enterprise |
|---|---|---|
| Agent, Task, Crew orchestration | Yes | Yes |
| Sequential and hierarchical processes | Yes | Yes |
| Memory (short-term, long-term, entity) | Yes (local storage) | Yes (managed backends) |
| Visual crew builder | No | Yes |
| One-click deployment | No | Yes |
| Observability dashboard | Verbose logs only | Full dashboard with metrics |
| Crew versioning and rollback | Manual (Git) | Built-in |
9. Production Checklist
Before deploying a CrewAI application to production, review the following checklist. Each item addresses a common source of failures in multi-agent systems.
- **Set `max_iter` on all agents.** Prevent infinite loops by ensuring every agent has a bounded iteration limit.
- **Disable delegation on leaf agents.** Only coordinator agents should delegate. Set `allow_delegation=False` on agents that perform concrete work.
- **Configure `max_rpm`.** Protect against rate limiting by setting a crew-level request cap below your API quota.
- **Add guardrails to critical tasks.** Validate outputs before they flow downstream. Catch format errors, PII leaks, and hallucinated data early.
- **Use structured output for machine-consumed results.** Prefer `output_pydantic` for tasks whose output feeds into code rather than humans.
- **Implement step callbacks for cost tracking.** Monitor token usage per step and set budget alerts.
- **Handle tool errors gracefully.** Return descriptive error strings from tools; do not let unhandled exceptions crash the crew.
- **Test with `crew.test()` before deploying.** Run at least 3 iterations to catch non-deterministic failures.
- **Set `verbose=False` in production.** Verbose logging adds overhead; use callbacks for structured observability instead.
- **Pin your model versions.** Specify exact model names (e.g., `gpt-4o-2024-08-06`) to avoid behavior changes from model updates.
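Several of these checklist items reduce to constructor arguments. A configuration sketch (the parameter values are illustrative, not recommendations; tune them to your workload and quota):

```python
from crewai import Agent, Task, Crew

worker = Agent(
    role="Report Writer",
    goal="Write the quarterly report",
    backstory="You produce concrete deliverables.",
    llm="gpt-4o-2024-08-06",   # Pinned model version
    max_iter=15,               # Bounded iteration limit
    allow_delegation=False,    # Leaf agent: performs work, never delegates
)

report = Task(
    description="Write the quarterly report.",
    expected_output="A structured report.",
    agent=worker,
)

crew = Crew(
    agents=[worker],
    tasks=[report],
    max_rpm=30,     # Crew-level request cap below the API quota
    verbose=False,  # Use callbacks for structured observability instead
)
```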
Advanced CrewAI patterns transform basic multi-agent demos into robust production systems. Delegation enables flexible task routing; memory systems (short-term, long-term, entity) give agents persistent knowledge; callbacks and guardrails provide observability and quality control. Use the production checklist to verify that your crew is ready for real-world deployment. With the foundations from this appendix, you can now build sophisticated multi-agent applications using CrewAI.