Beyond the basics of agents, tasks, and crews lies a rich set of advanced patterns. This section covers delegation mechanics (how agents ask each other for help), the three memory systems (short-term, long-term, and entity), callback hooks for observability and control, task guardrails for output validation, and strategies for testing crews. These patterns are what separate toy demos from production-grade multi-agent systems.
1. Agent Delegation in Depth
Delegation allows an agent to ask another agent in the crew to handle a subtask. When `allow_delegation=True` is set, CrewAI adds two special tools to the agent's toolkit: Delegate Work and Ask Question. The agent can use "Delegate Work" to hand off a complete subtask to a coworker, or "Ask Question" to get a specific piece of information from another agent without fully delegating the task.
The following example shows a lead researcher who delegates specific research areas to a specialist.
```python
from crewai import Agent, Task, Crew

specialist = Agent(
    role="AI Safety Researcher",
    goal="Provide expert analysis on AI safety topics",
    backstory=(
        "You are a published AI safety researcher with expertise in "
        "alignment, interpretability, and red-teaming."
    ),
    allow_delegation=False,  # Leaf agent: does not delegate further
)

lead = Agent(
    role="Lead Research Coordinator",
    goal="Produce a comprehensive AI industry report",
    backstory=(
        "You coordinate a team of specialists. You identify which "
        "questions require domain expertise and delegate accordingly."
    ),
    allow_delegation=True,  # Can delegate to specialist
)

report_task = Task(
    description=(
        "Write a report on the state of AI in 2025, covering capabilities, "
        "safety concerns, and regulatory developments. Delegate the safety "
        "analysis to the appropriate team member."
    ),
    expected_output="A 3-section report covering capabilities, safety, and regulation.",
    agent=lead,
)

crew = Crew(agents=[lead, specialist], tasks=[report_task])

# During execution, 'lead' may call:
#   Delegate Work to "AI Safety Researcher": "Analyze current AI safety challenges..."
#   or Ask Question to "AI Safety Researcher": "What are the top 3 alignment approaches?"
```
Delegation chains can grow deep if multiple agents delegate to each other. To prevent runaway delegation, set `allow_delegation=False` on "leaf" agents that should do their own work, and enable delegation only on coordinator or manager agents.
2. Short-Term Memory
Short-term memory stores information within a single crew execution. It allows agents to recall earlier observations, tool results, and reasoning steps from the current run. This is implemented using RAG (retrieval-augmented generation): recent interactions are embedded and stored in a vector store, then retrieved when relevant to the current step. Short-term memory is enabled automatically when you set `memory=True` on the crew.
```python
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Iterative Researcher",
    goal="Build comprehensive knowledge through multiple search rounds",
    backstory="You remember what you have found and avoid redundant searches.",
)

# Three tasks that benefit from short-term memory:
# the agent remembers results from earlier tasks
task1 = Task(
    description="Find the top 3 open-source LLM frameworks by GitHub stars.",
    expected_output="A list of 3 frameworks with star counts.",
    agent=researcher,
)
task2 = Task(
    description="For each framework found earlier, identify the primary programming language.",
    expected_output="Framework names with their primary languages.",
    agent=researcher,
    context=[task1],
)
task3 = Task(
    description="Compare the frameworks on ease of use, documentation quality, and community size.",
    expected_output="A comparison table.",
    agent=researcher,
    context=[task1, task2],
)

crew = Crew(
    agents=[researcher],
    tasks=[task1, task2, task3],
    memory=True,  # Enables short-term (and other) memory types
)
```
3. Long-Term Memory
Long-term memory persists across multiple crew executions, stored on disk. When enabled, CrewAI saves task results and agent interactions to a local database. On subsequent runs, agents can recall what they learned previously, improving over time. This is particularly useful for crews that run repeatedly on similar topics (e.g., a daily news analysis crew).
```python
from crewai import Crew

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    memory=True,  # Enables all memory types including long-term
    verbose=True,
)

# Run 1: First execution; no prior memory
result1 = crew.kickoff(inputs={"topic": "transformer architectures"})

# Run 2: Agents can recall insights from Run 1
result2 = crew.kickoff(inputs={"topic": "attention mechanisms"})

# The researcher may recall: "In my previous research on transformer architectures,
# I found that attention mechanisms are the core innovation..."
```
Long-term memory is stored locally by default. For production deployments where multiple instances share state, configure an external vector store as the memory backend. Check the CrewAI documentation for supported backends.
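To relocate the default local storage, CrewAI reads the `CREWAI_STORAGE_DIR` environment variable to decide where its memory databases live; treat the exact variable name as version-dependent and verify it against your installed release. A minimal sketch (the path is illustrative):

```python
import os

# Illustrative path: point memory storage at a directory your deployment
# controls instead of the platform-specific app-data default.
os.environ["CREWAI_STORAGE_DIR"] = "./crew_storage"

# Set this BEFORE constructing the crew so the memory backends pick it up:
# from crewai import Crew
# crew = Crew(agents=[researcher, writer], tasks=[research_task], memory=True)

print(os.environ["CREWAI_STORAGE_DIR"])
```

Pointing all instances at a shared volume is not a substitute for a proper shared backend, but it makes local memory easy to inspect, back up, or wipe between test runs.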
4. Entity Memory
Entity memory tracks specific entities (people, companies, products, concepts) that agents encounter during execution. CrewAI automatically extracts entity information from agent interactions and stores it in a structured format. When the same entity appears in a later task, the agent can recall what it already knows about that entity.
```python
from crewai import Agent, Task, Crew

analyst = Agent(
    role="Company Analyst",
    goal="Track and analyze technology companies",
    backstory="You maintain detailed profiles of companies you research.",
)

# Task 1: Research a company
profile_task = Task(
    description="Create a detailed profile of Anthropic, including founding date, key products, and funding.",
    expected_output="A company profile with at least 5 data points.",
    agent=analyst,
)

# Task 2: Later, the agent recalls Anthropic from entity memory
comparison_task = Task(
    description="Compare Anthropic with OpenAI on product strategy and safety approach.",
    expected_output="A side-by-side comparison table.",
    agent=analyst,
    context=[profile_task],
)

crew = Crew(
    agents=[analyst],
    tasks=[profile_task, comparison_task],
    memory=True,  # Entity memory is part of the memory system
)
```
5. Crew Callbacks and Step Callbacks
Callbacks provide hooks into the crew execution lifecycle. CrewAI supports several callback types: task callbacks (covered in Section N.2), step callbacks that fire after every agent reasoning step, and crew-level callbacks that fire on overall completion. These are essential for logging, monitoring dashboards, cost tracking, and integration with external systems.
```python
from crewai import Agent, Task, Crew

def on_step(step_output):
    """Called after every agent reasoning step."""
    print(f"[Step] Agent: {step_output.agent}")
    print(f"[Step] Output preview: {str(step_output.output)[:100]}")
    # Track token usage, log to monitoring system, etc.

def on_task_complete(task_output):
    """Called when a task finishes."""
    print(f"[Task Complete] Length: {len(task_output.raw)} chars")

researcher = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="You are a researcher.",
    step_callback=on_step,  # Fires after each reasoning step
)

task = Task(
    description="Research the latest trends in edge AI deployment.",
    expected_output="A summary of 3 key trends.",
    agent=researcher,
    callback=on_task_complete,  # Fires when this task completes
)

crew = Crew(agents=[researcher], tasks=[task])
```
Step callbacks are powerful for cost monitoring. Track the cumulative token count in a step callback and raise an exception (or gracefully stop the crew) if costs exceed a budget threshold. This prevents runaway spending during development.
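The budget pattern just described can be sketched as a small callable class that you would pass as an agent's `step_callback`. The chars-per-token heuristic and the `BudgetExceeded` name are illustrative choices, not CrewAI APIs; swap in a real tokenizer for accurate accounting:

```python
class BudgetExceeded(RuntimeError):
    """Raised when estimated spend crosses the configured ceiling."""

class BudgetGuard:
    """Step callback that halts execution once an estimated token budget is spent."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def __call__(self, step_output) -> None:
        # Rough estimate: ~4 characters per token. Replace with a real
        # tokenizer (e.g., tiktoken) for production accounting.
        self.used += len(str(step_output)) // 4
        if self.used > self.max_tokens:
            raise BudgetExceeded(
                f"Estimated {self.used} tokens used; budget is {self.max_tokens}."
            )

# Wiring (commented out so the sketch runs standalone):
# researcher = Agent(..., step_callback=BudgetGuard(max_tokens=50_000))

guard = BudgetGuard(max_tokens=10)
guard("a short step output")  # well under the tiny test budget
try:
    guard("another step output that pushes us over the small test budget")
except BudgetExceeded as exc:
    print(f"Stopped: {exc}")
```

Raising from a step callback is a blunt instrument; for a gentler stop, have the callback set a flag that your surrounding application checks between `kickoff()` calls.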
6. Task Guardrails
Task guardrails are validation functions that check an agent's output before it is accepted. If the guardrail rejects the output, CrewAI sends the agent feedback and asks it to try again. Guardrails are useful for enforcing format requirements, checking for prohibited content, or validating that structured output conforms to business rules.
```python
import re

from crewai import Agent, Task
from crewai.tasks.task_output import TaskOutput

def validate_word_count(output: TaskOutput) -> tuple[bool, str]:
    """Guardrail: ensure the output is at least 200 words."""
    word_count = len(output.raw.split())
    if word_count < 200:
        return (False, f"Output is only {word_count} words. Please expand to at least 200 words.")
    return (True, output.raw)  # On success, return the validated output

def validate_no_pii(output: TaskOutput) -> tuple[bool, str]:
    """Guardrail: reject output containing email addresses."""
    if re.search(r'[\w.+-]+@[\w-]+\.[\w.-]+', output.raw):
        return (False, "Output contains an email address. Please remove all PII.")
    return (True, output.raw)  # On success, return the validated output

writer = Agent(
    role="Content Writer",
    goal="Write detailed, privacy-safe content",
    backstory="You write thorough content and never include personal information.",
)

guarded_task = Task(
    description="Write a product description for our new AI analytics platform.",
    expected_output="A product description of at least 200 words, free of any PII.",
    agent=writer,
    guardrails=[validate_word_count, validate_no_pii],  # Both must pass
)
```
7. Testing Crews
Testing multi-agent systems is challenging because outputs are non-deterministic and execution involves external API calls. CrewAI provides a built-in testing facility through the `crew.test()` method, which runs the crew multiple times and collects performance metrics. You can also test crews by mocking LLM responses and tool outputs.
```python
from unittest.mock import patch, MagicMock

from crewai import Agent, Task, Crew

# Strategy 1: Use crew.test() for end-to-end evaluation
crew = Crew(agents=[researcher, writer], tasks=[research_task, write_task])
# crew.test(n_iterations=3, openai_model_name="gpt-4o")
# This runs the crew 3 times and reports average execution time,
# token usage, and output quality metrics.

# Strategy 2: Unit test with mocked LLM
def test_research_task():
    """Test that the research task produces structured output."""
    mock_result = MagicMock()
    mock_result.raw = "1. Framework A (50k stars)\n2. Framework B (40k stars)"
    with patch.object(researcher, 'execute_task', return_value=mock_result):
        result = researcher.execute_task(research_task)
        assert "Framework A" in result.raw
        assert len(result.raw.split("\n")) >= 2

# Strategy 3: Validate crew output structure
def test_crew_output_format():
    """Validate that the crew produces the expected output format."""
    result = crew.kickoff(inputs={"topic": "test topic"})
    assert result.raw is not None
    assert len(result.raw) > 100  # Minimum length check
    assert result.token_usage.total_tokens > 0
```
End-to-end tests with real LLM calls are slow and costly. Use them sparingly (e.g., nightly CI runs). For development iteration, mock LLM responses and focus unit tests on your custom tools, guardrails, and callback logic.
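Guardrails are a good target for the fast unit tests just described: they are plain functions that only read `output.raw`, so `types.SimpleNamespace` can stand in for `TaskOutput` and no crew, LLM, or API key is needed. A self-contained sketch, re-defining the email guardrail from Section 6 inline:

```python
import re
from types import SimpleNamespace

def validate_no_pii(output) -> tuple[bool, str]:
    """Guardrail: reject output containing email addresses."""
    if re.search(r'[\w.+-]+@[\w-]+\.[\w.-]+', output.raw):
        return (False, "Output contains an email address. Please remove all PII.")
    return (True, output.raw)

def test_guardrail_rejects_email():
    fake = SimpleNamespace(raw="Contact us at sales@example.com for pricing.")
    ok, message = validate_no_pii(fake)
    assert ok is False
    assert "email" in message

def test_guardrail_accepts_clean_text():
    fake = SimpleNamespace(raw="Our platform analyzes telemetry in real time.")
    ok, result = validate_no_pii(fake)
    assert ok is True
    assert result == fake.raw

test_guardrail_rejects_email()
test_guardrail_accepts_clean_text()
print("guardrail unit tests passed")
```

Because these tests exercise only your own validation logic, they run in milliseconds and belong in the inner development loop, unlike `crew.test()`.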
8. CrewAI Enterprise Features
CrewAI offers an enterprise platform (CrewAI+) with additional features beyond the open-source library. These include a visual crew builder, deployment infrastructure, observability dashboards, and managed memory backends. While the open-source library covers all the core orchestration features, the enterprise platform simplifies deployment and monitoring at scale.
| Feature | Open Source | CrewAI Enterprise |
|---|---|---|
| Agent, Task, Crew orchestration | Yes | Yes |
| Sequential and hierarchical processes | Yes | Yes |
| Memory (short-term, long-term, entity) | Yes (local storage) | Yes (managed backends) |
| Visual crew builder | No | Yes |
| One-click deployment | No | Yes |
| Observability dashboard | Verbose logs only | Full dashboard with metrics |
| Crew versioning and rollback | Manual (Git) | Built-in |
9. Production Checklist
Before deploying a CrewAI application to production, review the following checklist. Each item addresses a common source of failures in multi-agent systems.
- **Set `max_iter` on all agents.** Prevent infinite loops by ensuring every agent has a bounded iteration limit.
- **Disable delegation on leaf agents.** Only coordinator agents should delegate. Set `allow_delegation=False` on agents that perform concrete work.
- **Configure `max_rpm`.** Protect against rate limiting by setting a crew-level request cap below your API quota.
- **Add guardrails to critical tasks.** Validate outputs before they flow downstream. Catch format errors, PII leaks, and hallucinated data early.
- **Use structured output for machine-consumed results.** Prefer `output_pydantic` for tasks whose output feeds into code rather than humans.
- **Implement step callbacks for cost tracking.** Monitor token usage per step and set budget alerts.
- **Handle tool errors gracefully.** Return descriptive error strings from tools; do not let unhandled exceptions crash the crew.
- **Test with `crew.test()` before deploying.** Run at least 3 iterations to catch non-deterministic failures.
- **Set `verbose=False` in production.** Verbose logging adds overhead; use callbacks for structured observability instead.
- **Pin your model versions.** Specify exact model names (e.g., `gpt-4o-2024-08-06`) to avoid behavior changes from model updates.
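Several of these checklist items reduce to constructor arguments. A configuration sketch (the parameter values are illustrative, not recommendations; tune them to your workload and quota):

```python
from crewai import Agent, Task, Crew

worker = Agent(
    role="Report Writer",
    goal="Write the quarterly report",
    backstory="You produce concrete deliverables.",
    llm="gpt-4o-2024-08-06",   # Pinned model version
    max_iter=15,               # Bounded iteration limit
    allow_delegation=False,    # Leaf agent: performs work, never delegates
)

report = Task(
    description="Write the quarterly report.",
    expected_output="A structured report.",
    agent=worker,
)

crew = Crew(
    agents=[worker],
    tasks=[report],
    max_rpm=30,     # Crew-level request cap below the API quota
    verbose=False,  # Use callbacks for structured observability instead
)
```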
Advanced CrewAI patterns transform basic multi-agent demos into robust production systems. Delegation enables flexible task routing; memory systems (short-term, long-term, entity) give agents persistent knowledge; callbacks and guardrails provide observability and quality control. Use the production checklist to verify that your crew is ready for real-world deployment. With the foundations from this appendix, you can now build sophisticated multi-agent applications using CrewAI.