Section 26.7: Supply-Chain Security for Agent Sandboxes

"Your agent is only as secure as the least audited dependency in its container image."
Guard, Supply-Chain-Aware AI Agent

Big Picture

Agent sandboxes are only as secure as the software running inside them. Section 26.2 covered how to isolate agent execution with containers and microVMs. This section addresses a complementary threat: the software supply chain within those sandboxes. When an agent installs packages, pulls base images, or executes code from external sources, it inherits every vulnerability and backdoor in those dependencies. Supply-chain security tools (SBOM generation, vulnerability scanning, image signing, provenance attestation) provide the verification layer that ensures sandbox contents are trustworthy, not just isolated.

Prerequisites

This section builds on the sandboxed execution environments from Section 26.2 and the agent safety foundations in Section 26.1. Familiarity with Docker, container images, and basic CI/CD pipeline concepts is assumed. Understanding of tool use patterns from Chapter 22 provides helpful context for why agents need to install and execute arbitrary code.

1. Why Agent Sandboxes Need Supply-Chain Hardening

Agentic AI systems differ from traditional software in a critical way: the agent decides at runtime which packages to install, which APIs to call, and which code to execute. A coding agent asked to "build a data visualization" might install matplotlib, pandas, seaborn, and a dozen transitive dependencies. A research agent might pull a specialized NLP library the developer never anticipated. Each installed package is an attack surface: it might contain a known vulnerability (CVE), a malicious post-install script, or a dependency confusion payload that hijacks a private package name.

Traditional applications lock their dependencies at build time, audit them once, and deploy a fixed artifact. Agent sandboxes, by contrast, assemble their dependency set dynamically. This means the security posture of an agent sandbox can change between one task and the next. A sandbox that was safe for yesterday's task might be compromised by today's task if the agent installs a package with a newly disclosed vulnerability. Supply-chain hardening provides the tooling to detect, prevent, and recover from these risks.

The threat is not theoretical. In 2024, the Trivy project itself experienced a CI compromise where a malicious pull request attempted to inject code into the build pipeline. Dependency confusion attacks (where an attacker publishes a malicious package to PyPI or npm using the same name as a private internal package) have targeted organizations including Apple, Microsoft, and Tesla. For agent systems that install packages on behalf of users, these attacks are especially dangerous because the agent, not the developer, is making the installation decision.

Key Insight

The fundamental tension in agent sandbox security is between flexibility and control. Agents need to install packages to accomplish tasks. But every installed package is a potential vector for supply-chain attacks. The solution is not to prohibit package installation (that would cripple the agent) but to verify, scan, and attest every component that enters the sandbox. Think of supply-chain security as the "immigration checkpoint" for your sandbox: everything that enters gets inspected, documented, and verified before it is allowed to execute.

2. SBOM Generation with Syft

A Software Bill of Materials (SBOM) is a complete inventory of every software component in a container image or filesystem. It lists each package, its version, its source, and its license. SBOMs serve two purposes in agent security: (1) they provide the input for vulnerability scanners (you cannot scan what you have not inventoried), and (2) they create an audit trail of exactly what software was present in the sandbox when a task executed.

Syft (from Anchore) is the standard open-source tool for generating SBOMs. It supports multiple output formats (SPDX, CycloneDX, Syft JSON) and can scan container images, directories, and archives. Syft detects packages from multiple ecosystems: APK (Alpine), APT (Debian/Ubuntu), RPM (RHEL/Fedora), pip (Python), npm (Node.js), Go modules, Rust crates, and Java JARs.

#!/usr/bin/env bash
# Generate an SBOM for the agent-runner container image

# Install Syft (one-time setup)
curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin

# Scan the container image and produce a CycloneDX JSON SBOM
# (older Syft releases used "syft packages <image>"; that subcommand is deprecated)
syft registry.example.com/agent-runner:latest \
 -o cyclonedx-json \
 > sbom-agent-runner.cdx.json

# Inspect the SBOM: count packages by ecosystem
echo "Package counts by type:"
cat sbom-agent-runner.cdx.json | \
 python3 -c "
import json, sys, collections
bom = json.load(sys.stdin)
types = collections.Counter(
 c.get('purl', '').split('/')[0].replace('pkg:', '')
 for c in bom.get('components', [])
 if c.get('purl')
)
for t, count in types.most_common():
 print(f' {t}: {count}')
"

Package counts by type:
 pypi: 142
 deb: 89
 golang: 23
 npm: 12
 binary: 4

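Code Fragment 26.7.1: Generating a CycloneDX JSON SBOM for the agent-runner image with Syft, then counting the inventoried packages by ecosystem using each component's package URL (purl).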

For agent sandboxes with dynamic dependencies, generate the SBOM after the agent installs its packages but before the agent executes the task code. This captures the complete dependency set for that specific task. Store the SBOM alongside the task execution log so you can retrospectively audit which packages were present when a particular task ran.
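One way to make the per-task audit trail easier to read is to diff the post-install SBOM against the base image's SBOM, so the log records exactly what the agent added. The following is a minimal sketch, assuming both documents are CycloneDX JSON with a top-level components array whose entries carry purl fields (as in the Syft output above). The function name and the sample SBOM fragments are hypothetical.

```python
def added_components(base_sbom: dict, task_sbom: dict) -> list[str]:
    """Return purls present in the task SBOM but absent from the base SBOM."""

    def purls(bom: dict) -> set[str]:
        return {c["purl"] for c in bom.get("components", []) if c.get("purl")}

    return sorted(purls(task_sbom) - purls(base_sbom))


# Hypothetical, hand-written SBOM fragments for illustration only.
base = {"components": [{"purl": "pkg:pypi/requests@2.32.3"}]}
task = {
    "components": [
        {"purl": "pkg:pypi/requests@2.32.3"},
        {"purl": "pkg:pypi/pandas@2.2.2"},
        {"purl": "pkg:pypi/numpy@2.0.1"},
    ]
}

for purl in added_components(base, task):
    print("installed at runtime:", purl)
```

Storing only the difference next to the task log keeps the audit record small while still identifying every runtime-installed package and version.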

3. Vulnerability Scanning with Trivy

Trivy (from Aqua Security) is a comprehensive vulnerability scanner for containers, filesystems, git repositories, and Kubernetes clusters. Given an SBOM or a container image, Trivy matches each package against vulnerability databases (NVD, GitHub Advisory Database, OS-specific advisories) and reports known CVEs with severity ratings (CRITICAL, HIGH, MEDIUM, LOW). Trivy also detects misconfigurations (e.g., running as root, exposed secrets) and license violations.

#!/usr/bin/env bash
# Scan the agent-runner image for vulnerabilities

# Scan container image directly
trivy image registry.example.com/agent-runner:latest \
 --severity CRITICAL,HIGH \
 --format json \
 --output trivy-report.json

# Alternatively, scan from an existing SBOM
trivy sbom sbom-agent-runner.cdx.json \
 --severity CRITICAL,HIGH \
 --format table

# Scan a filesystem (useful for scanning the sandbox after package install)
trivy fs /sandbox/workspace \
 --scanners vuln,secret,misconfig \
 --severity CRITICAL,HIGH

# Exit code 1 if vulnerabilities found (useful for CI gates)
trivy image registry.example.com/agent-runner:latest \
 --exit-code 1 \
 --severity CRITICAL

Code Fragment 26.7.2: Trivy scanning an agent-runner container image and filesystem for vulnerabilities. The exit-code flag enables CI gate enforcement: the build fails if critical vulnerabilities are found. The fs scanner can be run inside the sandbox after the agent installs packages.

For agent sandboxes, integrate Trivy scanning at two points. First, scan the base image during CI to ensure it ships without known vulnerabilities. Second, scan the sandbox filesystem after the agent installs packages but before task execution. If the post-install scan finds a critical vulnerability, the sandbox runtime can block execution and return an error to the user, explaining that a required package has a known security issue.

import json
import subprocess


class SecurityError(Exception):
    """Raised when the sandbox must not run the task."""


def scan_sandbox(workspace_path: str) -> dict:
    """Scan the agent sandbox workspace for vulnerabilities after package install."""
    result = subprocess.run(
        [
            "trivy", "fs", workspace_path,
            "--scanners", "vuln,secret",
            "--severity", "CRITICAL,HIGH",
            "--format", "json",
            "--quiet",
        ],
        capture_output=True,
        text=True,
    )

    # Fail closed: a scanner error must never be mistaken for a clean result
    if result.returncode != 0 or not result.stdout:
        raise SecurityError(f"Trivy scan failed: {result.stderr.strip()}")

    report = json.loads(result.stdout)

    # Aggregate findings
    critical = []
    high = []
    # "Results" and "Vulnerabilities" may be missing or null when nothing is found
    for target in report.get("Results") or []:
        for vuln in target.get("Vulnerabilities") or []:
            entry = {
                "id": vuln["VulnerabilityID"],
                "package": vuln["PkgName"],
                "installed": vuln["InstalledVersion"],
                "fixed": vuln.get("FixedVersion", "not yet"),
                "title": vuln.get("Title", ""),
            }
            if vuln["Severity"] == "CRITICAL":
                critical.append(entry)
            else:
                high.append(entry)

    return {
        "safe": len(critical) == 0,
        "critical_count": len(critical),
        "high_count": len(high),
        "critical": critical,
        "high": high,
    }


# Usage in the sandbox runtime
scan = scan_sandbox("/sandbox/workspace")
if not scan["safe"]:
    raise SecurityError(
        f"Sandbox blocked: {scan['critical_count']} critical vulnerabilities found. "
        f"First: {scan['critical'][0]['id']} in {scan['critical'][0]['package']}"
    )

Code Fragment 26.7.3: Python wrapper for Trivy filesystem scanning, designed to run inside the agent sandbox after package installation. If critical vulnerabilities are found, task execution is blocked with a descriptive error message.

4. Image Signing and Verification with Cosign

Vulnerability scanning tells you whether a container image contains known vulnerabilities. Image signing tells you whether the image is the one you built, or whether it was tampered with in transit or at rest. Cosign (from the Sigstore project) provides keyless signing and verification of container images using short-lived certificates issued by Fulcio and recorded in the Rekor transparency log. "Keyless" means there is no long-lived signing key to manage; instead, Cosign uses OIDC identity (e.g., your CI system's workload identity) to obtain an ephemeral signing certificate.

#!/usr/bin/env bash
# Sign and verify the agent-runner container image with Cosign

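# Sign the image (keyless mode, uses OIDC identity from CI).
# --yes skips the interactive confirmation prompt; in production, sign the
# image digest (name@sha256:...) rather than a mutable tag.
cosign sign --yes registry.example.com/agent-runner:latest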

# Verify the signature, checking that it was signed by the expected identity
cosign verify registry.example.com/agent-runner:latest \
 --certificate-identity "https://github.com/yourorg/agent-runner/.github/workflows/build.yml@refs/heads/main" \
 --certificate-oidc-issuer "https://token.actions.githubusercontent.com"

# Attach the SBOM as an attestation to the image
cosign attest --yes \
 --predicate sbom-agent-runner.cdx.json \
 --type cyclonedx \
 registry.example.com/agent-runner:latest

# Verify the SBOM attestation
cosign verify-attestation \
 --type cyclonedx \
 --certificate-identity "https://github.com/yourorg/agent-runner/.github/workflows/build.yml@refs/heads/main" \
 --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
 registry.example.com/agent-runner:latest

Code Fragment 26.7.4: Cosign keyless signing, verification, and SBOM attestation for an agent-runner container image. The certificate-identity flag ensures only images signed by the expected CI workflow are accepted. The SBOM attestation binds the inventory to the specific image digest.

In the agent sandbox deployment pipeline, configure an admission-time policy check (for example, a Kubernetes admission controller such as Sigstore's policy-controller or Kyverno, or a custom policy engine) to reject any image that lacks a valid Cosign signature. Docker Content Trust is not a substitute here: it verifies Notary signatures, not Cosign signatures. This check prevents deployment of unsigned or tampered images, even if an attacker gains write access to the container registry.
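For a custom policy engine, the check can be as small as a wrapper around cosign verify that runs before the sandbox launcher starts a container. The sketch below is one possible shape, not a definitive implementation: the function names are hypothetical, the flags are the ones shown in Code Fragment 26.7.4, and the rule of rejecting tag references in favor of digests is an added assumption that keeps the verified image and the launched image identical.

```python
import subprocess


def build_verify_command(image_ref: str, identity: str, issuer: str) -> list[str]:
    """Build the cosign verify invocation for a digest-pinned image reference."""
    if "@sha256:" not in image_ref:
        # A tag can be repointed between verification and launch.
        raise ValueError("refusing mutable tag; pass an image digest reference")
    return [
        "cosign", "verify", image_ref,
        "--certificate-identity", identity,
        "--certificate-oidc-issuer", issuer,
    ]


def is_signed(image_ref: str, identity: str, issuer: str) -> bool:
    """Return True only if cosign exits successfully; any failure blocks launch."""
    try:
        cmd = build_verify_command(image_ref, identity, issuer)
        result = subprocess.run(cmd, capture_output=True, text=True)
    except (ValueError, FileNotFoundError):
        return False  # fail closed: bad reference or cosign not installed
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical digest reference; a real one comes from the CI build output.
    ref = "registry.example.com/agent-runner@sha256:" + "0" * 64
    print(build_verify_command(
        ref,
        "https://github.com/yourorg/agent-runner/.github/workflows/build.yml@refs/heads/main",
        "https://token.actions.githubusercontent.com",
    ))
```

Returning False on every error path means a missing binary or malformed reference blocks deployment rather than silently allowing it.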

Key Insight

Image signing and SBOM attestation together answer two distinct questions. The signature answers "who built this image, and has it been modified since?" The SBOM attestation answers "what is inside this image?" By requiring both, you establish a chain of trust from the CI build to the production sandbox. An attacker who replaces the image is caught by the signature check. An attacker who injects a vulnerable dependency is caught by the SBOM scan. Neither control alone is sufficient; together they provide defense in depth.

5. The SLSA Framework: Provenance and Build Integrity

SLSA (Supply-chain Levels for Software Artifacts, pronounced "salsa") is a framework for ensuring the integrity of software artifacts throughout the build and delivery process. The original SLSA v0.1 specification defined four levels of increasing assurance, listed below. SLSA v1.0 (2023) reorganized the requirements into tracks and currently defines Build levels 1 through 3 only; the Level 4 requirements were dropped from that release.

SLSA Level 1: Build process is documented (provenance exists but is not verified).
SLSA Level 2: Build is executed on a hosted platform with signed provenance (tamper-evident).
SLSA Level 3: Build is executed on a hardened platform with non-falsifiable provenance (tamper-resistant). The build environment is ephemeral and isolated.
SLSA Level 4 (v0.1 only): Hermetic builds with two-person review of all changes. The highest assurance level, requiring reproducible builds and verified source provenance.

For agent sandbox images, SLSA Level 3 is a practical target. On GitHub Actions, the slsa-github-generator project produces Level 3 provenance, and slsa-verifier checks that provenance at deployment time; Google Cloud Build generates its own build provenance. The provenance document records which source repository was built, which commit was checked out, which builder was used, and which build steps were executed. This document is signed and attached to the container image as an in-toto attestation.
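To make the contents of a provenance document concrete, the sketch below checks two of the recorded facts (builder and source repository) against expected values. It is an illustration under stated assumptions: it assumes the statement has already passed signature verification (for example with slsa-verifier), it assumes the SLSA v0.2 predicate layout with builder.id and invocation.configSource.uri fields, and the function name and sample statement are hypothetical. Other provenance versions use different field names.

```python
def check_provenance(statement: dict, expected_repo: str, trusted_builder_prefix: str) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    predicate = statement.get("predicate", {})

    builder_id = predicate.get("builder", {}).get("id", "")
    if not builder_id.startswith(trusted_builder_prefix):
        problems.append(f"untrusted builder: {builder_id!r}")

    # Assumed v0.2 layout: the URI looks like
    # "git+https://github.com/org/repo@refs/heads/main".
    uri = predicate.get("invocation", {}).get("configSource", {}).get("uri", "")
    repo = uri.removeprefix("git+").split("@", 1)[0]
    if repo != expected_repo:
        problems.append(f"unexpected source repository: {repo!r}")

    return problems


# Hypothetical, hand-written statement for illustration only.
statement = {
    "predicateType": "https://slsa.dev/provenance/v0.2",
    "predicate": {
        "builder": {
            "id": "https://github.com/slsa-framework/slsa-github-generator/"
                  ".github/workflows/generator_container_slsa3.yml@refs/tags/v2.0.0"
        },
        "invocation": {
            "configSource": {"uri": "git+https://github.com/yourorg/agent-runner@refs/heads/main"}
        },
    },
}

print(check_provenance(
    statement,
    expected_repo="https://github.com/yourorg/agent-runner",
    trusted_builder_prefix="https://github.com/slsa-framework/slsa-github-generator/",
))
```

The repository is compared for exact equality rather than by substring so that a look-alike name such as agent-runner-fork does not pass.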

# .github/workflows/build-agent-runner.yml
# GitHub Actions workflow with SLSA Level 3 provenance

name: Build Agent Runner
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      id-token: write # Required for keyless signing
    outputs:
      digest: ${{ steps.push.outputs.digest }} # Consumed by the provenance job

    steps:
      - uses: actions/checkout@v4

      - name: Build container image
        run: |
          docker build -t registry.example.com/agent-runner:${{ github.sha }} .

      - name: Generate SBOM
        uses: anchore/sbom-action@v0
        with:
          image: registry.example.com/agent-runner:${{ github.sha }}
          format: cyclonedx-json
          output-file: sbom.cdx.json

      - name: Scan for vulnerabilities
        # In production, pin third-party actions to a full commit SHA, not a branch
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: registry.example.com/agent-runner:${{ github.sha }}
          severity: CRITICAL,HIGH
          exit-code: "1"

      # Assumes a registry login step has already run (omitted here)
      - name: Push image and record its digest
        id: push
        run: |
          docker push registry.example.com/agent-runner:${{ github.sha }}
          digest=$(docker inspect --format '{{index .RepoDigests 0}}' \
            registry.example.com/agent-runner:${{ github.sha }} | cut -d@ -f2)
          echo "digest=$digest" >> "$GITHUB_OUTPUT"

      - name: Install Cosign
        uses: sigstore/cosign-installer@v3

      - name: Sign image and attach SBOM
        run: |
          cosign sign --yes registry.example.com/agent-runner@${{ steps.push.outputs.digest }}
          cosign attest --yes --predicate sbom.cdx.json --type cyclonedx \
            registry.example.com/agent-runner@${{ steps.push.outputs.digest }}

  provenance:
    needs: build
    permissions:
      actions: read
      id-token: write
      packages: write
    uses: slsa-framework/slsa-github-generator/.github/workflows/generator_container_slsa3.yml@v2.0.0
    with:
      image: registry.example.com/agent-runner
      digest: ${{ needs.build.outputs.digest }}
      # Registry credentials are omitted here; see the generator's documentation

Code Fragment 26.7.5: GitHub Actions workflow that builds an agent-runner image with SLSA Level 3 provenance. The pipeline generates an SBOM (Syft via sbom-action), scans for vulnerabilities (Trivy), signs the image (Cosign), attaches the SBOM as an attestation, and generates SLSA provenance.

6. CI Hardening: Immutable Images and Pinned Dependencies

Beyond scanning and signing, the CI pipeline itself must be hardened against supply-chain attacks. Key practices for agent sandbox images include:

Immutable base images. Pin the base image to a specific digest, not a mutable tag. FROM python:3.12-slim@sha256:abcd1234... ensures that every build uses the exact same base image. A tag like python:3.12-slim can be updated by the image publisher at any time, silently introducing new packages or vulnerabilities.

Pinned dependencies. Lock all Python dependencies with a lockfile tool (pip-tools, Poetry, uv) or, at minimum, pip freeze, and where possible record hashes so that pip install --require-hashes rejects any artifact that differs from the one that was audited. Lock all system packages with version-pinned apt-get install commands. Pinning makes builds repeatable: the same Dockerfile resolves to the same package versions regardless of when or where it is built.

Minimal images. Use distroless or Alpine-based images that contain only the runtime dependencies. Fewer packages mean a smaller attack surface. An agent-runner image does not need a compiler, a shell, or system administration tools. If the agent needs to install packages at runtime, provide a curated set of allowed packages rather than unrestricted pip access.
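The digest-pinning rule above is easy to enforce mechanically in CI. The following sketch scans a Dockerfile for FROM lines that are not pinned by digest; it is a simplified check under stated assumptions (it handles stage aliases, scratch, and an optional --platform flag, but not ARG-substituted image names), and the function name is hypothetical.

```python
import re

FROM_RE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)
ALIAS_RE = re.compile(r"\s+AS\s+(\S+)\s*$", re.IGNORECASE)


def unpinned_base_images(dockerfile_text: str) -> list[str]:
    """Return base images referenced by tag (or no tag) instead of by digest."""
    stage_names: set[str] = set()
    unpinned: list[str] = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image = match.group(1)
        is_earlier_stage = image.lower() in stage_names
        if not is_earlier_stage and image != "scratch" and "@sha256:" not in image:
            unpinned.append(image)
        alias = ALIAS_RE.search(line)
        if alias:
            stage_names.add(alias.group(1).lower())
    return unpinned


example = """\
FROM python:3.12-slim AS builder
RUN pip install --target=/deps -r requirements.lock
FROM gcr.io/distroless/python3-debian12@sha256:""" + "a" * 64 + """
COPY --from=builder /deps /deps
"""

print(unpinned_base_images(example))
```

A CI job could fail the build whenever this list is non-empty, which turns the pinning guideline into an enforced gate.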

# Dockerfile for a hardened agent-runner sandbox
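# Stage 1: Build dependencies in a full image
# (the builder's Python minor version must match the runtime interpreter;
# distroless python3-debian12 ships Debian 12's Python 3.11)
FROM python:3.11-slim@sha256:f5a1c3e5b2d8... AS builder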

WORKDIR /build
COPY requirements.lock .
RUN pip install --no-cache-dir --target=/deps -r requirements.lock

# Stage 2: Runtime with minimal base
FROM gcr.io/distroless/python3-debian12@sha256:a9b8c7d6...

# Copy only the installed packages, no build tools
COPY --from=builder /deps /deps
ENV PYTHONPATH=/deps

# Copy the sandbox runtime code
COPY sandbox_runtime/ /app/

# Run as non-root
USER nonroot:nonroot

# No shell available in distroless; exec form only
ENTRYPOINT ["python3", "/app/main.py"]

Code Fragment 26.7.6: Multi-stage Dockerfile for a hardened agent sandbox. The build stage installs pinned dependencies; the runtime stage uses a distroless base image with no shell, no package manager, and no build tools. The image is pinned by digest to prevent silent base image updates.

Real-World Scenario: Dependency Allowlist for Agent Sandboxes

Who: A security engineer at a data analytics platform where AI agents executed user-requested Python analyses in sandboxed containers.

Situation: Agents frequently needed to install Python packages (pandas, scikit-learn, matplotlib) to fulfill analysis requests. The sandbox had unrestricted pip install access, allowing agents to install any package from PyPI at runtime.

Problem: A security audit revealed that an agent had installed a typosquatted package (reqeusts instead of requests) that contained a credential-harvesting payload. Although the sandbox network restrictions prevented exfiltration, the incident exposed a supply-chain attack surface that could bypass other defenses.

Decision: The team implemented a dependency allowlist: the sandbox runtime intercepted all pip install requests and checked each package name and version against a curated list of 340 approved packages. Packages not on the list were rejected with an explanatory message suggesting the closest approved alternative. The security team reviewed and updated the allowlist monthly.

Result: Zero supply-chain incidents in the six months following deployment. Agents successfully completed 97% of analysis requests using only approved packages. The remaining 3% were escalated to human analysts who could approve new packages after review.

Lesson: A curated dependency allowlist balances agent flexibility with supply-chain security by ensuring that no unvetted code runs in production sandboxes.
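The allowlist check described in this scenario can be sketched in a few lines. This is an illustrative reconstruction, not the team's actual code: the allowlist contents, the function name, and the similarity cutoff are assumptions, and the closest-match suggestion uses difflib from the standard library.

```python
import difflib
import re

# Hypothetical allowlist mapping normalized package names to approved versions.
ALLOWLIST = {
    "requests": "2.32.3",
    "pandas": "2.2.2",
    "matplotlib": "3.9.0",
    "scikit-learn": "1.5.0",
}


def check_install_request(name: str, allowlist: dict[str, str]) -> tuple[bool, str]:
    """Return (approved, pinned requirement or explanatory message)."""
    # Normalize the way PyPI does: lowercase, runs of -, _, . collapse to "-".
    normalized = re.sub(r"[-_.]+", "-", name.strip()).lower()
    if normalized in allowlist:
        return True, f"{normalized}=={allowlist[normalized]}"
    close = difflib.get_close_matches(normalized, list(allowlist), n=1, cutoff=0.8)
    hint = f" Did you mean '{close[0]}'?" if close else ""
    return False, f"'{name}' is not on the approved list.{hint}"


for requested in ["pandas", "reqeusts", "left-pad"]:
    print(requested, "->", check_install_request(requested, ALLOWLIST))
```

Because the approved branch returns a pinned requirement string, the runtime installs the reviewed version rather than whatever version the agent asked for.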

7. Security Scanning on Model Hubs

Supply-chain security for agent systems extends beyond code packages to the models themselves. Model files, particularly those serialized with Python's pickle format, can contain arbitrary executable code that runs during deserialization. A malicious model file downloaded from a public hub can execute a reverse shell, install a backdoor, or exfiltrate environment variables the moment it is loaded with torch.load() or pickle.load(). This risk applies to any agent system that downloads models at runtime or uses community-contributed model weights.

Hugging Face's malware scanning infrastructure provides the first line of defense. Every file uploaded to the Hugging Face Hub is scanned for known malware signatures, suspicious pickle opcodes, and embedded executable payloads. Models flagged as unsafe are quarantined and marked with a warning banner. Hugging Face also promotes the safetensors format, which stores only tensor data (no executable code) and is immune to pickle-based deserialization attacks. When possible, always prefer safetensors over pickle-based formats (.bin, .pt, .pkl) for model weights.

For defense beyond what the hub provides, third-party scanning tools offer deeper inspection and can be integrated into your own CI pipeline or sandbox runtime:

Model Security Scanner Comparison

Tool	What It Detects	Integration Point
HF Hub Scanner	Malware signatures, unsafe pickle opcodes, known attack patterns	Automatic on upload; warnings displayed on model cards
ModelScan (ProtectAI)	Malicious code in serialized ML models (pickle, SavedModel, ONNX)	CI pipeline, pre-deployment gate, sandbox ingestion
Fickling	Decompiles pickle files to reveal embedded Python code; detects code injection	Manual audit, CI pipeline for pickle-format models
Trivy (container scan)	OS and language-package CVEs in the model-serving container	CI pipeline, runtime scanning of serving infrastructure
Grype (Anchore)	Vulnerabilities in container images and SBOMs; complements Trivy	CI pipeline, admission controller policy

Dependency hygiene for model deployments. Pin model versions by commit hash or revision ID, not by mutable branch names like main. Verify file integrity with SHA-256 checksums after download. Maintain a Software Bill of Materials (SBOM) that includes model files alongside code dependencies, documenting the exact model revision, its source repository, and the scanner results at ingestion time. For organizations with strict compliance requirements, generate hash attestations for every model artifact entering production.
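The checksum step can be done with the standard library alone. The sketch below streams the file so that multi-gigabyte weights do not need to fit in memory; the function names are hypothetical, and the expected digest is assumed to come from a trusted record such as the ingestion-time SBOM entry.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path: str, expected_sha256: str) -> bool:
    """Return True only if the file's digest matches the recorded value."""
    return hmac.compare_digest(sha256_of_file(path), expected_sha256.strip().lower())
```

A loader would call verify_model_file before opening the weights and refuse to proceed on a mismatch.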

Continuous monitoring post-deployment. Model drift detection serves a dual purpose: it flags both performance degradation and potential security compromise. If a model's output distribution shifts unexpectedly (without corresponding changes in input distribution or model version), this may indicate that the model file was tampered with, that a dependency was silently updated, or that the serving infrastructure was compromised. Integrate drift detection metrics into your observability stack alongside traditional security monitoring. Alert on both statistical drift (KL divergence, PSI) and behavioral drift (unexpected tool calls, new output patterns) as potential security signals.
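As a concrete example of a statistical drift metric, the Population Stability Index compares a baseline distribution with a current one over the same bins. The sketch below is a minimal version under stated assumptions: inputs are already binned proportions, a small epsilon guards empty bins, and the function name and sample numbers are hypothetical. A commonly cited rule of thumb treats values above roughly 0.25 as a significant shift, but any alert threshold should be calibrated on your own data.

```python
import math


def population_stability_index(expected: list[float], actual: list[float],
                               eps: float = 1e-6) -> float:
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    if len(expected) != len(actual):
        raise ValueError("both distributions must use the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)  # avoid log(0) and division by zero for empty bins
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical share of agent outputs per category, baseline versus today.
baseline = [0.70, 0.20, 0.10]
today = [0.40, 0.25, 0.35]
print(round(population_stability_index(baseline, today), 3))
```

A spike in this value with no change to inputs or model version is the kind of signal that should trigger an integrity check of the model file and serving image.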

Warning

Never load a pickle-format model file from an untrusted source without scanning it first. When torch.load() performs full unpickling (weights_only=False, which was the default before PyTorch 2.6), it executes arbitrary Python code embedded in the file during deserialization. A single call to torch.load("untrusted_model.pt") in that mode can compromise your entire system. Use safetensors format when available, keep weights_only=True wherever the model permits it, scan with ModelScan or Fickling before loading pickle files, and run model loading inside an isolated sandbox with no network access.
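To show what "scanning before loading" means at the lowest level, the sketch below walks a pickle byte stream with the standard library's pickletools and reports opcodes that import or call objects, without ever deserializing the data. It is a simplified illustration, not a replacement for ModelScan or Fickling: the opcode set and function name are assumptions, it expects a raw pickle stream rather than a zip-based checkpoint, and legitimate checkpoints also use these opcodes, so production scanners compare the imported names against an allowlist instead of flagging every occurrence.

```python
import pickle
import pickletools

# Opcodes that import a callable or invoke one during unpickling.
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


def suspicious_opcodes(data: bytes) -> list[str]:
    """List import/call opcodes found in a pickle stream, in order of appearance."""
    return [
        opcode.name
        for opcode, _arg, _pos in pickletools.genops(data)
        if opcode.name in SUSPICIOUS_OPCODES
    ]


class Payload:
    """Stand-in for a malicious object; print represents an arbitrary callable."""

    def __reduce__(self):
        return (print, ("this would run at load time",))


benign = pickle.dumps({"weights": [1.0, 2.0]})
malicious = pickle.dumps(Payload())  # serializing does not execute the payload

print("benign:", suspicious_opcodes(benign))
print("malicious:", suspicious_opcodes(malicious))
```

Plain containers of numbers and strings produce no flagged opcodes, while any object that smuggles in a callable necessarily emits an import opcode followed by REDUCE.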

8. Threat Model: Real-World Supply-Chain Incidents

Understanding real-world attacks helps motivate the defenses covered in this section. The following incidents illustrate the diversity of supply-chain threats that affect agent sandboxes.

Dependency confusion (2021). Security researcher Alex Birsan demonstrated that major companies (Apple, Microsoft, PayPal, Tesla, and others) were vulnerable to dependency confusion attacks. The attack exploits resolvers that consult public and private registries together and prefer whichever one offers the highest version number (pip configured with --extra-index-url behaves this way). By publishing a malicious package to PyPI with the same name as an internal package and a higher version, an attacker can trick pip into installing the public (malicious) version. For agent systems that install packages by name, this attack is especially relevant: the agent may request a package that matches an internal name, and the public registry returns a malicious payload.

Event-Stream compromise (2018). An attacker gained maintainer access to the popular npm package event-stream (downloaded 2 million times per week) and injected a targeted backdoor that stole cryptocurrency wallet credentials. The malicious code was hidden in a new dependency (flatmap-stream) added in a routine point release. This incident demonstrates why SBOM tracking and dependency auditing are essential: the malicious dependency appeared as an ordinary transitive addition, so any project that accepted compatible updates through a version range pulled it in without a visible change to its own manifest.

Codecov CI compromise (2021). Attackers modified the Codecov Bash Uploader script (used in thousands of CI pipelines) to exfiltrate environment variables, including CI secrets, API tokens, and signing keys. This is a direct threat to agent sandbox build pipelines: if the CI pipeline is compromised, the attacker can inject malicious code into the agent-runner image, and the signed provenance will attest to the compromised build as legitimate.
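The dependency confusion incident above comes down to a resolution rule that is simple to model. The toy sketch below is not pip's actual resolver: it assumes purely numeric dotted versions, and the index labels and function names are hypothetical. It contrasts "highest version wins across all indexes" with resolution restricted to a single trusted index.

```python
Candidate = tuple[str, str]  # (index label, version string)


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a purely numeric dotted version such as '1.4.0'."""
    return tuple(int(part) for part in version.split("."))


def resolve_across_indexes(candidates: list[Candidate]) -> Candidate:
    """Vulnerable rule: pick the highest version regardless of its origin."""
    return max(candidates, key=lambda c: parse_version(c[1]))


def resolve_trusted_only(candidates: list[Candidate], trusted_index: str) -> Candidate:
    """Safer rule: consider only candidates served by the trusted index."""
    trusted = [c for c in candidates if c[0] == trusted_index]
    if not trusted:
        raise LookupError("package not available from the trusted index")
    return max(trusted, key=lambda c: parse_version(c[1]))


# An internal package at 1.4.0 and an attacker's same-named upload at 99.0.0.
candidates = [("internal", "1.4.0"), ("internal", "1.3.2"), ("public", "99.0.0")]
print(resolve_across_indexes(candidates))
print(resolve_trusted_only(candidates, "internal"))
```

Routing sandbox installs through a single index or proxy that serves internal names exclusively removes the cross-index comparison the attack depends on.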

Key Insight

Supply-chain attacks target trust relationships. Dependency confusion exploits trust in package names. The event-stream compromise exploited trust in maintainers. The Codecov attack exploited trust in CI tooling. Each defense in this section addresses a different trust boundary: SBOMs inventory what you trust, Trivy verifies that trust is warranted, Cosign ensures trust has not been violated in transit, and SLSA provenance verifies trust in the build process itself. A complete supply-chain security posture requires coverage at every trust boundary.

Lab: Hardened Agent-Runner Container

This lab produces a hardened agent-runner container image with a complete supply-chain security pipeline: SBOM generation, vulnerability scanning, image signing, and verification.

Steps:

Build the image using the multi-stage Dockerfile from Code Fragment 26.7.6, with all dependencies pinned by version and the base image pinned by digest.
Generate the SBOM with Syft, producing a CycloneDX JSON document that inventories every package in the image.
Scan for vulnerabilities with Trivy, failing the pipeline if any CRITICAL-severity CVEs are found.
Sign the image with Cosign using keyless signing backed by your CI system's OIDC identity.
Attach the SBOM as a Cosign attestation, binding the inventory to the image digest.
Verify everything from the deployment side: check the Cosign signature, verify the SBOM attestation, and re-scan with Trivy as a defense-in-depth measure.

Verification script:

#!/usr/bin/env bash
# verify-agent-runner.sh
# Run this before deploying the agent-runner to production

set -euo pipefail

IMAGE="registry.example.com/agent-runner:latest"
EXPECTED_IDENTITY="https://github.com/yourorg/agent-runner/.github/workflows/build.yml@refs/heads/main"
EXPECTED_ISSUER="https://token.actions.githubusercontent.com"

echo "Step 1: Verify image signature..."
cosign verify "$IMAGE" \
 --certificate-identity "$EXPECTED_IDENTITY" \
 --certificate-oidc-issuer "$EXPECTED_ISSUER"
echo "Signature verified."

echo "Step 2: Verify SBOM attestation..."
cosign verify-attestation "$IMAGE" \
 --type cyclonedx \
 --certificate-identity "$EXPECTED_IDENTITY" \
 --certificate-oidc-issuer "$EXPECTED_ISSUER" \
 | jq -r '.payload' | base64 -d | jq '.predicate' > verified-sbom.json
echo "SBOM attestation verified. $(jq '.components | length' verified-sbom.json) components found."

echo "Step 3: Scan for vulnerabilities..."
trivy image "$IMAGE" --severity CRITICAL,HIGH --exit-code 1
echo "No critical or high vulnerabilities found."

echo "All checks passed. Image is safe for deployment."

Code Fragment 26.7.7: Deployment verification script that checks the Cosign signature, verifies the SBOM attestation, and runs a fresh Trivy scan before allowing the agent-runner image into production. All three checks must pass; any failure blocks deployment.

The complete lab code, including the Dockerfile, GitHub Actions workflow, verification script, and a local testing setup using act (for running GitHub Actions locally), is available in the companion repository under labs/chapter-26/hardened-agent-runner/.

Exercises

Exercise 26.7.1: SBOM Interpretation Conceptual

Explain what information an SBOM contains and why it is a prerequisite for vulnerability scanning. What are the two main SBOM formats, and when would you choose each?

Answer Sketch

An SBOM lists every software component (package name, version, source, license) in an artifact. It is a prerequisite for vulnerability scanning because the scanner needs to know which packages are installed to match them against CVE databases. The two main formats are SPDX (originated at the Linux Foundation, often used for license compliance) and CycloneDX (originated at OWASP, optimized for security use cases). Choose SPDX when the primary concern is license compliance; choose CycloneDX when the focus is vulnerability management and security attestation.

Exercise 26.7.2: Dependency Confusion Defense Coding

Write a Python function safe_install(package_name, allowlist) that checks a package name against an allowlist before installation. The function should also verify the package exists on PyPI and check its download count (packages with very few downloads may be suspicious).

Answer Sketch

Check package_name against the allowlist dict (mapping name to allowed version range). If not in the allowlist, reject with an explanatory message. If in the allowlist, query the PyPI JSON API (https://pypi.org/pypi/{name}/json) to verify the package exists and check its recent download count via the pypistats API. Flag packages with fewer than 1,000 weekly downloads as suspicious. If all checks pass, run pip install with the pinned version from the allowlist.

Exercise 26.7.3: CI Pipeline Design Analysis

Design a CI pipeline for an agent-runner image that achieves SLSA Level 2. List the required steps and explain which SLSA requirements each step satisfies.

Answer Sketch

SLSA Level 2 requires: (1) version-controlled source (satisfied by Git), (2) build executed on a hosted service (satisfied by GitHub Actions or similar), (3) signed provenance (satisfied by Cosign or slsa-github-generator). Pipeline steps: checkout source, build image, generate SBOM (Syft), scan vulnerabilities (Trivy), push image, sign image (Cosign), generate provenance attestation. The provenance document records the source repo, commit SHA, builder identity, and build steps. The signature makes the provenance tamper-evident.

Exercise 26.7.4: Runtime Scanning Trade-offs Conceptual

Scanning the agent sandbox after package installation adds latency to task execution. Analyze the trade-off between security and performance. Propose a tiered approach that balances both.

Answer Sketch

Tier 1 (fast, always): check installed packages against a pre-computed allowlist with cached scan results. Latency: milliseconds. Tier 2 (moderate, for new packages): run Trivy filesystem scan on newly installed packages only, using a local vulnerability database updated daily. Latency: 2 to 5 seconds. Tier 3 (thorough, for sensitive tasks): full Trivy scan of the entire workspace, including secret detection and misconfiguration scanning. Latency: 10 to 30 seconds. Assign tiers based on task sensitivity: routine code execution gets Tier 1, tasks accessing production data get Tier 2, tasks with network access get Tier 3.

Exercise 26.7.5: Incident Response Analysis

A critical CVE is disclosed in a package that your agent sandbox installed for 500 tasks over the past week. Using the SBOM audit trail, design an incident response plan. What data do you need, and what actions do you take?

Answer Sketch

Query the SBOM archive for all task executions in the past week that included the vulnerable package. For each affected task: (1) check whether the task had network access (potential data exfiltration), (2) check whether the task accessed sensitive data (potential data exposure), (3) review the task output for anomalies. Immediate actions: update the allowlist to require the patched version, rebuild the base image, re-scan all running sandbox instances. Notification: alert users whose tasks were affected with a summary of the risk and remediation status. Long-term: add the CVE to the regression test suite so future scans catch similar patterns.

Key Takeaways

Supply-chain hardening protects agent runtimes from compromised dependencies, base images, and models.
SBOMs provide a complete dependency inventory, enabling automated vulnerability scanning and compliance auditing.
Combine Trivy for vulnerability scanning, Cosign for image signing, and SBOM attestation for end-to-end supply-chain verification.

Research Frontier

Formal safety proofs for agentic systems. Can we mathematically guarantee that an LLM agent will never take certain dangerous actions? Current sandbox and guardrail approaches are empirical; researchers are exploring constrained decoding, verified action filters, and hybrid neuro-symbolic architectures that provide provable bounds on agent behavior.

Runtime monitoring and anomaly detection. Detecting when an agent is "going off the rails" in real time requires behavioral baselines and anomaly detectors that operate over action sequences, not just individual outputs. Sequence-level monitoring that flags unusual tool call patterns or resource access is an active area, especially for long-running autonomous agents.

Capability control and elicitation. As models grow more capable, understanding exactly what an agent can do (and preventing it from doing things it should not) becomes harder. Research on capability elicitation (systematically testing what an agent is able to accomplish) and capability control (reliably restricting an agent's effective abilities) is critical for safe deployment.

Sandboxing guarantees under adversarial inputs. Current sandboxes assume the agent acts within expected parameters, but prompt injection and jailbreak attacks can cause agents to attempt sandbox escapes. Quantifying and hardening sandbox boundaries against adversarial inputs from both users and retrieved content is an open problem.

ToolEmu (Ruan et al., 2023): Emulation framework for evaluating LLM agent safety by simulating tool execution and measuring rates of unsafe actions across risk categories.
R-Judge (Yuan et al., 2024): Benchmark for evaluating safety-aware reasoning in LLM agents across 27 risk scenarios, revealing that even strong models frequently fail to identify unsafe action plans.
AgentDojo (Debenedetti et al., 2024): Evaluation framework for testing agent robustness against prompt injection attacks in realistic tool-use environments, providing a standardized adversarial benchmark.
Machiavelli Benchmark (Pan et al., 2023): Measures harmful behaviors (deception, manipulation, resource acquisition) in LLM agents interacting with text-based game environments, quantifying the trade-off between agent performance and ethical behavior.

Self-Check

Q1: What is supply-chain hardening, and why do agent sandboxes need it?

Supply-chain hardening secures the software dependencies (libraries, base images, models) that the agent runtime relies on. Agent sandboxes need it because isolation alone does not make the sandbox contents trustworthy: a compromised dependency running inside the sandbox can attempt to escape isolation, exfiltrate whatever data the sandbox can reach, or subvert the agent's behavior from within.

Q2: What role do SBOMs (Software Bills of Materials) play in agent supply-chain security?

SBOMs provide a complete inventory of every software component in the agent's runtime environment. They enable vulnerability scanning (checking all dependencies against CVE databases), compliance auditing, and rapid response when a new vulnerability is disclosed in any dependency.

What Comes Next

This section completes the chapter on agent safety and production operations. For the broader discussion of AI safety, ethics, and regulation, see Chapter 32: Safety, Ethics & Regulation. For practical deployment patterns, see Chapter 30: Observability & Monitoring.

References & Further Reading

Aqua Security. Trivy: Comprehensive Security Scanner.

Open-source vulnerability scanner for containers, filesystems, git repositories, and Kubernetes. Supports multiple vulnerability databases and produces reports in JSON, table, and SARIF formats. A widely used tool for container security scanning in CI/CD pipelines.

Tool

Anchore. Syft: SBOM Generation Tool.

Generates Software Bills of Materials from container images, filesystems, and archives. Supports SPDX, CycloneDX, and native Syft JSON output formats. Detects packages from 15+ ecosystems including pip, npm, APK, APT, and RPM.

Tool

Sigstore. Cosign: Container Signing, Verification, and Storage.

Keyless signing and verification for container images and other OCI artifacts. Uses Fulcio for ephemeral certificates and Rekor for transparency logging. Integrates with GitHub Actions, GitLab CI, and other CI platforms for automated signing.

Tool

SLSA: Supply-chain Levels for Software Artifacts, v1.0 Specification.

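Framework for ensuring integrity and provenance of software artifacts. Version 1.0 defines a Build track with four levels (L0 through L3), ranging from no guarantees, through documented and signed provenance, to builds on a hardened build platform. The hermetic and reproducible build requirements of the earlier v0.1 Level 4 are not part of v1.0. Originated at Google and now maintained under the OpenSSF.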

Specification

Birsan, A. (2021). "Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies."

The original disclosure of dependency confusion attacks, demonstrating how public package registries can be exploited to inject malicious code into private build systems. Essential reading for understanding why agent sandbox package installation requires supply-chain controls.

Article

CISA. Software Bill of Materials (SBOM) Resources.

U.S. Cybersecurity and Infrastructure Security Agency resources on SBOM standards, tooling, and best practices. Includes guidance on SBOM generation, consumption, and integration into vulnerability management workflows.

Reference