"On-premises serving, curated equipment corpus, always-cite-retrieval, never-execute-control. The four rules of a 2026 plant-floor LLM copilot."
Rag, Plant-Floor-RAG-Architect AI Agent
Eight architectural choices define the dominant 2026 plant-floor maintenance copilot: on-premises or VPC-isolated serving, a curated and versioned equipment corpus, mandatory source citation, refusal outside the corpus, structured-output handoffs to MES and CMMS, per-site retrieval against a shared base model, continuous evaluation against equipment-specific eval sets, and multimodal input across voice, text, and image. Together these choices honor the IT/OT boundary, the safety-case obligations, and the multi-site operational realities described in Section 73.1 through Section 73.3, while still delivering the productivity gains that justify the deployment. This section is the reference architecture; later sections turn to the named-vendor cases and the postmortems.
Prerequisites
This section assumes the manufacturing regulatory framework from Section 73.3, the RAG fundamentals from Section 32.1, and the LLM-container patterns from Section 65.1.
Eight Architectural Choices
NVIDIA NIM (NVIDIA Inference Microservices) launched at GTC 2024 as a packaging format for on-premises LLM serving; the launch demo featured Jensen Huang holding a NIM container above his head like a sacred object. The package format is essentially a Docker image with vLLM, the model weights, and a thin OpenAI-compatible HTTP layer; the brand-name premium is real, but the underlying components are open source.
- On-premises or VPC-isolated model serving. Open-weight models (Llama-3, Qwen 2.5, Mistral, Phi-3, NVIDIA NIM-packaged variants) run on plant or regional GPU infrastructure, with a latency budget under 2 seconds for shop-floor user experience. Cloud frontier APIs are used only in non-regulated, non-air-gapped contexts where the latency and data-egress story can be justified.
- RAG over a curated equipment corpus. OEM manuals, internal SOPs, historical work orders, training videos transcribed, training presentations, deviation reports, and the manufacturer's PLM-of-record. Versioned per equipment model and per site; refreshed on a controlled schedule with explicit change control. The corpus is the product more than the model is.
- Always-cite-source UI. Every answer links to the manual page, work-order ID, or training-document section it came from. Technicians verify before acting on anything safety-related. The citation is the safety-case artifact.
- Refusal outside corpus. When retrieval returns nothing relevant, the system says "I do not have information on this; please consult X or escalate to Y" rather than improvising. The refusal vocabulary is part of the system prompt and is tested explicitly in evaluation.
- Structured-output handoff to MES and CMMS. Any action the LLM proposes (open a work order, log an inspection, update a spare-parts count, request a maintenance window) goes through a structured, signed, human-approved write to the system of record. No free-text writes to OT. The CMMS or MES API contract is the conduit per IEC 62443.
- Per-site retrieval, shared model. One base model, N sites, N retrieval corpora. Sites own their content; the central platform team owns the model, the prompts, the evaluation harness, and the upgrade cadence. This pattern dominates because it scales without recreating the per-site fragmentation that Section 73.2 warned about.
- Continuous evaluation against equipment-specific eval sets. Representative technician questions with ground-truth answers from senior engineers, refreshed quarterly. Regressions block deployments. Evaluation is the same kind of gate as a quality release for the underlying manufacturing process.
- Voice, text, and image input. Hands-free voice (Whisper-class) for shop-floor users wearing gloves or PPE; text for office and planning users; image input for photo-based fault diagnosis where vision models earn their keep against torn labels, corrosion patterns, and indicator-light states.
Reference Stack and Vendor Choices
The 2026 reference stack converged faster than most observers expected. Six components show up in almost every deployment we have seen:
- Inference: vLLM, NVIDIA Triton with TensorRT-LLM, or NVIDIA NIM containers for the air-gapped deployments; Hugging Face Text Generation Inference and Together AI's enterprise plane remain common for the cloud-friendly cases.
- Embedding models: a single shared deployment of BGE-M3 or NVIDIA E5 variants, with reranking handled by a small cross-encoder (BGE reranker or Cohere Rerank where cloud is acceptable).
- Vector stores: pgvector inside the manufacturer's enterprise Postgres or a managed Qdrant/Weaviate cluster; OpenSearch with k-NN is common where IT teams already operate it.
- Orchestration: LangChain, LlamaIndex, or a small in-house framework.
- Observability: OpenTelemetry shipping to the manufacturer's existing logging stack.
- MES and CMMS handoff: most often an idempotent REST or AS2 call to SAP, Oracle, Infor, Siemens Opcenter, or IBM Maximo, with the signed action payload retained in the application audit log.
The OT-Safe Pattern Table
Table 73.4.1a summarizes the OT-safe LLM deployment patterns that have stabilized across major industrial-software vendors and large manufacturers by mid-2026. The common thread is that the LLM lives in the IT zone (Purdue Level 4 or 5) and influences OT only through audited, human-approved channels per the IEC 62443 zones-and-conduits model.
| Pattern | LLM zone | OT interaction | Typical use cases | Risk-tier |
|---|---|---|---|---|
| Read-only maintenance copilot | IT (Purdue Level 4-5) | Reads CMMS, OEM manuals, historian extracts; writes nothing to OT | Equipment Q&A, fault-diagnostic checklists, hands-free voice queries | Low |
| Predictive-maintenance triage advisor | IT (Level 4) | Reads sensor history and asset logs; outputs work-order recommendation | Anomaly-triage briefings for on-call technician | Low-Medium |
| Work-order drafting with structured handoff | IT (Level 4) | Drafts work-order text; human approves and dispatches via CMMS | Pharma deviation reports, aerospace nonconformance, automotive change orders | Medium (GxP and IEC 61508 documentation is regulated) |
| Supply-chain disruption advisor | IT (Level 5, enterprise) | Reads news, sanctions, ERP; outputs risk brief for procurement leadership | Supplier-risk briefings, geopolitical-disruption response | Medium (commercial impact only; no OT exposure) |
| Air-gapped plant assistant | Plant IT, isolated from corporate network | Local models only; no telemetry egress; offline corpus updates | Defense-industrial-base, classified manufacturing, regulated pharma | Low (technically) and High (oversight) |
| Direct PLC/SCADA write | Not recommended in 2026 | LLM writes setpoints, recipes, or safety logic | Effectively no public deployments at scale | Out of scope |
Bosch publicly described its 2025 plant-floor copilot rollout across multiple European and Asian sites. The deployment uses an open-weight Llama-derivative served on plant GPU appliances, with a per-site corpus of OEM manuals (Bosch-owned and third-party), shift-handover notes, and the local CMMS history. Every answer cites at least one source; out-of-corpus questions trigger a refusal and an escalation prompt to the shift engineer. Action handoffs flow through Bosch's internal MES via a signed REST payload. The reported productivity outcome on covered asset classes is a 20-25% reduction in mean-time-to-repair and a roughly 40% reduction in shift-handover documentation time. The architectural lesson Bosch emphasizes is that the platform team owns the model and the evaluation harness, sites own the content, and the safety case is built per asset class rather than per site, so adding a new plant is incremental rather than greenfield.
Integration with Robotics and Embodied Systems
For manufacturers running robotic cells, the copilot's role is informational, not commanding. Chapter 24 covers the embodied-AI and world-model stack in depth; the manufacturing-side discipline is to keep the LLM out of the low-level control loop and let it act only as an information layer above the robotic controllers. Specifically: the LLM may explain a robotic-cell fault to a technician, summarize a teach-pendant programming session, or generate the human-readable rationale that accompanies a structured handoff to the robot's controller, but the LLM does not write trajectories or motion plans. The control authority for the robot remains with the certified controller, the manufacturer's safety PLC, and the human operator at the teach pendant. ABB, FANUC, KUKA, and Yaskawa all align with this discipline in their 2025 and 2026 AI partner announcements; their integrations expose retrieval-only or structured-handoff surfaces, not raw motion authority. The cross-reference to Chapter 33 on cross-modal reasoning is the natural starting point for teams that want to push beyond information layers.
Regulatory Posture Wrapped In
The eight architectural choices above are not arbitrary; each one maps directly onto one or more of the seven frameworks described in Section 73.3. On-premises serving satisfies ITAR and CMMC; refusal outside corpus and source citation satisfy GxP CSV validation and the EU Machinery Regulation's risk-assessment expectation; structured handoff to CMMS or MES satisfies the IEC 62443 conduit model; continuous evaluation satisfies the ISO 42001 management-system audit; per-site retrieval with shared model satisfies the EU AI Act's transparency and human-oversight expectations for Annex III workforce-management cases. Treating regulation as the lens through which architecture is reviewed (rather than as a separate compliance checklist) is the discipline that makes deployments ship.
The architecture above defends against the five failure modes catalogued in Section 73.2, but only when the eight choices are applied together. Skipping refusal-outside-corpus reintroduces the hallucinated-torque-spec problem; skipping per-site retrieval reintroduces multi-site drift; skipping the structured handoff reintroduces the OT-write risk. The choices are interlocking. Treat any architectural shortcut as a safety-case change that requires explicit review, not as a minor optimization.
Practitioner Checklist
- Confirm that the LLM endpoint lives in an IT zone (Purdue Level 4 or 5) and never originates a write across an OT conduit.
- Audit the equipment corpus for version control, change management, and traceability to OEM source documents.
- Test refusal-outside-corpus in every release, with adversarial prompts and ablation of the retrieval index.
- Verify that every CMMS or MES action carries a signed payload, an audit trail, and a human approval before dispatch.
- Run the equipment-specific evaluation set on every model and prompt change; gate deployment on no regression.
- Document the safety case per asset class, not per site, so that adding a new site is incremental.
- Maintain the regulatory mapping (which architectural choice satisfies which framework) as a living document attached to the deployment.
What Comes Next
Section 73.5 turns to the postmortems and named-vendor cases (Foxconn Foxbrain, Siemens Industrial Copilot, the 2024 plant-floor copilot that hallucinated torque specs, the supply-chain-disruption agent pilots that paused) and to a brief inventory of the cross-references to other chapters that practitioners should keep open.
- Eight architectural choices are interlocking, not optional: on-premises serving, curated and versioned corpus, mandatory citation, refusal outside corpus, structured handoff to MES and CMMS, per-site retrieval over shared model, continuous evaluation, and voice-text-image multimodal input together defeat the manufacturing failure modes.
- The corpus is the product, not the model: OEM manuals, internal SOPs, work-order history, training videos, and the manufacturer's PLM-of-record are versioned per equipment and per site, with the site team owning content and the platform team owning the model.
- Citation is the safety-case artifact: every answer links to manual page, work-order ID, or training-document section, and out-of-corpus questions trigger refusal plus escalation prompts rather than improvisation.
- Structured handoff to MES and CMMS is the only conduit to OT: idempotent signed REST or AS2 calls to SAP, Oracle, Infor, Siemens Opcenter, or IBM Maximo carry every LLM-proposed action with human approval and audit-log retention.
- Bosch's 2025 multi-site rollout is the reference deployment: 20-25 percent reduction in MTTR and ~40 percent reduction in shift-handover documentation time, with the safety case built per asset class so adding a new plant is incremental rather than greenfield.
- Each architectural choice maps to a specific regulation: ITAR/CMMC drives on-premises, GxP CSV drives citation and refusal, IEC 62443 drives structured handoff, ISO 42001 drives continuous evaluation, EU AI Act Annex III drives per-site retrieval, so architecture review is the lens for compliance rather than a parallel checklist.
What's Next?
In the next section, Section 73.5: Postmortems and Named-Vendor Cases, we build on the material covered here.