"An LLM hallucinated a torque spec on a plant floor in 2024. The torque was wrong. The lessons were many."
Hallux, Spec-Hallucination-Investigator AI Agent
Five failure modes are uniquely costly in manufacturing: violations of the IT/OT boundary, hallucinated specifications and procedural steps, mismatched air-gap and on-premises requirements, multilingual and multi-site drift, and safety-case complications when LLMs touch regulated processes. Production deployments have converged on a remediation pattern for each, and this section walks through them in turn.
Prerequisites
This section assumes the manufacturing LLM use cases from Section 73.1, the hallucination vocabulary from Section 47.1, and a passing familiarity with IT and OT network boundaries.
The IT/OT Boundary Is Not Optional
Stuxnet, the worm discovered in 2010 that physically damaged Iranian centrifuges by writing rogue setpoints to Siemens S7 PLCs, is the patron saint of the IT/OT boundary lecture. The malware was sophisticated enough that leading security researchers spent months working out what it was doing. Industrial LLM deployments in 2026 routinely treat Stuxnet as the worst-case scenario the architecture must prevent, even though no LLM has done anything close.
Operational Technology (OT: PLCs, SCADA, MES control loops) operates under safety-critical constraints, deterministic timing, and certification regimes that have nothing in common with cloud SaaS development. An LLM that writes directly to a PLC is almost always a catastrophic design. The reference architecture: LLMs live on the IT side, and OT systems consume structured, signed, human-approved instructions through audited interfaces. The Stuxnet lesson applies directly: nothing reachable from the IT side should be able to write to a controller without passing through that approval chain.
The structural argument is unchanged from pre-LLM industrial-cybersecurity practice: control systems operate under timing, safety, and certification constraints that cloud-style continuous deployment cannot satisfy. An LLM that writes setpoints to a PLC is a software component the safety case has not assessed; the loop has lost its safety property. The architectural response is firm: LLMs sit in IT zones (Purdue Level 4 or 5), information flows up from OT to IT through one-way data diodes or audited gateways, and any change to OT flows back through the engineering-workstation change-management process with human approval. This is the lesson Section 73.1 covers in detail; Section 73.4 specifies the architecture.
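To make "structured, signed, human-approved instructions" concrete, here is a minimal Python sketch of the check an OT-side gateway might run before it forwards a change. Everything in it is hypothetical: the `ChangeRequest` fields, the shared key, and the rule that the drafting identity cannot approve its own request. HMAC keeps the example inside the standard library. A real deployment would more likely use asymmetric signatures issued by the change-management system, so that the gateway holds only a verification key.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass

# Hypothetical shared key for this sketch. With HMAC the gateway must hold it too;
# asymmetric signatures would avoid that. The LLM never has access to it.
APPROVAL_KEY = b"example-key-not-for-production"


@dataclass(frozen=True)
class ChangeRequest:
    asset_id: str      # hypothetical OT asset identifier
    parameter: str     # e.g. a setpoint name
    value: float
    drafted_by: str    # e.g. "llm-assistant" or a human author
    approved_by: str   # human approver; empty string until approved


def canonical_bytes(req: ChangeRequest) -> bytes:
    # Deterministic serialization so signer and verifier hash identical bytes.
    return json.dumps(asdict(req), sort_keys=True).encode("utf-8")


def sign(req: ChangeRequest, key: bytes = APPROVAL_KEY) -> str:
    # Called by the change-management system after a human approves.
    return hmac.new(key, canonical_bytes(req), hashlib.sha256).hexdigest()


def gateway_accepts(req: ChangeRequest, signature: str,
                    key: bytes = APPROVAL_KEY) -> bool:
    """OT-side check: forward only signed requests with an independent human approval."""
    if not req.approved_by or req.approved_by == req.drafted_by:
        return False  # unapproved, or approved by the same identity that drafted it
    return hmac.compare_digest(sign(req, key), signature)


# Usage: an approved request passes; changing the value after signing does not.
approved = ChangeRequest("press-07", "clamp_pressure_bar", 180.0, "llm-assistant", "j.doe")
sig = sign(approved)
print(gateway_accepts(approved, sig))  # True
tampered = ChangeRequest("press-07", "clamp_pressure_bar", 250.0, "llm-assistant", "j.doe")
print(gateway_accepts(tampered, sig))  # False
```

The design choice that matters is where the key lives. The LLM may fill in `drafted_by` and the proposed value, but only the change-management system can produce a signature the gateway accepts, and it does so only after a human approves. Any edit made after signing fails verification.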
Hallucinated Torque Specs and Procedural Steps
A maintenance copilot that confidently invents a fastener torque value or skips a safety step can cause real injuries. Defense in depth: (1) refuse to answer when the retrieval corpus doesn't contain the spec, (2) always cite the source page and revision number, (3) escalate any safety-related answer to a "verify with the printed manual" prompt before the technician acts. Several pilot programs have caught hallucinated specifications in evaluation that would have made it to the floor without these guards.
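The three guards can be sketched as a thin wrapper around retrieval. This is a minimal illustration that rests on several assumptions: retrieval scores are normalized to [0, 1], and `MIN_SCORE` and `SAFETY_TERMS` are hypothetical values a team would tune. To keep the guard logic visible, the function returns the best retrieved passage verbatim instead of calling a model.

```python
from dataclasses import dataclass
from typing import Optional

SAFETY_TERMS = ("torque", "voltage", "pressure", "lockout", "tagout")  # illustrative list
MIN_SCORE = 0.75  # hypothetical retrieval-confidence threshold


@dataclass
class Passage:
    doc_id: str
    page: int
    revision: str
    text: str
    score: float  # retrieval similarity, assumed normalized to [0, 1]


@dataclass
class GuardedAnswer:
    text: str
    citation: Optional[str]
    refused: bool
    verify_prompt: bool


def answer_with_guards(question: str, passages: list[Passage]) -> GuardedAnswer:
    supported = [p for p in passages if p.score >= MIN_SCORE]
    # Guard 1: refuse when the corpus does not contain the spec.
    if not supported:
        return GuardedAnswer(
            "No matching specification in the controlled manuals. "
            "Consult the printed manual or engineering.",
            citation=None, refused=True, verify_prompt=False)
    best = max(supported, key=lambda p: p.score)
    # Guard 2: always cite document, page, and revision.
    citation = f"{best.doc_id} p.{best.page} rev.{best.revision}"
    # Guard 3: safety-related questions get a verify-before-acting prompt.
    safety = any(term in question.lower() for term in SAFETY_TERMS)
    text = best.text + (" Verify with the printed manual before acting." if safety else "")
    return GuardedAnswer(text, citation, refused=False, verify_prompt=safety)
```

A keyword list is a crude safety classifier. It errs toward over-flagging, which is the cheaper mistake here.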
The pattern is the manufacturing analog of the legal-industry hallucinated-citation problem (Section 67.2). The fix is parallel: a verification step that resolves every specification or procedural reference against the authoritative manual, refusal to answer when the corpus does not contain the relevant spec, and an explicit safety prompt for any answer that touches torque, voltage, pressure, or any other quantity whose wrong value can cause injury. Several major automotive and aerospace suppliers have built dedicated verification services for this, integrated with the manufacturer's PLM and document-management systems.
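One way to approximate the verification step is to extract every quantity-with-unit from the answer and check that each one appears in the cited passage. The regular expression and unit list below are illustrative assumptions, not a complete grammar for engineering units. A production service would resolve references through the PLM or document-management system rather than by plain-text matching.

```python
import re

# Illustrative units for quantities whose wrong value can cause injury.
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*(Nm|N·m|ft-lb|V|kV|bar|psi|kPa|MPa)\b")


def extract_quantities(text: str) -> set[tuple[float, str]]:
    # Normalize "N·m" to "Nm" so both spellings compare equal.
    return {(float(v), u.replace("N·m", "Nm")) for v, u in QUANTITY.findall(text)}


def verify_against_source(answer: str, source_passage: str) -> list[tuple[float, str]]:
    """Return quantities stated in the answer that the cited source does not contain."""
    return sorted(extract_quantities(answer) - extract_quantities(source_passage))


source = "Torque M8 flange bolts to 25 Nm. Line pressure must not exceed 6 bar."
print(verify_against_source("Tighten to 30 Nm at 6 bar.", source))  # [(30.0, 'Nm')]
```

A non-empty result should block the answer, not just annotate it. A 30 Nm answer checked against a 25 Nm source is exactly the failure this guard exists to catch.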
Air-Gapped and On-Premises Requirements
Many manufacturing environments cannot accept cloud-only LLM services. Reasons range from defense-industrial-base contractual constraints (ITAR, CMMC) to pragmatic concerns (factory-floor uptime cannot depend on internet connectivity). Successful deployments use on-premises or air-gapped model serving (vLLM, TGI, NVIDIA NIM, or appliance vendors). Open-weight models in the 7B-70B range are typically chosen for these footprints.
The on-premises and air-gapped requirements are not optional in many manufacturing contexts. Defense-industrial-base contracts require CMMC compliance; ITAR-controlled programs often mandate air-gapped or tightly access-controlled environments; and many manufacturers will not accept any architecture in which factory-floor operation depends on internet connectivity. The deployment pattern: open-weight models served via vLLM, TGI, or NVIDIA NIM on plant or regional GPU infrastructure, with the retrieval corpus loaded once and refreshed on a controlled schedule. The capability lag is real (6 to 18 months relative to cloud frontier models) but bounded, and the operational benefit of self-contained operation is decisive.
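One way to implement "refreshed on a controlled schedule" is to ship the corpus as an approved snapshot with a checksum manifest and have the serving node refuse anything that does not match. The manifest layout used here is a hypothetical format for this sketch: an `approved_by` field plus a `files` map from relative path to SHA-256 digest.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    # Stream the file in chunks so large manuals do not need to fit in memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_snapshot(snapshot_dir: Path) -> bool:
    """Accept a corpus snapshot only if it is approved and every checksum matches."""
    manifest = json.loads((snapshot_dir / "manifest.json").read_text(encoding="utf-8"))
    if not manifest.get("approved_by"):
        return False  # unapproved snapshots never reach the serving node
    for rel_path, expected in manifest["files"].items():
        path = snapshot_dir / rel_path
        if not path.is_file() or sha256_file(path) != expected:
            return False  # missing or modified document
    return True
```

Verification needs nothing beyond the snapshot directory, so it behaves the same on an air-gapped node as on a connected one. The transfer mechanism (removable media, a one-way diode) is outside the sketch.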
Multilingual and Multi-Site Drift
Manufacturing sites are often global, with local-language documentation, locally customized SOPs, and locally cached knowledge. A single global LLM deployment quickly fragments into dozens of site-specific configurations. Production patterns: a shared base model with per-site retrieval corpora, per-site evaluation sets, and per-site feedback loops. Centralizing the model without localizing the corpus is a recipe for chronic frustration at the sites whose context is missing.
The successful pattern at multinational manufacturers (Siemens, Bosch, GE Vernova) is a shared model plus per-site retrieval. The model is the same everywhere; the documents it retrieves over differ by site. The site team owns the retrieval corpus, and the central platform team owns the model and the platform. This separation aligns ownership with knowledge: the people who know the local SOPs own them.
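The ownership split can be sketched as a router that pairs one shared model identifier with a per-site corpus. The model ID, the site IDs, and the keyword matcher that stands in for a real retriever are all hypothetical. The point is structural: a query from one site never retrieves another site's documents, and each corpus records the team that owns it.

```python
from dataclasses import dataclass, field

SHARED_MODEL = "plant-assistant-base"  # hypothetical ID owned by the central platform team


@dataclass
class SiteCorpus:
    site_id: str
    owner: str     # site team that owns the local SOPs
    language: str
    documents: dict[str, str] = field(default_factory=dict)  # doc_id -> text


class SiteRouter:
    """Route every query to the shared model plus the requesting site's corpus."""

    def __init__(self) -> None:
        self._sites: dict[str, SiteCorpus] = {}

    def register(self, corpus: SiteCorpus) -> None:
        self._sites[corpus.site_id] = corpus

    def retrieve(self, site_id: str, query: str) -> list[str]:
        corpus = self._sites.get(site_id)
        if corpus is None:
            raise KeyError(f"no corpus registered for site {site_id!r}")
        terms = query.lower().split()
        # Naive keyword match stands in for a real retriever; it never crosses sites.
        return [doc_id for doc_id, text in corpus.documents.items()
                if any(t in text.lower() for t in terms)]

    def build_request(self, site_id: str, query: str) -> dict:
        return {"model": SHARED_MODEL, "site": site_id,
                "context_docs": self.retrieve(site_id, query), "query": query}
```

Per-site evaluation sets and feedback loops would attach to `SiteCorpus` in the same way, so each site's quality is measured against its own documents.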
Safety-Case and Functional-Safety Certification
If the LLM informs decisions in a regulated process (ISO 26262 in automotive, IEC 61508 and its process-sector derivative IEC 61511 in process industries, IEC 62304 for medical-device software, GxP in pharma), the deployment becomes part of the safety case. This is rarely impossible, but it is always slower and more expensive than an unregulated deployment. The pattern that works: scope the LLM out of any function assigned a safety-integrity level (or an equivalent classification, such as an ASIL under ISO 26262), and document that scoping explicitly.
The pattern that works in production is to keep LLMs strictly outside the certified function. The LLM may help a technician research a fault, draft a change-management document, or summarize a deviation report. It may not be in the loop on a safety-related decision. The scoping is explicit and documented in the safety case: "the LLM operates as a research and documentation assistant; it does not influence safety-related decisions; the operator remains responsible for verifying all information against the controlled manual." This scoping is what allows the LLM to ship at all in regulated manufacturing.
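The documented scope can also be enforced at the request boundary. This sketch assumes a hypothetical task taxonomy and a `touches_sil_function` flag that the caller would derive from the plant's asset register; neither comes from any standard. Anything outside research and documentation, or anything that touches a SIL-rated function, is refused and routed to a human.

```python
from enum import Enum


class Task(Enum):
    RESEARCH_FAULT = "research_fault"
    DRAFT_CHANGE_DOC = "draft_change_doc"
    SUMMARIZE_DEVIATION = "summarize_deviation"
    SAFETY_DECISION = "safety_decision"


# Mirrors the documented safety-case scope: research and documentation only.
IN_SCOPE = {Task.RESEARCH_FAULT, Task.DRAFT_CHANGE_DOC, Task.SUMMARIZE_DEVIATION}

SCOPE_NOTICE = ("The LLM operates as a research and documentation assistant; it does not "
                "influence safety-related decisions; the operator remains responsible for "
                "verifying all information against the controlled manual.")


def admit(task: Task, touches_sil_function: bool) -> tuple[bool, str]:
    """Admit a request only if it is in scope and touches no SIL-rated function."""
    if task not in IN_SCOPE or touches_sil_function:
        return False, "Out of scope for the LLM assistant; route to the responsible engineer."
    return True, SCOPE_NOTICE
```

Returning the scope notice with every admitted request keeps the safety-case wording in front of the operator, not only in the documentation.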
This case study is a composite of several real 2024 incidents at major automotive suppliers. None was individually disclosed, but the pattern is widely shared at industry conferences. A maintenance copilot was piloted at a Tier-1 automotive supplier without mandatory source citation. When asked about fastener torque for an unfamiliar assembly, the LLM produced confident-looking specifications drawn from its general training data rather than from the OEM manual. In evaluation against the engineering team's verification set, two specifications were caught as fabrications. The pilot was paused, re-architected with mandatory citation and a refusal-when-uncertain prompt, and resumed. The lesson, widely shared afterward, was that the architectural defense is non-optional: every safety-related answer must cite the manual page and revision number, and any answer the corpus does not support must be a refusal. The pattern propagated quickly across the automotive supplier base.
- The IT/OT boundary is structural, not stylistic: PLCs, SCADA, and MES control loops operate under timing, safety, and certification constraints that cloud-style continuous deployment cannot satisfy. An LLM writing setpoints to a PLC is therefore a component the safety case never assessed, and the loop loses its safety property.
- Hallucinated torque specs are the manufacturing analog of fake citations: the composite 2024 Tier-1 automotive case made mandatory source citation (page and revision number) and refusal-when-uncertain the architectural baseline, with safety-related answers flagged for "verify against the printed manual" before action.
- Air-gapped and on-premises serving is non-negotiable in many plants: ITAR, CMMC, and uptime-versus-internet trade-offs drive deployments to vLLM, TGI, or NVIDIA NIM with 7B-70B open-weight models on plant or regional GPU infrastructure, accepting a 6-18 month capability lag for self-contained operation.
- Multi-site drift demands shared model plus per-site retrieval: Siemens, Bosch, and GE Vernova converged on a single model with site-owned corpora, evaluation sets, and feedback loops, aligning ownership of local SOPs with the people who know them.
- Safety-case scoping is what lets the LLM ship in regulated processes: under ISO 26262, IEC 61508/61511, IEC 62304, and GxP regimes, the workable pattern is to exclude the LLM explicitly from any function with a safety-integrity level and to document that scope in the safety case as "research and documentation assistant only."
What's Next?
Section 73.3: Manufacturing Regulatory and Standards Framework covers ISO/IEC 42001, ISA/IEC 62443, NIST AI RMF and SP 800-82, the EU Machinery Regulation, the EU AI Act, ITAR/EAR, and sector-specific GxP/GMP regimes.