3D Generation and Neural Scenes

Chapter opener illustration: 3D Generation and Neural Scenes.

"Three dimensions is two dimensions plus a lot of opinions about light."

PixelPixel, Splat-Curious AI Agent

Looking Back

Chapter 22 mapped the 2D vision world. This chapter goes one dimension up: NeRFs, Gaussian splatting, neural scene representations, and the LLM-driven prompting that lets you generate, edit, or relight a 3D scene from text.

Big Picture

3D Gaussian Splatting, NeRF, Stable Zero123, Trellis, 4D splats, and scene relighting.

Chapter Overview

3D generation moved from research demos into shipping products in 2024 and 2025. This chapter teaches the canonical primitives: 3D Gaussian Splatting fundamentals (math, training, COLMAP preprocessing), 4D dynamic-splat extensions, image-to-3D via multi-view diffusion (Zero123 and successors), direct 3D diffusion (Trellis, GaussianAnything, latent NeRF), and scene relighting plus 3D editing (IC-Light, NeRF-Editing, language-grounded manipulation).

In the roughly 18 months after its 2023 debut, 3D Gaussian Splatting reshuffled the neural-scene stack. This chapter gives the production-ready picture as of 2026: which model to reach for, which workflows ship, and where the limits of reproducibility sit today.
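Much of the 3DGS math in this chapter comes back to one rendering rule. Each pixel's color is an alpha blend of the depth-sorted Gaussians that cover it: C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j). Here alpha_i is the Gaussian's learned opacity multiplied by its 2D Gaussian falloff at that pixel. The sketch below shows that rule for a single pixel in plain Python. It is illustrative, not a reference implementation. It assumes the Gaussians have already been projected to screen space, each with a `mean` and an inverse 2x2 covariance `cov_inv`. The helper names `gaussian_alpha` and `composite_pixel` are hypothetical. Real rasterizers run this per screen tile on the GPU and compute view-dependent color from spherical harmonics, not the fixed `rgb` used here.

```python
import math


def gaussian_alpha(px, py, mean, cov_inv, opacity):
    """Opacity contribution of one projected 2D Gaussian at pixel (px, py).

    mean: (mx, my) screen-space center (assumed already projected).
    cov_inv: ((a, b), (b, c)), inverse of the 2x2 screen-space covariance.
    opacity: learned per-Gaussian opacity in [0, 1].
    """
    dx, dy = px - mean[0], py - mean[1]
    (a, b), (_, c) = cov_inv
    # Exponent of the Gaussian: -1/2 * d^T Sigma^{-1} d
    power = -0.5 * (a * dx * dx + 2.0 * b * dx * dy + c * dy * dy)
    return opacity * math.exp(power)


def composite_pixel(px, py, splats):
    """Front-to-back alpha blending: C = sum_i c_i a_i prod_{j<i} (1 - a_j).

    splats: list of (depth, mean, cov_inv, opacity, rgb); sorted near-to-far here.
    Returns the blended RGB color and the leftover transmittance.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for _, mean, cov_inv, opacity, rgb in sorted(splats, key=lambda s: s[0]):
        alpha = gaussian_alpha(px, py, mean, cov_inv, opacity)
        for k in range(3):
            color[k] += rgb[k] * alpha * transmittance
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # pixel is effectively opaque; stop early
            break
    return color, transmittance


if __name__ == "__main__":
    identity = ((1.0, 0.0), (0.0, 1.0))
    near = (1.0, (0.0, 0.0), identity, 0.5, (1.0, 0.0, 0.0))  # red, depth 1
    far = (2.0, (0.0, 0.0), identity, 0.5, (0.0, 1.0, 0.0))   # green, depth 2
    print(composite_pixel(0.0, 0.0, [far, near]))  # ([0.5, 0.25, 0.0], 0.25)
```

In the usage example, the splats are passed in far-first and the function re-sorts them. At the shared center, the near red splat contributes 0.5 of its color. The green splat behind it contributes only 0.5 * 0.5 = 0.25, because half the light was already absorbed. Sorting near-to-far also lets transmittance double as an early-exit test. Once almost all light is absorbed, farther splats cannot change the pixel.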

What's Next?

Next: Chapter 24: Vision-Language-Action Models. Generating a 3D scene is one thing; acting in one is another. Chapter 24 covers the VLA frontier: RT-2, OpenVLA, pi-0, the action-tokenization trick that lets a transformer output robot trajectories, and cross-embodiment transfer (one model controlling many bodies). This is where multimodality stops being a perception problem and becomes a control problem.