The DFT and FFT of Section G.2 work on finite, bounded sequences, which covers nearly every signal an audio model will ever see. But the underlying theory of why these transforms diagonalize convolution, why some filters are stable and others diverge, and why resampling produces the spectral artifacts it does, is most cleanly expressed in the Z-transform. This section gives the minimum vocabulary needed to read the digital-filter literature and to understand the design choices behind audio resampling, denoising, and effects.
Definition
For a discrete-time sequence $x[n]$ defined on the integers, the two-sided Z-transform is
$$X(z) = \sum_{n=-\infty}^{\infty} x[n]\, z^{-n},$$
where $z$ is a complex variable. The Z-transform maps a discrete sequence to a complex-valued function of the continuous variable $z$, which is exactly what lets continuous-function machinery (polynomial roots, contour integrals, factorization) be brought to bear on inherently discrete operations. It is the discrete-time analogue of the Laplace transform: the substitution $z = e^{s T_s}$, where $T_s$ is the sample period, converts the Laplace transform of a sampled signal directly into the Z-transform.
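For a finite-length sequence the defining sum has finitely many terms, so it can be evaluated directly at any nonzero $z$. The following is a minimal sketch in Python with NumPy; the helper name `z_transform` and its `n0` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np

def z_transform(x, z, n0=0):
    """Evaluate X(z) = sum_n x[n] * z**(-n) for a finite sequence.

    x  : samples x[n0], x[n0+1], ... (all other samples are taken to be zero)
    z  : nonzero complex point at which to evaluate the transform
    n0 : index of the first sample (hypothetical convenience parameter)
    """
    x = np.asarray(x, dtype=complex)
    n = n0 + np.arange(len(x), dtype=float)
    return np.sum(x * complex(z) ** (-n))

x = [1.0, 2.0, 3.0]            # x[0] = 1, x[1] = 2, x[2] = 3
X_at_2 = z_transform(x, 2.0)   # 1 + 2/2 + 3/4
X_at_1 = z_transform(x, 1.0)   # at z = 1 the transform is just the sum of the samples
```

Evaluating at $z = 1$ returns the sum of the samples, which is the DC value of the sequence.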
Region of Convergence
The series above does not converge for every $z$. The set of complex numbers for which it does is the region of convergence (ROC). When the ROC is nonempty it is an annulus $r_1 < |z| < r_2$ centered on the origin, where $r_1$ may be $0$ and $r_2$ may be $\infty$. The ROC is part of the data: the same algebraic expression $X(z)$ can correspond to two different sequences with different ROCs (one right-sided, one left-sided). For causal sequences (those with $x[n] = 0$ for $n < 0$), the ROC is the exterior of a disk, $|z| > r_1$; for finite-length sequences, the ROC is the entire $z$-plane except possibly $z = 0$ or $z = \infty$.
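A standard illustration is the causal exponential $x[n] = a^n$ for $n \ge 0$, whose transform is the geometric series $\sum_{n \ge 0} (a/z)^n = 1/(1 - a z^{-1})$ with ROC $|z| > |a|$. The sketch below, with illustrative values chosen here rather than taken from the text, compares partial sums against the closed form at one point inside the ROC and one point outside it.

```python
import numpy as np

def partial_sum(a, z, n_terms):
    """Partial sum of sum_{n=0}^{n_terms-1} (a/z)**n, the Z-transform of a**n for n >= 0."""
    n = np.arange(n_terms)
    return np.sum((a / z) ** n)

def closed_form(a, z):
    """1 / (1 - a z^{-1}); equals the series only when |z| > |a|."""
    return 1.0 / (1.0 - a / z)

a = 0.5                 # illustrative decay factor
z_inside_roc = 0.8      # |z| > |a|: the series converges
z_outside_roc = 0.4     # |z| < |a|: the series diverges

err_inside = abs(partial_sum(a, z_inside_roc, 200) - closed_form(a, z_inside_roc))
size_outside = abs(partial_sum(a, z_outside_roc, 200))

print(f"inside ROC:  |partial sum - closed form| = {err_inside:.2e}")
print(f"outside ROC: |partial sum| = {size_outside:.2e}")
```

Outside the ROC the closed-form expression still returns a finite number, but it no longer describes this sequence: there it is the transform of the left-sided sequence $-a^n$ for $n \le -1$.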
Pole-Zero Plots
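The transfer function $H(z)$ of any rational LTI system (the Z-transform of its impulse response) is a ratio of polynomials in $z^{-1}$:
$$H(z) = \frac{B(z)}{A(z)} = \frac{b_0 + b_1 z^{-1} + \cdots + b_M z^{-M}}{1 + a_1 z^{-1} + \cdots + a_N z^{-N}}.$$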
The roots of $B(z)$ are zeros (values of $z$ for which $H(z) = 0$) and the roots of $A(z)$ are poles (values where $H(z) \to \infty$). Plotting these on the complex plane gives the pole-zero diagram, the single most useful visualization in digital-filter design. Zeros carve notches into the frequency response; poles create resonant peaks. Resonant audio effects (vowel-formant synthesis, EQ peaks, wah-wah pedals) are literally pole placements near the unit circle.
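As a hedged sketch of how pole and zero positions show up in the frequency response, the snippet below builds a two-pole resonator with `scipy.signal`. The pole radius, pole angle, and zero locations are illustrative assumptions, not values from the text.

```python
import numpy as np
from scipy import signal

r, theta = 0.95, np.pi / 4                  # illustrative pole radius and angle
b = [1.0, 0.0, -1.0]                        # B(z) = 1 - z^-2: zeros at z = +1 and z = -1
a = [1.0, -2 * r * np.cos(theta), r**2]     # A(z): conjugate poles at r * exp(+/- j*theta)

zeros, poles, gain = signal.tf2zpk(b, a)

# Frequency response on the upper half of the unit circle, 0 <= w < pi.
w, h = signal.freqz(b, a, worN=4096)
w_peak = w[np.argmax(np.abs(h))]

print("zeros:", zeros)
print("pole radii:", np.abs(poles), "pole angles:", np.angle(poles))
print("peak frequency (rad/sample):", w_peak, "pole angle:", theta)
```

The response peaks close to the pole angle, and the zeros at $z = \pm 1$ force the gain to zero at DC and at Nyquist.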
The Unit Circle Is the DTFT
Substitute $z = e^{j \omega}$ in the Z-transform definition. The result is
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j \omega n},$$
which is exactly the discrete-time Fourier transform (DTFT) of $x[n]$ at angular frequency $\omega$. The unit circle $|z| = 1$ on the $z$-plane is therefore the frequency axis: traversing it from $\omega = 0$ (the point $z = 1$) to $\omega = \pi$ (the point $z = -1$) sweeps the DTFT from DC to the Nyquist frequency. The DFT of a length-$N$ sequence is just the DTFT sampled at $N$ equally spaced points on this circle, $\omega_k = 2 \pi k / N$ for $k = 0, \dots, N-1$. This is the precise sense in which "the FFT lives on the unit circle": the DFT bins are uniformly spaced samples of $X(z)$ along $|z| = 1$.
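The sampling relationship can be checked numerically. The sketch below evaluates $X(z)$ of a short random sequence at the $N$ points $z_k = e^{j 2\pi k / N}$ and compares the result with `numpy.fft.fft`; the sequence length and random seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)          # arbitrary length-8 test sequence
N = len(x)
n = np.arange(N, dtype=float)

# N equally spaced points on the unit circle: z_k = exp(j * 2*pi*k / N)
z_k = np.exp(2j * np.pi * np.arange(N) / N)

# Evaluate X(z) = sum_n x[n] z^{-n} at each z_k, straight from the definition.
X_on_circle = np.array([np.sum(x * zk ** (-n)) for zk in z_k])

X_fft = np.fft.fft(x)
print("max |X(z_k) - FFT bin|:", np.max(np.abs(X_on_circle - X_fft)))
```

The two arrays agree to floating-point precision; the FFT is only a faster way of computing the same $N$ samples.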
The unit circle also separates stable from unstable systems. A causal LTI filter is stable (bounded input gives bounded output, BIBO) if and only if all of its poles lie strictly inside the unit circle, $|z_{\text{pole}}| < 1$. A pole on the circle gives a marginally stable resonator (an undamped oscillator); a pole outside gives an exploding response. Filter designers therefore think of stability as a purely geometric condition on the pole positions, which is enormously easier to reason about than a time-domain stability proof.
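The geometric rule can be observed directly in the impulse response. The following sketch places a conjugate pole pair at radius $r$ and compares the tail of the impulse response for $r$ just inside, on, and just outside the unit circle; the helper names and the specific radii are illustrative assumptions.

```python
import numpy as np
from scipy import signal

def is_stable(a):
    """True if every root of A(z), i.e. every pole, lies strictly inside the unit circle."""
    return bool(np.all(np.abs(np.roots(a)) < 1.0))

def pole_pair_response(r, theta=np.pi / 4, n_samples=400):
    """Denominator and impulse response for poles at r * exp(+/- j*theta)."""
    a = [1.0, -2 * r * np.cos(theta), r**2]
    impulse = np.zeros(n_samples)
    impulse[0] = 1.0
    return a, signal.lfilter([1.0], a, impulse)

tails = {}
for r in (0.95, 1.0, 1.05):
    a, h = pole_pair_response(r)
    tails[r] = np.max(np.abs(h[-50:]))   # size of the last 50 samples
    print(f"r = {r:4.2f}  largest tail sample = {tails[r]:.3e}")

a_inside, _ = pole_pair_response(0.95)
a_outside, _ = pole_pair_response(1.05)
print("inside stable:", is_stable(a_inside), " outside stable:", is_stable(a_outside))
```

The envelope of the impulse response scales as $r^n$, so the tail decays for $r < 1$, oscillates without decay for $r = 1$, and grows without bound for $r > 1$.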
Connection to Digital Filter Design and Audio Effects
Nearly every linear digital filter in widespread audio use (low-pass and high-pass filters for resampling, parametric EQ for mixing, reverb and chorus, comb filters for pitch detection, digital anti-alias filters applied before downsampling) can be specified by where its designer places poles and zeros on the $z$-plane. The two big families are finite impulse response (FIR) and infinite impulse response (IIR) filters. FIR filters have only zeros (their denominator is $A(z) = 1$) and are unconditionally stable, but they require many taps for sharp cutoffs. IIR filters have both poles and zeros and achieve sharp cutoffs with few coefficients, but they require pole-placement care to remain stable. The resampling stages that take 48 kHz audio down to 16 kHz for Whisper, the band-limiting filters that prevent aliasing in mel filter banks, and the low-pass reconstruction filters in neural vocoders are all FIR or IIR designs justified by exactly this $z$-plane geometry.
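The FIR/IIR trade-off can be made concrete with `scipy.signal`. The sketch below designs one low-pass filter of each kind for the same cutoff and compares coefficient counts and stopband gain; the tap count, filter order, cutoff, and probe frequency are illustrative assumptions rather than recommended settings.

```python
import numpy as np
from scipy import signal

fs = 16000        # sample rate in Hz (illustrative)
cutoff = 1000     # low-pass cutoff in Hz (illustrative)

# FIR: 101 taps, windowed-sinc design. The denominator is A(z) = 1, so there are no poles to place.
fir_taps = signal.firwin(101, cutoff, fs=fs)

# IIR: 6th-order Butterworth, in coefficient form and in zero-pole-gain form.
b_iir, a_iir = signal.butter(6, cutoff, btype="low", fs=fs)
_, iir_poles, _ = signal.butter(6, cutoff, btype="low", fs=fs, output="zpk")

# Gain of each filter one octave above the cutoff.
probe_hz = np.array([2000.0])
_, h_fir = signal.freqz(fir_taps, 1.0, worN=probe_hz, fs=fs)
_, h_iir = signal.freqz(b_iir, a_iir, worN=probe_hz, fs=fs)

print("FIR coefficients:", len(fir_taps))
print("IIR coefficients:", len(b_iir) + len(a_iir))
print("largest IIR pole radius:", np.max(np.abs(iir_poles)))
print("gain at 2 kHz (dB): FIR", 20 * np.log10(np.abs(h_fir[0])),
      " IIR", 20 * np.log10(np.abs(h_iir[0])))
```

The IIR design needs far fewer coefficients, and its stability rests entirely on the pole radii staying below 1; the FIR design has no such condition to check.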
The stability and pole-placement language of this section underlies the resampling and anti-alias filtering used in the data pipelines of Section 20.0.1 (Audio Data) and the neural-vocoder design notes in Section 20.5 (Speech Recognition). The continuous-time Laplace background from which the Z-transform is derived is reviewed alongside complex analysis in Appendix A.
The Z-transform pole-zero geometry developed here is also the design language of neural audio codecs covered in Section 20.0.2. The convolutional encoder and decoder stacks inside EnCodec and SoundStream can be read as learned FIR analysis and synthesis filter banks interleaved with nonlinearities, and the properties of those filters shape reconstruction quality and latency; the same $z$-plane vocabulary (causality, stability, phase response) that explains a wah-wah pedal's resonance carries over to questions such as why a streaming codec restricts itself to causal filters. Readers studying codec models in Chapter 20 will recognise every term used here.
The Z-transform maps any discrete-time sequence to a complex-valued function whose pole-zero geometry on the $z$-plane describes any rational LTI filter up to a gain factor: for a causal filter, poles strictly inside the unit circle give stability, poles close to the circle give resonance, the unit circle itself is the DTFT, and the DFT bins are uniformly spaced samples of it. The linear filters, resamplers, and anti-alias stages in the audio pipelines of Chapter 20 are specified by exactly this geometry.
Objective. Make the pole-zero abstraction audible by designing two filters and listening to the difference.
Task. Generate 3 seconds of white noise at 16 kHz. Then: (a) design a 6th-order Butterworth low-pass filter with scipy.signal.butter(6, 1000, btype="low", fs=16000); (b) design a narrow resonant peak filter with center 1000 Hz and Q = 30 via scipy.signal.iirpeak(1000, 30, fs=16000). Plot the pole-zero map of each filter. SciPy has no built-in equivalent of MATLAB's zplane, so convert the returned (b, a) coefficients with scipy.signal.tf2zpk and scatter-plot the zeros and poles with matplotlib over a drawn unit circle. Apply each filter to the noise via scipy.signal.lfilter, save as WAV, and listen. Describe in one sentence per filter how the pole geometry matches the sound.
Stretch. Scale the band-pass pole pair to a radius slightly greater than 1, moving both conjugate poles together so that the denominator coefficients stay real, and confirm that the filter output diverges over time, validating the stability rule.