XOR, Clifford, and Bottled Nonlinearity

Feb 16, 2026

Stabilizers, Magic States, and Why T Gates Are Expensive

Earlier today, a friend compared Toffoli gates to multiplications, and I was instantly nerdsniped.

As computer scientists, we know that XOR is not universal, but XOR+AND is. It is illuminating to think of XOR as addition modulo 2, and AND as multiplication modulo 2. We need both operations. And crucially, we cannot build multiplication out of addition alone.

Hearing that Toffoli is “multiplication” reminded me that this is exactly true in quantum computing.

This post is an attempt to explain why that analogy is so remarkably beautiful — and why cryptographers like me obsess over the number of Toffoli gates.

Stabilizers: start with one qubit

Before scaling anything up, let’s look carefully at a single qubit.

A quantum state is a generalization of a classical bit. A classical bit is either $0$ or $1$ . A qubit can be in a superposition of both.

For one qubit, a general state looks like

a\ket{0} + b\ket{1},

where $a$ and $b$ are complex numbers and $|a|^2 + |b|^2 = 1$ .
The squared magnitudes $|a|^2$ and $|b|^2$ determine the probabilities of measuring $0$ or $1$ .

One standard way to describe a quantum state is by listing these complex amplitudes.

If this already feels abstract or slightly uncomfortable, that is normal. Quantum states are mathematically richer than classical bits, and the notation can feel heavy at first. But we need just this much structure to get to the fun computer science part.

Now let’s discuss quantum gates. A gate is just a linear transformation applied to this state. It is chosen so that the total probability still sums to one. For example:

$I$ , the identity gate, which does nothing.
$X$ , which flips $\ket{0}$ and $\ket{1}$ . It is the quantum version of the classical NOT gate.
$Z$ , which leaves $\ket{0}$ unchanged and multiplies $\ket{1}$ by $-1$ . There is no classical analogue of the $Z$ gate.
$Y$ , which combines the effects of $X$ and $Z$ : it flips $\ket{0}$ and $\ket{1}$ and also introduces a sign change. You can think of $Y$ simply as $XZ$ .
The Hadamard gate $H$ sends $\ket{0} \mapsto \frac{\ket{0} + \ket{1}}{\sqrt{2}}, \quad \ket{1} \mapsto \frac{\ket{0} - \ket{1}}{\sqrt{2}}.$

There is also no classical analogue of the Hadamard gate. It creates superpositions, something classical bits simply cannot do.¹

If we track amplitudes directly, even this tiny example brings in square roots and complex numbers.

The stabilizer idea is to describe certain states in a completely different way.

Instead of listing amplitudes, we describe a state by the operators that leave it unchanged.

A state $\ket{\psi}$ is stabilized by an operator $P$ if

P\ket{\psi} = \ket{\psi}.

For example:

$\ket{0}$ is stabilized by $Z$ because $Z\ket{0} = \ket{0}$ .
$\ket{+} = H\ket{0} = \frac{\ket{0}+\ket{1}}{\sqrt{2}}$ is stabilized by $X$ .

Now here is the crucial simplification.

The Pauli operators $(I, X, Y, Z)$ generate a group. We ignore overall constant factors when working in the stabilizer formalism.²

Each Pauli operator on one qubit can be encoded using two bits:

$I$ corresponds to $(0,0)$
$X$ corresponds to $(1,0)$
$Z$ corresponds to $(0,1)$
$Y$ corresponds to $(1,1)$ (since $Y = XZ$ )

Multiplying Pauli operators corresponds to adding these bit pairs modulo 2.

We have reduced part of quantum mechanics to arithmetic on bits. That is the key insight.

A complete one-qubit example

Start with $\ket{0}$ . Its stabilizer is $Z$ , which we encode as $(0,1)$ .

Now apply a Hadamard gate $H$ .

We know that

H Z H = X.

So under conjugation by $H$ , the stabilizer $(0,1)$ becomes $(1,0)$ .

And indeed,

H\ket{0} = \ket{+},

which is stabilized by $X$ .

If we had tracked amplitudes, we would have computed

\ket{0} \xrightarrow{H} \frac{\ket{0}+\ket{1}}{\sqrt{2}}.

Instead, we just updated a two-bit description.

That is the stabilizer formalism in action.

Scaling to many qubits

For $n$ qubits, a completely general state requires $2^n$ complex amplitudes.

But an $n$ -qubit Pauli operator can be encoded using $2n$ bits — two bits per qubit indicating where $X$ and $Z$ appear.

A stabilizer state is defined by $n$ independent Pauli operators that all stabilize it. So instead of storing $2^n$ amplitudes, we store $n$ binary strings of length $2n$ .

That is only $O(n^2)$ bits — polynomial, not exponential.

Now comes the key structural fact:

Clifford gates (generated by $H$ (Hadamard), $S$ (a simple phase-rotation gate), and CNOT (the controlled-NOT gate, which flips one qubit depending on another)) map Pauli operators to Pauli operators.

So if you describe a quantum state by its stabilizers and apply only Clifford gates, you can track the entire computation using only binary linear algebra.

No amplitudes.
No exponential blowup.

This is the Gottesman–Knill theorem (see Gottesman (1998)), which shows that Clifford circuits can be simulated efficiently on a classical computer. If you want a clean and very readable presentation of how this simulation works in practice, see Aaronson and Gottesman (2004).

Clifford circuits are the XOR circuits of quantum computing.
They never leave the linear world.

Why Clifford is not universal

Clifford circuits can create entanglement. They can create Bell states. They can implement quantum teleportation. They are already surprisingly expressive.

But if you start with all zeros — or, as we like to say, $\ket{0}^{\otimes n}$ — and apply only Clifford gates, you never leave the stabilizer world.

And stabilizer states have compact descriptions: $O(n^2)$ bits suffice to describe them.

So this fragment of quantum computing is powerful. But it is not universal.

It is missing multiplication.

Enter the T gate: multiplication arrives

The T gate is

T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}.

It is not a Clifford gate.

Conjugating a Pauli operator $P$ by $T$ — meaning computing $TPT^{\dagger}$ — does not always produce another Pauli operator. The closed linear world of stabilizers breaks.

This is exactly the analogue of adding AND to XOR circuits.

In reversible classical computing, the Toffoli gate plays the role of multiplication. It computes $(x,y,z) \mapsto (x,y,z \oplus xy)$ . That $xy$ term is the product of two bits. Toffoli makes reversible circuits universal.

Clifford gates correspond to linear reversible transformations. T gates are what allow you to build Toffoli.

So in a very literal sense, the T-count (the number of $T$ gates in your circuit) measures how much “multiplication” your quantum circuit contains.

Once you add T gates,

the algebra is no longer linear,
the stabilizer simulator no longer applies,
and the gate set becomes universal.

Magic states: bottled multiplication

To understand why magic states matter, we need a tiny bit of fault-tolerance context. The leading approach to building large-scale quantum computers today is based on the surface code. In the surface code, logical qubits are encoded into large grids of physical qubits in a way that makes certain operations — notably Clifford gates — relatively natural and robust against noise.

In fault-tolerant quantum computing, Clifford gates are cheap.

T gates are expensive.

Why?

Many quantum error-correcting codes implement Clifford gates transversally — meaning you apply the same small gate ‘bitwise’ across the physical qubits of a logical qubit, without spreading errors in dangerous ways.

This limitation is formalized in the Eastin–Knill theorem (see Eastin and Knill (2009)), which shows that no quantum error-correcting code can implement a universal gate set using only transversal gates.

Instead, we prepare a special resource state

\ket{T} = T \ket{+}.

This is called a magic state (see Bravyi and Kitaev (2005)).

Magic states are not stabilizer states. They cannot be described using only the linear stabilizer machinery.
They sit just outside the linear world.

Using only Clifford gates plus magic states, you can implement $T$ gates via state injection (a fault-tolerant gadget that consumes a prepared magic state to enact the gate).

So magic states are pre-manufactured packets of multiplication.

They are the AND gates of fault-tolerant quantum computing.

The big picture

Both worlds share the same structure:

There is a linear subset.
That subset is efficiently simulable.
It is not universal.
Adding a small nonlinear ingredient makes it universal.

Classically, the nonlinear ingredient is multiplication of bits.

Quantumly, the nonlinear ingredient is the T gate — or equivalently, magic states.

The cost of classical simulation grows exponentially in the number of these nonlinear resources. That is why people obsess over T-count: it measures how much genuine nonlinearity your quantum circuit contains.

Nonlinearity is the scarce resource that turns simulable structure into universal computation.
And that is why we count T gates.

Superposition is not the same thing as probability. A classical bit can be $0$ with some probability and $1$ with some probability. A superposition is different: it is a single quantum state that can later interfere with itself. You can also have classical probability distributions over superpositions. The two notions are orthogonal. ↩
We ignore constant factors because multiplying a quantum state by a constant complex number $e^{i\theta}$ does not change any measurement probabilities. Because of this, such overall constants do not affect the physics and can be safely ignored. ↩

I want to write more posts like this. You can follow along by email.