Audio dithering
How randomization helps to alleviate the effects of quantization.
Dithering randomizes the effects of quantization
As we know, digital audio may suffer from quantization error at extremely low levels, where the information being stored is close to the value of one LSB. If the level falls below 0.5 LSB, the audio isn’t registered at all, because it’s quantized to 0. These low levels could be reverb tails, fade-outs or other very low-level details in otherwise silent passages. Dither does not make sure there are always enough bits to register the quietest parts; that’s what upward compression is for. Dither is for randomizing the quantization error. The error is still just as large at low levels, but it is so random that we no longer perceive it as quantization distortion.
What dithering fights against is quantization distortion from bit truncation. At very low levels without dither, when we try to store values smaller than what discrete bits can represent precisely, the rounding errors become correlated with the signal itself, creating harmonic distortion. With dither, those errors become random noise instead.
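To make this concrete, here is a minimal Python sketch, not taken from any particular tool. It assumes a sine wave peaking at 0.4 LSB and a quantizer that rounds to the nearest integer step; the helper names `quantize`, `tpdf_dither` and `projection` are made up for this illustration. Without dither every sample rounds to zero. With dither the output is noisy, but projecting it back onto the original sine shows the signal is still in there.

```python
import math
import random

def quantize(x):
    """Round to the nearest integer step (one step = 1 LSB)."""
    return math.floor(x + 0.5)

def tpdf_dither(rng):
    """Sum of two uniform random values, each in [-0.5, 0.5] LSB."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 4800
# Assumed test signal: 5 cycles of a sine peaking at 0.4 LSB.
signal = [0.4 * math.sin(2 * math.pi * 5 * i / n) for i in range(n)]

plain = [quantize(s) for s in signal]
dithered = [quantize(s + tpdf_dither(rng)) for s in signal]

def projection(out):
    """How much of the original sine survives in `out` (1.0 = all of it)."""
    return sum(o * s for o, s in zip(out, signal)) / sum(s * s for s in signal)

print(set(plain))            # {0}: the undithered output is pure silence
print(projection(plain))     # 0.0
print(projection(dithered))  # close to 1: the sine is preserved under the noise
```

The dithered output only ever contains whole steps, so a single sample tells us little. It is the long-term average that follows the original waveform.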
In theory, dithering may not be needed when working entirely in 24-bit, since the quantization noise floor (-144 dBFS) is far below the noise floor of any recording equipment or playback environment. However, dither becomes essential when reducing bit depth from 24-bit to 16-bit for distribution, or when applying certain processing that can expose quantization artifacts.
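The -144 dBFS figure follows from the ratio between full scale and one quantization step. A short Python check, where `dynamic_range_db` is simply a name chosen for this sketch:

```python
import math

def dynamic_range_db(bits):
    """Ratio between full scale and one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.2f} dB")
# 16-bit: 96.33 dB
# 24-bit: 144.49 dB
```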
At high levels, the effect of dithering is impossible to notice. Only as the signal approaches very low levels does it become apparent: the quantization distortion we would otherwise hear is absent, and what remains is random noise.
How dithering works in a 2-bit signed system. Blue shows the original waveform, red shows quantization without dither (all zeros = distortion), green shows quantization with dither (random pattern preserves signal). Dither randomizes quantization error instead of creating predictable distortion. Image courtesy of Collins Group.
How it’s done
The process of dithering is fairly simple: we generate two random numbers between -0.5 and 0.5 for every sample. For example, number A = -0.2 and number B = 0.4. Our dither sample value is -0.2 + 0.4 = 0.2. This value gets added to the sample value before quantization. If the original sample value was 1234.466, and we didn’t use dither, this would be quantized to 1234.0. But if we added dither, the resulting sample value would be 1234.666, which quantizes to 1235.0. A new pair of random numbers is generated and added for every sample, regardless of the original sample value.
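The worked example above can be reproduced in a few lines of Python. This is a sketch under the assumption that quantization means rounding to the nearest integer; the function names are made up for illustration.

```python
import math
import random

def quantize(x):
    """Round to the nearest integer step (one step = 1 LSB)."""
    return math.floor(x + 0.5)

def tpdf_dither(rng=random):
    """Two random numbers between -0.5 and 0.5, added together."""
    a = rng.uniform(-0.5, 0.5)
    b = rng.uniform(-0.5, 0.5)
    return a + b

def dither_and_quantize(sample, dither):
    """Add the dither value first, then quantize."""
    return quantize(sample + dither)

sample = 1234.466
print(quantize(sample))                         # 1234 (no dither)
print(dither_and_quantize(sample, -0.2 + 0.4))  # 1235 (A = -0.2, B = 0.4)

# In practice a fresh dither value is drawn for every sample.
rng = random.Random(1)
block = [1234.466, -3.2, 0.1, 17.9]
print([dither_and_quantize(s, tpdf_dither(rng)) for s in block])
```

The last line prints different integers depending on the random numbers drawn, which is the whole point: the rounding decision no longer depends on the signal alone.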
An example of what dither sounds like can be heard below. The audio has been amplified by 70 dB.
A common tool used in mastering, an RX11 dithering tool. Image courtesy of Collins Group.
The beautiful math behind it
There are two common ways of generating the dither values: RPDF and TPDF. The first, rectangular probability density function, is a fancy way of saying a single random number, where every value between -0.5 and 0.5 is equally likely. The second, triangular probability density function, is the sum of two such numbers, as in the example above. It is a more sophisticated way of producing the dither, and it weights the resulting values towards the middle (0) rather than spreading them evenly. In simple terms, to reach either extreme (-1 or 1), both numbers must land exactly on the same limit (-0.5 or 0.5), whereas many different pairs of numbers add up to a value near zero. Over time, most dither values are therefore small, which means we’re usually adding very little to the sample.
Left (RPDF): Flat distribution - every value between -0.5 and +0.5 LSB is equally likely. Right (TPDF): Triangular distribution - values near 0 LSB are most common, values near ±1 LSB are rare. The red line shows the theoretical triangle shape overlaid on the histogram. The RMS values confirm the theory: TPDF has about 0.41 LSB RMS even though it peaks at ±1 LSB, because most samples cluster near zero. Image courtesy of Collins Group.
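The RMS values in the figure can be checked numerically. The sketch below is an independent illustration in Python and not the script that produced the figure. It draws a large number of RPDF and TPDF values and compares their measured RMS with the theoretical values of 1/√12 and 1/√6 LSB.

```python
import math
import random

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 200_000

# RPDF: one uniform number. TPDF: the sum of two uniform numbers.
rpdf = [rng.uniform(-0.5, 0.5) for _ in range(n)]
tpdf = [rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5) for _ in range(n)]

def rms(values):
    """Root mean square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

print(f"RPDF: measured {rms(rpdf):.3f} LSB, theory {1 / math.sqrt(12):.3f} LSB")
print(f"TPDF: measured {rms(tpdf):.3f} LSB, theory {1 / math.sqrt(6):.3f} LSB")

# Share of TPDF values in the inner half of the range (theory: 0.75).
near_zero = sum(abs(v) < 0.5 for v in tpdf) / n
print(f"TPDF values within ±0.5 LSB: {near_zero:.2f}")
```

Although TPDF spans twice the range of RPDF, about three quarters of its values fall in the inner half of that range, which is why its RMS stays at roughly 0.41 LSB.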
The cost of dithering
Using dither comes with a price. Essentially, we’re sacrificing around 3 dB of S/N ratio, which comes from summing two noise sources whose RMS is around -96 dBFS each. The first is the quantization error itself (Equation 1). The second is the dither, whose RMS is also taken to be -96 dBFS, because it is random noise added at the level of the LSB. Since the two are uncorrelated, we sum their powers rather than their amplitudes, which is done in Equation 2. For clarity, a single sample could be nearer to -96 dBFS than -93 dBFS, but over time, thanks to the random nature of the dither, we arrive at -93 dBFS. Strictly speaking, the 3 dB figure assumes the dither carries the same power as the quantization error, which holds for RPDF dither; TPDF dither carries twice that power, which brings the total penalty to about 4.8 dB.
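The power addition can also be checked numerically. A small Python sketch, with helper names chosen for this illustration and both sources assumed to sit at exactly -96 dBFS:

```python
import math

def db_to_power(level_db):
    """Convert a level in dB to linear power."""
    return 10 ** (level_db / 10)

def power_to_db(power):
    """Convert linear power back to dB."""
    return 10 * math.log10(power)

def sum_uncorrelated(*levels_db):
    """Uncorrelated noise sources add in power, not in amplitude."""
    return power_to_db(sum(db_to_power(level) for level in levels_db))

total = sum_uncorrelated(-96.0, -96.0)
print(round(total, 2))            # -92.99
print(round(total - (-96.0), 2))  # 3.01
```
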
| Input Bit Depth | Output Bit Depth | Process | Dithering Necessary? | Rationale / Notes |
|---|---|---|---|---|
| 32 | 24 | Bit Reduction | Yes* | Origin (32-bit) is higher than destination (24-bit). Dithering is used to randomize the quantization error. |
| 24 | 16 | Bit Reduction | Yes | Origin (24-bit) is higher than destination (16-bit). |
| 16 | 24 | Bit Expansion | No | Nothing to dither. The destination has a higher resolution than the source. |
| 24 | 32 | Bit Expansion | No | Nothing to dither. The destination has a higher resolution than the source. |
| 16 | 16 | No Processing | No | Nothing to dither. The bit depth remains the same. |
| 16 | 16 | Processing | Yes (Conditional) | Necessary, but only on the final output if processing was performed internally at a higher bit depth (e.g., 32-bit float). |
| 24 | 24 | No Processing | No | Not necessary. Noise floor already higher than the LSB. |
| 32 | 32 | No Processing | No | Not necessary. Noise floor already higher than the LSB. |
\* Note: In theory, dithering is always required at bit reduction. In practice it makes no audible difference at 24-bit, since no recording device has a signal-to-noise ratio high enough to benefit from it.
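The table boils down to one comparison: dither whenever the output has fewer bits than the data being written to it. A minimal Python sketch of that rule follows. The function `needs_dither` and its `internal_bits` parameter are hypothetical, and a 32-bit float is simply treated as 32 bits, as in the table.

```python
def needs_dither(input_bits, output_bits, internal_bits=None):
    """Return True if the final output should be dithered.

    internal_bits is the bit depth used for processing, if any.
    With no processing, the working resolution is just the input's.
    """
    working_bits = max(input_bits, internal_bits or input_bits)
    return output_bits < working_bits

# (input, output, internal processing depth) for each table row
rows = [
    (32, 24, None),  # bit reduction
    (24, 16, None),  # bit reduction
    (16, 24, None),  # bit expansion
    (24, 32, None),  # bit expansion
    (16, 16, None),  # no processing
    (16, 16, 32),    # processed internally at 32-bit
]
for input_bits, output_bits, internal_bits in rows:
    answer = "yes" if needs_dither(input_bits, output_bits, internal_bits) else "no"
    print(f"{input_bits} -> {output_bits}: {answer}")
```
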
EQ1:
$$\begin{aligned}S/E &= 20\log_{10}(2^n) \\&= 20n\log_{10}(2) \\&= 6.0206n \\S/E_{16}&=20\log_{10}(2^{16}) \\&=96.33\ \mathrm{dB}\end{aligned}$$
EQ2:
$$\begin{aligned}dB_{res16} &= 10\log_{10}(10^{-96/10} + 10^{-96/10}) \\&= 10\log_{10}(2\times 10^{-96/10}) \\&= 10\log_{10}(2) + 10\log_{10}(10^{-96/10}) \\&= 3.01 + (-96) \\&\approx -93\ \mathrm{dBFS}\end{aligned}$$
Or to find the difference in dB:
$$\begin{aligned}\Delta dB&=10\log_{10}(2) \\\Delta dB&=3.01\end{aligned}$$