By Jan Muths
Digital Audio - there are 10 types of people: those who understand the binary system and those who don’t. Few topics are discussed online as passionately as digital audio and its underlying theory. I’ve followed countless discussions in pro audio groups, home-recording forums and on social media for over two decades now, and one thing remains clear: digital audio has the power to ignite fierce arguments.
In this blog, I aim to provide a comprehensive summary of what we have learned, what the industry collectively agrees upon, and the aspects that still fuel disagreement and intense emotions. Digital audio keeps evolving rapidly, and there is no end in sight to this trajectory. It is an amazing and exciting technology to be involved in. It requires keeping an open mind and re-evaluating old viewpoints every once in a while to check whether they still hold true as technology progresses.
Let’s get right into it and look at some of the hot-topic discussions.
- Thanks to the work of Shannon and Nyquist, we understand that the sample rate determines the frequency bandwidth of the recorded signal, particularly the upper frequency limit, known as the Nyquist frequency. The math is simple: the upper bandwidth limit is half the sample rate. For example, with a sample rate of 48kHz, signal frequencies up to 24kHz are recorded.
- Doubling the sample rate also doubles the file size for a given bit depth and recording time.
- Certain industries have their preferred sample rates (such as 48kHz in broadcast), and it's advisable to adhere to those standards.
- Unnecessary sample rate conversions should be avoided.
- To my knowledge, the world-population of consumers who demanded a refund because a sound was recorded at the wrong sampling rate is: zero. Correct me if I’m wrong.
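The two bits of arithmetic above - the Nyquist limit and the file-size relationship - are easy to verify in a few lines. Here is a minimal Python sketch; the helper names are my own, not from any audio library:

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Upper bandwidth limit of a sampled signal: half the sample rate."""
    return sample_rate_hz / 2

def pcm_size_bytes(sample_rate_hz: int, bit_depth: int,
                   channels: int, seconds: float) -> float:
    """Uncompressed linear PCM size: rate x depth x channels x time."""
    return sample_rate_hz * (bit_depth / 8) * channels * seconds

print(nyquist_hz(48_000))                  # 24000.0 -- signal content up to 24kHz
print(pcm_size_bytes(48_000, 24, 2, 60))   # one minute of 24-bit stereo
print(pcm_size_bytes(96_000, 24, 2, 60))   # doubling the rate doubles the size
```

Running the last two lines confirms the doubling: roughly 17.3MB per stereo minute at 48kHz becomes roughly 34.6MB at 96kHz.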
This sums up the most important points, now let’s get to the juicy stuff.
Areas of Disagreement:
“Higher sample rate means higher audio resolution.” Sorry to split terminology hairs here, but the parameter that best describes audio resolution is bit depth, not sample rate. The sample rate really only affects the recording bandwidth.

“What is the best sample rate?” Discussions that start like this turn into a heated bunfight in no time. And don’t expect a conclusive agreement at the end of it.
More interesting is why individuals arrive at vastly different conclusions. When highly competent professionals conduct sample rate comparison tests individually, there might not be an agreement at the end. And that’s ok, because sample rate is merely a small component in the extensive chain of devices and acoustics that come into play during critical listening.
A garage studio specialising in recording teenage punk bands will undoubtedly arrive at different conclusions compared to a studio built to capture orchestral music. Similarly, engineers who utilise their digital audio workstations only as tape machines may have different perspectives than those heavily reliant on virtual instruments and digital plug-in processing. In short: the industry doesn’t agree on the best sample rate.
“Frequencies above 20kHz are irrelevant anyway”. I know for a fact that I cannot hear sine waves above 20kHz. Actually, as a middle-aged human in my mid-forties, I'm relieved that my hearing still extends beyond 17kHz. However, dismissing the ultrasonic range as irrelevant oversimplifies the matter. If we followed that logic, we should only ever use 40kHz devices, and those aren’t available for very good reasons.
Converters aim to capture all frequencies up to the Nyquist frequency as accurately as possible while not allowing any frequencies above Nyquist, to avoid aliasing. To achieve that, some kind of brick-wall filter with extremely steep slopes is needed. Different converters use a different mix of technologies, such as analog LPFs, oversampling and digital brick-wall filters. The process of removing high frequencies with a brick-wall-style slope can have an effect on nearby frequencies, even extending into the audible range.
While this was a significant concern in the early days of digital audio, modern converters have made remarkable progress in keeping phase ripples at a minimum. Still, a healthy distance between the filter’s processing area and the human hearing range is advisable, justifying the need for our gear to operate beyond the upper hearing limits.
To illustrate this, let's draw a parallel from the world of analogue audio. During my time at the Customseries75 factory, I learned that a very wide frequency response beyond the human hearing range is a common, undisputed goal in high-end analog audio design. For instance, an SSL XL9000K console I maintained for many years has an open frequency response up to about 100kHz, ELI's renowned Distressor extends to 160kHz, and Millennia's HV-3C preamp goes beyond 300kHz.
So, does high end sound require a wide bandwidth extending into the ultra-sonic range? Or is a wide bandwidth a by-product of high-end audio design and the desire for a superior transient response? It's a bit of a chicken and egg scenario.
Let’s bring it back to digital audio, and let me risk copping some stick for dishing out my experience and personal workflows. I record at 96kHz these days, and the main reason is that my computer can handle the extra workload with ease while hard-drive space is cheaper than ever. Also, my recording system has a slightly lower round-trip latency at the higher rate. These are the factual reasons behind my choice.
However, there are also subjective factors or personal preferences that admittedly lack scientific proof but are based on repeated blind listening tests, plus a healthy proportion of my personal taste: at higher sample rates, I’ve noticed that vocals processed with Melodyne appear to handle more manipulation before artefacts begin to grind my gears. And my favourite reverb processor tends to sound more realistic, spacious and 3D at 96kHz, and a little more 2D at 44.1kHz.
Well, there I said it - I’m wondering how much flak I’m going to receive for putting myself out here. If you’re boiling up with a cranky reply in all-caps, please remember my point above, the one about the world population being zero.
Digital dynamic range refers to how the bits available per sample are used to express the dynamic range (loud and quiet), and the greater the number of bits, the wider a dynamic range can be captured. In a linear PCM system, each bit represents approximately 6dB of dynamic range. For more precise figures, you may enjoy the equation SNR = 6.02 × n + 1.76 dB, where n is the number of bits.
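Plugging the common bit depths into that equation is a one-liner (a sketch; the function name is my own):

```python
def theoretical_snr_db(bits: int) -> float:
    """Ideal SNR of an n-bit linear PCM quantiser, full-scale sine input."""
    return 6.02 * bits + 1.76

# Slightly above the rough 6dB-per-bit estimates of ~96dB and ~144dB:
print(round(theoretical_snr_db(16), 2))   # 98.08
print(round(theoretical_snr_db(24), 2))   # 146.24
```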
Let’s briefly define ‘dynamic range’. In the analogue domain, the upper limit of the dynamic range is restricted by distortion, which can manifest abruptly (for instance, with transistor technology) or gradually (such as with transformer or tube technology). The upper limit of the dynamic range is typically denoted by around 0.5% total harmonic distortion (THD). Conversely, on the bottom end of the scale, the dynamic range is limited by a device's noise floor.
In the digital domain, the top of the dynamic range is the clipping point, where the binary number range has reached its maximum. Clipping from an AD-converter is known to be unpleasant to the ear, so it's usually best to avoid it, although I’ve seen it done for creative reasons. At the bottom of the scale, we encounter quantisation error, which can be seen as the digital equivalent to analog self-noise.
Pretty much every post-2000 converter is capable of recording in either 16-bit or 24-bit resolution, with a theoretical dynamic range of approximately 96dB or 144dB respectively. This is quite impressive, given that the human hearing range spans about 120dB, from the threshold of hearing to the threshold of pain. However, it's important to note that these are just theoretical maximum figures. The actual dynamic range should be stated in your converter’s operating manual, and it will always be less. Many high-quality ADCs have a practical dynamic range of 120dB to 128dB, and it is my understanding that the analogue components inside the converter, such as line input stages or balanced line drivers, are the bottleneck here. That being said, modern converters match or even exceed the human hearing range, so we don’t have to worry about quantisation error degrading our signal, if we handle gain-staging with care.
The choice between 16 or 24 bit is straightforward. Workflows in which 16bit is the best recording resolution are as rare as Yowie’s footprints. The vast majority of the industry agrees that capturing recordings in 24 bit is the way to go, even if the final product is going to be a 16 bit file. Digital processors, such as faders, pan-pots, and our beloved plugins, benefit from greater input values when applying their math. So, 24-bit should be all we'd ever need. Its dynamic range far exceeds the dynamic range of all analog sound sources I ever record in my studio, and my studio’s live rooms are very, very quiet!
However, it's not as simple as it seems. Recently, we've seen the emergence of 32-bit-float ADCs, introduced by companies like Sound Devices and Zoom. These ADCs promise an unbelievable theoretical dynamic range of 1500dB (no, that's not a typo!). I’ve heard stories where clipped recordings of a jet-fighter fly-over could be restored by simply reducing clip-gain, and all the above-zero content reappeared undistorted. I’ve used the magic of 32-bit float in production myself, but only ever with digitally processed files, within a DAW - not with a recording through the ADC. Frankly, I'm still trying to wrap my head around how this cutting-edge technology works in practical terms, especially concerning the analog preamp stages preceding the ADC. It's worth noting that 32-bit converters have also made their way into studio devices like AVID's Carbon interface, although the absence of "float" in the name suggests it may be a linear PCM converter. In any case, I predict that we'll see a growing presence of 32-bit converters in the future, which might necessitate a complete reevaluation of traditional gain-staging workflows.
Interestingly, 32-bit-float and 64-bit-float resolution has been around for a long time, but typically only inside a DAW or digital processor unit - sandwiched between 24-bit ADCs and DACs.
Some DAWs even allow for 32bit-float recordings, which might be useful if the subsequent workflow benefits from a higher resolution, such as some time-stretching or pitch-shifting algorithms. However, it's crucial to point out that a 32-bit-float recording made from a 24-bit converter will ultimately result in a 24-bit resolution file, with 8 additional empty bits added (or an exponent of 1). This nuanced understanding of bit depth and resolution is essential for maximizing the quality of digital audio processing and recording.
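Two properties of float make all of this work, and both are quick to demonstrate. A float32 carries a 24-bit significand (23 stored bits plus an implicit leading 1), so every 24-bit integer sample fits in it exactly; and "above 0dBFS" in float is just a number greater than 1.0, not a clipped value. A hedged illustration, not a description of any particular recorder:

```python
import struct

# float32's 24-bit significand stores any 24-bit integer sample exactly.
max_24bit = 2 ** 23 - 1
roundtrip = struct.unpack('f', struct.pack('f', float(max_24bit)))[0]
print(roundtrip == max_24bit)         # True -- no rounding on the way in

# In float, over-full-scale peaks are simply values above 1.0: nothing is
# clipped, so pulling the clip-gain down afterwards restores the waveform.
overs = [1.4, -1.2, 0.9]              # hypothetical over-full-scale peaks
restored = [x * 0.5 for x in overs]   # roughly -6dB of clip-gain
print(restored)                       # [0.7, -0.6, 0.45] -- back under full scale
```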
Now, let’s get into some bit resolution topics that get audio peeps’ knickers in a knot. Here are a couple of examples:
Areas of Disagreement:
From 2 to 16 Million: Debunking Volume Steps Misconception
I’m glad the following misunderstanding is no longer as common as it was 20 years ago, but let’s clear it up just for fun: “24 bits means 12 volume steps up (for the positive half-wave) and 12 down (for the negative half-wave)” - This bunkum is as far-fetched as a drop bear riding a kangaroo, and I’m happy to explain.
In a binary system, each bit has 2 possible states, either 0 or 1. To express the decimal number 2 in a binary system, a second bit is required. So, the decimal number 2 equals the binary number 10 (pronounced ‘one zero’ to avoid confusion). I’m sure you get the silly joke in the title now :-).
With 24 bits assigned to each sample, we have a total of 2^24 possible values. That’s 16,777,216 different resolution steps. Let’s take a moment to take this in. If you’re recording at 48K and 24-bit, your converter accurately finds the closest amplitude value among >16 million choices, and it does so 48,000 times every second. And now consider higher sampling rates, and did you know your converter is probably oversampling its input many times? Truly mind boggling.
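For the curious, the step count and the size of one quantisation step are quick to verify (a back-of-envelope sketch):

```python
import math

bits = 24
steps = 2 ** bits
print(f"{steps:,}")          # 16,777,216 possible amplitude values per sample

# The smallest step relative to full scale, in dB (the top bit is the sign,
# so full scale corresponds to 2^23 steps):
step_db = 20 * math.log10(1 / 2 ** (bits - 1))
print(round(step_db, 1))     # about -138.5 dBFS
```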
Double Trouble! 32 vs. 64: The Battle of the Bits
In recent years, the numbers 32-bit and 64-bit have also been thrown around in computer technology. Some may recall Apple's transition to 64-bit with the Catalina OS, which discontinued support for 32-bit applications. However, it's crucial to note that we’re talking about the word-length of software code, not the resolution of audio processing. Nevertheless, audio professionals still benefit from 64-bit applications, particularly for resource-intensive workflows that demand substantial RAM.
An oldie, but a goodie: “Master faders sound bad” or “Master faders must remain at unity gain”. This old myth has been busted many times, but it keeps coming round like a boomerang.
This is a prime example where old workflows need to be questioned and re-learned when technology moves forward. The origins of this myth lie in the analog mixing era, where it held true that the master fader should remain at unity (or as close to unity gain as possible). Only then would you get the best signal-to-noise value out of Neve consoles from the 70s. Or Harrison consoles from the 80s. Or SSL consoles from the 90s.
When the first digital mixers appeared, the summing stage had the same bit resolution as the ADCs. Consequently, the more signals were mixed together, the more one had to lower the channel faders to prevent overloading the summing bus. Music creators complained, technology companies took action.
Next, summing stages began working with a few more bits internally, allowing adjustments on the master fader instead of multiple channel faders. This approach preserved the mix balance but impacted the sum's resolution due to accumulating rounding errors and poor (or total lack of) dithering. Back then, I want to say in the 90s, leaving the digital master fader at unity gain actually resulted in better sound.
Once more, music creators grumbled, and technology companies pushed technology ahead. Let’s jump forward a couple of years to Digidesign’s TDM mixer, which utilised 48-bit linear summing (+8 bit overflow). That’s when I first witnessed a demonstration in which the “master faders sound bad” myth was officially busted without any reasonable doubt.
Since then, we've witnessed the evolution to 64-bit-float summing with the introduction of Pro Tools 11 (similar summing technology is found in most DAWs today). Null-tests have repeatedly demonstrated that the master fader has no negative impact on the sound, not even a single bit (pun intended!). Deliberate attempts were made to overload the summing bus (a near-impossibility in 64-bit-float summing). To compensate, the master fader was set to a very low level, yet it did not affect the mix resolution. The output perfectly nulls against the source.
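The null test itself is easy to imitate with a toy example. The fader gain below is a power of two, which float arithmetic scales exactly, so this toy nulls perfectly; arbitrary gains still null to within float64 epsilon. A sketch, not anyone's actual summing code:

```python
# Pull a "mix" way down at the master fader, bring it back up, and
# subtract it from the source. In 64-bit float the residual is silence.
source = [0.5, -0.25, 0.125, -1.0, 0.75]   # arbitrary sample values
fader_gain = 2.0 ** -20                     # a master fader around -120dB

attenuated = [s * fader_gain for s in source]
restored = [s / fader_gain for s in attenuated]
residual = [a - b for a, b in zip(source, restored)]
print(residual)    # [0.0, 0.0, 0.0, 0.0, 0.0] -- a perfect null
```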
Old cobwebs cleared, myth busted - fair and square.
I’m sure you’ve heard of “Always record as hot as possible without clipping”. If you asked why that is, chances are the explanation was that you use the full bit resolution only when maxing out the signal to 0dBFS.
This is generally true, but I’d like to clarify that all 24 bits are in action when the amplitude peaks between -6dBFS and 0dBFS on the digital peak meter. Personally, I find the word “always” in the statement problematic. Today, this workflow is more of a personal choice than a requirement - and blindly following it can have negative side-effects in certain situations.
This practice originated in the early days of digital recorders, which were jittery and poorly dithered by today’s standards (if dithered at all). With these grainy-sounding 12-bit or 16-bit converters, preserving every bit of resolution was paramount. Fast forward some 30 years, and today’s converters don’t suffer from the same troubles anymore. The technology has long gone through its teething stages, and early hiccups have been addressed. All modern converters I've had the pleasure of working with deliver the same clean and transparent results at all reasonable recording levels - hot, and also cooler.
However, what still makes a difference with hotter or cooler gain staging is how analog front-end components behave, such as consoles, external preamps, and analog compressors - particularly vintage gear! These devices typically operate at +4dBu line level (aka 1.228V RMS or 0VU). That’s the amount of voltage manufacturers designed the gear towards; some call it the voltage sweet-spot. Above that point, analog gear operates in its headroom, a few additional dBs above +4dBu. The amount of headroom available depends on the specific gear you’re using. Some cheaper devices offer only 8-10dB of headroom, while others provide considerably more, varying from device to device.
So, what dBu input value causes 0dBFS in the digital domain? Well, it depends. For instance, the Apogee Symphony, Antelope Orion, and UAD Apollo X8 all reach their maximum with an input of +24dBu, while other converters may do so at +22dBu or even as low as +16dBu. Some converters are user adjustable. Let’s stick with the first three for a minute. If you aim to peak at 0dBFS, your analogue front-end needs to generate a peak of +24dBu at its outputs, which is 20dB above the intended operating level. Some gear can handle this with ease (modern high-end gear, also most preamps built into converters), while others may start to distort (vintage valve gear, I’m looking at you!). It's worth noting that we're discussing peak levels here, and the signal's RMS will be lower. So, the signal’s crest factor plays a role here, too!
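The dBu arithmetic is worth sketching out. The +24dBu calibration below matches the converters named above; the function names are mine:

```python
def dbu_to_volts_rms(dbu: float) -> float:
    """dBu is referenced to 0.775V RMS, so +4dBu is about 1.228V RMS."""
    return 0.775 * 10 ** (dbu / 20)

def dbfs_for_input(dbu_in: float, dbu_at_full_scale: float = 24.0) -> float:
    """Digital peak level for an analogue input, assuming 0dBFS at +24dBu."""
    return dbu_in - dbu_at_full_scale

print(round(dbu_to_volts_rms(4.0), 3))   # 1.228 -- the line-level sweet spot
print(dbfs_for_input(4.0))               # -20.0 -- nominal level leaves 20dB of headroom
```

On a converter calibrated to +16dBu instead, the same nominal +4dBu signal would already sit at -12dBFS, which is why "record as hot as possible" pushes different front-ends very differently.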
The key takeaway is that not all preamps and analog compressors deliver their best performance when driven high up into their headroom. Is the extra bit resolution worth some headroom distortion? Make an informed choice, know your gear, find out where your front-end’s strengths and weaknesses are. Record hot where it’s creatively appropriate. But please don’t “always” do so because of some dusty old 90s workflow.
The converter specifications don’t always tell it all!
Sample rates, bit depth, signal to noise ratio… all these are common specifications and different converters can look pretty similar on paper. Some manufacturers even advertise their entry level products with catchy phrases such as “we use the same ADC chips as [insert expensive product here]”. But marketing like this might just be a bunch of hot air. I’ve listened to old 16 bit mastering converters that clearly sounded better than some other 24 bit converters - although the technical specifications wouldn’t have suggested so.
Good sound comes from a well-tuned overall converter design, not from an individual chip or component. Some of the often overlooked factors include the power supply, which should provide the required current and stable DC, without fluctuations, ripples or noise. And, most importantly, the accuracy of the converter’s word clock. The clock should be as sharp and stable as possible, and we’re talking fractions of nanoseconds here. If the word clock fluctuates even slightly from the ideal interval, jitter occurs.
Jitter can manifest itself in various forms, including random fluctuations, periodic deviations, or sporadic spikes in timing accuracy. These timing variations introduce irregularities in the sampling process: analogue voltage values are captured slightly earlier or later than intended. And since analogue voltages constantly rise and fall, the converter captures a voltage value that no longer represents the true audio signal at the nominal sample instant, compromising the accuracy with which the bit resolution captures the signal.
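To get a feel for the numbers: the worst-case error occurs where a sine wave slews fastest, at its zero crossing, where the slew rate of a full-scale sine is 2πf. A timing error of dt therefore produces an amplitude error of roughly 2πf·dt. A back-of-envelope sketch, not a full jitter model:

```python
import math

def jitter_error_dbfs(freq_hz: float, jitter_s: float) -> float:
    """Worst-case amplitude error from a clock timing error, taken at the
    zero crossing of a full-scale sine (where it slews fastest)."""
    peak_error = 2 * math.pi * freq_hz * jitter_s   # full scale = 1.0
    return 20 * math.log10(peak_error)

# A whole nanosecond of jitter on a 10kHz full-scale sine:
print(round(jitter_error_dbfs(10_000, 1e-9), 1))    # about -84.0 dBFS
```

An error floor around -84dBFS sits far above a 24-bit converter's theoretical floor of roughly -144dBFS, which illustrates why converter clocks need to be accurate to small fractions of a nanosecond.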
If you made it to the end of this article, then I’d like to congratulate you and welcome you to the small and select circle of audio enthusiasts who take the time to understand their gear on a deep level. The world of digital audio is a complex and ever-evolving landscape that continues to fuel intense debates and discussions among professionals and enthusiasts alike. I encourage you to always keep searching for more information, and welcome the changes as this fast evolving technology will continue to rattle what we define as the best practice again and again.
Recording and mix engineer at mixartist.com.au
Host of the Production Talk Podcast
Lecturer at SAE Byron Bay, Australia