Author's Note: We are grateful to Stereophile for the opportunity to address some frequently repeated technical questions appearing in comments to articles. Recently this has included misunderstandings about noise calculation, dynamic range, resolution, definition, music spectra, channel capacity, lossless processing and temporal aspects of digital channels.
To simplify this document we have grouped the topics and set them out as questions and answers, each taking the form of a response, a tutorial or an axiom. Some months ago we published a comprehensive Q&A for an online forum and, to avoid repetition, we occasionally refer to topics already discussed there (see [37] in the "References" sidebar).—J. Robert Stuart
MQA Introductory Background
It is now widely, although not universally, accepted that "hi-rez" digital audio, with increased sampling rate or bit depth, delivers improved sound quality. But it does so at a large cost to coding efficiency. A 24-bit/88.2kHz recording requires three times the data rate of a 16-bit/44.1kHz alternative, and that ratio increases by further factors of two as the sampling rate is doubled again to 176.4kHz and then to 352.8kHz, the sampling rate of DXD. While the progressive improvement in sound quality is welcome, it takes a disproportionate toll on data rates and storage capacity. Simply increasing the sampling rate also fails to address head-on why 44.1kHz and 48kHz sampling rates impose subjective limitations. Instead, sampling rate has become a proxy for resolution.
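The ratios quoted above follow directly from multiplying bit depth by sampling rate. The short Python sketch below assumes uncompressed two-channel PCM and ignores file-container overhead; the function name `pcm_rate_kbps` is hypothetical and chosen only for illustration.

```python
# Raw PCM data rate, ignoring container overhead.
# Assumption: two channels; the ratios to CD do not depend on channel count.

def pcm_rate_kbps(bits: int, fs_khz: float, channels: int = 2) -> float:
    """Return the uncompressed PCM data rate in kbit/s."""
    return bits * fs_khz * channels

BASELINE = pcm_rate_kbps(16, 44.1)  # 16-bit/44.1kHz, the CD format

for fs in (88.2, 176.4, 352.8):
    rate = pcm_rate_kbps(24, fs)
    print(f"24-bit/{fs}kHz: {rate:.1f} kbit/s, {rate / BASELINE:.0f}x the CD rate")
```

Each doubling of the sampling rate doubles the raw rate, giving 3x, 6x and 12x the CD rate for the three 24-bit formats.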
Recent hearing research supports the long-standing notion that the time-domain performance of anti-alias and reconstruction filters, especially steep linear-phase digital filters, is responsible for a perceptible degradation of sound quality. Direct evidence for the audibility of the low-pass filters used in digital audio has also been published.[18]
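To see why linear-phase filters in particular attract this criticism, the Python sketch below designs a generic windowed-sinc low-pass FIR filter with its corner at 20kHz for 44.1kHz sampling. The Blackman window, the tap counts and the helper names are assumptions for illustration; this is a textbook design, not the filter of any particular converter. Because a linear-phase impulse response is symmetric about its centre tap, exactly as much ringing energy precedes the main peak as follows it, and the response extends up to (N-1)/2 samples ahead of that peak.

```python
import math

def windowed_sinc_lowpass(cutoff, num_taps):
    """Linear-phase FIR low-pass designed by the windowed-sinc method.

    cutoff: corner frequency as a fraction of the sampling rate (0 < cutoff < 0.5).
    num_taps: odd length, so the filter has a single centre tap.
    """
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        x = n - m / 2  # distance from the centre tap
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        # Blackman window tapers the truncated sinc
        w = 0.42 - 0.5 * math.cos(2 * math.pi * n / m) + 0.08 * math.cos(4 * math.pi * n / m)
        taps.append(ideal * w)
    return taps

def ringing_energy(taps):
    """Energy in the taps before and after the main (largest) peak."""
    peak = max(range(len(taps)), key=lambda i: abs(taps[i]))
    pre = sum(h * h for h in taps[:peak])
    post = sum(h * h for h in taps[peak + 1:])
    return pre, post

fs = 44_100
for num_taps in (63, 255):
    h = windowed_sinc_lowpass(20_000 / fs, num_taps)
    pre, post = ringing_energy(h)
    lead_ms = 1000 * (num_taps - 1) / 2 / fs  # span of the response ahead of its peak
    print(f"{num_taps} taps: pre/post energy {pre / post:.3f}, spans {lead_ms:.2f} ms before peak")
```

Lengthening the filter, which is how a steeper transition band is obtained, stretches this pre-ringing further ahead of the peak: roughly 0.7 ms for 63 taps and 2.9 ms for 255 taps at 44.1kHz.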
It has been known since at least 1946 that human listeners can 'beat' the Fourier time-frequency uncertainty inherent in conventional signal analysis, and by a significant margin.[31][32] Indeed, recent experimental studies have shown temporal discrimination at least five times better than the Fourier limit.[33][35][36]
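For reference, the Fourier limit referred to here is commonly stated in terms of the RMS duration and RMS bandwidth of a signal's energy distribution. In the sketch below, s(t) is the signal, S(f) its Fourier transform, and the barred symbols are the energy-weighted mean time and frequency; this notation is an assumption added for illustration. Equality holds only for a Gaussian pulse, so no linear Fourier analysis can localise a sound more tightly in both time and frequency at once.

```latex
\Delta t\,\Delta f \ge \frac{1}{4\pi},
\qquad
(\Delta t)^2 = \frac{\int (t-\bar{t})^2\,|s(t)|^2\,dt}{\int |s(t)|^2\,dt},
\qquad
(\Delta f)^2 = \frac{\int (f-\bar{f})^2\,|S(f)|^2\,df}{\int |S(f)|^2\,df}
```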
These findings accord with the idea that the capabilities of human hearing have been determined by evolutionary requirements, in particular the need to identify sounds as 'potentially threatening' or 'non-threatening' in the shortest possible time interval, thereby providing the maximum opportunity for fight or flight. While vision plays a part in this too, of course, we cannot see through 360°, around corners, or at light levels as low as some predators can.
In these circumstances in particular, hearing is our primary sense for detecting danger, and speed of detection and rapid estimation of direction and range are of the essence. So too is the ability to separate direct sound from short-delay or closely spaced reflections, which requires resolving short time intervals regardless of the frequency or bandwidth of the source.
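The interval that separates a reflection from the direct sound is set by geometry alone: a reflection whose path is longer than the direct path by a distance d arrives d/c later, whatever the spectrum of the source. The minimal Python sketch below assumes a speed of sound of 343 m/s (air at about 20°C); the constant and function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def reflection_delay_ms(extra_path_m: float) -> float:
    """Delay of a reflection relative to the direct sound, in milliseconds."""
    return 1000.0 * extra_path_m / SPEED_OF_SOUND

for extra in (0.1, 0.343, 1.0):
    print(f"extra path {extra} m -> {reflection_delay_ms(extra):.2f} ms")
```

A surface only a few tens of centimetres further away than the direct path therefore produces a reflection delayed by well under a millisecond, and the source's bandwidth plays no part in setting that figure.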
Our understanding of natural soundscapes, reverberation, animal vocalisations and speech requires adjustable time/frequency balances which, until now, have not been adequately accounted for in audio system design.[9]
This all suggests that the time-domain acuity of the human auditory system may have been more important than its frequency-domain acuity, which would explain why its time-frequency resolution so far exceeds that of an FFT analyser (and its close relative, the sinc kernel of digital sampling). Causal signals are key to our achieving this feat: if natural signal waveforms are time-reversed, we can no longer outperform the time-frequency uncertainty of Fourier analysis.[34]
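For an FFT analyser the trade-off is fixed by the transform length: an N-point FFT at sampling rate fs observes a window of N/fs seconds and spaces its bins fs/N hertz apart, so the product of the two is always 1. The Python sketch below uses bin spacing as a simple proxy for frequency resolution (in practice the analysis window shape also matters); the function name is hypothetical.

```python
def fft_resolution(fs_hz: float, n_points: int) -> tuple[float, float]:
    """Return (window duration in s, bin spacing in Hz) for an N-point FFT."""
    duration = n_points / fs_hz
    bin_spacing = fs_hz / n_points
    return duration, bin_spacing

for n in (256, 1024, 4096):
    T, df = fft_resolution(44_100, n)
    print(f"N={n}: window {1000 * T:.1f} ms, bin spacing {df:.2f} Hz, product {T * df:.1f}")
```

Finer frequency resolution can only be bought with a longer observation window, and vice versa; the analyser has no way to improve both together.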
Temporal acuity is a survival characteristic, one whose origins must reach back much earlier in the mammalian timeline than the emergence of Homo sapiens.
It would be strange indeed if our remarkable time-domain acuity were irrelevant to the perception of music. In fact there is persuasive evidence that it is not: the experimental subjects who have proven most adept at resolving time-frequency uncertainty are musicians, suggesting that time-domain acuity is enhanced, in effect trained, by the process of becoming a musician.[33] So the traditional frequency-domain view of audio system performance is fundamentally at odds with our perception of music. A fresh approach to the specification and design of high-fidelity audio encoding and equipment, one that takes much closer account of system time-domain performance, is therefore long overdue.