MQA: Questions and Answers Tutorial: Temporal Errors In Audio

In our Hierarchical paper [2] section 2.3 Temporal Limits, we explain:

"For the audio distribution channel we can consider temporal resolution in two aspects: its ability to maintain separation between closely spaced events (and not blur them together) and its ability to maintain a precise unquantized time-base within and between channels. Low-pass filtering may ultimately impact the separation of nearby events ... while filters in the digitizing process, that are sharper in the frequency domain, may also bring uncertainty to the start, stop and center of transient events."

What temporal blur is not . . .
Let's just cover what we don't mean when we are talking about blurring in the time domain. Too often people immediately assume we have fallen into the naïve trap that imagines the time-base of a single digital channel to be quantised at the sample rate. To quote Stanley Lipshitz (from [3]):

"One often misunderstood aspect of sampled-data systems is the question of their time resolution—can they resolve details that occur between samples, such as a time impulse or step? ... time resolution is in fact infinitely fine for signals band-limited in conformity with the sampling theorem, and is completely independent of precisely where the samples happen to fall with respect to the time waveform . . ."

Stanley suggests the time-base resolution is 'infinite.' Of course that is only true if we are actually asking the question: 'What is the limit of resolution of relative phase of a continuous sinewave below Fs/2 in a uniformly-sampled channel employing TPDF dither?' Even then our ability to prove the precision, like all measurements, depends on signal/noise ratio.

If there is no dither involved and the samples are quantised, there is an approximate estimate of time-base resolution which is possibly relevant to brief impulsive signals (which by definition do not benefit from dither).

Limit = 1/(π × Fs × 2^(n−1)), where n is the number of bits.

For 44.1kHz 16-bit data this resolves to about 220ps, not to the sample period of 22.7µs (= 1/Fs).
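The arithmetic behind that estimate can be checked with a short Python sketch. The function name and variable names are illustrative only; the formula is the one quoted above, with Fs in hertz and n in bits.

```python
import math

def time_resolution_limit(fs_hz: float, bits: int) -> float:
    """Approximate undithered time-base resolution in seconds.

    Implements Limit = 1 / (pi * Fs * 2^(n-1)) as stated in the text.
    """
    return 1.0 / (math.pi * fs_hz * 2 ** (bits - 1))

fs = 44_100.0                      # CD sample rate in Hz
limit_s = time_resolution_limit(fs, 16)
sample_period_s = 1.0 / fs         # the naive "one sample" figure

print(f"resolution limit: {limit_s * 1e12:.0f} ps")       # about 220 ps
print(f"sample period:    {sample_period_s * 1e6:.1f} us")  # about 22.7 us
```

The two figures differ by a factor of roughly 100,000, which is why the sample period is not a useful measure of time-base resolution.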

There is a slight problem with Stanley's remark in that "details that occur between samples" don't really occur "for signals band-limited in conformity with the sampling theorem"; however, the point in time at which signals change can have 'infinite resolution.' This somewhat pedantic point is raised because, although audio can be converted in conformity with the sampling theorem, processes in the studio and playback channel (such as inadvertent quantisation, gain control or overload) can cause the signal to no longer comply and thereby introduce uncertain results while exposing the natural impulse response of the replay chain rather than retaining the idempotent property of sinc-based sampling. In our experience such errors are the norm rather than the exception.

However, when we discuss temporal blur we are not talking about quantisation of the time-base within or between channels.

What we mean by temporal blur . . .
There is no standard measure for temporal blur but we believe our use of the term is clear and intuitive. A causal transmission system has dispersive properties which result from filtering or attenuation. Fine details in the time waveform can be smeared or obscured if the end-to-end impulse response is not sensitive to the signal and to the receiver (human listener).

Blurring has a direct parallel in the optical world as it relates to the design of lenses, dispersion of light in media, and image processing. In electronics, this is well understood by the designers of oscilloscopes.

There is now considerable evidence from neuroscience that the human listener appears more sensitive to time than frequency, by which we mean both that the human listener can outperform Fourier time-frequency uncertainty and that sensitivity to temporal microstructure is finer than a linear system of the same bandwidth would enable. The fine details in sound that are important for the human listener seem to be on timescales as short as 5µs (again see 2.3 in [2] but also [6]). It is critical to appreciate that these small-scale events in time do not necessarily have origins in high-frequency elements. Sounds can arrive at the microphone from different objects and via reverberation, and voices and instruments are not point sources. It is very interesting to see that this order of sensitivity is not coupled to the human tonal limit of ~18kHz.

In a linear analogue system which has cascaded elements contributing to high-frequency roll-off, we can see that temporal detail is smoothed by a function which moves the centroid (group delay) and spreads or can merge finely-separated events. See Fig.1 from [1] reproduced below.

Figure 4: The frequency and impulse responses of a cascade of low-pass filters.
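The way a cascade shifts the centroid and spreads events can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that every element behaves as a first-order low-pass section; the stage count and cutoff frequencies are hypothetical and are not taken from the figure. For such sections each impulse response is an exponential with time constant τ = 1/(2π·fc), so the centroid delays add and the squared spreads add.

```python
import math

def cascade_delay_stats(cutoffs_hz):
    """Centroid and RMS spread (seconds) of a cascade's impulse response.

    Assumption: every stage is a first-order low-pass with the given -3 dB
    cutoff. Each stage then has mean delay tau and variance tau^2, and both
    quantities add under cascading (convolution).
    """
    taus = [1.0 / (2.0 * math.pi * fc) for fc in cutoffs_hz]
    centroid = sum(taus)                          # mean delays add
    spread = math.sqrt(sum(t * t for t in taus))  # variances add
    return centroid, spread

# Hypothetical chain: five stages, each flat to 100 kHz.
centroid_s, spread_s = cascade_delay_stats([100e3] * 5)
print(f"5 x 100 kHz: centroid {centroid_s * 1e6:.2f} us, "
      f"spread {spread_s * 1e6:.2f} us")

# A single hypothetical stage limited to 30 kHz.
single_centroid_s, single_spread_s = cascade_delay_stats([30e3])
print(f"1 x 30 kHz:  centroid {single_centroid_s * 1e6:.2f} us, "
      f"spread {single_spread_s * 1e6:.2f} us")
```

Under these assumptions, five 100kHz stages accumulate a centroid delay of about 8µs, and a single 30kHz stage alone contributes a spread of about 5µs. Real components are rarely first-order, so the numbers indicate scale only.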

If we consider a complete recording chain, it may be that the designers of each of the individual components considered that it should cover the frequency range up to 100kHz, but it is unlikely. Until recently it was considered adequate for an individual component to show a response flat to, say, 30kHz, whereas temporal considerations suggest this is barely adequate for the whole journey from performer to listener through a cascade of microphone, preamplifier, mixer, converter pre- and post-filters, replay pre- and power amplifier and playback transducer. But what we see here is that such a chain has already used up 8µs of the budget while, by extrapolation (see Figure 3 in [2]), limiting one component to 30kHz uses the entire budget.

It is critical to appreciate that this argument is based on the temporal smear of signals and not on the (unlikely) requirement that the human listener benefits from signal harmonics in the range 30–100kHz.

The fact that our systems should exhibit wide bandwidth does not mean that high frequencies are the reason; rather that our neural processing is sensitive at the microsecond level to changes made within the audio band by filtering above 20kHz. [2][7][8][10]

One final thing to see is that this definition is made in the analogue domain; sound is analogue in air and if there is a digital storage channel it should fit into this framework.