MQA: Questions and Answers

Temporal Blur In Sampled Systems

If we look back at the roots of sampled coding systems we encounter Shannon's elegant reconstruction theorem and Nyquist's theorem on channel capacity.[23][24]

A sampled signal can be unambiguously reconstructed if it contains no frequencies higher than Fs/2; it is completely determined by capturing its values at a series of points T = 1/Fs seconds apart and can be reconstructed with a perfect low-pass filter at Fs/2.
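The reconstruction described above can be sketched numerically. The following is a minimal illustration, not part of the original article: the function names `sinc` and `reconstruct`, the 48kHz rate, and the 1kHz test tone are all assumptions chosen for the example. It sums one sinc per sample (the impulse response of a perfect low-pass filter at Fs/2) to estimate the signal between sample points. Because only a finite run of samples is used, the result is an approximation; the ideal sum runs over all time.

```python
import math

def sinc(x):
    """Normalised sinc, sin(pi x)/(pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, fs, t):
    """Estimate the signal at time t (seconds) from uniform samples.

    samples[n] is assumed to have been taken at time n / fs. Each sample
    contributes one sinc centred on its own sample instant.
    """
    return sum(x_n * sinc(fs * t - n) for n, x_n in enumerate(samples))

# Illustrative values only: a 1 kHz tone sampled at 48 kHz.
fs = 48000.0
f0 = 1000.0
n_samples = 4096
samples = [math.sin(2 * math.pi * f0 * n / fs) for n in range(n_samples)]

# Evaluate halfway between two samples, near the middle of the record,
# where truncating the infinite sum matters least.
t_mid = (n_samples / 2 + 0.5) / fs
estimate = reconstruct(samples, fs, t_mid)
exact = math.sin(2 * math.pi * f0 * t_mid)
error = abs(estimate - exact)
print(f"estimate={estimate:.6f} exact={exact:.6f} error={error:.2e}")
```

The residual error comes entirely from truncating the sum to 4096 samples, which is one practical face of the problem that a true brick-wall filter needs an infinitely long response.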

There are a number of problems with this conceptual framework for the human listener. First, strictly mathematically, a perfect low-pass (aka brick-wall) filter cannot exist. If it did, as a matter of reciprocity, the signal would take an infinite amount of time to change, and we are much more interested in signals that change somewhat rapidly with time. Tight engineering or lazy thinking has turned the requirement 'contains no frequencies higher than Fs/2' into the prescription 'thou shalt have a brick-wall filter.'

Figure 5: Sampled system.

The Fourier transform of a brick-wall filter gives its temporal response: the well-known sinc function ((sin x)/x), shown in Figure 5. This function has good and bad properties including:

• On the positive side, the value of the sinc is zero at every sample point except the center, which confers the property of idempotency, while

• On the negative side, the sinc filter smears the time response of the system over all time, from −∞ to +∞ (a characteristic which has no counterpart in the physical world), and,

• There is considerable evidence that while the human listener may not be able to hear the ringing frequency, we are nevertheless sensitive to the overall envelope. [2]

• An uncertain benefit of this function is that the filter has a linear phase response and it has been both convenient and commonplace to approximate the sinc filter in integrated circuit filters in A/D and D/A converters since the mid-1980s.
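The first two properties in the list above can be checked directly. This is a small sketch under stated assumptions: time is measured in sample periods, and `sinc`, `envelope`, and the chosen offsets are illustrative names and values, not taken from the article. It shows that the sinc is zero at every non-center sample point, and that its ringing envelope decays only as 1/(π|t|), which is why the response extends so far in time.

```python
import math

def sinc(x):
    """Normalised sinc, sin(pi x)/(pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def envelope(t):
    """Upper bound on |sinc(t)| for t != 0; t is in sample periods."""
    return 1.0 / (math.pi * abs(t))

# Property 1: zero at every sample point except the center.
values_at_samples = {n: sinc(float(n)) for n in range(-4, 5)}

# Property 2: the ringing peaks (midway between samples) sit on the
# slowly decaying 1/(pi |t|) envelope.
peaks = {n: abs(sinc(n + 0.5)) for n in (10, 100, 1000)}

# Number of sample periods before the envelope falls to -60 dB (0.001).
samples_to_minus_60_db = 1.0 / (math.pi * 0.001)

for n, v in values_at_samples.items():
    print(f"sinc({n:+d}) = {v:.1e}")
for n, p in peaks.items():
    print(f"|sinc({n}.5)| = {p:.2e}, envelope = {envelope(n + 0.5):.2e}")
print(f"envelope reaches -60 dB after about {samples_to_minus_60_db:.0f} samples")
```

The values at the non-center sample points come out at floating-point rounding level rather than exactly zero, which is a property of the arithmetic, not of the function.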

However, real digital audio systems do not have brick-wall filters, which means that the system is never ideal from this narrow perspective. So what happens if the filters in A/D or D/A are not ideal? If the original analogue signal contains frequencies higher than Fs/2 then downward aliasing occurs. The existence of aliasing is not evil (it is, after all, the right signal reproducing at the wrong frequency); however, in the case of A/D conversion at low audio sample rates such as 44.1kHz, where there can be significant incoming energy above 22kHz, the most severe constraints pertain because we do not want ultrasonic components appearing spuriously in the audible band. Despite this, A/D converters commonly use steep half-band filters, which allow downward aliasing into the region between 18 and 22kHz (footnote 4).

One way to avoid downward aliasing is to increase the sample rate, so that either the content stays below the new Nyquist frequency or the higher rate allows a compromise with less severe filtering.
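The folding arithmetic behind downward aliasing is simple enough to sketch. The helper below is illustrative only: `alias_frequency` is a hypothetical name, the 25kHz tone is an assumed example, and the calculation assumes no anti-alias filtering at all, which is the worst case rather than a model of any real converter.

```python
def alias_frequency(f_in, fs):
    """Frequency (Hz) at which a tone at f_in appears after sampling at fs.

    Assumes the tone reaches the sampler unfiltered. Frequencies are
    folded into the range 0..fs/2.
    """
    f = f_in % fs                        # fold into [0, fs)
    return fs - f if f > fs / 2 else f   # reflect the upper half downward

# A 25 kHz ultrasonic component sampled at 44.1 kHz lands in the audible band.
print(alias_frequency(25000.0, 44100.0))  # 19100.0

# The same component sampled at 96 kHz is below Fs/2 and stays where it is.
print(alias_frequency(25000.0, 96000.0))  # 25000.0
```

With a 44.1kHz rate, content between roughly 22 and 26kHz folds into the 18 to 22kHz region mentioned above; at 96kHz the same content is captured at its true frequency.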

In recent years there has been considerable progress in sampling theory, particularly in specific areas where signal statistics are favorable. One example applies to audio music and speech signals; we can change the rules because the signal has a finite rate of innovation. [25][26]

Having captured the signal digitally we have to maintain the constraints; remember Shannon permits Fs degrees of freedom per unit time (one per sample), but if we create a single discontinuity the framework loses validity. Some digital processing operations, such as re-quantisation, compression, and clipping, create signals that are not band-limited. It turns out that our current playback systems are never lossless in the digital and analogue domains.

Reconstruction is used to convert back to analogue from digital. The traditional view of sampling requires a brick-wall filter to maintain a flat frequency response, to prevent upward aliasing and for zero ambiguity. More recently, particularly since Craven's paper, it has been found helpful to use upsampling with more gentle reconstruction filters, including those with apodizing characteristics. [20]

Nowadays we find high-performance D/A converters offering filter choices, sometimes with descriptions such as 'measure vs listen.' Part of the impetus to build converters this way is to deliberately remove ringing from the A/D and to superimpose only post-ringing artifacts on the resulting analogue. The compromises sought include drooping HF response and/or upward aliasing. However, using arbitrary filters that are not idempotent guarantees that the analogue output is in fact not a reconstruction of the original and is therefore subject to arbitrariness; the output is no longer a faithful reproduction of the input, and most commonly it is the high-frequency response that suffers.

One important aspect of MQA is that the sampling and reconstruction filters in the encoder and decoder are complementary, thereby ensuring that the resulting analogue output is a faithful copy of that monitored in the studio.

Figure 6: Examples of end-to-end responses of sampled systems with different kernels. In the top row we see the frequency response of a channel while the bottom row shows the corresponding impulse response. On the left is a typical sinc-type linear-phase channel. On the right the channel uses a Gaussian filter at each end; the impulse response is near ideal but the frequency response droops early. The Gaussian system is ideal for image transmission or for applications where waveform is important, such as in an oscilloscope. The human hearing system has a different time/frequency balance, and sounds are better transmitted using an intermediate kernel, as shown in the center. [9]



Footnote 4: Possibly these A/D converters have been tolerated because the human judges all signals between approximately 18 and 26kHz to have the same low pitch.