"Audio Origami" or "Folding" Questions
Again, these were posted online:
a) MQA "folds" the double- and quad-rate audio data into the lowest bits below the baseband musical content of the 48/24 FLAC container. The "folded" MQA signal is fit into a 48/24 FLAC container for further compression. Undecoded MQA is claimed to restore the full baseband audio signal, but only to a bit depth of 16 bits, or 48/16. After all there are only 24 bits of resolution in a 24-bit FLAC container.
b) How in the world has MQA fooled so many people into thinking that they could "fold" the dual- and quad- rate audio data and "hide" it under the 16 (or possibly 17 bits) of baseband information?
c) MQA claims that the dual- and quad-rate information is very low in level and that it can safely be "folded" beneath the 16 bits of audio data in the baseband that gives the "–120dBFS per-root Hz".
d) The quad-rate audio is compressed using lossy techniques (referred to as "encapsulation" in more recent materials by MQA) to fit within the lowest 1 or 2 bits of the 48/16 FLAC container.
e) The dual-rate audio is compressed using lossless techniques (presumably similar to FLAC) to fit in roughly 5 bits below the baseband audio data. This can be seen in recent articles on the Stereophile website, such as here. This shows how there is no musical content below –120dBFS noise per-root Hz, requiring only 16 bits to encode the baseband audio data. Please note that this only requires 16-bits of baseband audio data.
f) In effect it is no different than MP3 in this regard. Once the data is compressed in a lossy format, it is gone forever and can never be recovered.
g) Stuart and Craven (currently) claim roughly 7.5dBFS noise-per root Hz per bit (after claiming 9dBFS noise-per root Hz in the original AES paper).
h) Examining the above graph, we can see that the lossy compression used for region "C" is roughly 1 bit high. This means that the 24th LSB of the MQA 48/24 FLAC container contains all of the lossy encoded quad-rate data. The next fold utilizes lossless techniques (likely similar to FLAC) to compress the dual-rate audio data. This requires much more room in the 48/24 FLAC container and appears to use approximately the 23rd through the 20th LSBs of the 48/24 FLAC container.
Answers:
a) That is broadly correct. Without a decoder the playback precision is 24 bits (or 16 with a 16-bit original). For backward compatibility, however, the MQA signalling has been placed in the region of the 16th bit. This allows the highest sound quality to be achieved not only by listeners with no decoder but also from a full-resolution file that is truncated to 16 bits in the 'last mile' of playback, such as over an Automotive, Airplay or Bluetooth link, or when placed on a CD. Some commentators (including the current questioner) have assumed that because the signalling is at this level, there is no information below that point. In fact a listener with a decoder can benefit from a much lower noise floor, and therefore from quieter details in the recording. Once again units are being confused: a converter can have 24-bit resolution while the signal has a different noise floor (eg, 20 bits).
We have already published some examples that show the dynamic-range capability of MQA that far exceeds 16 bits.[43][44]
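To make the units distinction concrete, here is a minimal sketch of the textbook relationship between bit depth, total quantization noise, and noise density in a 1 Hz bandwidth. It assumes an ideal, undithered uniform quantizer whose noise is white across 0 to half the sample rate. It is not a model of MQA's encoder or noise shaping, and the function names are hypothetical.

```python
import math


def quantization_snr_db(bits: int) -> float:
    """SNR of an ideal uniform quantizer for a full-scale sine.

    This is the familiar 6.02 * bits + 1.76 dB rule.
    """
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


def noise_density_dbfs_per_hz(bits: int, sample_rate_hz: float) -> float:
    """Noise level in a 1 Hz bandwidth, assuming white quantization noise
    spread evenly from 0 Hz to sample_rate_hz / 2."""
    total_noise_dbfs = -quantization_snr_db(bits)
    return total_noise_dbfs - 10 * math.log10(sample_rate_hz / 2)


if __name__ == "__main__":
    for bits in (16, 20, 24):
        total = -quantization_snr_db(bits)
        density = noise_density_dbfs_per_hz(bits, 48_000)
        print(f"{bits} bits: total {total:.1f} dBFS, density {density:.1f} dBFS in 1 Hz")
```

Under these assumptions the total noise figure and the per-Hz density for the same bit depth differ by about 44dB at a 48kHz sample rate, which is why a noise density read from a spectrum plot cannot be converted directly into a bit count.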
b) There is no foolery here: MQA does indeed reconstruct a remarkably close approximation to the original ultrasonic information from the lower bits of a 24-bit signal.
c) The structure of MQA is somewhat flexible and doesn't really conform to the model that may be implied by the phrase "16-bits of audio data in the baseband". A 24-bit MQA file may be auditioned in at least three ways:
Truncated to 16 bits and auditioned without a decoder
Truncated to 16 bits and decoded
Fully decoded from 24 bits
We need to be aware that the recovered 0–20kHz 'baseband' signal is not the same between these three presentations. In the very broadest terms the top 16 bits convey most or all of what will end up in the 0–20kHz range, and the remaining bits can improve the resolution of the baseband and/or extend the frequency range. However it is not a case of "these bits provide resolution enhancement and those bits provide range extension": MQA is more sophisticated than that. A consequence of the three presentations is that if we are to talk about performance we need to be very clear which presentation we are talking about. If we assume the last: 24 bits, fully decoded.
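For readers unsure what "truncated to 16 bits" means at the sample level, the sketch below shows generic PCM truncation of a signed 24-bit sample: the top 16 bits survive and the lowest 8 bits are discarded. It illustrates the container arithmetic only and says nothing about how MQA allocates or interprets those bits. The helper names are hypothetical.

```python
def truncate_24_to_16(sample24: int) -> int:
    """Drop the 8 least significant bits of a signed 24-bit sample.

    Python's >> on ints is an arithmetic shift, so the sign is preserved.
    """
    if not -(1 << 23) <= sample24 < (1 << 23):
        raise ValueError("sample is outside the signed 24-bit range")
    return sample24 >> 8


def low_byte(sample24: int) -> int:
    """Return the 8 bits that truncation to 16 bits throws away."""
    return sample24 & 0xFF


if __name__ == "__main__":
    sample = 0x123456
    top = truncate_24_to_16(sample)   # what a 16-bit link carries
    lost = low_byte(sample)           # what only a 24-bit path carries
    print(hex(top), hex(lost))
    # Recombining both parts restores the original 24-bit sample.
    print(((top << 8) | lost) == sample)
```

The first two presentations in the list work from the 16-bit value alone, while the third also has access to the discarded low byte.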
'Origami' has been very useful in explaining MQA to a semitechnical audience, but like other analogies it cannot be taken too far. One really needs to consult the AES October 2014 paper mentioned above [2] and also the "Doubly Compatible Lossless Bandwidth Extension" patent, though with the caveat that that patent is now quite old. It presents a rather crude model and MQA has got a lot more subtle since it was written.
d) No—it's more complicated than the simplified Origami model implies.
e) See below.
f) MQA is completely different to MP3. In particular it does not smear in time; it does not have a dynamically-varying frequency response and it does not introduce a signal-dependent noise floor.
g) The "7.5" and the "9" are never mentioned by Stuart and Craven and appear to be a misunderstanding in the question.
e) & h) The questioner's interpretation of the simplified 'Origami' picture is unfortunately completely off-track. It needs to be reconsidered in its entirety in the light of the paper and patent referred to above. To help with this we have included a tutorial giving an extract from our presentation to the Japan Audio Society. [3]















