Listening Tests and Absolute Phase

Editor's note: The subject of "Absolute Phase," more correctly called "Absolute Polarity," was of intense interest to audiophiles in the 1980s, culminating in the publication of Clark Johnsen's 1988 book The Wood Effect. I wrote an article on the subject for the November 1980 issue of English magazine Hi-Fi News & Record Review, which expanded on that subject to include some thoughts on audio reviews in general, thoughts that are just as relevant now as they were 37 years ago. It is reprinted here with the kind permission of Paul Miller, the editor of that magazine, now called just Hi-Fi News.—John Atkinson

The problem confronting the magazine reviewer when organising the necessary listening tests to accompany/reinforce the measured behavior of a device under test is complex. There has never been a problem with the measurement aspect; as long as someone has access to the same test gear—and full knowledge of the test conditions—then he should be able to replicate the critic's findings exactly (assuming an infinitely narrow spread of behaviour from sample to sample—a rasher assumption with some manufacturers' equipment than with others). However, when it comes to determining reliably the audible (or inaudible?) effects of an amplifier/cartridge/loudspeaker etc. on music program, then the going gets tough.

Unlike the reaction of an oscilloscope, that of a listener involves interaction: what he is hearing; what he had been expecting to hear; the identity of the equipment; the emotional effect of the music program; the emotional effect of other competing stimuli (a recent cup of coffee, a not-so-recent visit to the toilet); the apparent expectations of his fellow listeners; the ultimate purpose of the test; the desire for self-consistency and hence self-esteem; all these can—but needn't always—color the listener's assessment. Obviously this will affect the reliability of any conclusion, both when used to predict the same listener's reaction to the same piece of equipment, and when used to predict other people's reactions.

Which brings us neatly back to the point of reviews, which ultimately is to enable a reader to decide whether the effect of a piece of equipment will or will not be beneficial, and if beneficial, more importantly whether the degree of any improvement is sufficient to justify the expenditure. Even if the reader has the necessary equipment and expertise/experience, the measurement-only review doesn't supply this information. It can still exist, of course, in isolation, but magazines don't enjoy a continued existence when only publishing information of no practical benefit to readers, no matter how elegant in its own right. Any audible effects of the measured imperfections have also to be communicated.

Thus there are two distinct processes involved: the determination (listening test results); and the resultant communication. The latter has been a much-abused area of journalism, perhaps because of the lack of a precise vocabulary to describe aural sensations. Adjectives drawn in from all aspects of human behaviour have been pressed into service when describing the sound of hi-fi equipment—a "velvet sheen" to the midrange, a "suet pudding-like" bass, "metallic edge to vocals," "green felt coloration" etc. Adrian Orlowski's recent article (footnote 1) was an attempt to define subjective terminology and perhaps Peter Moncrieff in the USA has gone farthest in providing a rational language to communicate subjective impressions. After all, to quote a recent contributor to the debate (footnote 2), "Should I be about to spend $2,000 on a Mark Levinson ML-1, I want to know about the clarity of the midrange, not whether its flavor is chocolate or vanilla."

More important, however, than the fact that the message can be garbled by the language used, is the validity of that message. As indicated, the reliability of a listening test can be seriously affected by a number of extraneous circumstances and a reviewer must exclude such extra stimuli. Without such care the test results will be randomised: any observable change must be produced only by the insertion of the test item into the chain, or otherwise no conclusion can be drawn. And even then, a major pitfall lies ready to ambush the unwary. If a change is reliably heard, how can any value-judgement be made without knowing what the program material should be like? As Peter Moncrieff put it when defining his "M rule" (footnote 3): "No evaluation of a device can be scientific if that evaluation is carried out through other devices that are imperfect."

A recent amplifier review (footnote 4) observed that as the amount of reverberation on some records appears to be less when using one amplifier than another, the first amplifier—a transistor design—must therefore be suppressing the ambience. But unless the ambience level on the recording is known, this can only remain one of a number of hypotheses. One could just as well say that the other amp—a valve design—was somehow adding ambience. In the actual review, this was ruled out as not being consistent with the author's apparent intrinsic belief that in a valve/transistor amp confrontation, faults should be attributed, if possible, to the transistor design. Carrying out any test non-blind, ie, with the identity of the device under test known to the listener, brings in all the above-mentioned additional stimuli, totally invalidating any conclusions drawn. I would say that the listener's capacity for self-delusion, so that he really does hear differences which are nonexistent in reality (but enjoy a healthy existence in the pages of magazines) when he is aware of the device being tested, is practically infinite.
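One way to see what a blind test buys the reviewer is to ask how often a listener who hears no difference at all would identify the device correctly by guessing. The sketch below is an illustration only and is not a procedure described in this article: it assumes a series of independent two-way forced-choice trials with a 50% chance of a correct guess, and the function name `chance_probability` is hypothetical.

```python
from math import comb


def chance_probability(correct: int, trials: int) -> float:
    """Probability of at least `correct` right answers in `trials`
    two-way blind trials if the listener is purely guessing (p = 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    # Count every outcome with `correct` or more right answers,
    # then divide by the total number of equally likely outcomes.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # Hypothetical session: 12 correct identifications out of 16 trials.
    print(f"{chance_probability(12, 16):.4f}")
```

For the hypothetical session of 12 correct out of 16, the probability of doing at least that well by guessing is about 0.038. When the listener knows the identity of the device, no such calculation is possible, because the answers are no longer independent of his expectations.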

Scientific Method applied to equipment reviewing does not consist of setting up a test and drawing the conclusion which fits in best with the reviewer's preconceptions. Sadly, this is the way in which many published tests are performed, because when care is taken to remove all variables, bar one, the device under test, there is far less scope to wax lyrical in the true subjective-only review manner. How much more journalistic flair there is in writing that "the preamp made the music sound like it was being played by amateurs," to quote an infamous review of the Quad 33 preamplifier, than in saying, perhaps, that there is a 0.75dB depression between 1 and 4kHz. Unfortunately, magazine sales show that the former style of writing appeals as much, if not more, to the layman. Similarly, the publicity-conscious Matti Otala saying that "We do not know anything about audio!" (footnote 5) sounds much more impressive to the layman than would a dry technical argument—as in one of his many papers—as to which parameters contribute to what audible effect.
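To put a figure such as a 0.75dB depression in perspective, it can be converted to linear ratios using the standard decibel definitions (20 log10 for voltage, 10 log10 for power). The short sketch below does only that arithmetic; the function names are hypothetical and nothing here comes from the Quad 33 review itself.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Voltage (amplitude) ratio corresponding to a level change in dB."""
    return 10 ** (db / 20)


def db_to_power_ratio(db: float) -> float:
    """Power ratio corresponding to a level change in dB."""
    return 10 ** (db / 10)


if __name__ == "__main__":
    dip_db = -0.75  # the depression quoted in the text
    print(f"amplitude ratio: {db_to_amplitude_ratio(dip_db):.4f}")
    print(f"power ratio:     {db_to_power_ratio(dip_db):.4f}")
```

Under these definitions a 0.75dB dip corresponds to a voltage roughly 8% lower, or a power roughly 16% lower, across the affected band.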

It is an unfortunate fact, however, that to produce a magazine review within the available money and time budgets, not all the variables can be eliminated totally. To apply the necessary rigorousness to satisfy a psychologist, say, would mean that a review might only appear after an overlong preparation period, which, in turn, would lead to the review appearing after the model had been made obsolete, particularly so with those from Japanese manufacturers. Happily, short cuts which only slightly compromise the review's reliability, derived from criticism of the performing arts, do exist. Use of a "transfer standard," to use reviewer Trevor Attewell's terminology, gives a "ground reference" to the subjective comments, and ensures repeatability of results while, in the first instance, copping out from absolute judgements. A record critic can compare a new performance against, say, a well-known Karajan one, secure in the knowledge that by doing so, the majority of his readers already familiar with the Karajan will be able to follow the reasoning behind his conclusions.

Absolute judgements, however, can only be made when the absolute program quality is known. With loudspeakers this can be effectively ensured by using self-recorded master-tapes, or with electronics, by using Peter Walker's straight-wire bypass (footnote 6) where the original state of the program material, before being processed by the device under test, is available at the flip of a switch.

Another shortcut, where reference to the "real thing" for practical reasons is not available, ie, with disc playing equipment, is to use experienced listeners who score the device under test as to how far it departs from their conception of "reality." Obviously, the magazine reader has to take that very much on trust, but with reliable listeners, work by Martin Colloms, Gordon King and Noel Keywood has shown good correlation between observed departures from the consensus panel opinion of "reality" and measured deficiencies in the device under test—using source material of known quality, of course. There is no point in judging an item on the way it handles stereo imagery, for instance, when using program not possessing coherent stereo information. Unfortunately many reviews in American magazines do do this, resulting, perhaps, in a "good" loudspeaker with a carefully and evenly controlled dispersion pattern—essential for good stereo when the listener is even a small lateral distance from the "stereo seat"—being downrated against a much poorer design with all manner of side lobes at different frequencies, which nevertheless, on non-coherent recordings of the spaced-omni type, can give a (program and frequency dependent) impression of "solidity." Moncrieff's "M-Rule" is once again being violated.

This is not to suggest that HFN/RR is overconservative, or even dogmatic, in its approach: to adopt an attitude of being certain that no differences between amplifiers should exist—as in the "bumblebees can't fly" proof—and thus examining listening test evidence on that basis, is as unscientific as ascribing every subjective difference "heard" in an imperfect test to the object of the investigation, ie, the device under test. Reviewer, magazine, and reader must be open-minded so that experimental evidence contradicting personal dogma, if shown not to be spurious, must be examined.

Absolute Phase
This rather lengthy preamble leads to the subject of absolute phase. It was while trying to replicate reviewers' subjective test data, and discovering that as spurious causes for differences were removed (level differences, frequency-response differences, awareness of device identity) so were the audible differences between amplifiers, that Stanley Lipshitz of the University of Waterloo in Ontario found that the absolute phase, ie, the polarity, of the signal did matter. He wasn't trying to prove that differences between amplifiers were nonexistent; rather, as a mathematician as well as a hi-fi enthusiast, he was applying a somewhat more rigorous scientific methodology to determine what the audible differences, described in absolute terms by the American underground press, actually were.

In early 1977, he had discovered that the slightly asymmetric waveform produced by the tone generator of the Wireless World Dolby-B noise-reduction kit sounded different with the device in "record" mode than when in "replay" (footnotes 7,8,9,10). The only circuitry change between the two modes was the insertion of an inverting unity-gain buffer in one but not the other. Level, distortion, and frequency response differences were all examined and found to be insignificant, so all that remained to explain the audible difference was the difference in the polarity of the signal.
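Why an inverting unity-gain stage leaves level and frequency response untouched, yet still changes an asymmetric waveform, can be illustrated numerically. The sketch below does not reproduce the Wireless World tone generator: it assumes a hypothetical test signal made of a fundamental plus a second harmonic at half amplitude, chosen only because its positive and negative peaks are unequal. All function names are illustrative.

```python
import math


def asymmetric_wave(n_samples: int = 48, second_harmonic: float = 0.5) -> list:
    """One cycle of sin(x) + a*cos(2x): positive and negative peaks differ."""
    return [
        math.sin(2 * math.pi * k / n_samples)
        + second_harmonic * math.cos(4 * math.pi * k / n_samples)
        for k in range(n_samples)
    ]


def invert(signal: list) -> list:
    """Model of an ideal inverting unity-gain buffer: flip polarity only."""
    return [-s for s in signal]


def harmonic_magnitude(signal: list, harmonic: int) -> float:
    """Amplitude of one harmonic, from a direct single-bin DFT."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * harmonic * k / n) for k, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * harmonic * k / n) for k, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n


if __name__ == "__main__":
    original = asymmetric_wave()
    flipped = invert(original)
    for name, sig in (("original", original), ("inverted", flipped)):
        print(name,
              f"peak+ {max(sig):+.2f}",
              f"peak- {min(sig):+.2f}",
              f"H1 {harmonic_magnitude(sig, 1):.2f}",
              f"H2 {harmonic_magnitude(sig, 2):.2f}")
```

For this assumed waveform the peaks are about +0.75 and -1.5 before inversion and +1.5 and -0.75 after, while the two harmonic magnitudes stay at 1.0 and 0.5 in both cases. A measurement of level or spectrum magnitude therefore cannot tell the two signals apart; only the polarity differs.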



Footnote 1: "A Rational approach to subjective evaluation," Adrian Orlowski, HFN/RR April 1980 p.49.

Footnote 2: "Issues of reliability and validity on subjective audio equipment criticism," Larry Greenhill, Audio Amateur January 1979 p.17.

Footnote 3: "The M Rule," J. Peter Moncrieff, International Audio Review 3, 1978 p.36.

Footnote 4: Michaelson & Austin TVA-1 amplifier review, Dave Berriman, Practical Hi-fi, April 1979 p.99.

Footnote 5: Matti Otala interviewed by Basil Lane, Practical Hi-fi, March 1979 p.82.

Footnote 6: "Dynamic testing of audio amplifiers," Frank Jones, HFN/RR, November 1970 p.1655.

Footnote 7: Letter to the Editor, Wireless World, May 1977 p.62.

Footnote 8: Letter to the Editor, Wireless World, October 1977 p.60.

Footnote 9: Letter to the Editor, HFN/RR, January 1979 p.81.

Footnote 10: "A little understood factor in A/B testing," S. P. Lipshitz, BAS Speaker March 1979.