Features

Audio Engineering: the Next 40 Years Page 2

Let's conservatively assume that the resolution and accuracy of gesture technology will double every two years (though, given the economic incentives, doubling every year may be more realistic). Common gestural devices ($100 at 150 pixels per inch [PPI], by today's standards) will boast two orders of magnitude greater resolution by about 2025. Costing only $1, yet able to map and track 15,000 3D positions per inch, such devices will allow far greater degrees of freedom and movement (think Minority Report without the $1 million price tag). By 2025–2030, the price of sophisticated, high-resolution, free-air gestural control will have fallen to commodity levels, and the devices will be mass-produced.
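The projection above is compound doubling: a metric that doubles every T years grows by a factor of 2^(N/T) after N years. The short Python sketch below works that arithmetic for both doubling rates mentioned. It assumes 2013 as "today" (the article does not state a baseline year), and the function names are illustrative only.

```python
import math


def growth_factor(years: float, doubling_period: float) -> float:
    """Improvement multiple after `years` if a metric doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


def years_to_reach(factor: float, doubling_period: float) -> float:
    """Years needed to improve by `factor` at the given doubling period."""
    return doubling_period * math.log2(factor)


BASELINE_YEAR = 2013  # assumption: the article's "today"
TARGET_YEAR = 2025
span = TARGET_YEAR - BASELINE_YEAR

# The conservative (2-year) and the "more realistic" (1-year) doubling cases.
for period in (2.0, 1.0):
    print(f"doubling every {period:g} yr: {growth_factor(span, period):,.0f}x by {TARGET_YEAR}; "
          f"100x takes {years_to_reach(100, period):.1f} yr")

# Two orders of magnitude on today's 150 PPI:
print(150 * 100, "positions per inch")
```

Under the conservative two-year doubling, a hundredfold gain takes a little over 13 years, landing close to the "about 2025" estimate; annual doubling would get there in under seven.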

Will gestural control replace touchscreens and mice by 2025? No. But the transition will be well under way. Clearly, the next 40 years of human/computer interaction will be free-space and gestural.

Spherical Audio
Let's move on to 3D virtualization. We need to think systemically, with video, audio, and head-motion tracking all working together seamlessly. We'll start with virtualized audio.

Both gaming and film are quickly moving into providing a sense of total audio immersion. In real acoustic spaces such as movie theaters, we're seeing the delivery of spherical audio from emerging technologies like Dolby Atmos, DTS Neo, and Barco Auro. However, these immersive real-space technologies require more speakers and amplifiers, more expense, and a great deal more work to maintain—all things that consumers embrace slowly, if at all. The average consumer has balked at six speakers. Requiring 10, 14, or 22 speakers, and the amps to drive them, is a nonstarter.

Market realities suggest that the primary thrust of 3D audio innovation will occur through headphones. Already, first-generation 3D headphone products such as DTS Headphone:X are breaking ground. Over the next decades, popular gaming and entertainment media will lead the relentless push toward fully immersive audio realism, predominantly over headphones.

Full-size headphones (not just earbuds) have exploded into the mass consciousness in just the last few years, and the trend will only accelerate. Popular culture is increasingly conditioned to accept "cans" as a primary way of listening.

Jimmy Iovine and Andre Young, aka Dr. Dre, the creators of Beats by Dr. Dre headphones, have arguably done more than anyone else to position headphones as a generational, cultural, and global style statement. Beats now sells well over $1 billion of consumer audio products every year and has captured more than 60% of the market for headworn audio products costing more than $100. And Beats isn't just following technology trends; it aims to lead. Iovine and Young recently gave $70 million to the University of Southern California to found the brand-new USC Jimmy Iovine and Andre Young Academy for Arts, Technology, and the Business of Innovation.

There are now entire stores devoted to headworn technology. I recently spotted one such store, the Headphone Hub, at Houston's George Bush Intercontinental Airport (see photo). This is not a fad: over the next 20–30 years, 3D soundfield production and design will be one of the biggest growth areas in audio delivered via headphones. Microphone designers, headphone makers, audio software engineers, and postproduction engineers will move from today's paradigm of x-dot-x channels (5.1, 7.2, etc.) to a seamlessly spherical, object-oriented soundfield.
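To make the contrast with fixed x-dot-x channels concrete: in an object-oriented mix, each sound carries its own position metadata, and a renderer works out the feeds for whatever playback layout is present. The Python sketch below is a deliberately simplified, hypothetical renderer. `AudioObject` and `pan_gains` are illustrative names, it handles azimuth only (no elevation), and it uses pairwise constant-power sine/cosine panning between adjacent speakers, a 2D simplification in the spirit of vector-base amplitude panning. It is not how Dolby Atmos, DTS, or any commercial renderer actually works.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """Hypothetical audio object: a sound's name plus its position metadata."""
    name: str
    azimuth_deg: float  # 0 = straight ahead, positive = toward the listener's left


def pan_gains(obj: AudioObject, speaker_azimuths_deg: list[float]) -> list[float]:
    """Constant-power gains for one object on a horizontal ring of speakers.

    Needs at least two speakers at distinct azimuths. Finds the adjacent pair
    that brackets the object's azimuth and splits the signal between them with
    a sine/cosine law, so the squared gains always sum to 1.
    """
    n = len(speaker_azimuths_deg)
    if n < 2 or len({a % 360.0 for a in speaker_azimuths_deg}) != n:
        raise ValueError("need at least two speakers at distinct azimuths")
    # Walk the speakers in counterclockwise order around the ring.
    order = sorted(range(n), key=lambda i: speaker_azimuths_deg[i] % 360.0)
    az = obj.azimuth_deg % 360.0
    gains = [0.0] * n
    for k in range(n):
        a, b = order[k], order[(k + 1) % n]
        start = speaker_azimuths_deg[a] % 360.0
        arc = (speaker_azimuths_deg[b] % 360.0 - start) % 360.0  # angular gap a -> b
        offset = (az - start) % 360.0
        if offset <= arc:
            frac = offset / arc  # 0 at speaker a, 1 at speaker b
            gains[a] = math.cos(frac * math.pi / 2)
            gains[b] = math.sin(frac * math.pi / 2)
            break
    return gains


# The same object and metadata, rendered to two different layouts.
bee = AudioObject("bee", azimuth_deg=15.0)
print(pan_gains(bee, [30.0, -30.0]))                      # stereo pair
print(pan_gains(bee, [30.0, -30.0, 0.0, 110.0, -110.0]))  # five-speaker ring
```

The object is never remixed for the second layout; only the renderer's output changes. A headphone renderer would replace the speaker gains with binaural filtering, which is beyond this sketch.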

Headphone Hub, George Bush Intercontinental Airport, Houston, September 2013. Photo: John La Grou

If we chart 3D audio growth and project a doubling every two years (fig.5), today's $1000 3D audio solution will be commodity-priced by 2025 and will deliver a hundredfold improvement in the spatial and timbral resolution experienced over headphones.
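The "commodity-priced by 2025" claim follows from running the doubling on price: if cost-performance doubles every two years, the price of a fixed level of performance halves every two years. This Python sketch shows only that arithmetic; it assumes a 2013 baseline and does not reproduce the data behind fig.5.

```python
BASELINE_YEAR = 2013         # assumption: the article's "today"
BASELINE_PRICE_USD = 1000.0  # today's 3D audio solution, per the text
DOUBLING_PERIOD_YEARS = 2.0  # the projection used for fig.5


def projected_price(year: int) -> float:
    """Price of today's level of 3D audio performance, halving every doubling period."""
    doublings = (year - BASELINE_YEAR) / DOUBLING_PERIOD_YEARS
    return BASELINE_PRICE_USD / 2.0 ** doublings


for year in range(2013, 2026, 4):
    print(year, f"${projected_price(year):,.2f}")
```

Six doublings between 2013 and 2025 bring today's $1000 performance level down to roughly $16, which is the sense in which it becomes a commodity.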

Fig.5 Immersive 3D audio growth.

Conservatively, by 2025–2030 we should expect that highly realistic immersive audio will be part of every low-cost portable device, gaming console, and home entertainment system. And by about 2040, on-ear audio will rival or exceed the subjective performance of today's best audiophile rooms and loudspeakers. Moreover, in a very short time, perhaps as soon as 2020, common commercial music will be routinely mixed in full 3D immersion and delivered in an open-source format, most likely a derivative of Dolby Atmos or DTS Neo.

Virtualized Visuals
Virtualized imagery plays a central role in the future of audio production. The future of headworn visual displays is clear: higher resolution, finer dot pitch, better dynamic range, lower latency, and, of course, relentless evolution toward three-axis immersion as our standard image format.

By now, many of us have seen photos of, and read articles about, the Google Glass prototype, a headworn computer with a head-mounted display (see photo). Sources claim that Glass will be available in 2014 for a street price of around $400. This is a paradigm shift. If there were only one takeaway from this brief look into the future, it should be this: We are moving from a culture of handheld devices to one of headworn devices.

Sergey Brin, co-founder of Google, wearing Google Glass. Photo: Reuters/Carlo Allegri

It won't be long before smart mobile computers are designed into small, lightweight, headworn devices not unlike Google Glass, but increasingly powerful and ubiquitous. Vendors such as Apple, Intel, Microsoft, Oakley, Olympus, Samsung, and Sony, along with at least a dozen startups, are all reportedly developing headworn smart-mobile devices.

While Google and others are defining the mainstream of headworn gear, I think there's another kind of device that's more directly applicable to the future of audio and media production: gaming displays. Of all the gaming displays now in development, I think one of the most important is the Oculus Rift (see photo).

Sergey Orlovskiy using the developer kit version of the Oculus Rift (with separate headphones).

The Rift presents a separate image to each eye for true stereoscopic 3D (the version in development has a full 1080p display) and offers unrestricted head-motion tracking: when you turn your head, the audio and visual elements of the scene respond with lifelike, immersive realism.

Observing gamers using the Oculus Rift, I feel that we're seeing the future of display technology. To see what I mean, watch the YouTube video "The Best and Funniest Oculus Rift Reactions." What you'll see are people reacting to the most deeply convincing, fully immersive virtual-reality experience to date. The experience of Rift reality can be uncanny enough to be disturbing. Oculus plans to have shipped its first commercially available product by the time you read this.

To return to my analysis of trends: the comprehensive cost-performance efficiency (CPE) of video displays has doubled roughly every 18 months since 1980 (footnote 2; a doubling every year from now on would not be surprising). Thus, by 2025, the CPE of immersive displays will be at least 100 times better than today's, at a commodity-priced entry point (fig.6). By 2035, immersive visuals will be at least 10,000 times more powerful than today's; and by 2050, we can reasonably project that commodity-grade, headworn virtuality will be nearly indistinguishable from what we see with our own eyes in real space. We can also expect head displays to become much smaller and lighter, and perhaps to use a technique called direct projection, in which images are projected (scanned) directly onto the human retina one pixel at a time.
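The 100x and 10,000x figures can be checked against the stated 18-month doubling period. This Python sketch assumes 2013 as the baseline year and treats CPE as a single number that compounds cleanly, which is a simplification.

```python
BASELINE_YEAR = 2013         # assumption: the article's "today"
DOUBLING_PERIOD_YEARS = 1.5  # "a doubling roughly every 18 months"


def cpe_multiple(year: int, doubling_period: float = DOUBLING_PERIOD_YEARS) -> float:
    """CPE relative to the baseline year, assuming clean compound doubling."""
    return 2.0 ** ((year - BASELINE_YEAR) / doubling_period)


# "At least" multiples stated in the text, keyed by year.
stated_floors = {2025: 100, 2035: 10_000}

for year, floor in stated_floors.items():
    m = cpe_multiple(year)
    verdict = "consistent" if m >= floor else "falls short"
    print(f"{year}: about {m:,.0f}x vs. 'at least' {floor:,}x -> {verdict}")
```

At an 18-month doubling, 2025 comes out near 256x and 2035 near 26,000x, so both "at least" figures hold with room to spare; annual doubling would push them higher still.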

Fig.6 Video display cost-performance trend.

Head-Motion Tracking
Immersive sound and picture would be impossible if they did not "track" the natural movements of the head. When you turn your head, the virtual sound and picture must react as they would in real sensory time and space. Effective head tracking requires near-zero latency, with high spatial resolution in all axes of head movement. However, head-motion tracking is still a relatively young technology, using various sensing methods: IR-optical, e-field, RF, and so forth.
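Two small calculations illustrate what tracking has to do. First, a virtual source anchored in the world must be re-rendered at its azimuth relative to the current head yaw. Second, any end-to-end latency makes the rendered scene trail a turning head by head speed × latency. The Python sketch below assumes a yaw-only model, a sign convention of positive = left, and an illustrative head-turn speed of 200 degrees per second and latency of 20 ms; none of these values come from the article.

```python
def relative_azimuth(source_az_deg: float, head_yaw_deg: float) -> float:
    """Azimuth of a world-fixed source relative to where the head points, wrapped to [-180, 180)."""
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


def lag_error_deg(head_speed_deg_per_s: float, latency_ms: float) -> float:
    """How far the rendered scene trails its true direction during a turn, given latency."""
    return head_speed_deg_per_s * latency_ms / 1000.0


# A source straight ahead; the listener turns 90 degrees to the left.
print(relative_azimuth(0.0, 90.0))  # -90.0, i.e., the source is now hard right

# Illustrative values: a brisk 200 deg/s head turn with 20 ms of latency.
print(lag_error_deg(200.0, 20.0))   # 4.0 degrees of misplacement during the turn
```

The second number is why near-zero latency matters: every extra millisecond adds directly to how far sound and picture lag behind the head.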

Footnote 2: Display-technology cost-performance is a weighted sum of resolution, color depth, dynamic range, latency, dot pitch, refresh rate, contrast ratio, viewing angle, brightness, and energy use, measured against cost.
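A minimal Python sketch of the weighted-sum idea in footnote 2 follows. The weights, the normalization against a reference display, and the choice to divide by cost are all assumptions for illustration; the footnote does not specify them. Attributes where lower is better (latency, dot pitch, energy use) are inverted so that every score rises with quality.

```python
# Hypothetical weights; footnote 2 lists the attributes but not how they are weighted.
WEIGHTS = {
    "resolution": 0.20, "color_depth": 0.10, "dynamic_range": 0.10,
    "latency": 0.15, "dot_pitch": 0.10, "refresh_rate": 0.10,
    "contrast_ratio": 0.05, "viewing_angle": 0.05, "brightness": 0.05,
    "energy_use": 0.10,
}
# For these attributes a smaller measurement is better, so the ratio is inverted.
LOWER_IS_BETTER = {"latency", "dot_pitch", "energy_use"}


def weighted_score(measured: dict[str, float], reference: dict[str, float]) -> float:
    """Weighted sum of attribute ratios; 1.0 means 'same as the reference display'."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        if name in LOWER_IS_BETTER:
            ratio = reference[name] / measured[name]
        else:
            ratio = measured[name] / reference[name]
        total += weight * ratio
    return total


def cpe(measured: dict[str, float], reference: dict[str, float], cost_usd: float) -> float:
    """Cost-performance: weighted performance score per dollar."""
    return weighted_score(measured, reference) / cost_usd


# Placeholder numbers: every attribute of the reference display normalized to 1.0.
reference = {name: 1.0 for name in WEIGHTS}
newer = dict(reference, resolution=2.0, latency=0.5)  # twice the pixels, half the lag
print(cpe(reference, reference, 100.0), cpe(newer, reference, 50.0))
```

Normalizing to a reference display is what lets attributes with unlike units (pixels, milliseconds, watts) be added at all; any real index would need to justify its weights.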
