Can high fidelity enthusiasts adopt multi-channel audio formats?

Peter Wurmsdobler
9 min read · Mar 1, 2024


In my youth, owning a powerful high fidelity sound reproduction system, “Hi-Fi” for short, was something a lot of people aspired to. The desired “Hi-Fi” would be a two-channel stereophonic system, a stereo, composed of separates: a stereo power amplifier (perhaps even a separate stereo pre-amplifier and two mono-block power amplifiers), and of course a pair of speakers. For some reason I never questioned whether two-channel stereo was the best way to reproduce music: we have two ears, so two channels should be enough. Upon studying Floyd E. Toole’s book Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, however, I realised that such a stereo may actually not be the best way to reproduce complex sounds, or in the author’s words: “… the restriction to two channels severely limits the sound fields that can be delivered to listeners”. This story is about my personal journey towards embracing multi-channel audio formats in sound reproduction for music, in particular classical music.

Going back to my interest in stereo, I loved attending trade shows in the 1980s; all the finest Hi-Fi components were exhibited, touted as systems capable of producing incredible sound, often with very questionable explanations or science. Eventually, I spent all my savings on a system (T+A speakers, Harman Kardon amplifier); I was very happy listening to Pink Floyd or Beethoven symphonies. These were the glorious years of stereo systems, when artists exploited the effects that can be achieved by panning sound from left to right, like Pink Floyd in The Final Cut (1983). It was lovely to be able to hear the double basses of an orchestra on the right and the violins on the left in an orchestral piece of music. Friends even built their own components, as it was interesting to design and realise a power amplifier from discrete electronics, and to fabricate speakers from nice wood and off-the-shelf drivers. I had a go at building my own class-D amplifier (albeit a bit later), obviously a stereo amplifier.

In a parallel world, from the early 1990s on, multi-channel audio systems, and with them the AV receiver, started to appear on the market. These multi-channel receivers and amplifiers seemed to cater mostly for the home cinema and were sneered at by the “audiophile” community, including myself, for preconceived reasons such as: their purpose is to produce catchy cinematic sound effects such as exploding helicopters for Hollywood blockbuster movies; they serve the mass market by using cheap components; and they are unable to produce the pure sounds a sophisticated stereo system would be able to. A division existed between two mutually exclusive worlds: those interested in audio systems apparently for cinematic effects only, and those wanting to listen to pure music in a purist set-up, the audiophiles. Can the gap be bridged? Yes, by looking into the history and requirements.

Brief History of Multi-Channel Audio

The phonograph was invented in 1877 by Thomas Edison and improved by Alexander Graham Bell in the 1880s; sound was originally recorded onto cylinders and later onto discs, the phonograph record, to be played back on an appropriate gramophone. Only one channel was recorded, and it was used mostly for music. In the 1930s, British engineer Alan Blumlein came up with two-channel stereophonic sound, prompted by the shortcomings of a single audio channel in cinemas, where the film sound did not follow its source on the screen, a real issue. Interestingly enough, and ironically so, it was the cinema industry that enabled the development of stereophonic sound, pushing for more and more channels to create a 3D sound stage, ambience and envelopment. To name but a few: Fantasound (1940), Cinerama (1952), CinemaScope (1953), Dolby Stereo (1976), THX (1983), Dolby Digital (1992), Dolby Surround 7.1 (2010), Auro-3D (2010), DTS Neo:X (2011), Dolby Atmos (2012); the later products offer many channels at high resolution, both in terms of sampling frequency and number of bits, and can generate an “Immersive Sound”.

Recording technology for audio, in particular consumer audio, lagged behind the advances in cinema and its recording media (magnetic or optical). There were developments in the 1940s using magnetic tape, such as the Magnetophon, and later in the 1950s using 1/4-inch tape, as well as several attempts to record multiple channels onto a disc with various combinations of vertical and horizontal motion in one groove or even synchronised grooves. But it was the stereo record, based on concepts originally invented by Alan Blumlein, that became successful from the late 1950s onwards: two channels are recorded in one groove as two motions perpendicular to each other, but tilted at 45 degrees. The first mass-produced two-channel stereo vinyl records were released in 1958. In the following decades, most music was released in that way, initially from monophonic master tapes, and gradually mixed down from multi-track recordings to two-channel media. The concept of using two channels only was later carried over to the compact cassette tape (1963) and the compact disc (1982); the Red Book CD specification includes two channels by default. Stereo became synonymous with two-channel media and the common perception became ingrained in our heads: two-channel stereo is sufficient for sound reproduction, and it is just a matter of optimising all links in an established chain.

After studying the principles of acoustics and sound reproduction in more detail, it became clear to me that it is impossible to recreate three-dimensional sounds from two signals; as a control systems engineer, I should have known better. In contrast, the multi-channel formats have all the provisions needed at sufficient resolution to allow proper stereo sound reproduction. For instance, Dolby Atmos technology allows up to 128 audio tracks plus associated spatial audio description metadata; channels can be encoded losslessly at up to 24-bit/192 kHz. Another example: DTS-HD Master Audio supports a virtually unlimited number of surround sound channels and can deliver lossless audio quality at up to 24-bit/192 kHz. There are even quite different formats such as Ambisonics, which allows the three-dimensional sound field to be captured at any point and a facsimile to be reconstructed at a specified point using a certain number of channels and speakers; there is even an open source stack. To summarise, the technology to encode, transmit and decode multi-channel audio streams is already a reality, e.g. Dolby Atmos for Music was adopted by the music streaming services Tidal and Amazon Music, and Sennheiser launched a sound bar with built-in Dolby Atmos technology, the AMBEO sound bar.
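As an illustration of the Ambisonics principle mentioned above, here is a minimal sketch in Python. It encodes one mono sample into first-order B-format (W, X, Y, Z) and derives speaker feeds with a simple virtual-cardioid decoder. The function names, the square speaker layout and the choice of decoder are my own assumptions for illustration; real decoders are considerably more refined.

```python
import math

# Hypothetical horizontal layout: four speakers on the corners of a square.
SPEAKER_AZIMUTHS_DEG = [45, 135, 225, 315]


def direction(azimuth, elevation):
    """Unit vector for a direction given in radians."""
    return (
        math.cos(azimuth) * math.cos(elevation),
        math.sin(azimuth) * math.cos(elevation),
        math.sin(elevation),
    )


def encode_bformat(sample, azimuth, elevation=0.0):
    """Encode a mono sample into first-order B-format (W scaled by 1/sqrt(2))."""
    ux, uy, uz = direction(azimuth, elevation)
    return (sample / math.sqrt(2.0), sample * ux, sample * uy, sample * uz)


def decode_virtual_cardioid(bformat, azimuth, elevation=0.0):
    """Speaker feed = virtual cardioid microphone pointing at the speaker."""
    w, x, y, z = bformat
    ux, uy, uz = direction(azimuth, elevation)
    return 0.5 * (math.sqrt(2.0) * w + ux * x + uy * y + uz * z)


def speaker_feeds(sample, source_azimuth_deg):
    """Feeds for all speakers of the assumed layout, for one source sample."""
    bformat = encode_bformat(sample, math.radians(source_azimuth_deg))
    return [
        decode_virtual_cardioid(bformat, math.radians(a))
        for a in SPEAKER_AZIMUTHS_DEG
    ]


feeds = speaker_feeds(1.0, 45)  # source in the direction of the first speaker
```

With this decoder, the speaker facing the source receives the full signal, the opposite speaker receives nothing, and the two in between receive half; the four B-format channels are independent of the speaker layout, which is the attractive property of the format.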

Requirements for Sound Reproduction

Given the audio technology that has been developed over the past decades and its capability, would it not make sense to embrace and take advantage of all the possibilities it offers, rather than insisting on the established means, stereo, to reproduce sound? All that remains is to make a paradigm shift in one’s mind. In order to facilitate this paradigm shift, perhaps it is important to step back and ask the question: what is the objective of sound reproduction, and what are the best means to obtain good sound, irrespective of known ways? To me, the objective of sound reproduction is to recreate a sound close to the one that could be perceived in the original venue with one’s eyes closed, or perhaps as one imagines it would sound. The reproduction is the last link in a long chain from sound production to consumption:

Two ways of sound production, release in a comprehensive format, combined with two ways of sound reproduction, yield four possible paths for sound to travel.

There are two sources for recordings of the sound paths above:

  1. Live performance: multi-track recording of a performance at a real venue with a multitude of microphones placed at strategic positions to capture both individual instruments and the sound field of the venue, which encodes its acoustic qualities, determined mostly by the reverberation characteristics.
  2. Studio performance: multi-track recording of a performance in the studio with a multitude of microphones placed at strategic positions to capture mostly individual instruments and as little as possible of the acoustic qualities of the venue.

With regard to #1, as a control systems engineer, the term observability comes to mind: how many signals need to be observed in order to reconstruct the system state (sound pressure and direction, i.e. the sound pressure vector) at any point in time and space of the venue, given an adequate spatial model of its acoustic behaviour? In other words, how many microphones, and what kind of microphones, are required to reconstruct the sound vector for every point in the audience (see Ambisonics)? The objective is to compile all tracks into a canonical representation of the sound field as a function of time, with some additional metadata that describe the venue’s acoustics. As far as #2 is concerned, the objective might be to capture enough channels as a canonical representation of the sound field as a function of time that is mostly agnostic to any venue, even the studio. In both cases, the released music needs to contain enough tracks and metadata to make the reproduction of a target sound possible at the consumer’s end.
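For readers unfamiliar with the term, observability has a precise test for linear systems: a system with n states, dynamics matrix A and measurement matrix C is observable if the stacked matrix [C; CA; …; CA^(n-1)] has rank n. The sketch below applies this to a deliberately tiny toy system (position and velocity of a point mass), not to an acoustic model of a venue; all names and numbers are assumptions for illustration.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matrix_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    rows = [row[:] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        if rank == len(rows):
            break
        pivot = max(range(rank, len(rows)), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < tol:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [v - factor * p for v, p in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def observability_matrix(a, c):
    """Stack C, CA, ..., CA^(n-1) for an n-state system."""
    stacked, block = [], c
    for _ in range(len(a)):
        stacked.extend(block)
        block = mat_mul(block, a)
    return stacked


def is_observable(a, c):
    """Kalman rank condition: full rank means the state can be reconstructed."""
    return matrix_rank(observability_matrix(a, c)) == len(a)


# Toy system: state = [position, velocity], sampled every 10 ms.
A = [[1.0, 0.01], [0.0, 1.0]]
position_sensor = [[1.0, 0.0]]
velocity_sensor = [[0.0, 1.0]]
```

Measuring the position alone makes the toy system observable, since the velocity can be inferred from successive positions, whereas measuring the velocity alone does not, because the absolute position can never be recovered. The microphone question above is the same question asked of a vastly larger system.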

There are two sinks for recordings of the sound paths above:

  1. Listening room: here the concept of controllability comes to mind: how many actuators (speakers) are needed to control the system state (sound pressure and direction, i.e. the sound pressure vector) at any point in time and space of the room, given an adequate spatial model of its acoustic behaviour? Given a controllable room and a number of speakers and amplifiers (and all their frequency responses), it should then be possible to calculate the digital filter topology and the parameters that synthesise the channel signals to be fed to every speaker. To this end, it will be necessary to use system identification techniques to determine the transfer function matrix of the amplifier-speaker-room system, using a broadband excitation signal at the amplifier input and a reference microphone at the listener’s position; this is a simple calibration that has to be performed only once after installation of the multi-channel audio system (or after a change of room furniture). The inversion of the identified room, amplifier and speaker transfer function, combined with the recording venue transfer function obtained from the multi-channel metadata, should yield the necessary transformation topology and parameters for the sound processor. It appears that, perhaps among others, Trinnov products may accomplish this very task.
  2. Headphones: given certain headphones and an amplifier (and all their frequency responses), it should be possible to calculate the digital filter topology and parameters that synthesise the binaural signals to be fed to the headphones from the multi-channel input. Perhaps it will be necessary to identify certain parameters of the head-related transfer function of the individual listener, given head and ear physiology.
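The calibration idea for the listening room can be sketched for a single speaker and a single microphone. The example below excites a made-up "room" (a three-tap impulse response, purely hypothetical) with white noise, estimates the impulse response by cross-correlating excitation and measurement, and then computes a truncated inverse filter. The recursive inversion used here only works for minimum-phase responses, which is an assumption of this toy; real rooms are not minimum phase and need regularised, multi-channel methods. This is not a description of how any commercial product works.

```python
import random


def convolve(signal, kernel):
    """FIR filtering, output truncated to the length of the input signal."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        out[n] = sum(h * signal[n - k] for k, h in enumerate(kernel) if n - k >= 0)
    return out


def identify_impulse_response(excitation, response, n_taps):
    """Cross-correlation estimate, valid for a white (broadband) excitation."""
    power = sum(x * x for x in excitation)
    n = len(excitation)
    return [
        sum(excitation[i] * response[i + k] for i in range(n - k)) / power
        for k in range(n_taps)
    ]


def inverse_filter(impulse_response, n_taps):
    """Truncated inverse g with h * g = unit impulse (minimum-phase h assumed)."""
    h0 = impulse_response[0]
    g = [1.0 / h0]
    for n in range(1, n_taps):
        last = min(n, len(impulse_response) - 1)
        acc = sum(impulse_response[k] * g[n - k] for k in range(1, last + 1))
        g.append(-acc / h0)
    return g


random.seed(0)
TRUE_ROOM = [1.0, 0.5, 0.25]  # hypothetical amplifier-speaker-room response
noise = [random.gauss(0.0, 1.0) for _ in range(50_000)]  # broadband excitation
measured = convolve(noise, TRUE_ROOM)  # what the reference microphone records

estimated = identify_impulse_response(noise, measured, n_taps=6)
correction = inverse_filter(estimated, n_taps=32)

# Room followed by correction should be close to a unit impulse.
equalised = convolve(TRUE_ROOM + [0.0] * 40, correction)
```

The first tap of `equalised` ends up close to one and the remaining taps close to zero, within the accuracy of the noisy estimate. A full system would identify one such response per speaker and microphone position, which gives the transfer function matrix mentioned above.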

Conclusion

Sound recording and reproduction has a long history, both as a support for cinema and on its own for high fidelity transmission of music. What I can see now is a two-dimensional space of markets and products as shown in the diagram below: on one axis, the number of channels from 2 to lots, and on the other axis some metrics as indicators of quality, e.g. flatness of amplifier and speaker frequency response, directionality of speakers, sampling frequency and bit depth, etc. The chasm between the multi-channel cinema and the audiophile worlds lies perhaps in the juxtaposition of low-cost, effect-seeking cinema sound using multiple channels in one corner, and high quality two-channel stereo sound for audiophiles in the other. Since using multiple channels and using high quality components are not mutually exclusive, the top-right corner is conceivable too: audiophile immersive sound. Therefore, audiophiles may well abandon two-channel stereo in due course and adopt true stereophonic sound reproduction using multi-channel technologies, with as many audiophile amplifiers and sophisticated speakers as required.

Different markets as a combination of number of channels and quantifiable metrics for quality such as flatness of amplifier and speaker frequency responses, directionality of speakers, sampling frequency & bits

Another observation: I know few people who have the time and inclination, or even the space, to sit down in front of a sound reproduction system in order to listen to a Mahler symphony. These days most music is consumed through headphones. For headphones, it is possible to synthesise binaural sound from multi-channel sources, most likely streamed and hence liberated from traditional media such as CDs; it is only signal processing after all, which can easily run even on a smartphone.
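To show how little machinery the basic idea needs, here is a minimal sketch of binaural rendering: every channel is filtered with a pair of head-related impulse responses (one per ear) and the results are summed. The impulse responses below are crude placeholders that I made up (a short delay and attenuation for the far ear); a real renderer would use measured or individualised HRIRs.

```python
def convolve_full(signal, kernel):
    """Full linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(kernel):
            out[n + k] += s * h
    return out


def mix_into(target, source):
    """Add source onto target in place, growing target if needed."""
    if len(source) > len(target):
        target.extend([0.0] * (len(source) - len(target)))
    for i, value in enumerate(source):
        target[i] += value


def render_binaural(channels, hrirs):
    """channels: {name: samples}; hrirs: {name: (left_ear_ir, right_ear_ir)}."""
    left, right = [], []
    for name, samples in channels.items():
        ir_left, ir_right = hrirs[name]
        mix_into(left, convolve_full(samples, ir_left))
        mix_into(right, convolve_full(samples, ir_right))
    return left, right


# Placeholder HRIRs (hypothetical): far ear is delayed by 3 samples and quieter.
TOY_HRIRS = {
    "left": ([1.0], [0.0, 0.0, 0.0, 0.6]),
    "right": ([0.0, 0.0, 0.0, 0.6], [1.0]),
    "centre": ([0.7], [0.7]),
}

# A single click in the left channel only.
out_left, out_right = render_binaural({"left": [1.0, 0.0]}, TOY_HRIRS)
```

The click arrives at the left ear immediately and at full level, and at the right ear three samples later and attenuated, which is the kind of cue the brain uses to localise a source.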


Peter Wurmsdobler

Works on the technological foundations of autonomous vehicles at Five, UK. Interested in sustainable mobility, renewable energy and regenerative agriculture.