Fidelity in the Transmission of Music, Part III: Medium → Listener

Since my student years I have been interested in Hi-Fi (High Fidelity) audio equipment, and even in “High-End” audio gear. At times I was amazed by the sound quality of some set-ups I witnessed, but at other times I was bemused by the explanations or justifications given for certain technical features, seen from an engineering perspective. All these years I kept asking myself: what is really meant by high fidelity? Fidelity of what, to what, and by what?

In order to enjoy the music recorded on a medium, playback equipment is needed to reconstruct the information contained in the medium spectrum Sm and produce an audible spectrum Sr. In essence this happens through a digital-to-analogue converter (DAC), a pre-amplifier, a power amplifier, one or several speakers and, of course, the room acoustics, as shown below:

The chain processing the two medium spectra into an audible spectrum at the ears.

Back in the 1980s, producing a DAC with sufficient SNR at 16 bits @ 44.1kHz was perhaps a challenge, and brands touted 4 or 8 times oversampling as an essential feature. Today, given the bandwidth of modern electronics, inexpensive DACs are available up to hundreds of MHz; the audio frequency range no longer constitutes a challenge. A few dollars are perhaps sufficient to buy a DAC with an SNR of up to 127dB, operating at up to 384kHz and 24 bits (see Texas Instruments, Analog Devices, ESS Technology, or Cirrus Logic). As with the ADC, however, a stable clock source is still essential, as is a stable reference voltage source; any noise there will be imparted on, or modulated onto, the resulting analogue signal.
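To put these SNR figures into perspective, the textbook result for an ideal N-bit quantiser driven by a full-scale sine, SNR ≈ 6.02·N + 1.76 dB, can be evaluated in a few lines. This is a sketch assuming uniformly distributed quantisation error and no other noise sources; `ideal_sine_snr_db` and `effective_bits` are illustrative helpers, not part of any vendor tool.

```python
import math

def ideal_sine_snr_db(bits: int) -> float:
    """SNR of an ideal N-bit quantiser for a full-scale sine wave.

    Signal power A^2/2 over quantisation noise power step^2/12 with
    step = 2A/2^N gives 1.5 * 2^(2N), i.e. about 6.02*N + 1.76 dB.
    """
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

def effective_bits(snr_db: float) -> float:
    """Invert the formula above: effective number of bits for a given SNR."""
    return (snr_db - 10 * math.log10(1.5)) / (20 * math.log10(2))

for n in (16, 20, 24):
    print(f"{n} bits: {ideal_sine_snr_db(n):.1f} dB")
print(f"127 dB SNR: {effective_bits(127):.1f} effective bits")
```

By this yardstick a DAC chip with 127dB SNR resolves roughly 21 effective bits, comfortably more than the roughly 98dB of an ideal 16 bit CD source.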

In the likely case that the digital data source represents a stereo 16 bits @ 44.1kHz audio signal, a DAC may well internally up-sample these input data to a much higher frequency, using an interpolation filter with a 32 bit representation and a Sigma-Delta modulator, to create a very high fidelity representation of the original signal spectrum (note that this does not, or at least should not, add any frequencies beyond 22.05kHz for audio CDs). The quantisation noise contained in the medium’s 16 bit representation would be pushed into higher frequencies using noise shaping, as shown below.

Up-sampling and increase in resolution with noise shaping of quantisation noise.
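The principle of noise shaping can be illustrated with a first-order error-feedback requantiser, a much simplified stand-in for the higher-order Sigma-Delta modulators inside real DAC chips. The sketch below assumes an 8x oversampled 44.1kHz signal, requantises it to a deliberately coarse 8 bits (to make the effect visible) with triangular (TPDF) dither, and compares the quantisation noise falling into the 0 to 20kHz band with and without shaping. The function names and all parameters are hypothetical choices for illustration.

```python
import numpy as np

def requantise(x, bits, shape_noise, rng):
    """Requantise samples in [-1, 1] to `bits` using TPDF dither.

    With shape_noise=True the previous quantisation error is subtracted
    from the next input sample (first-order error feedback), so the output
    error becomes e[n] - e[n-1]: noise transfer function 1 - z^-1.
    """
    step = 2.0 / 2 ** bits
    y = np.empty_like(x)
    err = 0.0
    for n, s in enumerate(x):
        v = s - err if shape_noise else s
        dither = (rng.random() - rng.random()) * step   # triangular, +-1 LSB
        q = step * np.round((v + dither) / step)
        err = q - v                                       # error made at this step
        y[n] = q
    return y

def band_power(error, fs, f_max):
    """Sum of the error's power spectrum between 0 Hz and f_max."""
    spec = np.abs(np.fft.rfft(error)) ** 2
    freqs = np.fft.rfftfreq(len(error), d=1 / fs)
    return spec[freqs <= f_max].sum()

fs = 8 * 44_100                                  # assumed 8x oversampled rate
t = np.arange(2 ** 14) / fs
x = 0.5 * np.sin(2 * np.pi * 997.0 * t)          # test tone
rng = np.random.default_rng(0)
flat_in_band = band_power(requantise(x, 8, False, rng) - x, fs, 20_000)
shaped_in_band = band_power(requantise(x, 8, True, rng) - x, fs, 20_000)
print(f"in-band noise reduced by {10 * np.log10(flat_in_band / shaped_in_band):.1f} dB")
```

Feeding back the previous error gives a noise transfer function of 1 - z^-1, which is small at low frequencies and rises towards the Nyquist frequency of the oversampled rate; the total noise increases, but the share left in the audio band drops markedly.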

The analogue output of a DAC running at a frequency well above the Nyquist limit of the original signal (22.05kHz for audio CDs) would only need to be filtered by a very gentle low pass filter to remove the conversion artefacts in the non-audible range: the shaped quantisation noise as well as the spectral images caused by the DAC's sample-and-hold circuitry, which turns each digital value into a constant analogue voltage until the next sample. Any other frequencies remaining at the output beyond the Nyquist limit of the original release format are artefacts and should not be there.

Signal at output of DAC with spectrum limited to release Nyquist limit.
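Why a gentle filter suffices after oversampling follows from the frequency response of an ideal zero-order (sample-and-hold) stage, |sin(πf/fs)/(πf/fs)|. The sketch below compares the in-band droop at 20kHz and the attenuation of the first image of a 20kHz tone at 44.1kHz and at an assumed 8x oversampled rate of 352.8kHz. It assumes an ideal hold and, in the oversampled case, that the digital interpolation filter has already removed the images below 352.8kHz; `zoh_gain_db` is a hypothetical helper.

```python
import math

def zoh_gain_db(f_hz: float, fs_hz: float) -> float:
    """Magnitude of an ideal zero-order hold, |sin(pi f/fs) / (pi f/fs)|, in dB."""
    x = math.pi * f_hz / fs_hz
    return 0.0 if x == 0 else 20 * math.log10(abs(math.sin(x) / x))

for fs in (44_100, 8 * 44_100):
    droop = zoh_gain_db(20_000, fs)          # in-band droop at 20 kHz
    image = zoh_gain_db(fs - 20_000, fs)     # first image of a 20 kHz tone
    print(f"fs={fs} Hz: droop {droop:.2f} dB, first image {image:.1f} dB")
```

At 44.1kHz the hold stage attenuates the first image by only about 5dB, barely more than the wanted 20kHz tone (about 3dB of droop), whereas at 352.8kHz the droop is negligible and the nearest image lies more than 300kHz away, where even a gentle analogue low pass filter is effective.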

Bottom line for DACs: in conjunction with a stable voltage and clock source, any decent DAC chip will nowadays produce an analogue representation of the digital source signal at a very high level of fidelity, perhaps well beyond the limits of the human ear, both in terms of frequency and dynamic range.

The purpose of a pre-amplifier is usually to attenuate the signal level in order to adjust the volume, and at times to adjust high or low frequencies to a small degree. As with the recording electronics, I do not see this as a challenge any more. There is no power involved, and low noise operational amplifiers or amplifier chips with sufficient signal-to-noise ratio (SNR) are cheaply available for frequency ranges up to the MHz range and beyond. The bandwidth needed for audio applications, say up to 100kHz at most, should therefore not be a problem; I assume the transfer function to be constant over the audio frequency range, i.e. a flat response.

A Google search for low noise operational amplifiers for audio yields a long list including, but not limited to, components by Texas Instruments and Analog Devices, such as the Burr-Brown OPA1656 and the LT1115, respectively. These kinds of operational amplifiers exhibit a flat open-loop frequency response up to 500kHz and a THD+N of around 0.00004%, i.e. roughly 128dB below the signal level.
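Distortion figures are quoted sometimes in percent and sometimes in dB; the conversion is simply 20·log10 of the amplitude ratio. The helpers below are illustrative (not from any datasheet tool) and treat THD+N as an amplitude ratio relative to the signal.

```python
import math

def percent_to_db(percent: float) -> float:
    """THD(+N) in percent -> level in dB relative to the signal (amplitude ratio)."""
    return 20 * math.log10(percent / 100)

def db_to_percent(level_db: float) -> float:
    """Level in dB relative to the signal -> THD(+N) in percent."""
    return 100 * 10 ** (level_db / 20)

for p in (0.00004, 0.001, 0.01, 0.1, 1.0):
    print(f"{p:.5f}% -> {percent_to_db(p):.1f} dB")
```

This gives about -128dB for 0.00004%, -100dB for 0.001%, -80dB for 0.01%, -60dB for 0.1% and -40dB for 1%.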

As a side note, some DACs allow the volume to be adjusted in the digital domain or via some other hardware circuitry. With such a single digital source, the output of the DAC can then be sent directly to the power amplifier.
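A digital volume control simply scales the samples by a gain derived from the volume setting in dB. A common rule of thumb is that each 6.02dB of attenuation shifts the signal down by one bit, so a 16 bit source processed in a 24 bit path can be attenuated by roughly 48dB before its least significant bits drop below the DAC's word length. The sketch below assumes float samples in [-1, 1] and that the DAC's effective resolution matches its word length, which real chips with around 20 effective bits do not quite achieve; all function names are hypothetical.

```python
import math

def db_to_gain(level_db: float) -> float:
    """Volume setting in dB -> linear amplitude factor."""
    return 10 ** (level_db / 20)

def apply_digital_volume(samples, volume_db):
    """Scale float samples in [-1, 1] by a volume setting in dB (negative = quieter)."""
    gain = db_to_gain(volume_db)
    return [gain * s for s in samples]

def headroom_before_truncation_db(source_bits: int, dac_bits: int) -> float:
    """Rule-of-thumb attenuation (about 6.02 dB per bit) that a wider DAC word
    can absorb before the source's least significant bits fall below it."""
    return max(0, dac_bits - source_bits) * 20 * math.log10(2)

print(apply_digital_volume([1.0, -0.5, 0.25], -20.0))
print(f"{headroom_before_truncation_db(16, 24):.1f} dB")
```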

When it comes to amplifying an audio signal and driving loudspeakers, matters become more complicated. There is a long history of audio signal amplification and of the types of amplifiers used, from class A, B and AB to, more recently, class D. Again, given the evolution of power electronics, my view is that amplifying electrical signals in the audio frequency range is no longer a challenge; this includes class D amplifiers. The frequency and phase responses can be assumed flat over the audible frequencies, with the total harmonic distortion (THD) several orders of magnitude below the signal level (-80dB for the Hypex class D amplifier shown below). The drivers in the loudspeakers that follow act by their nature as passive low pass filters, being electro-mechanical devices with a certain inertia. Any frequencies beyond the audio band are artefacts and should not be there. Also, the rejection of noise from the mains power supply can be expected to be on par with the general performance of the integrated circuits used.

Amplitude response of Philips Applied Technologies/Hypex class D amplifier at different loads.
THD+N of Philips Applied Technologies/Hypex class D amplifier at various power ratings.

Mirroring the microphone, loudspeakers turn an electrical signal into a sound pressure signal. Speakers may use one or more drivers to do so, where a driver is in most cases an electro-dynamic device that converts electrical current into motion, which in turn produces sound pressure. I do not need to go into details here: first, I am not a specialist on the subject, and second, there is plenty of information to be found in libraries and on the Internet.

Exemplary frequency response for speaker with low and high frequency driver (AudioExpress).

What I would like to mention, though, is that the transformation of an electrical signal into sound pressure is complex and quite often affected by non-linearities in the driver, e.g. the stiffness of the membrane and its suspension. Even if the drivers do stay within a certain linear operating range, the entire speaker will exhibit a non-flat amplitude and phase response; at higher volumes, and hence larger membrane displacements, the non-linearities also produce harmonic distortion, as shown below.

Total harmonic distortion (THD) comparison of three loudspeakers (Advancements toward a high-power, carbon nanotube, thin-film loudspeaker).
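A rough feel for how distortion grows with drive level can be obtained by passing a sine through a memoryless soft-saturation curve and measuring the harmonic content with an FFT. The sketch below uses tanh purely as a hypothetical stand-in for driver non-linearity (it is not a loudspeaker model) and uses an integer number of cycles per FFT frame so that each harmonic lands exactly in one bin; `thd_percent` and its parameters are illustrative.

```python
import numpy as np

def thd_percent(drive, n=4096, cycles=16, harmonics=9):
    """THD of a sine of amplitude `drive` after a hypothetical tanh saturation.

    An integer number of cycles per frame puts the fundamental exactly in
    bin `cycles` and the k-th harmonic in bin k * cycles (no leakage).
    """
    t = np.arange(n)
    x = drive * np.sin(2 * np.pi * cycles * t / n)
    y = np.tanh(x)                          # memoryless soft saturation
    spec = np.abs(np.fft.rfft(y))
    fundamental = spec[cycles]
    harm = spec[[k * cycles for k in range(2, harmonics + 1)]]
    return 100 * np.sqrt(np.sum(harm ** 2)) / fundamental

for drive in (0.1, 0.3, 1.0, 2.0):
    print(f"drive {drive}: THD {thd_percent(drive):.3f}%")
```

For small drive levels the third harmonic dominates and THD grows roughly with the square of the amplitude (about 0.08% at a drive of 0.1), mirroring the observation that distortion rises with volume and membrane displacement.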

The sound pressure level signal in the near field of a speaker in an anechoic chamber (i.e. without room acoustic effects) would be the result of the upstream processing stages, dominated by the weakest links with the most uneven frequency response and the highest harmonic distortion, as shown below.

The audio signal at the output of a speaker in an anechoic chamber.
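The weakest-link argument can be made concrete. For linear stages in series the gains multiply, so their responses in dB at a given frequency simply add; distortion residues, if assumed uncorrelated, add in power. The stage figures below are hypothetical, loosely based on the orders of magnitude discussed in this article, and the uncorrelated-residue assumption is a simplification.

```python
import math

def cascade_gain_db(*stage_gains_db: float) -> float:
    """Gains of linear stages in series multiply, so their dB values add."""
    return sum(stage_gains_db)

def combined_distortion_percent(*stage_percent: float) -> float:
    """Combine per-stage distortion residues, assuming they are uncorrelated
    (their powers add), i.e. the root-sum-square of the amplitude ratios."""
    return math.sqrt(sum(p ** 2 for p in stage_percent))

# Hypothetical per-stage THD+N figures in percent
stages = {"DAC": 0.0001, "pre-amp": 0.0001, "power amp": 0.01, "speaker": 1.0}
total = combined_distortion_percent(*stages.values())
print(f"combined THD+N: {total:.5f}%")
# Hypothetical responses of DAC, pre-amp, amplifier and speaker at one frequency
print(f"{cascade_gain_db(0.0, -0.1, 0.2, 3.0):.1f} dB")
```

The combined figure is essentially that of the speaker alone: improving the electronics by another order of magnitude would not change the result noticeably.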

In order to minimise signal distortion, it would make sense to use a speaker with a very well balanced frequency response and a very high sensitivity, such that only little power and membrane motion are needed to produce a given sound level, e.g. the carefully crafted speakers by David Haigner. A second benefit is that the power amplifier does not need to produce as much power and can remain in its linear operating range.
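How much sensitivity matters can be estimated with the usual free-field rule: the sensitivity rating in dB SPL at 1 W / 1 m, plus 10·log10 of the power in watts, minus 20·log10 of the distance in metres. The sketch below assumes a single speaker behaving as a point source in free field (no room gain) and a hypothetical target of 100dB SPL at 3 m; `required_power_w` is an illustrative helper.

```python
import math

def required_power_w(target_spl_db: float, sensitivity_db: float,
                     distance_m: float = 1.0) -> float:
    """Amplifier power for a target SPL at distance_m from a single speaker.

    Assumes sensitivity is rated in dB SPL at 1 W / 1 m and free-field
    point-source behaviour (-20*log10(distance), no room gain).
    """
    spl_needed_at_1m = target_spl_db + 20 * math.log10(distance_m)
    return 10 ** ((spl_needed_at_1m - sensitivity_db) / 10)

for sensitivity in (85, 95, 105):
    watts = required_power_w(100, sensitivity, distance_m=3.0)
    print(f"{sensitivity} dB/W/m: {watts:.1f} W for 100 dB SPL at 3 m")
```

Each 10dB of extra sensitivity cuts the required amplifier power by a factor of ten, e.g. from roughly 285W at 85dB sensitivity to roughly 28W at 95dB and under 3W at 105dB.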

Closely tied to the speaker performance is the effect of the room acoustics, which is perhaps the weakest link of them all and is often neglected. Suppose you sweep through the entire audible frequency spectrum by producing a constant amplitude signal with a test speaker at a certain location and orientation. By measuring the sound pressure level at a different location, it should then be possible to obtain the room frequency response. It is quite likely that this response shows a few resonance frequencies and zeros.

Frequency response of one speaker measured in an untreated room (Acoustic Insider)
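The resonances at low frequencies are largely due to standing waves between the room boundaries. For an idealised rectangular room with rigid walls, the mode frequencies follow the classic formula f = (c/2)·√((nx/Lx)² + (ny/Ly)² + (nz/Lz)²). The sketch below lists the modes up to 100Hz for a hypothetical 5 m × 4 m × 2.5 m room, assuming a speed of sound of 343 m/s; real rooms with furniture, openings and non-rigid walls will deviate from it.

```python
import itertools
import math

def room_modes(lx, ly, lz, f_max, c=343.0):
    """Mode frequencies (Hz) of an idealised rigid-walled rectangular room.

    Returns (frequency, (nx, ny, nz)) pairs up to f_max, sorted by frequency.
    """
    # Highest index per axis whose axial mode could still lie below f_max
    n_max = [int(2 * f_max * length / c) for length in (lx, ly, lz)]
    modes = []
    for nx, ny, nz in itertools.product(*(range(n + 1) for n in n_max)):
        if (nx, ny, nz) == (0, 0, 0):
            continue
        f = c / 2 * math.sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)
        if f <= f_max:
            modes.append((f, (nx, ny, nz)))
    return sorted(modes)

for f, (nx, ny, nz) in room_modes(5.0, 4.0, 2.5, 100.0):
    print(f"{f:6.1f} Hz  mode ({nx}, {ny}, {nz})")
```

Here the lowest axial mode lies at about 34Hz; the coincidence at 68.6Hz of the second length mode and the first height mode is the kind of clustering that tends to produce a pronounced peak.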

The sound pressure level signal at a certain position would be the result of the upstream processing stages, the speakers and their orientation, the room, and the listening position, as shown below.

The sound pressure level at the listening position.

Fidelity in Playback Audio Equipment

This section was all about reproduction fidelity, from the performance spectrum Sp via the medium spectrum Sm to the reproduction spectrum Sr along the performer → medium → listener path. There are many links in this chain, each with a varying effect on the frequency spectrum and distortion, and these effects have to be minimised in order to obtain a high level of fidelity:

  • most electronic components that operate on analogue and digital signals can easily be chosen to offer a nearly flat frequency response within and beyond the audible spectrum, a dynamic range and SNR of easily 100dB, and a matching THD+N of about 0.001%; this includes microphone amplifiers, ADCs, DACs and pre-amplifiers.
  • power electronics in the power amplifier is perhaps a bit trickier, but can also offer a nearly flat frequency response within and beyond the audible spectrum; the class D amplifier presented here has a THD+N of at most 0.01% (i.e. 80dB below the signal), dropping to about 0.001% (i.e. 100dB) when operating in the low-power regime.
  • the electro-mechanical conversion, both ways: from sound pressure to an electrical signal and back from an electrical signal to sound pressure. The frequency response is not very flat, with variations of easily ±5dB in the audible spectrum, tapering off at both the low and the high end. More importantly, these devices are susceptible to saturation, which entails distortion of up to 0.1% or even 1%, i.e. only 60dB or even 40dB below the signal.
  • last, but not least, room acoustics (including speaker positioning) appear to have a significant effect on the audio experience in terms of frequency spectrum, with variations of ±10dB, some resonances, and zeros (cancellation of frequencies at certain positions).

The conclusion for me is to address the issues in reverse order:

  1. Identify the room frequency response from a digital source. As long as there are no non-linearities such as saturation, it should be possible to send a signal rich in frequencies through the chain and use system identification tools to obtain the transfer function. Use soft furnishings to dampen resonances and zeros, and modify the room until the transfer function converges to something flat-ish. Its inverse (within limits) can then be applied in the digital domain to equalise the system response. Then work out what the remaining frequency ripple and distortion are.
  2. Make sure that the speakers have enough sensitivity and power rating to create the sound level and dynamic range needed without going into saturation during normal operation. This should minimise the harmonic distortion, which can hopefully be read from the manufacturer’s specifications.
  3. Carry out some hearing tests for frequency range and pitch recognition, and try to quantify your personal limits. Any specification that goes beyond those limits and would not be noticed in a blind test is snobbery. Audiophiles appear to be in love with technology and specifications that do not matter.
  4. Choose power amplifiers and other electronics whose specifications are commensurate with the above findings. There is no point in going over the top, far beyond the weakest links (room acoustics and speakers, and most likely the matter between the ears). It is all about balance.
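Step 1 can be sketched with standard signal processing: excite the chain with broadband noise, estimate the transfer function with the averaged cross-spectrum (H1) estimator H(f) = Sxy(f)/Sxx(f), and build a regularised inverse with a cap on the boost. This is a minimal sketch under several assumptions: the "room" is simulated by a hypothetical three-tap filter (direct sound plus one echo), the measured signal is already time-aligned with the excitation, and the segment length, regularisation constant `beta` and 3dB boost limit are illustrative choices. It also assumes a linear chain; distortion would simply show up as measurement noise.

```python
import numpy as np

def estimate_response(x, y, nfft=4096):
    """H1 estimate of the transfer function from excitation x to response y.

    Cross- and auto-spectra are averaged over non-overlapping Hann-windowed
    segments, then H(f) = Sxy(f) / Sxx(f).
    """
    win = np.hanning(nfft)
    sxx = np.zeros(nfft // 2 + 1)
    sxy = np.zeros(nfft // 2 + 1, dtype=complex)
    for start in range(0, len(x) - nfft + 1, nfft):
        X = np.fft.rfft(win * x[start:start + nfft])
        Y = np.fft.rfft(win * y[start:start + nfft])
        sxx += np.abs(X) ** 2
        sxy += np.conj(X) * Y
    return sxy / sxx

def regularised_inverse(h, beta=1e-3, max_boost_db=6.0):
    """Equaliser conj(H) / (|H|^2 + beta), with its gain limited to max_boost_db."""
    g = np.conj(h) / (np.abs(h) ** 2 + beta)
    cap = 10 ** (max_boost_db / 20)
    scale = np.minimum(1.0, cap / np.maximum(np.abs(g), 1e-12))
    return g * scale

rng = np.random.default_rng(1)
x = rng.standard_normal(2 ** 18)            # broadband excitation
h_room = np.array([1.0, 0.0, 0.5])          # hypothetical: direct sound plus one echo
y = np.convolve(x, h_room)[: len(x)]        # simulated, time-aligned measurement
H = estimate_response(x, y)
H_true = np.fft.rfft(h_room, 4096)
eq = regularised_inverse(H, max_boost_db=3.0)
print(f"max estimation error: {np.max(np.abs(H - H_true)):.4f}")
print(f"max EQ gain: {20 * np.log10(np.max(np.abs(eq))):.2f} dB")
```

Capping the boost keeps the equaliser from demanding large gains at frequencies where reflections cancel, which would only drive the amplifier and speakers towards saturation.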

Contributes to the technological foundations for the self-driving revolution at Five, UK. Interested in sustainable economies and renewable energy.
