Fidelity in the Transmission of Music, Part II: Performer → Medium

Whenever I went to the Golden Hall in the Vienna Musikverein, I noticed quite a few steel wires strung from one side to the other, both over the stage and over the audience, with an array of microphones hanging off them. I do not know much about the recording process, but I can only imagine that recording, mixing and releasing is quite an art: compressing the live performance spectrum Sp into the medium carrier spectrum Sm involves many aspects, including but not limited to those discussed below.

Recording, Mixing & Release

In this signal processing chain, the soundscape of the performance in a concert hall is captured in many recorded tracks at very high resolution in frequency and signal magnitude, but eventually everything has to be distilled down to (and released onto) the target medium format, which entails a reduction in bandwidth and a coarser quantisation. This may happen in several stages, as shown in the following diagram and explored below.

The processing chain: N performance spectra are eventually distilled down to two medium spectra.

Microphone & Transmission

Suppose the stereo frequency spectrum Sp of the sound pressure at the location of the listener in a concert has a quasi-infinite bandwidth, as shown below.

Conceptual sound pressure signal with quasi-infinite bandwidth.

This ideal signal will, however, be picked up by an array of microphones, each with a finite bandwidth. As one can imagine, the response rolls off at low and high frequencies, with some frequency ranges even elevated:

Frequency response of AKG Acoustics Perception 420 microphone.
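As a rough illustration of the roll-off at both ends, the sketch below models a microphone as a hypothetical band-pass with Butterworth-style corners at 20Hz and 20kHz. The corner frequencies, the filter order and the function name are assumptions for illustration only; this is not the AKG Perception 420's actual response, and the model ignores the elevated regions visible in the plot.

```python
import math

def bandpass_magnitude_db(f_hz, f_low=20.0, f_high=20_000.0, order=2):
    """Gain in dB of a hypothetical band-pass made of a Butterworth-style
    high-pass (corner f_low) and low-pass (corner f_high) of the given order."""
    high_pass = 1.0 / math.sqrt(1.0 + (f_low / f_hz) ** (2 * order))
    low_pass = 1.0 / math.sqrt(1.0 + (f_hz / f_high) ** (2 * order))
    return 20.0 * math.log10(high_pass * low_pass)

# Print the modelled response at a few frequencies across and beyond the audio band
for f in (10, 20, 100, 1_000, 10_000, 20_000, 40_000):
    print(f"{f:>6} Hz: {bandpass_magnitude_db(f):7.2f} dB")
```

At each corner frequency the modelled gain is down by 3dB; one octave above 20kHz a second-order roll-off already costs about 12dB, and whatever is attenuated here is simply not present in the voltage handed to the next stage.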

A microphone is an electro-mechanical device that translates air pressure into an electrical voltage, using one of several operating principles. An important aspect is the dependency on amplitude: at high sound pressure, mechanical or electrical parts start to saturate and create harmonic distortion (familiar from the clipping heard when somebody speaks into a microphone from too close).

Total harmonic distortion as a function of the input amplitude (InvenSense).
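To see how saturation turns into harmonic distortion, the following sketch drives a hypothetical memoryless tanh saturation model with a 1kHz sine and measures the total harmonic distortion (THD) from the FFT. The tanh curve, the 48kHz sampling rate, the tone frequency and the number of harmonics counted are assumptions chosen for clarity, not a model of any particular microphone.

```python
import numpy as np

FS = 48_000   # sampling rate in Hz (assumed)
F0 = 1_000    # test tone in Hz; 1 s of signal puts every harmonic on an exact FFT bin

def saturate(x):
    # Hypothetical memoryless saturation model: tanh is linear for small inputs
    # and compresses large ones, like an element running out of headroom
    return np.tanh(x)

def thd_percent(amplitude, n_harmonics=5):
    t = np.arange(FS) / FS
    y = saturate(amplitude * np.sin(2 * np.pi * F0 * t))
    spectrum = np.abs(np.fft.rfft(y))          # bin k corresponds to k Hz here
    fundamental = spectrum[F0]
    harmonics = spectrum[[k * F0 for k in range(2, n_harmonics + 2)]]
    return 100.0 * np.sqrt(np.sum(harmonics ** 2)) / fundamental

for a in (0.01, 0.1, 0.5, 1.0):
    print(f"input amplitude {a:4.2f}: THD = {thd_percent(a):.4f} %")
```

For small input amplitudes A, the cubic term of the tanh series makes the distortion grow roughly as A²/12, i.e. quadratically, which is why THD rises steeply as the sound pressure approaches saturation.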

The resulting analogue signals of all microphones will consequently carry a bandwidth-limited representation of the sound pressure in the concert hall as voltages, but with distortion in both the time and the frequency domain:

Analogue voltage signal representing sound pressure with finite bandwidth.

Note that it will be difficult to recover any information outside that bandwidth. In addition, the microphone signal will be affected by acoustic noise, but also by electrical (or thermal) noise picked up on the path to the recording desk, which should be minimal when balanced transmission is used.
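An idealised sketch of why balanced transmission helps: the signal travels in antiphase on two legs, interference couples equally into both, and the receiver takes the difference. The 50Hz hum, the noise level and the 0.1% leg imbalance are hypothetical values; with perfect balance the interference cancels completely, and with an imbalance ε it is suppressed by about 20·log10(2/ε).

```python
import numpy as np

rng = np.random.default_rng(0)
FS = 48_000
t = np.arange(FS) / FS

signal = 0.01 * np.sin(2 * np.pi * 1_000 * t)   # microphone signal (V), assumed
# Hum plus broadband noise coupling into the cable, assumed values
interference = 0.05 * np.sin(2 * np.pi * 50 * t) + 0.01 * rng.standard_normal(FS)

def balanced_link(signal, interference, imbalance=0.0):
    # Signal is sent in antiphase on the two legs; interference couples into both
    hot = signal + interference
    cold = -signal + (1.0 + imbalance) * interference   # imbalance: hypothetical mismatch between legs
    return (hot - cold) / 2                              # differential receiver

def rms(x):
    return np.sqrt(np.mean(np.square(x)))

single_ended_error = rms(interference)   # unbalanced: interference adds directly
balanced_error = rms(balanced_link(signal, interference, imbalance=0.001) - signal)
print(f"interference rejection: {20 * np.log10(single_ended_error / balanced_error):.1f} dB")
```

In practice the achievable rejection is set by how well the two legs (cable, connectors, input stage) are matched, which is what the imbalance parameter stands in for.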

Pre-amplifier and level adjustment

The signal level at the microphone output needs to be adjusted to the level required by the subsequent sampling stage. No power amplification is involved and, these days, low-noise operational amplifiers or amplifier chips with a sufficient signal-to-noise ratio (SNR) are cheaply available for frequencies into the MHz range and beyond. The bandwidth needed for audio applications, up to say 100kHz, should no longer constitute a problem; therefore I do not consider this stage in much more detail but assume its transfer function to be constant.

A Google search for "low noise audio operational amplifier for microphones" yields a long list including, but not limited to, components by Texas Instruments and Analog Devices such as the OPA1692 and the LT1115, respectively. These kinds of operational amplifiers exhibit a flat open-loop frequency response up to 500kHz and a THD+N of around 0.00004%, which corresponds to roughly 128dB below the signal.
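Converting a distortion figure quoted in percent into decibels relative to the signal is a one-liner; the helper name below is illustrative.

```python
import math

def percent_to_db(percent):
    # Express a distortion ratio given in percent in dB relative to the signal
    return 20.0 * math.log10(percent / 100.0)

print(f"{percent_to_db(0.00004):.1f} dB")   # -128.0 dB
print(f"{percent_to_db(1.0):.1f} dB")       # -40.0 dB
```

Every factor of ten in the percentage is worth 20dB, so 0.00004% sits about 88dB below a clearly audible 1% distortion.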

Analogue to Digital Conversion

A crucial stage in the signal processing chain is the transition from the analogue to the digital domain, which happens in the analogue-to-digital converter (ADC) at a recording sampling frequency and bit width that are typically well above the release specification:

Sampling of analogue signal at recording frequency and resolution.

There is no need to go into further detail on the sampling process and the theory behind it; suffice to say that the bandwidth of the signal is limited to the Nyquist limit of half the sampling frequency, e.g. 24kHz for 48kHz or 48kHz for 96kHz, and usually even less than that because of the cut-off frequency of the anti-aliasing low-pass filter. No frequency content, and hence no information, beyond the Nyquist limit can be represented in the digital domain. The anti-aliasing low-pass filter in front of the ADC makes sure that no artefacts of higher frequencies are mirrored down into the audible band.
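A small sketch of where a tone would end up if it were not removed before sampling: the samples of a tone at f cannot be distinguished from those of a tone at f + k·fs or at its mirror image, so it folds back into the band from 0 to fs/2. The function is illustrative and deliberately leaves out the anti-aliasing filter, which exists precisely to prevent this.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a tone f_hz sampled at fs_hz with no anti-aliasing filter."""
    f = f_hz % fs_hz            # sampling cannot tell f from f + k*fs ...
    return min(f, fs_hz - f)    # ... nor from its mirror image about fs/2

print(alias_frequency(30_000, 48_000))   # 18000: a 30kHz tone folds down to 18kHz
print(alias_frequency(50_000, 48_000))   # 2000
```

Once folded down, such a component is indistinguishable from a genuine 18kHz or 2kHz signal, which is why it has to be filtered out in the analogue domain, before the ADC.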

In addition, sampling at a certain bit width adds quantisation noise due to the rounding to a fixed number of levels. The dynamic range and signal-to-noise ratio depend on the bit resolution: theoretically about 96dB for 16 bits and 144dB for 24 bits, although real 24-bit converters achieve noticeably less, in the order of 120dB, because of their analogue noise. Ideally, if very low-noise amplifiers are used in the chain up to the ADC, the noise floor of the analogue signal to be digitised ought to lie below the quantisation noise, but it probably will not always do so. Any signal below that noise floor or outside the dynamic range of the ADC will not be represented in the digital signal and may not be easily recovered at a later stage (except, for example, by coherent integration).
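The 96dB figure for 16 bits is 20·log10(2^16), the ratio of full scale to one quantisation step. For a full-scale sine, the textbook signal-to-quantisation-noise ratio of an ideal N-bit quantiser is 6.02·N + 1.76dB, i.e. about 98dB for 16 bits and 146dB for 24 bits. The sketch below checks this by rounding a sine to 2^N levels; the test frequency of 0.1234567 cycles per sample and the absence of dither are assumptions that make the rounding error behave like uniform noise.

```python
import numpy as np

def ideal_snr_db(bits):
    # Textbook SQNR of an ideal N-bit quantiser driven by a full-scale sine
    return 6.02 * bits + 1.76

def measured_snr_db(bits, n=1 << 16):
    step = 2.0 / 2 ** bits                         # quantisation step for a [-1, 1) range
    t = np.arange(n)
    # (Almost) full-scale sine at a frequency unrelated to the sample rate,
    # so the rounding error behaves like uniform noise
    x = (1.0 - step) * np.sin(2 * np.pi * 0.1234567 * t)
    q = np.round(x / step) * step                  # round to the nearest level
    noise = q - x
    return 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))

for bits in (16, 24):
    print(bits, round(ideal_snr_db(bits), 1), round(measured_snr_db(bits), 1))
```

Each extra bit halves the step size and buys about 6dB; real converters fall short of the ideal figure once their analogue noise exceeds the quantisation noise.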

Note that the reference voltage and the clock source are of utmost importance. Any deviation of the sampling instants (clock jitter) adds noise to the signal being sampled, on top of the noise already present before digitisation. This noise, related to the phase noise of the clock, scales with the signal: a strong tone, for example, will be accompanied by generated noise proportional to its amplitude (and to its frequency). All the effort put into high-SNR pre-amplifiers, high sampling frequency and bit resolution will be obliterated and dwarfed by the effects of an unstable clock.
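The effect of clock jitter can be quantified with the standard jitter-limited SNR for a full-scale sine of frequency f sampled with RMS timing jitter σ: SNR = −20·log10(2π·f·σ). The jitter values below are hypothetical examples, not specifications of any particular clock.

```python
import math

def jitter_snr_db(f_signal_hz, jitter_rms_s):
    # SNR limit set by sampling-clock jitter for a full-scale sine:
    # the timing error times the signal slope grows with amplitude and frequency
    return -20.0 * math.log10(2 * math.pi * f_signal_hz * jitter_rms_s)

print(f"{jitter_snr_db(20_000, 1e-9):.1f} dB")    # 1 ns RMS jitter: about 78 dB
print(f"{jitter_snr_db(20_000, 10e-12):.1f} dB")  # 10 ps RMS jitter: about 118 dB
```

Under these assumptions a clock with 1ns of jitter would cap a 20kHz tone at roughly 16-bit quality, no matter how many bits the converter has; every doubling of the signal frequency or of the jitter costs another 6dB.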

As a bottom line, precise clocks and ADCs with sufficient bit resolution and sampling frequencies well above the audible spectrum are inexpensive and ubiquitous, available from Texas Instruments, Analog Devices, Cirrus Logic and many others, with an SNR of up to 127dB and sampling rates of up to 768kHz. Unless some principles of sampling theory are violated, a decent recording desk can be expected to produce an image of the original spectrum Sp at a very high level of fidelity in frequency, dynamic range and SNR.
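To put the 127dB figure into perspective, the 6.02·N + 1.76dB relation can be inverted into an effective number of bits (ENOB), i.e. the resolution of an ideal quantiser with the same SNR. The helper name is illustrative.

```python
def enob(snr_db):
    # Effective number of bits: invert SNR = 6.02 * N + 1.76
    return (snr_db - 1.76) / 6.02

print(f"{enob(127):.1f} bits")   # 20.8 bits
```

So even a converter at the top of that range resolves about 21 of its 24 bits, which is still far more than the 16 bits of the eventual release.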

Mixing

Once all signals live in the digital domain, no signal below the noise floor set by the sampling resolution, and no frequency beyond the Nyquist limit, can be represented or will ever be present. However, all signals can be mixed and transformed at a much higher resolution, even as floating-point numbers with quasi-infinite dynamic range, or at least as much as matters for the human ear. The sky is the limit: one can filter out unwanted frequencies, noise or sounds, compensate for a microphone's frequency response, combine several signals into one, and funnel many tracks into a stereo pair at high resolution (in frequency and amplitude).
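A minimal mixdown sketch, funnelling N mono tracks into a stereo pair in 64-bit floating point. The constant-power pan law (one of several in use), the function name and the random track data are assumptions for illustration, not how any particular mixing desk works.

```python
import numpy as np

def mix_to_stereo(tracks, gains_db, pans):
    """Mix mono tracks (rows of `tracks`) to a stereo pair in float64.
    pans: -1 = hard left, 0 = centre, +1 = hard right (constant-power pan law, assumed)."""
    tracks = np.asarray(tracks, dtype=np.float64)
    gains = 10.0 ** (np.asarray(gains_db) / 20.0)    # dB -> linear amplitude
    theta = (np.asarray(pans) + 1.0) * np.pi / 4.0   # map [-1, 1] to [0, pi/2]
    left = np.sum((gains * np.cos(theta))[:, None] * tracks, axis=0)
    right = np.sum((gains * np.sin(theta))[:, None] * tracks, axis=0)
    return np.stack([left, right])

rng = np.random.default_rng(1)
tracks = rng.uniform(-1, 1, size=(32, 48_000))        # 32 hypothetical tracks, 1 s at 48kHz
stereo = mix_to_stereo(tracks, gains_db=np.zeros(32), pans=np.linspace(-1, 1, 32))
print(np.max(np.abs(stereo)))      # may well exceed 1.0: no clipping in floating point
stereo /= np.max(np.abs(stereo))   # normalise only at the very end
```

Because intermediate sums are free to exceed full scale in floating point, the level only has to be brought back into range once, just before the release stage.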

Release

Unfortunately, at some point all this wideband, high dynamic range information has to be decimated to what the release medium is capable of carrying. While some high-resolution audio formats are on the market, the most common one is still the compact disc format: in essence a stereo signal with 16 bits per channel at 44.1kHz.

Decimation of digital signal to release frequency and resolution.

To reiterate: once the high-resolution mixing output has been decimated to the release sampling frequency, there is no information above the associated Nyquist frequency, 22.05kHz for the digital audio CD. Note that this frequency is still above the upper limit of human hearing, which is about 20kHz for young listeners and most likely around 15kHz for older people.
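A sketch of the release step, assuming (for simplicity) a hypothetical 88.2kHz master so that the sample rate can be halved by an integer factor; a 96kHz master would need a rational 147/320 resampler instead. A windowed-sinc low-pass removes content above the new Nyquist limit before every second sample is dropped, and the result is rounded to 16-bit integers without dither. Filter length, window and cutoff are illustrative choices, not the settings of any particular mastering tool.

```python
import numpy as np

def lowpass_fir(num_taps, cutoff):
    """Windowed-sinc low-pass FIR; cutoff as a fraction of the sampling rate (0 to 0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2        # symmetric around the centre tap
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    return h / np.sum(h)                                # unity gain at DC

def decimate_by_2(x, num_taps=255):
    # Remove everything above the new Nyquist limit first, then drop every second sample.
    # A cutoff of 0.225 x 88.2kHz (about 19.8kHz) leaves room for the transition band
    h = lowpass_fir(num_taps, cutoff=0.225)
    return np.convolve(x, h, mode="same")[::2]

def to_int16(x):
    # Round a float signal in [-1, 1] to 16-bit integers (no dither, for simplicity)
    return np.clip(np.round(x * 32767), -32768, 32767).astype(np.int16)

FS_MASTER = 88_200                                      # hypothetical master sample rate
t = np.arange(FS_MASTER) / FS_MASTER
master = 0.5 * np.sin(2 * np.pi * 1_000 * t) + 0.5 * np.sin(2 * np.pi * 30_000 * t)
release = to_int16(decimate_by_2(master))               # 44.1kHz, 16 bits
```

Without the low-pass, the 30kHz component would fold down to 14.1kHz in the release; with it, the component is attenuated by roughly 50dB or more, while the 1kHz tone passes essentially unchanged.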

The SNR and theoretical dynamic range of a digital audio CD are about 96dB for its 16-bit resolution. With the dynamic range of the human ear being about 120dB and the noise floor in a quiet room at about 40dB, exploiting the full 96dB of the CD on top of that floor would push the sound pressure to 136dB, well above the pain threshold. Bottom line, 16 bits at 44.1kHz seem to match the limits of human hearing quite well and to be sufficient for many recordings, provided care has been taken in all the steps leading to the release.
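The arithmetic of the previous paragraph, spelled out, under the assumption that the quietest level on the CD is placed exactly at the room's 40dB noise floor:

```python
import math

BITS = 16
ROOM_NOISE_DB_SPL = 40                      # quiet-room noise floor, figure from the text
cd_range_db = 20 * math.log10(2 ** BITS)    # full scale relative to one quantisation step
peak_db_spl = ROOM_NOISE_DB_SPL + cd_range_db
print(f"CD range {cd_range_db:.1f} dB -> peaks at about {peak_db_spl:.0f} dB SPL")
```

In practice the loudest passages are played back far below that level, so a real listening situation uses only part of the CD's range.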

Fidelity in Recording, Mixing & Release

As far as fidelity in the recording, mixing & release process is concerned, high fidelity would mean that the digital stereo signal on the medium, Sm, is a very close representation of the sound experienced at the live performance, Sp. I could imagine that this process can yield both high- and low-fidelity representations, depending on how much effort is put into each step; most links in the chain have a flat response, with the weakest links perhaps being the room acoustics, the microphone and its positioning. But I would hope it is safe to assume that most modern releases are the result of a diligent recording & mixing process and exhibit a very high level of fidelity.

Works on the technological foundations of autonomous vehicles at Five, UK. Interested in personal mobility, renewable energy and regenerative agriculture.