Having grown up in a European musical tradition, I have always asked myself: Why and how did the Western chromatic scale settle for 12 tones per octave, and why is there an octave at all? What is the unique set of tones needed to create a melody, or express musical ideas in general? To anticipate an answer to all of these questions, musical scales appear to have evolved together with musical instruments, guided by the perception of dissonance related to the functioning of human hearing; this cyclic evolution may have been seeded by naturally occurring sounds.
The Sonorous Body
The idea that any physical body tends to oscillate at various frequencies was described as early as 1760 by Jean-Philippe Rameau, the French composer and music theorist, as the “corps sonore”, the sonorous body⁴. Rameau thought that bodies produce not only their fundamental tone but also overtones such as an octave, fifth or fourth. A contemporary of Rameau, the Swiss mathematician and physicist Daniel Bernoulli, also wrote various papers on vibrating instruments⁴. His main discovery was that the overtones of such an instrument extend over a whole series; owing to their diminishing strength they are not all audible, but they contribute to the character of that instrument. At times, he noted, these overtones may not even be harmonious at all.
Modern physics considers all matter to oscillate around a point of equilibrium, from atoms through complex mechanical structures to galaxies, each in its own natural modes and associated frequencies. The multi-modal oscillations produced by a physical body (or a musical instrument, for that matter) comprise a wide range of frequencies which can be presented in a frequency spectrum, i.e. the amplitude (or energy, or power) of these modes as a function of frequency. This spectrum decomposes a time series of body motion or sound pressure into its frequency components and can be generated using Fourier analysis. A frequency spectrum, and in particular its evolution over time, called a spectrogram, demonstrates the characteristics of any sonorous body such as a musical instrument; it reveals the instrument’s timbre.
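To make the idea of a frequency spectrum concrete, here is a minimal sketch in Python that builds a synthetic compound tone and recovers its components with a naive discrete Fourier transform. The sample rate, the 20 Hz fundamental and the three harmonic amplitudes are hypothetical values chosen only so that every period fits the one-second window exactly; a real recording would need windowing and a proper FFT library.

```python
import cmath
import math

def dft_amplitudes(samples):
    """Naive discrete Fourier transform; returns one amplitude per frequency bin."""
    n = len(samples)
    amplitudes = []
    for k in range(n // 2):  # bins above n/2 mirror the lower ones for real signals
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        amplitudes.append(2 * abs(acc) / n)
    return amplitudes

SAMPLE_RATE = 512                       # samples per second (hypothetical)
FUNDAMENTAL = 20                        # Hz (hypothetical)
HARMONIC_AMPLITUDES = [1.0, 0.5, 0.25]  # strengths of harmonics 1, 2 and 3

# One second of a synthetic tone made of three harmonics.
signal = [
    sum(a * math.sin(2 * math.pi * (h + 1) * FUNDAMENTAL * i / SAMPLE_RATE)
        for h, a in enumerate(HARMONIC_AMPLITUDES))
    for i in range(SAMPLE_RATE)
]

# With one second of data, bin k corresponds to a frequency of k Hz.
spectrum = dft_amplitudes(signal)
peaks = [k for k, amp in enumerate(spectrum) if amp > 0.1]
print(peaks)  # the frequencies (in Hz) that carry noticeable energy
```

The list of peaks, together with their relative amplitudes, is a crude stand-in for what the text calls the timbre of the instrument.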
Some musical instruments produce a very simple spectrum, at least to a first approximation. In addition to a fundamental frequency f₁, all overtone frequencies are integral multiples of the fundamental, i.e. fₙ = n * f₁ is the nth harmonic. This holds for most string instruments as well as wind instruments that use a cylindrical tube as the main body, e.g. a flute or an organ¹. Harmonic analysis states that any sound produced is a combination of these harmonics over a wide spectrum, with strengths that depend on the initial and boundary conditions; this forms the timbre of the instrument (in addition to other factors such as the instrument body).
Not all bodies, and in particular not all musical instruments, produce overtones at integral multiples of a fundamental. For instance, a closer look at the frequency spectrum of low piano notes reveals that the higher partials are no longer integer multiples of the fundamental, mostly because these strings are quite thick, which makes them stiffer and closer to a long solid bar¹. Instruments that do use flat bars, such as a xylophone, exhibit quite a different series of non-integer overtones, or partials, e.g. the sequence f₁, 2.76* f₁, 5.40* f₁, 8.93* f₁, etc¹. As another example, bells have very complex natural modes and frequencies due to oscillations of the body and the rim.
An isolated tone or a sequence of independent tones could be produced in any scale by a single performer, or a group in perfect unison; the fundamental frequency and spectrum of the tones employed do not really matter when they are sounded in isolation. However, the chosen set of tones does matter when several tones sound together or in close sequence, causing “con-sonance” or dissonance. Of course, music is not just about achieving consonance; rather, it moves between consonance and dissonance, with the ear desiring closure in consonance, e.g. a dominant 7th chord yearns for the tonic major. Throughout history, different theories have been put forward on what causes consonance and dissonance.
The ancient Greeks, notably Pythagoras in the 6th century BCE, thought musical consonance arises from small whole-number ratios between tones, possibly due to his preference for geometry. In addition to the ratio 2:1, the octave, his preferred ratio was 3:2, the perfect fifth. Later, in the 4th century BCE, Aristoxenus extended this concept to more ratios such as 4:3 and 5:4, but still based on ratios between the fundamental frequencies. In the 18th century CE, with the discovery of harmonics and partials, Rameau and his contemporaries opined that it is sufficient for different tones to share similar overtones in order to produce harmony, i.e. coinciding harmonics determine consonance⁴. Eventually, in the 19th century CE, Helmholtz³ realised the importance of the relationship between all partials of the tones involved, as any pair of partials can produce consonance or dissonance through the presence of “beats”.
Measure the Beat
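When two pure tones of similar frequency, i.e. two sinusoidal signals, sound together, they create a resulting wave that oscillates at the mean frequency and whose amplitude is modulated by an envelope at half of the difference frequency. The ear perceives this slow modulation as beating; since the loudness peaks twice per envelope cycle, the beats recur at the full difference frequency.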
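The beating follows from a standard trigonometric identity, written here for two tones of equal amplitude (an assumption made only to keep the formula short):

```latex
\sin(2\pi f_1 t) + \sin(2\pi f_2 t)
  = 2 \cos\!\left(2\pi \frac{f_2 - f_1}{2}\, t\right)
      \sin\!\left(2\pi \frac{f_1 + f_2}{2}\, t\right)
```

The sine factor is the tone at the mean frequency, while the cosine factor is the slowly varying envelope at half the difference frequency. The magnitude of that envelope repeats every half cycle, so the loudness fluctuates |f₂ − f₁| times per second.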
Helmholtz claimed that for a small difference the beating can be noticed, that for larger differences of about 30–40 Hz the beating is perceived as unpleasant and rough, and that for even larger differences the beating disappears as the two tones are perceived separately. These findings were quantified by Plomp & Levelt⁶ and later confirmed by Sethares⁷. For a given pure sinusoidal tone at a base frequency f₁, the perceived dissonance with a second pure sinusoidal tone at a higher frequency f₂ was determined through experimentation. The resulting “sensory dissonance” can be approximated and plotted as a function of the frequency ratio f₂/f₁ for various base frequencies.
These sensory dissonance curves confirmed Helmholtz’ prediction. It is remarkable to see that no interval stands out as there are in essence three areas for the ratio of two pure sinusoidal tones:
- very close together such that the ear cannot discern them at all and hence perceives them as “con-sonant”,
- about a quarter to a third of the critical bandwidth of the ear apart where the dissonance peaks as the ear seems to be confused,
- very far apart such that the ear considers them as very different tones.
The critical bandwidth is related to the functioning of the human ear as the cochlea acts more or less like a Fourier analyser: it projects a sound pressure signal into the frequency spectrum. The frequency resolution is limited by the size of the nerve cells involved resulting in “frequency bins”. Tones very close together fall in the same bin, tones far apart fall into very different bins, and tones at a critical distance straddle several bins which the ear might find a bit confusing, causing sensory dissonance.
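The shape of such a pure-tone dissonance curve can be sketched with the exponential approximation commonly attributed to Sethares' paper. The constants below (3.5, 5.75, 0.24, 0.0207 and 18.96) are the values usually quoted for that fit and should be read as assumptions here, as should the base frequency of 262 Hz; both tones are taken to be equally loud.

```python
import math

def pure_tone_dissonance(f1, f2):
    """Approximate sensory dissonance of two equally loud pure tones (arbitrary units)."""
    f_low, f_high = min(f1, f2), max(f1, f2)
    # s stretches the curve with the critical bandwidth, which grows with frequency.
    s = 0.24 / (0.0207 * f_low + 18.96)
    x = s * (f_high - f_low)
    return math.exp(-3.5 * x) - math.exp(-5.75 * x)

BASE = 262.0  # Hz, roughly middle C (hypothetical choice)

ratios = [1 + i / 1000 for i in range(1001)]  # from unison (1.0) to the octave (2.0)
curve = [pure_tone_dissonance(BASE, BASE * r) for r in ratios]

peak_ratio = ratios[curve.index(max(curve))]
print(f"dissonance peaks at a frequency ratio of about {peak_ratio:.3f}")
print(f"dissonance at the octave: {curve[-1]:.5f}")
```

The curve starts at zero for the unison, rises quickly to a single peak at a small frequency difference and then decays smoothly, without any dip at ratios such as 3:2 or 2:1, which matches the three areas listed above.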
Consonance of Tones
The findings by Plomp & Levelt as well as Sethares reveal that there are no intrinsically special intervals for pure tones which, I hasten to add, do not exist in nature. Tones produced by real instruments, however, always exhibit a more or less rich spectrum, and for string and tube instruments, a simple harmonic spectrum. The most astounding result of these authors’ research is the calculation of the sensory dissonance for two such compound tones whose fundamentals are a certain ratio delta apart.
As can be seen, given 6 harmonics for tones A and B at descending amplitudes of 0.88ⁿ, 36 pairs between the individual constituent components occur at the same time, i.e. n² pairs for n spectral components. As detailed in Sethares’ paper, the sensory dissonance between two compound tones can be calculated for all possible intervals delta in a wide range, e.g. from unison to an octave, by summing the contribution of each pair to the global dissonance.
The figure clearly shows the absence of dissonance, i.e. consonance, at the octave, since the harmonics of the tone at twice the frequency then coincide with harmonics of the base tone. For a perfect fifth, the ratio 3:2, the harmonics of the higher note fall either on a harmonic of the lower note or midway between two of them. Consequently, there is no source of dissonance. The conclusion is that, for instruments with perfect integral harmonics, the perfect scale would be made up of ratios or intervals at the positions of local minima in the compound dissonance curve. For a different instrument such as a xylophone or a set of bells, the scale would be very different.
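The summation over all pairs of partials can be sketched as follows. This is not the authors' code: it reuses the commonly quoted exponential fit for the pure-tone curve (its constants are assumptions), takes 6 harmonics with amplitudes 0.88ⁿ as in the text, weights each pair by the product of its amplitudes, and counts only the 36 pairs between tone A and tone B.

```python
import math

def pair_dissonance(f1, a1, f2, a2):
    """Dissonance contribution of one pair of partials (assumed exponential fit)."""
    s = 0.24 / (0.0207 * min(f1, f2) + 18.96)
    x = s * abs(f2 - f1)
    return a1 * a2 * (math.exp(-3.5 * x) - math.exp(-5.75 * x))

def compound_dissonance(base, ratio, n_harmonics=6, decay=0.88):
    """Sum over all n * n pairs between the partials of tone A and tone B."""
    tone_a = [(base * n, decay ** n) for n in range(1, n_harmonics + 1)]
    tone_b = [(base * ratio * n, decay ** n) for n in range(1, n_harmonics + 1)]
    return sum(pair_dissonance(fa, aa, fb, ab)
               for fa, aa in tone_a
               for fb, ab in tone_b)

BASE = 262.0  # Hz, hypothetical fundamental of tone A

# Sweep tone B from unison to slightly beyond the octave in steps of 0.001.
ratios = [(1000 + i) / 1000 for i in range(1101)]
curve = [compound_dissonance(BASE, r) for r in ratios]

# Local minima of the curve are the candidate scale steps for this timbre.
minima = [ratios[i] for i in range(1, len(curve) - 1)
          if curve[i] < curve[i - 1] and curve[i] < curve[i + 1]]
print(minima)
```

Replacing the integer multipliers in `tone_a` and `tone_b` with the bar partials quoted earlier (1, 2.76, 5.40, 8.93) would move the minima elsewhere, which is the point made about xylophones and bells.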
Bottom line: the harmonics or partials of an instrument, its timbre, determine the musical scale that ought to be used to create and perform music with that instrument. The Western world most likely uses the scales it does because its music evolved with string- and tube-based instruments. In other parts of the world, music has evolved around different instruments and hence different scales, e.g. Gamelan music.
The Evolution of a Musical Scale
Given that all physical bodies are capable of oscillating at certain frequencies in certain modes, how would humans have started building a musical scale to sing melodies in or make music with, using simple instruments? My starting point would be the oscillations occurring in natural phenomena, e.g. the sound produced by the string of a bow, or the whistling of a tube. Given the current understanding of how scales ought to be built based on timbre, how did the tonal scales of the Western world evolve into what they are now, the 12-tone equal temperament? Here I would like to give a brief overview, mostly based on a book on the subject by Stuart Isacoff⁴, combined with the findings of Plomp & Levelt as well as Sethares.
Purity of Octaves
Given the natural frequencies of a string at integral multiples of the fundamental frequency f₁, a very obvious observation would be that certain harmonics sound similar when produced together: those at twice the frequency of the previous one, i.e. the series of powers of 2, f₂, f₄, f₈, etc., commonly known as octaves. According to the dissonance curve above for string instruments (and tubes alike), these octaves produce no dissonance at all. This effect, called octave equivalence, might be due to the fact that in the time domain, when in phase, the periodic nodes in the sound pressure coincide at common points in time, an effect already noted by Leonhard Euler. This might be the reason why it is difficult for the ear to discern them and why they sound pleasant together, as shown by Sethares. Therefore, for a given tone at f₁, the octaves at frequencies f₂, f₄, f₈, etc. would be part of the simplest tonal system.
5-Tone Natural Scale
Higher harmonics above the first octave can be replicated across various other octaves by multiplying or dividing by 2 a suitable number of times, thanks to octave equivalence. Since these replicated tones are conceptually the same (they are said to be of the same pitch class), in the following only the harmonics projected into the first octave range from f₁ to f₂ are considered. The projection maps each harmonic number to a frequency ratio within the first octave. The following diagram shows the first 9 harmonic frequencies on a chromatic scale for comparison:
It can be seen that the 2nd, 4th and 8th harmonics project to the C an octave higher, while the 3rd and 6th harmonics fall on the G, which is commonly known as the “fifth”. The 9th harmonic falls on the D and the 5th harmonic falls on the E, the “major third”. It is conceivable that a scale could be established and even melodies composed using these 5 unique tones, or even more if additional harmonics were used. They could be produced by a string or a tube instrument such as a flute.
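The projection into the first octave is simple enough to sketch in a few lines. The helper names below are made up for this illustration, the base tone is assumed to be C, and the nearest chromatic note is taken from the 12-tone equal temperament purely as a yardstick.

```python
import math
from fractions import Fraction

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def fold_into_octave(ratio):
    """Halve (or double) a frequency ratio until it lies between 1 and 2."""
    ratio = Fraction(ratio)
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

def nearest_note(ratio):
    """Nearest equal-tempered note above C and the deviation from it in cents."""
    cents = 1200 * math.log2(float(ratio))
    semitone = round(cents / 100)
    return NOTE_NAMES[semitone % 12], cents - 100 * semitone

for n in range(1, 10):
    folded = fold_into_octave(n)
    name, deviation = nearest_note(folded)
    print(f"harmonic {n}: ratio {folded} is closest to {name} ({deviation:+.1f} cents)")
```

In this sketch the octaves fold back onto the ratio 1, i.e. the pitch class of the base tone, which is the same statement as saying that they project to the C an octave higher.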
It is not surprising that this 5-tone scale would contain mostly consonant tones, i.e. the octave and the fifth. The ratio 9:8, which is equivalent to a major second in the chromatic scale, does show a higher level of sensory dissonance, i.e. two tones two semi-tones apart do sound slightly dissonant.
7-Tone Pythagorean Scale
As a philosopher with an inclination towards geometry, Pythagoras appears to have preferred simple arithmetic relationships using whole numbers, notably the ratio 2:1, the octave, and in particular 3:2, the “perfect fifth”. The sequence of the first 6 Pythagorean “perfect fifths” can be projected to tones within an octave by starting at any base tone of any key in the European diatonic scale, e.g. F. The result is a sequence close to F, C, G, D, A, E and B. A logarithmic plot shows the correlation of these 7 tones to the chromatic scale:
There is something elegant about that scale. The distance or ratio between most neighbouring notes (the interval) is 9/8, e.g. C to D, a whole step in our current thinking, and between some others about half of that, 256/243, e.g. between E and F, a semi-tone. Centuries later, the resulting diatonic scale was adopted in Gregorian chant and formed the foundation of European and Western musical practice up to the 16th century⁵.
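The construction can be checked with exact fractions. The sketch below stacks six perfect fifths on F, folds every tone into one octave and expresses it relative to C; the choice of C as the reference and the helper name are assumptions made for this illustration.

```python
from fractions import Fraction

def fold_into_octave(ratio):
    """Halve (or double) a frequency ratio until it lies between 1 and 2."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

names = ["F", "C", "G", "D", "A", "E", "B"]

# Six perfect fifths stacked on F; C is the first fifth above F.
above_f = [Fraction(3, 2) ** k for k in range(7)]
c_above_f = above_f[1]

# Re-express every tone relative to C and fold it into a single octave.
scale = {name: fold_into_octave(r / c_above_f) for name, r in zip(names, above_f)}
ordered = sorted(scale.items(), key=lambda item: item[1])

# Intervals between neighbouring tones, including the step from B up to the next C.
ratios = [r for _, r in ordered] + [Fraction(2)]
steps = [high / low for low, high in zip(ratios, ratios[1:])]

for (name, ratio), step in zip(ordered, steps):
    print(f"{name}: {ratio}, step to the next tone: {step}")
```

Only two step sizes appear, 9/8 and 256/243, with the smaller one sitting between E and F and between B and C.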
Since Pythagoras based his tuning on perfect fifths, it is not surprising that his favourite intervals, the octave, the perfect fifth and the perfect fourth, would produce perfect consonance at any position on the scale. The major third and sixth, however, perform less well, at 81/64 instead of the perfect 5:4, and 27/16 instead of the perfect 5:3, respectively. For instance, E is 81/64 above C which, using a C at 262 Hz, creates beating between the 5th harmonic of C and the 4th harmonic of E: the two partials differ by |5 − 4 × 81/64| × 262 ≈ 16.4 Hz, so the amplitude envelope oscillates at half of that, about 8 Hz.
12-Tone Pythagorean Scale
The 7-tone Pythagorean scale may have worked well for simple melodies and chords, in all modes, as long as the music stayed in that scale. However, if one wanted to start a similar diatonic scale from, say, G, a further, seventh fifth would be needed to accommodate the last semi-tone in the sequence, namely an F#. In addition, various forms of polyphony such as singing in parallel motion of fifths, or trying to build chords and scales on any note as a base (e.g. A), necessitated even more fifths and fourths. Since the original Pythagorean scale used only two kinds of interval, 9/8 and 256/243, the latter being approximately half of the former, the natural result is that the entire octave will eventually be divided into the smallest interval of approximately 256/243, i.e. into 12 semi-tones.
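As a rough check of this division into 12 semi-tones, the following sketch extends the chain to eleven fifths above an arbitrary base tone and lists the resulting step sizes. The length of the chain and the helper name are assumptions for illustration; historical tunings placed the base tone differently.

```python
from collections import Counter
from fractions import Fraction

def fold_into_octave(ratio):
    """Halve a frequency ratio until it lies between 1 and 2."""
    while ratio >= 2:
        ratio /= 2
    return ratio

# Twelve tones: the base tone plus eleven stacked fifths, folded into one octave.
tones = sorted(fold_into_octave(Fraction(3, 2) ** k) for k in range(12))

# Steps between neighbouring tones, closing the octave at the ratio 2.
steps = [high / low for low, high in zip(tones, tones[1:] + [Fraction(2)])]
step_counts = Counter(steps)

for step, count in sorted(step_counts.items()):
    print(f"{count} steps of {step} (about {float(step):.4f})")
```

In this sketch the semi-tones come in two slightly different sizes, 256/243 and 2187/2048, which is why the text speaks of an interval of approximately 256/243.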
There were a few issues with this approach. First, only allowing fifths and fourths to produce additional notes on a scale necessitated many more notes in an octave. Marin Mersenne, for instance, proposed keyboards with 19, 23 and 31 keys per octave, which organ builders realised by adding a key or lever for every note (making the instrument definitely a bit more challenging to play). Second, only fifths sound harmonious. In the 15th century, however, thirds became popular in England (Dunstable), and the German organist Conrad Paumann⁴ started to use thirds and fifths simultaneously in his organ works, i.e. triads.
7-Tone Just Intonation
By the 16th century music had become complex, exposing the limits of Pythagorean tuning and leading to a quest for alternative tunings. With the rediscovery of Ancient Greek writings in the Renaissance, works such as the Elementa harmonica by Aristoxenus (4th century BCE) were being translated. Aristoxenus suggested relying on the ear rather than being dogmatic about mathematical concepts for tuning, e.g. using not only perfect fifths (3:2) but also other intervals such as the perfect fourth (4:3), the perfect major third (5:4) or the perfect minor third (6:5). The Italian music theorist Gioseffo Zarlino re-introduced this concept in a new tuning called Just intonation, precisely the one shown above to yield perfect consonance for instruments with integer harmonics.
Triads in just intonation on the base C sound pure as long as one stays in that scale. However, there are now three kinds of interval in an octave: a half-tone of 16/15, and two flavours of whole tone, 9/8 for C-D, F-G and A-B as well as 10/9 for D-E and G-A. Chords such as triads constructed on notes other than C, with more complex scales, will sound terrible as beating is created. For instance, a fifth D-A will have a ratio of 40/27 ≅ 1.48148, which is less than the ideal 3/2 = 1.5. The third harmonic of D and the second harmonic of A then differ by (3 − 2 × 40/27) × 294 ≈ 10.9 Hz, so the amplitude envelope oscillates at half of that, about 5.44 Hz.
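The three step sizes and the narrow fifth D-A follow directly from the just ratios. A minimal sketch with exact fractions, assuming the usual just major scale on C (the ratio 15/8 for B is implied by the 9/8 step A-B):

```python
from fractions import Fraction

# Just major scale on C, as frequency ratios relative to C.
JUST_SCALE = {
    "C": Fraction(1),
    "D": Fraction(9, 8),
    "E": Fraction(5, 4),
    "F": Fraction(4, 3),
    "G": Fraction(3, 2),
    "A": Fraction(5, 3),
    "B": Fraction(15, 8),
}

names = list(JUST_SCALE) + ["C'"]
ratios = list(JUST_SCALE.values()) + [Fraction(2)]

# Interval between each pair of neighbouring notes.
steps = {f"{names[i]}-{names[i + 1]}": ratios[i + 1] / ratios[i]
         for i in range(len(ratios) - 1)}
for pair, step in steps.items():
    print(f"{pair}: {step}")

# The fifth built on D is narrower than the pure 3/2.
fifth_d_a = JUST_SCALE["A"] / JUST_SCALE["D"]
print(f"D-A: {fifth_d_a} = {float(fifth_d_a):.5f}, pure fifth = {float(Fraction(3, 2))}")
```

The same dictionary can be used to test any other chord: dividing two entries gives the interval, and comparing it with the small-integer ratio shows how far it is from pure.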
12-Tone Equal Temperament
By the 16th century music based on various 12-tone scales had reached a certain level of sophistication. Musicians and music theorists were looking to “temper” the Pythagorean or Just intonation in order to obtain a 12-tone system that sounds good in all diatonic scales. Mean-tone temperament was created as a compromise between the two contestants, with a whole tone that is the geometric mean of the two just whole tones 9/8 and 10/9, as were various Well temperament tunings such as the Werckmeister temperament. The latter was promoted by Vincenzo Galilei, and demonstrated at its best in Johann Sebastian Bach’s Well Tempered Clavier.
According to Isacoff⁴, Adrian Willaert published “Quid non ebrietas” in 1519, which introduced a new concept, a radical tuning based on an equal division of the octave. Its precursor was a translation of Euclid’s Elements, proposing a geometric solution. A long process, fraught with philosophical disputes and excellently described by Isacoff, eventually led to the Equal temperament, the division of the octave into 12 equal ratios or intervals, namely 2^(1/12), the 12th root of 2, approximately 1.05946, an irrational number which the world at the time hesitated to accept.
Rameau, as an early adopter, claimed that harmony served as the root of musical expression and insisted that it was crucial to express harmonies. Since it is not possible to have perfect consonance of thirds and fifths at the same time, the advantage of equal temperament is that all semi-tones are equal. Hence all keys produce similar sounds, and both thirds and fifths produce an acceptable beat in their overtones. For instance, for the third on C at 262 Hz, the 5th harmonic of C and the 4th harmonic of E differ by |5 − 4 × 2^(4/12)| × 262 ≈ 10.4 Hz (an envelope frequency of about 5.2 Hz), as the equal third at 1.25992 is larger than the ideal 1.25. The fifth on C results in a small difference of (3 − 2 × 2^(7/12)) × 262 ≈ 0.9 Hz (an envelope frequency of about 0.4 Hz), as the equal fifth at a ratio of 1.49831 is very close to the perfect fifth at 3/2 = 1.5.
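The beat figures quoted for the three tunings can be reproduced with a short sketch. The function name is made up for this illustration, C is taken at 262 Hz as in the text, and the value returned is the full frequency difference between the two nearly coinciding harmonics; half of it is the frequency of the amplitude envelope.

```python
C4 = 262.0  # Hz, the C used in the text

def harmonic_mismatch(base_hz, interval_ratio, n_lower, n_upper):
    """Frequency difference between harmonic n_lower of the lower note
    and harmonic n_upper of the upper note."""
    return abs(n_lower * base_hz - n_upper * base_hz * interval_ratio)

# Major third and perfect fifth above C in each tuning.
TUNINGS = {
    "Pythagorean": {"third": 81 / 64, "fifth": 3 / 2},
    "Just intonation": {"third": 5 / 4, "fifth": 3 / 2},
    "Equal temperament": {"third": 2 ** (4 / 12), "fifth": 2 ** (7 / 12)},
}

results = {}
for name, intervals in TUNINGS.items():
    third = harmonic_mismatch(C4, intervals["third"], 5, 4)  # 5th vs 4th harmonic
    fifth = harmonic_mismatch(C4, intervals["fifth"], 3, 2)  # 3rd vs 2nd harmonic
    results[name] = (third, fifth)
    print(f"{name}: third {third:.2f} Hz, fifth {fifth:.2f} Hz "
          f"(envelopes {third / 2:.2f} Hz and {fifth / 2:.2f} Hz)")
```

The comparison shows the trade-off: equal temperament gives up the perfectly pure fifth of the other two tunings in exchange for a third that is noticeably less rough than the Pythagorean one, in every key.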
As Sethares points out, if it is possible to predict the perfect scale for a given instrument, it should equally be possible to devise the frequency spectrum (or timbre) of an ideal musical instrument for a given scale, in particular the established Western 12-tone chromatic scale. The solution can be obtained through numerical optimisation as outlined in his paper. While it would be easy to produce an electronic instrument that implements the timbre found, it would be an interesting challenge to modify existing instruments to produce the required spectrum, e.g. special strings for pianos with a bespoke mass distribution.
- Guillaume, Philippe (2002). Module d’ouverture: SON ET MUSIQUE, National Institute of Applied Sciences of Toulouse.
- Harrison, P. M. C., & Pearce, M. T. (2020). Simultaneous consonance in music perception and composition. Psychological Review, 127(2), 216–244.
- Helmholtz, H. von (1863). Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik, Verlag F. Vieweg & Sohn, Braunschweig.
- Isacoff, S. (2007). Temperament, Faber & Faber
- Kelly, T. F. (2014). Capturing Music: The Story of Notation, W. W. Norton & Company.
- Plomp, R. and Levelt, W. J. M. (1965). Tonal Consonance and Critical Bandwidth. The Journal of the Acoustical Society of America, 38, 548–560. http://dx.doi.org/10.1121/1.1909741
- Sethares, W. (1993). Local consonance and the relationship between timbre and scale. The Journal of the Acoustical Society of America, 94. http://dx.doi.org/10.1121/1.408175
On my quest to understand why there are 12 tones per octave in Western music, I came across the video “Can Octave Sound Dissonant?” by New Tonality. This video caused my eureka moment, as it visualised so well the relationship between instrument spectrum and tonal scale.