Nowadays anyone can find sound and music on the internet. To play it, you just click and turn on your speakers. Behind those notes and sounds, however, lies a lot of technology that makes all of this possible.

Arthur de Graef


Whether it comes from roaring guitars, mums or motorbikes, sound always ‘looks’ the same, even though it can take many different forms when you listen to it. What you actually hear are vibrations, caused by pressure differences in the air. If all goes well, these are collected by the eardrum and passed along the ossicles: hammer, anvil and stirrup. The vibrations are then sent to the ‘cochlea’, where they travel through a fluid and set all kinds of tiny hairs in motion. Neurons carry those signals to your brain, where everything is converted into sounds that you can recognize.

However unlikely it may seem, every sound you hear thus consists of vibrations, or pressure differences, in the air. These pressure differences can be described as waves. The pitch is determined by the frequency of a wave: how many ‘peaks’ and ‘troughs’ the wave shows within one second. This value is expressed in ‘hertz’, abbreviated ‘Hz’. A fundamental tone of 440 Hz corresponds to the note ‘la’ (the concert pitch A), although that ‘bare’ 440 Hz sounds quite boring. Pianos, guitars and trumpets produce many overtones, which are multiples of the fundamental frequency; it is these overtones that, following the laws of physics, give instruments their characteristic sound. The volume, finally, is determined by the amplitude of the wave: how high its peaks and how deep its troughs reach.
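As a small illustration (not from the article itself), the idea of a fundamental tone plus overtones can be sketched in a few lines of Python; the amplitudes chosen here are arbitrary example values:

```python
import math

def sample_wave(t, fundamental_hz=440.0, overtone_amps=(1.0, 0.5, 0.25)):
    """Value at time t (seconds) of a tone built from a fundamental
    plus overtones. Each overtone sits at an integer multiple of the
    fundamental frequency; its amplitude sets how loud it is."""
    return sum(
        amp * math.sin(2 * math.pi * fundamental_hz * (n + 1) * t)
        for n, amp in enumerate(overtone_amps)
    )

# A 'bare' 440 Hz sine has a single component; an instrument-like tone
# adds overtones at 880 Hz, 1320 Hz, ... which change the timbre
# (the shape of the wave) but not the perceived pitch.
print(sample_wave(0.0005))
```

Raising the overtone amplitudes changes the shape of the summed wave, and with it the timbre, while the repetition rate of the wave (and thus the pitch) stays at 440 Hz.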

From sound to audio

A computer, however, cannot do much with the laws of physics: it cannot blow a trumpet or play a piano. Computers only know how to work with bits and bytes. Behind the scenes, all kinds of systems ensure that sound waves can be stored as bits and bytes and converted back into sound waves during playback. This conversion from analog sound wave to digital carrier usually comes with some loss. That is one reason a live performance can sound so much better than its recording on CD, at least if you have a good seat in the hall. Various technologies have been developed to reproduce audio as faithfully as possible, and you are probably familiar with some of them: MP3, AAC and FLAC are a few of these so-called audio coding formats. These formats define how audio is stored on digital carriers. Audio codecs, in turn, provide the algorithms that convert the sound waves into computer language and unpack them back into sound waves, so that you can listen to music. Examples of codecs are LAME (used for the MP3 format), FLAC and ALAC. The names of codecs and file formats are often related, but they need not be.

Pulse code modulation

There are different ways to visually represent sound waves

The history of the first digital audio format goes back to the telegraph. The technique was originally invented to transmit the data of multiple telegraph lines over a single line: the signals from multiple telegraphs had to be reduced in size so that they could be sent simultaneously. The technique developed for this purpose is still in use today: pulse code modulation. Because it operates on the waves themselves, it works regardless of whether those waves come from a telegraph, speech or a musical instrument.

To understand it, you first need a good idea of what such a wave actually looks like. A schematic drawing makes sound easy to visualize, but it is not entirely accurate: sound waves do not stand still, they are constantly moving. The images of waves you see here are really just snapshots. With pulse code modulation, many such snapshots are taken in sequence in order to store the sound digitally. Two terms are important here. On the one hand there is the sampling rate which, expressed in Hz, indicates how many of those snapshots or samples are taken per second: an audio file with a sampling rate of 32 kHz contains 32,000 snapshots per second. On the other hand there is the bit depth, which indicates how many bits a single snapshot occupies.

In 1967, the first PCM recording device was developed in Japan. Audio signals were recorded at a sampling rate of 30 kHz with a depth of 12 bits, stored on video tapes, and could thus be played back at home. Two years later, the recording equipment was significantly improved and support for two-channel stereo sound was introduced, with a sampling rate of 32 kHz and a bit depth of 13. Still, it was not until 1979 that Ry Cooder’s Bop till You Drop became the first digitally recorded music album, with a sampling rate of 50 kHz and a bit depth of 16.
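The two terms can be made concrete with a small Python sketch (the 440 Hz tone, 32 kHz rate and 16-bit depth are just the example values from the text):

```python
import math

def pcm_samples(freq_hz, sample_rate, bit_depth, n_samples):
    """Quantized PCM 'snapshots' of a sine wave.

    sample_rate: snapshots per second; bit_depth: bits per snapshot.
    Each sample is rounded to the nearest of the 2**bit_depth signed
    levels that fit in that many bits."""
    max_level = 2 ** (bit_depth - 1) - 1  # e.g. 32767 for 16 bits
    return [
        round(max_level * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    ]

# Five 16-bit snapshots of a 440 Hz tone at a 32 kHz sampling rate:
samples = pcm_samples(440, 32_000, 16, 5)
print(samples)
```

A higher sampling rate means the snapshots follow each other more closely; a higher bit depth means each snapshot can distinguish more loudness levels.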


What can be done on a computer today, used to require a major installation

Compact discs and cassettes

At that time, systems for watching videos at home were still very expensive and simply unaffordable for most consumers. It was not until the introduction of the compact disc (CD) in 1982 that music lovers could listen to digital recordings at home for the first time. To keep the discs somewhat compact, the sound quality was reduced slightly: the sampling rate became 44,100 Hz, while the 16-bit depth was retained. This allowed one disc to carry a maximum of about 80 minutes of audio. Ry Cooder’s Bop till You Drop must have sounded better on videotape than on CD, but CDs won out because they were affordable and incredibly practical. The encoding standard used for this is called Compact Disc Digital Audio (CD-DA). It was developed by Sony and Philips and is known as the first true digital audio format.
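A quick back-of-the-envelope check (not from the article) shows how much raw data those CD parameters imply, assuming stereo 16-bit PCM at 44,100 Hz:

```python
def cd_audio_bytes(minutes, sample_rate=44_100, bit_depth=16, channels=2):
    """Raw PCM bytes needed for a CD-DA recording of the given length."""
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return minutes * 60 * bytes_per_second

# 80 minutes of stereo CD audio:
print(cd_audio_bytes(80) / 1_000_000)  # ≈ 846.7 MB of raw samples
```

That is why a full disc of audio dwarfed the floppy disks of the era, and why the smaller formats discussed below became attractive once music moved to computers.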


Listening to music was once a lot less compact

Shortly afterwards, in 1987 and 1992 respectively, the first digital tapes for consumers came onto the market. Although these digital cassettes sounded a lot better than their analogue counterparts, they never became truly successful. Both the digital audio tape (DAT) and the digital compact cassette (DCC) were short-lived. DAT, from 1987, held on a little longer, because it could make recordings at a higher sampling rate than CDs. By 1996, just four years after the launch of DCC, Philips announced that it would stop producing the tapes and players. DAT came to an end in 2015, when Sony announced that it would no longer produce blank tapes. The last DAT Walkman, with which you could play the cassettes, had already rolled off the production line in November 2005.

Large files

Home computers became popular in the 1980s, and people naturally wanted to play sound on these new devices too. To let users store sound files on their Macintosh computers, Apple developed the Audio Interchange File Format (AIFF). Unlike many audio file formats we know today, AIFF files contained an uncompressed PCM representation of the sound. In theory the quality is excellent, but the files take up a lot of disk space. Today there are storage media of a terabyte and more, but in the days of floppy disks things were rather different, so listening to music in AIFF format was not very practical. Anyone who had a Windows machine rather than a Macintosh will undoubtedly know the Waveform Audio File Format, better known as WAV. In the early days it, too, stored only uncompressed audio, which often meant that the files were quite large. Compression support was added later, but uncompressed audio remains the most common. As a result, the quality of the audio file is identical to that of the recording: no data is lost when converting to AIFF or WAV.
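For the curious: Python’s standard wave module can produce exactly such an uncompressed PCM file. This minimal sketch writes a one-second 440 Hz test tone; the filename, tone and volume are arbitrary example choices:

```python
import math
import struct
import wave

def write_tone_wav(path, freq_hz=440, seconds=1, sample_rate=44_100):
    """Write a 16-bit mono PCM tone as an uncompressed WAV file."""
    n_frames = sample_rate * seconds
    frames = b"".join(
        struct.pack(  # each sample: one little-endian signed 16-bit int
            "<h",
            round(16383 * math.sin(2 * math.pi * freq_hz * i / sample_rate)),
        )
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)          # mono
        w.setsampwidth(2)          # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(frames)

write_tone_wav("tone.wav")
```

Each second of 16-bit mono audio at 44.1 kHz costs 88,200 bytes before the small WAV header is even counted; stereo doubles that, which illustrates why these files fill disks so quickly.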

Compressed and compromised

A one-minute stereo recording with a sampling rate of 44.1 kHz and a bit depth of 16 takes up a little over 10 MB in both AIFF and WAV format; a CD with 75 minutes of music amounts to almost 800 MB. One solution is to store all that music on external hard drives, although that is quite cumbersome. A better solution is to offer the audio in a smaller format, and that is exactly what formats such as MP3 and FLAC do. By comparison, the same minute of music in MP3 format takes up between 1 and 2.4 MB, depending on the bitrate (see ‘Lossy or lossless?’). That is because MP3 files, unlike AIFF and WAV, are compressed. To achieve this, parts of the sound wave are actually ‘cut off’, but with some logic. Humans cannot perceive all sounds at the same time. Is it a loud recording? Then all frequencies above 16 kHz are simply deleted: they would be drowned out by the other sounds anyway. The average person is also most sensitive to tones between 1 and 5 kHz. Frequencies that you would not hear anyway can therefore simply be omitted.
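The size figures above follow directly from the numbers involved; a short sketch in Python makes the arithmetic explicit (headers and metadata are ignored):

```python
def pcm_megabytes(seconds, sample_rate=44_100, bit_depth=16, channels=2):
    """Uncompressed PCM size, as stored in AIFF/WAV files."""
    return seconds * sample_rate * (bit_depth / 8) * channels / 1_000_000

def mp3_megabytes(seconds, bitrate_kbps):
    """Approximate MP3 size for a constant bitrate."""
    return seconds * bitrate_kbps * 1000 / 8 / 1_000_000

one_minute = 60
print(pcm_megabytes(one_minute))       # ≈ 10.58 MB of raw CD-quality PCM
print(mp3_megabytes(one_minute, 128))  # ≈ 0.96 MB at 128 kbps
print(mp3_megabytes(one_minute, 320))  # ≈ 2.4 MB at 320 kbps
```

Note that the MP3 size depends only on the chosen bitrate and the duration, not on the sampling rate of the source: that is the whole point of encoding to a fixed number of bits per second.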

Lossy or lossless?

MP3 files are often characterized by their bitrate, a number that indicates how many bits the recording takes up per second. Take a CD-quality recording of one minute: as an MP3 with a bitrate of 256 kbps it will be about 5.5 times smaller, without sounding noticeably worse. Lower the bitrate and the file can be halved in size again, although you then compromise on sound quality. Besides MP3 there are other lossy file formats. The inventors of MP3, for example, also developed AAC, which can deliver better sound quality in smaller files; it is the default audio format for Apple Music and YouTube. Spotify uses Ogg Vorbis, an open-source format that it can use at no extra cost. Whether you actually hear the frequencies lost during MP3 conversion varies from person to person. Audiophiles, however, do not like the idea of music being thrown away, which quickly created a demand for audio formats that are smaller but still lossless. The FLAC format answered that call in 2001: a digital audio format that produces smaller files than AIFF and WAV, but still keeps the entire audio wave. The exact size of such a Free Lossless Audio Codec file depends on the degree of compression, but a FLAC file can be 50 to 70 percent smaller than uncompressed AIFF and WAV files, without losing any audio quality.
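The ‘5.5 times smaller’ claim is easy to verify yourself; this sketch compares a constant-bitrate file with raw CD-quality PCM:

```python
def compression_ratio(seconds, bitrate_kbps,
                      sample_rate=44_100, bit_depth=16, channels=2):
    """How many times smaller a constant-bitrate file is than
    uncompressed stereo CD-quality PCM of the same duration."""
    pcm_bits = seconds * sample_rate * bit_depth * channels
    lossy_bits = seconds * bitrate_kbps * 1000
    return pcm_bits / lossy_bits

print(round(compression_ratio(60, 256), 1))  # → 5.5 for a 256 kbps MP3
print(round(compression_ratio(60, 128), 1))  # → 11.0 at half the bitrate
```

Because the duration cancels out, the ratio is the same for a one-minute song and a full album: it depends only on the bitrate versus the raw PCM data rate of 1,411.2 kbps.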

Best audio format?

What is the ‘best’ audio format? That depends on what you need it for. Do you want to listen to music on your phone? Then AAC is the preferred format. Do you want to store a lot of music on your PC? Then MP3 or FLAC is the better choice, depending on the desired sound quality; most people only hear the difference between a 320 kbps MP3 and FLAC on better speaker systems. You can also use AAC instead of MP3, although playing those .aac and .m4a files is a little more cumbersome. AIFF and WAV remain interesting for those who work with music. Either way, there are years of development and improvement between the recording studio and that one song on Spotify, so you can enjoy high-quality audio without carrying around CDs, cassettes or hard drives.