Music identification apps seem like magic at first, but under the hood is a sophisticated algorithm that can find songs in an instant. Here’s how they work.
The Magic of Music Identification
It’s probably happened to all of us. You’re having dinner at a nice restaurant, hanging out at a coffee shop, or walking around in a store, when you suddenly hear a great song playing over the speakers. Maybe it’s a song you’ve listened to before or a track you’ve never heard. So, you pull out your phone, open Shazam, and hold up your device to the ceiling. In just a flash, the app tells you what the song is, who the artist is, and where to stream it.
Music identification apps like Shazam are quick, remarkably accurate, and can identify even obscure songs. In a nutshell, they work by isolating the song in a recording and searching for it in an expansive database of tracks. But the technology behind how they do this is quite complex and impressive.
You might be shocked to know that Shazam launched way back in 2002, long before the smartphone app we know today, and the underlying system was just as accurate and quick then as it is now. That’s all thanks to a unique algorithm that would revolutionize the music world.
It’s Not Just the Lyrics
At first glance, music identification apps like Shazam may seem simple. You might think they just listen to the lyrics, the same way a voice assistant would, and search for them in a database of song lyrics to tell you what the song is.
However, most music identification apps can name an instrumental track, or even tell you which artist performed a particular cover of a song. That’s because, instead of analyzing the lyrics of the track, they’re looking for “fingerprints” that are unique to each song in their extensive databases.
You likely have devices that can be unlocked using your fingerprint, which is the arrangement of small lines on your finger that is unique to you. Similarly, when you hold up your phone to record a brief clip of a song, the clip gets turned into patterns of data that Shazam or another app can look up in its database.
At first glance, that method seems prone to several problems. Most of the time that you hear music in public, there’s background noise and distortion caused by the speakers, which can make songs unidentifiable or result in inaccurate matches. Also, there’s a lot of data captured in even a brief sound clip, which can make searching for these patterns across a database of millions of songs slow.
In an interview with Scientific American in 2003, Avery Li-Chun Wang, the chief data scientist and co-founder of Shazam, explained how their algorithm fixes these issues. The information in an audio clip can be visualized with a 3D chart known as a spectrogram, which plots how the frequencies in a sound change over time. The third dimension is amplitude, or how loud the sound is at each frequency, and it is represented in a spectrogram by the intensity of color.
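To make the idea of a spectrogram concrete, here is a minimal sketch in plain Python. It is only an illustration, not Shazam’s code: the function name `spectrogram`, the frame size, the hop length, and the 8,000 Hz sample rate are all assumptions chosen to keep the example small. It slices a signal into short frames and measures how much energy each frequency has in each frame.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of frequency-bin magnitudes per time frame.

    Illustrative only: a real app would use an optimized FFT library.
    """
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # Hann window: fades the frame edges to reduce spectral leakage.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        magnitudes = []
        for k in range(frame_size // 2):  # bins from 0 Hz up to Nyquist
            # Naive discrete Fourier transform for frequency bin k.
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames

# Toy input (assumed values): a pure 1,000 Hz tone sampled at 8,000 Hz.
SAMPLE_RATE = 8000
FRAME_SIZE = 64
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]

spec = spectrogram(tone, frame_size=FRAME_SIZE)
loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
loudest_hz = loudest_bin * SAMPLE_RATE / FRAME_SIZE
```

Each row of `spec` is a moment in time, each column is a frequency, and each value is the amplitude that a spectrogram image would show as color intensity. For the pure tone, the loudest column in every row sits at the tone’s own frequency.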
Just as humans can only perceive sounds that fall within a particular range of frequencies, Shazam doesn’t take the entirety of a song into account when performing a search. Instead, it only takes in “peaks,” which are the points of highest energy content within an audio clip. The fingerprints it captures keep only the points in the spectrogram where amplitude is at its highest for a given time frame and band of frequencies, and discard everything else.
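The peak-picking step can be sketched as a search for local maxima in the spectrogram grid. This is a simplified illustration under stated assumptions: the function name `find_peaks`, the neighborhood size, and the loudness threshold are all hypothetical, and the toy grid is made-up data rather than real audio.

```python
def find_peaks(spec, neighborhood=1, min_magnitude=1.0):
    """Return (time_index, freq_bin) points that are louder than all nearby points.

    spec is a list of time frames, each a list of magnitudes per frequency bin.
    """
    peaks = []
    for t, frame in enumerate(spec):
        for f, mag in enumerate(frame):
            if mag < min_magnitude:
                continue  # too quiet to be worth keeping
            # Collect every value in the surrounding time/frequency window.
            neighbors = [
                spec[tt][ff]
                for tt in range(max(0, t - neighborhood),
                                min(len(spec), t + neighborhood + 1))
                for ff in range(max(0, f - neighborhood),
                                min(len(frame), f + neighborhood + 1))
                if (tt, ff) != (t, f)
            ]
            if all(mag > n for n in neighbors):
                peaks.append((t, f))
    return peaks

# Toy spectrogram (made-up numbers): rows are time frames, columns are
# frequency bins. Two loud points stand out from low-level background noise.
toy_spec = [
    [0.1, 0.2, 0.1, 0.3],
    [0.2, 9.0, 0.4, 0.2],
    [0.1, 0.3, 0.2, 0.1],
    [0.3, 0.2, 0.5, 7.0],
]

peaks = find_peaks(toy_spec)
```

Out of sixteen values, only the two loud points survive. Quiet background noise never makes it into the fingerprint, which is why this kind of filtering tolerates a noisy room.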
In a research paper for Columbia University, Wang stated that the method allows them to take out most of the unnecessary parts of an audio clip like background noise and to clear up distortion. It also makes the size of the prints small enough that it takes mere milliseconds to identify a song among their vast database.
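One way to see why small fingerprints make for fast searches is to store them in a hash table. The sketch below loosely follows the idea in Wang’s paper of pairing nearby peaks into hashes and checking that matches line up in time, but it is heavily simplified and is not Shazam’s production code. The function names, the `fan_out` parameter, and the song data are all invented for illustration.

```python
from collections import Counter, defaultdict

def fingerprint(peaks, fan_out=3):
    """Pair each peak with the next few peaks.

    Yields ((freq1, freq2, time_gap), time_of_first_peak).
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1

def build_index(songs):
    """Map every hash to the songs (and positions) where it occurs."""
    index = defaultdict(list)
    for title, peaks in songs.items():
        for key, t in fingerprint(peaks):
            index[key].append((title, t))
    return index

def identify(index, clip_peaks):
    """Return the title whose hashes match the clip at one consistent offset."""
    votes = Counter()
    for key, clip_t in fingerprint(clip_peaks):
        for title, song_t in index.get(key, []):
            # A true match produces many hits with the same time offset.
            votes[(title, song_t - clip_t)] += 1
    if not votes:
        return None
    (title, _offset), _count = votes.most_common(1)[0]
    return title

# Made-up peak lists as (time_index, freq_bin) pairs.
songs = {
    "Song A": [(0, 10), (1, 25), (2, 40), (3, 12), (4, 33), (5, 18)],
    "Song B": [(0, 7), (1, 30), (2, 7), (3, 22), (4, 41), (5, 9)],
}
index = build_index(songs)

# A clip of Song A starting at time 2, plus one stray noise peak at (1, 5).
clip = [(0, 40), (1, 12), (2, 33), (3, 18), (1, 5)]
result = identify(index, clip)
```

Each lookup is a dictionary access, so the cost depends on the number of hashes in the clip rather than on the number of songs in the database. The stray noise peak produces hashes that match nothing, so it does not change the result.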
Aside from being helpful for average listeners who hear a song they like, music identification apps also help shape the music world.
Radio stations and streaming services often use the data regarding what people are Shazam-ing the most to figure out what tracks are being listened to by the public. This is helpful because it indicates a song’s catchiness and potential popularity, regardless of the artist. When you identify a song with the app, you’ll immediately see how many people have also tried to identify it.
Since the rise of Shazam, a handful of competitors have also popped up. SoundHound claims to be able to identify a song simply from you singing or humming it, with mixed results. There’s also a song identifier integrated into voice apps such as Google Assistant that works very similarly to Shazam’s system.