Pitch tracking

Using audio features to control a synthesiser

Audio can also be used as a control signal in its own right. In this tutorial we will look at a technique for extracting the frequency of an audio input signal and using that to control the pitch of a synthesiser.

Table of contents

  1. Extracting features from an audio signal
  2. Hardware setup
  3. The code
  4. Practice tasks

Extracting features from an audio signal

Once captured, an audio signal is a rich source from which we can extract many different types of musical information. In the audio world this is known as “feature extraction”, and the features range from fundamental characteristics of the audio wave, like its loudness, pitch and timbre, to higher-level musical descriptors, like key signature, tempo, and orchestration. In this example we are going to use an object in Pure Data which allows us to derive the pitch and loudness of our sound input and use that to control a separate synthesiser. First it is worthwhile dipping our toes into a little mathematics.

One of the most important mathematical findings in history came from the French mathematician Joseph Fourier, and in the world of audio his equations allow us to analyse the constituent elemental parts which make up a complex sound. The Fourier transform has been described as follows: given a soup, the Fourier transform can tell you the ingredients used. It does so by passing the soup through a series of filters which catch all the individual ingredients. Once we know what the ingredients are and their quantities we then have a recipe for recreating that soup!

So how does this relate to sound? When we apply a Fourier transform (in practice a Fast Fourier Transform, the efficient algorithm commonly used to compute it) we can break a complex sound down into a series of constituent frequencies, each with its own magnitude. First let’s consider what a sound wave actually is.

When we capture a sound wave through a microphone we get a representation of the variation in amplitude over time. The amplitude we capture corresponds to the oscillation of air particles, which move back and forth because of the pressure changes that sound creates in the atmosphere. If we then play this signal through a speaker we are moving the speaker cone back and forth in accordance with this amplitude, translating it back into pressure waves in the air. When it comes to analysing an audio signal, raw amplitude values on their own are not very informative, as they only tell us about the loudness of the signal.

We can gain much more insight about the sound by transforming it into the frequency domain using the Fourier transform. This lets us know the different frequencies which are present in our signal and their relative magnitudes. From here we can start looking at more musical descriptors of the signal such as the pitch of a note played on an instrument, or by looking at the relative distribution of frequencies we can find out things about the timbre of the instrument, and perhaps we can even identify whether it is a trombone, cello, or harp from the frequency content of each note, even if they share the same pitch.
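Since a Pure Data patch can't be shown inline as text, here is a minimal sketch of the idea in Python instead (using NumPy; the 44.1 kHz sample rate and the 4096-sample window matching the patch described below are assumptions for illustration): synthesise a pure tone, transform it into the frequency domain, and read the dominant frequency back from the strongest FFT bin.

```python
import numpy as np

# Assumed parameters: a 220 Hz sine sampled at 44.1 kHz, analysed
# with a 4096-sample window (the same window size used in the patch).
sample_rate = 44100
window_size = 4096
freq = 220.0

t = np.arange(window_size) / sample_rate
signal = np.sin(2 * np.pi * freq * t)

# Transform into the frequency domain and find the strongest bin.
spectrum = np.abs(np.fft.rfft(signal))
peak_bin = int(np.argmax(spectrum))

# Convert the bin index back to a frequency in Hz.
detected = peak_bin * sample_rate / window_size
```

The detected frequency is only as precise as the bin spacing (sample rate divided by window size, here roughly 10.8 Hz), which is exactly the resolution tradeoff discussed in the analysis section below.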


The Fourier transform is also fundamental for visualising the spectral content of a sound wave, like you would see in a spectrogram. Check out Sonic Visualiser, a brilliant programme for visualising audio signals.

Hardware setup

In this example we will keep things simple and use only an electret mic capsule connected to the left audio input channel.

As mentioned in a previous tutorial, the electret capsule consists of a small membrane inside the metal casing. This membrane is similar to a drum skin, or our ear drums. As sound waves hit the top of the capsule the membrane vibrates, moving back and forth at the same speed as the fluctuations of the air pressure caused by the sound energy. As the membrane moves back and forth it generates a fluctuating voltage which we can capture on Bela – this is our audio signal!

Note that the electret microphone capsule is polarised, meaning that it must be oriented correctly on the breadboard. If you look at the bottom of the microphone where the pins are you will see that one of the pins is connected to the outer casing of the capsule - this is your ground pin. We’ll connect the other pin to 3.3V through a 2.2K resistor to give the mic power. This pin is also where we take our reading.

Note that in this example we are connecting the mic to the audio input. To do so you’ll need a jumper that has a pin on one side and a socket on the other.

The code

Find the pitch-detection sketch in the Pure Data section of the Examples tab of the Bela IDE.


For this example to work without audio dropouts it is necessary to increase the block size to 256 or above. This can be done in the project settings tab of the IDE.

1. Preparing the audio input

The key object in this patch is sigmund~ which is a Swiss army knife for signal analysis in Pure Data. Before sending our audio signal to be analysed we first want to focus our signal on the range of frequencies we would like to analyse.

We pass our audio input through a series of filters. First we pass the signal through two high pass filters with a cutoff of 88Hz. This will help us remove some lower frequency rumbles from our audio signal before analysis. Remember that a high pass filter lets frequencies above the cutoff pass through and rejects frequencies below it. Passing our audio input through two of these high pass filters in series steepens the roll-off and makes the effect more pronounced.

We then do the same with a low pass filter with a cutoff of 300Hz to remove some higher pitches which we’re not interested in analysing (we’re going to use our voice as the input signal to control this synthesiser).
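As a rough illustration of this filter chain, here is a sketch in Python rather than Pd. The simple one-pole filters below are stand-ins chosen for readability, not the filters the patch actually uses; only the 88 Hz and 300 Hz cutoffs come from the patch.

```python
import numpy as np

sample_rate = 44100  # assumed sample rate for this sketch

def one_pole_lowpass(x, cutoff_hz):
    # Simple one-pole low pass: y[n] = y[n-1] + a * (x[n] - y[n-1])
    a = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / sample_rate)
    y = np.zeros_like(x)
    acc = 0.0
    for n in range(len(x)):
        acc += a * (x[n] - acc)
        y[n] = acc
    return y

def one_pole_highpass(x, cutoff_hz):
    # A high pass is the input minus its low-passed copy.
    return x - one_pole_lowpass(x, cutoff_hz)

def band_limit(x):
    # Two 88 Hz high passes in series, then a 300 Hz low pass,
    # mirroring the filter chain in the patch.
    x = one_pole_highpass(one_pole_highpass(x, 88.0), 88.0)
    return one_pole_lowpass(x, 300.0)

t = np.arange(8192) / sample_rate
rumble = np.sin(2 * np.pi * 30.0 * t)   # low-frequency rumble
voice = np.sin(2 * np.pi * 200.0 * t)   # roughly a vocal fundamental

rumble_out = band_limit(rumble)
voice_out = band_limit(voice)

# Measure steady-state peaks (skip the filters' settling transient).
rumble_peak = np.max(np.abs(rumble_out[4096:]))
voice_peak = np.max(np.abs(voice_out[4096:]))
```

Running this, the 30 Hz rumble comes out heavily attenuated while the 200 Hz tone passes through largely intact, which is exactly the behaviour we want before handing the signal to the analysis stage.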

2. Analysing audio

The sigmund~ object performs the Fourier transform we mentioned above and has a series of arguments which define its behaviour. Check out the help file for a full list of the possible arguments. In our case we are defining the size of the analysis window (4096 samples) and asking for pitch and envelope to be given as outputs. In terms of the size of the analysis window, the general rule of thumb is that a smaller window size will result in better temporal resolution but with less accurate frequency information. The opposite is true for a larger analysis window, which will result in temporal smearing but with better frequency detection. In our case we have chosen a compromise between the two, and depending on the use case it can be worth experimenting with different window sizes.
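This tradeoff is easy to quantify: the spacing between FFT bins is the sample rate divided by the window size, while the time span of one analysis window is the window size divided by the sample rate. A quick sketch, assuming a 44.1 kHz sample rate:

```python
sample_rate = 44100  # assumed sample rate

def analysis_tradeoff(window_size):
    # Frequency resolution: spacing between FFT bins, in Hz.
    freq_resolution = sample_rate / window_size
    # Temporal resolution: how much audio one window spans, in ms.
    window_ms = 1000.0 * window_size / sample_rate
    return freq_resolution, window_ms

small = analysis_tradeoff(1024)  # finer in time, coarser in frequency
ours = analysis_tradeoff(4096)   # the compromise used in this patch
```

At 4096 samples the bins are spaced roughly 10.8 Hz apart and each window covers about 93 ms of audio; quartering the window to 1024 samples quarters the window time but also quarters the frequency resolution.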

The pitch outlet of the sigmund~ object gives us the MIDI note of the fundamental frequency which is detected. Try singing into the microphone and see what is printed out on the console of the IDE. You should see the pitch and amplitude of the sung note.
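The conversion between frequency in Hz and MIDI note number follows the standard equal-temperament formula, where note 69 is A4 at 440 Hz. A small Python sketch of the maths (not part of the patch, just for reference):

```python
import math

def freq_to_midi(freq_hz):
    # MIDI note number from frequency: 69 is A4 = 440 Hz,
    # and each octave (doubling of frequency) spans 12 notes.
    return 69 + 12 * math.log2(freq_hz / 440.0)

def midi_to_freq(midi_note):
    # The inverse mapping, from MIDI note number back to Hz.
    return 440.0 * 2 ** ((midi_note - 69) / 12)
```

So a sung A4 at 440 Hz comes out as MIDI note 69, and middle C (MIDI note 60) corresponds to roughly 261.6 Hz.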

3. Controlling amplitude

To control the amplitude of the note we create a simple gate which is triggered when the signal is over a certain loudness threshold. To change the ramp up and down time of the envelope it is necessary to edit the message box above the [line~] object. Currently it is set to 30 ms.
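Here is a hedged sketch of that gate logic in Python (the threshold value and function names are made up for illustration; only the 30 ms ramp time comes from the patch), with a small helper that mimics what [line~] does when it receives a "target time" message:

```python
def line_ramp(start, target, ramp_ms, sample_rate=44100):
    # Emulates Pd's [line~]: move linearly from start to target
    # over ramp_ms milliseconds, one value per audio sample.
    n = max(1, int(ramp_ms * 0.001 * sample_rate))
    step = (target - start) / n
    return [start + step * (i + 1) for i in range(n)]

def gate(loudness, threshold=0.1, current_gain=0.0):
    # Simple threshold gate: ramp the gain towards 1 when the
    # analysed loudness is over the threshold, back to 0 otherwise.
    target = 1.0 if loudness > threshold else 0.0
    return line_ramp(current_gain, target, 30.0)  # 30 ms, as in the patch

ramp_up = gate(0.5)                        # loud input: gain ramps towards 1
ramp_down = gate(0.01, current_gain=1.0)   # quiet input: gain ramps back to 0
```

Changing the 30.0 here corresponds to editing the message box above [line~] in the patch; longer ramps give a softer attack and release, shorter ones a snappier gate.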

Practice tasks

Task 1: Check the microphone is working

Run the sketch, plug your headphones into the audio adapter cable connected to the audio output port, and check for sound. If you can't hear anything, note that the audio input is also connected to the scope in the browser, so you can open the scope and see if the signal shows up there (launch the scope using the button in the toolbar). If you're not seeing anything on the scope, check the orientation of your mic capsule. You can also increase the gain on the audio input by increasing PGA Gain Level in the Settings menu of the Bela IDE (note that this is in decibels).

Task 2: Add reverb to the output

In the patch we have a ready made plate reverb which can be added to the output of the signal. It is the [pd platereverb] object. Connect it just before the audio output to hear the effect. This will help smooth out any discontinuities from the frequency analysis.

Task 3: Split the signal into high or low

Trigger a print message saying "HIGH" for notes over 400Hz and one saying "LOW" for notes below this frequency. The challenge here is getting the readings stable enough to produce a consistent result. Try only taking a reading when a higher amplitude threshold is passed, or look at the note argument of [sigmund~], which only outputs a pitch once a given frequency has been held for a certain amount of time. This could be used to control LEDs or even motors to make a robot whose movement is controlled by the pitch of an instrument!
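One possible approach, sketched in Python for illustration (the hold_frames idea is a crude stand-in for the behaviour of sigmund~'s note output; all names here are hypothetical):

```python
def stable_classifier(split_hz=400.0, hold_frames=5):
    # Only report "HIGH" or "LOW" once the same class has been seen
    # hold_frames times in a row, so one noisy reading can't flip
    # the output back and forth.
    state = {"last": None, "count": 0}

    def update(freq_hz):
        label = "HIGH" if freq_hz > split_hz else "LOW"
        if label == state["last"]:
            state["count"] += 1
        else:
            state["last"], state["count"] = label, 1
        # Return the label only once it is considered stable.
        return label if state["count"] >= hold_frames else None

    return update
```

Feeding this a stream of pitch readings, a single stray low reading in a run of high notes gets ignored instead of immediately triggering "LOW".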