Pages coming from (2004-2013):

Music and New Technologies - Conservatory "A. Casella" in L'Aquila - Italy

Laboratory for pitch / formants analysis using WaveSurfer

Please read the lecture about the voice and its timbre for the definitions of the parameters and phenomena mentioned here. There you can also find a link to the WaveSurfer download page.

First of all, you need a file containing some pronounced vowels. You can record yourself using WaveSurfer, or use a previously recorded wave file. In either case, we suggest recording a spoken version of each vowel and a sung version at different pitches, to make the game interesting. Use a short mono recording at a sampling frequency of 16 kHz, which is quite enough. You can also record a musical instrument, of course. Winds and bowed strings work better, because they have pronounced formants due to the sound box, the tube or the bell.
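If you start from a previously recorded file, it can be useful to check that it really is mono at 16 kHz before opening it. The following is a minimal sketch using only the Python standard library; the helper names (write_test_tone, describe_wav) are made up for this example, and the test tone merely stands in for a real recording.

```python
import io
import math
import struct
import wave

def write_test_tone(target, freq_hz=220.0, seconds=0.5, rate=16000):
    """Write a mono, 16-bit PCM sine tone to a path or file-like object."""
    n = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n)
    )
    with wave.open(target, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(rate)   # 16 kHz by default
        w.writeframes(frames)

def describe_wav(source):
    """Return (channels, sample_rate, duration_in_seconds) of a WAV file."""
    with wave.open(source, "rb") as w:
        return w.getnchannels(), w.getframerate(), w.getnframes() / w.getframerate()

# Stand-in for a real recording: an in-memory test tone.
buf = io.BytesIO()
write_test_tone(buf)
buf.seek(0)
channels, rate, duration = describe_wav(buf)
print(channels, rate, duration)
```

With a real recording, pass its file name to describe_wav instead of the in-memory buffer.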

WaveSurfer is multiplatform, so in the following we will use platform-neutral terminology (at least for Windows and MacOS), perhaps at the expense of precision.

Now install WaveSurfer. Strictly speaking, it doesn't require any installation at all: simply decompress the executable into any reasonable folder, then put a link (alias) to the executable on the desktop so that you can launch it easily.

This is the WaveSurfer main Window:


Take into account that WSRF (as we will call it from now on) is configurable, namely it can be adapted to the specific task you have in mind.

Create a new file using the corresponding toolbar button or the menu File->New, or open an existing one using the open button. In either case, you will see a configuration window listing the standard or previously saved WSRF configurations.

Select "Speech analysis". WSRF will place three analysis panes on your screen, plus a final overall waveform pane. If you have opened an existing file, you will have to wait a while for WSRF to do its calculations.

Waveform Pane: This is a classic waveform plot, such as you can find in any sound editor. This pane is synchronized with the following two panes, and shows only the portion of the file on which you have zoomed in. The last pane behaves differently: it always shows the entire file.

Spectrogram and formants Pane: synchronized with the waveform pane, it shows the spectrogram with the first four formant tracks superimposed: red for F1, green for F2, blue for F3, yellow for F4. You can change these colors, if you want, in the preferences pane (see later).

Pitch Pane: Here comes the plot of the pitch (be aware that it is the pitch, not the fundamental: the algorithms used detect the pitch even if the fundamental is missing, from the even spacing between partials).

Waveform "overall" Pane: Here the entire file is always visible as a waveform. The selected portion is highlighted and acts as a handle that you can move around to see different portions of the file in the first three panes.
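The remark made for the Pitch pane, that a pitch can be detected even when the fundamental is missing, can be tried out with a small sketch. The code below uses a plain autocorrelation estimator, which is an assumption chosen for simplicity and not the ESPS algorithm used by WSRF. The test signal contains only the partials at 400, 600 and 800 Hz, so there is no energy at 200 Hz, yet their even 200 Hz spacing gives the waveform a period of 1/200 s.

```python
import math

def autocorr_pitch(samples, rate, fmin=60.0, fmax=500.0):
    """Return rate / lag for the lag (in samples) with the largest autocorrelation."""
    lag_min = int(rate / fmax)
    lag_max = int(rate / fmin)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation: longer lags use fewer terms, which
        # favors the shortest period over its multiples.
        value = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return rate / best_lag

rate = 8000
partials_hz = (400.0, 600.0, 800.0)   # harmonics 2, 3, 4 of a missing 200 Hz fundamental
signal = [
    sum(math.sin(2 * math.pi * f * n / rate) for f in partials_hz)
    for n in range(800)               # 0.1 s of signal
]
pitch = autocorr_pitch(signal, rate)
print(pitch)   # 200.0
```

The estimate is 200 Hz because the waveform repeats every 40 samples, even though no partial sits at that frequency.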

How to record a file: 

Check File->Preferences .... Be sure that every parameter is OK, and in particular that the selected audio input and output are the ones you want. Press the classic red button, wait for the system to start, and speak (or play). When finished, press the square button (as on any real or virtual recorder).

How to select a portion of the waveform visible in the first three panes:

Click on the left end point, then shift-click on the right end point, or drag from the left point. The selected zone will show a light yellow background.

How to zoom in:

Menu View->Zoom to selection. 

How to adjust the visibility of the spectrogram:

Two factors affect the appearance of the spectrogram: the window (frame) length, and the contrast and brightness. As to the frame length, take into account that the spectrum is discrete, showing a point at every frequency that is a multiple of the inverse of the frame length. If your signal contains a frequency that coincides exactly with this grid, it will be shown as a point at maximum amplitude. If a frequency falls between two adjacent points, it will be seen as a (sampled) sinc and the peak value will be lowered (to be precise, you will find two peak values). Moreover, the longer the frame, the better the frequency resolution, and the more likely it is that any frequency actually present comes close to an existing point.
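The effect of the frequency grid can be checked numerically. The sketch below is only an illustration under stated assumptions: a rectangular window, a frame of 256 samples at 16 kHz (values chosen for this example, not WSRF defaults), and a direct DFT written in plain Python. It compares a sine lying exactly on a grid point with one lying halfway between two points.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0 .. N/2 of a real frame (rectangular window)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

def sine_frame(freq_hz, rate, n):
    """One frame of n samples of a unit-amplitude sine."""
    return [math.sin(2 * math.pi * freq_hz * t / rate) for t in range(n)]

rate, n = 16000, 256
bin_spacing = rate / n   # grid step = inverse of the frame length (62.5 Hz)

on_grid = dft_magnitudes(sine_frame(20 * bin_spacing, rate, n))     # exactly on point 20
off_grid = dft_magnitudes(sine_frame(20.5 * bin_spacing, rate, n))  # halfway to point 21

# Height of the off-grid peak relative to the on-grid one.
peak_ratio = max(off_grid) / max(on_grid)
# Indices of the two largest points for the off-grid sine.
top_two = sorted(sorted(range(len(off_grid)), key=off_grid.__getitem__)[-2:])
print(bin_spacing, round(peak_ratio, 2), top_two)
```

For the off-grid sine the two largest points are the neighbors 20 and 21, and the peak falls to roughly 2/π of the on-grid value. Doubling n halves bin_spacing, which is the better frequency resolution mentioned above.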

To modify the window length and the brightness / contrast, right-click on the pane (supposedly Apple-click on the Macintosh). This is the interesting two-column menu which will appear:


Spectrum Section: shows a window with power spectrum.

LTAS: shows a window with the  Long Term Average Spectrum.

Spectrogram Controls ...: shows a control window in which you can modify the window length and the brightness / contrast and see the effects in real time.


Create Pane, Delete Pane: you can delete or add panes to your window.

Apply configuration / Save configuration: You can save the configuration (pane choice) for future reference, or apply an existing one.

Properties: shows, and lets you modify, every parameter related to the pane on which you clicked.

Open Data File: You can choose the file to show in the pane.

Save Data File: saves the data shown in the pane into a file. Formants go into a file with the suffix frm, pitch into a file with the suffix f0. Both are plain ASCII text files, one row per frame.

File formats:

You can choose the column separator in the pane properties: Properties->Data Plot->Column delimiter

Pitch File (.f0):

Each row is a frame (by default 10 msec). You can change the frame length in Pane Properties ->Pitch Contour->Frame interval.

If the algorithm used for pitch tracking is ESPS (the default), each row has four columns: pitch in Hz, probability of voicing, local root-mean-square (RMS) measurement, and the normalized peak value of the cross-correlation found when computing the pitch.
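A saved pitch file can then be processed outside WSRF. The following sketch assumes that the columns are separated by whitespace (this depends on the Column delimiter setting), that row i starts at i times the frame interval, and that a frame counts as voiced when its probability of voicing exceeds 0.5. The function names and the sample rows are invented for illustration.

```python
def parse_f0(text, frame_interval=0.01):
    """Parse .f0 rows; row i is assumed to start at i * frame_interval seconds."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    frames = []
    for index, fields in enumerate(rows):
        pitch_hz, voicing, local_stat, xcorr_peak = (float(v) for v in fields[:4])
        frames.append({
            "time_s": index * frame_interval,
            "pitch_hz": pitch_hz,
            "voicing": voicing,          # probability of voicing
            "local_stat": local_stat,    # third column (local measurement)
            "xcorr_peak": xcorr_peak,    # normalized cross-correlation peak
        })
    return frames

def mean_voiced_pitch(frames, threshold=0.5):
    """Average pitch over frames considered voiced (threshold is an assumption)."""
    voiced = [f["pitch_hz"] for f in frames
              if f["voicing"] > threshold and f["pitch_hz"] > 0]
    return sum(voiced) / len(voiced) if voiced else None

# Made-up rows in the four-column layout, not real WSRF output.
sample = """\
0.0 0.0 12.5 0.21
0.0 0.0 30.1 0.35
218.0 1.0 950.2 0.91
220.0 1.0 1010.7 0.95
222.0 1.0 990.3 0.93
"""
frames = parse_f0(sample)
mean_pitch = mean_voiced_pitch(frames)
print(len(frames), mean_pitch)   # 5 220.0
```

For a real file, read its contents with open(path).read() and pass the frame interval set in the pane properties.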

Formants file (.frm):

Each row corresponds to a frame (default 10 msec). You can change the frame length in Pane Properties->Formants->Frame interval. One column per formant, with frequencies in Hz.
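To summarize a steady vowel from a formants file, one option is to take the median of each column, which is less sensitive than the mean to isolated tracking errors. This is a sketch under the assumptions that columns are whitespace-separated and that the first four columns hold F1 to F4. The function names and the sample rows, including the deliberate F1 glitch in the third row, are invented for illustration.

```python
import statistics

def parse_frm(text, n_formants=4):
    """Parse .frm rows into lists of formant frequencies in Hz, one list per frame."""
    return [
        [float(v) for v in line.split()[:n_formants]]
        for line in text.splitlines()
        if line.strip()
    ]

def median_formants(frames):
    """Median of each formant column across all frames."""
    return [statistics.median(column) for column in zip(*frames)]

# Made-up rows, not real WSRF output; row 3 contains an F1 tracking glitch.
sample = """\
710 1180 2550 3400
705 1200 2560 3420
350 1210 2540 3410
720 1190 2570 3390
715 1195 2555 3405
"""
medians = median_formants(parse_frm(sample))
print(medians)   # [710.0, 1195.0, 2555.0, 3405.0]
```

To analyze a single vowel in a longer file, slice the list of frames first, using the frame interval to convert times into row indices.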

Critical notes:

When tracking formants, WSRF is limited to detecting the frequencies alone. This is probably all you need in Linguistics. From the point of view of timbre, we know that more parameters are important and necessary: the peak gain value, the Q, and the presence of antiresonances (for nasality, but not only for that). For instance, the sound box of many musical instruments (such as the violin, viola, cello ...) shows a resonance-antiresonance pair at low frequency (relative to the range of the instrument) due to the resonance of the volume of the box. This behavior is similar to that of quartz crystals or of LRC circuits, and the LPC analysis currently used cannot capture it.
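To see what a resonance-antiresonance pair adds to a bare formant frequency, here is a small numerical sketch. It models the pair as a digital second-order section with one conjugate pole pair (the resonance) and one conjugate zero pair (the antiresonance). The frequencies and bandwidths (280 Hz and 320 Hz, 20 Hz each) are hypothetical values chosen for illustration, not measurements of any instrument, and Q is taken as center frequency divided by bandwidth.

```python
import cmath
import math

def conj_pair(freq_hz, bandwidth_hz, rate):
    """Polynomial (1, c1, c2) in z^-1 for a conjugate root pair at freq_hz."""
    r = math.exp(-math.pi * bandwidth_hz / rate)   # root radius set by the bandwidth
    theta = 2 * math.pi * freq_hz / rate           # root angle set by the frequency
    return (1.0, -2 * r * math.cos(theta), r * r)

def gain(freq_hz, zeros, poles, rate):
    """Magnitude of H(z) = zeros(z^-1) / poles(z^-1) on the unit circle."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / rate)   # z^-1 at freq_hz

    def evaluate(c):
        return c[0] + c[1] * z1 + c[2] * z1 * z1

    return abs(evaluate(zeros) / evaluate(poles))

rate = 16000
f_res, f_anti, bandwidth = 280.0, 320.0, 20.0   # hypothetical values

poles = conj_pair(f_res, bandwidth, rate)    # resonance
zeros = conj_pair(f_anti, bandwidth, rate)   # antiresonance

q_factor = f_res / bandwidth
gain_at_resonance = gain(f_res, zeros, poles, rate)
gain_at_antiresonance = gain(f_anti, zeros, poles, rate)
print(q_factor, round(gain_at_resonance, 2), round(gain_at_antiresonance, 2))
```

The response has a peak at the resonance and a dip at the antiresonance. An all-pole model such as LPC has no zero polynomial, so it can describe the peak but not the dip, and a track of peak frequencies alone also discards the gain and the Q.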

Please also look at this analysis of the Italian vowels /a/ /e/ /i/ /o/ /u/ pronounced by myself (first sequence spoken, "vNI" in the annotation pane; second sequence sung, "vI"). Look at the /o/ in both versions: you can see a weak formant between the second and the third ones that is not recognized as such, and in this case the second formant comes very close to the first one (much as happens for the /u/).

Attention! As the automatic tracker can go wrong, WSRF allows you to correct the formant profile by hand, simply by drawing on the track with the mouse. Do not use this pane to select zones, because you would overwrite the formants.

Observe also the unintentional (or better, "spontaneous") coloratura (ascending pitch at the beginning) in the sung vowels.