Laboratory for pitch / formant analysis
Please read the lecture about the voice
and its timbre to get the definitions of the various parameters and
phenomena mentioned here. There you can also find a link to WaveSurfer.
First of all, you need a file with some vowels
pronounced. You can record yourself using WaveSurfer, or use a
previously recorded WAV file. In any case, we suggest recording a
spoken version of each vowel, and a sung version at different pitches, to
make the game interesting. Use a short mono recording at a 16 kHz
sampling frequency, which is quite enough. You can also record a musical
instrument, of course. Wind instruments and bowed strings are better, because of
some pronounced formants due to the sound box, the tube or the
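If you reuse a previously recorded file, you can check that it matches the suggested format (mono, 16 kHz) before loading it. The sketch below uses Python's standard `wave` module, which reads uncompressed WAV files only; `check_recording` is a hypothetical helper of our own, not part of WaveSurfer.

```python
import wave


def check_recording(path: str, expected_rate: int = 16000):
    """Return (ok, info): ok is True if the WAV file is mono at expected_rate."""
    with wave.open(path, "rb") as w:
        info = {
            "channels": w.getnchannels(),
            "rate": w.getframerate(),
            "sample_bytes": w.getsampwidth(),  # 2 means 16-bit samples
            "seconds": w.getnframes() / w.getframerate(),
        }
    ok = info["channels"] == 1 and info["rate"] == expected_rate
    return ok, info
```

For example, calling `check_recording("vowels.wav")` (a hypothetical file name) on a 44.1 kHz stereo file returns `ok == False`, a hint to re-record or resample before the analysis.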
WaveSurfer is multiplatform, so in the following we will use
platform-neutral terminology and vocabulary (at least for
Windows and MacOS), perhaps at the expense of precision.
Now install WaveSurfer. Strictly speaking, it doesn't require
any installation at all: simply decompress the executable into any
reasonable folder, then put a link (alias) to the executable on the
desktop so that you can launch it easily.
This is the WaveSurfer main window:
Keep in mind that WSRF (as we will call it from now on)
is configurable, i.e. it can be
adapted to the specific task you have in mind.
Create a new file using the New button, or with the menu File->New. Or you can open an existing
one using the Open button. In either case, you will see a
configuration window containing the standard or previously saved WSRF configurations.
Select "Speech analysis". WSRF will place the panes described below
on your screen (if you opened an existing file, wait a while to let WSRF
finish its calculations).
Waveform pane: this is a classic waveform plot, like the ones you can
find in any sound editor. This pane is synchronized with the next
two panes, all showing only the part of the file you
have zoomed in on. The last pane behaves differently: it always shows the
entire file.
Spectrogram and formants pane: synchronized with the
waveform pane, it shows the spectrogram with the first four formant
tracks superimposed: red for F1, green for F2, blue for F3, yellow for
F4. You can change these colors in the preferences pane if you want.
Pitch pane: here is the pitch plot. Be aware that it shows the pitch,
not the fundamental: the algorithms used detect the pitch even when the
fundamental is missing, from the even spacing between partials.
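The claim that the pitch can be found without the fundamental can be checked numerically. This minimal sketch uses a plain normalized autocorrelation, a much simpler technique than the ESPS tracker used by WSRF: it builds 50 ms (800 samples at 16 kHz) of a signal made only of partials 2 to 5 of a 200 Hz tone, confirms that there is no energy at 200 Hz, and still finds the strongest periodicity at a lag of 80 samples, i.e. 16000 / 80 = 200 Hz. The search range (60 to 500 Hz) and all helper names are our own assumptions.

```python
import math

SR = 16000  # sampling rate suggested above


def partials(f0, harmonics, n_samples, sr=SR):
    """Sum of unit-amplitude sine partials at k*f0 for k in harmonics."""
    return [sum(math.sin(2 * math.pi * k * f0 * t / sr) for k in harmonics)
            for t in range(n_samples)]


def magnitude_at(x, freq, sr=SR):
    """Normalized magnitude of the correlation of x with a sinusoid at freq."""
    re = sum(v * math.cos(2 * math.pi * freq * t / sr) for t, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * t / sr) for t, v in enumerate(x))
    return math.hypot(re, im) / len(x)


def autocorr_pitch(x, sr=SR, fmin=60.0, fmax=500.0):
    """Return (pitch_hz, r) for the lag with the highest normalized autocorrelation."""
    best_lag, best_r = None, -1.0
    for lag in range(int(sr / fmax), int(sr / fmin) + 1):
        a, b = x[:-lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        r = num / den if den > 0 else 0.0
        # Require a clear improvement, so that multiples of the period
        # (which score almost exactly the same) do not replace the period.
        if r > best_r + 1e-9:
            best_lag, best_r = lag, r
    return sr / best_lag, best_r


x = partials(200.0, [2, 3, 4, 5], n_samples=800)  # no partial at 200 Hz
print(magnitude_at(x, 200.0), magnitude_at(x, 400.0))
print(autocorr_pitch(x))
```

The small tolerance in the comparison keeps the shortest lag when two or three multiples of the period score practically the same; real trackers deal with such octave errors far more carefully.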
Waveform "overall" pane: here the entire file is always visible as
a waveform. The selected portion is shown in relief and acts as a
button you can drag around to see different portions of the file in the
first three panes.
To record a file:
first open File->Preferences... and be sure that every parameter is OK,
particularly that the selected audio input and output are the ones you
want. Press the classic red button, wait for the system to start, then speak
(or play). When finished, press the square button (as in any real or
To select a portion of the waveform visible in the first three panes:
click on the left point, then shift-click on the right point, or drag from
the left point. The selected zone will show a light yellow background.
To zoom in:
View->Zoom to selection.
To adjust the visibility of the spectrogram: two
factors affect the spectrogram's appearance, the window/frame length
and the contrast/brightness. As to the frame length, keep in mind
that the spectrum is discrete, showing a point for every
frequency that is a multiple of the inverse of the frame length (a 20 ms
frame, for example, gives a 50 Hz grid). If your signal contains a frequency
that exactly coincides with this grid, it will
be shown as a point at maximum amplitude. If a frequency falls between
two adjacent points, it will appear as a (sampled) sinc and the peak
value will be pulled down (to be precise, you will find two peak values,
one on each side). Moreover, the longer the frame, the finer the frequency
resolution, and the more likely it is that any frequency actually present
comes close to an existing point.
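The effect of the frequency grid can be reproduced with a direct DFT. In this sketch a rectangular window is assumed for simplicity (the WSRF spectrogram most likely uses a tapered window, so its exact figures differ). A 64-sample frame at 16 kHz has a grid step of 16000 / 64 = 250 Hz: a 1000 Hz tone falls on the grid and peaks at 1.0, while a 1125 Hz tone falls halfway between two points and shows two reduced peaks of roughly 0.6 to 0.7 each (about 2/π in the ideal case). Doubling the frame to 128 samples halves the step to 125 Hz, and 1125 Hz lands exactly on a grid point again. The helper names are our own.

```python
import math


def dft_magnitudes(x):
    """Single-sided DFT magnitudes with a rectangular window.

    Scaled by 2/N so that a unit cosine exactly on a grid point reads 1.0
    (this scaling is not meant for the DC and Nyquist bins).
    """
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(x))
        mags.append(2 * math.hypot(re, im) / n)
    return mags


def tone(freq, n, sr=16000):
    """n samples of a unit-amplitude cosine at freq Hz."""
    return [math.cos(2 * math.pi * freq * t / sr) for t in range(n)]


sr = 16000
for n in (64, 128):            # frame length in samples (4 ms and 8 ms)
    spacing = sr / n           # grid step in Hz = 1 / frame duration
    for f in (1000.0, 1125.0):
        mags = dft_magnitudes(tone(f, n, sr))
        k = max(range(len(mags)), key=mags.__getitem__)
        print(f"N={n:3d} grid={spacing:5.1f} Hz  f={f:6.1f} Hz  "
              f"peak at {k * spacing:.0f} Hz = {mags[k]:.3f}")
```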
To modify the window length and brightness / contrast, right-click on the
pane (presumably Apple-click on the Macintosh). This is the interesting
two-column menu that will appear:
Spectrum Section: shows a window with the power spectrum.
LTAS: shows a window with the Long-Term Average Spectrum.
Spectrogram Controls...: will show a panel where you can
modify the window length and brightness / contrast and see the effects in
Pane, Delete Pane: you can delete or add panes to your window.
Apply configuration / Save configuration: you can save the configuration
(pane choice) for future reference, or apply an existing one.
Properties...: shows and lets you modify every parameter related to the pane on which you right-clicked.
Open Data File: you can choose the file to show in the pane.
Save Data File: saves the data shown in the pane into a file. Formants
will go into a file with suffix .frm, pitch into a file with
suffix .f0. They are all plain-text ASCII files, one row per frame. You can
choose the column separator in the pane properties:
Properties->Data Plot->Column delimiter.
For pitch files, each row is a frame (by default 10 msec). You can change the frame length in
Pane Properties->Pitch Contour->Frame interval.
If the algorithm used for pitch tracking is ESPS (the default), each
row has four columns: pitch in
Hz, probability of voicing, local RMS (root mean square) value, and the
normalized peak value of the cross-correlation found when computing the pitch.
For formant files, each row corresponds to a frame (default 10 msec; you can change the frame length in
Pane Properties->Formants->Frame interval), with one
column per formant. Frequencies are in Hz.
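Since the exported files are plain text with one row per frame, they are easy to post-process. The sketch below is a hypothetical reader, not part of WSRF: it assumes whitespace-separated columns unless you pass the delimiter you chose in the pane properties, and it assumes that row i corresponds to time i × frame interval (the exact time offset of the first frame may differ). The 0.5 voicing threshold in `voiced` is our own choice.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

FRAME_INTERVAL = 0.010  # seconds; the default 10 ms frame interval


@dataclass
class PitchFrame:
    time: float          # assumed: frame index * frame interval
    f0_hz: float         # column 1: pitch in Hz (0 when unvoiced)
    prob_voicing: float  # column 2: probability of voicing
    rms: float           # column 3: local RMS value
    ac_peak: float       # column 4: normalized cross-correlation peak


def _rows(text: str, delimiter: Optional[str]) -> Iterator[List[float]]:
    """Yield each non-empty row as floats; delimiter=None splits on any whitespace."""
    for line in text.splitlines():
        if line.strip():
            yield [float(v) for v in line.strip().split(delimiter)]


def read_f0(text: str, frame_interval: float = FRAME_INTERVAL,
            delimiter: Optional[str] = None) -> List[PitchFrame]:
    """Parse the contents of an exported .f0 file (ESPS tracker, four columns)."""
    return [PitchFrame(i * frame_interval, *cols[:4])
            for i, cols in enumerate(_rows(text, delimiter))]


def read_frm(text: str, frame_interval: float = FRAME_INTERVAL,
             delimiter: Optional[str] = None) -> List[Tuple[float, List[float]]]:
    """Parse the contents of an exported .frm file: (time, [F1, F2, ...]) per frame."""
    return [(i * frame_interval, cols)
            for i, cols in enumerate(_rows(text, delimiter))]


def voiced(frames: List[PitchFrame], min_prob: float = 0.5) -> List[PitchFrame]:
    """Keep frames with nonzero pitch and voicing probability >= min_prob."""
    return [f for f in frames if f.f0_hz > 0 and f.prob_voicing >= min_prob]
```

For example, `voiced(read_f0(open("vowels.f0").read()))` would keep only the voiced frames of a file named `vowels.f0` (a hypothetical name).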
Note that WSRF, when tracking formants, is restricted to detecting their
frequencies alone. This is probably all you need in linguistics. From
the point of view of timbre, however, we know that more parameters are
important and necessary: the peak gain, the Q, and the presence of
antiresonances (for nasality, but not only for that). For instance, the
body of many musical instruments (such as the violin, viola, cello ...) shows
a resonance-antiresonance pair at low frequency (relative to the
tessitura of the instrument) due to the resonance of the air volume inside the
body. This behavior, similar to that of quartz crystals or LRC circuits,
cannot be captured by the currently used LPC analysis, which models the
spectrum with poles (resonances) only and has no zeros to represent antiresonances.
Note also, in this analysis of the Italian vowels /a/ /e/ /i/
/o/ /u/ pronounced by myself (first sequence spoken, labeled "vNI" in the
annotation pane; second sequence sung, labeled "vI"), what happens to /o/
in both versions. You can see a weak formant between the second and
the third ones that is not recognized as such, and in this case the second
formant comes very close to the first one (much as happens
for /u/).
Since the automatic tracker can go wrong, WSRF allows you to correct
the formant track by hand, simply by drawing on it with the
mouse. Do not use this pane to select zones, because you would end up rewriting the formant track.
Observe also the unintentional (or rather, "spontaneous")
coloratura (rising pitch at the beginning) in the sung vowels.
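To quantify such a coloratura, it helps to express the pitch rise in semitones, i.e. 12·log2(f / f_ref), so that rises measured on different vowels or voices can be compared. The sketch below assumes the pitch column of an exported .f0 file has already been read into a list (one value per frame, 0 for unvoiced frames); taking the median pitch after the first 150 ms as the "steady" note is an arbitrary choice of ours, not something WSRF computes.

```python
import math
import statistics
from typing import List, Optional


def semitones(f_hz: float, ref_hz: float) -> float:
    """Interval from ref_hz up to f_hz in equal-tempered semitones."""
    return 12.0 * math.log2(f_hz / ref_hz)


def onset_rise(f0_track: List[float], frame_interval: float = 0.010,
               onset_s: float = 0.15) -> Optional[float]:
    """Semitones from the first voiced frame to the median pitch after the onset.

    f0_track holds one pitch value (Hz) per frame, 0 for unvoiced frames.
    Returns None if no frame is voiced.
    """
    voiced = [f for f in f0_track if f > 0]
    if not voiced:
        return None
    skip = int(round(onset_s / frame_interval))  # onset length in frames
    steady = voiced[skip:] or voiced             # fall back if the note is short
    return semitones(statistics.median(steady), voiced[0])
```

A positive result means the note started below its steady pitch and slid up; a value around 2 corresponds to roughly a whole tone.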