Binaural phase detection with the NI DSP-tool and the Kalman filter


created 05/02/2009 last update 08/02/2009	author: Claude Baumann
Note: A binaural detector can determine the interaural time delay (ITD), which is equivalent to the time difference of arrival (TDOA). Instead of directly observing the time lag between two periodic signals, one can also look at the phase difference.
Main references:
Phase detection NXT robot (explains the algorithm)
Spatial sound localization (uses the cross-correlation method)
Introduction to the scalar Kalman filter (external reference)
Material: NI DSP toolkit (see photo)
Download: DSP project for the binaural phase detector

In this project we present a variant of the binaural phase detection program for the NI-DSP that we shared on this site a few months ago. Besides the fact that the LabVIEW program is now better structured graphically, we use a scalar Kalman filter to extract the time lag between the signals that arrive at the two microphones. The result is quite impressive compared with the FIR filter that we used in the previous version: the Kalman filter has a better response and also requires fewer computation steps. The statistics of the sampled interaural time delays help to improve the result.

In the presence of very weak signals (probably below the noise threshold) the program tends to move back to the middle position, although with significant fluctuations. We already met this issue in our Elektor project (Article: C. BAUMANN, L. KNEIP, Stereo robot ears, Elektor, July/August 2007, p. 13-17), where we added a sensitivity potentiometer to the circuit that tells the microcontroller software to ignore audio noise beneath the threshold. There is also a "relative" option, which is selected through a jumper read by the microcontroller. If it is set, the device considers the position of the sound source as "relative", and the result of the measurement for such weak signals is zero. This could be interesting for a robot that turns its head in the direction of the sound, where the position of the sound source is expressed relative to the orientation of the head. However, imagine a video-camera system that tries to point a fixed-base camera at a sound source: here an "absolute" position should be observed, and in the case of weak signals the system should hold the last valid position in order to avoid swinging of the camera. In a next version we will add this feature to the DSP program.

Consult the following figures if you want to understand how the program works. You should be familiar with LabVIEW.
During program execution the DSP remained connected to the PC, so that the live data could be uploaded and the data graphs produced. The scalar Kalman filter follows the equations and notation developed at http://www.swarthmore.edu/NatSci/echeeve1/Ref/Kalman/ScalarKalman.html .
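For readers who do not have LabVIEW at hand, the scalar Kalman filter step can be sketched in a few lines of Python. This is only an illustrative sketch, not the LabVIEW diagram of the project: the class name ScalarKalman, the assumption that the state is the time lag in sample units, and all numeric values (a, h, q, r, initial x and p) are our own choices for the example and are not taken from the DSP program.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Scalar Kalman filter; all default values are assumptions for illustration."""
    a: float = 1.0   # state transition (1.0: the lag is assumed constant between blocks)
    h: float = 1.0   # measurement gain
    q: float = 0.01  # process noise variance (assumed)
    r: float = 4.0   # measurement noise variance (assumed)
    x: float = 0.0   # current estimate of the state (time lag in sample units)
    p: float = 1.0   # variance of the current estimate

    def update(self, z: float) -> float:
        """Feed one raw lag measurement z and return the filtered estimate."""
        # Prediction step (no control input in this sketch)
        x_prior = self.a * self.x
        p_prior = self.a * self.a * self.p + self.q
        # Correction step with the Kalman gain k
        k = p_prior * self.h / (self.h * self.h * p_prior + self.r)
        self.x = x_prior + k * (z - self.h * x_prior)
        self.p = p_prior * (1.0 - self.h * k)
        return self.x


# Usage: noisy lag readings scattered around 7 sample units
kf = ScalarKalman()
for z in [6.0, 8.0, 7.5, 6.5, 7.0] * 100:
    estimate = kf.update(z)
```

Each update needs only a handful of multiplications and one division, which may explain why such a filter can be cheaper than an FIR filter with many taps. The ratio of q to r decides how quickly the estimate follows a change of the measured lag.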
An alternative method multiplies the shifted signals in a limited cross-correlation function. (We call it limited because not all the products need to be computed and no normalization is required.) We already used this method in some earlier RCX sound localization projects. On the RCX it is slower than the zero-crossing method, but on the DSP there is no real difference in execution time. The code is very similar, but the resulting cues are much more stable.
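The limited cross-correlation can be sketched as follows. This is a plain Python illustration under our own assumptions, not the DSP code: the value of MAX_LAG, the function names, the sign convention (a positive lag means that the right channel is delayed) and the synthetic test signal are all hypothetical. The result is stored in a list whose index i stands for the lag i - MAX_LAG, so that no negative array index is needed.

```python
import random

MAX_LAG = 20  # assumed value, chosen only for this example


def limited_xcorr(left, right, max_lag=MAX_LAG):
    """Unnormalized cross-correlation for the lags -max_lag..max_lag.

    Element i of the returned list belongs to the lag (i - max_lag).
    """
    n = min(len(left), len(right))
    if n <= 2 * max_lag:
        raise ValueError("need more than 2*max_lag samples per channel")
    corr = []
    for lag in range(-max_lag, max_lag + 1):
        # Sum only where both shifted indices stay inside the block
        corr.append(sum(left[k] * right[k + lag]
                        for k in range(max_lag, n - max_lag)))
    return corr


def best_lag(corr, max_lag=MAX_LAG):
    """Return the lag (in sample units) at which the correlation is maximal."""
    i_max = max(range(len(corr)), key=lambda i: corr[i])
    return i_max - max_lag


# Usage: the right channel is the left channel delayed by 5 samples
rng = random.Random(1)
left = [rng.gauss(0.0, 1.0) for _ in range(512)]
right = [0.0] * 5 + left[:-5]
demo_lag = best_lag(limited_xcorr(left, right))
```

With the synthetic signals above, demo_lag comes out as the 5-sample delay that was built into the right channel.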
The DSP allows rapid multiplication. Since the cross-correlation method delivers additional information that can be exploited, we will now concentrate on this method. From earlier projects we know that it is sufficient to limit the cross-correlation function to the interval [-MAX_LAG, MAX_LAG]. LabVIEW and most computer languages do not allow negative array indices, so we use the index range [0, 2*MAX_LAG] instead.

In the final version of the ITD-detecting program we now use Blauert's degree of coherence [J. BLAUERT, Spatial Hearing, MIT Press, 1983, p. 201]. The program therefore computes the RMS values for each set of 2048 data points that are sampled on the two channels. Note that Blauert's degree of coherence in fact represents the normalized correlation function at the time lag where this function is maximal. The new program also works with a noise threshold: signals with strengths beneath that threshold are ignored. The combination of the scalar Kalman filter with the degree of coherence and the noise suppression stabilizes the output.

We fixed the DSP module on a rotating base. A cell phone (Sony Ericsson G700) was placed at 1 m from the Speedy-33 at azimuth beta_0=125° and elevation omega=-13°. The phone was playing MP3-encoded music. The cross-correlation program yielded a time lag of 7 units (correctly -7, since the signs are inverted in the program because of the graphs). Then the Speedy-33 support was turned by -30° around the z-axis. The new azimuth beta_1=155° was calculated from the new time lag of -12.4 units using the Kneip/Baumann algorithm that is explained in Spatial sound localization. Note that one unit has a duration of 1/48000 s, because the sampling frequency is 48 kHz. The estimated location is beta_1=149° and omega*=15°, within the theoretical error limits that are fixed according to the mentioned algorithm.
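The combination of RMS values, noise threshold and normalized correlation maximum can be illustrated with the following Python sketch. It is an approximation under stated assumptions, not the project's LabVIEW code: MAX_LAG and NOISE_RMS are invented values, the function names are hypothetical, and the normalization simply divides the correlation maximum by the number of summed products and by the two block RMS values.

```python
import math
import random

BLOCK_SIZE = 2048  # samples per channel and block, as stated in the text
MAX_LAG = 20       # assumed value
NOISE_RMS = 0.01   # assumed noise threshold, in the same units as the samples


def rms(samples):
    """Root mean square of one block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def coherence_and_lag(left, right, max_lag=MAX_LAG, noise_rms=NOISE_RMS):
    """Return (degree of coherence, lag), or None for a block below the threshold."""
    rms_l, rms_r = rms(left), rms(right)
    if rms_l <= noise_rms or rms_r <= noise_rms:
        return None  # too weak: the caller may hold the last valid position
    n = min(len(left), len(right))
    span = range(max_lag, n - max_lag)
    best_value, best_lag = -math.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        value = sum(left[k] * right[k + lag] for k in span)
        if value > best_value:
            best_value, best_lag = value, lag
    # Approximate normalization: RMS is taken over the whole block,
    # the products only over the inner span
    coherence = best_value / (len(span) * rms_l * rms_r)
    return coherence, best_lag


# Usage: identical noise on both channels, right channel delayed by 7 samples
rng = random.Random(2)
left = [rng.gauss(0.0, 1.0) for _ in range(BLOCK_SIZE)]
right = [0.0] * 7 + left[:-7]
demo_coherence, demo_lag = coherence_and_lag(left, right)
silent = coherence_and_lag([0.0] * BLOCK_SIZE, [0.0] * BLOCK_SIZE)
```

A coherence close to 1 indicates that both channels carry the same signal, while low values suggest that the detected lag should not be trusted. In a complete program the lag of each accepted block would then be passed on to the scalar Kalman filter.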