created 05/02/2009 last update 08/02/2009  author: Claude Baumann 
Note: A binaural detector can determine the interaural time delay (ITD), which is equivalent to the time difference of arrival (TDOA). Instead of directly observing the time lag between two periodic signals, one can also look at the phase difference.


In this project we present a variant of the binaural phase detection
program on the NI DSP that we shared on this site a few months ago. Besides
the fact that the LabVIEW program now has a cleaner graphical structure, we
are using a scalar Kalman filter to extract the time lag between the
signals that arrive at the two microphones. The result is quite impressive
compared to the FIR filter that we used in the previous version: the
Kalman filter has a better response and also requires fewer computation
steps. The statistics of the sampled interaural time delays help to
improve the result.
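
As an illustration of the filtering step, here is a minimal Python sketch of a textbook scalar Kalman filter (predict, gain, update) applied to a stream of measured time lags. It is not a transcription of the LabVIEW diagram: the class name and the values chosen for the process noise `q`, the measurement noise `r`, and the constants `a` and `h` are assumptions made for this example only.

```python
class ScalarKalman:
    """Scalar Kalman filter for a slowly varying quantity such as a time lag.

    All default parameter values are illustrative assumptions.
    """

    def __init__(self, q=0.01, r=4.0, x0=0.0, p0=1.0, a=1.0, h=1.0):
        self.q = q    # process noise variance (assumed)
        self.r = r    # measurement noise variance (assumed)
        self.a = a    # state transition constant
        self.h = h    # measurement constant
        self.x = x0   # current state estimate
        self.p = p0   # current estimate variance

    def update(self, z):
        """Process one measurement z and return the new estimate."""
        # Predict: project state and variance one step ahead.
        x_prior = self.a * self.x
        p_prior = self.a * self.a * self.p + self.q
        # Gain: how much the new measurement is trusted.
        k = self.h * p_prior / (self.h * self.h * p_prior + self.r)
        # Update: correct the prediction with the measurement residual.
        self.x = x_prior + k * (z - self.h * x_prior)
        self.p = p_prior * (1.0 - self.h * k)
        return self.x


if __name__ == "__main__":
    kalman = ScalarKalman()
    for z in [7, 9, 6, 7, 5, 8, 7, 7]:   # made-up noisy lag readings
        print(round(kalman.update(z), 3))
```

Each update costs only a handful of multiplications and additions, which is consistent with the remark that the filter needs few computation steps.
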
In the presence of very weak signals, probably below the noise threshold, the program tends to move back to the middle position, although there are significant fluctuations. We already met this issue in our Elektor project (Article: C. BAUMANN, L. KNEIP, Stereo robot ears, Elektor, July/August 2007, p. 1317), where we added a sensitivity potentiometer to the circuit that tells the microcontroller software to ignore audio noise beneath the threshold. There is also the option "relative", which can be set through a jumper that is read by the microcontroller. If it is set, the device considers the position of the sound source as "relative", and the result of the measurement is zero. This could be interesting for a robot that moves its head in the direction of the sound, where the position of the sound source is expressed relative to the orientation of the head. However, imagine a video camera system that tries to turn a fixed-base camera towards a sound source: here an "absolute" position should be observed, and in the case of weak signals the system should hold the last valid position in order to avoid swinging of the camera. In a next version we will add this feature to the DSP program.

Consult the following figures if you want to understand how the program works. You should be familiar with LabVIEW. During program execution the DSP was still connected to the PC, in order to have the live data upload and to produce the data graphs.

The scalar Kalman filter follows the equations and notations that are developed at http://www.swarthmore.edu/NatSci/echeeve1/Ref/Kalman/ScalarKalman.html .
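
The difference between the "relative" and the "absolute" handling of weak signals discussed above can be sketched in a few lines of Python. This is only an illustration of the idea, not the microcontroller or DSP code; the function name `gate_lag`, its parameters, and the mode strings are hypothetical.

```python
def gate_lag(lag, signal_rms, threshold, last_valid, mode="absolute"):
    """Decide which lag to output for one block of samples.

    Returns a tuple (output_lag, new_last_valid).
    All names and the interface are assumptions for this sketch.
    """
    if signal_rms >= threshold:
        # Signal is strong enough: accept the measurement and remember it.
        return lag, lag
    if mode == "relative":
        # Relative mode: a weak signal yields the middle position (zero).
        return 0.0, last_valid
    # Absolute mode: hold the last valid position to avoid swinging.
    return last_valid, last_valid


if __name__ == "__main__":
    last = 0.0
    # (measured lag, RMS level) pairs; the values are made up.
    for lag, level in [(7.0, 0.30), (1.5, 0.02), (-3.0, 0.01), (6.5, 0.25)]:
        out, last = gate_lag(lag, level, threshold=0.05, last_valid=last)
        print(out)
```
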

An alternative method multiplies the shifted signals in a limited cross-correlation function. (We call it limited because not all the products need to be computed, and no normalization is carried out.) We already used this method in some earlier RCX sound localization projects. On the RCX it is slower than the zero-crossing method, but on the DSP there is no real difference in execution time. The code is very similar, but the resulting cues are much more stable.
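
A limited cross-correlation of this kind can be sketched in plain Python as follows. The sketch only evaluates a small window of lags around zero and applies no normalization; the value of `MAX_LAG`, the function names, and the sign convention (a positive lag means that the second channel is delayed) are assumptions for this example and may differ from the LabVIEW program.

```python
MAX_LAG = 16  # assumed window size, in samples


def limited_xcorr(left, right, max_lag=MAX_LAG):
    """Unnormalized cross-correlation for the lags -max_lag..+max_lag.

    Result index i corresponds to lag i - max_lag, so the returned list
    has 2 * max_lag + 1 entries and needs no negative indices.
    """
    n = len(left)
    result = [0.0] * (2 * max_lag + 1)
    for i in range(2 * max_lag + 1):
        lag = i - max_lag
        acc = 0.0
        # Only use samples for which the shifted index stays in range.
        for k in range(max_lag, n - max_lag):
            acc += left[k] * right[k + lag]
        result[i] = acc
    return result


def best_lag(left, right, max_lag=MAX_LAG):
    """Return the lag (in samples) at which the correlation is largest."""
    c = limited_xcorr(left, right, max_lag)
    return c.index(max(c)) - max_lag
```

For two channels of N samples this costs roughly (2*MAX_LAG + 1) * (N - 2*MAX_LAG) multiply-accumulate operations, which is the kind of workload a DSP handles quickly.
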
The DSP allows rapid multiplication. Since the cross-correlation method
delivers additional information that can be exploited, we will now
concentrate on this method. From earlier projects we know that it is
sufficient to limit the cross-correlation function to the interval
[-MAX_LAG, MAX_LAG]. LabVIEW and most computer languages do not allow
negative bounds for arrays, so we choose [0..2*MAX_LAG] instead. In the
final version of the ITD-detecting program, we now use Blauert's degree of
coherence [J. BLAUERT, Spatial Hearing, MIT Press, 1983, p. 201]. The
program therefore computes the RMS values for each set of 2048 data points
that are sampled on the two channels. Note that Blauert's degree of
coherence in fact represents the normalized correlation function at the
point, or time lag, where this function is maximal. The new program also
works with a noise threshold. Signals with strengths beneath that
threshold are ignored.
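
The following Python sketch combines the three ingredients named above: RMS values per block, a noise threshold, and the normalized correlation at the lag where it is maximal. It is an approximation for illustration, not the LabVIEW code: the function names, the threshold handling, and the choice to normalize with the RMS of the whole block are assumptions.

```python
import math


def rms(samples):
    """Root mean square of one block of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def coherence_and_lag(left, right, max_lag, noise_threshold):
    """Return (degree_of_coherence, lag), or None for sub-threshold blocks.

    The correlation is stored at index i = lag + max_lag, i.e. in the
    range 0..2*max_lag, and normalized by the product of the RMS values.
    """
    rms_left, rms_right = rms(left), rms(right)
    if rms_left < noise_threshold or rms_right < noise_threshold:
        return None  # block is treated as noise and ignored
    span = range(max_lag, len(left) - max_lag)
    best_index, best_value = 0, -math.inf
    for i in range(2 * max_lag + 1):
        lag = i - max_lag
        value = sum(left[k] * right[k + lag] for k in span)
        if value > best_value:
            best_index, best_value = i, value
    # Approximate normalization: block RMS instead of per-window energy.
    coherence = best_value / (len(span) * rms_left * rms_right)
    return coherence, best_index - max_lag
```

A degree of coherence close to 1 indicates that both channels carry the same signal apart from the delay, while a low value suggests that the detected lag should not be trusted.
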
The combination of the scalar Kalman filter with the degree of coherence and noise suppression stabilizes the output. We fixed the DSP module on a rotating base. A cell phone (Sony Ericsson G700) was placed at 1 m from the Speedy33 at azimuth beta_0=125° and elevation omega=13°. The phone was playing MP3-encoded music. The cross-correlation program yielded a time lag of 7 units (the correct value has the opposite sign, since signs are inverted in the program because of the graphs). Then the Speedy33 support was turned by 30° around the z-axis, so that the true new azimuth was beta_1=155°. The new position was calculated from the new time lag of 12.4 units using the Kneip/Baumann algorithm that is explained on the page "Spatial sound localization". Note that one unit has a duration of 1/48000 s, because the sampling frequency is 48 kHz. The estimated location is beta_1*=149° and omega*=15°, within the theoretical error limits that are fixed according to the mentioned algorithm.
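
To give a feeling for the magnitudes involved, the lag units quoted above can be converted into seconds and into an acoustic path difference. The short Python sketch below does only this unit conversion; it does not reproduce the Kneip/Baumann algorithm. The speed of sound of 343 m/s is an assumption (dry air at about 20 °C), and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 48_000      # sampling frequency stated in the text
SPEED_OF_SOUND_M_S = 343.0   # assumed value, dry air at about 20 °C


def lag_to_seconds(lag_units):
    """Convert a lag given in sample units into seconds."""
    return lag_units / SAMPLE_RATE_HZ


def lag_to_path_difference(lag_units, speed=SPEED_OF_SOUND_M_S):
    """Convert a lag in sample units into a path difference in metres."""
    return lag_to_seconds(lag_units) * speed


if __name__ == "__main__":
    for lag in (7, 12.4):
        microseconds = lag_to_seconds(lag) * 1e6
        centimetres = lag_to_path_difference(lag) * 100
        print(f"{lag} units: {microseconds:.1f} us, {centimetres:.2f} cm")
```

Under these assumptions, a lag of 7 units corresponds to about 146 microseconds or roughly 5 cm of path difference, and 12.4 units to about 258 microseconds or roughly 8.9 cm.
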