page created : August 30th, 2008      last update : January 10th, 2010 (mistake in code)  author : Claude Baumann

Abstract: This page presents a fast method for digital phase detection of two signals in the time domain. These signals are digitalized at high sampling speed using a 1-bit ADC. A certain number of samples are stored in the system memory. Then the digital values pass an eXclusive OR-gate (XOR) in order to produce a bit pattern indicating which samples do not correspond between both signals. The logical result of the operation is integrated over time. The XORing is repeated with one signal being bitwise shifted against the other. The minimum of the integrate and its argument is determined. The latter is equivalent to the initial time-lag of both signals. The method can easily be implemented into small microcontroller systems, requiring only a small portion of memory. It is very fast compared to other digital methods like the cross-correlation or the sum of absolute difference method. The efficacy of the algorithm is illustrated through the implementation into the LEGO™ RCX module that is used in combination with a stereo audio amplifier forming a binaural sensor that is capable of discriminating the time-lag between the signals that arrive on two microphones. The phase detection just takes 100ms in the case of 2x1000 samples (sampling process included)
Robot: Nic_3, a spatial sound localizing robot.
Video: A small radio is emitting sound waves in the horizontal plane. The robot listens and turns the head (rather abruptly) towards the sound direction. The room reverberation causes some errors in cues. So, the robot must retry, but rapidly fixes the radio with its laser beam. The motion could be improved by ramping the motor power. There also are a few issues with the fact that sometimes Nic_3 listens to its own motor noise, because the NXT does not yet send a message back to the RCX telling it that the movement has been completed. Another problem has appeared: we are no longer certain that the audio sampling rate still is 36kHz. We will need some more testing to determine this rate exactly. We suppose that through the additional code in the sampling process, the rate dropped to about 33-34kHz. (From the code the rate cannot be deduced easily, since the RCX H8 instruction execution depends on the access speed to RAM. Thus, another experiment is needed.) The repercussion is that the computation of the azimuth is biaised. ==> Stand-by for the further developments !!!

Previous work:
2D sound localizing based on interaural level differences
Sensor based on the precedence effect
3D precedence effect sensor
Detail of a robot using the previous sensor
binaural sensor
2D sound localizing based on interaural time differences (Nic_1)
Article: C. BAUMANN, L. KNEIP, Stereo robot ears, Elektor, July/August 2007, p. 13-17 (presents a digital ITD sensor)
L. KNEIP, C. BAUMANN, Binaural model for spatial sound localization based on interaural time delay cues ..., unpublished work that comprises the 3 DOFs LEGO™ RCX based robot Nic_2, Feb. 2007
L. KNEIP, C. BAUMANN, Binaural model for artificial spatial sound localization based on interaural time delays and movements of the interaural axis, The Journal of the Acoustical Society of America -- November 2008 -- Volume 124, Issue 5, pp. 3108-3119 (link)
Nic_3, a spatial sound localizing robot
Binaural phase detection with NI-DSP toolkit and the Kalman filter
Introduction to the Kalman filter
Bill of material:
  • Nic_3 robotic base serves as the support for the binaural sensor. (link not active yet)
    • LEGO™ NXT: motor control, connection to LEGO™ RCX
    • LEGO™ RCX: 2 channel ADC (36kHz), cross-correlation, Kneip/Baumann algorithm
    • HiTechnic IR-link (invisible on the photo)
    • 3 LEGO™ NXT servo-motors
    • RCX compatible 2-channel audio amplifier
    • RCX compatible laser pointer
    • NXT / RCX wire
    • About 100 standard LEGO™ pieces
Figure 1: Two coherent audio signals -in the sense of [BLAUERT1997, p. 201]- are sampled with the RCX using the ULTIMATE ROBOLAB® high speed sampling feature through the Nic_3 two-channel audio sensor. (The RCX has a 10-bit ADC that converts two signals at a sampling rate of 36kHz.) The signals are a out of phase. The green and the blue overlapping bars represent the most significant bit (msb) of each ADC channel over time. These bits are XORed, which means that the result only is a logical "1", if both arguments are different from each other. The "1"s are counted and listed into a histogram.

Figure 2: The signals are shifted against each other. In this example, the bit-bars almost completely overlap. The XOR-operation distillates those bits that do not correspond. They are counted and noted on the histogram list.

Figure 3: Only the relevant time-lags are considered. (In the example, the signals, being recorded from the binaural sensor, may not lag for more than 24 samples on each side, the microphone spacing being 20cm.) The abscissa of the minimum on the histogram indicates the initial time-lag of the signals. The non-linear shape of the histogram changes in function of the signal form. This makes a function of the integrated XORed data to the time-lag unreliable. (Note that analog phase detectors that linearize the time-lag proportionally to the XORed "1"s integrate present the issue of being unprecise for complex signals.) However, the argument of the minimum does not change its location, if the signals change in form or amplitude.

Figure 4: If the signals are noisy, the number of logical "1"s changes a lot from one phase detection process to the next. The determined minimum of the histogram also varies, but if the data passes a FIR filter (depth 25 in this example), the result is a rather stable value.

Function summary:
    • Two analog signals are digitalized the most roughly possible through a 1-bit ADC. This is realized either by just considering the msb from a higher order ADC value or by using a simple comparator stage, where the reference voltage exactly is the middle of the maximum amplitudes. Both channels should be calibrated to present the same DC-level.
    • The bits are bundled into bytes that gradually fill the digital signal arrays. This storing is very memory-friendly. If 1000 values are sampled, just 125 bytes are filled. (The number of samples therefore should be a multiple of 8).
    • Most microcontrollers have bit-wise operations that are applied to complete bytes. The bytes in the digital signal arrays are XORed byte per byte and the "1"s are counted. This process corresponds to an elementary integration of the XORed bit pattern. The bit-signals of one channel are shifted bit-wise and the procedure is repeated.
    • Each time the integrate has been calculated, it is compared to the last minimum (initialized with the highest possible value). If the new value is smaller than the previous minimum, it replaces the minimum and the actual time-lag is noted.
    • Once all the relevant bit rotations have been executed, the final time-lag value corresponds to the initial lag of both signals. It therefore is not necessary to store the integrated values.
    • Practically the integration speed is increased, if the bits are not directly counted through the bytes, which would require 8 additions plus bit selections Instead a special look-up table is consulted that lists the number of bits corresponding to a certain byte value. For example, decimal 92 equals the binary "1011100" and has 4 "1"s. Written differently : f (92) = 4. The integration then only needs the call of the look-up table and one addition per byte. Thus, the number of operations is drastically reduced.
    • The maximum time-lag should also be a multiple of 8.
    • In order to have the best computing speed possible, the phase detection is programmed partly in Assembler.
    • The RCX is programmed using the ULTIMATE ROBOLAB® (UR) environment. In order to optimize the code, parts of it have been written in Assembler. The NXT is programmed using the LabVIEW® NXT toolkit.
    • Consult the graphical code:
      • Main RCX test program : This program continuously determines cues as the average of 10 values. From the time-lag the angle a (cf. ELEKTOR article) is computed according to a variant of the common formula relating the azimuth to the time-delay [VON HORNBOSTEL & WERTHEIMER1920] and is sent to the NXT via HiTechnic IR-link. (Note that the RCX toggles the outputB mode between brake and float states in order to tell the NXT that the RCX is ready for sending data. The NXT sensor port1 is configured as a LEGO RCX touch sensor and reads 0, 1, in function of the RCX outputB state. This is very important, since the RCX works critically or with interrupts disabled during the sampling and detection process. Note that RCX sensor1 powers the stereo audio sensor. OutputA powers the laser. The shorthand formula to calculate the speed of sound in dependency of the ambient air temperature is taken from [HARTMANN2000, p.611]
      • NXT program: uses the Hitechnic driver for the IR-link (with a few bugs fixed.)
    • We denote :
      • N : unsigned integer   //number of samples
      • R : array [0..N] of bits  //contains the sampled signal from the right channel
      • L : array [0..N] of bits  //holds the sampled signal from left channel
      • x_R : bit //current value from the right channel
      • x_L : bit //current value from the left channel
      • MAX_LAG : unsigned integer  //defines the relevant phase shifts
      • minimum : unsigned integer   //minimum of the integrated XORed values
      • lag : unsigned integer  //=argmin
      • tau : integer //=MAX_LAG - lag (the sign of tau indicates whether the left or the right signal is preceeding
      • sum : unsigned integer //counts the logical "1"s after the XOR process
      • i,j,k : unsigned integers  //loop indices
There are two successive procedures:
    • The sampling process that obeys the following sequence (in pseudo-code):
      • Initialize 1-bit ADC
      • FOR k:= 0 TO (N-1) DO
        • Start ADC
        • Wait for ADC completion
        • R[k]:=x_R
        • L[k]:=x_L
    • The phase detection that executes the steps:
      • minimum:= 0xFFFF  //initialize with maximal possible value
      • lag:=MAX_LAG  //initialize with the middle position
      • FOR j:= 0 TO (2*MAX_LAG-1) DO
        • sum:= 0  //clear the integrate value
        • FOR i:= 0 TO (N-2*MAX_LAG-1) DO
          • IF R[i+MAX_LAG] XOR L[i+j] = TRUE THEN sum:=sum+1
        • IF sum<minimum THEN
          • minimum:= sum
          • lag:= j
      • tau:= MAX_LAG - lag
Because it is not possible with the H8/3292 (the microcontroller that forms the heartz of the RCX) to address single bits in arrays, the transcription into executable code is a bit more complicated. As already pointed out, the practical implementation uses the H8 features of bit-operations that are applied to whole bytes. Also, since the H8 10-bit ADC-channels must be read anyway before being able to reactivate the ADC device, the down-sampling to a 1-bit ADC is done by answering the question, if the upper 8-bit value is greater than 127, which is equivalent to the question, if the current signal value is greater than 2.5V.
We denote :
  • M : unsigned integer = N DIV 8
  • R_8 : array [0..M] of bytes
  • L_8 : array [0..M] of bytes
  • x_R8 : upper 8 bits of the H8/3292 10bit ADC right channel
  • x_L8 : upper 8 bits of the H8/3292 10bit ADC left channel
  • bit_index : byte  //selects bit in current byte
  • g : unsigned integer //index for arrays
  • c_flag : bit  //copy of the carry-flag
  • _carry : bit  //H8 system carry-flag
  • noo(x:byte): byte look-up table of the function count the logical "1"s in the byte
The sampling procedure is redesigned:
  • Initialize 10-bit ADC
  • bit_index:=7
  • x_R8 = 127  //initialize the channel value to the neutral position
  • x_L8 = 127  //idem (==> first value is useless)
  • g:=0
  • FOR k:= 0 TO (N-1) DO
    • Start ADC
    • //the ADC process takes 133 H8 states during which the CPU may do some work,
    • //so analyse the data from the previous iteration
    • IF x_R8>127 THEN bit[R_8[g],bit_index]:= 1 ELSE bit[R_8[g],bit_index]:= 0
    • IF x_L8>127 THEN bit[L_8[g],bit_index]:= 1 ELSE bit[L_8[g],bit_index]:= 0
    • IF bit_index=0 THEN
      • bit_index:= 7
      • g:=g+1
    • ELSE
      • bit_index:=bit_index-1
    • Wait for ADC completion
    • x_R8 = HI(ADC channel 0)  //get ADC data (upper 8 bits) for next iteration
    • x_L8 = HI(ADC channel 1)
And the phase detection is rewritten as:
  • minimum:= 0xFFFF  //initialize with maximal possible value
  • lag:=MAX_LAG  //initialize with the middle position
  • FOR j:= 0 TO 2*MAX_LAG-1 DO
    • sum:= 0  //clear the integrate value
    • FOR i:= 0 TO (M-MAX_LAG DIV 4-1) DO
      • sum:= sum + noo(R_8[i+MAXLAG DIV 8] XOR L_8[i])
    • IF sum < minimum THEN
      • minimum:= sum
      • lag:= j
    • //now shift all the bits in the L_8 array 1 digit to the left
    • c_flag:=0
    • FOR k2:=N-1 DOWNTO 0 DO
      • _carry:= c_flag
      • Rotate_left_with_extended_carry( L_8[k2] )
      • c_flag:= _carry
  • tau:= MAX_LAG - lag
H8/3292 Assembler code of the sampling and phase detection process:
              //Fast phase detector subroutine for binaural sensor
              //Author CB, August 2008, version 1.0
              //expect differences from pseudo-code at
              //especially all the loops decrement
              //therefore the bit_index must start with 0
              //because arrays are being filled from the right
              //also variable "g" is replaced with pointers
              //For easy debugging, all variables are in
              //global memory or registers
              //Assembler partly generated by ULTIMATE ROBOLAB
              //sub tau
label begin_of_sub_tau
              //****************begin of sampling part*****************              
              //disable all interrupts
              //during sampling
              orc #0x80
              //set sensor1 to 9V
              mov.b @0xFFBB,r6L
              or #0x4,r6L
              mov.b r6L,@0xFFBB
              //define array
              //L_8 @0xCEB8
              //define array
              //R_8 @0xCF38
              //Configure ADC
              mov.b #0x19,r6L
              mov.b r6L,@0xFFE8
              //Initialize registers for sampling process
              //r0 : pointer to L_8-array ; initialized with last byte
              //r1 : pointer to R_8-array ...
              //r2 : pointer to R_8 root address
              //r3L and r3H : last sampled values; initialized with 0x7F
              //r4L : bit_index
              //r5/r6 : standard use for moves
              mov.w #0xCF37,r0
              mov.w #0xCFB7,r1
              mov.w #0x7F7F,r3
              mov.b #0x0,r4L
              mov.w #0xCF38,r2
              //For k=(M-1)+R_pointer downto 0+R_pointer
label beginloop_k_loop
              cmp.w r2,r1
              bcs endloop_k_loop
              //start ADC
              mov.b @0xFFE8,r6L
              or #0x20,r6L
              mov.b r6L,@0xFFE8
              //IF x_L > 127 then set current bit
              //else clear the bit
              cmp.b #0x7F,r3H
              bhi true_1080
              jmp false_1080
label true_1080
              bset r4L,@r0
              jmp endif_1080
label false_1080
              bclr r4L,@r0
Label endif_1080
              //IF x_R > 127 then set current bit
              //else clear the bit
              cmp.b #0x7F,r3L
              bhi true_1079
              jmp false_1079
label true_1079
              bset r4L,@r1
              jmp endif_1079
label false_1079
              bclr r4L,@r1
Label endif_1079
              //IF bit index = 7 then reset bit index
              //and decrement the pointers, else
              //increment the bit index
              cmp.b #0x7,r4L
              beq true_1078
              jmp false_1078
label true_1078
              mov.b #0x0,r4L
              subs #0x1,r0
              subs #0x1,r1
              jmp endif_1078
label false_1078
              inc r4L
Label endif_1078
              //Wait for ADC completion
label wait_for_A_D_completion
              mov.b @0xFFE8,r6L
              btst #7,r6L
              beq wait_for_A_D_completion
              //clear flags
              and #0x5F,r6L
              mov.b r6L,@0xFFE8
              //get x_L from ADC channel (upper 8 bits)
              mov.w @0xFFE0,r6
              mov.b r6H,r3H
              //get x_R from ADC channel (upper 8 bits)
              mov.w @0xFFE2,r6
              mov.b r6H,r3L
              //NEXT iteration
              jmp beginloop_k_loop
label endloop_k_loop
              //re-enable interrupts
              andc #0x7F
              //******************end of sampling part*******************
              //******************begin of phase detection part**********
              //if this saction is made critical, the computing time is
              //reduced by half.
              //now registers have different purposes
              //r0 : i loop index
              //r1 : k2 loop index
              //r2 : =0 used for comparison
              //r3L<0> : carry_flag
              //r4 : unused
              //r5/r6 : normal use for moves
              //r6 also used as index to the noo look-up table
              // MIN = 0xFFFF
              mov.w #0xFFFF,r6
              mov.w r6,@0xD020
              // LAG = 0
              mov.w #0x0,r6
              mov.w r6,@0xD024
              // J = 48
              mov.w #0x30,r6
              mov.w r6,@0xD026
label beginloop_1082
              //IF J > 0
              mov.w #0x0,r5
              mov.w @0xD026,r6
              cmp.w r5,r6
              bhi fordo_1082
              jmp endloop_1082
label fordo_1082
              subs #0x1,r6
              mov.w r6,@0xD026
              // SUM = 0
              mov.w #0x0,r6
              mov.w r6,@0xD028
              //i-loop with phase detection
              mov.w #0x7A,r0
              mov.w #0x0,r2
label beginloop_i_loop
              cmp.w r2,r0
              bhi fordo_i_loop
              bra endloop_i_loop
label fordo_i_loop
              subs #0x1,r0 
              //get the bytes from L and R
              mov.b @(0xCEB8,r0),r5L
              mov.b @(0xCF3B,r0),r6L
              //compute the bits, where the bytes are different
              //and add NOO-function  result (look-up table @0xCD86)
              //with displacement in r6 to SUM
              xor r5L,r6L
              mov.b #0x0,r6H
              mov.b @(#0xCDB6,r6),r5L
              mov.b #0x0,r5H
              mov.w @0xD028,r6
              add.w r5,r6
              mov.w r6,@0xD028
              bra beginloop_i_loop
label endloop_i_loop
              //IF SUM < MIN
              mov.w @0xD020,r5
              mov.w @0xD028,r6
              cmp.w r5,r6
              bcs true_1083
              jmp false_1083
label true_1083
              // LAG = J
              mov.w @0xD026,r6
              mov.w r6,@0xD024
              // MIN = SUM
              mov.w @0xD028,r6
              mov.w r6,@0xD020
              jmp endif_1083
label false_1083
Label endif_1083
              mov.b #0x0,r3L
              mov.w #0x80,r1
              mov.w #0x0,r2
label beginloop_k2_loop
              cmp.w r2,r1
              bhi fordo_k2_loop
              bra endloop_k2_loop
label fordo_k2_loop
              subs #0x1,r1
              //get carry flag from previous iteration
              bld #0x0,r3L
              //get the iterated byte out of L_8
              mov.b @(0xCEB8,r1),r6L
              //rotate the byte with extend carry left
              rotxl r6L
              //store carry flag for next iteration
              bst #0x0,r3L
              //store rotated byte
              mov.b r6L,@(0xCEB8,r1)
              bra beginloop_k2_loop
label endloop_k2_loop
              jmp beginloop_1082
label endloop_1082
              // TAU = 24
              mov.w #0x18,r6
              mov.w r6,@0xD032
              //TAU =TAU - LAG
              mov.w @0xD024,r6
              jsr sys_type_cast_U16_to_I16
              mov.w r6,r5
              mov.w @0xD032,r6
              sub.w r5,r6
              mov.w r6,@0xD032
              //******************end of phase detection part************                         
              rts 0
DSP module:

    • Using the LabVIEW DSP toolkit, the SPEEDY33 DSP module (Hyperception) has been programmed to execute the phase detection that is exposed here.
    • See the LV code that produces the following result. Note that with this method of the ITD segregation works best in the presence of complex signals:

Improvement (added August 2009):
    • We have been able to drastically improve the performance of the phase detection. In fact, we had observed that occasionally this method produces very strange measures. We first thought that simple filtering would help - and in fact it does -, but, except for the Kalman filter method (cf. Binaural phase detection with NI-DSP toolkit and the Kalman filter), filtering did not really stabilize the respons. So, we went back by studying the signals and we found that early reflections and reverberation partly corrupts the cross-correlation function. In normal cross-correlation we could use Blauert's degree of coherence, in order to prune bad signals. But for all that, in this phase detection method, there is no analysis of the signals. Another observation was the casual presence of strong high frequency components that produces more than one local maximum (here: minimum) in the region of interest (ROI). If one bad maximum has higher amplitude than the correct one, the method would yield the bad one.
    • Therefore we changed the point of view by introducing a probabilistic approach:
      • We can assume that over time the correct cross-correlation maximum will be statistically yielded more frequently than the bad one.
      • So, over a certain time we draw a histogram of the occurrences of the different maxima.
      • From this chart we choose the argmax, and we can be rather certain that it will be the correct one. In fact, this approach is a maximum-likelihood method.
      • Note that early reflections and reverberation still causes problems, although considerably less.
      • In order to allow a continuous process, we use an IR-filter for the histogram.
      • In Ultimate ROBOLAB (for the RCX) the new function changes to: (Note that we now use stack variables, except for the arrays. Also notice that the SUMS array is no longer needed.)

    • Many hardware phase detectors are based on the eXclusive OR process, cf [SIEGL2003, pp. 430]. They are acceptable in the case of simple shaped low-noise signals. However, since they yield a value that is proportional to the integrate of the instant XOR function, they suffer from the high variability of the integrate value. Even if used with a Phase-Lock-Loop PLL circuitry -cf [SIEGL2003, pp. 430]- noisy or complex signals may lock the PLL on a local minimum corresponding to a phase that not necessarily is the real phase difference of the initial signals.
    • From the software point of view, Greedy algorithms, that often are applied to determine optimums, fail for the same reason, locking onto a local optimum that not necessarily represents the global optimum, cf [SCHROEDER2001, pp. 185]