Acoustic source location using a microphone array

Imported: 23 Feb '17 | Published: 22 Oct '02

Pi Sheng Chang, Aidong Ning, Michael G. Lambert, Wayne J. Haas

USPTO - Utility Patents


An apparatus and method in a video conference system provide accurate determination of the position of a speaking participant by measuring the difference in arrival times of a sound originating from the speaking participant, using as few as four microphones in a 3-dimensional configuration. In one embodiment, a set of simultaneous equations relating the position of the sound source to each microphone, and relating the distances of the microphones to one another, is solved off-line and programmed into a host computer. In one embodiment, the set of simultaneous equations provides multiple solutions and the median of such solutions is picked as the final position. In another embodiment, an average of the multiple solutions is provided as the final position.
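The last step described above, reducing several candidate solutions to one final position, can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `combine_positions`, the sample coordinates, and the choice to take the median or average of each coordinate independently are all assumptions made for the example.

```python
from statistics import mean, median


def combine_positions(candidates, method="median"):
    """Reduce several (x, y, z) candidate solutions to one final position.

    Assumption of this sketch: each coordinate is reduced independently.
    """
    if not candidates:
        raise ValueError("need at least one candidate position")
    if method == "median":
        reducer = median
    elif method == "mean":
        reducer = mean
    else:
        raise ValueError("method must be 'median' or 'mean'")
    # zip(*candidates) groups all x values, all y values, and all z values.
    return tuple(reducer(axis) for axis in zip(*candidates))


# Hypothetical candidate solutions; the third one is an outlier in x.
candidates = [(1.0, 2.0, 1.2), (1.1, 2.1, 1.3), (5.0, 2.0, 1.1)]
print(combine_positions(candidates, "median"))  # (1.1, 2.0, 1.2)
print(combine_positions(candidates, "mean"))
```

In this made-up data the median ignores the outlying x value of 5.0, while the average is pulled toward it, which suggests why the two embodiments can give different final positions.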



FIG. 1 shows an embodiment of the present invention in video teleconference system 100.


FIG. 2 shows a Cartesian coordinate system 150 used in conjunction with video teleconference system 100 to illustrate the present invention.

FIG. 3 is a block diagram representing the functions of time delay estimation and voice activity detection module 106 of FIG.


FIG. 4 shows an alternative approach to computing time delay using an adaptive filter


FIG. 5 shows the steps of a Cross-Power Spectrum Phase (CPSP) computation.

FIG. 6 shows a plot of the time-domain cross-correlation coefficients resulting from a CPSP computation.
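For readers unfamiliar with CPSP, the sketch below shows the textbook form of the computation: transform both microphone frames, form the cross-power spectrum, keep only its phase, transform back, and read the delay from the peak of the resulting time-domain coefficients. It is not taken from FIG. 5, whose exact steps are not reproduced in this text. The names `dft` and `cpsp_delay`, the naive transform, and the circular-shift test signal are assumptions chosen to keep the example dependency-free.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, value in enumerate(x):
            acc += value * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def cpsp_delay(sig_a, sig_b, eps=1e-12):
    """Return (delay, coeffs); delay > 0 means sig_b lags sig_a (in samples)."""
    n = len(sig_a)
    if len(sig_b) != n:
        raise ValueError("both frames must have the same length")
    spec_a = dft(sig_a)
    spec_b = dft(sig_b)
    # Cross-power spectrum of the two frames.
    cross = [b * a.conjugate() for a, b in zip(spec_a, spec_b)]
    # Normalise each bin to unit magnitude so only the phase remains.
    whitened = [c / max(abs(c), eps) for c in cross]
    # Back to the time domain: these are the correlation coefficients.
    coeffs = [v.real for v in dft(whitened, inverse=True)]
    peak = max(range(n), key=coeffs.__getitem__)
    # Indices past the midpoint represent negative (circular) delays.
    delay = peak if peak <= n // 2 else peak - n
    return delay, coeffs


# Hypothetical test frame: sig_b is sig_a delayed by two samples.
a = [0.0] * 16
a[3], a[4] = 1.0, 0.5
b = a[-2:] + a[:-2]
delay, coeffs = cpsp_delay(a, b)
print(delay)  # 2
```

The delay in samples converts to a time delay by dividing by the sampling rate, and to a range difference by multiplying that time by the speed of sound.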

FIG. 7 shows 16 configurations, each representing three range differences obtained from pairs of microphones [...] 107C and [...].



FIGS. 8a and 8b show the analytical solutions for speaker location (x, y and z) solved using equation groups y134 and y234, respectively.

FIG. 9 illustrates the distance "frame", which is the horizontal span covered by an image, in relation to the zoom angle "zoom".
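One plausible reading of this relation is plain pinhole-camera geometry: if the zoom angle is the full horizontal angle of view and the subject is at a known distance along the optical axis, the horizontal span is twice the distance times the tangent of half the angle. The sketch below rests on that assumption only; the formula, the function names `frame_width` and `zoom_for_frame`, and the sample numbers do not come from the patent text.

```python
import math


def frame_width(distance, zoom_deg):
    """Horizontal span seen at `distance` for a full view angle `zoom_deg`.

    Assumed geometry: frame = 2 * distance * tan(zoom / 2).
    """
    if not 0.0 < zoom_deg < 180.0:
        raise ValueError("zoom angle must be between 0 and 180 degrees")
    return 2.0 * distance * math.tan(math.radians(zoom_deg) / 2.0)


def zoom_for_frame(distance, frame):
    """Inverse relation: view angle (degrees) that yields a given span."""
    return math.degrees(2.0 * math.atan(frame / (2.0 * distance)))


# Hypothetical numbers: a speaker 3 m away and a desired 1.5 m wide image.
print(zoom_for_frame(3.0, 1.5))
print(frame_width(3.0, 90.0))
```

Under this reading, a camera controller that knows the speaker's distance could pick the zoom angle that frames a span of the desired width.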



FIGS. 10(a) and 10(b) illustrate a method for adjusting the detected speaker position to minimize steering a camera to an erroneously calculated sound source position.



FIGS. 11(a) and 11(b) illustrate a method for minimizing undesirable camera movements by dividing the field seen by a camera into 3-dimensional zones.


1. A method for locating a speaking participant in a video conference, comprising:

2. A video conference system, comprising:

3. A video conference system, comprising:

4. A system as in claim 3, wherein said predetermined boundary is determined based on the minimum and maximum heights expected of a conference participant.

5. A video conference system, comprising:

6. A system as in claim 5, wherein said camera control module, upon receiving said final position from said position determination module, compares said final position with a current position of said camera, and wherein when said final position is separated from said current position by less than a predetermined number of zones, said camera control module does not direct said camera to said final position.

7. A system as in claim 6, wherein when said final position and said current position both correspond to the same speaker, said predetermined number of zones is two zones.

8. A system as in claim 6, wherein when said final position and said current position correspond to positions of different speakers, said predetermined number of zones is one zone.
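The zone rule of claims 6 to 8 can be sketched as a small decision function. The claims state only the thresholds (two zones for the same speaker, one zone for different speakers). Everything else here is an assumption made for illustration: zones are indexed by integer triples, separation is measured as the largest per-axis index difference, and the caller already knows whether the two positions belong to the same speaker. The names `zone_separation` and `should_move_camera` are hypothetical.

```python
def zone_separation(zone_a, zone_b):
    """Number of zones between two zone indices.

    Assumption: largest per-axis index difference (Chebyshev distance).
    """
    return max(abs(a - b) for a, b in zip(zone_a, zone_b))


def should_move_camera(current_zone, final_zone, same_speaker):
    """Apply the thresholds of claims 7 and 8 to the rule of claim 6."""
    # Claim 7: same speaker -> two zones; claim 8: different speakers -> one.
    threshold = 2 if same_speaker else 1
    # Claim 6: no move when the separation is less than the threshold.
    return zone_separation(current_zone, final_zone) >= threshold


# Hypothetical zone indices (x, y, z).
print(should_move_camera((3, 1, 0), (4, 1, 0), same_speaker=True))   # False
print(should_move_camera((3, 1, 0), (5, 1, 0), same_speaker=True))   # True
print(should_move_camera((3, 1, 0), (4, 1, 0), same_speaker=False))  # True
```

With these thresholds, a speaker shifting into an adjacent zone does not trigger a camera move, while a different speaker in an adjacent zone does.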