Computationally Efficient and Robust BIC-Based Speaker Segmentation

Kotti, M.; Benetos, E.; Kotropoulos, C.

Kotti, M., Benetos, E. & Kotropoulos, C. (2008). Computationally Efficient and Robust BIC-Based Speaker Segmentation. IEEE Transactions on Audio, Speech & Language Processing, 16(5), pp. 920-933. doi: 10.1109/tasl.2008.925152

Abstract

An algorithm for automatic speaker segmentation based on the Bayesian information criterion (BIC) is presented. BIC tests are not performed at every window shift, as in previous approaches, but only when a speaker change is most likely to occur. This is done by estimating the next probable change point using a model of utterance durations. It is found that the inverse Gaussian distribution best fits the utterance durations. As a result, fewer BIC tests are needed, making the proposed system less computationally demanding in time and memory, and considerably more efficient with respect to missed speaker change points. A feature selection algorithm based on a branch and bound search strategy is applied in order to identify the most efficient features for speaker segmentation. Furthermore, a new theoretical formulation of BIC is derived by applying centering and simultaneous diagonalization. This formulation is considerably more computationally efficient than the standard BIC when the covariance matrices are estimated by estimators other than the usual maximum-likelihood ones. Two commonly used pairs of figures of merit are employed and their relationship is established. Computational efficiency is achieved through the speaker utterance modeling, whereas robustness is achieved by feature selection and application of BIC tests at appropriately selected time instants. Experimental results indicate that the proposed modifications yield a superior performance compared to existing approaches.
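
The utterance-duration modeling mentioned in the abstract can be illustrated with a minimal sketch. It fits an inverse Gaussian distribution to a list of utterance durations using the standard closed-form maximum-likelihood estimates, then places the next BIC test one mean duration after the last detected change. The abstract does not state how the parameters are estimated or how the next change point is derived from the fitted model, so the function names `fit_inverse_gaussian` and `next_test_time`, the use of the mean as the predictor, and the sample durations are all assumptions made for illustration only.

```python
from statistics import fmean


def fit_inverse_gaussian(durations):
    """Return the ML estimates (mu, lam) of an inverse Gaussian distribution.

    mu  is the mean parameter:  mu  = (1/n) * sum(x_i)
    lam is the shape parameter: lam = n / sum(1/x_i - 1/mu)
    """
    n = len(durations)
    if n < 2 or any(d <= 0 for d in durations):
        raise ValueError("need at least two strictly positive durations")
    mu = fmean(durations)
    # sum(1/x_i) >= n/mu, with equality only when all durations are equal.
    denom = sum(1.0 / d for d in durations) - n / mu
    if denom <= 0:
        raise ValueError("durations must not all be identical")
    lam = n / denom
    return mu, lam


def next_test_time(last_change, mu):
    """Hypothetical rule: schedule the next BIC test one mean duration ahead."""
    return last_change + mu


if __name__ == "__main__":
    # Illustrative utterance durations in seconds (not data from the paper).
    sample = [1.0, 2.0, 3.0, 4.0]
    mu, lam = fit_inverse_gaussian(sample)
    print("mean duration:", mu)
    print("shape parameter:", lam)
    print("variance (mu**3 / lam):", mu ** 3 / lam)
    print("next BIC test at t =", next_test_time(10.0, mu))
```

Scheduling tests from the fitted mean is only one possible choice; a mode or quantile of the fitted distribution would serve the same purpose of skipping window shifts where a speaker change is unlikely.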

Publication Type:	Article
Additional Information:	© 2008 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Publisher Keywords:	Audio recording, Bayesian methods, Covariance matrix, MPEG 7 Standard, Maximum likelihood estimation, NIST, Performance evaluation, Robustness, Speech, System testing
Subjects:	M Music and Books on Music > M Music; Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Departments:	School of Science & Technology > Department of Computer Science
SWORD Depositor:	Symplectic Administrator

Official URL: https://doi.org/10.1109/tasl.2008.925152

Creators:	Kotti, M.; Benetos, E.; Kotropoulos, C.
Status:	Published
Refereed:	Yes
Journal or Publication Title:	IEEE Transactions on Audio, Speech & Language Processing
Publisher:	Institute of Electrical and Electronics Engineers (IEEE)
ISSN:	1558-7916
e-ISSN:	1558-7924
URI:	https://openaccess.city.ac.uk/id/eprint/2045
Date available in CRO:	15 Jan 2013 11:22
Dates:	31 July 2008 (Published)