Speaker Utterances Tying Among Speaker Segmented Audio Documents Using Hierarchical Classification: Towards Speaker Indexing of Audio Databases

Document Type

Conference Proceeding

Publication Date



Speaker indexing of an audio database consists in organizing the audio data according to the speakers present in the database. It is composed of three steps: (1) segmentation by speakers of each audio document; (2) speaker tying among the various segmented portions of the audio documents; and (3) generation of a speaker- based index. This paper focuses on the second step, the speaker tying task, which has not been addressed in the literature. The re- sult of this task is a classification of the segmented acoustic data by clusters; each cluster should represent one speaker. This paper investigates on hierarchical classification approaches for speaker tying. Two new discriminant dissimilarity measures and a new bottom-up algorithm are also proposed. The experiments are con- ducted on a subset of the Switchboard database, a conversational telephone database, and show that the proposed method allows a very satisfying speaker tying among various audio documents, with a good level of purity for the clusters, but with a number of clusters significantly higher than the number of speakers.


This is a pre-copy-editing, author-produced PDF of an article presenteed at ISCA International Conference on Spoken Language Processing (ICSLP 2002). This article may not exactly replicate the final published version.