The primary purpose of a speech signal is to convey a linguistic message. However, the acoustic signal also carries auxiliary information about the language, the speaker, the acoustic environment and so on. Automatic recognition of the language of a speech signal is not only scientifically interesting but also technologically important in a multilingual country such as India. While a few language identification systems for Indian languages have been implemented, most of them have used small-scale speech data collected in-house. Recently, the Babel program of IARPA released challenging speech data in three Indian languages: Assamese, Bengali and Tamil. Here, we report the development of an automatic language identification (LID) system that recognises which of these three languages a given speech signal is in. The LID system is built as a layer over Automatic Speech Recognition (ASR) systems for the three languages, each trained on the Babel database. For a given test utterance, the three ASR systems yield three likelihood values, each indicating how likely it is that the utterance belongs to the respective language. The LID system applies the Maximum Likelihood criterion: it declares the language of the test utterance to be the one whose ASR system yields the largest of the three likelihood values. The ASR systems were implemented with the Kaldi toolkit, using GMM-HMMs to model the acoustic properties of phones and a bigram language model. The Word Error Rates (WERs) of the Tamil, Assamese and Bengali ASR systems on test data are 65.3%, 61.3% and 52.6% respectively. These are comparable to the WERs of the Assamese and Bengali ASR systems (64.3% and 66.8% respectively) implemented by the Cambridge University group using the same Babel database. However, these WERs are much higher than the WERs (about 5%) of ASR systems that we implemented recently.
This difference in WER can be attributed to significant differences in the diversity of acoustic environments, the number of speakers, the size of the lexicon and the length of transcriptions between the databases used for the earlier and the current ASR systems. The accuracy of the LID system is 86.1%; most of the confusion is between Assamese and Bengali, which belong to the same language family.
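The Maximum Likelihood decision described above can be illustrated with a minimal sketch. It assumes that each of the three ASR systems returns a single log-likelihood score for the test utterance; the function name `identify_language` and the score values are hypothetical and are not taken from the reported system or from Kaldi.

```python
from typing import Dict


def identify_language(log_likelihoods: Dict[str, float]) -> str:
    """Return the language whose ASR system scored the utterance highest.

    `log_likelihoods` maps a language name to the (log-)likelihood that the
    corresponding ASR system assigned to the test utterance.
    """
    if not log_likelihoods:
        raise ValueError("at least one language score is required")
    # Maximum Likelihood criterion: pick the key with the largest score.
    return max(log_likelihoods, key=log_likelihoods.get)


# Hypothetical scores for one test utterance (illustrative values only).
scores = {"Assamese": -4210.7, "Bengali": -4198.3, "Tamil": -4302.1}
predicted = identify_language(scores)
print(predicted)
```

Because the logarithm is monotonic, comparing log-likelihoods selects the same language as comparing the likelihoods themselves, while avoiding numerical underflow on long utterances.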
© Copyright 2017 All Rights Reserved