Language Identification is Difficult for Non-native Speech
Abstract: Identifying the spoken language is a crucial first step in many spoken language processing systems. In recent years, the accuracy of state-of-the-art language identification systems has improved rapidly, mainly owing to self-supervised models pre-trained on multilingual data and to large training datasets such as VoxLingua107. This presentation demonstrates that accuracy decreases dramatically on speech with non-native or regional accents and on speech from highly conversational scenarios. However, non-native language identification can be improved by applying text classification methods to the outputs of multiple speech recognition systems.
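The abstract does not say which recognizers or which text classifier are used, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each utterance has already been decoded by several speech recognition systems (the names `asr_en` and `asr_es` are hypothetical), and it uses a small character-trigram naive Bayes classifier as a stand-in for whatever text classification method the presentation actually applies. The toy transcripts are invented for illustration and are not real recognizer output.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def featurize(transcripts):
    """Map {asr_system_name: transcript} to a bag of system-tagged n-grams."""
    feats = Counter()
    for system, text in transcripts.items():
        for gram in char_ngrams(text):
            # Tag each n-gram with the recognizer that produced it.
            feats[f"{system}|{gram}"] += 1
    return feats


class NaiveBayesLID:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, examples):
        # examples: list of (transcripts_dict, language_label) pairs
        self.counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()
        for transcripts, label in examples:
            feats = featurize(transcripts)
            self.counts[label].update(feats)
            self.label_counts[label] += 1
            self.vocab.update(feats)
        return self

    def predict(self, transcripts):
        feats = featurize(transcripts)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            denom = sum(self.counts[label].values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for feat, k in feats.items():
                score += k * math.log((self.counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: the same utterance decoded by two hypothetical recognizers.
train = [
    ({"asr_en": "good morning how are you",
      "asr_es": "gud mornin jau ar yu"}, "en"),
    ({"asr_en": "the weather is nice today",
      "asr_es": "de weder is nais tudei"}, "en"),
    ({"asr_en": "when oh dear as comb oh as tass",
      "asr_es": "buenos dias como estas"}, "es"),
    ({"asr_en": "l tea m poe s ta boo in oh",
      "asr_es": "el tiempo esta bueno hoy"}, "es"),
]

clf = NaiveBayesLID().fit(train)
print(clf.predict({"asr_en": "how is the weather",
                   "asr_es": "jau is de weder"}))
```

The design choice illustrated here is that the classifier sees the text from all recognizers at once, so a mismatched recognizer producing implausible text is itself a useful cue. A real system would replace the toy data with actual recognizer output and would likely use a stronger text classifier.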