Morphotactic based Statistical Language Modeling for Large Vocabulary Continuous Speech Recognition Systems

Funding Agency: Bogazici University Research Fund, BAP (Project 06A102)

Project Manager: Tunga Güngör

Dates: 2006-2007

In this project, we aimed to develop a new language model for large-vocabulary continuous speech recognition (LVCSR) systems for agglutinative languages such as Turkish. Agglutinative languages can produce a practically unlimited number of word forms. This makes it difficult to build language models for speech recognition systems, and without a good language model these systems perform significantly worse.

This work aimed to build an effective language model by combining Turkish morphotactic information (the rules that govern morpheme order) with an n-gram language model. Such a model would make it possible to develop large-vocabulary speech recognition systems for Turkish, which have many application areas.
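The idea of pairing morphotactic rules with an n-gram model can be illustrated with a small sketch. It is not the model developed in this project. It assumes a morpheme-level bigram model with add-k smoothing, in which continuations that violate a set of morpheme-ordering rules receive zero probability and the remaining mass is renormalised over the valid ones. The morpheme classes, the rules, and the names `MORPH_CLASS`, `ALLOWED` and `MorphotacticBigram` are all hypothetical toy choices for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon: each morpheme is mapped to a morphotactic class.
# "<w>" is a word-boundary token, so the model can span several words.
MORPH_CLASS = {
    "<w>": "BOUND",
    "ev": "NOUN", "kitap": "NOUN",
    "+ler": "PLUR", "+lar": "PLUR",
    "+de": "LOC", "+da": "LOC",
}

# Hypothetical toy morpheme-ordering rules: class -> classes allowed to follow it.
ALLOWED = {
    "BOUND": {"NOUN"},
    "NOUN": {"PLUR", "LOC", "BOUND"},
    "PLUR": {"LOC", "BOUND"},
    "LOC": {"BOUND"},
}


def is_allowed(prev, nxt):
    """True if the toy morphotactic rules permit `nxt` directly after `prev`."""
    return MORPH_CLASS[nxt] in ALLOWED[MORPH_CLASS[prev]]


class MorphotacticBigram:
    """Add-k smoothed morpheme bigram whose probability mass is restricted
    to morphotactically valid continuations (illustrative sketch only)."""

    def __init__(self, sequences, k=0.5):
        self.k = k
        self.counts = defaultdict(Counter)
        for seq in sequences:
            tokens = ["<w>"] + seq  # sentence start acts as a word boundary
            for prev, nxt in zip(tokens, tokens[1:]):
                self.counts[prev][nxt] += 1

    def prob(self, prev, nxt):
        if not is_allowed(prev, nxt):
            return 0.0  # invalid morpheme order: no probability mass
        # Renormalise only over the continuations the rules allow.
        valid = [m for m in MORPH_CLASS if is_allowed(prev, m)]
        total = sum(self.counts[prev][m] for m in valid)
        return (self.counts[prev][nxt] + self.k) / (total + self.k * len(valid))

    def log_prob(self, seq):
        """Log-probability of a morpheme sequence; -inf if any step is invalid."""
        tokens = ["<w>"] + seq
        logp = 0.0
        for prev, nxt in zip(tokens, tokens[1:]):
            p = self.prob(prev, nxt)
            if p == 0.0:
                return float("-inf")
            logp += math.log(p)
        return logp


# Usage: train on segmented words ("ev+ler+de" = "evlerde", "in the houses").
train = [
    ["ev", "+ler", "+de", "<w>"],
    ["kitap", "+lar", "<w>"],
    ["ev", "+de", "<w>", "kitap", "<w>"],
]
lm = MorphotacticBigram(train)
print(lm.log_prob(["kitap", "+lar", "+da", "<w>"]))  # unseen word "kitaplarda": finite score
print(lm.log_prob(["+de", "ev", "<w>"]))  # suffix before stem: -inf
```

Because the model works on morphemes rather than whole words, an unseen word such as "kitaplarda" still receives a finite score, while orderings the rules forbid are excluded outright. A real system would need a far richer rule set (for example, vowel harmony, which this toy lexicon does not encode) and higher-order n-grams.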
