Morphotactic-Based Statistical Language Modeling for Large Vocabulary Continuous Speech Recognition Systems
BAP-funded project on morphotactic-based language modeling for speech recognition.
Status
Completed (2006-2007)
Funding Agency: Bogazici University Research Fund, BAP (Project 06A102)
Project Manager: Tunga Güngör
Dates: 2006-2007
In this project, we aimed to develop a new language model for large vocabulary continuous speech recognition (LVCSR) systems for agglutinative languages such as Turkish. In agglutinative languages, productive suffixation can generate a practically unlimited number of word forms. This makes it difficult to build language models for speech recognition systems, because no fixed vocabulary can cover all the words that may occur, and without a good language model the accuracy of these systems suffers significantly.
The project aimed to build an effective language model by combining Turkish morphotactic information (the rules governing the order in which morphemes can follow one another) with an n-gram language model. Such a model would make it possible to develop large vocabulary speech recognition systems for Turkish, which have many application areas.
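The idea of combining morphotactic rules with an n-gram model can be illustrated with a toy example. The sketch below is not the project's method, which is not described here; it simply assumes one plausible combination: a morpheme-level bigram model with add-one smoothing, in which a small hand-written morphotactic table (the hypothetical `ALLOWED` and `CATEGORY` tables) assigns zero probability to illegal morpheme orders and spends the smoothing mass only on legal continuations. The class `MorphotacticBigramLM`, the morpheme categories, and the tiny lexicon are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical toy morphotactics: which morpheme category may follow which.
# "<w>" marks a word boundary. These rules are illustrative, not the project's grammar.
ALLOWED = {
    "<w>": {"ROOT"},
    "ROOT": {"PLU", "POSS", "CASE", "<w>"},
    "PLU": {"POSS", "CASE", "<w>"},
    "POSS": {"CASE", "<w>"},
    "CASE": {"<w>"},
}

# Hypothetical lexicon mapping surface morphemes to categories
# ("lar"/"ler" and "da"/"de" are vowel-harmony variants).
CATEGORY = {
    "<w>": "<w>", "ev": "ROOT", "kitap": "ROOT",
    "lar": "PLU", "ler": "PLU", "im": "POSS", "da": "CASE", "de": "CASE",
}


class MorphotacticBigramLM:
    """Morpheme bigram model restricted by morphotactic rules (toy sketch)."""

    def __init__(self, corpus):
        # corpus: list of words, each given as a list of morphemes.
        self.vocab = set(CATEGORY)
        self.bigrams = Counter()
        for word in corpus:
            seq = ["<w>"] + word + ["<w>"]
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[(prev, cur)] += 1

    def legal(self, prev, cur):
        # A transition is legal if the morphotactic table allows cur's category after prev's.
        return CATEGORY[cur] in ALLOWED[CATEGORY[prev]]

    def prob(self, prev, cur):
        if not self.legal(prev, cur):
            return 0.0
        # Add-one smoothing over legal continuations only, so each
        # conditional distribution P(. | prev) sums to 1.
        candidates = [m for m in self.vocab if self.legal(prev, m)]
        denom = sum(self.bigrams[(prev, m)] + 1 for m in candidates)
        return (self.bigrams[(prev, cur)] + 1) / denom

    def word_logprob(self, morphemes):
        # Log-probability of one segmented word, including both boundary transitions.
        seq = ["<w>"] + morphemes + ["<w>"]
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            p = self.prob(prev, cur)
            if p == 0.0:
                return float("-inf")  # violates the morphotactic rules
            total += math.log(p)
        return total


if __name__ == "__main__":
    lm = MorphotacticBigramLM([["ev", "ler", "im", "de"], ["kitap", "lar", "da"], ["ev"]])
    print(lm.word_logprob(["kitap", "lar", "im", "da"]))  # unseen but legal: finite score
    print(lm.word_logprob(["ev", "de", "ler"]))           # case before plural: -inf
```

In this sketch the unseen form "kitap+lar+im+da" still receives a finite score because every transition is legal, while "ev+de+ler" is rejected outright. Restricting the smoothing denominator to legal continuations is one simple way to let the rules sharpen the n-gram estimates instead of merely filtering them afterwards.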