Natural Language Processing: Luganda Heteronym Disambiguation.
Abstract
Speech input received from a user can contain a heteronym together with one or more additional words. The correct pronunciation of the heteronym can be determined from at least one of a phonemic string and the frequency of occurrence of an n-gram. A dialogue response to the speech input, which can itself include the heteronym, can then be generated and output as speech, with the heteronym pronounced according to the determined correct pronunciation. This study addresses the resolution of the ambiguity in such heteronyms in Luganda.
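To make the frequency-based idea concrete, the following is a minimal sketch of choosing a pronunciation from n-gram counts. It is not the system described in this study: the count table, the pronunciation labels, and the function name pick_pronunciation are all hypothetical, and an English heteronym ("lead") is used purely for illustration because no Luganda data is given here.

```python
from collections import Counter

# Hypothetical toy counts of (previous word, heteronym, pronunciation label).
# These numbers are invented for illustration and are not data from the study.
NGRAM_COUNTS = Counter({
    ("will", "lead", "LIYD"): 9,   # "will lead" as a verb
    ("will", "lead", "LEHD"): 1,
    ("of", "lead", "LEHD"): 7,     # "of lead" as the metal
    ("of", "lead", "LIYD"): 2,
})


def pick_pronunciation(previous_word, heteronym, default=None):
    """Return the pronunciation observed most often after previous_word.

    Falls back to `default` when the bigram was never observed.
    """
    candidates = {
        pron: count
        for (prev, word, pron), count in NGRAM_COUNTS.items()
        if prev == previous_word and word == heteronym
    }
    if not candidates:
        return default
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    print(pick_pronunciation("will", "lead"))
    print(pick_pronunciation("of", "lead"))
```

A real system would presumably estimate such counts from a transcribed corpus and back off to shorter n-grams when a context is unseen; the sketch simply returns a default in that case.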
The methodology combines part-of-speech tagging, which identifies the grammatical role of a word in a sentence, with word sense disambiguation, which is the problem of determining which "sense" (meaning) of a word is activated by its use in a particular context.
The results available so far come from disambiguating a set of heteronyms; the response to larger data has not yet been evaluated.
The implemented Lesk algorithm will be integrated into the speech system once it is able to disambiguate larger amounts of data.
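For readers unfamiliar with Lesk-style disambiguation, the sketch below shows the simplified variant of the idea: each candidate sense is scored by how many words its dictionary gloss shares with the sentence containing the ambiguous word. This is an illustration under stated assumptions, not the implementation used in this study. The glosses, the stopword list, and the names tokenize and simplified_lesk are hypothetical, and the English word "bass" stands in for a Luganda heteronym.

```python
# Small hypothetical stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "in", "of", "or", "for", "that", "he", "she", "while"}


def tokenize(text):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    tokens = {word.strip('.,;:!?"()').lower() for word in text.split()}
    tokens.discard("")
    return tokens - STOPWORDS


def simplified_lesk(context_sentence, sense_glosses):
    """Return the sense whose gloss shares the most words with the context.

    `sense_glosses` maps a sense label to its gloss text. On a tie, the sense
    listed first wins.
    """
    context = tokenize(context_sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical glosses written for this example only.
BASS_GLOSSES = {
    "fish": "a freshwater or sea fish caught for food",
    "music": "the lowest part or voice in music, or an instrument that plays low notes",
}

if __name__ == "__main__":
    print(simplified_lesk("He caught a large bass while fishing in the river", BASS_GLOSSES))
    print(simplified_lesk("She plays bass in a jazz band", BASS_GLOSSES))
```

Once a sense is selected, a pronunciation lexicon keyed by sense could supply the matching pronunciation for the speech output. Gloss overlap is sparse on short sentences, which is one reason evaluation on larger data matters.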