A Luganda part-of-speech tagger
Abstract
The study seeks to build a part-of-speech tagger for Luganda using machine learning methods. The parts of speech of the Luganda language are collected, studied, and tagged (indexed) with codes. The tense of each word is also taken into account and added to its code. The result is a composite code that is unique to a specific word and the tense in which it is used.
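The idea of a composite code can be illustrated with a small sketch. The study does not list its actual code table, so the part-of-speech labels (N, V, ADJ, ...), the tense labels (PRS, PST, FUT) and the class name CompositeTag below are hypothetical placeholders. The sketch only shows one plausible way of appending a tense code to a part-of-speech code.

```java
import java.util.Map;

/** Hypothetical sketch: builds a composite tag from a part-of-speech code and a tense code. */
final class CompositeTag {
    // Hypothetical part-of-speech categories; the study's real inventory is not given.
    enum Pos { NOUN, VERB, ADJECTIVE, ADVERB, PRONOUN }

    // Hypothetical tense categories; NONE marks words that carry no tense (e.g. nouns).
    enum Tense { NONE, PRESENT, PAST, FUTURE }

    private static final Map<Pos, String> POS_CODES = Map.of(
        Pos.NOUN, "N", Pos.VERB, "V", Pos.ADJECTIVE, "ADJ", Pos.ADVERB, "ADV", Pos.PRONOUN, "PRO");
    private static final Map<Tense, String> TENSE_CODES = Map.of(
        Tense.PRESENT, "PRS", Tense.PAST, "PST", Tense.FUTURE, "FUT");

    /** Joins the POS code and, when the word has a tense, the tense code, e.g. "V-PST". */
    static String encode(Pos pos, Tense tense) {
        String base = POS_CODES.get(pos);
        return tense == Tense.NONE ? base : base + "-" + TENSE_CODES.get(tense);
    }

    public static void main(String[] args) {
        System.out.println(encode(Pos.VERB, Tense.PAST)); // prints V-PST
        System.out.println(encode(Pos.NOUN, Tense.NONE)); // prints N
    }
}
```

Keeping the tense as a suffix on the base code means a verb in two different tenses receives two distinct tags. A tagger trained on such codes therefore predicts part of speech and tense in a single step.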
The words, together with their corresponding codes, are fed to a hidden Markov model (HMM), implemented in Java, for training and then for testing. Known words are used for training, and on these the program is expected to achieve 100% accuracy. Unknown words, which were not seen during training, are used for testing.
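A minimal sketch of how such an HMM might be trained and applied is given below. It assumes a bigram (first-order) model, count-based training, add-one smoothing and Viterbi decoding. The study does not state its model order or smoothing method, so these are assumptions, and the class name HmmTagger is hypothetical. The reserved smoothing mass in the emission probabilities is what lets the model assign a tag to words it never saw in training.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical bigram HMM tagger: count-based training, add-one smoothing, Viterbi decoding. */
final class HmmTagger {
    private static final String START = "<s>";
    private final Map<String, Map<String, Integer>> transitions = new HashMap<>(); // prev tag -> tag -> count
    private final Map<String, Map<String, Integer>> emissions = new HashMap<>();   // tag -> word -> count
    private final Map<String, Integer> tagCounts = new HashMap<>();  // how often each tag emits a word
    private final Map<String, Integer> prevCounts = new HashMap<>(); // how often each tag (or START) precedes a tag
    private final Set<String> vocabulary = new HashSet<>();

    /** Each sentence is a list of {word, tag} pairs. */
    void train(List<List<String[]>> sentences) {
        for (List<String[]> sentence : sentences) {
            String prev = START;
            for (String[] pair : sentence) {
                String word = pair[0], tag = pair[1];
                transitions.computeIfAbsent(prev, k -> new HashMap<>()).merge(tag, 1, Integer::sum);
                prevCounts.merge(prev, 1, Integer::sum);
                emissions.computeIfAbsent(tag, k -> new HashMap<>()).merge(word, 1, Integer::sum);
                tagCounts.merge(tag, 1, Integer::sum);
                vocabulary.add(word);
                prev = tag;
            }
        }
    }

    // log P(tag | prev) with add-one smoothing over the known tag set.
    private double logTransition(String prev, String tag) {
        int count = transitions.getOrDefault(prev, Map.of()).getOrDefault(tag, 0);
        return Math.log((count + 1.0) / (prevCounts.getOrDefault(prev, 0) + tagCounts.size()));
    }

    // log P(word | tag) with add-one smoothing; the extra +1 reserves probability for unknown words.
    private double logEmission(String tag, String word) {
        int count = emissions.get(tag).getOrDefault(word, 0);
        return Math.log((count + 1.0) / (tagCounts.get(tag) + vocabulary.size() + 1.0));
    }

    /** Viterbi decoding: returns the most probable tag sequence for the words. */
    List<String> tag(List<String> words) {
        if (words.isEmpty() || tagCounts.isEmpty()) return List.of();
        List<String> tags = new ArrayList<>(tagCounts.keySet());
        int n = words.size(), k = tags.size();
        double[][] score = new double[n][k]; // best log-probability of a path ending in tag j at word i
        int[][] back = new int[n][k];        // index of the previous tag on that best path
        for (int j = 0; j < k; j++) {
            score[0][j] = logTransition(START, tags.get(j)) + logEmission(tags.get(j), words.get(0));
        }
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < k; j++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double s = score[i - 1][p] + logTransition(tags.get(p), tags.get(j));
                    if (s > best) { best = s; arg = p; }
                }
                score[i][j] = best + logEmission(tags.get(j), words.get(i));
                back[i][j] = arg;
            }
        }
        int last = 0; // index of the best final tag
        for (int j = 1; j < k; j++) if (score[n - 1][j] > score[n - 1][last]) last = j;
        LinkedList<String> path = new LinkedList<>();
        for (int i = n - 1; i >= 0; i--) { // follow back-pointers from the end
            path.addFirst(tags.get(last));
            last = back[i][last];
        }
        return path;
    }

    public static void main(String[] args) {
        HmmTagger tagger = new HmmTagger();
        // Toy, illustrative training data with hypothetical tags.
        tagger.train(List.of(
            List.of(new String[]{"omusajja", "N"}, new String[]{"alya", "V-PRS"}, new String[]{"emmere", "N"}),
            List.of(new String[]{"omukazi", "N"}, new String[]{"alya", "V-PRS"}, new String[]{"emmere", "N"})));
        System.out.println(tagger.tag(List.of("omukazi", "alya", "ekitabo"))); // prints [N, V-PRS, N]
    }
}
```

In the usage example, "ekitabo" does not occur in the toy training data. It is tagged from context alone, because the learned V-PRS to N transition outweighs the nearly uniform emission scores for an unseen word. Working in log space avoids numerical underflow on long sentences.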
The program's overall accuracy is expected to be at least 70%.
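The abstract does not define its accuracy measure. A common choice is token-level accuracy: the number of correctly tagged tokens divided by the total number of tagged tokens. The sketch below assumes that definition. It also reports known and unknown words separately, so the expected 100% on known words and the overall 70% target can be checked independently. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: token-level tagging accuracy, overall and split by known vs unknown words. */
final class TaggerAccuracy {
    /** Fraction of positions where the predicted tag equals the gold tag (NaN if there are none). */
    static double accuracy(List<String> gold, List<String> predicted) {
        if (gold.size() != predicted.size()) throw new IllegalArgumentException("length mismatch");
        int correct = 0;
        for (int i = 0; i < gold.size(); i++) {
            if (gold.get(i).equals(predicted.get(i))) correct++;
        }
        return gold.isEmpty() ? Double.NaN : (double) correct / gold.size();
    }

    /** Accuracy over only the known (in the training vocabulary) or only the unknown words. */
    static double accuracyOn(List<String> words, List<String> gold, List<String> predicted,
                             Set<String> trainingVocabulary, boolean known) {
        int correct = 0, total = 0;
        for (int i = 0; i < words.size(); i++) {
            if (trainingVocabulary.contains(words.get(i)) != known) continue; // skip the other group
            total++;
            if (gold.get(i).equals(predicted.get(i))) correct++;
        }
        return total == 0 ? Double.NaN : (double) correct / total;
    }

    public static void main(String[] args) {
        List<String> gold = List.of("N", "V-PRS", "N", "N");
        List<String> predicted = List.of("N", "V-PRS", "V-PRS", "N");
        System.out.println(accuracy(gold, predicted)); // prints 0.75
    }
}
```

Under this definition, a test set of 10 tokens with 7 correct tags would exactly meet the 70% target.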