Natural Language Processing: A Luganda Part of Speech Tagger
Abstract
This study describes an initial experiment in designing a Hidden Markov Model (HMM)-based part-of-speech tagger for the Luganda language. Part-of-speech tagging assigns to each word in a text the tag that is appropriate to its context. Tagging was carried out in two main steps: morphological analysis and disambiguation. The study focuses on tagging accuracy, specifically on two challenges: assigning the correct tag to each token and handling previously unseen tokens. We constructed a first-order stochastic disambiguation algorithm that uses supervised learning to estimate the HMM parameters from hand-crafted corpora. The Viterbi algorithm was then used to find the most probable tag sequence for each sentence, which fixes the tag of every word in it.
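The two core stages named in the abstract can be illustrated with a minimal Python sketch. The first stage is supervised estimation of first-order HMM transition and emission probabilities by counting over tagged sentences. The second is Viterbi decoding of the most probable tag sequence. Several details are assumptions for illustration, not the study's actual implementation: the toy training sentences, the two-tag NOUN/VERB tag set, the function names `train_hmm`, `log_prob` and `viterbi`, and the add-one smoothing that gives unseen tokens a nonzero emission probability. The abstract does not specify the corpus, tag set, or smoothing scheme. The sketch also omits the morphological-analysis step.

```python
import math
from collections import defaultdict

# Hypothetical toy training data: lists of (token, tag) pairs.
# Invented for illustration; NOT drawn from the study's corpus.
TRAIN = [
    [("omusajja", "NOUN"), ("alya", "VERB"), ("emmere", "NOUN")],
    [("omukazi", "NOUN"), ("alya", "VERB")],
    [("omukazi", "NOUN"), ("alya", "VERB"), ("emmere", "NOUN")],
]
START = "<s>"  # pseudo-tag marking the sentence start


def train_hmm(sentences):
    """Count tag->tag transitions and tag->word emissions (supervised learning)."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sent in sentences:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(table, given, outcome, n_outcomes):
    """Add-one smoothed log P(outcome | given); the smoothing choice is an assumption."""
    counts = table[given]
    return math.log((counts.get(outcome, 0) + 1) / (sum(counts.values()) + n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for `words` under the first-order HMM."""
    if not words:
        return []
    n_emit = len(vocab) + 1  # +1 reserves probability mass for unseen tokens
    n_tags = len(tags)
    # best[i][t] = (log score of the best path ending in tag t at position i, previous tag)
    best = [{t: (log_prob(trans, START, t, n_tags) + log_prob(emit, t, words[0], n_emit), None)
             for t in tags}]
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p][0] + log_prob(trans, p, t, n_tags)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = (score + log_prob(emit, t, words[i], n_emit), prev)
    # Backtrack from the highest-scoring tag at the last position.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


trans, emit, tags, vocab = train_hmm(TRAIN)
print(viterbi(["omukazi", "alya", "emmere"], trans, emit, tags, vocab))
# → ['NOUN', 'VERB', 'NOUN']
```

Decoding a sentence that contains a token absent from the training data still yields a tag sequence. In that case the choice is driven mainly by the transition probabilities, because the smoothed emission probability of the unseen token is roughly uniform across tags. A real tagger would likely replace this with morphology-aware handling of new tokens.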