Design and Implementation of a Luganda Text Normalization Module for a Speech Software Program

Kagumire, Sulaiman

dc.contributor.author	Kagumire, Sulaiman
dc.date.accessioned	2019-11-11T12:21:21Z
dc.date.available	2019-11-11T12:21:21Z
dc.date.issued	2019-06-14
dc.identifier.citation	Kagumire, S. (2019). Design and Implementation of a Luganda Text Normalization Module for a Speech Software Program. Unpublished undergraduate dissertation, Makerere University.	en_US
dc.identifier.uri	http://hdl.handle.net/20.500.12281/7076
dc.description.abstract	This report investigates the problem of text normalization, specifically the normalization of non-standard words (NSWs) in Luganda. Non-standard words are word tokens that have no dictionary entry and cannot be pronounced using the usual letter-to-phoneme conversion rules. NSWs pose a challenge to the proper functioning of text-to-speech (TTS) technology, and the solution is to spell them out in such a way that they can be pronounced appropriately. In addition to ordinary words and names, real text contains NSWs, including numbers, abbreviations, dates, currency amounts and acronyms. NSWs also have a greater propensity than ordinary words to be ambiguous with respect to their interpretation or pronunciation. In many applications, it is desirable to “normalize” text by replacing the NSWs with the contextually appropriate ordinary word or sequence of words. Typical technology for text normalization involves sets of ad hoc rules tuned to handle one or two genres of text (often newspaper-style text), with the expected result that the techniques do not usually generalize well to new domains. Text normalization means converting non-standard words into standard words. Such words can take the form of numbers, dates, times, measurements, currencies and abbreviations. Text normalization ensures that these non-standard words can be pronounced by a TTS system.
It is therefore an important part of any text-to-speech system because, without text normalization, unintelligible speech is produced, especially for languages like Luganda. In this report, a rule-based Luganda text normalization module that detects, classifies and verbalizes numbers, dates, times, measurements, currencies and abbreviations into Luganda words was designed and implemented in the Python programming language. Its implementation will enable production of intelligible speech by Luganda text-to-speech systems.	en_US
dc.language.iso	en	en_US
dc.subject	Detection-conversion	en_US
dc.subject	Text normalization module	en_US
dc.subject	Text-to-speech system	en_US
dc.title	Design and Implementation of a Luganda Text Normalization Module for a Speech Software Program	en_US
dc.type	Thesis	en_US
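
The abstract describes a rule-based module that detects, classifies and verbalizes non-standard words. The detection and classification steps might be sketched as follows. This is a minimal illustration only, not the dissertation's implementation: the category patterns, the names `PATTERNS`, `classify` and `normalize`, and the whitespace tokenization are all assumptions, and the Luganda verbalization rules are left to caller-supplied functions because this record does not describe them.

```python
import re

# Hypothetical patterns for the NSW categories named in the abstract.
# Order matters: more specific categories are tried before plain numbers.
PATTERNS = [
    ("date", re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$")),
    ("time", re.compile(r"^\d{1,2}:\d{2}$")),
    ("currency", re.compile(r"^(?:UGX|USD)\d[\d,]*$")),
    ("measurement", re.compile(r"^\d+(?:\.\d+)?(?:km|kg|cm|m|g)$")),
    ("number", re.compile(r"^\d[\d,]*(?:\.\d+)?$")),
    ("abbreviation", re.compile(r"^(?:[A-Za-z]{1,4}\.)+$")),
]


def classify(token):
    """Return the first matching NSW category, or "word" for ordinary tokens."""
    for label, pattern in PATTERNS:
        if pattern.match(token):
            return label
    return "word"


def normalize(text, verbalizers):
    """Replace each classified NSW using the verbalizer for its category.

    `verbalizers` maps a category name to a function from token to spoken
    form. Tokens with no matching verbalizer are passed through unchanged.
    """
    out = []
    for token in text.split():  # assumption: whitespace tokenization
        verbalize = verbalizers.get(classify(token))
        out.append(verbalize(token) if verbalize else token)
    return " ".join(out)
```

A caller would supply one verbalizer per category, for example a function that expands digits into Luganda number words. Keeping classification separate from verbalization means the language-specific rules can be changed without touching the detection patterns.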

Files in this item

Name: kagumire-cedat-BSce.pdf
Size: 909.9Kb
Format: PDF
Description: A Thesis Report Submitted in ...

This item appears in the following Collection(s)

School of Engineering (SEng.) Collections