Real time translation from Ugandan sign language to speech
| dc.contributor.author | Nakuwanda, Bridget Hellen | |
| dc.contributor.author | Luwaga, Micheal | |
| dc.contributor.author | Sozzi, Henry | |
| dc.contributor.author | Egesa, Alex | |
| dc.date.accessioned | 2024-11-22T13:06:19Z | |
| dc.date.available | 2024-11-22T13:06:19Z | |
| dc.date.issued | 2024 | |
| dc.description | A report submitted to the School of Computing and Informatics Technology for the study leading to the implementation of a project in partial fulfillment of the requirements for the award of the Degree of Bachelor of Science in Computer Science of Makerere University. | en_US |
| dc.description.abstract | Sign language is the mode of communication used by deaf and hard-of-hearing communities throughout the world. It involves the use of body key points to form standard glosses that carry meaning. It differs from region to region; in Uganda specifically, Uganda Sign Language (USL) is gazetted. The Real-Time Translation from Ugandan Sign Language to Speech project intends to leverage pose estimation, hand-tracking computer vision, and sequence-to-sequence modeling to translate visual Ugandan Sign Language into English speech in real time. The core methodology included developing a comprehensive dataset of USL gestures with corresponding English annotations. Non-manual linguistic features such as hand position and orientation were emphasized and extracted using the MediaPipe library. An encoder neural network was also used to extract context-specific spatio-temporal information from the hand-tracking data. Comparative pipelines using ResNet50 and VGG19 were run to compare their performance. The model architecture incorporated convolutional layers for initial feature extraction, followed by multiple transformer blocks designed to capture long-range dependencies and contextual nuances inherent in sign language glosses. The model was trained, tested, and validated, achieving an accuracy of 100% across all classes. Evaluation metrics such as precision, recall, and F1-score also reached perfect scores, demonstrating the model’s robustness and effectiveness. The training process involved 60 epochs, with consistent improvements observed in both training and validation metrics. The training loss decreased from 0.3851 to 0.3574, while the training accuracy increased from 87.16% to 88.12%. The validation loss showed a significant reduction, reaching as low as 0.000042, and the validation accuracy consistently reached 100%, underscoring the model’s ability to generalize well to unseen data. 
To prevent overfitting, advanced regularization techniques such as dropout were employed. The model’s performance was further confirmed through a detailed confusion matrix, which indicated no misclassifications. This project is the first to deliver an inclusive Ugandan Sign Language translation system using state-of-the-art techniques in computer vision and artificial intelligence. | en_US |
| dc.identifier.citation | Nakuwanda, B. H. (2024). Real time translation from Ugandan sign language to speech (Unpublished undergraduate dissertation). Makerere University, Kampala, Uganda. | en_US |
| dc.identifier.uri | http://hdl.handle.net/20.500.12281/19438 | |
| dc.language.iso | en | en_US |
| dc.publisher | Makerere University | en_US |
| dc.subject | Sign language | en_US |
| dc.title | Real time translation from Ugandan sign language to speech | en_US |
| dc.type | Thesis | en_US |