Explainable real-time sign language to text conversion
Date
2024
Author
Katende, Jericho
Mawejje, Mark William
Musemeza, Murungi Isaac
Abstract
Sign languages are the primary means of communication for the Deaf and Hard of Hearing (DHH) community, but communication barriers remain between sign language users and the hearing population. Existing sign language translation models frequently lack transparency, which undermines trust and adoption. In this paper, we develop an explainable real-time sign language to text translation system that employs deep learning techniques and multiple interpretability methods, including SHapley Additive exPlanations (SHAP), Gradient-weighted Class Activation Mapping (Grad-CAM), and Local Interpretable Model-Agnostic Explanations (LIME). We use a pre-trained VGG-16 network for feature extraction in conjunction with a custom classification model, and we also fine-tuned other pre-trained models, including VGG-19, Vision Transformers, and EfficientNet. The models were trained on the WLASL and Synthetic ASL Alphabet datasets, yielding explanations that shed light on the neural network’s decision-making process. SHAP assigns importance values to each input feature, Grad-CAM highlights the input regions most relevant to the model’s predictions, and LIME generates local explanations for individual predictions; together, these methods yield a thorough understanding of the model’s behavior. We demonstrated the effectiveness of our approach through extensive experiments, achieving a translation accuracy of 97.8% on the test set and outperforming baseline methods. SHAP, Grad-CAM, and LIME explanations showed that the model relies on hand shape, movement, and facial expression features to classify signs accurately. These findings not only boost confidence in the model’s predictions, but also emphasize the importance of considering multiple aspects of sign language for effective translation. We use the trained model to build a user-friendly mobile application that provides real-time sign language translation for the DHH community and encourages inclusive communication. Our approach not only yields accurate and interpretable results, but also promotes responsible AI practices in sign language translation. The system’s explainability, achieved through the integration of multiple interpretability techniques, fosters trust and adoption, with the potential to bridge communication gaps between sign language users and the hearing population.
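
The abstract does not detail the architecture, so the following is a minimal sketch, assuming a TensorFlow/Keras setup, of how a frozen pre-trained VGG-16 can serve as a feature extractor beneath a small custom classification head. The head width, input size, and the 26-class ASL alphabet output are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (not the authors' exact model): a frozen pre-trained
# VGG-16 feature extractor with a small custom classification head.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 26  # assumption: one class per ASL alphabet letter

# Pre-trained convolutional base, without the original ImageNet classifier.
base = tf.keras.applications.VGG16(
    weights="imagenet",
    include_top=False,
    input_shape=(224, 224, 3),
)
base.trainable = False  # use VGG-16 purely as a feature extractor

# Custom classification head on top of the VGG-16 feature maps.
x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dense(256, activation="relu")(x)  # assumed head width
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs=base.input, outputs=outputs)
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```

Building the model with the functional API keeps VGG-16's inner layers addressable by name, which the Grad-CAM sketch below relies on.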
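Likewise, the abstract names Grad-CAM but not its implementation. The sketch below is one common way to compute a Grad-CAM heatmap over VGG-16's last convolutional layer ("block5_conv3"); the layer choice, function name, and preprocessing are assumptions, and many implementations take gradients against pre-softmax scores rather than softmax outputs.

```python
# Minimal Grad-CAM sketch (an assumed implementation, not the authors'
# exact code). `model` is the VGG-16-based classifier sketched above.
import numpy as np
import tensorflow as tf

def make_gradcam_heatmap(model, img_array, conv_layer_name="block5_conv3"):
    """Heatmap of the regions that most influence the top predicted class.

    `img_array` has shape (1, 224, 224, 3) and is already preprocessed.
    """
    # Model mapping the input image to (conv feature maps, class scores).
    grad_model = tf.keras.models.Model(
        inputs=model.inputs,
        outputs=[model.get_layer(conv_layer_name).output, model.output],
    )

    with tf.GradientTape() as tape:
        conv_output, predictions = grad_model(img_array)
        top_class = tf.argmax(predictions[0])
        top_score = predictions[:, top_class]

    # Gradient of the top class score w.r.t. the conv feature maps,
    # averaged per channel to get importance weights.
    grads = tape.gradient(top_score, conv_output)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))

    # Weighted sum of feature maps, keeping only positive evidence.
    heatmap = tf.reduce_sum(conv_output[0] * weights, axis=-1)
    heatmap = tf.maximum(heatmap, 0) / (tf.reduce_max(heatmap) + 1e-8)
    return heatmap.numpy()

# Illustrative usage on a dummy frame (uses `model` from the sketch above);
# a real input would be a VGG-16-preprocessed video frame of the signer.
dummy_frame = np.random.rand(1, 224, 224, 3).astype("float32")
heatmap = make_gradcam_heatmap(model, dummy_frame)
```

SHAP and LIME explanations of the same trained model would typically be produced with the `shap` and `lime` Python packages, but since the abstract does not specify their configuration, no sketch is offered for them here.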