Deep Learning-Aided Image Captioning In Chest X-Rays For TB Screening

Sebampitako, Duncan

Undergraduate Dissertation

Date

2022

Abstract

Tuberculosis (TB) is a contagious disease that is a major source of illness and one of the leading causes of death worldwide. TB can be screened using chest X-ray, ultrasound, computed tomography and magnetic resonance imaging (MRI). Chest X-ray is preferred for TB screening because it is cost-effective and widely available. Chest X-ray images must, however, be interpreted by radiologists. For each part of the body inspected in the scan, the radiologist writes up the findings in a textual report and states whether that area is normal, abnormal, or potentially abnormal. Writing these reports is time-consuming, error-prone and laborious. The burden is heaviest on radiologists working in rural areas where healthcare quality is low.

Two deep learning models were developed in this work to automate medical report writing. The first, a CheXNet convolutional neural network (CNN)-long short-term memory (LSTM) model, applied transfer learning from the pretrained CheXNet CNN to extract visual features from chest X-ray images. An LSTM then generated the medical report from the extracted features. The second, an EfficientNet CNN-Transformer model, used the EfficientNet CNN to extract the visual features. EfficientNet uses compound scaling of the network's depth, width and input resolution to achieve high accuracy and efficiency. A Transformer then performed vision-language attention and generated the medical report. Both models were trained on the Indiana University chest X-ray dataset for 70 epochs. The EfficientNet CNN-Transformer model outperformed the CheXNet CNN-LSTM model on all BLEU metrics, achieving a BLEU-1 (unigram) score of 0.515. These results demonstrate the importance of choosing both the visual feature extractor and the language generator carefully.
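For background, the compound scaling rule from the original EfficientNet paper (Tan and Le, 2019) ties the network's depth, width and resolution to a single user-chosen coefficient φ. The abstract does not state which EfficientNet variant or coefficient this work used. The block below shows only the general rule, not the dissertation's configuration:

```latex
\begin{aligned}
\text{depth: } & d = \alpha^{\phi}, \qquad
\text{width: } w = \beta^{\phi}, \qquad
\text{resolution: } r = \gamma^{\phi}, \\
\text{subject to } & \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \qquad
\alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1
\end{aligned}
```

FLOPS grow roughly linearly with depth and quadratically with width and resolution. Total compute therefore scales by about (α·β²·γ²)^φ ≈ 2^φ. For the EfficientNet-B0 baseline, the original paper reports α = 1.2, β = 1.1 and γ = 1.15 from a small grid search (1.2 × 1.1² × 1.15² ≈ 1.92).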
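The vision-language attention step can be illustrated with a minimal, single-head scaled dot-product cross-attention sketch. In it, one text-token query attends over a set of image feature vectors. This is a hypothetical illustration, not the dissertation's implementation. The abstract does not give the Transformer's dimensions or number of heads. It also does not say how the EfficientNet features are arranged; treating them as a list of region vectors is an assumption here. The learned query/key/value projections are omitted, so the image features serve directly as keys and values. The function names and toy vectors are illustrative only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention for one query vector.

    Hypothetical setup: `query` stands in for a projected text-token
    embedding; `keys`/`values` stand in for image region features
    (learned projections omitted for brevity).
    Returns (attended_output, attention_weights).
    """
    d = len(query)
    # Similarity of the query to each image region, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Weighted sum of value vectors, one output dimension at a time.
    dim_v = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
    return output, weights


if __name__ == "__main__":
    # Three toy "image region" features of dimension 2 (hypothetical values).
    regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    token_query = [1.0, 0.0]
    out, w = cross_attention(token_query, regions, regions)
    print("weights:", [round(x, 3) for x in w])
    print("output:", [round(x, 3) for x in out])
```

Here the query is equally similar to the first and third regions, so they receive equal weights, each larger than the second region's weight. In a full Transformer decoder, this step would run once per head with learned projections. It would sit alongside masked self-attention over the report tokens generated so far.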
We also demonstrated the importance of a robust dataset for achieving the best results when training AI models. Data bias is a severe problem that can degrade even the best models. When generating medical reports automatically, the acquired data must cover all stages of each pathology. It must also be sufficient across all lung pathologies, so that the generated reports are clinically accurate and coherent.
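The BLEU-1 score of 0.515 quoted in the abstract measures clipped unigram overlap between a generated report and the reference report, scaled by a brevity penalty. A minimal sentence-level sketch of that metric, following the original BLEU definition (Papineni et al., 2002), is shown below. The abstract does not say whether the dissertation computed corpus-level or sentence-level BLEU, which tokenizer it used, or whether smoothing was applied. This is therefore not a reproduction of its evaluation. The function name and example sentences are hypothetical.

```python
import math
from collections import Counter


def bleu1(candidate, references):
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty.

    candidate: list of tokens; references: list of token lists.
    """
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)
    # Each candidate word is credited at most as often as it appears
    # in any single reference ("clipping").
    max_ref_counts = Counter()
    for ref in references:
        for word, count in Counter(ref).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    clipped = sum(min(count, max_ref_counts[word]) for word, count in cand_counts.items())
    precision = clipped / len(candidate)
    # Brevity penalty: compare against the reference length closest to the
    # candidate length (ties go to the shorter reference).
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda n: (abs(n - c), n))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * precision


if __name__ == "__main__":
    reference = "the heart size is normal".split()
    generated = "the heart is normal in size".split()
    print(round(bleu1(generated, [reference]), 4))  # 0.8333
```

For corpus-level BLEU, clipped counts and lengths are summed over all report pairs before the ratio is taken, rather than averaging per-report scores. In practice, an established implementation such as NLTK's `sentence_bleu` or `corpus_bleu` with `weights=(1, 0, 0, 0)` is preferable to a hand-rolled one.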

URI

http://hdl.handle.net/20.500.12281/14306

Collections

School of Engineering (SEng.) Collections