Deep Learning-Aided Image Captioning In Chest X-Rays For TB Screening
Abstract
Tuberculosis (TB) is a contagious disease that is a major source of illness and one of the top
causes of mortality across the world. TB can be screened through chest X-ray, ultrasound,
computed tomography and magnetic resonance imaging (MRI). Chest X-ray is cost-effective
and widely available, hence it is preferred for TB screening. Chest X-ray images
must be interpreted by radiologists. The radiologists must describe the findings of each part
of the body inspected in the imaging scan in textual reports, specifying whether each area
was determined to be normal, abnormal, or potentially abnormal. Writing medical-imaging
reports is time-consuming, error-prone and laborious for radiologists, especially those operating
in rural areas where healthcare quality is low. Two deep learning models were developed in
this work to automate the task of medical report writing. The first model, the CheXNet Convolutional
Neural Network (CNN)-Long Short-Term Memory (LSTM) model, applied transfer learning from the pretrained
CheXNet CNN to extract visual features from chest X-ray images. The LSTM model was then
used as the medical report generator from the extracted visual features. The second model
was the EfficientNet CNN-Transformer model. It used the EfficientNet CNN to extract the
visual features from the chest X-ray images. The EfficientNet CNN exploits compound scaling
of dimensions such as width, depth and resolution of the network to achieve high accuracy and
efficiency. The Transformer model was then used for vision-language attention and generation
of the medical report. Both models were trained on the Indiana University chest X-ray dataset
for 70 epochs. The EfficientNet CNN-Transformer model outperformed the CheXNet CNN-LSTM
model on all BLEU metrics, achieving a BLEU score of 0.515 for unigrams (BLEU-1).
The results demonstrate the importance of the choice of visual feature extractor as well
as of the language generator model. We also demonstrated the importance of a robust dataset
in achieving the best results when training AI models. Data bias is a severe problem that
may degrade even the best models. For automated medical report generation, it is crucial that
the acquired data not only cover all stages of a single pathology but also be sufficient
across all lung pathologies, so that the generated reports are clinically accurate and coherent.
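
To make the reported unigram metric concrete, the following is a minimal sketch of a sentence-level BLEU-1 score (clipped unigram precision multiplied by the brevity penalty). It is an illustration only and not the evaluation code used in this work: the function name `bleu1`, the whitespace tokenization, the lowercasing, and the single-reference setting are all assumptions, and corpus-level BLEU as usually reported aggregates counts over all reports rather than averaging per-sentence scores.

```python
import math
from collections import Counter


def bleu1(reference: str, candidate: str) -> float:
    """Sentence-level BLEU-1 against a single reference (illustrative sketch).

    Assumes lowercased, whitespace-split tokens; real evaluations typically
    use a proper tokenizer and corpus-level aggregation.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0

    ref_counts = Counter(ref_tokens)
    cand_counts = Counter(cand_tokens)

    # Clipped counts: a candidate word is credited at most as many times
    # as it appears in the reference.
    clipped = sum(min(count, ref_counts[token])
                  for token, count in cand_counts.items())
    precision = clipped / len(cand_tokens)

    # Brevity penalty: penalize candidates that are not longer than the reference.
    c, r = len(cand_tokens), len(ref_tokens)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)

    return brevity_penalty * precision


if __name__ == "__main__":
    # Hypothetical report sentences, not taken from the dataset.
    reference = "the lungs are clear no pleural effusion"
    generated = "the lungs are clear"
    print(round(bleu1(reference, generated), 3))
```

In this hypothetical example every generated word appears in the reference, so the clipped precision is 1, and the score is reduced only by the brevity penalty because the generated report is shorter than the reference.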