Deep Learning-aided Image Captioning In Chest X-rays for TB Screening
Abstract
The fight against TB in the African region and other low- and middle-income countries
is hampered chiefly by the shortage of skilled radiologists in those countries. Large
patient populations are served by very few radiologists, who must then attend to many
patients. Writing a medical report for each patient takes considerable time, so
diagnosing many patients and manually writing a report for each of them is slow and
laborious. A deep learning-aided image captioning system can support radiologists by
automatically captioning chest X-rays (CXRs), serving as a valuable tool for TB
detection and medical report writing while offering faster and more accurate results.
In this work, the application of deep learning to
image captioning of CXRs for TB was investigated. An open-source dataset from Indiana
University and a local clinical dataset from Mengo Hospital were obtained. The Indiana
University dataset contained 7,470 chest X-ray images with their 2,955 associated
reports in XML format, and 311 images with their reports were collected from Mengo
Hospital to form the local dataset. Two pretrained models, EfficientNet and CheXNet,
were used as baseline feature extractors in designing two models that generate captions
for chest X-ray images. The EfficientNet-Transformer model comprised the EfficientNet
CNN as a feature extractor, with a vanilla Transformer encoder and decoder used to
generate the captions. The CheXNet-LSTM model comprised the CheXNet CNN as a feature
extractor, an encoder, and an LSTM decoder used to generate captions.
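The CNN-feature-extractor-plus-LSTM-decoder design described above can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not the thesis's exact configuration: the layer sizes, vocabulary size, and the use of a simple linear projection in place of the real CheXNet (DenseNet-121) backbone are all assumptions.

```python
# Hedged sketch of a CheXNet-LSTM style captioner: pooled CNN image
# features are projected into the embedding space and fed to an LSTM
# decoder as the first "token", followed by the report tokens.
import torch
import torch.nn as nn

class CNNLSTMCaptioner(nn.Module):
    def __init__(self, vocab_size, feat_dim=1024, embed_dim=256, hidden_dim=512):
        super().__init__()
        # feat_dim=1024 matches DenseNet-121's global-pooled feature size;
        # the Linear layer stands in for the encoder over CheXNet features.
        self.project = nn.Linear(feat_dim, embed_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):
        # prepend the projected image feature to the embedded caption tokens
        img = self.project(feats).unsqueeze(1)           # (B, 1, E)
        seq = torch.cat([img, self.embed(captions)], 1)  # (B, T+1, E)
        h, _ = self.lstm(seq)
        return self.out(h)                               # (B, T+1, V)

model = CNNLSTMCaptioner(vocab_size=1000)
feats = torch.randn(2, 1024)                 # mock CheXNet features, 2 images
caps = torch.randint(0, 1000, (2, 15))       # mock report token ids
logits = model(feats, caps)
print(logits.shape)                          # torch.Size([2, 16, 1000])
```

At inference time the same decoder would be run token by token, feeding each predicted word back in until an end-of-sequence token is produced.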
The models were trained on the Indiana University dataset and evaluated on both the
Indiana University dataset and the local dataset. The EfficientNet-Transformer model
achieved the best performance, with a BLEU score of 0.515, surpassing the results of
state-of-the-art approaches. The model was deployed in a web application that allows
the user to upload a chest X-ray image and receive a predicted caption in seconds.
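For reference, the BLEU metric used above scores a generated caption by its modified n-gram precision against a reference report, with a brevity penalty for short candidates. The sketch below is a minimal from-scratch illustration of sentence-level BLEU-4; the thesis presumably used a standard toolkit implementation, and the example sentences are invented.

```python
# Minimal sketch of sentence-level BLEU: geometric mean of modified
# 1- to 4-gram precisions, multiplied by a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # smooth zero counts
    # brevity penalty: punish candidates shorter than the reference
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

cand = "no acute cardiopulmonary abnormality".split()
ref = "no acute cardiopulmonary abnormality identified".split()
print(round(bleu(cand, ref), 3))  # 0.779: perfect precisions, short candidate
```

Here every candidate n-gram appears in the reference, so the score is determined entirely by the brevity penalty exp(1 - 5/4) ≈ 0.779.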