Deep Learning-aided Image Captioning In Chest X-rays for TB Screening
The fight against TB in the African region and other low- and middle-income countries is challenged mainly by the limited number of skilled radiologists in those countries. These countries have large populations but few radiologists, who must attend to many patients. Writing a medical report for each patient takes considerable time, so diagnosing many patients and manually writing a report for each is time-consuming and laborious. A deep learning-aided image captioning system can support radiologists by automatically captioning chest X-rays (CXRs), serving as a valuable tool for TB detection and medical report writing while offering faster and more accurate results.

In this work, the application of deep learning to image captioning of CXRs for TB screening was investigated. An open-source dataset from Indiana University and a local clinical dataset from Mengo Hospital were obtained. The Indiana University dataset contained 7,470 chest X-ray images with 2,955 associated reports in XML format; 311 images with their reports were collected from Mengo Hospital to form the local dataset. Two pretrained models, EfficientNet and CheXNet, were used as baseline feature extractors in the design of two models that generate captions for chest X-ray images. The EfficientNet-Transformer model comprised the EfficientNet CNN as a feature extractor and a vanilla transformer encoder and decoder to generate the captions. The CheXNet-LSTM model comprised the CheXNet CNN as a feature extractor, an encoder, and an LSTM decoder to generate captions. The models were trained on the Indiana University dataset and evaluated on both the Indiana University dataset and the local dataset. The EfficientNet-Transformer model achieved the best performance, with a BLEU score of 0.515, surpassing the results of state-of-the-art approaches.
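The BLEU score used to compare the models measures n-gram overlap between a generated caption and a reference report. The exact smoothing and n-gram weighting used in this work are not specified in the abstract, so the sketch below is a plain sentence-level BLEU with uniform weights and a brevity penalty, written with only the Python standard library for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty.
    `reference` and `candidate` are token lists."""
    precisions = []
    for n in range(1, max_n + 1):
        ref_counts = Counter(ngrams(reference, n))
        cand_counts = Counter(ngrams(candidate, n))
        # clipped overlap: each candidate n-gram counts at most as
        # often as it appears in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # unsmoothed BLEU is zero if any precision is zero
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # brevity penalty discourages overly short candidates
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(log_avg)
```

For example, a candidate caption identical to its reference scores 1.0, while a partially overlapping caption scores between 0 and 1; a corpus-level score such as the reported 0.515 would aggregate these statistics over all test reports.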
The model was deployed in a web application allowing the user to upload a chest X-ray image and get a predicted caption in seconds.