(Makerere University, 2024-08-30)
Abataki, Wani Richard
In today’s healthcare landscape, analyzing patient reviews is crucial for personalized medicine, yet the complexity of medical language poses challenges for traditional sentiment analysis tools. This study develops a machine learning model to predict drug ratings from patient reviews by combining lexicon-based sentiment analysis with ensemble classifiers. Using a dataset from the UCI Machine Learning Repository, we preprocessed the reviews with extractive summarization and vectorized them using TF-IDF and Latent Semantic Analysis (LSA). Four lexicon-based sentiment analysis methods (VADER, Bing, AFINN, NRC) were used to generate sentiment scores, which were evaluated by their mean squared error (MSE) and correlation with the actual ratings. Random Forest and XGBoost classifiers were trained and assessed on both unbalanced and balanced datasets using accuracy, precision, recall, and F1 score. Among the four lexicons, VADER performed best, with an MSE of 52.22 and a correlation of 0.38 with the actual ratings. The XGBoost classifier outperformed the Random Forest model, achieving an accuracy of 72.7% on the balanced data, with a precision of 67.82%, a recall of 65.25%, and an F1 score of 65.63% at a tree depth of 20. The study thus developed a predictive model for drug ratings and demonstrates the effectiveness of combining VADER sentiment scores with XGBoost for accurate predictions.
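The two evaluation steps described in the abstract (scoring each lexicon by MSE and correlation against the actual ratings, and scoring the classifiers by precision, recall, and F1) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the study's code: the function names mse, pearson, and macro_prf are hypothetical, Pearson correlation is assumed because the abstract does not name the coefficient, and macro averaging over rating classes is assumed because the abstract does not say how precision, recall, and F1 were averaged.

```python
import math


def mse(scores, ratings):
    """Mean squared error between sentiment scores and actual ratings."""
    if not scores or len(scores) != len(ratings):
        raise ValueError("scores and ratings must be non-empty and of equal length")
    return sum((s - r) ** 2 for s, r in zip(scores, ratings)) / len(scores)


def pearson(xs, ys):
    """Pearson correlation (assumed coefficient) between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall and F1 (averaging scheme is an assumption).

    F1 is computed per class and then averaged, so it need not equal the
    harmonic mean of the macro precision and macro recall.
    """
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


if __name__ == "__main__":
    # Toy data (not from the study): compound-style scores in [-1, 1] vs. 1-10 ratings.
    scores = [-0.8, 0.1, 0.6, 0.9]
    ratings = [1, 5, 8, 10]
    print("MSE:", round(mse(scores, ratings), 3))
    print("r:", round(pearson(scores, ratings), 4))
    print("macro P/R/F1:", macro_prf([0, 0, 1, 1], [0, 1, 1, 1]))
```

On this toy data the scores track the ratings almost perfectly (r is close to 1), yet the MSE is large because the scores and ratings live on different scales. MSE is scale-dependent, whereas Pearson correlation is unchanged by positive linear rescaling, which is one reason to report both when comparing lexicons.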