Predicting drug ratings using lexicon-based sentiment analysis and machine learning techniques

dc.contributor.author Abataki, Wani Richard
dc.date.accessioned 2024-11-29T08:52:25Z
dc.date.available 2024-11-29T08:52:25Z
dc.date.issued 2024-08-30
dc.description A dissertation report submitted to the School of Statistics and Planning in partial fulfillment of the requirements for the award of a Bachelor’s Degree in Statistics of Makerere University en_US
dc.description.abstract In today’s healthcare landscape, analyzing patient reviews is crucial for personalized medicine, yet the complexity of medical language poses challenges for traditional sentiment analysis tools. This study develops a machine learning model to predict drug ratings from patient reviews by combining lexicon-based sentiment analysis with tree-based classifiers. Using a dataset from the UCI Machine Learning Repository, we preprocessed the reviews with extractive summarization and vectorized them using TF-IDF and LSA. Four lexicon-based sentiment analysis methods (VADER, Bing, AFINN, NRC) were used to generate sentiment scores, which were evaluated by MSE and by correlation with actual ratings. Random Forest and XGBoost classifiers were trained and assessed on both unbalanced and balanced datasets, using accuracy, precision, recall, and F1 score as performance metrics. VADER performed best among the four lexicons, with an MSE of 52.22 and a correlation of 0.38 with actual ratings. The XGBoost classifier outperformed the Random Forest model, achieving an accuracy of 72.7% on balanced data, with precision of 67.82%, recall of 65.25%, and an F1 score of 65.63% at a tree depth of 20. The study thus produced a predictive model for drug ratings and demonstrates the effectiveness of integrating VADER’s sentiment analysis with XGBoost. en_US
dc.identifier.citation Abataki, W. R. (2024). Predicting drug ratings using lexicon-based sentiment analysis and machine learning techniques; unpublished dissertation, Makerere University, Kampala en_US
dc.identifier.uri http://hdl.handle.net/20.500.12281/19586
dc.language.iso en en_US
dc.publisher Makerere University en_US
dc.subject Sentiment Analysis en_US
dc.subject Machine Learning en_US
dc.subject Drug Ratings en_US
dc.subject Lexicon-based en_US
dc.title Predicting drug ratings using lexicon-based sentiment analysis and machine learning techniques en_US
dc.type Thesis en_US
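
The lexicon evaluation described in the abstract (score each review with a sentiment lexicon, then compare the scores with actual ratings using MSE and correlation) can be sketched as follows. This is a minimal illustration and not the dissertation's code: the toy lexicon, the 1-10 rating scale, the linear rescaling in to_rating, and the sample reviews are all assumptions made for the example.

```python
import math
import re

# Hypothetical toy lexicon with AFINN-style integer valences.
# It is NOT the real AFINN, VADER, Bing, or NRC word list.
TOY_LEXICON = {
    "great": 3, "good": 2, "relief": 2,
    "bad": -2, "pain": -2, "terrible": -3,
}


def lexicon_score(review):
    """Mean valence of the lexicon words found in the review (0.0 if none)."""
    tokens = re.findall(r"[a-z']+", review.lower())
    hits = [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def to_rating(score, lo=-3.0, hi=3.0):
    """Linearly map a valence in [lo, hi] onto a 1-10 scale (an assumption)."""
    return 1.0 + 9.0 * (score - lo) / (hi - lo)


def mse(predicted, actual):
    """Mean squared error between two equal-length sequences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Made-up reviews and ratings, for illustration only.
reviews = [
    "Great relief, no side effects",
    "Terrible pain and bad nausea",
    "It was good",
    "No change at all",
]
actual_ratings = [9, 2, 7, 5]

predicted_ratings = [to_rating(lexicon_score(r)) for r in reviews]
print("MSE:", mse(predicted_ratings, actual_ratings))
print("Correlation:", pearson(predicted_ratings, actual_ratings))
```

Swapping lexicon_score for a real lexicon scorer (for example VADER's compound score, which lies in [-1, 1]) would only require changing the lo and hi bounds passed to to_rating.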