Predicting drug ratings using lexicon-based sentiment analysis and machine learning techniques

Abataki, Wani Richard

dc.contributor.author	Abataki, Wani Richard
dc.date.accessioned	2024-11-29T08:52:25Z
dc.date.available	2024-11-29T08:52:25Z
dc.date.issued	2024-08-30
dc.description	A dissertation report submitted to the School of Statistics and Planning in partial fulfillment of the requirements for the award of a Bachelor’s Degree in Statistics of Makerere University	en_US
dc.description.abstract	In today’s healthcare landscape, analyzing patient reviews is crucial for personalized medicine, yet the complexity of medical language poses challenges for traditional sentiment analysis tools. This study develops a machine learning model to predict drug ratings from patient reviews by combining lexicon-based sentiment analysis with tree-based classifiers. Using a dataset from the UCI Machine Learning Repository, we preprocessed the reviews with extractive summarization and vectorized them using TF-IDF and LSA. Four lexicon-based sentiment analysis methods (VADER, Bing, AFINN, NRC) were used to generate sentiment scores, each evaluated by mean squared error (MSE) and correlation against the actual ratings. Random Forest and XGBoost classifiers were trained and assessed on both unbalanced and balanced datasets, with performance measured by accuracy, precision, recall, and F1 score. VADER performed best among the four lexicons, with an MSE of 52.22 and a correlation of 0.38 with actual ratings. The XGBoost classifier outperformed the Random Forest model, achieving an accuracy of 72.7% on balanced data, with precision of 67.82%, recall of 65.25%, and an F1 score of 65.63% at a tree depth of 20. The study thus produced a predictive model for drug ratings and indicates that integrating VADER’s sentiment scores with XGBoost is an effective approach to the task.	en_US
dc.identifier.citation	Abataki, W. R. (2024). Predicting drug ratings using lexicon-based sentiment analysis and machine learning techniques; unpublished dissertation, Makerere University, Kampala	en_US
dc.identifier.uri	http://hdl.handle.net/20.500.12281/19586
dc.language.iso	en	en_US
dc.publisher	Makerere University	en_US
dc.subject	Sentiment Analysis	en_US
dc.subject	Machine Learning	en_US
dc.subject	Drug Ratings	en_US
dc.subject	Lexicon-based	en_US
dc.title	Predicting drug ratings using lexicon-based sentiment analysis and machine learning techniques	en_US
dc.type	Thesis	en_US
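
The abstract above reports that each lexicon's sentiment scores were evaluated by MSE and by correlation with the actual ratings. The sketch below illustrates how such an evaluation might be computed. It is not the dissertation's code. The linear rescaling of a score in [-1, 1] (the range of a VADER compound score) onto a 1 to 10 rating scale, the helper names, and the sample values are all assumptions made for illustration; the record does not state how scores were mapped to ratings.

```python
from math import sqrt


def rescale_compound(compound, low=1.0, high=10.0):
    """Linearly map a score in [-1, 1] onto [low, high].

    Assumption: the 1-10 target scale and the linear mapping are
    illustrative choices, not taken from the dissertation.
    """
    return low + (compound + 1.0) / 2.0 * (high - low)


def mean_squared_error(predicted, actual):
    """Average squared difference between predicted and actual ratings."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, not data from the dissertation.
compound_scores = [-1.0, 0.0, 1.0, 0.5]
actual_ratings = [1, 5, 10, 8]

predicted = [rescale_compound(c) for c in compound_scores]
mse = mean_squared_error(predicted, actual_ratings)
r = pearson_correlation(predicted, actual_ratings)
print(f"MSE = {mse:.4f}, correlation = {r:.4f}")
```

Note that correlation is unaffected by a linear rescaling of the scores, while MSE depends directly on it, so the two metrics can rank lexicons differently.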

Collections

School of Statistics and Planning (SSP) Collection
