Predicting drug ratings using lexicon-based sentiment analysis and machine learning techniques
| dc.contributor.author | Abataki, Wani Richard | |
| dc.date.accessioned | 2024-11-29T08:52:25Z | |
| dc.date.available | 2024-11-29T08:52:25Z | |
| dc.date.issued | 2024-08-30 | |
| dc.description | A dissertation report submitted to the School of Statistics and Planning in partial fulfillment of the requirements for the award of a Bachelor’s Degree in Statistics of Makerere University | en_US |
| dc.description.abstract | In today’s healthcare landscape, analyzing patient reviews is crucial for personalized medicine, yet the complexity of medical language poses challenges for traditional sentiment analysis tools. This study develops a machine learning model to predict drug ratings from patient reviews by combining lexicon-based sentiment analysis with tree-based ensemble classifiers. Using a dataset from the UCI Machine Learning Repository, we preprocessed the reviews with extractive summarization and vectorized them using TF-IDF and LSA. Four lexicon-based sentiment analysis methods (VADER, Bing, AFINN, NRC) were used to generate sentiment scores, which were evaluated by MSE and correlation with the actual ratings. Random Forest and XGBoost classifiers were trained and assessed on both unbalanced and balanced datasets, with performance metrics including accuracy, precision, recall, and F1 score. The results indicated that VADER performed best among the four lexicons, with an MSE of 52.22 and a correlation of 0.38 with actual ratings. The XGBoost classifier outperformed the Random Forest model, achieving an accuracy of 72.7% on balanced data, with a precision of 67.82%, a recall of 65.25%, and an F1 score of 65.63% at a tree depth of 20. The study thus produced a predictive model for drug ratings and indicates that combining VADER’s sentiment scores with XGBoost is an effective approach. | en_US |
| dc.identifier.citation | Abataki, W. R. (2024). Predicting drug ratings using lexicon-based sentiment analysis and machine learning techniques; unpublished dissertation, Makerere University, Kampala | en_US |
| dc.identifier.uri | http://hdl.handle.net/20.500.12281/19586 | |
| dc.language.iso | en | en_US |
| dc.publisher | Makerere University | en_US |
| dc.subject | Sentiment Analysis | en_US |
| dc.subject | Machine Learning | en_US |
| dc.subject | Drug Ratings | en_US |
| dc.subject | Lexicon-based | en_US |
| dc.title | Predicting drug ratings using lexicon-based sentiment analysis and machine learning techniques | en_US |
| dc.type | Thesis | en_US |
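
The abstract describes scoring each review with a sentiment lexicon and then comparing those scores to the actual ratings by MSE and correlation. The following is a minimal sketch of that evaluation step only, not the dissertation's code. The toy lexicon, the clamping range, the 1 to 10 rating scale, and the sample reviews are all assumptions made for illustration; none of them is VADER, Bing, AFINN, or NRC, and none comes from the UCI dataset.

```python
import math

# Hypothetical toy lexicon (an assumption; not VADER, Bing, AFINN, or NRC).
TOY_LEXICON = {
    "great": 2, "good": 1, "relief": 1,
    "bad": -1, "nausea": -1, "terrible": -2,
}


def lexicon_score(review):
    """Sum the polarity of every lexicon word that appears in the review."""
    tokens = (tok.strip(".,!?").lower() for tok in review.split())
    return sum(TOY_LEXICON.get(tok, 0) for tok in tokens)


def rescale(score, lo=-4, hi=4, new_lo=1.0, new_hi=10.0):
    """Clamp a raw score to [lo, hi] and map it linearly onto the rating scale.

    The 1-10 target scale is an assumption about how ratings are expressed.
    """
    clamped = max(lo, min(hi, score))
    return new_lo + (clamped - lo) * (new_hi - new_lo) / (hi - lo)


def mse(predicted, actual):
    """Mean squared error between two equal-length sequences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Made-up reviews and ratings, used only to exercise the functions.
reviews = [
    "Great relief, no side effects.",
    "Terrible nausea and bad headaches.",
    "It was okay.",
    "Good results.",
]
ratings = [9, 1, 5, 8]

predicted = [rescale(lexicon_score(r)) for r in reviews]
print("MSE:", mse(predicted, ratings))
print("Correlation:", pearson(predicted, ratings))
```

Swapping `lexicon_score` for a real lexicon's scoring function would allow several lexicons to be compared on the same two metrics, which is the kind of comparison the abstract reports.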