Credit card fraud detection using machine learning with deployment

Mwesigwa, Brian

Date

2025

Authors

Mwesigwa, Brian

Publisher

Makerere University

Abstract

The proliferation of digital payment systems has led to a significant increase in credit card fraud, which poses major challenges to both financial institutions and consumers. Traditional rule-based fraud detection systems are often ineffective at identifying sophisticated fraudulent activity, making more advanced techniques such as machine learning (ML) necessary. A major hurdle in this area is class imbalance: fraudulent transactions make up a tiny fraction of all transactions. The main objective of this study was to develop, evaluate, and deploy a credit card fraud detection model using five machine learning algorithms on the 2013 European credit card transaction dataset from Kaggle. The study used a quantitative, experimental research design comprising data preprocessing, model training, and evaluation. Preprocessing involved data cleaning, feature scaling with StandardScaler, and handling class imbalance with four resampling techniques: SMOTE, random undersampling, random oversampling, and a combination of under- and oversampling. Five machine learning models, namely Logistic Regression, Decision Tree, Random Forest, XGBoost, and K-Nearest Neighbors (KNN), were then trained and tested on the data, with performance evaluated using the Area Under the ROC Curve (AUC-ROC) and the confusion matrix. The best-performing model was then selected for deployment on an R Shiny web dashboard prototype. The results show that the combination of the XGBoost algorithm and the SMOTE resampling technique achieved the highest performance, with an AUC of 98.72%, significantly outperforming all other model-sampling combinations tested. The findings confirm that addressing class imbalance is crucial for developing effective fraud detection models, as the performance of all tested algorithms improved significantly after resampling was applied.
Furthermore, the study concluded that the optimal resampling strategy depends strongly on the chosen algorithm. For example, Logistic Regression and Random Forest performed best with undersampling, while Decision Tree performed best with oversampling. The study recommends that financial institutions adopt advanced models such as XGBoost and integrate sophisticated resampling techniques, such as SMOTE, into their fraud detection pipelines. The best-performing model should be deployed on an interactive platform to support real-time monitoring and decision-making. Future research should focus on acquiring and analyzing local datasets from Low- and Middle-Income Countries (LMICs) such as Uganda and on conducting real-world deployment studies.
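
The resampling and evaluation ideas named in the abstract can be illustrated with a minimal sketch. It is not the study's pipeline: it uses only the Python standard library, a made-up one-feature toy dataset, and hypothetical helper names (`random_oversample`, `random_undersample`, `auc_roc`). It covers random oversampling, random undersampling, and AUC-ROC computed as a rank statistic; SMOTE, StandardScaler, and the five models are left out.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (an assumption for illustration): 95 legitimate (0) and 5 fraudulent (1) rows.
rng = random.Random(42)
X = [[rng.gauss(0, 1)] for _ in range(95)] + [[rng.gauss(3, 1)] for _ in range(5)]
y = [0] * 95 + [1] * 5

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print("oversampled class counts:", Counter(y_over))
print("undersampled class counts:", Counter(y_under))

# Score each row by its single feature and measure how well that ranks fraud first.
print("AUC of raw feature as score:", auc_roc(y, [row[0] for row in X]))
```

On the toy data, oversampling yields 95 rows per class and undersampling yields 5 rows per class. In practice, resampling is applied to the training split only, so that the test set keeps the original class ratio.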

Description

A dissertation submitted to the School of Statistics and Planning in partial fulfillment of the requirements for the award of the degree of Bachelor of Statistics of Makerere University

Keywords

Credit card, Fraud detection, Machine learning, Machine learning deployment

Citation

Mwesigwa, B. (2025). Credit card fraud detection using machine learning with deployment. Unpublished undergraduate dissertation. Makerere University. Kampala.