Supervised learning for weather prediction model using global meteorological data
Date
2025
Authors
Kabila, Francis Edrias
Publisher
Makerere University
Abstract
Accurate weather prediction plays a crucial role in the stability and safety of diverse sectors
such as agriculture, transportation, energy management, and disaster response. Numerical weather prediction (NWP)
models, which use complex physical equations to simulate atmospheric processes, have
long been the foundation of weather forecasting. Despite their successes, these models
require extensive computational resources and depend heavily on dense, high-quality
observational data, which are often lacking in many regions, especially across Africa. This research
explores the potential of supervised machine learning as an alternative, data-driven
approach to address these limitations by efficiently modeling complex, nonlinear
relationships in global meteorological data.
The study focuses on developing a supervised learning model capable of classifying
weather into 15 distinct categories using an extensive dataset comprising 8,141
observations from 24 African countries. The key objectives included optimizing the
predictive model, identifying the meteorological variables most influential to classification
accuracy, rigorously assessing model performance through metrics like accuracy and the
F1-score, validating the model’s robustness across diverse African climates, and generating
practical insights for forecasting and climate research.
The study employed data preprocessing techniques such as feature engineering
(combining wind speed and direction into a single wind vector) and stratified sampling
to mitigate class imbalance in the dataset. Z-score normalization standardized predictor
variables including temperature, humidity, wind components, pressure, cloud cover, and
“feels like” temperature. Six supervised machine learning algorithms were implemented
and compared: Logistic Regression, Support Vector Machine (SVM), Decision Tree, K-Nearest Neighbors (KNN), Random Forest, and
Gradient Boosting Machine (GBM). The models were evaluated based on training accuracy, validation accuracy, and the
F1-score, particularly emphasizing the latter due to the substantial class imbalance
dominated by categories like "Partly cloudy" and "Sunny."
Results revealed that the Random Forest algorithm achieved the highest overall accuracy at
91.4%, though its perfect training accuracy of 1.0 indicated overfitting. The GBM model
proved the most effective, balancing accuracy (90.8%) with superior generalization and
achieving the highest F1-score of 0.4645. This metric confirmed GBM’s strength in
accurately predicting minority weather classes representing critical conditions such as
"Heavy rain" and "Thundery outbreaks." Feature importance analysis highlighted cloud
cover, humidity, visibility, and air quality as the strongest predictors, reinforcing the
soundness of the model’s learning.
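The preprocessing and evaluation steps described above can be illustrated with a minimal, standard-library sketch. It is not the dissertation's code. The function names are hypothetical, the wind decomposition assumes the meteorological convention (direction the wind blows from, in degrees clockwise from north), and the F1-score is assumed to be macro-averaged, which the abstract does not state. The toy labels at the end are invented purely to show why accuracy and F1 can diverge under class imbalance.

```python
import math
import random
from collections import defaultdict


def wind_components(speed, direction_deg):
    """Split wind speed and direction into (u, v) components.

    Assumption: direction is where the wind blows FROM,
    in degrees clockwise from north (meteorological convention).
    """
    theta = math.radians(direction_deg)
    return -speed * math.sin(theta), -speed * math.cos(theta)


def zscore(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling each class separately
    so that class proportions are approximately preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up labels: a classifier that always predicts the majority class.
y_true = ["Sunny"] * 9 + ["Heavy rain"]
y_pred = ["Sunny"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                            # 0.9
print(round(macro_f1(y_true, y_pred), 3))  # 0.474
```

In the toy example the always-"Sunny" predictor reaches 90% accuracy yet scores below 0.5 on macro F1, because the missed minority class contributes an F1 of zero. In practice, library routines such as scikit-learn's `train_test_split(..., stratify=y)`, `StandardScaler`, and `f1_score(..., average="macro")` cover the same steps.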
In conclusion, this research validates that supervised machine learning, and specifically the
Gradient Boosting Machine, offers a reliable, efficient, and scalable approach to weather
classification. It presents a compelling complement or alternative to traditional NWP
models, especially in data-scarce regions. The developed model’s deployment and
performance suggest significant potential for enhancing weather prediction capabilities,
advancing operational decision-making, and improving climate resilience across Africa
and similar contexts worldwide. This work contributes to the growing body of evidence
supporting data-driven meteorological forecasting methods, paving the way for future
innovations in the field.
Description
A dissertation submitted to the School of Statistics and Planning in partial fulfilment of the requirements for the award of the degree of Bachelor of Statistics of Makerere University
Keywords
Machine learning,
Supervised learning,
Weather prediction,
Weather prediction model,
Global meteorological data
Citation
Kabila, F. E. (2025). Supervised learning for weather prediction model using global meteorological data. Unpublished undergraduate dissertation, Makerere University, Kampala.