Application of unsupervised learning techniques with analysis of unlabeled structured data: a case study of the US Census 1990 Dataset

Nabuule, Kevin

dc.contributor.author	Nabuule, Kevin
dc.date.accessioned	2023-01-06T13:18:31Z
dc.date.available	2023-01-06T13:18:31Z
dc.date.issued	2022-11
dc.identifier.citation	Nabuule, K. (2022). Application of unsupervised learning techniques with analysis of unlabeled structured data: a case study of the US Census 1990 Dataset. Unpublished undergraduate dissertation. Makerere University, Kampala, Uganda	en_US
dc.identifier.uri	http://hdl.handle.net/20.500.12281/13912
dc.description	A dissertation submitted to the School of Statistics and Planning in partial fulfillment of the requirements for the award of the degree of Bachelor of Statistics of Makerere University	en_US
dc.description.abstract	With the ability to gather massive amounts of data in a large number of domains, data is being collected at an unprecedented rate, and the analysis of this data, rather than its storage, has become the challenge (Hastie et al., 2009). These vast amounts of data include both labeled data, meaning pieces of data that have been tagged with one or more labels identifying certain properties, characteristics, or classifications of objects, and unlabeled data, which Sydorenko (2020) describes as pieces of data that have not been tagged with labels identifying characteristics, properties, or classifications. Unlabeled data includes photos, audio, videos, news articles, tweets, and x-rays (when working with medical data), among others, and such data is accumulating at a high rate due to the increased use of the internet. In the big data era, the need for fast, robust machine learning techniques is rapidly increasing, yet the exponential growth in today’s data sources has exposed traditional machine learning (ML) techniques as susceptible to poor scalability, loss in robustness and redundancy (Nadine Hajj, Rizk Yara, & Mariette, 2015). Powerful algorithms capable of extracting hidden structures from large datasets are hence a necessity, especially for the unsupervised learning approach to machine learning. This research paper aims to show the application of unsupervised learning techniques such as clustering to extract the existing patterns and relations in the US Census 1990 dataset, which is unlabeled yet structured. For the purposes of this study, the k-modes clustering algorithm (chosen because the data is categorical) and hierarchical clustering alongside PCA are demonstrated.	en_US
dc.language.iso	en	en_US
dc.publisher	Makerere University	en_US
dc.subject	Unsupervised learning techniques	en_US
dc.subject	Unlabeled structured data	en_US
dc.subject	US Census	en_US
dc.subject	Dataset	en_US
dc.title	Application of unsupervised learning techniques with analysis of unlabeled structured data: a case study of the US Census 1990 Dataset	en_US
dc.type	Thesis	en_US

Files in this item

Name: nabuule-cobams-bstat.pdf
Size: 1.130Mb
Format: PDF
Description: Undergraduate dissertation

This item appears in the following Collection(s)

School of Statistics and Planning (SSP) Collection