dc.contributor.author: Nabuule, Kevin
dc.date.accessioned: 2023-01-06T13:18:31Z
dc.date.available: 2023-01-06T13:18:31Z
dc.date.issued: 2022-11
dc.identifier.citation: Nabuule, K. (2022). Application of unsupervised learning techniques with analysis of unlabeled structured data: a case study of the US Census 1990 Dataset. Unpublished undergraduate dissertation. Makerere University, Kampala, Uganda [en_US]
dc.identifier.uri: http://hdl.handle.net/20.500.12281/13912
dc.description: A dissertation submitted to the School of Statistics and Planning in partial fulfillment of the requirements for the award of the degree of Bachelors of Statistics of Makerere University [en_US]
dc.description.abstract: With the ability to gather massive amounts of data in a large number of domains, data is collected at an unprecedented rate, and the analysis rather than the storage of this data becomes the challenge (Hastie et al., 2009). This data comprises both labeled data, that is, pieces of data that have been tagged with one or more labels identifying certain properties, characteristics, or classifications of objects, and unlabeled data, which Sydorenko (2020) describes as pieces of data that have not been tagged with labels identifying characteristics, properties or classifications. Unlabeled data includes photos, audio, videos, news articles, tweets and x-rays (when working with medical data), among others, and such data is accumulating rapidly owing to the increased use of the internet. In the big data era, the need for fast, robust machine learning techniques is rapidly increasing, yet the exponential growth in today’s data sources has exposed traditional machine learning (ML) techniques as susceptible to poor scalability, loss of robustness and redundancy (Nadine Hajj, Rizk Yara, & Mariette, 2015). Powerful algorithms capable of extracting hidden structures from large datasets are hence a necessity, especially for the unsupervised learning approach to machine learning. This research paper aims to show the application of unsupervised learning techniques such as clustering to extract the existing patterns and relations in the US Census 1990 dataset, which is unlabeled and yet structured. For the purposes of this study, the k-modes clustering algorithm (chosen because the data is categorical) and hierarchical clustering alongside PCA are demonstrated. [en_US]
dc.language.iso: en [en_US]
dc.publisher: Makerere University [en_US]
dc.subject: Unsupervised learning techniques [en_US]
dc.subject: Unlabeled structured data [en_US]
dc.subject: US Census [en_US]
dc.subject: Dataset [en_US]
dc.title: Application of unsupervised learning techniques with analysis of unlabeled structured data: a case study of the US Census 1990 Dataset [en_US]
dc.type: Thesis [en_US]

