    •   Mak UD Home
    • College of Business and Management Sciences (CoBAMS)
    • School of Statistics and Planning (SSP)
    • School of Statistics and Planning (SSP) Collection

    Application of unsupervised learning techniques with analysis of unlabeled structured data: a case study of the US Census 1990 Dataset

    Undergraduate dissertation (1.130Mb)
    Date
    2022-11
    Author
    Nabuule, Kevin
    Abstract
    With the ability to gather massive amounts of data in a large number of domains, data is collected at an unprecedented rate, and the analysis of this data, rather than its storage, becomes the challenge (Hastie et al., 2009). This data comprises both labeled data, meaning pieces of data that have been tagged with one or more labels identifying certain properties, characteristics, or classifications of objects, and unlabeled data, which Sydorenko (2020) describes as pieces of data that have not been tagged with labels identifying characteristics, properties, or classifications. Unlabeled data includes photos, audio, videos, news articles, tweets, and x-rays (when working with medical data), among others, and such data is accumulating rapidly due to the increased use of the internet. In the big data era, the need for fast, robust machine learning techniques is rapidly increasing, yet the exponential growth in today’s data sources has exposed traditional machine learning (ML) techniques as susceptible to poor scalability, loss of robustness, and redundancy (Nadine Hajj, Rizk Yara, & Mariette, 2015). Powerful algorithms capable of extracting hidden structures from large datasets are hence a necessity, especially for the unsupervised learning approach to machine learning. This research paper aims to show the application of unsupervised learning techniques such as clustering to extract the existing patterns and relations in the US Census 1990 dataset, which is unlabeled yet structured. For the purposes of this study, the k-modes clustering algorithm (chosen because the data is categorical) and hierarchical clustering, alongside PCA, are demonstrated.
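The k-modes idea named in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names (`hamming`, `column_modes`, `k_modes`), the choice to seed the modes with the first k distinct records, and the toy records are all assumptions made for illustration, and the records are invented rather than taken from the US Census 1990 dataset. The sketch assigns each categorical record to the mode it mismatches least, then recomputes each mode as the most frequent category per attribute.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category in each column of a non-empty list of rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, k, max_iter=20):
    """Cluster categorical records into at most k groups (illustrative sketch)."""
    records = [tuple(r) for r in records]

    # Assumption: seed the modes with the first k distinct records so the
    # result is deterministic. Real implementations use smarter seeding.
    modes = []
    for r in records:
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break

    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode by mismatch count (ties go to the lower index).
        new_labels = [
            min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change either
        labels = new_labels

        # Update step: each mode becomes the per-attribute most common category.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes


# Invented toy records with hypothetical attributes (age group, marital status, occupation).
toy = [
    ("young", "single", "student"),
    ("old", "married", "retired"),
    ("young", "single", "employed"),
    ("old", "married", "employed"),
    ("young", "married", "student"),
    ("old", "single", "retired"),
]

labels, modes = k_modes(toy, k=2)
print(labels)
print(modes)
```

Mismatch counting is used instead of Euclidean distance because categories such as marital status have no numeric ordering, which is the usual reason k-modes is preferred over k-means for categorical data.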
    URI
    http://hdl.handle.net/20.500.12281/13912
    Collections
    • School of Statistics and Planning (SSP) Collection

    DSpace 5.8 copyright © Makerere University 
    Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     
