A deep learning voice recommender system for farmers' questions.
Abstract
In today’s society, services centered around voice are gaining popularity. Providing users with voices they like, so as to capture and hold their attention, is important
for enhancing the overall experience of such a service.
In the field of Natural Language Processing, great progress has been made using embeddings
from deep learning models to represent words in an unsupervised fashion.
As Uganda strives to improve its Food and Nutrition Security (FNS), the role of technology
in increasing the reach of agricultural knowledge should not be overlooked. To realize
this vision, systems that can transmit and store a variety of data must be developed;
voice recommender systems are one example.
In this report, we propose a recommendation system that applies machine learning techniques developed for image classification to the problem of sound recognition in order to
produce an appropriate response to a farmer’s question. The system is divided into two
units. The first is a speech recognition block that converts digital audio
files into mel-spectrogram representations, whose frequency scale approximates how the human ear perceives pitch. The
second block uses a VGG16 model to perform question classification and answer
recommendation using embeddings generated from the spectrogram images.
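
As an illustration of the first unit described above, the following is a minimal sketch of how a log-scaled mel-spectrogram can be computed from a raw waveform. It uses only NumPy so that it stays self-contained; in practice a library routine such as `librosa.feature.melspectrogram` would typically be used. The function names and the parameter values (16 kHz sample rate, 512-point FFT, hop of 160 samples, 64 mel bands) are assumptions chosen for illustration and are not taken from the report.

```python
import numpy as np

def hz_to_mel(f):
    # Common (HTK-style) Hz-to-mel conversion formula.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centers are evenly spaced on the mel scale
    # between 0 Hz and the Nyquist frequency sr / 2.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64):
    # Assumed parameter values; the signal must contain at least n_fft samples.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T      # (n_frames, n_mels)
    return 10.0 * np.log10(mel + 1e-10).T                  # (n_mels, n_frames), in dB
```

The resulting two-dimensional array can be rendered or resized as an image before being passed to an image classifier such as VGG16; that step is omitted here because its details are not given in this abstract.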
The proposed solution was developed after collecting over 10,000 audio files
containing 2,061 questions commonly asked by maize, cassava, and bean farmers. These
audio files were recorded by a group of Makerere University students aged 20 to 25. We
also generated artificial data from these original files using augmentation techniques such as
noise addition, time stretching, and frequency shifting. This augmentation improved the
system’s accuracy. We believe that, with such a comprehensive approach, the system could serve as
a baseline for question recognition and answer recommendation for these farmers’ questions.
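
The three augmentation techniques named above can be sketched as follows. This is a minimal NumPy-only illustration under stated assumptions, not the implementation used in the report: the function names, the signal-to-noise parameter, and the choice of methods are hypothetical. In particular, the time stretch shown here is a naive resampling that also changes pitch (a phase vocoder would preserve it), and the frequency shift is a single-sideband shift built from the analytic signal.

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    # Add white Gaussian noise at an assumed target signal-to-noise ratio (dB).
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def time_stretch_naive(signal, rate):
    # Naive stretch by linear-interpolation resampling: rate > 1 shortens the
    # clip, rate < 1 lengthens it. Note that this also shifts the pitch.
    n_out = int(round(len(signal) / rate))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)

def frequency_shift(signal, shift_hz, sr):
    # Shift every frequency component by shift_hz. The analytic signal is
    # built by zeroing the negative-frequency half of the spectrum, then
    # multiplied by a complex exponential at the shift frequency.
    n = len(signal)
    spectrum = np.fft.fft(signal)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * h)
    t = np.arange(n) / sr
    return np.real(analytic * np.exp(2j * np.pi * shift_hz * t))
```

Each function takes a one-dimensional waveform and returns a new waveform, so several augmented copies of one recording can be produced by varying `snr_db`, `rate`, and `shift_hz`.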