A Deep Learning Recommender System for Farmers' Questions in Audio
Abstract
As Uganda strives to improve its Food and Nutrition Security (FNS), the role of technology in extending the reach of agricultural knowledge should not be overlooked. To fulfil this vision, systems that can transmit and store a variety of data must be developed, and voice recommender systems are one example. In this report, we propose a recommendation system that applies machine learning techniques developed for image classification to this sound recognition problem in order to produce an appropriate response to a farmer’s question. The system consists of two units. The first is a speech recognition block that converts digital audio files into mel-spectrogram representations, which approximate how the human ear perceives frequency. The second block uses a VGG16 model to perform question and recommendation classification using embeddings generated from the spectrogram images. The proposed solution was developed after collecting over 10,000 audio files containing 2061 questions commonly asked by maize, cassava, and bean farmers. These audio files were recorded by a group of Makerere University students aged 20 to 25. We were also able to generate artificial data from these original files
using techniques such as noise addition, time stretching and frequency shifting. As a result, the system’s classification accuracy increased from 25% to 39%. We believe that this approach could serve as a baseline for question recognition and answer recommendation for these farmers’ questions. Such a recommender system would help convey fundamental agricultural knowledge through a medium accessible to the majority of Ugandan farmers.
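
The mel-spectrogram front end mentioned above rests on the mel scale, which spaces frequency bands more densely at low frequencies than at high ones. The following is a minimal sketch of that mapping, assuming the widely used HTK-style formula; the abstract does not state which mel variant, number of bands, or frequency range was used, so the function names, the 8 bands, and the 8000 Hz upper limit are illustrative assumptions only.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, an assumption here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_bands, fmin_hz, fmax_hz):
    """Return band centre frequencies (Hz) that are equally spaced on the mel scale."""
    lo, hi = hz_to_mel(fmin_hz), hz_to_mel(fmax_hz)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


# Hypothetical settings chosen for illustration, not taken from the paper.
centers = mel_band_centers(n_bands=8, fmin_hz=0.0, fmax_hz=8000.0)

# Gaps between neighbouring centres widen as frequency increases.
gaps = [b - a for a, b in zip(centers, centers[1:])]
```

Because the centres are equally spaced in mels, their spacing in Hz widens with frequency, which is the property that lets a mel-spectrogram devote more resolution to the lower frequencies where much speech information lies.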