A model for recommending related research papers: A natural language processing approach

Van Heerden, Juandre Anton

Authors: Juandre Anton Van Heerden
Publication date: 1 April 2022
Publisher: Faculty of Engineering, the Built Environment and Technology

Abstract

The volume of information generated in recent years has led to information overload, which has affected researchers’ decision-making capabilities. Researchers have access to a variety of digital libraries to retrieve information. Digital libraries often offer access to a number of journal articles and books. Although digital libraries have search mechanisms, it still takes considerable time to find related research papers. The main aim of this study was to develop a model that uses machine learning techniques to recommend related research papers. The conceptual model was informed by literature on recommender systems in other domains. Furthermore, a literature survey on machine learning techniques helped to identify candidate techniques that could be used. The model comprises four phases. These phases are completed twice: first to learn from the data, and again when a recommendation is sought. The four phases are: (1) identifying and removing stopwords, (2) stemming the data, (3) identifying the topics for the model, and (4) measuring similarity between documents. The model is implemented and demonstrated using a prototype that recommends research papers using a natural language processing approach. The prototype underwent three iterations. The first iteration focused on understanding the problem domain by exploring how recommender systems and related techniques work. The second iteration focused on pre-processing techniques, topic modeling and similarity measures of two probability distributions. The third iteration focused on refining the prototype and documenting the lessons learned throughout the process. Practical lessons were learned while finalising the model and constructing the prototype. These practical lessons should help to identify opportunities for future research.

Thesis (MIT) -- Faculty of Engineering, the Built Environment and Technology, Information Technology, 202
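
The four phases described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis prototype. The stopword list, the suffix-stripping `crude_stem` and all function names are hypothetical. A smoothed term distribution stands in for the topic model of phase 3, and Jensen-Shannon divergence is assumed as the measure between two probability distributions, since the abstract does not name one.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are far longer).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "are", "on", "with"}


def remove_stopwords(text):
    """Phase 1: lowercase, tokenise on letters, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def crude_stem(token):
    """Phase 2 stand-in: strip a few common suffixes (not a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Phases 1 and 2 combined."""
    return [crude_stem(t) for t in remove_stopwords(text)]


def term_distribution(tokens, vocabulary):
    """Phase 3 stand-in: add-one smoothed term distribution, not learned topics."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary) + len(vocabulary)
    return [(counts[w] + 1) / total for w in vocabulary]


def jensen_shannon(p, q):
    """Phase 4: Jensen-Shannon divergence in bits; 0 means identical."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def recommend(query, corpus):
    """Rank corpus titles by divergence from the query, most similar first."""
    docs = {title: preprocess(body) for title, body in corpus.items()}
    query_tokens = preprocess(query)
    vocabulary = sorted(set(query_tokens).union(*docs.values()))
    q_dist = term_distribution(query_tokens, vocabulary)
    scores = {
        title: jensen_shannon(q_dist, term_distribution(tokens, vocabulary))
        for title, tokens in docs.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical two-document corpus, used only to exercise the sketch.
corpus = {
    "topic models": "Topic modeling of research papers with latent topics",
    "cooking": "Recipes for baking bread and cakes",
}
ranking = recommend("recommending research papers using topic models", corpus)
print(ranking)
```

The preprocessing runs twice, mirroring the abstract: once over the corpus and once over the query. In a fuller implementation, the two stand-ins would presumably be replaced by an established stemmer and a trained topic model, with the divergence computed over per-document topic distributions rather than raw term frequencies.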