2 research outputs found

    The advances of stemming algorithms in text analysis from 2013 to 2018

    Stemming is an activity within the pre-processing step of Text Analysis. It influences the Text Analysis results and drives Data Mining in fields such as Business Information Systems. Eight percent of existing organisational data that contributes to Big Data is in an unstructured format. One of the focus areas within the concept of “Big Data” is the complexity of processing the data and representing the results so that they are easily understood. Researchers have taken up this challenge over time. To determine the advances in Stemming Algorithm research, a systematic review was performed on articles on Stemming Algorithms published in journals from 2013 to 2018. Data was collected from accessible scholarly databases. The articles were then filtered by year and topic. The remaining articles were assessed against a set of methodological quality criteria. The final articles were put through a bi-gram Text Analysis process to answer the research questions. The results indicate that the research focus on Stemming Algorithms has started to decrease as the field reaches the plateau of productivity: the number of collected articles dropped from 58 in 2017 to 19 in 2018. Information retrieval is still a common field of application for Stemming Algorithms. A major unexpected set of themes revolves around artificial intelligence, reflecting an increase in interest in this topic. The focus of Stemming Algorithm research has shifted away from development and towards application. There is also high interest in social media as an application of Stemming Algorithms. Future research suggestions include designing a Stemming Algorithm that would automatically and responsively adapt to the historical and morphological changes of language text.
    Dissertation (MCom)--University of Pretoria, 2019.
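To make the pre-processing idea in the abstract concrete, here is a minimal Python sketch of stemming followed by bi-gram counting. It is not the dissertation's method: the suffix list, the `toy_stem` function, and the `bigram_counts` function are hypothetical names invented for illustration, and the crude suffix stripping stands in for an established stemmer such as Porter's.

```python
import re
from collections import Counter

# Hypothetical, deliberately crude suffix list (an assumption for this sketch).
# A real study would use an established stemming algorithm instead.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bigram_counts(text: str) -> Counter:
    """Lowercase, tokenize on letter runs, stem, and count adjacent token pairs."""
    tokens = [toy_stem(t) for t in re.findall(r"[a-z]+", text.lower())]
    return Counter(zip(tokens, tokens[1:]))


counts = bigram_counts(
    "Stemming algorithms support text mining. Stemming algorithm research continues."
)
# Show the most frequent bi-grams after stemming.
print(counts.most_common(3))
```

Because both "algorithms" and "algorithm" reduce to the same stem, the two phrasings are counted as one bi-gram, which is the effect stemming is meant to have on downstream frequency analysis. Note that this sketch does not respect sentence boundaries, so a pair can span two sentences.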

    Clustering Documents using the 3-Gram Graph Representation Model

    No full text