2 research outputs found

    Towards the Optimal Use of Machine Learning Algorithms in Text Mining: A Quick Review

    This paper aims to provide a quick review to jump-start research in the field of text mining, where Machine Learning (ML) algorithms have been used and several accomplishments have been reported by the research community. There are different categories of text mining, and the literature reports that implementing ML algorithms and techniques gives promising results. However, in this area of study, most of the research time and effort is consumed during the initial stages, where implementations and experiments are carried out to evaluate various combinations. The accomplishments in this field can be further advanced by presenting early investigations concisely and analytically. Thus, the benefits of this paper are threefold: first, it provides a platform for new researchers to start quickly with a shorter literature review and a more precise understanding of the combinations of text mining and ML; second, it presents a clear analysis of the text mining categories in which ML algorithms have been reported to perform successfully; and lastly, it identifies the problems for which the algorithms were used in various studies. This will enable new researchers to target the problem directly instead of re-implementing existing techniques. With the help of well-structured questions, the results are more analytical and present multidimensional views of this research issue. The main findings include that ML has been widely used in document classification and that Support Vector Machine (SVM) is the most successful algorithm reported.

    Comparison of Deep Learning based Concept Representations for Biomedical Document Clustering

    In this research, document representations based on distributed representations of concepts, along with new weighting schemes for the documents, are explored. The baseline weighting scheme is the traditional Term Frequency-Inverse Document Frequency (TF-IDF) of the concepts, whereas the two newly proposed schemes consider both local content, using TF-IDF, and associations between concepts. The distributed representations of the concepts are measured using a deep learning algorithm. The evaluation of the proposed document representations is based on k-means clustering results. The results show that the document representation based on TF-IDF in combination with the term-based distributed representations of concepts outperforms the other two according to the reported evaluation metrics: F1-measure (80.21%) and Purity (77.1%).
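
For readers unfamiliar with the two quantities named in this abstract, the following is a minimal sketch of a common TF-IDF weighting and of the purity metric. It is not the authors' implementation: the function names `tf_idf` and `purity`, the relative-frequency and natural-log variant of TF-IDF, and the toy documents and labels are all assumptions made for illustration, and the proposed concept-association weighting and the k-means step are not reproduced.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document.

    Assumed variant: relative term frequency times log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


def purity(cluster_ids, labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    by_cluster = {}
    for cid, lab in zip(cluster_ids, labels):
        by_cluster.setdefault(cid, Counter())[lab] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)


# Hypothetical toy data, not taken from the paper.
docs = [["gene", "protein", "gene"], ["protein", "cell"], ["tumor", "cell", "tumor"]]
weights = tf_idf(docs)

# Cluster 0 holds two "a" documents; cluster 1 holds one "a" and one "b".
toy_purity = purity([0, 0, 1, 1], ["a", "a", "a", "b"])
```

In this toy case, "gene" receives a higher weight than "protein" in the first document because it is both more frequent there and rarer across the collection, and three of the four documents sit in their cluster's majority class.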