114,241 research outputs found

    Natural language processing

    Get PDF
    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems

    TEXT MINING – PREREQUISITE FOR KNOWLEDGE MANAGEMENT SYSTEMS

    Get PDF
    Text mining is an interdisciplinary field with the main purpose of retrieving new knowledge from large collections of text documents. This paper presents the main techniques used for knowledge extraction through text mining and their main areas of applicability and emphasizes the importance of text mining in knowledge management systems.text mining, knowledge systems, information retrieval

    Research on Text Classification Based on Automatically Extracted Keywords

    Get PDF
    Automatic keywords extraction and classification tasks are important research directions in the domains of NLP (natural language processing), information retrieval, and text mining. As the fine granularity abstracted from text data, keywords are also the most important feature of text data, which has great practical and potential value in document classification, topic modeling, information retrieval, and other aspects. The compact representation of documents can be achieved through keywords, which contains massive significant information. Therefore, it may be quite advantageous to realize text classification with high-dimensional feature space. For this reason, this study designed a supervised keyword classification method based on TextRank keyword automatic extraction technology and optimize the model with the genetic algorithm to contribute to modeling the keywords of the topic for text classification

    Statistical keyword detection in literary corpora

    Full text link
    Understanding the complexity of human language requires an appropriate analysis of the statistical distribution of words in texts. We consider the information retrieval problem of detecting and ranking the relevant words of a text by means of statistical information referring to the "spatial" use of the words. Shannon's entropy of information is used as a tool for automatic keyword extraction. By using The Origin of Species by Charles Darwin as a representative text sample, we show the performance of our detector and compare it with another proposals in the literature. The random shuffled text receives special attention as a tool for calibrating the ranking indices.Comment: Published version. 11 pages, 7 figures. SVJour for LaTeX2
    • …
    corecore