
    Research on Text Classification Based on Automatically Extracted Keywords

    Automatic keyword extraction and classification are important research directions in natural language processing (NLP), information retrieval, and text mining. As fine-grained abstractions of text data, keywords are also its most important features, with great practical and potential value in document classification, topic modeling, information retrieval, and related tasks. Keywords give a compact representation of a document that still carries much of its significant information, which can be quite advantageous for text classification in a high-dimensional feature space. For this reason, this study designs a supervised keyword classification method based on TextRank automatic keyword extraction and optimizes the model with a genetic algorithm, in order to model topic keywords for text classification.
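    A minimal sketch of the TextRank extraction step only, assuming regex tokenization, a small hypothetical stopword list, and co-occurrence links between tokens inside a sliding window. It builds an undirected word graph and scores nodes with PageRank-style iteration. The supervised classifier and the genetic-algorithm optimization described in the abstract are not shown, and `textrank_keywords` and its parameters are illustrative names, not the authors' code.

```python
import re
from collections import defaultdict

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for",
             "on", "with", "as", "by", "it", "this", "that", "be", "can", "from"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def textrank_keywords(text, top_k=5, window=2, damping=0.85, iterations=50):
    """Rank words by PageRank over an undirected co-occurrence graph."""
    words = tokenize(text)
    # Link each token to the following tokens inside a sliding window of
    # `window` tokens (window=2 links only adjacent tokens).
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    if not neighbors:
        return []
    # Iterate the PageRank update until (approximately) converged.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
               + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

    The extracted keywords could then serve as features for a downstream classifier; how the study encodes them is not specified in the abstract.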

    Portable extraction of partially structured facts from the web

    A novel fact extraction task is defined to fill a gap between current information retrieval and information extraction technologies. It is shown that it is possible to extract useful partially structured facts about different kinds of entities in a broad domain, i.e. all kinds of places depicted in tourist images. Importantly, the approach does not rely on existing linguistic resources (gazetteers, taggers, parsers, etc.), and it was ported easily and cheaply between two very different languages (English and Latvian). Previous fact extraction from the web has focused on the extraction of structured data, e.g. (Building-LocatedIn-Town). In contrast, we extract richer and more interesting facts, such as a fact explaining why a building was built. Enough structure is maintained to facilitate subsequent processing of the information; for example, this partial structure enables straightforward template-based text generation. We report positive results for the correctness and interest of English and Latvian facts and for the utility of the extracted facts in enhancing image captions.

    Improving keyword extraction in multilingual texts

    The accuracy of keyword extraction is a leading factor in information retrieval systems and marketing. In the real world, text is produced in a variety of languages, and the ability to extract keywords using information from different languages improves the accuracy of keyword extraction. In this paper, the information available in all languages is used to improve a traditional keyword extraction algorithm for multilingual text. The proposed keyword extraction procedure is an unsupervised algorithm that selects a word as a keyword of a given text if, in addition to ranking highly in its own language, it also ranks highly on the keyword criteria in the other languages. To this end, the average TF-IDF of each candidate word was calculated over the same language and the other languages, and the words with the highest average TF-IDF were chosen as the extracted keywords. The results indicate that, on multilingual texts, the accuracies of the term frequency-inverse document frequency (TF-IDF) algorithm, the graph-based algorithm, and the improved proposed algorithm are 80%, 60.65%, and 91.3%, respectively.
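    The cross-language averaging step can be sketched as follows, assuming each candidate keyword is a "concept" with a known surface form in each language. The `translations` mapping is a hypothetical input (the abstract does not say how words are aligned across languages), and the smoothed IDF used here is one common TF-IDF variant, not necessarily the one in the paper. This is an illustration, not the authors' implementation.

```python
import math

def tf_idf(term, doc_tokens, corpus):
    """Smoothed TF-IDF of `term` in `doc_tokens`; document frequency comes from `corpus`."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)
    idf = math.log(len(corpus) / (1 + df)) + 1.0  # one common smoothing choice
    return tf * idf

def multilingual_keywords(docs, corpora, translations, top_k=3):
    """Rank candidate concepts by their TF-IDF averaged over every language version.

    docs:         {lang: tokenized version of the text in that language}
    corpora:      {lang: list of tokenized documents used for document frequency}
    translations: hypothetical {concept: {lang: surface word}} alignment
    """
    averages = {}
    for concept, forms in translations.items():
        # Score the concept's surface form in each language where it is known.
        scores = [tf_idf(forms[lang], tokens, corpora[lang])
                  for lang, tokens in docs.items() if lang in forms]
        averages[concept] = sum(scores) / len(scores) if scores else 0.0
    # Highest average first; ties broken alphabetically for determinism.
    return sorted(averages, key=lambda c: (-averages[c], c))[:top_k]
```

    Averaging rewards words that are salient in every language version of the text, which is the intuition the abstract describes; a word that scores highly in only one language is pulled down by its lower scores elsewhere.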

    Text Mining Technique for Driving Potentially Valuable Information from Text

    With the growing number of digitized documents and large text databases, text mining will become increasingly important. Text mining can be of great benefit in finding relevant and desired text data in unstructured data sources. Text mining is the discovery by computer of new, previously unknown information by automatically extracting information from different written resources. It is an important step of the Knowledge Discovery process. The aim of the paper is to study the concept of text mining and various techniques, with a particular focus on the text mining process. The text mining community has tried to apply many methods, such as rule-based, knowledge-based, statistical, and machine-learning-based approaches. Finally, the paper discusses issues concerning techniques for deriving potentially valuable information from text, as well as integration with data mining. The paper ends with conclusions and future lines of work on combining text mining and data mining techniques into a single system, a combination known as duo-mining, and on more effective text mining techniques for contextual extraction.
    Keywords: Data mining, Information Extraction, Information Retrieval, Text Mining
    DOI: 10.7176/IKM/10-1-01
    Publication date: January 31st 202