52,766 research outputs found

    Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art

    Get PDF
    Information Extraction, data Integration, and uncertain data management are different areas of research that got vast focus in the last two decades. Many researches tackled those areas of research individually. However, information extraction systems should have integrated with data integration methods to make use of the extracted information. Handling uncertainty in extraction and integration process is an important issue to enhance the quality of the data in such integrated systems. This article presents the state of the art of the mentioned areas of research and shows the common grounds and how to integrate information extraction and data integration under uncertainty management cover

    Guidance on the key skills units : communication, application of number and information technology

    Get PDF

    Content-Based Book Recommending Using Learning for Text Categorization

    Full text link
    Recommender systems improve access to relevant products and information by making personalized suggestions based on previous examples of a user's likes and dislikes. Most existing recommender systems use social filtering methods that base recommendations on other users' preferences. By contrast, content-based methods use information about an item itself to make suggestions. This approach has the advantage of being able to recommended previously unrated items to users with unique interests and to provide explanations for its recommendations. We describe a content-based book recommending system that utilizes information extraction and a machine-learning algorithm for text categorization. Initial experimental results demonstrate that this approach can produce accurate recommendations.Comment: 8 pages, 3 figures, Submission to Fourth ACM Conference on Digital Librarie

    Ontologies and Information Extraction

    Full text link
    This report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of texts are interpreted with respect to a predefined partial domain model. This report shows that depending on the nature and the depth of the interpretation to be done for extracting the information, more or less knowledge must be involved. This report is mainly illustrated in biology, a domain in which there are critical needs for content-based exploration of the scientific literature and which becomes a major application domain for IE

    Learning to extract relations for protein annotation

    Get PDF
    Motivation: Protein annotation is a task that describes protein X in terms of topic Y. Usually, this is constructed using information from the biomedical literature. Until now, most of literature-based protein annotation work has been done manually by human annotators. However, as the number of biomedical papers grows ever more rapidly, manual annotation becomes more difficult, and there is increasing need to automate the process. Recently, information extraction (IE) has been used to address this problem. Typically, IE requires pre-defined relations and hand-crafted IE rules or annotated corpora, and these requirements are difficult to satisfy in real-world scenarios such as in the biomedical domain. In this article, we describe an IE system that requires only sentences labelled according to their relevance or not to a given topic by domain experts. Results: We applied our system to meet the annotation needs of a well-known protein family database; the results show that our IE system can annotate proteins with a set of extracted relations by learning relations and IE rules for disease, function and structure from only relevant and irrelevant sentences. Contact: [email protected]

    CRYSTAL: Inducing a Conceptual Dictionary

    Full text link
    One of the central knowledge sources of an information extraction system is a dictionary of linguistic patterns that can be used to identify the conceptual content of a text. This paper describes CRYSTAL, a system which automatically induces a dictionary of "concept-node definitions" sufficient to identify relevant information from a training corpus. Each of these concept-node definitions is generalized as far as possible without producing errors, so that a minimum number of dictionary entries cover the positive training instances. Because it tests the accuracy of each proposed definition, CRYSTAL can often surpass human intuitions in creating reliable extraction rules.Comment: 6 pages, Postscript, IJCAI-95 http://ciir.cs.umass.edu/info/psfiles/tepubs/tepubs.htm

    Enhanced services for targeted information retrieval by event extraction and data mining

    Get PDF
    Where Information Retrieval (IR) and Text Categorization delivers a set of (ranked) documents according to a query, users of large document collections would rather like to receive answers. Question-answering from text has already been the goal of the Message Understanding Conferences. Since then, the task of text understanding has been reduced to several more tractable tasks, most prominently Named Entity Recognition (NER) and Relation Extraction. Now, pieces can be put together to form enhanced services added on an IR system. In this paper, we present a framework which combines standard IR with machine learning and (pre-)processing for NER in order to extract events from a large document collection. Some questions can already be answered by particular events. Other questions require an analysis of a set of events. Hence, the extracted events become input to another machine learning process which delivers the final output to the user's question. Our case study is the public collection of minutes of plenary sessions of the German parliament and of petitions to the German parliament. --
    • 

    corecore