Information Extraction, Data Integration, and Uncertain Data Management: The State of The Art
Information extraction, data integration, and uncertain data management are distinct research areas that have received considerable attention over the last two decades. Many studies have tackled these areas individually. However, information extraction systems should be integrated with data integration methods to make use of the extracted information. Handling uncertainty in the extraction and integration process is important for enhancing the quality of the data in such integrated systems. This article presents the state of the art in these research areas, identifies their common ground, and shows how to integrate information extraction and data integration under the umbrella of uncertainty management.
Content-Based Book Recommending Using Learning for Text Categorization
Recommender systems improve access to relevant products and information by
making personalized suggestions based on previous examples of a user's likes
and dislikes. Most existing recommender systems use social filtering methods
that base recommendations on other users' preferences. By contrast,
content-based methods use information about an item itself to make suggestions.
This approach has the advantage of being able to recommend previously unrated
items to users with unique interests and to provide explanations for its
recommendations. We describe a content-based book recommending system that
utilizes information extraction and a machine-learning algorithm for text
categorization. Initial experimental results demonstrate that this approach can
produce accurate recommendations.
Comment: 8 pages, 3 figures, Submission to Fourth ACM Conference on Digital
Libraries
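The content-based approach summarized above can be illustrated with a small text categorizer. The abstract does not name the learning algorithm, so the sketch below assumes a multinomial naive Bayes classifier with Laplace smoothing over bag-of-words features. The function names (`tokenize`, `train`, `log_odds`, `recommend`) and the toy data are hypothetical, and the information extraction step is skipped by assuming each book already has a plain-text description.

```python
import math
from collections import Counter

LABELS = ("like", "dislike")

def tokenize(text):
    # Lowercase whitespace tokenization; keeps purely alphabetic tokens only.
    return [w for w in text.lower().split() if w.isalpha()]

def train(examples):
    """Build word and document counts from (description, label) pairs.

    Assumption: labels are the strings "like" and "dislike", taken from a
    single user's past ratings.
    """
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["like"]) | set(word_counts["dislike"])
    return word_counts, doc_counts, vocab

def log_odds(model, text):
    """Return log P(like | text) - log P(dislike | text) under naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        # Smoothed class prior.
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Laplace-smoothed word likelihood.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return scores["like"] - scores["dislike"]

def recommend(model, candidates):
    """Rank unrated items (title -> description) by descending log-odds."""
    return sorted(
        candidates,
        key=lambda title: log_odds(model, candidates[title]),
        reverse=True,
    )

# Toy usage: ratings from one user, then a ranking of two unrated books.
rated = [
    ("a space opera about starships and aliens", "like"),
    ("starships battle across the galaxy", "like"),
    ("a cookbook of pasta recipes", "dislike"),
    ("recipes for quick weeknight dinners", "dislike"),
]
unrated = {
    "Galaxy War": "aliens and starships at war",
    "Pasta Nights": "pasta recipes for dinners",
}
model = train(rated)
ranking = recommend(model, unrated)
```

Because the score depends only on an item's own text and one user's ratings, a book that no other user has rated can still be ranked, which is the advantage over social filtering that the abstract points to.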
Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of texts are interpreted
with respect to a predefined partial domain model. This report shows that
depending on the nature and the depth of the interpretation to be done for
extracting the information, more or less knowledge must be involved. This
report is mainly illustrated in biology, a domain in which there are critical
needs for content-based exploration of the scientific literature and which
becomes a major application domain for IE.
Learning to extract relations for protein annotation
Motivation: Protein annotation is a task that describes protein X in terms of topic Y. Usually, the annotation is constructed using information from the biomedical literature. Until now, most literature-based protein annotation work has been done manually by human annotators. However, as the number of biomedical papers grows ever more rapidly, manual annotation becomes more difficult, and there is an increasing need to automate the process. Recently, information extraction (IE) has been used to address this problem. Typically, IE requires pre-defined relations and hand-crafted IE rules or annotated corpora, and these requirements are difficult to satisfy in real-world scenarios such as the biomedical domain. In this article, we describe an IE system that requires only sentences labelled by domain experts as relevant or irrelevant to a given topic. Results: We applied our system to meet the annotation needs of a well-known protein family database; the results show that our IE system can annotate proteins with a set of extracted relations by learning relations and IE rules for disease, function and structure from only relevant and irrelevant sentences.
CRYSTAL: Inducing a Conceptual Dictionary
One of the central knowledge sources of an information extraction system is a
dictionary of linguistic patterns that can be used to identify the conceptual
content of a text. This paper describes CRYSTAL, a system which automatically
induces a dictionary of "concept-node definitions" sufficient to identify
relevant information from a training corpus. Each of these concept-node
definitions is generalized as far as possible without producing errors, so that
a minimum number of dictionary entries cover the positive training instances.
Because it tests the accuracy of each proposed definition, CRYSTAL can often
surpass human intuitions in creating reliable extraction rules.
Comment: 6 pages, Postscript, IJCAI-95
http://ciir.cs.umass.edu/info/psfiles/tepubs/tepubs.htm
Enhanced services for targeted information retrieval by event extraction and data mining
Whereas Information Retrieval (IR) and Text Categorization deliver a set of (ranked) documents in response to a query, users of large document collections would rather receive answers. Question answering from text was already the goal of the Message Understanding Conferences. Since then, the task of text understanding has been reduced to several more tractable tasks, most prominently Named Entity Recognition (NER) and Relation Extraction. Now, the pieces can be put together to form enhanced services added to an IR system. In this paper, we present a framework which combines standard IR with machine learning and (pre-)processing for NER in order to extract events from a large document collection. Some questions can already be answered by particular events. Other questions require an analysis of a set of events. Hence, the extracted events become input to another machine learning process which delivers the final answer to the user's question. Our case study is the public collection of minutes of plenary sessions of the German parliament and of petitions to the German parliament.