1,778,873 research outputs found
Bayesian Information Extraction Network
Dynamic Bayesian networks (DBNs) offer an elegant way to integrate various
aspects of language in one model. Many existing algorithms developed for
learning and inference in DBNs are applicable to probabilistic language
modeling. To demonstrate the potential of DBNs for natural language processing,
we employ a DBN in an information extraction task. We show how to assemble
wealth of emerging linguistic instruments for shallow parsing, syntactic and
semantic tagging, morphological decomposition, named entity recognition etc. in
order to incrementally build a robust information extraction system. Our method
outperforms previously published results on an established benchmark domain.Comment: 6 page
Information extraction
In this paper we present a new approach to extract relevant information by knowledge graphs from natural language text. We give a multiple level model based on knowledge graphs for describing template information, and investigate the concept of partial structural parsing. Moreover, we point out that expansion of concepts plays an important role in thinking, so we study the expansion of knowledge graphs to use context information for reasoning and merging of templates
Information Extraction in Illicit Domains
Extracting useful entities and attribute values from illicit domains such as
human trafficking is a challenging problem with the potential for widespread
social impact. Such domains employ atypical language models, have `long tails'
and suffer from the problem of concept drift. In this paper, we propose a
lightweight, feature-agnostic Information Extraction (IE) paradigm specifically
designed for such domains. Our approach uses raw, unlabeled text from an
initial corpus, and a few (12-120) seed annotations per domain-specific
attribute, to learn robust IE models for unobserved pages and websites.
Empirically, we demonstrate that our approach can outperform feature-centric
Conditional Random Field baselines by over 18\% F-Measure on five annotated
sets of real-world human trafficking datasets in both low-supervision and
high-supervision settings. We also show that our approach is demonstrably
robust to concept drift, and can be efficiently bootstrapped even in a serial
computing environment.Comment: 10 pages, ACM WWW 201
Ontology-based Information Extraction with SOBA
In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontologybased information extraction from soccer web pages for automatic population of a knowledge base that can be used for domainspecific question answering. SOBA realizes a tight connection between the ontology, knowledge base and the information extraction component. The originality of SOBA is in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base, and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities
Twitter Based Information Extraction
In the modern world of social media dominance, the microblogs like Twitter and Facebook are probably the best source of up-to-date information. The amount of information available on these platforms is huge, although most of it is unstructured and redundant which makes our task of extracting information from it much more challenging. This automatic extraction of information from noisy sources has opened up new opportunities for querying and analyzing data.
This paper is a review of the research that has been done on extracting information like event dates [1] and classification of information from social networking platforms like Twitter. We present a brief study of the work which shows that extracting useful information from Twitter and other social media platforms is indeed feasible. We provide brief study about the extraction techniques applied by the applications based on this subject like the extraction tasks and the input exploited for extraction, the types of methods of extraction used and the type of output produced
An Analysis of Structured Data on the Web
In this paper, we analyze the nature and distribution of structured data on
the Web. Web-scale information extraction, or the problem of creating
structured tables using extraction from the entire web, is gathering lots of
research interest. We perform a study to understand and quantify the value of
Web-scale extraction, and how structured information is distributed amongst top
aggregator websites and tail sites for various interesting domains. We believe
this is the first study of its kind, and gives us new insights for information
extraction over the Web.Comment: VLDB201
- …
