1,345,467 research outputs found

    Information extraction

    Get PDF
    In this paper we present a new approach to extract relevant information by knowledge graphs from natural language text. We give a multiple level model based on knowledge graphs for describing template information, and investigate the concept of partial structural parsing. Moreover, we point out that expansion of concepts plays an important role in thinking, so we study the expansion of knowledge graphs to use context information for reasoning and merging of templates

    Bayesian Information Extraction Network

    Full text link
    Dynamic Bayesian networks (DBNs) offer an elegant way to integrate various aspects of language in one model. Many existing algorithms developed for learning and inference in DBNs are applicable to probabilistic language modeling. To demonstrate the potential of DBNs for natural language processing, we employ a DBN in an information extraction task. We show how to assemble wealth of emerging linguistic instruments for shallow parsing, syntactic and semantic tagging, morphological decomposition, named entity recognition etc. in order to incrementally build a robust information extraction system. Our method outperforms previously published results on an established benchmark domain.Comment: 6 page

    Ontologies and Information Extraction

    Full text link
    This report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of texts are interpreted with respect to a predefined partial domain model. This report shows that depending on the nature and the depth of the interpretation to be done for extracting the information, more or less knowledge must be involved. This report is mainly illustrated in biology, a domain in which there are critical needs for content-based exploration of the scientific literature and which becomes a major application domain for IE

    Twitter Based Information Extraction

    Full text link
    In the modern world of social media dominance, the microblogs like Twitter and Facebook are probably the best source of up-to-date information. The amount of information available on these platforms is huge, although most of it is unstructured and redundant which makes our task of extracting information from it much more challenging. This automatic extraction of information from noisy sources has opened up new opportunities for querying and analyzing data. This paper is a review of the research that has been done on extracting information like event dates [1] and classification of information from social networking platforms like Twitter. We present a brief study of the work which shows that extracting useful information from Twitter and other social media platforms is indeed feasible. We provide brief study about the extraction techniques applied by the applications based on this subject like the extraction tasks and the input exploited for extraction, the types of methods of extraction used and the type of output produced

    Handling uncertainty in information extraction

    Get PDF
    This position paper proposes an interactive approach for developing information extractors based on the ontology definition process with knowledge about possible (in)correctness of annotations. We discuss the problem of managing and manipulating probabilistic dependencies

    Information Extraction in Illicit Domains

    Full text link
    Extracting useful entities and attribute values from illicit domains such as human trafficking is a challenging problem with the potential for widespread social impact. Such domains employ atypical language models, have `long tails' and suffer from the problem of concept drift. In this paper, we propose a lightweight, feature-agnostic Information Extraction (IE) paradigm specifically designed for such domains. Our approach uses raw, unlabeled text from an initial corpus, and a few (12-120) seed annotations per domain-specific attribute, to learn robust IE models for unobserved pages and websites. Empirically, we demonstrate that our approach can outperform feature-centric Conditional Random Field baselines by over 18\% F-Measure on five annotated sets of real-world human trafficking datasets in both low-supervision and high-supervision settings. We also show that our approach is demonstrably robust to concept drift, and can be efficiently bootstrapped even in a serial computing environment.Comment: 10 pages, ACM WWW 201
    • …
    corecore