
    A hybrid architecture for robust parsing of German

    This paper provides an overview of current research on a hybrid and robust parsing architecture for the morphological, syntactic and semantic annotation of German text corpora. The novel contribution of this research lies not in the individual parsing modules, each of which relies on state-of-the-art algorithms and techniques. Rather, what is new about the present approach is the combination of these modules into a single architecture. This combination provides a means to significantly optimize the performance of each component, resulting in an increased accuracy of annotation.

    Latent Syntactic Structure-Based Sentiment Analysis

    People share their opinions about things like products, movies and services using social media channels. The analysis of these textual contents for sentiments is a gold mine for marketing experts, so automatic sentiment analysis is a popular area of applied artificial intelligence. We propose a latent syntactic structure-based approach for sentiment analysis which requires only sentence-level polarity labels for training. Our experiments on three domains (movie, IT products, restaurant) show that a sentiment analyzer that exploits syntactic parses and has access only to sentence-level polarity annotation for in-domain sentences can outperform state-of-the-art models that were trained on out-of-domain parse trees with sentiment annotation for each node of the trees. In practice, millions of sentence-level polarity annotations are usually available for a particular domain, so our approach can be used to train a sentiment analyzer for a new domain while still exploiting the syntactic structure of sentences.

    RDF(S)/XML Linguistic Annotation of Semantic Web Pages

    Although AI researchers have already done much research on the semantic annotation of web pages as part of the Semantic Web initiative, linguistic text annotation, including semantic annotation, was originally developed in Corpus Linguistics, and its results have been somewhat neglected by AI. ..

    A Data-Oriented Approach to Semantic Interpretation

    In Data-Oriented Parsing (DOP), an annotated language corpus is used as a stochastic grammar. The most probable analysis of a new input sentence is constructed by combining sub-analyses from the corpus in the most probable way. This approach has been successfully used for syntactic analysis, using corpora with syntactic annotations such as the Penn Treebank. If a corpus with semantically annotated sentences is used, the same approach can also generate the most probable semantic interpretation of an input sentence. The present paper explains this semantic interpretation method, and summarizes the results of a preliminary experiment. Semantic annotations were added to the syntactic annotations of most of the sentences of the ATIS corpus. A data-oriented semantic interpretation algorithm was successfully tested on this semantically enriched corpus. Comment: 10 pages, Postscript; to appear in Proceedings Workshop on Corpus-Oriented Semantic Analysis, ECAI-96, Budapes

    Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

    The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult for existing natural language analysis tools to process, since they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.

    Annotation of SBML Models Through Rule-Based Semantic Integration

    *Motivation:* The creation of accurate quantitative Systems Biology Markup Language (SBML) models is a time-intensive, manual process often complicated by the many data sources and formats required to annotate even a small and well-scoped model. Ideally, the retrieval and integration of biological knowledge for model annotation should be performed quickly, precisely, and with a minimum of manual effort. Here, we present a method using off-the-shelf semantic web technology which enables this process: the heterogeneous data sources are first syntactically converted into ontologies; these are then aligned to a small domain ontology by applying a rule base. Integrating resources in this way can accommodate multiple formats with different semantics; it provides richly modelled biological knowledge suitable for annotation of SBML models.
    *Results:* We demonstrate proof-of-principle for this rule-based mediation with two use cases for SBML model annotation. This was implemented with existing tools, decreasing development time and increasing reusability. This initial work establishes the feasibility of this approach as part of an automated SBML model annotation system.
    *Availability:* Detailed information including download and mapping of the ontologies as well as integration results is available from http://www.cisban.ac.uk/RBM

    An Integrated Framework for Treebanks and Multilayer Annotations

    Treebank formats and associated software tools are proliferating rapidly, with little consideration for interoperability. We survey a wide variety of treebank structures and operations, and show how they can be mapped onto the annotation graph model, leading to an integrated framework encompassing tree and non-tree annotations alike. This development opens up new possibilities for managing and exploiting multilayer annotations. Comment: 8 page