
    Identity and Granularity of Events in Text

    In this paper we describe a method to detect event descriptions in different news articles and to model the semantics of events and their components using RDF representations. We compare these descriptions to solve a cross-document event coreference task. Our component approach to event semantics defines identity and granularity of events at different levels. It performs close to state-of-the-art approaches on the cross-document event coreference task, while outperforming other works when assuming similar quality of event detection. We demonstrate how granularity and identity are interconnected and we discuss how semantic anomaly could be used to define differences between coreference, subevent and topical relations. Comment: Invited keynote speech by Piek Vossen at Cicling 201

    Proyecto NewsReader

    The European project NewsReader develops advanced technology to process daily news streams in 4 languages, extracting what happened, when and where it happened and who was involved. NewsReader reads massive amounts of news coming from thousands of sources. It compares the results across sources to complement information and determine where the different sources disagree. Furthermore, it merges current news with previous news, creating a long-term history rather than separate events. The result is accumulated over time, producing an extremely large knowledge base that is visualized using new techniques to provide more comprehensive access. This work has been supported by the EC within the 7th framework programme under grant agreement nr. FP7-IST-316040.

    ReferenceNet: a semantic-pragmatic network for capturing reference relations

    In this paper, we present ReferenceNet: a semantic-pragmatic network of reference relations between synsets. Synonyms are assumed to be exchangeable in similar contexts and also word embeddings are based on sharing of local contexts represented as vectors. Co-referring words, however, tend to occur in the same topical context but in different local contexts. In addition, they may express different concepts related through topical coherence, and through author framing and perspective. In this paper, we describe how reference relations can be added to WordNet and how they can be acquired. We evaluate two methods of extracting event coreference relations using WordNet relations against a manual annotation of 38 documents within the same topical domain of gun violence. We conclude that precision is reasonable but recall is lower because the WordNet hierarchy does not sufficiently capture the required coherence and perspective relations.

    Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference

    No abstract available

    Contrastive analysis of English and Spanish intonation using computer corpora - a preliminary study.

    The thesis presents an account of the design, construction and analysis of a machine-readable corpus of transcribed spoken Spanish. The corpus was compiled from transcriptions of broadcast and conversational speech and was transcribed with prosodic marks by the researcher. Syllable boundaries were also marked. The design was aimed at compatibility with the Lancaster Spoken English Corpus, which already exists, and the primary objective of the research was to discover comparative information about differences between Spanish and English prosody. Analysis by computer showed differences between the two languages in terms of mean tone-unit lengths and in the frequency of occurrence of different tones. An experiment to investigate the degree to which trained phoneticians (including the researcher) agree in transcribing pitch movements by drawing "pitch curves" showed a reasonable degree of agreement as measured by calculating correlation coefficients, though agreement with computer-extracted fundamental frequency curves was less clear-cut. The thesis discusses the possibility of storing such fundamental frequency information along with the "manual" transcription in the corpus in future development of the work.