110 research outputs found

    Transfer learning for historical corpora: An assessment on post-OCR correction and named entity recognition

    Transfer learning in Natural Language Processing, mainly in the form of pre-trained language models, has recently delivered substantial gains across a range of tasks. Scholars and practitioners working with OCRed historical corpora are thus increasingly exploring the use of pre-trained language models. Nevertheless, the specific challenges posed by historical documents, including OCR quality and linguistic change, call for a critical assessment of the use of pre-trained language models in this setting. We consider two shared tasks, ICDAR2019 (post-OCR correction) and CLEF-HIPE-2020 (Named Entity Recognition, NER), and systematically assess the use of pre-trained language models with data in French, German and English. We find that using pre-trained language models helps with NER but less so with post-OCR correction. Pre-trained language models should therefore be used critically when working with OCRed historical corpora. We release our code base to allow replication of our results and testing of other pre-trained representations.

    Multimedia ontology matching by using visual and textual modalities

    Ontologies have been intensively applied for improving multimedia search and retrieval by providing explicit meaning to visual content. Several multimedia ontologies have been recently proposed as knowledge models suitable for narrowing the well-known semantic gap and for enabling the semantic interpretation of images. Since these ontologies have been created in different application contexts, establishing links between them, a task known as ontology matching, promises to fully unlock their potential in support of multimedia search and retrieval. This paper proposes and empirically compares two extensional ontology matching techniques applied to an important semantic image retrieval issue: automatically associating common-sense knowledge with multimedia concepts. First, we extend a previously introduced textual concept matching approach to use both textual and visual representations of images. In addition, a novel matching technique based on a multi-modal graph is proposed. We argue that the textual and visual modalities have to be seen as complementary rather than as exclusive sources of extensional information in order to improve the efficiency of an ontology matching approach in the multimedia domain. An experimental evaluation is included in the paper.

    Une méthode de detection et modélisation d'événements des messages sur Twitter [A method for detecting and modeling events from Twitter messages]

    This paper introduces TEWS (Twitter Events on the Semantic Web, pronounced like "news"), a semantic web tool for detecting and representing events that takes the Twitter social stream as input. The tool assists the user throughout a complete processing chain: detecting events on Twitter, modeling and representing them according to semantic web principles, and storing them in an RDF knowledge base that can then be published on the Web of Data.

    Extended Tversky Similarity for Resolving Terminological Heterogeneities across Ontologies

    We propose a novel method to compute similarity between cross-ontology concepts based on the amount of overlap of the information content of their labels. We extend Tversky's similarity measure by using the information content of each term within an ontology label both for the similarity computation and for the weight assignment to tokens. The approach is suitable for handling compound labels. Our experiments showed that it outperforms existing terminological similarity measures for the ontology matching task.

    DOREMUS : un graphe d’œuvres musicales interconnectées [DOREMUS: a graph of interconnected musical works]

    Measurement of the W boson polarisation in $t\bar{t}$ events from pp collisions at $\sqrt{s} = 8$ TeV in the lepton + jets channel with ATLAS

    Measurements of top-quark pair differential cross-sections in the $e\mu$ channel in $pp$ collisions at $\sqrt{s} = 13$ TeV using the ATLAS detector

    Search for single production of vector-like quarks decaying into Wb in pp collisions at $\sqrt{s} = 8$ TeV with the ATLAS detector

    Search for dark matter in association with a Higgs boson decaying to $b$-quarks in $pp$ collisions at $\sqrt{s} = 13$ TeV with the ATLAS detector

    ATLAS Run 1 searches for direct pair production of third-generation squarks at the Large Hadron Collider
