19 research outputs found

    Tracking Discourses on Public and Hidden People in Historical Newspapers

    A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching

    Recognizing toponyms and resolving them to their real-world referents is required to provide advanced semantic access to textual data. This process is often hindered by the high degree of variation in toponyms. Candidate selection is the task of identifying the potential entities that can be referred to by a previously recognized toponym. While it has traditionally received little attention, candidate selection has a significant impact on downstream tasks (i.e. entity resolution), especially in noisy or non-standard text. In this paper, we introduce a deep learning method for candidate selection through toponym matching, using state-of-the-art neural network architectures. We perform an intrinsic toponym matching evaluation based on several datasets, which cover various challenging scenarios (cross-lingual and regional variations, as well as OCR errors) and assess its performance in the context of geographical candidate selection in English and Spanish.

    Living Machines: A study of atypical animacy

    This paper proposes a new approach to animacy detection, the task of determining whether an entity is represented as animate in a text. In particular, this work is focused on atypical animacy and examines the scenario in which typically inanimate objects, specifically machines, are given animate attributes. To address it, we have created the first dataset for atypical animacy detection, based on nineteenth-century sentences in English, with machines represented as either animate or inanimate. Our method builds on recent innovations in language modeling, specifically BERT contextualized word embeddings, to better capture fine-grained contextual properties of words. We present a fully unsupervised pipeline, which can be easily adapted to different contexts, and report its performance on an established animacy dataset and our newly introduced resource. We show that our method provides a substantially more accurate characterization of atypical animacy, especially when applied to highly complex forms of language use.

    Jardins per a la salut (Gardens for Health)

    Faculty of Pharmacy, University of Barcelona. Degree: Bachelor's in Pharmacy. Course: Pharmaceutical Botany. Academic year: 2014-2015. Coordinators: Joan Simon, Cèsar Blanché and Maria Bosch. The materials presented here are the collected botanical fact sheets for 128 species found in the Jardí Ferran Soldevila of the UB Historic Building. The fact sheets were prepared individually by students in groups M-3 and T-1 of the Pharmaceutical Botany course between February and May of the 2014-15 academic year, as the final outcome of the Teaching Innovation Project «Jardins per a la salut: aprenentatge servei a Botànica farmacèutica» (Gardens for Health: service learning in Pharmaceutical Botany; code 2014PID-UB/054). All the work was carried out on the GoogleDocs platform and supervised by the course instructors. The main aim of the activity was to foster autonomous and collaborative learning in pharmaceutical botany. It also sought to motivate students by returning part of their effort to society through a service-learning experience, ultimately making the students' work available on a public website that can also be consulted on site in the garden itself by scanning QR codes with a smartphone.

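    Del poder del llenguatge al llenguatge del poder: anàlisi del llenguatge i de la traducció de dues distopies (From the power of language to the language of power: an analysis of the language and translation of two dystopias)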

    This work offers a critique and a proposed translation of two dystopian novels, Nineteen Eighty-Four by George Orwell and The Handmaid's Tale by Margaret Atwood, both of which treat language as one of the key elements of the totalitarian system they describe.

    Datasets for toponym recognition and disambiguation for nineteenth-century English newspapers

    We present two datasets, one for the task of toponym recognition and one for the task of toponym disambiguation. The datasets are derived from the "Dataset for Toponym Resolution in Nineteenth-Century English Newspapers" (DOI: https://doi.org/10.23636/r7d4-kw08). The toponym recognition dataset consists of two JSON files (ner_fine_train.json and ner_fine_dev.json), whereas the toponym disambiguation dataset is provided as a TSV file (linking_df_split.tsv).

    DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching

    We present DeezyMatch, a free, open-source software library written in Python for fuzzy string matching and candidate ranking. Its pair classifier supports various deep neural network architectures for training new classifiers and for fine-tuning a pretrained model, which paves the way for transfer learning in fuzzy string matching. This approach is especially useful where only limited training examples are available. The learned DeezyMatch models can be used to generate rich vector representations from string inputs. The candidate ranker component in DeezyMatch uses these vector representations to find, for a given query, the best matching candidates in a knowledge base. It uses an adaptive searching algorithm applicable to large knowledge bases and query sets. We describe DeezyMatch’s functionality, design and implementation, accompanied by a use case in toponym matching and candidate ranking in realistic noisy datasets.
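
    The general idea of ranking knowledge-base candidates by comparing vector representations of strings can be illustrated with a minimal sketch. This is not DeezyMatch's API or its neural architecture: the learned embeddings are replaced here by plain character-bigram counts, the adaptive search by an exhaustive scan, and the function names (char_ngrams, cosine, rank_candidates) and the example place names are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=2):
    """Turn a string into a bag of character n-grams (a stand-in for a learned vector)."""
    padded = f" {text.lower()} "  # pad so word boundaries contribute n-grams too
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query, knowledge_base, top_k=3):
    """Return the top_k knowledge-base entries most similar to the query string."""
    query_vec = char_ngrams(query)
    scored = [(name, cosine(query_vec, char_ngrams(name))) for name in knowledge_base]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # "Lundon" imitates an OCR-style misspelling of a toponym.
    for name, score in rank_candidates("Lundon", ["London", "Paris", "Londonderry", "Leeds"]):
        print(f"{name}\t{score:.3f}")
```

    A learned model would replace char_ngrams with vectors produced by a trained network, which is what lets the approach cope with variation that surface n-gram overlap does not capture.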