4 research outputs found

    Practical, Efficient, and Customizable Active Learning for Named Entity Recognition in the Digital Humanities

    Scholars in interdisciplinary fields like the Digital Humanities are increasingly interested in semantic annotation of specialized corpora. Yet under-resourced languages, imperfect or noisily structured data, and user-specific classification tasks make it difficult to meet their needs using off-the-shelf models. Manual annotation of large corpora from scratch, meanwhile, can be prohibitively expensive. Thus, we propose an active learning solution for named entity recognition, attempting to maximize a custom model’s improvement per additional unit of manual annotation. Our system robustly handles any domain or user-defined label set and requires no external resources, enabling quality named entity recognition for Humanities corpora where such resources are not available. Evaluating on typologically disparate languages and datasets, we reduce required annotation by 20-60% and greatly outperform a competitive active learning baseline. Funding: New York University–Paris Sciences Lettres Global Alliance grant; National Endowment for the Humanities grant, award HAA-256078-17; Computational Approaches to Modeling Language lab at New York University Abu Dhabi.

    Cephalopod-omics: emerging fields and technologies in cephalopod biology

    14 pages, 1 figure. This is an Open Access article distributed under the terms of the Creative Commons Attribution License. Few animal groups can claim the level of wonder that cephalopods instill in the minds of researchers and the general public. Much of cephalopod biology, however, remains unexplored: the largest invertebrate brain, difficult husbandry conditions, and complex (meta-)genomes, among many other things, have hindered progress in addressing key questions. However, recent technological advancements in sequencing, imaging, and genetic manipulation have opened new avenues for exploring the biology of these extraordinary animals. The cephalopod molecular biology community is thus experiencing a large influx of researchers, emerging from different fields, accelerating the pace of research in this clade. In the first post-pandemic event at the Cephalopod International Advisory Council (CIAC) conference in April 2022, over 40 participants from all over the world met and discussed key challenges and perspectives for current cephalopod molecular biology and evolution. Our particular focus was on the fields of comparative and regulatory genomics, gene manipulation, single-cell transcriptomics, metagenomics, and microbial interactions. This article is a result of this joint effort, summarizing the latest insights from these emerging fields, their bottlenecks, and potential solutions. The article highlights the interdisciplinary nature of the cephalopod-omics community and emphasizes continuous consolidation of efforts and collaboration in this rapidly evolving field. Peer reviewed.
