Practical, Efficient, and Customizable Active Learning for Named Entity Recognition in the Digital Humanities
Scholars in inter-disciplinary fields like the Digital Humanities are increasingly interested in semantic annotation of specialized corpora. Yet, under-resourced languages, imperfect or noisily structured data, and user-specific classification tasks make it difficult to meet their needs using off-the-shelf models. Manual annotation of large corpora from scratch, meanwhile, can be prohibitively expensive. Thus, we propose an active learning solution for named entity recognition, attempting to maximize a custom model’s improvement per additional unit of manual annotation. Our system robustly handles any domain or user-defined label set and requires no external resources, enabling quality named entity recognition for Humanities corpora where such resources are not available. Evaluating on typologically disparate languages and datasets, we reduce required annotation by 20-60% and greatly outperform a competitive active learning baseline.
New York University–Paris Sciences Lettres Global Alliance grant; National Endowment for the Humanities grant, award HAA-256078-17; Computational Approaches to Modeling Language lab at New York University Abu Dhabi
Cephalopod-omics: emerging fields and technologies in cephalopod biology
14 pages, 1 figure. This is an Open Access article distributed under the terms of the Creative Commons Attribution License.
Few animal groups can claim the level of wonder that cephalopods instill in the minds of researchers and the general public. Much of cephalopod biology, however, remains unexplored: the largest invertebrate brain, difficult husbandry conditions, and complex (meta-)genomes, among many other things, have hindered progress in addressing key questions. However, recent technological advancements in sequencing, imaging, and genetic manipulation have opened new avenues for exploring the biology of these extraordinary animals. The cephalopod molecular biology community is thus experiencing a large influx of researchers, emerging from different fields, accelerating the pace of research in this clade. In the first post-pandemic event at the Cephalopod International Advisory Council (CIAC) conference in April 2022, over 40 participants from all over the world met and discussed key challenges and perspectives for current cephalopod molecular biology and evolution. Our particular focus was on the fields of comparative and regulatory genomics, gene manipulation, single-cell transcriptomics, metagenomics, and microbial interactions. This article is a result of this joint effort, summarizing the latest insights from these emerging fields, their bottlenecks, and potential solutions. The article highlights the interdisciplinary nature of the cephalopod-omics community and provides an emphasis on continuous consolidation of efforts and collaboration in this rapidly evolving field.
Peer reviewed