12,544 research outputs found
Using distributional similarity to organise biomedical terminology
We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are dened for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of dierent measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy
Bipartite Flat-Graph Network for Nested Named Entity Recognition
In this paper, we propose a novel bipartite flat-graph network (BiFlaG) for
nested named entity recognition (NER), which contains two subgraph modules: a
flat NER module for outermost entities and a graph module for all the entities
located in inner layers. Bidirectional LSTM (BiLSTM) and graph convolutional
network (GCN) are adopted to jointly learn flat entities and their inner
dependencies. Different from previous models, which only consider the
unidirectional delivery of information from innermost layers to outer ones (or
outside-to-inside), our model effectively captures the bidirectional
interaction between them. We first use the entities recognized by the flat NER
module to construct an entity graph, which is fed to the next graph module. The
richer representation learned from graph module carries the dependencies of
inner entities and can be exploited to improve outermost entity predictions.
Experimental results on three standard nested NER datasets demonstrate that our
BiFlaG outperforms previous state-of-the-art models.Comment: Accepted by ACL202
A Practical Incremental Learning Framework For Sparse Entity Extraction
This work addresses challenges arising from extracting entities from textual
data, including the high cost of data annotation, model accuracy, selecting
appropriate evaluation criteria, and the overall quality of annotation. We
present a framework that integrates Entity Set Expansion (ESE) and Active
Learning (AL) to reduce the annotation cost of sparse data and provide an
online evaluation method as feedback. This incremental and interactive learning
framework allows for rapid annotation and subsequent extraction of sparse data
while maintaining high accuracy. We evaluate our framework on three publicly
available datasets and show that it drastically reduces the cost of sparse
entity annotation by an average of 85% and 45% to reach 0.9 and 1.0 F-Scores
respectively. Moreover, the method exhibited robust performance across all
datasets.Comment: https://www.aclweb.org/anthology/C18-1059
Recommended from our members
Embracing Problems, Processes, and Contact Zones: Using Youth Participatory Action Research to Challenge Adultism
Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches
We study the adaptation of Link Grammar Parser to the biomedical sublanguage
with a focus on domain terms not found in a general parser lexicon. Using two
biomedical corpora, we implement and evaluate three approaches to addressing
unknown words: automatic lexicon expansion, the use of morphological clues, and
disambiguation using a part-of-speech tagger. We evaluate each approach
separately for its effect on parsing performance and consider combinations of
these approaches. In addition to a 45% increase in parsing efficiency, we find
that the best approach, incorporating information from a domain part-of-speech
tagger, offers a statistically signicant 10% relative decrease in error. The
adapted parser is available under an open-source license at
http://www.it.utu.fi/biolg
Cell line name recognition in support of the identification of synthetic lethality in cancer from text
Motivation: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating for instance the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic and broad-coverage applications such as extracting synthetically lethal genes from the cancer literature. In this study, we revisit the cell line name recognition task, evaluating both available systems and newly introduced methods on various resources to obtain a reliable tagger not tied to any specific subdomain. In support of this task, we introduce two text collections manually annotated for cell line names: the broad-coverage corpus Gellus and CLL, a focused target domain corpus.
Results: We find that the best performance is achieved using NERsuite, a machine learning system based on Conditional Random Fields, trained on the Gellus corpus and supported with a dictionary of cell line names. The system achieves an F-score of 88.46% on the test set of Gellus and 85.98% on the independently annotated CLL corpus. It was further applied at large scale to 24 302 102 unannotated articles, resulting in the identification of 5 181 342 cell line mentions, normalized to 11 755 unique cell line database identifiers
Chinese firms entering China's low-income market: Gaining competitive advantage by partnering governments
This paper investigates poverty alleviation efforts in China and the nature of governmententerprise partnerships there. We argue that firms partnering central and local governments can be an effective strategy to overcome resource-based obstacles in low-income markets. In China, local and central governments are owners of rare and valuable resources, thus offering better access to finance, infrastructure, technical and planning expertise, advocacy through government marketing and distribution channels, and links to other stakeholders. The findings are based on 16 case studies of firms entering the low-income market in China, of which two cases in the agricultural and telecommunication sector are studied in depth. --Partnerships,government,poverty alleviation,China,base of the pyramid
Energy service companies in China: The role of social networks and trust
China's energy-service companies (ESCOs) have developed only modestly despite favorable political and market conditions. We argue that with sophisticated market institutions still evolving in China, trust-based relations between ESCOs and energy customers are essential for successful implementation of energy efficiency projects. Chinese ESCOs, who are predominantly small and private enterprises, perform poorly in terms of trust-building because they are disembedded from local business, social, and political networks. We conclude that in the current institutional setting, the ESCO model based on market relations has serious limitations and is unlikely to lead to large-scale implementation of energy efficiency projects in China. --energy policies,energy service companies (ESCO),social networks,trust,China
An Annotated Corpus for Machine Reading of Instructions in Wet Lab Protocols
We describe an effort to annotate a corpus of natural language instructions
consisting of 622 wet lab protocols to facilitate automatic or semi-automatic
conversion of protocols into a machine-readable format and benefit biological
research. Experimental results demonstrate the utility of our corpus for
developing machine learning approaches to shallow semantic parsing of
instructional texts. We make our annotated Wet Lab Protocol Corpus available to
the research community
- …
