12,544 research outputs found

Using distributional similarity to organise biomedical terminology

Author: Dowdall James
Keller Bill
Schneider Gerold
Weeds Julie
Weir David
Publication venue: 'John Benjamins Publishing Company'
Publication date: 01/01/2005
Field of study

We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked-up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are defined for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of different measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy.
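As a rough illustration of the idea behind distributional similarity, the sketch below compares terms by the cosine of their context-count vectors. The terms, the (relation, word) features and the counts are invented for illustration and are not GENIA data; cosine is only one possible measure and is not necessarily among those the paper evaluates.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    shared = a.keys() & b.keys()
    dot = sum(a[f] * b[f] for f in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical counts of (grammatical relation, co-occurring word) features,
# of the kind a dependency parser could supply for each term.
contexts = {
    "IL-2": Counter({("subj_of", "activate"): 3, ("obj_of", "express"): 2}),
    "IL-4": Counter({("subj_of", "activate"): 2, ("obj_of", "express"): 1}),
    "T cell": Counter({("obj_of", "activate"): 4, ("subj_of", "express"): 2}),
}

for other in ("IL-4", "T cell"):
    print("IL-2 vs", other, round(cosine(contexts["IL-2"], contexts[other]), 3))
```

Terms that occur in similar syntactic contexts score close to 1, while terms with no shared contexts score 0, which is the signal used to guess that two terms share a semantic type.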

ZORA

Sussex Research Online

Bipartite Flat-Graph Network for Nested Named Entity Recognition

Author: Luo Ying
Zhao Hai
Publication date: 01/01/2020

In this paper, we propose a novel bipartite flat-graph network (BiFlaG) for nested named entity recognition (NER), which contains two subgraph modules: a flat NER module for outermost entities and a graph module for all the entities located in inner layers. Bidirectional LSTM (BiLSTM) and graph convolutional network (GCN) are adopted to jointly learn flat entities and their inner dependencies. Different from previous models, which only consider the unidirectional delivery of information from innermost layers to outer ones (or outside-to-inside), our model effectively captures the bidirectional interaction between them. We first use the entities recognized by the flat NER module to construct an entity graph, which is fed to the next graph module. The richer representation learned from the graph module carries the dependencies of inner entities and can be exploited to improve outermost entity predictions. Experimental results on three standard nested NER datasets demonstrate that our BiFlaG outperforms previous state-of-the-art models. Comment: Accepted by ACL202

arXiv.org e-Print Archive

Crossref

A Practical Incremental Learning Framework For Sparse Entity Extraction

Author: Al-Olimat Hussein S.
Gustafson Steven
Mackay Jason
Sheth Amit
Thirunarayan Krishnaprasad
Publication date: 01/08/2018

This work addresses challenges arising from extracting entities from textual data, including the high cost of data annotation, model accuracy, selecting appropriate evaluation criteria, and the overall quality of annotation. We present a framework that integrates Entity Set Expansion (ESE) and Active Learning (AL) to reduce the annotation cost of sparse data and provide an online evaluation method as feedback. This incremental and interactive learning framework allows for rapid annotation and subsequent extraction of sparse data while maintaining high accuracy. We evaluate our framework on three publicly available datasets and show that it drastically reduces the cost of sparse entity annotation by an average of 85% and 45% to reach 0.9 and 1.0 F-Scores respectively. Moreover, the method exhibited robust performance across all datasets. Comment: https://www.aclweb.org/anthology/C18-1059

arXiv.org e-Print Archive

Scholar Commons - Institutional Repository of the University of South Carolina

CORE

Embracing Problems, Processes, and Contact Zones: Using Youth Participatory Action Research to Challenge Adultism

Author: Bettencourt Genia
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2018

ScholarWorks@UMass Amherst

Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches

Author: Aubin Sophie
Nazarenko Adeline
Pyysalo Sampo
Salakoski Tapio
Publication date: 01/01/2006

We study the adaptation of Link Grammar Parser to the biomedical sublanguage with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to addressing unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and consider combinations of these approaches. In addition to a 45% increase in parsing efficiency, we find that the best approach, incorporating information from a domain part-of-speech tagger, offers a statistically significant 10% relative decrease in error. The adapted parser is available under an open-source license at http://www.it.utu.fi/biolg

arXiv.org e-Print Archive

CiteSeerX

Springer - Publisher Connector

Cell line name recognition in support of the identification of synthetic lethality in cancer from text

Author: Ginter Filip
Kaewphan Suwisa
Ohta Tomoko
Pyysalo Sampo
Van de Peer Yves
Van Landeghem Sofie
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/10/2015

Motivation: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating for instance the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic and broad-coverage applications such as extracting synthetically lethal genes from the cancer literature. In this study, we revisit the cell line name recognition task, evaluating both available systems and newly introduced methods on various resources to obtain a reliable tagger not tied to any specific subdomain. In support of this task, we introduce two text collections manually annotated for cell line names: the broad-coverage corpus Gellus and CLL, a focused target domain corpus. Results: We find that the best performance is achieved using NERsuite, a machine learning system based on Conditional Random Fields, trained on the Gellus corpus and supported with a dictionary of cell line names. The system achieves an F-score of 88.46% on the test set of Gellus and 85.98% on the independently annotated CLL corpus. It was further applied at large scale to 24 302 102 unannotated articles, resulting in the identification of 5 181 342 cell line mentions, normalized to 11 755 unique cell line database identifiers
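For readers unfamiliar with the F-score figures quoted above, the following minimal sketch shows how an F-score is conventionally derived from entity-level counts. The function name and the example counts are hypothetical; they are not the evaluation script or the counts behind the reported 88.46% and 85.98%.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical tagger output: 8 correct mentions, 2 spurious, 2 missed.
print(f_score(tp=8, fp=2, fn=2))
```

With beta set to 1 this is the harmonic mean of precision and recall, so a tagger cannot score well by favouring one at the expense of the other.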

Crossref

Ghent University Academic Bibliography

PubMed Central

UPSpace at the University of Pretoria

Chinese firms entering China's low-income market: Gaining competitive advantage by partnering governments

Author: Kostka Genia
Zhou Jianghua

This paper investigates poverty alleviation efforts in China and the nature of government-enterprise partnerships there. We argue that firms partnering central and local governments can be an effective strategy to overcome resource-based obstacles in low-income markets. In China, local and central governments are owners of rare and valuable resources, thus offering better access to finance, infrastructure, technical and planning expertise, advocacy through government marketing and distribution channels, and links to other stakeholders. The findings are based on 16 case studies of firms entering the low-income market in China, of which two cases in the agricultural and telecommunication sector are studied in depth. Keywords: partnerships, government, poverty alleviation, China, base of the pyramid

Research Papers in Economics

Energy service companies in China: The role of social networks and trust

Author: Kostka Genia
Shin Kyoung

China's energy-service companies (ESCOs) have developed only modestly despite favorable political and market conditions. We argue that with sophisticated market institutions still evolving in China, trust-based relations between ESCOs and energy customers are essential for successful implementation of energy efficiency projects. Chinese ESCOs, who are predominantly small and private enterprises, perform poorly in terms of trust-building because they are disembedded from local business, social, and political networks. We conclude that in the current institutional setting, the ESCO model based on market relations has serious limitations and is unlikely to lead to large-scale implementation of energy efficiency projects in China. Keywords: energy policies, energy service companies (ESCO), social networks, trust, China

Research Papers in Economics

An Annotated Corpus for Machine Reading of Instructions in Wet Lab Protocols

Author: Kulkarni Chaitanya
Machiraju Raghu
Ritter Alan
Xu Wei
Publication date: 01/01/2018

We describe an effort to annotate a corpus of natural language instructions consisting of 622 wet lab protocols to facilitate automatic or semi-automatic conversion of protocols into a machine-readable format and benefit biological research. Experimental results demonstrate the utility of our corpus for developing machine learning approaches to shallow semantic parsing of instructional texts. We make our annotated Wet Lab Protocol Corpus available to the research community.

arXiv.org e-Print Archive

Crossref