Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective
This paper presents a Lisp architecture for a portable NLP system, termed
LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard,
customized and in-house developed NLP tools. Our system facilitates portability
across different institutions and data systems by incorporating an enriched
Common Data Model (CDM) to standardize necessary data elements. It utilizes
UMLS to perform domain adaptation when integrating generic domain NLP tools. It
also features stand-off annotations that are specified by positional reference
to the original document. We built an interval tree based search engine to
efficiently query and retrieve the stand-off annotations by specifying
positional requirements. We also developed a utility to convert an inline
annotation format to stand-off annotations to enable the reuse of clinical text
datasets with inline annotations. We experimented with our system on several
NLP facilitated tasks including computational phenotyping for lymphoma patients
and semantic relation extraction for clinical notes. These experiments
showcased the broader applicability and utility of LAPNLP.
Comment: 6 pages, accepted by IEEE BIBM 2018 as regular paper
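As a rough illustration of the inline-to-stand-off conversion and positional querying mentioned in the abstract above, here is a minimal Python sketch. It is not the LAPNLP implementation (which is written in Lisp): the tag syntax, the names `Annotation`, `inline_to_standoff` and `overlapping`, and the sample sentence are all hypothetical, and a plain linear scan stands in for the interval tree the authors describe.

```python
import re
from typing import List, NamedTuple, Tuple


class Annotation(NamedTuple):
    start: int  # inclusive character offset into the plain text
    end: int    # exclusive character offset into the plain text
    label: str


# Hypothetical inline format: simple XML-like tags without attributes.
TAG = re.compile(r"<(/?)([A-Za-z_][\w-]*)>")


def inline_to_standoff(marked: str) -> Tuple[str, List[Annotation]]:
    """Strip inline tags and return (plain text, stand-off annotations)."""
    plain_parts: List[str] = []
    annotations: List[Annotation] = []
    open_tags: List[Tuple[str, int]] = []  # stack of (label, start offset)
    pos = 0     # read position in the marked-up string
    offset = 0  # length of the plain text emitted so far
    for match in TAG.finditer(marked):
        chunk = marked[pos:match.start()]
        plain_parts.append(chunk)
        offset += len(chunk)
        pos = match.end()
        is_closing, label = bool(match.group(1)), match.group(2)
        if not is_closing:
            open_tags.append((label, offset))
            continue
        if not open_tags or open_tags[-1][0] != label:
            raise ValueError(f"unbalanced closing tag: {label}")
        _, start = open_tags.pop()
        annotations.append(Annotation(start, offset, label))
    if open_tags:
        raise ValueError(f"unclosed tag: {open_tags[-1][0]}")
    plain_parts.append(marked[pos:])
    return "".join(plain_parts), sorted(annotations)


def overlapping(annotations: List[Annotation], start: int, end: int) -> List[Annotation]:
    """Return annotations overlapping [start, end).

    Linear scan for brevity; an interval tree would answer the same
    query without visiting every annotation.
    """
    return [a for a in annotations if a.start < end and start < a.end]


marked = "Patient reports <problem>chest pain</problem> after <test>ECG</test>."
plain, anns = inline_to_standoff(marked)
for a in anns:
    print(a.label, a.start, a.end, repr(plain[a.start:a.end]))
```

Because the annotations refer to the text only by offset, the original document stays untouched and several annotation layers can coexist over the same note.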
SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications
We describe the SemEval task of extracting keyphrases and relations between
them from scientific documents, which is crucial for understanding which
publications describe which processes, tasks and materials. Although this was a
new task, we had a total of 26 submissions across 3 evaluation scenarios. We
expect the task and the findings reported in this paper to be relevant for
researchers working on understanding scientific content, as well as the broader
knowledge base population and information extraction communities.
Introduction to the special issue on cross-language algorithms and applications
With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of
Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special
issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment
analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.
Postprint (published version)
Coreference resolution in clinical discharge summaries, progress notes, surgical and pathology reports: a unified lexical approach
We developed a lexical rule-based system that uses a unified approach to resolving coreference across a wide variety of clinical records comprising discharge summaries, progress notes, pathology, radiology and surgical reports from two corpora (Ontology Development and Information Extraction (ODIE) and i2b2/VA) provided for the fifth i2b2/VA shared task. Taking the unweighted mean of four coreference metrics, validation of the system against the i2b2/VA corpus attained an overall F-score of 87.7% across all mention classes, with a maximum of 93.1% for coreference of persons and a minimum of 77.2% for coreference of tests. For the ODIE corpus, the overall F-score across all mention classes was 79.4%, with a maximum of 82.0% for coreference of persons and a minimum of 13.1% for coreference of diagnostic reagents. For the ODIE corpus, our results are comparable to the mean reported inter-annotator agreement with the gold standard. We discuss the four categories of errors we identified and how these might be addressed. The system uses a number of reusable modules and techniques that may be of benefit to the research community.
Crowdsourcing Argumentation Structures in Chinese Hotel Reviews
Argumentation mining aims at automatically extracting the premises-claim
discourse structures in natural language texts. There is a great demand for
argumentation corpora for customer reviews. However, due to the controversial
nature of the argumentation annotation task, there exist very few large-scale
argumentation corpora for customer reviews. In this work, we use the
crowdsourcing technique to collect argumentation annotations in Chinese hotel
reviews. As the first Chinese argumentation dataset, our corpus includes 4814
argument component annotations and 411 argument relation annotations, and its
annotation quality is comparable to that of some widely used argumentation corpora
in other languages.
Comment: 6 pages, 3 figures. This article has been submitted to "The 2017 IEEE
International Conference on Systems, Man, and Cybernetics (SMC2017)"
Lexical patterns, features and knowledge resources for coreference resolution in clinical notes
Generation of entity coreference chains provides a means to extract linked narrative events from clinical notes, but despite being a well-researched topic in natural language processing, general-purpose coreference tools perform poorly on clinical texts. This paper presents a knowledge-centric and pattern-based approach to resolving coreference across a wide variety of clinical records comprising discharge summaries, progress notes, pathology, radiology and surgical reports from two corpora (Ontology Development and Information Extraction (ODIE) and i2b2/VA). In addition, a method for generating coreference chains using progressively pruned linked lists is demonstrated that reduces the search space and facilitates evaluation by a number of metrics. Independent evaluation results show an F-measure for each corpus of 79.2% and 87.5%, respectively, which offers performance at least as good as human annotators, greatly increased performance over general-purpose tools, and improvement on previously reported clinical coreference systems. The system uses a number of open-source components that are available to download.