Search CORE

51,199 research outputs found

Automated screening of natural language in electronic health records for the diagnosis septic shock is feasible and outperforms an approach based on explicit administrative codes

Author: Colpaert Kirsten
De Bus Liesbet
Decruyenaere Johan
Depuydt Pieter
Vermassen Joris
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020
Field of study

Ghent University Academic Bibliography

Opportunities and Challenges of Using Big Data to Detect Drug-Drug Interaction Risk

Author: Quinney Sara K.
Publication venue: 'Wiley'
Publication date: 01/07/2019
Field of study

IUPUIScholarWorks

Argumentation Mining in User-Generated Web Discourse

Author: Gurevych Iryna
Habernal Ivan
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2015
Field of study

The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges given by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source codes, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task.Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17

arXiv.org e-Print Archive

TUbiblio

Crossref

Directory of Open Access Journals

TUdatalib Repository (TU Darmstadt)

Extracting information from the text of electronic medical records to improve case detection: a systematic review

Author: Afzal
Afzal
Ananthakrishnan
Baus
Cano
Carroll
Carroll
Carroll
Castro
Chapman
Chen
Chung
Currie
de Lusignan
DeLisle
DeLisle
Donia Scott
Dorr
Elizabeth Ford
Ford
Friedlin
Friedman
Friedman
Graiser
Greenhalgh
Gulliford
Gundlapalli
Hanauer
Hanauer
Hanauer
Harkema
Helen E Smith
Imfeld
Jackie A Cassell
John A Carroll
Jones
Kalra
Karnik
Koeling
Kushida
Li
Liao
Lin
Lindberg
Love
Lovis
Ludvigsson
Manning
Manuel
McPeek Hinz
Mehrabi
Meystre
Nielen
Pakhomov
Pakhomov
Powsner
Rait
Resnik
Roch
Ryan
Savova
Soler
Stein
Stone
Tange
Tate
Tsui
Uzuner
Valkhoff
Walsh
Widdifield
Wilke
Wu
Xia
Xu
Xu
Yadav
Ye
Zeng
Zeng
Zheng
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2016
Field of study

Background: Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality. Methods: A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed. Results: Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025). Conclusions: Text in EMRs is accessible, especially with open source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall)

Crossref

PubMed Central

Sussex Research Online

Beyond Volume: The Impact of Complex Healthcare Data on the Machine Learning Pipeline

Author: A Arcuri
AL Rector
AM Wood
AS Glas
B Kulis
C Cortes
C Sammut
CC Diamond
CD Kidd
CR MacIntyre
DP Lewis
E Koumoundouros
E Rahm
EM Knorr
ES Fisher
GE Box
GM Weber
H Carter
H He
H Meyer
H Quan
HH Hoos
I Yoo
J Andreu-Perez
J Fan
J Zhao
JD Lafferty
JM Bland
JW Graham
K Lange
KP Murphy
LA King
LM Collins
M Azarm-Daigle
M Kantardzic
M Sokolova
MA Stoto
N Oreskes
PB Jensen
PK Lindenauer
PM Visscher
RJ Little
V López
V Sessions
VN Vapnik
W Raghupathi
Y Luo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/01/2018
Field of study

From medical charts to national census, healthcare has traditionally operated under a paper-based paradigm. However, the past decade has marked a long and arduous transformation bringing healthcare into the digital age. Ranging from electronic health records, to digitized imaging and laboratory reports, to public health datasets, today, healthcare now generates an incredible amount of digital information. Such a wealth of data presents an exciting opportunity for integrated machine learning solutions to address problems across multiple facets of healthcare practice and administration. Unfortunately, the ability to derive accurate and informative insights requires more than the ability to execute machine learning models. Rather, a deeper understanding of the data on which the models are run is imperative for their success. While a significant effort has been undertaken to develop models able to process the volume of data obtained during the analysis of millions of digitalized patient records, it is important to remember that volume represents only one aspect of the data. In fact, drawing on data from an increasingly diverse set of sources, healthcare data presents an incredibly complex set of attributes that must be accounted for throughout the machine learning pipeline. This chapter focuses on highlighting such challenges, and is broken down into three distinct components, each representing a phase of the pipeline. We begin with attributes of the data accounted for during preprocessing, then move to considerations during model building, and end with challenges to the interpretation of model output. For each component, we present a discussion around data as it relates to the healthcare domain and offer insight into the challenges each may impose on the efficiency of machine learning techniques.Comment: Healthcare Informatics, Machine Learning, Knowledge Discovery: 20 Pages, 1 Figur

arXiv.org e-Print Archive

Crossref

Do peers see more in a paper than its authors?

Author: Divoli Anna
Hearst Marti
Nakov Preslav
Publication venue: eScholarship, University of California
Publication date: 01/01/2012
Field of study

Recent years have shown a gradual shift in the content of biomedical publications that is freely accessible, from titles and abstracts to full text. This has enabled new forms of automatic text analysis and has given rise to some interesting questions: How informative is the abstract compared to the full-text? What important information in the full-text is not present in the abstract? What should a good summary contain that is not already in the abstract? Do authors and peers see an article differently? We answer these questions by comparing the information content of the abstract to that in citances-sentences containing citations to that article. We contrast the important points of an article as judged by its authors versus as seen by peers. Focusing on the area of molecular interactions, we perform manual and automatic analysis, and we find that the set of all citances to a target article not only covers most information (entities, functions, experimental methods, and other biological concepts) found in its abstract, but also contains 20% more concepts. We further present a detailed summary of the differences across information types, and we examine the effects other citations and time have on the content of citances

Directory of Open Access Journals

PubMed Central

eScholarship - University of California