128,614 research outputs found
Recommended from our members
Discovering body site and severity modifiers in clinical texts
Objective: To research computational methods for discovering body site and severity modifiers in clinical texts. Methods: We cast the task of discovering body site and severity modifiers as a relation extraction problem in the context of a supervised machine learning framework. We utilize rich linguistic features to represent the pairs of relation arguments and delegate the decision about the nature of the relationship between them to a support vector machine model. We evaluate our models using two corpora that annotate body site and severity modifiers. We also compare the model performance to a number of rule-based baselines. We conduct cross-domain portability experiments. In addition, we carry out feature ablation experiments to determine the contribution of various feature groups. Finally, we perform error analysis and report the sources of errors. Results: The performance of our method for discovering body site modifiers achieves F1 of 0.740–0.908 and our method for discovering severity modifiers achieves F1 of 0.905–0.929. Discussion Results indicate that both methods perform well on both in-domain and out-domain data, approaching the performance of human annotators. The most salient features are token and named entity features, although syntactic dependency features also contribute to the overall performance. The dominant sources of errors are infrequent patterns in the data and inability of the system to discern deeper semantic structures. Conclusions: We investigated computational methods for discovering body site and severity modifiers in clinical texts. Our best system is released open source as part of the clinical Text Analysis and Knowledge Extraction System (cTAKES)
Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure
Big data research has attracted great attention in science, technology,
industry and society. It is developing with the evolving scientific paradigm,
the fourth industrial revolution, and the transformational innovation of
technologies. However, its nature and fundamental challenge have not been
recognized, and its own methodology has not been formed. This paper explores
and answers the following questions: What is big data? What are the basic
methods for representing, managing and analyzing big data? What is the
relationship between big data and knowledge? Can we find a mapping from big
data into knowledge space? What kind of infrastructure is required to support
not only big data management and analysis but also knowledge discovery, sharing
and management? What is the relationship between big data and science paradigm?
What is the nature and fundamental challenge of big data computing? A
multi-dimensional perspective is presented toward a methodology of big data
computing.Comment: 59 page
Reveal the Unknown: Out-of-Knowledge-Base Mention Discovery with Entity Linking
Discovering entity mentions that are out of a Knowledge Base (KB) from texts
plays a critical role in KB maintenance, but has not yet been fully explored.
The current methods are mostly limited to the simple threshold-based approach
and feature-based classification; the datasets for evaluation are relatively
rare. In this work, we propose BLINKout, a new BERT-based Entity Linking (EL)
method which can identify mentions that do not have a corresponding KB entity
by matching them to a special NIL entity. To this end, we integrate novel
techniques including NIL representation, NIL classification, and synonym
enhancement. We also propose Ontology Pruning and Versioning strategies to
construct out-of-KB mentions from normal, in-KB EL datasets. Results on four
datasets of clinical notes and publications show that BLINKout outperforms
existing methods to detect out-of-KB mentions for medical ontologies UMLS and
SNOMED CT
Viewpoint Discovery and Understanding in Social Networks
The Web has evolved to a dominant platform where everyone has the opportunity
to express their opinions, to interact with other users, and to debate on
emerging events happening around the world. On the one hand, this has enabled
the presence of different viewpoints and opinions about a - usually
controversial - topic (like Brexit), but at the same time, it has led to
phenomena like media bias, echo chambers and filter bubbles, where users are
exposed to only one point of view on the same topic. Therefore, there is the
need for methods that are able to detect and explain the different viewpoints.
In this paper, we propose a graph partitioning method that exploits social
interactions to enable the discovery of different communities (representing
different viewpoints) discussing about a controversial topic in a social
network like Twitter. To explain the discovered viewpoints, we describe a
method, called Iterative Rank Difference (IRD), which allows detecting
descriptive terms that characterize the different viewpoints as well as
understanding how a specific term is related to a viewpoint (by detecting other
related descriptive terms). The results of an experimental evaluation showed
that our approach outperforms state-of-the-art methods on viewpoint discovery,
while a qualitative analysis of the proposed IRD method on three different
controversial topics showed that IRD provides comprehensive and deep
representations of the different viewpoints
- …