Severity of asymptomatic carotid stenosis and risk of ipsilateral hemispheric ischaemic events: Results from the ACSRS study
Objectives. This study determines the risk of ipsilateral ischaemic neurological events in relation to the degree of asymptomatic carotid stenosis and other risk factors. Methods. Patients (n = 1115) with asymptomatic internal carotid artery (ICA) stenosis greater than 50% in relation to the bulb diameter were followed up for 6-84 (mean 37.1) months. Stenosis was graded using duplex, and clinical and biochemical risk factors were recorded. Results. The relationship between ICA stenosis and event rate is linear when stenosis is expressed by the ECST method, but S-shaped when expressed by the NASCET method. In addition to the ECST grade of stenosis (RR 1.6; 95% CI 1.21-2.15), a history of contralateral TIAs (RR 3.0; 95% CI 1.90-4.73) and creatinine in excess of 85 μmol/L (RR 2.1; 95% CI 1.23-3.65) were independent risk predictors. The combination of these three risk factors can identify a high-risk group (7.3% annual event rate and 4.3% annual stroke rate) and a low-risk group (2.3% annual event rate and 0.7% annual stroke rate). Conclusions. The linearity between ECST percent stenosis and risk makes this grading method more amenable to risk prediction without any transformation, both in clinical practice and in multivariable analysis. Identification of additional risk factors provides a new approach to risk stratification and should help refine the indications for carotid endarterectomy. © 2005 Elsevier Ltd. All rights reserved.
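The abstract contrasts the ECST and NASCET grading scales for the same lesion. As an illustration only (not part of the ACSRS analysis), the two scales are commonly related by the approximate linear conversion ECST ≈ 0.6 × NASCET + 40; a minimal sketch:

```python
def ecst_from_nascet(nascet_pct: float) -> float:
    """Approximate ECST-equivalent percent stenosis from a NASCET measurement.

    Uses the widely cited linear relation ECST ~= 0.6 * NASCET + 40.
    This is an illustrative approximation from the broader literature,
    not a formula taken from the ACSRS study itself.
    """
    return 0.6 * nascet_pct + 40.0
```

For example, a 50% NASCET stenosis corresponds to roughly 70% on the ECST scale, which is why thresholds quoted under the two methods differ.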
Weakly-supervised Learning Approaches for Event Knowledge Acquisition and Event Detection
Capabilities of detecting events and recognizing temporal, subevent, or causality relations among events can facilitate many applications in natural language understanding. However, the supervised learning approaches that previous research has mainly used have two problems. First, due to the limited size of annotated data, supervised systems cannot sufficiently capture diverse contexts to distill universal event knowledge. Second, under certain application circumstances, such as event recognition during emergent natural disasters, it is infeasible to spend days or weeks annotating enough data to train a system. My research aims to use weakly-supervised learning to address these problems and to achieve automatic event knowledge acquisition and event recognition.
In this dissertation, I first introduce three weakly-supervised learning approaches that have been shown effective in acquiring event relational knowledge. Firstly, I explore the observation that regular event pairs show a consistent temporal relation despite their varied contexts, and that these rich contexts can be used to train a contextual temporal relation classifier to recognize new temporal relation knowledge. Secondly, inspired by the double temporality characteristic of narrative texts, I propose a weakly supervised approach that identifies 287k narrative paragraphs using narratology principles and then extracts rich temporal event knowledge from the identified narratives. Lastly, I develop a subevent knowledge acquisition approach by exploiting two observations: 1) subevents are temporally contained by the parent event, and 2) definitions of the parent event can be used to guide the identification of subevents. I collect rich weak supervision to train a contextual BERT classifier and apply the classifier to identify new subevent knowledge.
Recognizing texts that describe specific categories of events is also challenging, owing to language ambiguity and the diversity of event descriptions. I therefore also propose a novel method to rapidly build a fine-grained event recognition system on social media texts for disaster management. My method creates high-quality weak supervision based on clustering-assisted word sense disambiguation and enriches tweet message representations using preceding context tweets and reply tweets in building event recognition classifiers.
N-gram Overlap in Automatic Detection of Document Derivation
Establishing the authenticity and independence of documents in relation to others is not a new problem, but in the era of hyper-production of e-text it has gained even more importance. There is an increased need for automatic methods for determining the originality of documents in a digital environment. The n-gram overlap method is only one of several methods proposed in the literature and is used in a variety of systems for automatic identification of text reuse. Although the method itself is quite simple, determining the length of n-grams that is a good indicator of text reuse is a somewhat complex issue. We assume that the optimal length of n-grams is not the same for all languages but depends on particular language properties such as morphological typology, syntactic features, etc. The aim of this study is to find the optimal length of n-grams to be used for determining document derivation in the Croatian language. Potential areas of application of the results include automatic detection of plagiarism in academic and student papers, citation analysis, information flow tracking and event detection in on-line texts.
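As a concrete illustration of the n-gram overlap method described above, here is a minimal sketch of a containment measure over word n-grams; the function names and the exact measure are assumptions for illustration, not the study's implementation:

```python
def ngrams(tokens, n):
    """The set of contiguous word n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_containment(suspect: str, source: str, n: int) -> float:
    """Fraction of the suspect document's n-grams that also occur in the source.

    A simple containment measure commonly used in text-reuse detection.
    Sweeping n (e.g. 1..8) and checking how well the score separates
    derived from independent documents is how an "optimal" n-gram
    length for a given language could be estimated empirically.
    """
    suspect_grams = ngrams(suspect.lower().split(), n)
    if not suspect_grams:
        return 0.0
    source_grams = ngrams(source.lower().split(), n)
    return len(suspect_grams & source_grams) / len(suspect_grams)
```

Short n-grams (n = 1, 2) overlap heavily even between unrelated texts, while long ones rarely match at all, which is why the best discriminating length is language-dependent, as the study hypothesizes.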
Empirical Methodology for Crowdsourcing Ground Truth
The process of gathering ground truth data through human annotation is a
major bottleneck in the use of information extraction methods for populating
the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the
attempt to solve the issues related to volume of data and lack of annotators.
Typically these practices use inter-annotator agreement as a measure of
quality. However, in many domains, such as event detection, there is ambiguity
in the data, as well as a multitude of perspectives of the information
examples. We present an empirically derived methodology for efficiently
gathering ground truth data in a diverse set of use cases covering a variety
of domains and annotation tasks. Central to our approach is the use of
CrowdTruth metrics that capture inter-annotator disagreement. We show that
measuring disagreement is essential for acquiring a high quality ground truth.
We achieve this by comparing the quality of the data aggregated with CrowdTruth
metrics with majority vote, over a set of diverse crowdsourcing tasks: Medical
Relation Extraction, Twitter Event Identification, News Event Extraction and
Sound Interpretation. We also show that an increased number of crowd workers
leads to growth and stabilization in the quality of annotations, going against
the usual practice of employing a small number of annotators. Comment: in publication at the Semantic Web Journal
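The contrast between majority vote and disagreement-aware aggregation can be sketched as follows; the per-label support score below is a simplified stand-in for CrowdTruth's cosine-based unit-annotation score, not the actual metric:

```python
from collections import Counter

def majority_vote(annotations):
    """Collapse all worker labels for one unit to a single label.
    Ties are broken arbitrarily; all minority signal is discarded."""
    return Counter(annotations).most_common(1)[0][0]

def unit_annotation_scores(annotations):
    """Per-label support in [0, 1]: the fraction of workers who chose
    each label. Unlike majority vote, this keeps disagreement as a
    graded quality signal (e.g. an ambiguous event mention) instead
    of treating minority answers as noise to be voted away."""
    counts = Counter(annotations)
    total = len(annotations)
    return {label: c / total for label, c in counts.items()}
```

On a unit labelled `["event", "event", "not_event", "ambiguous", "event"]`, majority vote reports only "event", while the score vector preserves the 40% of workers who disagreed, which the abstract argues is essential for a high-quality ground truth.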
Predicting Community Evolution in Social Networks
Nowadays, sustained development of different social media can be observed
worldwide. One of the relevant research domains intensively explored recently
is analysis of social communities existing in social media as well as
prediction of their future evolution taking into account collected historical
evolution chains. These evolution chains proposed in the paper contain group
states in the previous time frames and their historical transitions that were
identified using one out of two methods: Stable Group Changes Identification
(SGCI) and Group Evolution Discovery (GED). Based on the observed evolution
chains of various length, structural network features are extracted, validated
and selected as well as used to learn classification models. The experimental
studies were performed on three real datasets with different profiles: DBLP,
Facebook and Polish blogosphere. The process of group prediction was analysed
with respect to different classifiers as well as various descriptive feature
sets extracted from evolution chains of different length. The results revealed
that, in general, the longer the evolution chains, the better the predictive abilities
of the classification models. However, chains of length 3 to 7 enabled the
GED-based method to almost reach its maximum possible prediction quality. For
SGCI, the corresponding range was the last 3 to 5 periods. Comment: Entropy 2015, 17, 1-x manuscripts; doi:10.3390/e170x000x, 46 pages
Towards Building a Knowledge Base of Monetary Transactions from a News Collection
We address the problem of extracting structured representations of economic
events from a large corpus of news articles, using a combination of natural
language processing and machine learning techniques. The developed techniques
allow for semi-automatic population of a financial knowledge base, which, in
turn, may be used to support a range of data mining and exploration tasks. The
key challenge we face in this domain is that the same event is often reported
multiple times, with varying correctness of details. We address this challenge
by first collecting all information pertinent to a given event from the entire
corpus, then considering all possible representations of the event, and
finally, using a supervised learning method, ranking these representations by
the associated confidence scores. A main innovative element of our approach is
that it jointly extracts and stores all attributes of the event as a single
representation (quintuple). Using a purpose-built test set we demonstrate that
our supervised learning approach can achieve 25% improvement in F1-score over
baseline methods that consider the earliest, the latest or the most frequent
reporting of the event. Comment: Proceedings of the 17th ACM/IEEE-CS Joint
Conference on Digital Libraries (JCDL '17), 2017
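The pipeline the abstract describes — collect all mentions of one event, enumerate candidate representations, then rank them with a supervised model — can be sketched as follows; the quintuple attribute names and the scoring callable are hypothetical, not the paper's actual schema or ranker:

```python
from itertools import product

FIELDS = ["buyer", "seller", "asset", "price", "date"]  # assumed quintuple schema

def candidate_quintuples(mentions):
    """Enumerate candidate quintuples from the attribute values observed
    across all reports of one event. Conflicting reports (e.g. two
    different prices) yield multiple candidates to be ranked."""
    values = {f: sorted({m[f] for m in mentions if m.get(f)}) for f in FIELDS}
    return [dict(zip(FIELDS, combo))
            for combo in product(*(values[f] for f in FIELDS))]

def rank(candidates, score):
    """Order candidates by descending confidence; `score` stands in for
    the trained supervised ranker described in the abstract."""
    return sorted(candidates, key=score, reverse=True)
```

Ranking whole quintuples jointly, rather than picking each attribute independently (or just the earliest/latest/most frequent report), is the design choice the abstract credits for its F1 improvement.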
Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of texts are interpreted
with respect to a predefined partial domain model. This report shows that
depending on the nature and the depth of the interpretation to be done for
extracting the information, more or less knowledge must be involved. This
report is mainly illustrated in biology, a domain in which there are critical
needs for content-based exploration of the scientific literature and which is
becoming a major application domain for IE.