Uncertainty-Aware Organ Classification for Surgical Data Science Applications in Laparoscopy
Objective: Surgical data science is evolving into a research field that aims
to observe everything occurring within and around the treatment process to
provide situation-aware data-driven assistance. In the context of endoscopic
video analysis, the accurate classification of organs in the field of view of
the camera poses a technical challenge. Herein, we propose a new approach to
anatomical structure classification and image tagging that features an
intrinsic measure of confidence to estimate its own performance with high
reliability and which can be applied to both RGB and multispectral imaging (MI)
data. Methods: Organ recognition is performed using a superpixel classification
strategy based on textural and reflectance information. Classification
confidence is estimated by analyzing the dispersion of class probabilities.
Assessment of the proposed technology is performed through a comprehensive in
vivo study with seven pigs. Results: When applied to image tagging, mean
accuracy in our experiments increased from 65% (RGB) and 80% (MI) to 90% (RGB)
and 96% (MI) with the confidence measure. Conclusion: Results showed that the
confidence measure had a significant influence on the classification accuracy,
and MI data are better suited for anatomical structure labeling than RGB data.
Significance: This work significantly enhances the state of the art in automatic
labeling of endoscopic videos by introducing the use of the confidence metric,
and by being the first study to use MI data for in vivo laparoscopic tissue
classification. The data of our experiments will be released as the first in
vivo MI dataset upon publication of this paper. Comment: 7 pages, 6 images, 2 tables
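The abstract says confidence is estimated from the dispersion of class probabilities but does not name the dispersion measure. The following is a minimal sketch that assumes normalized Shannon entropy as that measure; the function names, the organ labels, and the `min_conf` threshold are hypothetical and are not taken from the paper.

```python
import math

def confidence_from_probs(probs):
    """Confidence in [0, 1]: one minus the normalized Shannon entropy.

    Assumption: entropy stands in for the paper's (unspecified) dispersion measure.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))

def tag_with_confidence(superpixel_probs, labels, min_conf=0.5):
    """Return the labels predicted by superpixels whose confidence is high enough."""
    tags = set()
    for probs in superpixel_probs:
        if confidence_from_probs(probs) >= min_conf:
            best = max(range(len(probs)), key=probs.__getitem__)
            tags.add(labels[best])
    return tags

# Hypothetical class probabilities for two superpixels of one image.
labels = ["liver", "spleen", "gallbladder"]
superpixels = [
    [0.90, 0.05, 0.05],  # peaked distribution: high confidence
    [0.34, 0.33, 0.33],  # nearly flat distribution: low confidence
]
print(tag_with_confidence(superpixels, labels))
```

Only the first superpixel passes the threshold here, so uncertain regions do not contribute tags to the image.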
A Neural Approach to Discourse Relation Signal Detection
Previous data-driven work investigating the types and distributions of discourse relation signals, including discourse markers such as 'however' or phrases such as 'as a result', has focused on the relative frequencies of signal words within and outside text from each discourse relation. Such approaches do not allow us to quantify the signaling strength of individual instances of a signal on a scale (e.g. more or less discourse-relevant instances of 'and'), to assess the distribution of ambiguity for signals, or to identify words that hinder discourse relation identification in context ('anti-signals' or 'distractors'). In this paper we present a data-driven approach to signal detection using a distantly supervised neural network and develop a metric, Δs (or 'delta-softmax'), to quantify signaling strength. Ranging between -1 and 1 and relying on recent advances in contextualized word embeddings, the metric represents each word's positive or negative contribution to the identifiability of a relation in specific instances in context. Based on an English corpus annotated for discourse relations using Rhetorical Structure Theory and signal type annotations anchored to specific tokens, our analysis examines the reliability of the metric, the places where it overlaps with and differs from human judgments, and the implications for identifying features that neural models may need in order to perform better on automatic discourse relation classification.
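The abstract does not spell out how Δs is computed. One plausible reading, sketched below, is the change in the softmax probability of the correct relation when a single word is masked, which naturally falls between -1 and 1. The `toy_classifier`, its cue weights, and the `MASK` token are hypothetical stand-ins for a trained neural model.

```python
import math

MASK = "<mask>"  # hypothetical placeholder token

def softmax(scores):
    """Turn a dict of raw scores into a dict of probabilities."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def toy_classifier(tokens):
    """Stand-in for a relation classifier (cue weights are made up)."""
    scores = {"contrast": 0.0, "cause": 0.0}
    for tok in tokens:
        if tok == "however":
            scores["contrast"] += 2.0
        elif tok == "because":
            scores["cause"] += 2.0
    return softmax(scores)

def delta_softmax(tokens, gold_relation, classify=toy_classifier):
    """Per-token drop in the gold relation's probability when that token is masked."""
    base = classify(tokens)[gold_relation]
    deltas = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        deltas.append(base - classify(masked)[gold_relation])
    return deltas

# A positive value marks a signal, a negative value a distractor.
print(delta_softmax(["however", "it", "rained"], "contrast"))
print(delta_softmax(["because", "it", "rained"], "contrast"))
```

Under this reading, 'however' gets a positive score for a contrast relation, while 'because' gets a negative score for the same relation and so behaves as a distractor.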
Broad phonetic class definition driven by phone confusions
Intermediate representations between the speech signal and phones may be used to improve discrimination
among phones that are often confused. These representations are usually found according to broad phonetic
classes, which are defined by a phonetician. This article proposes an alternative data-driven method to generate
these classes. Phone confusion information from the analysis of the output of a phone recognition system is used
to find clusters at high risk of mutual confusion. A metric is defined to compute the distance between phones. The
results, using TIMIT data, show that the proposed confusion-driven phone clustering method is an attractive
alternative to the approaches based on human knowledge. A hierarchical classification structure to improve phone
recognition is also proposed using a discriminative weight training method. Experiments show improvements in
phone recognition on the TIMIT database compared to a baseline system.
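The idea of deriving broad classes from phone confusions can be sketched as agglomerative clustering over a confusion matrix. The abstract does not give the article's distance, so the dissimilarity below (one minus a symmetrized mutual-confusion rate) and the average-linkage merging are assumptions, and the counts are toy numbers rather than TIMIT results.

```python
def confusion_distance(conf, i, j):
    """Assumed dissimilarity: 1 minus the symmetrized rate of mutual confusion."""
    mutual = conf[i][j] + conf[j][i]
    total = sum(conf[i]) + sum(conf[j])
    return 1.0 - mutual / total

def cluster_phones(phones, conf, n_clusters):
    """Average-linkage agglomerative clustering of phones by confusion distance."""
    clusters = [[i] for i in range(len(phones))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dists = [confusion_distance(conf, i, j)
                         for i in clusters[a] for j in clusters[b]]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(phones[i] for i in c) for c in clusters]

# Toy confusion counts: rows are reference phones, columns are recognized phones.
phones = ["m", "n", "s", "z"]
conf = [
    [80, 18, 1, 1],
    [20, 78, 1, 1],
    [1, 1, 85, 13],
    [1, 1, 15, 83],
]
print(cluster_phones(phones, conf, n_clusters=2))
```

With these toy counts the nasals and the fricatives end up in separate clusters, which is the kind of grouping a phonetician would also define by hand.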
Threshold Choice Methods: the Missing Link
Many performance metrics have been introduced for the evaluation of
classification performance, with different origins and niches of application:
accuracy, macro-accuracy, area under the ROC curve, the ROC convex hull, the
absolute error, and the Brier score (with its decomposition into refinement and
calibration). One way of understanding the relation among some of these metrics
is the use of variable operating conditions (either in the form of
misclassification costs or class proportions). Thus, a metric may correspond to
some expected loss over a range of operating conditions. One dimension for the
analysis has been precisely the distribution we take for this range of
operating conditions, leading to some important connections in the area of
proper scoring rules. However, we show that there is another dimension which
has not received attention in the analysis of performance metrics. This new
dimension is given by the decision rule, which is typically implemented as a
threshold choice method when using scoring models. In this paper, we explore
many old and new threshold choice methods: fixed, score-uniform, score-driven,
rate-driven and optimal, among others. By calculating the loss of these methods
for a uniform range of operating conditions we get the 0-1 loss, the absolute
error, the Brier score (mean squared error), the AUC and the refinement loss
respectively. This provides a comprehensive view of performance metrics as well
as a systematic approach to loss minimisation, namely: take a model, apply
several threshold choice methods consistent with the information which is (and
will be) available about the operating condition, and compare their expected
losses. In order to assist in this procedure we also derive several connections
between the aforementioned performance metrics, and we highlight the role of
calibration in choosing the threshold choice method.
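Three of the correspondences listed above can be checked numerically. The sketch below assumes a common convention that the abstract does not state: scores estimate the probability of class 1, the operating condition is a cost proportion c in [0, 1], and a misclassified class-0 example costs 2c while a misclassified class-1 example costs 2(1 - c). The function names and the toy data are illustrative only.

```python
def loss_at(scores, labels, t, c):
    """Mean cost at threshold t and cost proportion c (predict class 1 if score > t)."""
    total = 0.0
    for s, y in zip(scores, labels):
        pred = 1 if s > t else 0
        if pred != y:
            total += 2 * c if y == 0 else 2 * (1 - c)
    return total / len(scores)

def midpoints(steps):
    """Midpoint grid on [0, 1], used to integrate over uniform conditions."""
    return [(k + 0.5) / steps for k in range(steps)]

def expected_loss(scores, labels, threshold_for, steps=1000):
    """Expected loss when the threshold is a function of c and c is uniform."""
    grid = midpoints(steps)
    return sum(loss_at(scores, labels, threshold_for(c), c) for c in grid) / steps

def expected_loss_score_uniform(scores, labels, steps=200):
    """Expected loss when the threshold is drawn uniformly, independently of c."""
    grid = midpoints(steps)
    return sum(loss_at(scores, labels, t, c) for c in grid for t in grid) / steps ** 2

scores = [0.1, 0.4, 0.8, 0.35, 0.6, 0.9]
labels = [0, 0, 0, 1, 1, 1]
n = len(scores)

error_rate = sum((s > 0.5) != bool(y) for s, y in zip(scores, labels)) / n
mae = sum(abs(s - y) for s, y in zip(scores, labels)) / n
brier = sum((s - y) ** 2 for s, y in zip(scores, labels)) / n

print(expected_loss(scores, labels, lambda c: 0.5), error_rate)   # fixed
print(expected_loss_score_uniform(scores, labels), mae)           # score-uniform
print(expected_loss(scores, labels, lambda c: c), brier)          # score-driven
```

Each printed pair should agree up to numerical integration error. The rate-driven and optimal methods are omitted because they need the score distribution and the ROC convex hull respectively.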