Active Selection of Classification Features
Some data analysis applications involve datasets in which explanatory
variables are expensive or tedious to acquire, but auxiliary data are readily
available and might help to construct an insightful training set. An example is
neuroimaging research on mental disorders, specifically learning a
diagnosis/prognosis model based on variables derived from expensive Magnetic
Resonance Imaging (MRI) scans, which often requires large sample sizes.
Auxiliary data, such as demographics, might help in selecting a smaller sample
that comprises the individuals with the most informative MRI scans. In the active
learning literature, this problem has not yet been studied, despite promising
results in related problem settings that concern the selection of instances or
instance-feature pairs.
Therefore, we formulate this complementary problem of Active Selection of
Classification Features (ASCF): Consider a primary task that requires learning a
model f: x -> y to explain/predict the relationship between an
expensive-to-acquire set of variables x and a class label y. The
ASCF task is then to use a set of readily available selection variables z to select
those instances that will improve the primary task's performance most when
their expensive features x are acquired and added to the primary training
set.
We propose two utility-based approaches for this problem, and evaluate their
performance on three public real-world benchmark datasets. In addition, we
illustrate the use of these approaches to efficiently acquire MRI scans in the
context of neuroimaging research on mental disorders, based on a simulated
study design with real MRI data.
Comment: Accepted for publication at the 19th Intelligent Data Analysis
Symposium, 2021. The final authenticated publication will be made available
online at springer.co
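The ASCF setting above can be illustrated with a minimal sketch of utility-based selection. The abstract does not specify the two proposed utility functions, so `novelty_utility` below is a hypothetical placeholder (distance in z-space to the closest already-acquired instance), and all function and parameter names are assumptions made for illustration only.

```python
import math
from typing import Callable, List, Sequence, Tuple

Z = Tuple[float, ...]  # cheap selection variables of one instance


def novelty_utility(z: Z, acquired: Sequence[Z]) -> float:
    """Placeholder utility (an assumption, not the paper's): distance in
    z-space to the closest instance whose expensive features x are known.
    Requires a non-empty `acquired` sequence."""
    return min(math.dist(z, a) for a in acquired)


def select_for_acquisition(
    candidates: Sequence[Z],
    acquired: Sequence[Z],
    budget: int,
    utility: Callable[[Z, Sequence[Z]], float] = novelty_utility,
) -> List[Z]:
    """Greedily pick up to `budget` candidates with the highest utility,
    re-scoring after each pick because the acquired set has grown."""
    remaining = list(candidates)
    pool = list(acquired)
    chosen: List[Z] = []
    for _ in range(min(budget, len(remaining))):
        best = max(remaining, key=lambda z: utility(z, pool))
        remaining.remove(best)
        chosen.append(best)
        pool.append(best)  # treat the pick as acquired for later rounds
    return chosen
```

Any other scoring rule with the same signature can be passed as `utility`; only the ranking step depends on it.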
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.
Comment: 10 pages, 19 figure
ALEC: Active learning with ensemble of classifiers for clinical diagnosis of coronary artery disease
Invasive angiography is the reference standard for coronary artery disease (CAD) diagnosis but is expensive and
associated with certain risks. Machine learning (ML) using clinical and noninvasive imaging parameters can be
used for CAD diagnosis to avoid the side effects and cost of angiography. However, ML methods require labeled
samples for efficient training. The labeled data scarcity and high labeling costs can be mitigated by active
learning. This is achieved through selective query of challenging samples for labeling. To the best of our
knowledge, active learning has not been used for CAD diagnosis yet. An Active Learning with Ensemble of
Classifiers (ALEC) method is proposed for CAD diagnosis, consisting of four classifiers. Three of these classifiers
determine whether a patient’s three main coronary arteries are stenotic or not. The fourth classifier predicts
whether the patient has CAD or not. ALEC is first trained using labeled samples. For each unlabeled sample, if the
outputs of the classifiers are consistent, the sample along with its predicted label is added to the pool of labeled
samples. Inconsistent samples are manually labeled by medical experts before being added to the pool. The
training is performed once more using the samples labeled so far. The interleaved phases of labeling and training
are repeated until all samples are labeled. Compared with 19 other active learning algorithms, ALEC combined
with a support vector machine classifier attained superior performance with 97.01% accuracy. Our method is
justified mathematically as well. We also comprehensively analyze the CAD dataset used in this paper. As part of
the dataset analysis, pairwise feature correlations are computed. The top 15 features contributing to CAD and stenosis
of the three main coronary arteries are determined. The relationship between stenosis of the main arteries is
presented using conditional probabilities. The effect of considering the number of stenotic arteries on sample
discrimination is investigated. The discrimination power over dataset samples is visualized, assuming each of the
three main coronary arteries as a sample label and considering the two remaining arteries as sample features.
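The labeling loop described in the ALEC abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not define "consistent", so `is_consistent` assumes that the CAD classifier must predict disease exactly when at least one artery classifier predicts stenosis. The label layout, the `fit` and `oracle` callables, and `batch_size` are likewise hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical label layout: (artery_1, artery_2, artery_3, cad), each 0 or 1.
Labels = Tuple[int, int, int, int]
Example = Tuple[Any, Labels]
Predictor = Callable[[Any], Labels]


def is_consistent(pred: Labels) -> bool:
    """Assumed rule: CAD is predicted iff some artery is predicted stenotic."""
    *arteries, cad = pred
    return int(any(arteries)) == cad


def alec_loop(
    labeled: Sequence[Example],
    unlabeled: Sequence[Any],
    fit: Callable[[List[Example]], Predictor],
    oracle: Callable[[Any], Labels],
    batch_size: int = 10,
) -> Tuple[List[Example], int]:
    """Interleave training and labeling until every sample is labeled.

    `fit` trains the four-classifier ensemble on the current pool and
    returns a predictor; `oracle` stands in for the medical expert.
    Returns the final pool and the number of expert queries made.
    """
    pool = list(labeled)
    queue = list(unlabeled)
    n_queries = 0
    while queue:
        predict = fit(pool)  # retrain on everything labeled so far
        batch, queue = queue[:batch_size], queue[batch_size:]
        for x in batch:
            pred = tuple(predict(x))
            if is_consistent(pred):
                pool.append((x, pred))  # accept the ensemble's own label
            else:
                pool.append((x, oracle(x)))  # ask the expert
                n_queries += 1
    return pool, n_queries
```

The returned query count gives a direct measure of how much expert labeling the consistency check saved relative to labeling every sample by hand.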
Data acquisition and cost-effective predictive modeling: targeting offers for electronic commerce
Electronic commerce is revolutionizing the way we think about
data modeling, by making it possible to integrate the processes of
(costly) data acquisition and model induction. The opportunity for
improving modeling through costly data acquisition presents itself
for a diverse set of electronic commerce modeling tasks, from personalization
to customer lifetime value modeling; we illustrate with
the running example of choosing offers to display to web-site visitors,
which captures important aspects in a familiar setting. Considering
data acquisition costs explicitly can allow the building of
predictive models at significantly lower costs, and a modeler may
be able to improve performance via new sources of information that
previously were too expensive to consider. However, existing techniques
for integrating modeling and data acquisition cannot deal
with the rich environment that electronic commerce presents. We
discuss several possible data acquisition settings, the challenges involved
in the integration with modeling, and various research areas
that may supply parts of an ultimate solution. We also present and
demonstrate briefly a unified framework within which one can integrate
acquisitions of different types, with any cost structure and
any predictive modeling objective.
NYU, Stern School of Business, IOMS Department, Center for Digital Economy Researc
Deep Functional Maps: Structured Prediction for Dense Shape Correspondence
We introduce a new framework for learning dense correspondence between
deformable 3D shapes. Existing learning based approaches model shape
correspondence as a labelling problem, where each point of a query shape
receives a label identifying a point on some reference domain; the
correspondence is then constructed a posteriori by composing the label
predictions of two input shapes. We propose a paradigm shift and design a
structured prediction model in the space of functional maps, linear operators
that provide a compact representation of the correspondence. We model the
learning process via a deep residual network which takes dense descriptor
fields defined on two shapes as input, and outputs a soft map between the two
given objects. The resulting correspondence is shown to be accurate on several
challenging benchmarks comprising multiple categories, synthetic models, real
scans with acquisition artifacts, topological noise, and partiality.
Comment: Accepted for publication at ICCV 201
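For readers unfamiliar with the term, the standard definition of a functional map can be written out as below. This is background notation rather than anything stated in the abstract: the symbols M, N, T, the orthonormal bases, and the truncation size k are assumptions introduced for illustration.

```latex
% Point-wise bijection T : M -> N between two shapes induces a linear
% operator on functions (the functional map):
T_F : L^2(M) \to L^2(N), \qquad T_F(f) = f \circ T^{-1}.

% With orthonormal bases \{\phi_i\} on M and \{\psi_j\} on N, and
% f = \sum_i a_i \phi_i, linearity of T_F gives
T_F(f) = \sum_j \Big( \sum_i c_{ji}\, a_i \Big) \psi_j,
\qquad c_{ji} = \langle T_F(\phi_i), \psi_j \rangle_{N}.

% Keeping only the first k basis functions on each shape, the map is
% the k x k matrix C = (c_{ji}) acting on coefficient vectors:
\mathbf{b} = C\,\mathbf{a}.
```

The matrix C is the compact representation the abstract refers to: a small k-by-k matrix in place of a point-to-point assignment.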