Personalized Pancreatic Tumor Growth Prediction via Group Learning
Tumor growth prediction, a highly challenging task, has long been viewed as a
mathematical modeling problem, where the tumor growth pattern is personalized
based on imaging and clinical data of a target patient. Though mathematical
models yield promising results, their prediction accuracy may be limited by the
absence of population trend data and personalized clinical characteristics. In
this paper, we propose a statistical group learning approach to predict the
tumor growth pattern that incorporates both the population trend and
personalized data, in order to discover high-level features from multimodal
imaging data. A deep convolutional neural network approach is developed to
model the voxel-wise spatio-temporal tumor progression. The deep features are
combined with the time intervals and the clinical factors to feed a process of
feature selection. Our predictive model is pretrained on a group data set and
personalized on the target patient data to estimate the future spatio-temporal
progression of the patient's tumor. Multimodal imaging data at multiple time
points are used in the learning, personalization and inference stages. Our
method achieves a Dice similarity coefficient (DSC) of 86.8% ± 3.6% and a
relative volume difference (RVD) of 7.9% ± 5.4% on a pancreatic tumor data set,
outperforming the DSC of 84.4% ± 4.0% and RVD of 13.9% ± 9.8% obtained by a
previous state-of-the-art model-based method.
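The two evaluation metrics quoted above can be illustrated with a minimal sketch. It assumes the common definitions (Dice as twice the overlap divided by the sum of the two volumes, RVD as the absolute volume difference relative to the ground-truth volume); the abstract does not spell these out, and the function names and toy masks are hypothetical.

```python
def dice_coefficient(pred, truth):
    """Dice overlap 2|P and T| / (|P| + |T|) for two flat binary masks."""
    pred = [bool(v) for v in pred]
    truth = [bool(v) for v in truth]
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention (an assumption).
    return 1.0 if total == 0 else 2.0 * overlap / total


def relative_volume_difference(pred, truth):
    """|V_pred - V_truth| / V_truth, with volumes counted in voxels."""
    pred_volume = sum(bool(v) for v in pred)
    true_volume = sum(bool(v) for v in truth)
    if true_volume == 0:
        raise ValueError("ground-truth mask is empty")
    return abs(pred_volume - true_volume) / true_volume


# Toy flattened masks: the prediction covers half of the true tumor.
predicted = [1, 1, 0, 0]
reference = [1, 1, 1, 1]
print(dice_coefficient(predicted, reference))
print(relative_volume_difference(predicted, reference))
```

A higher Dice value and a lower RVD both indicate closer agreement between the predicted and the observed tumor volume.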
Heuristic Search over a Ranking for Feature Selection
In this work, we suggest a new feature selection technique that lets us use the wrapper approach to find a well-suited feature set for distinguishing experiment classes in high-dimensional data sets. Our method is based on the relevance and redundancy idea, in the sense that a ranked feature is chosen only if adding it provides additional information. This heuristic yields considerably better accuracy than the full feature set and than other representative feature selection algorithms on twelve well-known data sets, together with a notable reduction in dimensionality.
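The idea of walking down a ranking and keeping a feature only when it adds information can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the `evaluate` callback stands in for a wrapper score such as cross-validated accuracy, and the toy scoring function and all names are hypothetical.

```python
def search_over_ranking(n_features, evaluate, min_gain=0.0):
    """Single pass over a feature ranking with a wrapper-style criterion.

    evaluate(subset) returns a score for a list of feature indices
    (for example cross-validated accuracy in a real wrapper).
    """
    # Relevance: rank features by the score each one reaches on its own.
    ranking = sorted(range(n_features), key=lambda f: evaluate([f]), reverse=True)
    selected = []
    best_score = float("-inf")
    for feature in ranking:
        score = evaluate(selected + [feature])
        # Redundancy check: keep the feature only if the score improves.
        if score > best_score + min_gain:
            selected.append(feature)
            best_score = score
    return selected, best_score


# Toy stand-in for a classifier: each feature "explains" a set of cases,
# and a subset scores the number of distinct cases it explains.
explained = {0: {"a", "b", "c"}, 1: {"a", "b"}, 2: {"d"}, 3: set()}


def toy_score(subset):
    covered = set()
    for feature in subset:
        covered |= explained[feature]
    return len(covered)


print(search_over_ranking(4, toy_score))
```

In this toy run, feature 1 is relevant on its own but redundant given feature 0, and feature 3 is irrelevant, so neither is kept.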
Digging into acceptor splice site prediction: an iterative feature selection approach
Feature selection techniques are often used to reduce data dimensionality, increase classification performance, and gain insight into the processes that generated the data. In this paper, we describe an iterative procedure of feature selection and feature construction steps, improving the classification of acceptor splice sites, an important subtask of gene prediction.
We show that acceptor prediction can benefit from feature selection, and describe how feature selection techniques can be used to gain new insights into the classification of acceptor sites. This is illustrated by the identification of a new, biologically motivated feature: the AG-scanning feature.
The results described in this paper contribute both to the domain of gene prediction and to research in feature selection techniques, describing a new wrapper-based feature weighting method that aids in knowledge discovery when dealing with complex datasets.
A Methodology for the Diagnostic of Aircraft Engine Based on Indicators Aggregation
Aircraft engine manufacturers collect large amounts of engine-related data
during flights. These data are used to detect anomalies in the engines in order
to help companies optimize their maintenance costs. This article introduces and
studies a generic methodology for building automatic detectors of early signs of
anomalies in a way that is understandable by the human operators who make
the final maintenance decision. The main idea of the method is to generate a
very large number of binary indicators based on parametric anomaly scores
designed by experts, complemented by simple aggregations of those scores. The
best indicators are selected via a classical forward scheme, leading to a much
reduced number of indicators that are tuned to a data set. We illustrate the
value of the method on simulated data that contain realistic early signs of
anomalies. Comment: Proceedings of the 14th Industrial Conference, ICDM 2014,
St. Petersburg, Russian Federation (2014)
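The classical forward scheme mentioned in this abstract can be sketched as a greedy loop over binary indicators. The sketch assumes a simple OR rule (an alarm is raised when any selected indicator fires) scored by accuracy; the abstract does not specify the classifier or the criterion, so the rule, the names, and the toy data are all hypothetical.

```python
def accuracy_of_or_rule(indicators, chosen, labels):
    """Accuracy when an anomaly is flagged if any chosen indicator fires."""
    correct = 0
    for i, label in enumerate(labels):
        fired = any(indicators[j][i] for j in chosen)
        correct += int(fired == bool(label))
    return correct / len(labels)


def forward_select(indicators, labels, max_indicators=None):
    """Greedy forward selection: add the best remaining indicator each round."""
    limit = len(indicators) if max_indicators is None else max_indicators
    remaining = set(range(len(indicators)))
    chosen = []
    best_score = accuracy_of_or_rule(indicators, chosen, labels)
    while remaining and len(chosen) < limit:
        candidate, candidate_score = None, best_score
        for j in sorted(remaining):
            score = accuracy_of_or_rule(indicators, chosen + [j], labels)
            if score > candidate_score:
                candidate, candidate_score = j, score
        if candidate is None:
            break  # no remaining indicator improves the score
        chosen.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return chosen, best_score


# Toy data: six flights, the last three anomalous; four binary indicators.
labels = [0, 0, 0, 1, 1, 1]
indicators = [
    [0, 0, 0, 1, 0, 0],  # detects one anomaly
    [0, 0, 0, 0, 1, 1],  # detects the other two
    [1, 1, 0, 1, 1, 1],  # detects all anomalies, with two false alarms
    [0, 0, 0, 1, 0, 0],  # duplicate of the first indicator
]
print(forward_select(indicators, labels))
```

The small selected set stays readable for an operator, since each retained indicator corresponds to one expert-designed anomaly score.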
Is This a Joke? Detecting Humor in Spanish Tweets
While humor has been historically studied from a psychological, cognitive and
linguistic standpoint, its study from a computational perspective is an area
yet to be explored in Computational Linguistics. There exist some previous
works, but a characterization of humor that allows its automatic recognition
and generation is far from being specified. In this work we build a
crowdsourced corpus of labeled tweets, annotated according to their humor value,
letting the annotators subjectively decide which are humorous. A humor
classifier for Spanish tweets is assembled based on supervised learning,
reaching a precision of 84% and a recall of 69%. Comment: Preprint version, without referra
Predicting sentence translation quality using extrinsic and language independent features
We develop a top-performing model for automatic, accurate, and language-independent prediction of sentence-level statistical machine translation (SMT) quality with or without looking at the translation outputs.
We derive various feature functions measuring the closeness of a given test sentence to the training data and
the difficulty of translating the sentence.
We describe mono feature functions that are based on statistics of only one side of the parallel
training corpora and duo feature functions that incorporate statistics involving both source and
target sides of the training data.
Overall, we describe novel, language independent, and SMT system extrinsic features for predicting the SMT performance, which also rank high during feature ranking evaluations.
We experiment with different learning settings, with or without looking at the translations, which help differentiate the contribution of different feature sets.
We apply partial least squares and feature subset selection, both of which improve the results, and we present a ranking of the top features selected for each learning setting, providing an exhaustive analysis of the extrinsic features used.
We show that by just looking at the test source sentences and not using the translation outputs at all, we can
achieve better performance than a baseline system using SMT model dependent features that generated the
translations.
Furthermore, our prediction system is able to achieve the 2nd best performance overall according to the official
results of the Quality Estimation Task (QET) challenge when also looking at the translation outputs.
Our representation and features achieve the top performance in QET among the models using the SVR learning model.
Orientational instabilities in nematics with weak anchoring under combined action of steady flow and external fields
We study the homogeneous and the spatially periodic instabilities in a
nematic liquid crystal layer subjected to steady plane Couette or
Poiseuille flow. The initial director orientation is perpendicular to the flow
plane. Weak anchoring at the confining plates and the influence of the external
electric and/or magnetic field are taken into account. Approximate
expressions for the critical shear rate are presented and compared with
semi-analytical solutions in the case of Couette flow and numerical solutions of
the full set of nematodynamic equations for Poiseuille flow. In particular, the
dependence of the type of instability and the threshold on the azimuthal and
the polar anchoring strength and external fields is analysed. Comment: 12 pages, 6 figures
Overcoming Calibration Problems in Pattern Labeling with Pairwise Ratings: Application to Personality Traits
We address the problem of calibration of workers whose task is to label patterns with continuous variables, which arises for instance in labeling images of videos of humans with continuous traits. Worker bias is particularly difficult to evaluate and correct when many workers contribute just a few labels, a situation arising typically when labeling is crowd-sourced. In the scenario of labeling short videos of people facing a camera with personality traits, we evaluate the feasibility of the pairwise ranking method to alleviate bias problems. Workers are shown one pair of videos at a time and must order them by preference. The variable levels are reconstructed by fitting a Bradley-Terry-Luce model with maximum likelihood. This method may, at first sight, seem prohibitively expensive because for N videos, p = N(N−1)/2 pairs must potentially be processed by workers rather than N videos. However, by performing extensive simulations, we determine an empirical law for the scaling of the number of pairs needed as a function of the number of videos in order to achieve a given accuracy of score reconstruction and show that the pairwise method is affordable. We apply the method to the labeling of a large-scale dataset of 10,000 videos used in the ChaLearn Apparent Personality Trait challenge.
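The reconstruction step can be illustrated with a minimal sketch of maximum-likelihood fitting for a Bradley-Terry-Luce model, using the standard iterative (minorization-maximization) update. This is not the authors' code: the function names and toy comparisons are hypothetical, and the sketch assumes every item wins at least once so that the estimate stays finite.

```python
def count_all_pairs(n_items):
    """Number of distinct pairs p = N(N-1)/2 among N items."""
    return n_items * (n_items - 1) // 2


def fit_bradley_terry(n_items, comparisons, n_iter=200):
    """Estimate item strengths from (winner, loser) index pairs.

    Under the model, item i beats item j with probability
    s[i] / (s[i] + s[j]). Strengths are normalized to sum to 1.
    """
    wins = [0] * n_items
    pair_counts = {}  # (i, j) with i < j -> number of comparisons
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strengths = [1.0 / n_items] * n_items
    for _ in range(n_iter):
        denom = [0.0] * n_items
        for (i, j), n in pair_counts.items():
            share = n / (strengths[i] + strengths[j])
            denom[i] += share
            denom[j] += share
        # Items that were never compared keep their current strength.
        updated = [
            wins[i] / denom[i] if denom[i] > 0 else strengths[i]
            for i in range(n_items)
        ]
        total = sum(updated)
        strengths = [s / total for s in updated]
    return strengths


# Toy data: three videos, each pair compared four times.
comparisons = (
    [(0, 1)] * 3 + [(1, 0)]
    + [(1, 2)] * 3 + [(2, 1)]
    + [(0, 2)] * 3 + [(2, 0)]
)
print(count_all_pairs(10_000))
print(fit_bradley_terry(3, comparisons))
```

The first printed value shows why exhaustive pairing looks expensive for 10,000 videos, which is the cost the empirical scaling law in the abstract is meant to reduce.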