Modeling of Facial Aging and Kinship: A Survey
Computational facial models that capture properties of facial cues related to
aging and kinship increasingly attract the attention of the research community,
enabling the development of reliable methods for age progression, age
estimation, age-invariant facial characterization, and kinship verification
from visual data. In this paper, we review recent advances in modeling of
facial aging and kinship. In particular, we provide an up-to-date, complete
list of available annotated datasets and an in-depth analysis of geometric,
hand-crafted, and learned facial representations that are used for facial aging
and kinship characterization. Moreover, evaluation protocols and metrics are
reviewed and notable experimental results for each surveyed task are analyzed.
This survey allows us to identify challenges and discuss future research
directions for the development of robust facial models in real-world
conditions.
An Interdisciplinary Comparison of Sequence Modeling Methods for Next-Element Prediction
Data of sequential nature arise in many application domains in the form of, e.g.,
textual data, DNA sequences, and software execution traces. Different research
disciplines have developed methods to learn sequence models from such datasets:
(i) in the machine learning field methods such as (hidden) Markov models and
recurrent neural networks have been developed and successfully applied to a
wide range of tasks, (ii) in process mining, process discovery techniques aim to
generate human-interpretable descriptive models, and (iii) in the grammar
inference field the focus is on finding descriptive models in the form of
formal grammars. Despite their different focuses, these fields share a common
goal - learning a model that accurately describes the behavior in the
underlying data. Those sequence models are generative, i.e., they can predict
what elements are likely to occur after a given unfinished sequence. So far,
these fields have developed mainly in isolation from each other and no
comparison exists. This paper presents an interdisciplinary experimental
evaluation that compares sequence modeling techniques on the task of
next-element prediction on four real-life sequence datasets. The results
indicate that, in terms of accuracy, machine learning techniques, which
generally do not aim at interpretability, outperform techniques from the
process mining and grammar inference fields that aim to yield interpretable
models.
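The next-element prediction task shared by all three fields can be illustrated with the simplest machine learning baseline the abstract mentions, a first-order Markov model. A minimal sketch; the function names and toy event log are ours, not from any of the compared implementations:

```python
from collections import defaultdict

def train_markov(sequences):
    """First-order Markov model: count element-to-element transitions."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prefix):
    """Most likely next element after an unfinished sequence
    (None if the last element was never seen in training)."""
    followers = counts.get(prefix[-1])
    if not followers:
        return None
    return max(followers, key=followers.get)

# toy software-execution traces, one of the sequence types named above
logs = [["open", "edit", "save"],
        ["open", "edit", "close"],
        ["open", "edit", "save"]]
model = train_markov(logs)
print(predict_next(model, ["open", "edit"]))  # "save"
```

Higher-order Markov models, RNNs, discovered process models, and inferred grammars all answer the same query, differing in how much context they condition on and how interpretable the resulting model is.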
A Compositional Textual Model for Recognition of Imperfect Word Images
Printed text recognition is an important problem for industrial OCR systems.
Printed text is constructed in a standard procedural fashion in most settings.
We develop a mathematical model for this process that can be applied to the
backward inference problem of text recognition from an image. Through ablation
experiments we show that this model is realistic and that a multi-task
objective setting can help to stabilize estimation of its free parameters,
enabling use of conventional deep learning methods. Furthermore, by directly
modeling the geometric perturbations of text synthesis we show that our model
can help recover missing characters from incomplete text regions, the bane of
multicomponent OCR systems, enabling recognition even when the detection
returns incomplete information.
Multi-Institutional Deep Learning Modeling Without Sharing Patient Data: A Feasibility Study on Brain Tumor Segmentation
Deep learning models for semantic segmentation of images require large
amounts of data. In the medical imaging domain, acquiring sufficient data is a
significant challenge. Labeling medical image data requires expert knowledge.
Collaboration between institutions could address this challenge, but moving
medical data to a centralized location faces various legal, privacy, technical,
and data-ownership challenges, especially among international institutions. In
this study, we introduce the first use of federated learning for
multi-institutional collaboration, enabling deep learning modeling without
sharing patient data. Our quantitative results demonstrate that the performance
of federated semantic segmentation models (Dice=0.852) on multimodal brain
scans is similar to that of models trained by sharing data (Dice=0.862). We
compare federated learning with two alternative collaborative learning methods
and find that they fail to match the performance of federated learning.
Comment: MICCAI, Brain Lesion (BrainLes) workshop, September 16, 2018, Granada, Spain
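The aggregation step at the heart of such a collaboration can be sketched as FedAvg-style weighted parameter averaging: each institution trains locally and shares only model parameters, never patient data. A toy illustration under our own naming, not the study's actual implementation:

```python
def federated_average(local_weights, local_sizes):
    """Aggregate locally trained weight vectors without sharing raw data:
    each institution contributes only its parameters, weighted by its
    number of local training samples (FedAvg-style aggregation)."""
    total = sum(local_sizes)
    n_params = len(local_weights[0])
    global_w = [0.0] * n_params
    for w, n in zip(local_weights, local_sizes):
        for i in range(n_params):
            global_w[i] += w[i] * n / total
    return global_w

# three institutions, each holding a locally trained 2-parameter model
w = federated_average([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [10, 10, 20])
print(w)  # [0.75, 0.75]
```

In the real setting this averaging runs once per communication round over the parameters of a full segmentation network, with the averaged model redistributed for the next round of local training.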
Representation Learning with Autoencoders for Electronic Health Records: A Comparative Study
The increasing volume of Electronic Health Records (EHR) in recent years provides
great opportunities for data scientists to collaborate on different aspects of
healthcare research by applying advanced analytics to these EHR clinical data.
A key requirement however is obtaining meaningful insights from high
dimensional, sparse and complex clinical data. Data science approaches
typically address this challenge by performing feature learning in order to
build more reliable and informative feature representations from clinical data
followed by supervised learning. In this paper, we propose a predictive
modeling approach based on deep learning based feature representations and word
embedding techniques. Our method uses different deep architectures (stacked
sparse autoencoders, deep belief network, adversarial autoencoders and
variational autoencoders) for feature representation in higher-level
abstraction to obtain effective and robust features from EHRs, and then build
prediction models on top of them. Our approach is particularly useful when the
unlabeled data is abundant whereas labeled data is scarce. We investigate the
performance of representation learning through a supervised learning approach.
Our focus is to present a comparative study to evaluate the performance of
different deep architectures through supervised learning and provide insights
in the choice of deep feature representation techniques. Our experiments
demonstrate that for small data sets, stacked sparse autoencoders achieve
superior generalization performance in prediction due to sparsity
regularization, whereas variational autoencoders outperform the competing
approaches for large data sets due to their capability of learning the
representation distribution.
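The two-stage idea (unsupervised representation learning, then supervised prediction on the learned codes) can be illustrated with a minimal linear autoencoder trained by plain SGD. This is a stand-in only: the stacked sparse, deep belief, adversarial, and variational architectures in the study are far richer, and every name and toy dataset here is ours.

```python
import random

def train_autoencoder(data, hidden=1, lr=0.01, epochs=300):
    """Minimal linear autoencoder: encoder W, decoder V, squared
    reconstruction loss, plain per-sample SGD."""
    dim = len(data[0])
    rng = random.Random(0)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(dim)]
    for _ in range(epochs):
        for x in data:
            h = [sum(W[j][i] * x[i] for i in range(dim)) for j in range(hidden)]
            xhat = [sum(V[i][j] * h[j] for j in range(hidden)) for i in range(dim)]
            err = [xhat[i] - x[i] for i in range(dim)]
            # backpropagate the squared error through both linear layers
            grad_h = [sum(err[i] * V[i][j] for i in range(dim)) for j in range(hidden)]
            for i in range(dim):
                for j in range(hidden):
                    V[i][j] -= lr * 2 * err[i] * h[j]
            for j in range(hidden):
                for i in range(dim):
                    W[j][i] -= lr * 2 * grad_h[j] * x[i]
    return W, V

def encode(W, x):
    """The learned low-dimensional code: this is what feeds the
    downstream supervised prediction model."""
    return [sum(wj[i] * x[i] for i in range(len(x))) for wj in W]

def reconstruction_error(W, V, data):
    total = 0.0
    for x in data:
        h = encode(W, x)
        xhat = [sum(V[i][j] * h[j] for j in range(len(h))) for i in range(len(x))]
        total += sum((xi - xh) ** 2 for xi, xh in zip(x, xhat))
    return total

# toy "records" lying on a 1-D subspace: one hidden unit can capture them
data = [[0.5, 1.0], [1.0, 2.0], [-0.5, -1.0], [-1.0, -2.0]]
W, V = train_autoencoder(data)
```

After training, `encode(W, x)` replaces the raw high-dimensional record as the input to the supervised model, which is the sense in which abundant unlabeled data can compensate for scarce labels.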
Extensions of Morse-Smale Regression with Application to Actuarial Science
The problem of subgroups is ubiquitous in scientific research (e.g., disease
heterogeneity, spatial distributions in ecology), and piecewise regression
is one way to deal with this phenomenon. Morse-Smale regression offers a way to
partition the regression function based on level sets of a defined function and
that function's basins of attraction. This topologically-based piecewise
regression algorithm has shown promise in its initial applications, but the
current implementation in the literature has been limited to elastic net and
generalized linear regression. It is possible that nonparametric methods, such
as random forest or conditional inference trees, may provide better prediction
and insight through modeling interaction terms and other nonlinear
relationships between predictors and a given outcome.
This study explores the use of several machine learning algorithms within a
Morse-Smale piecewise regression framework, including boosted regression with
linear baselearners, homotopy-based LASSO, conditional inference trees, random
forest, and a wide neural network framework called extreme learning machines.
Simulations on Tweedie regression problems with varying Tweedie parameter and
dispersion suggest that many machine learning approaches to Morse-Smale
piecewise regression improve the original algorithm's performance, particularly
for outcomes with lower dispersion and linear or a mix of linear and nonlinear
predictor relationships. On a real actuarial problem, several of these new
algorithms perform as well as or better than the original Morse-Smale
regression algorithm, and most provide information on the nature of predictor
relationships within each partition, offering insight into differences between
dataset partitions.
Comment: 14 pages, 10 figures
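The core move of Morse-Smale regression (fit a separate model inside each partition of predictor space) can be sketched with a fixed 1-D split standing in for the basins of attraction. The functions and threshold below are illustrative only; the actual algorithm derives partitions from level sets of the estimated function:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def piecewise_fit(xs, ys, split):
    """One linear model per partition; a fixed threshold stands in
    for the Morse-Smale basins of attraction."""
    left = [(x, y) for x, y in zip(xs, ys) if x < split]
    right = [(x, y) for x, y in zip(xs, ys) if x >= split]
    return fit_line(*zip(*left)), fit_line(*zip(*right))

def piecewise_predict(models, split, x):
    a, b = models[0] if x < split else models[1]
    return a * x + b

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [abs(x) for x in xs]            # V-shaped: two linear regimes
models = piecewise_fit(xs, ys, 0.0)
print(piecewise_predict(models, 0.0, 1.5))  # 1.5 (right-hand regime y = x)
```

The extensions surveyed here swap `fit_line` for boosted learners, LASSO, conditional inference trees, random forests, or extreme learning machines within each partition.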
Relief-Based Feature Selection: Introduction and Review
Feature selection plays a critical role in biomedical data mining, driven by
increasing feature dimensionality in target problems and growing interest in
advanced but computationally expensive methodologies able to model complex
associations. Specifically, there is a need for feature selection methods that
are computationally efficient, yet sensitive to complex patterns of
association, e.g. interactions, so that informative features are not mistakenly
eliminated prior to downstream modeling. This paper focuses on Relief-based
algorithms (RBAs), a unique family of filter-style feature selection algorithms
that have gained appeal by striking an effective balance between these
objectives while flexibly adapting to various data characteristics, e.g.
classification vs. regression. First, this work broadly examines types of
feature selection and defines RBAs within that context. Next, we introduce the
original Relief algorithm and associated concepts, emphasizing the intuition
behind how it works, how feature weights generated by the algorithm can be
interpreted, and why it is sensitive to feature interactions without evaluating
combinations of features. Lastly, we include an expansive review of RBA
methodological research beyond Relief and its popular descendant, ReliefF. In
particular, we characterize branches of RBA research, and provide comparative
summaries of RBA algorithms including contributions, strategies, functionality,
time complexity, adaptation to key data characteristics, and software
availability.
Comment: Submitted revisions for publication based on reviews by the Journal of Biomedical Informatics
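The intuition behind the original Relief update, that a good feature differs on an instance's nearest miss and agrees on its nearest hit, can be sketched directly. This assumes binary classes and features pre-scaled to [0, 1]; the function names and toy data are ours:

```python
def relief(X, y):
    """Original Relief for a binary-class dataset: reward features that
    differ on the nearest miss and agree on the nearest hit.
    Assumes features are pre-scaled to [0, 1] so diffs are comparable."""
    m, n = len(X), len(X[0])
    w = [0.0] * n

    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    for i, x in enumerate(X):
        hit = min((j for j in range(m) if j != i and y[j] == y[i]),
                  key=lambda j: sqdist(x, X[j]))
        miss = min((j for j in range(m) if y[j] != y[i]),
                   key=lambda j: sqdist(x, X[j]))
        for f in range(n):
            # nearest-miss difference raises the weight, nearest-hit lowers it
            w[f] += (abs(x[f] - X[miss][f]) - abs(x[f] - X[hit][f])) / m
    return w

# feature 0 separates the classes, feature 1 is noise
X = [[0.0, 0.1], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief(X, y)
print(weights)  # feature 0 scores high, feature 1 scores low
```

Note that no feature combinations are enumerated: interaction sensitivity comes for free because hits and misses are located in the full feature space. ReliefF and its descendants refine this with k nearest neighbors, missing-value handling, and multi-class or regression endpoints.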
Drought Stress Classification using 3D Plant Models
Quantification of physiological changes in plants can capture different
drought mechanisms and assist in selection of tolerant varieties in a high
throughput manner. In this context, an accurate 3D model of plant canopy
provides a reliable representation for drought stress characterization in
contrast to using 2D images. In this paper, we propose a novel end-to-end
pipeline including 3D reconstruction, segmentation and feature extraction,
leveraging deep neural networks at various stages, for drought stress study. To
overcome the high degree of self-similarities and self-occlusions in plant
canopy, prior knowledge of leaf shape based on features from a deep siamese
network is used to construct an accurate 3D model using structure from motion
on wheat plants. Drought stress is then characterized with deep network-based
feature aggregation. We compare the proposed methodology on several
descriptors, and show that the network outperforms conventional methods.
Comment: Appears in Workshop on Computer Vision Problems in Plant Phenotyping (CVPPP), International Conference on Computer Vision (ICCV) 201
Non-Rigid Point Set Registration Networks
Point set registration is defined as a process to determine the spatial
transformation from the source point set to the target one. Existing methods
often iteratively search for the optimal geometric transformation to register a
given pair of point sets, driven by minimizing a predefined alignment loss
function. In contrast, the proposed point registration neural network (PR-Net)
actively learns the registration pattern as a parametric function from a
training dataset, and consequently predicts the desired geometric transformation to
align a pair of point sets. PR-Net can transfer the learned knowledge (i.e.
registration pattern) from registering training pairs to testing ones without
additional iterative optimization. Specifically, in this paper, we develop
novel techniques to learn shape descriptors from point sets that help formulate
a clear correlation between source and target point sets. With the defined
correlation, PR-Net tends to predict the transformation so that the source and
target point sets can be statistically aligned, which in turn leads to an
optimal spatial geometric registration. PR-Net achieves robust and superior
performance for non-rigid registration of point sets, even in presence of
Gaussian noise, outliers, and missing points, while requiring much less time for
registering a large number of pairs. More importantly, for a new pair of point
sets, PR-Net is able to directly predict the desired transformation using the
learned model without a repetitive iterative optimization routine. Our code is
available at https://github.com/Lingjing324/PR-Net
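The iterative-optimization baseline that PR-Net sidesteps at test time can be illustrated with a translation-only ICP loop: alternate nearest-neighbor correspondence with a closed-form update until the loss stops improving. A sketch of the contrasted baseline, not of PR-Net itself:

```python
def icp_translation(src, tgt, iters=20):
    """Translation-only ICP in 2-D: alternate nearest-neighbor
    correspondence with a closed-form translation update. This per-pair
    loop is exactly what a learned registration network avoids rerunning
    for each new pair of point sets."""
    tx = ty = 0.0
    for _ in range(iters):
        moved = [(x + tx, y + ty) for x, y in src]
        # nearest target point for each moved source point
        corr = [min(tgt, key=lambda t, m=m: (m[0] - t[0]) ** 2 + (m[1] - t[1]) ** 2)
                for m in moved]
        # closed-form update: mean residual between correspondences
        tx += sum(c[0] - m[0] for c, m in zip(corr, moved)) / len(src)
        ty += sum(c[1] - m[1] for c, m in zip(corr, moved)) / len(src)
    return tx, ty

src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
tgt = [(x + 1.0, y + 2.0) for x, y in src]   # source shifted by (1, 2)
print(icp_translation(src, tgt))  # recovers approximately (1.0, 2.0)
```

Non-rigid registration replaces the single translation with a dense deformation field, which is why a learned, feed-forward predictor offers such a large speed advantage over per-pair optimization.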
A Brief Review of Data Mining Application Involving Protein Sequence Classification
Data mining techniques have been used by researchers for analyzing protein
sequences. In protein analysis, especially in protein sequence classification,
feature selection is most important. Popular protein sequence classification
techniques involve extraction of specific features from the sequences.
Researchers apply well-known classification techniques such as neural
networks, genetic algorithms, Fuzzy ARTMAP, and rough set classifiers for
accurate classification. This paper presents a review of three different
classification models: the neural network model, the fuzzy ARTMAP model, and
the rough set classifier model. A new technique for classifying protein
sequences is proposed at the end. The proposed technique tries to reduce the
computational overhead encountered by earlier approaches and to increase the
accuracy of classification.
Comment: 10 pages, 1 table, 1 figure. arXiv admin note: substantial text overlap with arXiv:1211.465
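The feature-extraction step the review emphasizes can be as simple as amino-acid composition, which turns a variable-length sequence into a fixed-length 20-dimensional vector that any of the listed classifiers can consume. A generic sketch, not the technique proposed in the paper:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Amino-acid composition: the fraction of each of the 20 standard
    residues, giving every sequence a fixed-length feature vector."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

vec = composition("AAGC")
print(vec[AMINO_ACIDS.index("A")])  # 0.5
```

Richer encodings (dipeptide composition, physicochemical property profiles, motif indicators) follow the same pattern: map the sequence to a fixed-length numeric vector before classification.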