A regression model with a hidden logistic process for signal parametrization
A new approach for signal parametrization, which consists of a specific
regression model incorporating a discrete hidden logistic process, is proposed.
The model parameters are estimated by the maximum likelihood method performed
by a dedicated Expectation Maximization (EM) algorithm. The parameters of the
hidden logistic process, in the inner loop of the EM algorithm, are estimated
using a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm. An
experimental study using simulated and real data reveals good performances of
the proposed approach.
Comment: In Proceedings of the XVIIth European Symposium on Artificial Neural
Networks, Computational Intelligence and Machine Learning (ESANN), pages
503-508, 2009, Bruges, Belgium.
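The regime-switching mechanism described above can be sketched in a few lines: a softmax (multinomial logistic) function of time gates K polynomial regimes, so steep gating weights give an abrupt change and shallow ones a smooth transition. The parameter values below (`w`, `beta`) are illustrative, not taken from the paper, and the sketch covers only the generative mean, not the EM/IRLS estimation.

```python
import numpy as np

def logistic_proportions(t, w):
    """Softmax gating weights pi_k(t); w has shape (K, 2): intercept, slope."""
    logits = w[:, 0][None, :] + np.outer(t, w[:, 1])  # (T, K)
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def regression_mean(t, w, beta):
    """Mixture mean: sum_k pi_k(t) * polynomial_k(t)."""
    pi = logistic_proportions(t, w)                   # (T, K)
    X = np.vander(t, beta.shape[1], increasing=True)  # (T, degree+1)
    regimes = X @ beta.T                              # each regime's polynomial
    return (pi * regimes).sum(axis=1)

t = np.linspace(0.0, 1.0, 200)
w = np.array([[20.0, -40.0], [-20.0, 40.0]])  # sharp switch near t = 0.5
beta = np.array([[0.0, 1.0], [2.0, -1.0]])    # two linear regimes
mu = regression_mean(t, w, beta)              # regime 1 early, regime 2 late
```

Shrinking the slope magnitudes in `w` makes the transition between the two regimes smooth rather than abrupt.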
Time series modeling by a regression approach based on a latent process
Time series are used in many domains including finance, engineering,
economics and bioinformatics generally to represent the change of a measurement
over time. Modeling techniques may then be used to give a synthetic
representation of such data. A new approach for time series modeling is
proposed in this paper. It consists of a regression model incorporating a
discrete hidden logistic process allowing for activating smoothly or abruptly
different polynomial regression models. The model parameters are estimated by
the maximum likelihood method performed by a dedicated Expectation Maximization
(EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative
Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process
parameters. To evaluate the proposed approach, an experimental study on
simulated data and real world data was performed using two alternative
approaches: a heteroskedastic piecewise regression model using a global
optimization algorithm based on dynamic programming, and a Hidden Markov
Regression Model whose parameters are estimated by the Baum-Welch algorithm.
Finally, in the context of the remote monitoring of components of the French
railway infrastructure, and more particularly the switch mechanism, the
proposed approach has been applied to modeling and classifying time series
representing the condition measurements acquired during switch operations.
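Since the M step relies on multi-class IRLS, a minimal binary IRLS sketch may help fix ideas. This is the standard Newton iteration for logistic regression, not the paper's multi-class implementation, and the data below is synthetic.

```python
import numpy as np

def irls_logistic(X, y, n_iter=25):
    """Binary logistic regression fit by IRLS (Newton's method)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))     # current predicted probabilities
        weights = p * (1.0 - p)              # diagonal IRLS weights
        # Newton step: solve (X' W X) delta = X' (y - p)
        H = X.T @ (X * weights[:, None]) + 1e-8 * np.eye(X.shape[1])
        w += np.linalg.solve(H, X.T @ (y - p))
    return w

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
true_w = np.array([-1.0, 2.0])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)
w_hat = irls_logistic(X, y)   # should land near (-1, 2)
```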
Supervised learning of a regression model based on latent process. Application to the estimation of fuel cell life time
This paper describes a pattern recognition approach aiming to estimate fuel
cell duration time from electrochemical impedance spectroscopy measurements. It
consists in first extracting features from both the real and imaginary parts
of the impedance spectrum. A parametric model is considered for the real part,
whereas a regression model with latent variables is used for the imaginary
part. Then, a linear regression model using different subsets of the extracted
features is used for the estimation of the fuel cell duration time. The
performance of the proposed approach is evaluated on an experimental data set
to show its feasibility. This could lead to interesting perspectives for a
predictive maintenance policy for fuel cells.
Comment: In Proceedings of the 8th IEEE International Conference on Machine
Learning and Applications (IEEE ICMLA'09), pages 632-637, 2009, Miami Beach,
FL, USA.
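The final estimation step above is plain least squares on feature subsets. A hedged sketch on synthetic data follows; the features and target below are placeholders, not the paper's actual impedance features.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
features = rng.normal(size=(n, 4))   # stand-ins for 4 extracted features
# synthetic "duration" depending on features 0 and 2 only
lifetime = 3.0 * features[:, 0] - 2.0 * features[:, 2] \
           + rng.normal(scale=0.1, size=n)

def fit_subset(F, y, cols):
    """Least-squares fit on a column subset; returns coefficients and RMSE."""
    X = np.column_stack([np.ones(len(y)), F[:, cols]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rmse = np.sqrt(np.mean((y - X @ coef) ** 2))
    return coef, rmse

_, rmse_good = fit_subset(features, lifetime, [0, 2])  # informative subset
_, rmse_poor = fit_subset(features, lifetime, [1, 3])  # uninformative subset
```

Comparing RMSE across subsets mirrors the abstract's idea of evaluating different feature subsets for the duration estimate.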
Gene Hunting with Knockoffs for Hidden Markov Models
Modern scientific studies often require the identification of a subset of
relevant explanatory variables, in the attempt to understand an interesting
phenomenon. Several statistical methods have been developed to automate this
task, but only recently has the framework of model-free knockoffs proposed a
general solution that can perform variable selection under rigorous type-I
error control, without relying on strong modeling assumptions. In this paper,
we extend the methodology of model-free knockoffs to a rich family of problems
where the distribution of the covariates can be described by a hidden Markov
model (HMM). We develop an exact and efficient algorithm to sample knockoff
copies of an HMM. We then argue that combined with the knockoffs selective
framework, they provide a natural and powerful tool for performing principled
inference in genome-wide association studies with guaranteed FDR control.
Finally, we apply our methodology to several datasets aimed at studying
Crohn's disease and several continuous phenotypes, e.g. cholesterol levels.
Comment: 35 pages, 13 figures, 9 tables.
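The FDR guarantee comes from the knockoff filter's data-dependent threshold rather than from the HMM sampler itself. Below is a sketch of that selection step only, applied to synthetic W statistics; the signal/null split is illustrative.

```python
import numpy as np

def knockoff_threshold(W, q=0.1):
    """Knockoff+ threshold: smallest t such that
    (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf

rng = np.random.default_rng(2)
W = np.concatenate([
    rng.normal(loc=8.0, size=30),                                   # signals
    rng.choice([-1.0, 1.0], size=300) * rng.exponential(size=300),  # sign-symmetric nulls
])
t = knockoff_threshold(W, q=0.2)
selected = np.where(W >= t)[0]   # mostly the first 30 (signal) indices
```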
A Selective Overview of Deep Learning
Deep learning has arguably achieved tremendous success in recent years. In
simple words, deep learning uses the composition of many nonlinear functions to
model the complex dependency between input features and labels. While neural
networks have a long history, recent advances have greatly improved their
performance in computer vision, natural language processing, etc. From the
statistical and scientific perspective, it is natural to ask: What is deep
learning? What are the new characteristics of deep learning, compared with
classical methods? What are the theoretical foundations of deep learning? To
answer these questions, we introduce common neural network models (e.g.,
convolutional neural nets, recurrent neural nets, generative adversarial nets)
and training techniques (e.g., stochastic gradient descent, dropout, batch
normalization) from a statistical point of view. Along the way, we highlight
new characteristics of deep learning (including depth and over-parametrization)
and explain their practical and theoretical benefits. We also sample recent
results on theories of deep learning, many of which are only suggestive. While
a complete understanding of deep learning remains elusive, we hope that our
perspectives and discussions serve as a stimulus for new statistical research.
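Two of the training techniques named above, inverted dropout and a plain SGD step, can be sketched in a few lines of numpy. The shapes, rates, and squared loss are illustrative choices, not tied to any model in the text; the all-positive weights are just a trick to keep every ReLU unit active in this toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, rate, train=True):
    """Inverted dropout: zero a fraction `rate` of entries and rescale the
    rest by 1/(1-rate), so the expected activation is unchanged."""
    if not train:
        return a
    mask = rng.random(a.shape) >= rate
    return a * mask / (1.0 - rate)

a = np.ones(8)
a_train = dropout(a, rate=0.5)              # some entries zeroed, rest doubled
a_test = dropout(a, rate=0.5, train=False)  # identity at test time

# one SGD step on a squared loss through a tiny ReLU network
x = np.ones((1, 3))
y = np.array([[1.0]])
W1 = 0.5 * np.abs(rng.normal(size=(3, 8)))  # positive -> all ReLU units active
W2 = 0.5 * rng.normal(size=(8, 1))
lr = 0.01
h = np.maximum(0.0, x @ W1)                 # hidden ReLU activations
err = h @ W2 - y
loss_before = float(0.5 * err ** 2)
W2 = W2 - lr * h.T @ err                    # gradient of 0.5*err^2 w.r.t. W2
loss_after = float(0.5 * (h @ W2 - y) ** 2)
```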
A hidden process regression model for functional data description. Application to curve discrimination
A new approach for functional data description is proposed in this paper. It
consists of a regression model with a discrete hidden logistic process which is
adapted for modeling curves with abrupt or smooth regime changes. The model
parameters are estimated in a maximum likelihood framework through a dedicated
Expectation Maximization (EM) algorithm. From the proposed generative model, a
curve discrimination rule is derived using the Maximum A Posteriori rule. The
proposed model is evaluated using simulated curves and real world curves
acquired during railway switch operations, by performing comparisons with the
piecewise regression approach in terms of curve modeling and classification.
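The Maximum A Posteriori rule itself is one line: assign a curve to the class maximizing log-likelihood plus log-prior. In this sketch the per-class log-likelihoods are given numbers standing in for those produced by fitted generative curve models.

```python
import numpy as np

def map_classify(loglik, log_prior):
    """MAP rule. loglik: (n_curves, n_classes) array of log p(y_i | class g);
    log_prior: (n_classes,). Returns the argmax-posterior class per curve."""
    return np.argmax(loglik + log_prior[None, :], axis=1)

# illustrative log-likelihoods for 2 curves under 2 class models
loglik = np.array([[-10.0, -3.0],
                   [-2.0, -9.0]])
log_prior = np.log(np.array([0.5, 0.5]))
labels = map_classify(loglik, log_prior)   # curve 0 -> class 1, curve 1 -> class 0
```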
Fast Second-Order Stochastic Backpropagation for Variational Inference
We propose a second-order (Hessian or Hessian-free) based optimization method
for variational inference inspired by Gaussian backpropagation, and argue that
quasi-Newton optimization can be developed as well. This is accomplished by
generalizing the gradient computation in stochastic backpropagation via a
reparametrization trick with lower complexity. As an illustrative example, we
apply this approach to the problems of Bayesian logistic regression and
variational auto-encoder (VAE). Additionally, we compute bounds on the
estimator variance of intractable expectations for the family of Lipschitz
continuous functions. Our method is practical, scalable, and model-free. We
demonstrate our method on several real-world datasets and provide comparisons
with other stochastic gradient methods to show substantial enhancement in
convergence rates.
Comment: Accepted by NIPS 2015.
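The reparametrization trick underlying the gradient computation can be illustrated on a toy expectation: writing z = mu + sigma*eps with eps ~ N(0,1) turns d/dmu E[f(z)] into E[f'(z)], which is estimable by Monte Carlo. Here f(z) = z^2 is chosen so the exact answer (2*mu) is known; this is a toy function, not the paper's VAE objective.

```python
import numpy as np

rng = np.random.default_rng(3)

def grad_mu_estimate(mu, sigma, f_prime, n=200_000):
    """Pathwise (reparametrized) estimate of d/dmu E[f(mu + sigma*eps)]."""
    eps = rng.normal(size=n)
    z = mu + sigma * eps
    return f_prime(z).mean()     # d/dmu E[f(z)] = E[f'(z)]

# f(z) = z^2: E[f(z)] = mu^2 + sigma^2, so the exact gradient is 2*mu
g = grad_mu_estimate(mu=1.5, sigma=0.7, f_prime=lambda z: 2.0 * z)
```

The same rewrite, applied twice, gives the second-order (Hessian) quantities the method builds on.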
Empirical Analysis of the Hessian of Over-Parametrized Neural Networks
We study the properties of common loss surfaces through their Hessian matrix.
In particular, in the context of deep learning, we empirically show that the
spectrum of the Hessian is composed of two parts: (1) the bulk centered near
zero, (2) and outliers away from the bulk. We present numerical evidence and
mathematical justifications to the following conjectures laid out by Sagun et
al. (2016): Fixing data, increasing the number of parameters merely scales the
bulk of the spectrum; fixing the dimension and changing the data (for instance
adding more clusters or making the data less separable) only affects the
outliers. We believe that our observations have striking implications for
non-convex optimization in high dimensions. First, the flatness of such
landscapes (which can be measured by the singularity of the Hessian) implies
that classical notions of basins of attraction may be quite misleading, and
that the discussion of wide/narrow basins may need a new perspective around
over-parametrization and redundancy, which are able to create large connected
components at the bottom of the landscape. Second, the dependence of a small
number of large eigenvalues on the data distribution can be linked to the
spectrum of the covariance matrix of gradients of model outputs. With this in
mind, we may reevaluate the connections within the data-architecture-algorithm
framework of a model, hoping that it would shed light on the geometry of
high-dimensional and non-convex spaces in modern applications. In particular,
we present a case that links the two observations: small and large batch
gradient descent appear to converge to different basins of attraction but we
show that they are in fact connected through their flat region and so belong to
the same basin.
Comment: Minor update for the ICLR 2018 Workshop Track presentation.
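The bulk-plus-outliers picture can be reproduced in a toy setting where the Hessian is available in closed form: for linear least squares the Hessian is X^T X / n, and clustered data produces a bulk of small eigenvalues plus one large outlier per cluster direction. This is a stand-in illustration, not the paper's deep-network experiment.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 2000, 50
means = rng.normal(scale=6.0, size=(3, d))          # 3 cluster centers
X = means[rng.integers(0, 3, size=n)] + rng.normal(size=(n, d))

H = X.T @ X / n                                     # Hessian of 0.5*||Xw - y||^2 / n
eigs = np.sort(np.linalg.eigvalsh(H))[::-1]         # descending eigenvalues
n_outliers = int(np.sum(eigs > 10 * np.median(eigs)))
```

Adding a fourth cluster changes `n_outliers`, while adding parameters (columns of pure noise) only thickens the bulk, mirroring the conjectures quoted above.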
An overview of latent Markov models for longitudinal categorical data
We provide a comprehensive overview of latent Markov (LM) models for the
analysis of longitudinal categorical data. The main assumption behind these
models is that the response variables are conditionally independent given a
latent process which follows a first-order Markov chain. We first illustrate
the basic LM model in which the conditional distribution of each response
variable given the corresponding latent variable and the initial and transition
probabilities of the latent process are unconstrained. For this model we also
illustrate in detail maximum likelihood estimation through the
Expectation-Maximization algorithm, which may be efficiently implemented by
recursions known in the hidden Markov literature. We then illustrate several
constrained versions of the basic LM model, which make the model more
parsimonious and allow us to include and test hypotheses of interest. These
constraints may be put on the conditional distribution of the response
variables given the latent process (measurement model) or on the distribution
of the latent process (latent model). We also deal with extensions of the LM model
for the inclusion of individual covariates and to multilevel data. Covariates
may affect the measurement or the latent model; we discuss the implications of
these two different approaches according to the context of application.
Finally, we outline methods for obtaining standard errors for the parameter
estimates, for selecting the number of states and for path prediction. Models
and related inference are illustrated by the description of relevant
socio-economic applications available in the literature.
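The recursion referred to above is the standard forward algorithm: alpha_t(u) = p(y_1, ..., y_t, U_t = u) is updated in O(k^2) per time step, giving the likelihood in O(T k^2) instead of summing over all k^T latent paths. The parameter values below are illustrative.

```python
import numpy as np

def forward_loglik(pi, Pi, Phi, y):
    """Log-likelihood of a basic LM/HMM model via the forward recursion.
    pi: (k,) initial probabilities; Pi: (k, k) transition matrix;
    Phi: (k, c) emission probabilities p(y | state); y: (T,) categories."""
    alpha = pi * Phi[:, y[0]]                   # alpha_1(u)
    for t in range(1, len(y)):
        alpha = (alpha @ Pi) * Phi[:, y[t]]     # alpha_t(u), O(k^2) per step
    return np.log(alpha.sum())

pi = np.array([0.5, 0.5])
Pi = np.array([[0.9, 0.1],
               [0.1, 0.9]])
Phi = np.array([[0.8, 0.2],
                [0.2, 0.8]])
y = np.array([0, 0, 1, 1])
ll = forward_loglik(pi, Pi, Phi, y)
```

For longer series the recursion is usually rescaled at each step to avoid underflow; that detail is omitted here.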
Cost-Sensitive Approach to Batch Size Adaptation for Gradient Descent
In this paper, we propose a novel approach to automatically determine the
batch size in stochastic gradient descent methods. The choice of the batch size
induces a trade-off between the accuracy of the gradient estimate and the cost
in terms of samples of each update. We propose to determine the batch size by
optimizing the ratio between a lower bound to a linear or quadratic Taylor
approximation of the expected improvement and the number of samples used to
estimate the gradient. The performance of the proposed approach is empirically
compared with related methods on popular classification tasks.
The work was presented at the NIPS workshop on Optimizing the Optimizers.
Barcelona, Spain, 2016.
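A heavily simplified sketch of the cost-sensitive idea, under the assumption (mine, not the paper's exact bound) that the expected one-step improvement with an m-sample gradient estimate behaves like ||g||^2 - sigma^2/m: the batch size is chosen to maximize improvement per sample, so noisier gradients call for larger batches.

```python
import numpy as np

def best_batch_size(g_norm_sq, sigma_sq, m_max=4096):
    """Batch size maximizing (improvement lower bound) / (samples used)."""
    ms = np.arange(1, m_max + 1)
    improvement = np.maximum(g_norm_sq - sigma_sq / ms, 0.0)
    return int(ms[np.argmax(improvement / ms)])

m_noisy = best_batch_size(g_norm_sq=1.0, sigma_sq=10.0)  # noisy gradient -> larger batch
m_clean = best_batch_size(g_norm_sq=1.0, sigma_sq=0.5)   # clean gradient -> smaller batch
```

Under this toy bound the continuous optimum is m = 2*sigma_sq/g_norm_sq, so the first call lands at 20 and the second at 1.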