Finite mixture regression: A sparse variable selection by model selection for clustering
We consider a finite mixture of Gaussian regression model for high-
dimensional data, where the number of covariates may be much larger than the
sample size. We propose to estimate the unknown conditional mixture density by
a maximum likelihood estimator, restricted to relevant variables selected by an
ℓ1-penalized maximum likelihood estimator. We obtain an oracle inequality satisfied
by this estimator with a Jensen-Kullback-Leibler type loss. Our oracle
inequality is deduced from a general model selection theorem for maximum
likelihood estimators with a random model collection. We can derive the penalty
shape of the criterion, which depends on the complexity of the random model
collection.
Comment: 20 pages. arXiv admin note: text overlap with arXiv:1103.2021 by
other author
The iconography of the funerary stele of T. Exomnius Mansuetus, praefectus cohortis (pl. II A)
Uncertain Trees: Dealing with Uncertain Inputs in Regression Trees
Tree-based ensemble methods, such as Random Forests and Gradient Boosted Trees,
have been successfully used for regression in many applications and research
studies. Furthermore, these methods have been extended in order to deal with
uncertainty in the output variable, using for example a quantile loss in Random
Forests (Meinshausen, 2006). To the best of our knowledge, no extension has
been provided yet for dealing with uncertainties in the input variables, even
though such uncertainties are common in practical situations. We propose here
such an extension by showing how standard regression trees optimizing a
quadratic loss can be adapted and learned while taking into account the
uncertainties in the inputs. By doing so, one no longer assumes that an
observation lies in a single region of the regression tree, but rather that
it belongs to each region with a certain probability. Experiments conducted on
several data sets illustrate the good behavior of the proposed extension.
Comment: 9 pages
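The idea that an observation belongs to each region with a certain probability can be illustrated with a minimal sketch. It assumes a single split (a stump) on one input and Gaussian input noise; neither assumption comes from the abstract, and the function names are hypothetical. It shows only how a prediction might be weighted by region probabilities, not how the authors learn the tree.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def soft_stump_predict(mu, sigma, threshold, left_value, right_value):
    """Expected prediction of a one-split tree for an input ~ N(mu, sigma^2).

    Assumption (not from the abstract): the input uncertainty is Gaussian.
    The left region is x <= threshold, the right region is x > threshold.
    """
    if sigma <= 0:
        # No uncertainty: fall back to the usual hard assignment.
        return left_value if mu <= threshold else right_value
    # Probability that the uncertain input falls in the left region.
    p_left = normal_cdf((threshold - mu) / sigma)
    return p_left * left_value + (1.0 - p_left) * right_value
```

With the measured value exactly on the threshold, the two leaves are weighted equally; as `sigma` shrinks, the prediction approaches the hard-assignment value.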
Improved model identification for non-linear systems using a random subsampling and multifold modelling (RSMM) approach
In non-linear system identification, the available observed data are conventionally partitioned into two parts: the training data that are used for model identification and the test data that are used for model performance testing. This 'hold-out' or 'split-sample' data partitioning method is convenient, and the associated model identification procedure is generally easy to implement. The model obtained from such a once-partitioned single training dataset, however, may occasionally lack robustness and generalise poorly to future unseen data, because the performance of the identified model may depend heavily on how the data partition is made. To overcome this drawback of the hold-out data partitioning method, this study presents a new random subsampling and multifold modelling (RSMM) approach to produce less biased or preferably unbiased models. The basic idea and the associated procedure are as follows. First, generate K training datasets (and also K validation datasets) using a K-fold random subsampling method. Second, detect significant model terms and identify a common model structure that fits all K datasets using a newly proposed common model selection approach, called the multiple orthogonal search algorithm. Finally, estimate and refine the model parameters for the identified common-structured model using a multifold parameter estimation method. The proposed method can produce robust models with better generalisation performance.
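The first and last steps of the procedure can be sketched roughly as follows. This is an illustration under stated assumptions, not the RSMM method itself: the multiple orthogonal search algorithm is not reproduced, the model is a hypothetical one-term model y = a·x, and averaging the per-fold estimates is an assumed stand-in for the multifold parameter estimation described in the abstract. All function names and the `train_fraction` parameter are hypothetical.

```python
import random


def random_subsampling_splits(n_samples, k, train_fraction=0.7, seed=0):
    """Generate k random (train, validation) index splits.

    Each split is drawn independently, so validation sets may overlap
    across splits (unlike classical k-fold cross-validation).
    """
    rng = random.Random(seed)
    n_train = int(round(train_fraction * n_samples))
    splits = []
    for _ in range(k):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        splits.append((sorted(indices[:n_train]), sorted(indices[n_train:])))
    return splits


def fit_slope(xs, ys):
    """Least-squares estimate of a in the one-term model y = a * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def multifold_slope(xs, ys, k=5, seed=0):
    """Average the per-split estimates (an assumed combination rule)."""
    splits = random_subsampling_splits(len(xs), k, seed=seed)
    slopes = [
        fit_slope([xs[i] for i in train], [ys[i] for i in train])
        for train, _ in splits
    ]
    return sum(slopes) / len(slopes)
```

The validation indices returned with each split are unused here; in a fuller sketch they would score candidate model structures on each of the K datasets.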
Data mining: a tool for detecting cyclical disturbances in supply networks.
Disturbances in supply chains may be either exogenous or endogenous. The ability to automatically detect, diagnose, and distinguish between the causes of disturbances is of prime importance to decision makers in order to avoid uncertainty. The spectral principal component analysis (SPCA) technique has been utilized to distinguish between real and rogue disturbances in a steel supply network. The data set used was collected from four different business units in the network and consists of 43 variables, each described by 72 data points. The present paper uses the same data set to test an alternative approach to SPCA for detecting the disturbances. The new approach employs statistical data pre-processing, clustering, and classification learning techniques to analyse the supply network data. In particular, the incremental k-means clustering and the RULES-6 classification rule-learning algorithms, developed by the present authors' team, have been applied to identify important patterns in the data set. Results show that the proposed approach can automatically detect and characterize network-wide cyclical disturbances and generate hypotheses about their root cause.
Stable network inference in high-dimensional graphical model using single-linkage
Stability, akin to reproducibility, is crucial in statistical analysis. This
paper examines the stability of sparse network inference in high-dimensional
graphical models, where selected edges should remain consistent across
different samples. Our study focuses on the Graphical Lasso and its
decomposition into two steps, with the first step involving hierarchical
clustering using single linkage. We provide theoretical proof that single
linkage is stable, evidenced by controlled distances between two dendrograms
inferred from two samples. Practical experiments further illustrate the
stability of the Graphical Lasso's various steps, including dendrograms,
variable clusters, and final networks. Our results, validated through both
theoretical analysis and practical experiments using simulated and real
datasets, demonstrate that single linkage is more stable than other methods
when a modular structure is present.
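For readers unfamiliar with single linkage, the following minimal sketch computes the heights at which clusters merge for one-dimensional points. It relies on the known equivalence between single-linkage merges and the edges of a minimum spanning tree (Kruskal's algorithm with union-find). The function name is hypothetical, and the dendrogram distance used in the paper's stability result is not reproduced.

```python
from itertools import combinations


def single_linkage_merge_heights(points):
    """Return the distances at which clusters merge under single linkage.

    `points` is a list of 1-D coordinates (an illustrative simplification).
    Pairs are visited in increasing distance; a pair triggers a merge only
    if its endpoints are still in different clusters.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = sorted(
        (abs(points[i] - points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    heights = []
    for dist, i, j in pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            heights.append(dist)
    return heights
```

Running the function on two slightly different samples and comparing the merge heights gives an informal feel for the kind of stability the abstract describes.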
Emotion Recognition using Wireless Signals
This paper demonstrates a new technology that can infer a person's emotions from RF signals reflected off his body. EQ-Radio transmits an RF signal and analyzes its reflections off a person's body to recognize his emotional state (happy, sad, etc.). The key enabler underlying EQ-Radio is a new algorithm for extracting the individual heartbeats from the wireless signal at an accuracy comparable to on-body ECG monitors. The resulting beats are then used to compute emotion-dependent features which feed a machine-learning emotion classifier. We describe the design and implementation of EQ-Radio, and demonstrate through a user study that its emotion recognition accuracy is on par with state-of-the-art emotion recognition systems that require a person to be hooked to an ECG monitor. Keywords: Wireless Signals; Wireless Sensing; Emotion Recognition;
Affective Computing; Heart Rate Variability
National Science Foundation (U.S.); United States. Air Forc
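The step from extracted heartbeats to features can be illustrated with two standard heart rate variability measures, SDNN and RMSSD. The abstract does not list the features EQ-Radio actually uses, so these are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def inter_beat_intervals(beat_times):
    """Differences between consecutive beat timestamps (in seconds)."""
    return [later - earlier for earlier, later in zip(beat_times, beat_times[1:])]


def sdnn(ibis):
    """Sample standard deviation of the inter-beat intervals."""
    mean = sum(ibis) / len(ibis)
    return math.sqrt(sum((x - mean) ** 2 for x in ibis) / (len(ibis) - 1))


def rmssd(ibis):
    """Root mean square of successive inter-beat interval differences."""
    diffs = [later - earlier for earlier, later in zip(ibis, ibis[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

Both functions need at least two intervals, that is, at least three detected beats.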
