An MDL framework for sparse coding and dictionary learning
The power of sparse signal modeling with learned over-complete dictionaries
has been demonstrated in a variety of applications and fields, from signal
processing to statistical inference and machine learning. However, the
statistical properties of these models, such as under-fitting or over-fitting
given sets of data, are still not well characterized in the literature. As a
result, the success of sparse modeling depends on hand-tuning critical
parameters for each dataset and application. This work aims to address this by
providing a practical and objective characterization of sparse models by means
of the Minimum Description Length (MDL) principle -- a well established
information-theoretic approach to model selection in statistical inference. The
resulting framework derives a family of efficient sparse coding and dictionary
learning algorithms which, by virtue of the MDL principle, are completely
parameter free. Furthermore, the framework makes it possible to incorporate additional
prior information into existing models, such as Markovian dependencies, or to
define completely new problem formulations, including in the matrix analysis
area, in a natural way. These virtues will be demonstrated with parameter-free
algorithms for the classic image denoising and classification problems, and for
low-rank matrix recovery in video applications.
Robust exponential smoothing of multivariate time series.
Multivariate time series may contain outliers of different types. In the presence of such outliers, standard multivariate time series techniques become unreliable. A robust version of multivariate exponential smoothing is proposed. The method is affine equivariant, and involves the selection of a smoothing parameter matrix by minimizing a robust loss function. It is shown that the robust method results in much better forecasts than the classic approach in the presence of outliers, and performs similarly when the data contain no outliers. Moreover, the robust procedure yields an estimator of the smoothing parameter that is less subject to downward bias. As a byproduct, a cleaned version of the time series is obtained, as is illustrated by means of a real data example.
Keywords: Data cleaning; Exponential smoothing; Forecasting; Multivariate time series; Robustness
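To make the role of the smoothing parameter matrix concrete, here is a minimal sketch of the classic (non-robust) multivariate exponential smoothing recursion, s_t = Λ y_t + (I − Λ) s_{t−1}. It is not the robust method of the abstract: the robust loss and the cleaning step are not shown, and the function names, the start value s_1 = y_1, and the pure-Python matrix helper are assumptions made for illustration.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector (list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def exponential_smoothing(series, lam):
    """Classic recursion s_t = lam @ y_t + (I - lam) @ s_{t-1}.

    series: list of observation vectors y_1, ..., y_T
    lam:    smoothing parameter matrix (list of rows)
    The start value s_1 = y_1 is an assumption of this sketch.
    """
    dim = len(series[0])
    identity_minus_lam = [
        [(1.0 if i == j else 0.0) - lam[i][j] for j in range(dim)]
        for i in range(dim)
    ]
    smoothed = [list(series[0])]
    for y in series[1:]:
        new_part = mat_vec(lam, y)
        old_part = mat_vec(identity_minus_lam, smoothed[-1])
        smoothed.append([a + b for a, b in zip(new_part, old_part)])
    return smoothed


# Hypothetical two-dimensional series with a diagonal smoothing matrix.
example_series = [[0.0, 0.0], [2.0, 4.0], [2.0, 4.0]]
example_lam = [[0.5, 0.0], [0.0, 0.5]]
smoothed_series = exponential_smoothing(example_series, example_lam)
```

In this recursion the last smoothed value serves as the one-step-ahead forecast, so a single outlying observation feeds directly into later forecasts; that sensitivity is what a robust variant is meant to limit.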
Improved model identification for nonlinear systems using a random subsampling and multifold modelling (RSMM) approach
In nonlinear system identification, the available observed data are conventionally partitioned into two parts: the training data that are used for model identification and the test data that are used for model performance testing. This sort of “hold-out” or “split-sample” data partitioning
method is convenient and the associated model identification procedure is in general easy to implement. The resultant model obtained from such a once-partitioned single training dataset, however, may occasionally lack robustness and generalisation to represent future unseen data, because the performance of the identified model may be highly dependent on how the data partition is made. To
overcome the drawback of the hold-out data partitioning method, this study presents a new random subsampling and multifold modelling (RSMM) approach to produce less biased or preferably unbiased models. The basic idea and the associated procedure are as follows. Firstly, generate K training datasets (and also K validation datasets), using a K-fold random subsampling method. Secondly, detect
significant model terms and identify a common model structure that fits all the K datasets using a newly
proposed common model selection approach, called the multiple orthogonal search algorithm. Finally,
estimate and refine the model parameters for the identified common-structured model using a multifold parameter estimation method. The proposed method can produce robust models with better generalisation performance.
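The first step of the procedure, generating K training and validation datasets by random subsampling, can be sketched as follows. This is only an illustration under stated assumptions: the function name, the 70/30 split fraction, and the fixed seed are hypothetical, and the multiple orthogonal search and multifold parameter estimation steps are not shown.

```python
import random


def random_subsampling_splits(n_samples, k, train_fraction=0.7, seed=0):
    """Draw k independent random train/validation index partitions.

    Each partition reshuffles all sample indices, so the k training
    sets overlap; this differs from classic k-fold cross-validation,
    where the validation folds are disjoint.
    """
    rng = random.Random(seed)
    n_train = int(round(train_fraction * n_samples))
    splits = []
    for _ in range(k):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        train_idx = sorted(indices[:n_train])
        val_idx = sorted(indices[n_train:])
        splits.append((train_idx, val_idx))
    return splits


# Hypothetical usage: 10 samples, K = 3 partitions.
splits = random_subsampling_splits(n_samples=10, k=3)
```

A model structure would then be selected so that it fits all K training sets at once, rather than the single partition used by the hold-out method.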
Two Procedures for Robust Monitoring of Probability Distributions of Economic Data Streams induced by Depth Functions
Data streams (streaming data) consist of transiently observed, evolving in
time, multidimensional data sequences that challenge our computational and/or
inferential capabilities. In this paper we propose user-friendly approaches for
robust monitoring of selected properties of the unconditional and conditional
distributions of the stream based on depth functions. Our proposals are robust
to a small fraction of outliers and/or inliers but sensitive to a regime change
of the stream at the same time. Their implementations are available in our free
R package DepthProc.
Comment: Operations Research and Decisions, vol. 25, No. 1, 201
Emotion Recognition from Acted and Spontaneous Speech
This doctoral thesis deals with emotion recognition from speech signals. The thesis is divided into two main parts. The first part describes the proposed approaches for emotion recognition using two different multilingual databases of acted emotional speech. The main contributions of this part are a detailed analysis of a large set of acoustic features, new classification schemes for vocal emotion recognition such as “emotion coupling”, and a new method for mapping discrete emotions into a two-dimensional space.
The second part of the thesis is devoted to emotion recognition using multilingual databases of spontaneous emotional speech, based on telephone recordings obtained from real call centers. The knowledge gained from experiments with emotion recognition from acted speech was used to design a new approach for classifying seven emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The thesis also examines the influence of the speaker's emotional state on gender recognition performance and proposes a system for the automatic identification of successful phone calls in call centers by means of dialogue features.
Estimation of Single-Index Models Based on Boosting Techniques
In single-index models the link or response function is not considered fixed; the data determine the form of the unknown link function. To obtain a flexible form, we specify the link function as an expansion in basis functions and propose to estimate the parameters as well as the link function by weak learners within a boosting framework. It is shown that the method is a strong competitor to existing methods. The method is investigated in simulation studies and applied to real data.