On the smoothness of nonlinear system identification
We shed new light on the \textit{smoothness} of optimization problems arising
in prediction error parameter estimation of linear and nonlinear systems. We
show that, for regions of the parameter space where the model is not
contractive, the Lipschitz constant and $\beta$-smoothness of the objective
function might blow up exponentially with the simulation length, making it hard
to numerically find minima within those regions or even to escape from them.
In addition to providing theoretical understanding of this problem, this paper
also proposes the use of multiple shooting as a viable solution. The proposed
method minimizes the error between a prediction model and the observed values.
Rather than running the prediction model over the entire dataset, multiple
shooting splits the data into smaller subsets and runs the prediction model
over each subset, making the simulation length a design parameter and making it
possible to solve problems that would be infeasible using a standard approach.
The equivalence to the original problem is obtained by including constraints in
the optimization. The new method is illustrated by estimating the parameters of
nonlinear systems with chaotic or unstable behavior, as well as neural
networks. We also present a comparative analysis of the proposed method with
multi-step-ahead prediction error minimization.
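To make the multiple-shooting idea concrete, here is a minimal sketch on a toy
scalar model; the model x[k+1] = a*tanh(x[k]) + b*u[k], the segment length M,
and all names are assumptions for illustration, not the paper's implementation.
The decision variables are the two parameters plus one initial state per
segment, and equality constraints enforce continuity between segments.

```python
# Multiple-shooting sketch for prediction-error estimation (toy example).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Data from an assumed "true" system x[k+1] = a*tanh(x[k]) + b*u[k].
a_true, b_true = 1.5, 0.5
N = 200
u = rng.standard_normal(N)
x = np.zeros(N + 1)
for k in range(N):
    x[k + 1] = a_true * np.tanh(x[k]) + b_true * u[k]
y = x[1:] + 0.01 * rng.standard_normal(N)   # noisy measurements of x[k+1]

M = 10                  # shooting-segment length: a design parameter
n_seg = N // M          # number of segments

def simulate(theta, x0, u_seg):
    """Run the prediction model over one segment from initial state x0."""
    a, b = theta
    out = np.empty(len(u_seg))
    xk = x0
    for k, uk in enumerate(u_seg):
        xk = a * np.tanh(xk) + b * uk
        out[k] = xk
    return out

def unpack(z):
    # z = [a, b, initial state of segment 0, ..., initial state of segment n_seg-1]
    return z[:2], z[2:]

def cost(z):
    theta, x0s = unpack(z)
    err = 0.0
    for s in range(n_seg):
        sl = slice(s * M, (s + 1) * M)
        err += np.sum((y[sl] - simulate(theta, x0s[s], u[sl])) ** 2)
    return err

def continuity(z):
    # Equality constraints restoring equivalence with single shooting:
    # each segment must end where the next one starts.
    theta, x0s = unpack(z)
    return np.array([
        simulate(theta, x0s[s], u[s * M:(s + 1) * M])[-1] - x0s[s + 1]
        for s in range(n_seg - 1)
    ])

z0 = np.concatenate([[0.5, 0.5], np.zeros(n_seg)])
sol = minimize(cost, z0, method="SLSQP",
               constraints={"type": "eq", "fun": continuity})
print("estimated (a, b):", sol.x[:2])
```

Shortening M keeps each simulated segment well conditioned even when the model
is not contractive, at the price of extra decision variables and constraints.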
On Merging Feature Engineering and Deep Learning for Diagnosis, Risk-Prediction and Age Estimation Based on the 12-Lead ECG
Objective: Machine learning techniques have been used extensively for 12-lead
electrocardiogram (ECG) analysis. For physiological time series, the
superiority of deep learning (DL) over feature engineering (FE) approaches
based on domain knowledge is still an open question. Moreover, it remains
unclear whether
combining DL with FE may improve performance. Methods: We considered three
tasks intended to address these research gaps: cardiac arrhythmia diagnosis
(multiclass-multilabel classification), atrial fibrillation risk prediction
(binary classification), and age estimation (regression). We used an overall
dataset of 2.3M 12-lead ECG recordings to train the following models for each
task: i) a random forest taking the FE as input, as a classical machine
learning approach; ii) an end-to-end DL model; and iii) a merged FE+DL model.
Results: For the two classification tasks, FE yielded results comparable to DL
while requiring significantly less data, but it was outperformed by DL for the
regression task. For all tasks, merging FE with DL did not improve performance
over DL alone. Conclusion: We found that for traditional 12-lead ECG-based
diagnosis tasks, DL did not yield a meaningful improvement over FE, while it
significantly improved performance on the nontraditional regression task. We
also found that combining FE with DL did not improve over DL alone, which
suggests that the engineered features were redundant with the features learned
by DL. Significance: Our findings provide important recommendations on which
machine learning strategy and data regime to choose, with respect to the task
at hand, for the development of new machine learning models based on the
12-lead ECG.
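The abstract does not specify the merged architecture, so the following is only
a hedged sketch of one common way to combine FE with DL: a small 1-D CNN embeds
the raw 12-lead waveform, the embedding is concatenated with the
hand-engineered features, and a shared head produces the outputs. Layer sizes,
the number of engineered features, and all names are assumptions.

```python
# Illustrative FE+DL merge for 12-lead ECG (PyTorch); not the paper's model.
import torch
import torch.nn as nn

class MergedEcgModel(nn.Module):
    def __init__(self, n_leads=12, n_fe=50, n_classes=10):
        super().__init__()
        # DL branch: raw waveform -> fixed-size embedding.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_leads, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # (batch, 32, 1)
            nn.Flatten(),              # (batch, 32)
        )
        # Head over the concatenated [DL embedding, engineered features].
        self.head = nn.Linear(32 + n_fe, n_classes)

    def forward(self, ecg, fe):
        # ecg: (batch, n_leads, samples); fe: (batch, n_fe)
        z = self.cnn(ecg)
        return self.head(torch.cat([z, fe], dim=1))

model = MergedEcgModel()
logits = model(torch.randn(4, 12, 5000), torch.randn(4, 50))
print(logits.shape)  # torch.Size([4, 10])
```

For the multiclass-multilabel diagnosis task the logits would typically be
paired with BCEWithLogitsLoss; for the regression task the head would output a
single value trained with an L1 or L2 loss.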
Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness
The exploding and vanishing gradient problem has been the major conceptual
principle behind most architecture and training improvements in recurrent
neural networks (RNNs) during the last decade. In this paper, we argue that
this principle, while powerful, might need some refinement to explain recent
developments. We refine the concept of exploding gradients by reformulating the
problem in terms of the cost function smoothness, which gives insight into
higher-order derivatives and the existence of regions with many close local
minima. We also clarify the distinction between vanishing gradients and the
need for the RNN to learn attractors to fully use its expressive power. Through
the lens of these refinements, we shed new light on recent developments in the
RNN field, namely stable RNNs and unitary (or orthogonal) RNNs.
Comment: To appear in the Proceedings of the 23rd International Conference on
Artificial Intelligence and Statistics (AISTATS), 2020. PMLR: Volume 108.
This paper was previously titled "The trade-off between long-term memory and
smoothness for recurrent networks". The current version subsumes all previous
versions.
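The smoothness reformulation can be illustrated numerically. The toy script
below (an assumption-laden sketch, not the paper's experiments) tracks the
gradient of a squared-error loss with respect to the recurrent weight of a
scalar linear recurrence h[t+1] = w*h[t] + u[t]: for a non-contractive weight
(|w| > 1) the gradient magnitude grows roughly exponentially with the sequence
length T, while for a contractive weight it stays bounded.

```python
import torch

def grad_magnitude(w_value, T):
    """Gradient of a toy squared-error loss w.r.t. the recurrent weight w."""
    torch.manual_seed(0)
    w = torch.tensor(w_value, requires_grad=True)
    u = 0.1 * torch.randn(T)
    h = torch.tensor(0.0)
    for t in range(T):
        h = w * h + u[t]       # scalar linear recurrence
    (h ** 2).backward()        # loss on the final state
    return abs(w.grad.item())

for T in (10, 50, 100, 200):
    print(f"T={T:4d}  contractive (w=0.9): {grad_magnitude(0.9, T):.3e}"
          f"  non-contractive (w=1.1): {grad_magnitude(1.1, T):.3e}")
```

In the non-contractive regime, higher-order derivatives grow in the same way,
which is the degradation of cost-function smoothness the abstract refers to.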
- …