Efficient Optimization of Echo State Networks for Time Series Datasets
Echo State Networks (ESNs) are recurrent neural networks that only train
their output layer, thereby obviating the need to backpropagate gradients
through time, which leads to significant computational gains. Nevertheless, a
common issue with ESNs is determining their hyperparameters, which are crucial for
instantiating a well-performing reservoir but are often set manually or using
heuristics. In this work we optimize the ESN hyperparameters using Bayesian
optimization which, given a limited budget of function evaluations, outperforms
a grid search strategy. In the context of large volumes of time series data,
such as light curves in the field of astronomy, we can further reduce the
optimization cost of ESNs. In particular, we wish to avoid tuning
hyperparameters per individual time series as this is costly; instead, we want
to find ESNs with hyperparameters that perform well not just on individual time
series but rather on groups of similar time series without sacrificing
predictive performance significantly. This naturally leads to a notion of
clusters, where each cluster is represented by an ESN tuned to model a group of
time series of similar temporal behavior. We demonstrate this approach both on
synthetic datasets and real-world light curves from the MACHO survey. We show
that our approach results in a significant reduction in the number of ESN
models required to model a whole dataset, while retaining predictive
performance for the series in each cluster.
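The computational point above, that only the linear readout of an ESN is trained, can be sketched as follows. The reservoir size, spectral radius, and ridge strength used here are illustrative values only; they are precisely the kind of hyperparameters the Bayesian optimization described in the abstract would tune.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hyperparameters (the quantities an ESN tuner would optimize)
n_reservoir = 200        # number of reservoir units
spectral_radius = 0.9    # scaling of the recurrent weight matrix
ridge = 1e-6             # Tikhonov regularization for the readout
washout = 50             # initial states discarded as transient

# Fixed random input and reservoir weights: these are never trained
W_in = rng.uniform(-0.5, 0.5, (n_reservoir, 1))
W = rng.normal(size=(n_reservoir, n_reservoir))
W *= spectral_radius / max(abs(np.linalg.eigvals(W)))

def run_reservoir(u):
    """Collect the reservoir state sequence for a 1-D input series u."""
    x = np.zeros(n_reservoir)
    states = []
    for u_t in u:
        x = np.tanh(W_in[:, 0] * u_t + W @ x)
        states.append(x.copy())
    return np.array(states)

# One-step-ahead prediction on a toy sine series
series = np.sin(np.linspace(0, 20, 400))
X = run_reservoir(series[:-1])[washout:]   # states at times t
y = series[1:][washout:]                   # targets at t + 1

# Only this ridge-regression readout is trained
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_reservoir), X.T @ y)
pred = X @ W_out
```

Because training reduces to one linear solve, re-fitting the readout per cluster of similar series is cheap; only the hyperparameter search is costly, which is what motivates sharing one tuned ESN across a cluster.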
Noise resistant generalized parametric validity index of clustering for gene expression data
This article has been made available through the Brunel Open Access Publishing Fund.

Validity indices have been investigated for decades. However, since there is no study of noise-resistance performance of these indices in the literature, there is no guideline for determining the best clustering in noisy data sets, especially microarray data sets. In this paper, we propose a generalized parametric validity (GPV) index which employs two tunable parameters α and β to control the proportions of objects being considered to calculate the dissimilarities. The greatest advantage of the proposed GPV index is its noise-resistance ability, which results from the flexibility of tuning the parameters. Several rules are set to guide the selection of parameter values. To illustrate the noise-resistance performance of the proposed index, we evaluate the GPV index for assessing five clustering algorithms in two gene expression data simulation models with different noise levels and compare the ability of determining the number of clusters with eight existing indices. We also test the GPV index on three groups of real gene expression data sets. The experimental results suggest that the proposed GPV index has superior noise-resistance ability and provides fairly accurate judgements.
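The abstract does not give the GPV formula, so the following is only an illustrative analogue of a parametric, trimming-based validity index, not the authors' definition: α and β control what fraction of the closest within-cluster and between-cluster dissimilarities each object contributes, so that the most distant (likely noisy) objects are ignored. The function name and the silhouette-style ratio are assumptions made for the sketch.

```python
import numpy as np

def trimmed_validity(X, labels, alpha=0.8, beta=0.8):
    """Illustrative trimmed validity index (NOT the published GPV formula).
    For each point, average only the closest alpha-fraction of within-cluster
    distances and the closest beta-fraction of distances to the nearest other
    cluster, then take a silhouette-style ratio. Larger is better."""
    scores = []
    for i, xi in enumerate(X):
        same = X[labels == labels[i]]
        # Within-cluster dissimilarity: drop the farthest (1 - alpha) share,
        # which is where noisy objects tend to sit
        d_in = np.sort(np.linalg.norm(same - xi, axis=1))[1:]  # skip self
        k_in = max(1, int(np.ceil(alpha * len(d_in))))
        a = d_in[:k_in].mean()
        # Nearest other cluster, likewise trimmed by beta
        b = min(
            np.sort(np.linalg.norm(X[labels == l] - xi, axis=1))[
                : max(1, int(np.ceil(beta * np.sum(labels == l))))
            ].mean()
            for l in set(labels) - {labels[i]}
        )
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

Setting α = β = 1 recovers an untrimmed silhouette-like score; lowering them is what buys noise resistance, mirroring the role the abstract attributes to the GPV parameters.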
A robust approach to model-based classification based on trimming and constraints
In a standard classification framework a set of trustworthy learning data are
employed to build a decision rule, with the final aim of classifying unlabelled
units belonging to the test set. Therefore, unreliable labelled observations,
namely outliers and data with incorrect labels, can strongly undermine the
classifier performance, especially if the training size is small. The present
work introduces a robust modification to the Model-Based Classification
framework, employing impartial trimming and constraints on the ratio between
the maximum and the minimum eigenvalue of the group scatter matrices. The
proposed method effectively handles the presence of noise in both response and
explanatory variables, providing reliable classification even when dealing with
contaminated datasets. A robust information criterion is proposed for model
selection. Experiments on real and simulated data, artificially adulterated,
are provided to underline the benefits of the proposed method.
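The constraint on the ratio between the largest and smallest eigenvalues of the group scatter matrices can be sketched as below. This is a simplified projection that clips each matrix's eigenvalues against its largest one; the constrained estimators in the literature instead solve an optimization over the truncation level across all groups, so this is an illustration of the idea rather than the paper's procedure.

```python
import numpy as np

def constrain_eigen_ratio(S, c=10.0):
    """Return a symmetric PSD matrix close to S whose eigenvalue ratio
    max(eig)/min(eig) is at most c. Simplified illustration: eigenvalues
    are clipped into [max_eig / c, max_eig], keeping the eigenvectors."""
    vals, vecs = np.linalg.eigh(S)
    hi = vals.max()
    lo = hi / c              # smallest eigenvalue allowed by the ratio bound
    vals = np.clip(vals, lo, hi)
    return vecs @ np.diag(vals) @ vecs.T
```

Bounding the eigenvalue ratio keeps every group covariance well conditioned, which prevents the degenerate, near-singular scatter matrices that spurious solutions (e.g. a component collapsing onto a few outliers) rely on; together with impartial trimming, this is what makes the classifier robust.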