Autoregressive time series prediction by means of fuzzy inference systems using nonparametric residual variance estimation
We propose an automatic methodology framework for short- and long-term prediction of time series by means of fuzzy inference systems. In this methodology, fuzzy techniques and statistical techniques for nonparametric residual variance estimation are combined to build autoregressive predictive models implemented as fuzzy inference systems. Nonparametric residual variance estimation plays a key role in driving the identification and learning procedures. Concrete criteria and procedures within the proposed framework are applied to a number of time series prediction problems. The learn-from-examples method introduced by Wang and Mendel (W&M) is used for identification, and the Levenberg–Marquardt (L–M) optimization method is then applied for tuning. The W&M method produces compact and potentially accurate inference systems when applied after a proper variable selection stage. Among a set of alternatives, the L–M method yields the best compromise between accuracy and interpretability of results. Delta test based residual variance estimates are used to select the best subset of inputs to the fuzzy inference systems as well as the number of linguistic labels for the inputs. On a diverse set of time series prediction benchmarks, the proposed methodology is compared against least-squares support vector machines (LS-SVM), the optimally pruned extreme learning machine (OP-ELM), and k-NN based autoregressors. The advantages of the proposed methodology are shown in terms of linguistic interpretability, generalization capability and computational cost. Furthermore, the fuzzy models are shown to be consistently more accurate for prediction on time series coming from real-world applications.
Funding: Ministerio de Ciencia e Innovación TEC2008-04920; Junta de Andalucía P08-TIC-03674, IAC07-I-0205:33080, IAC08-II-3347:5626
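The Delta test mentioned above has a very compact form: it is half the mean squared difference between each output y_i and the output of x_i's nearest neighbour. A minimal sketch, assuming a brute-force nearest-neighbour search; the exhaustive subset scan is our illustrative stand-in for the input-selection step, and all function names are ours, not from the paper:

```python
# Minimal Delta test sketch: brute-force first nearest neighbour.
# The exhaustive subset scan below is our own illustration of Delta-test
# based input selection; it is not the paper's actual procedure.
from itertools import combinations
import numpy as np

def delta_test(X, y):
    """Estimate the residual variance Var(eps) in y = f(X) + eps."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbour
    nn = d2.argmin(axis=1)           # index of each point's nearest neighbour
    return 0.5 * np.mean((y[nn] - y) ** 2)

def select_inputs(X, y):
    """Return the input subset with the lowest Delta test estimate."""
    d = X.shape[1]
    _, subset = min(
        (delta_test(X[:, list(s)], y), s)
        for k in range(1, d + 1)
        for s in combinations(range(d), k)
    )
    return subset
```

A lower Delta test value for a subset suggests that subset explains more of the output, which is what drives both input selection and the choice of the number of linguistic labels.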
Fast missing value imputation using ensemble of SOMs
This report presents a methodology for missing value imputation based on an ensemble of Self-Organizing Maps (SOMs), weighted using the Nonnegative Least Squares (NNLS) algorithm. Instead of the lengthy validation procedure required when using a single SOM, the ensemble proceeds straight to final model building. The methodology therefore has very low computational cost while retaining accuracy. Its performance is compared to other state-of-the-art methodologies using two real-world databases from different fields.
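The NNLS weighting step can be sketched independently of the SOMs themselves: each base model produces its own imputation of the data matrix, and nonnegative weights are fit on the entries that are actually observed. This is our own minimal reading of such a scheme (using `scipy.optimize.nnls`; names and the toy setup are ours, not the authors' code):

```python
# Sketch of NNLS-weighted ensemble imputation. "imputations" is a list of
# complete (n, d) matrices, one per base model; the weights are fit only on
# the observed entries. All names are ours; the base models here could be
# SOMs or anything else that fills in the matrix.
import numpy as np
from scipy.optimize import nnls

def nnls_ensemble_weights(imputations, X, observed_mask):
    """Nonnegative weights minimising ||sum_m w_m F_m - X|| on observed entries."""
    A = np.column_stack([F[observed_mask] for F in imputations])
    b = X[observed_mask]
    w, _ = nnls(A, b)
    return w

def ensemble_impute(imputations, X, observed_mask):
    """Keep observed entries of X; fill the rest with the weighted blend."""
    w = nnls_ensemble_weights(imputations, X, observed_mask)
    blend = sum(wi * F for wi, F in zip(w, imputations))
    out = X.copy()
    out[~observed_mask] = blend[~observed_mask]
    return out
```

Because the weights are fit in closed form on observed entries, no per-model validation loop is needed, which is the source of the low computational cost claimed above.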
RMSE-ELM: Recursive Model based Selective Ensemble of Extreme Learning Machines for Robustness Improvement
The extreme learning machine (ELM), an emerging branch of shallow networks, has shown excellent generalization and fast learning speed. However, for blended data the robustness of ELM is weak, because the weights and biases of its hidden nodes are set randomly; moreover, noisy data exert a negative effect. To solve this problem, a new framework called RMSE-ELM is proposed in this paper. It is a two-layer recursive model. In the first layer, the framework trains many ELMs in different groups concurrently, then employs selective ensemble to pick out an optimal set of ELMs in each group; these are merged into a large group of ELMs called the candidate pool. In the second layer, selective ensemble is recursively applied to the candidate pool to acquire the final ensemble. In the experiments, we use blended UCI datasets to confirm the robustness of the new approach in two key respects (mean square error and standard deviation). The space complexity of the method increases to some degree, but the results show that RMSE-ELM significantly improves robustness at only slightly higher computational cost compared with representative methods (ELM, OP-ELM, GASEN-ELM, GASEN-BP and E-GASEN). It is thus a potential framework for solving the robustness issue of ELM on high-dimensional blended data.
Comment: Accepted for publication in Mathematical Problems in Engineering, 09/22/201
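The building blocks of one layer of such a scheme can be sketched with a toy ELM and a greedy forward selection of ensemble members by validation MSE. This is a simple stand-in of our own; the paper's concrete selection procedure may differ, and all names below are ours:

```python
# Toy ELM plus greedy selective ensemble (our illustrative stand-in for
# one layer of the scheme; not the paper's exact algorithm).
import numpy as np

def train_elm(X, y, n_hidden, rng):
    """Random tanh hidden layer; output weights by least squares (pinv)."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.pinv(H) @ y
    return lambda Xn: np.tanh(Xn @ W + b) @ beta

def selective_ensemble(models, Xv, yv):
    """Greedily add the model that most lowers validation MSE of the
    ensemble mean; stop when no addition helps."""
    preds = [m(Xv) for m in models]
    chosen, mean_pred = [], np.zeros_like(yv, dtype=float)
    best = np.mean((mean_pred - yv) ** 2)
    improved = True
    while improved:
        improved = False
        for i in range(len(preds)):
            if i in chosen:
                continue
            cand = (mean_pred * len(chosen) + preds[i]) / (len(chosen) + 1)
            err = np.mean((cand - yv) ** 2)
            if err < best:
                best, best_i, best_cand, improved = err, i, cand, True
        if improved:
            chosen.append(best_i)
            mean_pred = best_cand
    return chosen, lambda Xn: np.mean([models[i](Xn) for i in chosen], axis=0)
```

Applying this selection within each group, pooling the survivors, and selecting again over the pool gives the two-layer recursion described above.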
Optimally pruned k-nearest neighbors: OP-KNN application to financial modeling
The paper proposes a methodology called OP-KNN, which builds a one-hidden-layer feedforward neural network using nearest-neighbor neurons, with extremely small computational time. The main strategy is to select the most relevant variables beforehand, then to build the model using KNN kernels. Multiresponse Sparse Regression (MRSR) is used as the second step in order to rank each kth nearest neighbor, and finally, as a third step, Leave-One-Out estimation is used to select the number of neighbors and to estimate the generalization performance. This new methodology is tested on a toy example and applied to financial modeling.
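The last stage, choosing the number of neighbors by Leave-One-Out error, is cheap because the LOO prediction of a k-NN average simply excludes each point from its own neighbor list. A minimal sketch (MRSR ranking of the neighbor terms omitted; all names are ours):

```python
# LOO selection of k for a k-NN averaging model (sketch of the final
# OP-KNN stage only; the MRSR ranking step is not shown).
import numpy as np

def loo_knn(X, y, k_max=10):
    X, y = np.asarray(X, float), np.asarray(y, float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)       # a point never counts itself: this IS the LOO
    order = np.argsort(d2, axis=1)     # neighbours sorted by distance
    errs = []
    for k in range(1, k_max + 1):
        pred = y[order[:, :k]].mean(axis=1)
        errs.append(np.mean((pred - y) ** 2))
    k_best = int(np.argmin(errs)) + 1
    return k_best, errs
```

The returned LOO error for k_best doubles as the generalization estimate mentioned in the abstract.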
Using multiple re-embeddings for quantitative steganalysis and image reliability estimation
The quantitative steganalysis problem aims at estimating the amount of payload embedded inside a document. In this paper, JPEG images are considered; by means of a re-embedding based methodology, it is possible to estimate the number of original embedding changes performed on the image by a stego source and to slightly improve on the estimates of classical quantitative steganalysis methods. The major advance of this methodology is that it also yields a confidence interval on the estimated payload. This confidence interval in turn makes it possible to evaluate the difficulty of an image, in terms of steganalysis, by estimating the reliability of the output. The regression technique comes from the OP-ELM, and the reliability is estimated using a linear approximation. The methodology is applied with a publicly available stego algorithm, regression model and database of images; it is generic and can be used for any quantitative steganalysis problem of this class.
Residual variance estimation using a nearest neighbor statistic
In this paper we consider the problem of estimating E[(Y − E[Y | X])²] based on a finite sample of independent, but not necessarily identically distributed, random variables (X_i, Y_i), i = 1, …, M. We analyze the theoretical properties of a recently developed estimator. It is shown that the estimator has many theoretically interesting properties, while its practical implementation is simple.
Mutual Information Based Initialization of Forward-Backward Search for Feature Selection in Regression Problems
Pure feature selection, where variables are chosen or not to be in the training data set, remains an unsolved problem, especially when the dimensionality is high. Recently, the Forward-Backward Search algorithm, using the Delta Test to evaluate candidate solutions, was presented and showed good performance. However, due to the locality of the search procedure, the initial starting point of the search becomes crucial for obtaining good results. This paper presents new heuristics to find a more adequate starting point that can lead to a better solution. The heuristic is based on sorting the variables using the Mutual Information criterion and then performing parallel local searches. These local searches provide the initial starting point for the actual parallel Forward-Backward algorithm.
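The local Forward-Backward search itself, scored by the Delta test and launched from a given starting subset, can be sketched as follows (the MI-based ordering that produces the starting point is omitted; all names are ours):

```python
# Forward-Backward variable search scored by the Delta test (sketch; the
# MI-based initialization heuristic from the paper is not shown).
import numpy as np

def delta_test(X, y):
    """Nearest-neighbour residual variance estimate (Delta test)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    nn = d2.argmin(axis=1)
    return 0.5 * np.mean((y[nn] - y) ** 2)

def forward_backward(X, y, start):
    """From `start`, repeatedly toggle the single variable (add if absent,
    drop if present) that most lowers the Delta test, until no move helps."""
    selected = set(start)
    best = delta_test(X[:, sorted(selected)], y)
    improved = True
    while improved:
        improved = False
        for j in range(X.shape[1]):
            cand = selected ^ {j}       # toggle variable j
            if not cand:
                continue
            score = delta_test(X[:, sorted(cand)], y)
            if score < best:
                best, selected, improved = score, cand, True
    return sorted(selected), best
```

Because each accepted move only toggles one variable, the search is local, which is exactly why the starting point chosen by the MI heuristic matters so much.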
A boundary corrected expansion of the moments of nearest neighbor distributions
In this paper, the moments of nearest neighbor distance distributions are examined. While the asymptotic form of such moments is well known, the boundary effect has thus far resisted rigorous analysis. Our goal is to develop a new technique that allows a closed-form high-order expansion in which the boundaries are taken into account up to first order. The resulting theoretical predictions are tested via simulations and found to be much more accurate than the first-order approximation obtained by neglecting the boundaries.
While our results are of theoretical interest, they also have important applications in statistics and physics; a concrete example is the estimation of Rényi entropies of probability distributions. Moreover, the algebraic technique developed here may turn out to be useful in other, related problems, including estimation of the Shannon differential entropy.
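A quick Monte-Carlo check of the kind of comparison described above (our own toy setup, d = 2): the boundary-free asymptotic mean nearest-neighbor distance for n uniform points in the unit square is Γ(1 + 1/d) / (n·V_d)^(1/d) with V_2 = π, which simplifies to 1/(2√n), and the empirical mean typically comes out slightly larger because of the boundary.

```python
# Monte-Carlo comparison of the empirical mean nearest-neighbour distance
# with the boundary-free asymptotic prediction (our toy illustration of the
# kind of simulation the paper describes, not its actual experiment).
import numpy as np
from math import gamma, pi

rng = np.random.default_rng(1)
n = 1000
pts = rng.uniform(size=(n, 2))
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
np.fill_diagonal(d2, np.inf)
nn_dist = np.sqrt(d2.min(axis=1))

empirical = nn_dist.mean()
# Boundary-free prediction: Gamma(1 + 1/d) / (n * V_d)^(1/d), d = 2, V_2 = pi.
asymptotic = gamma(1 + 1 / 2) / (n * pi) ** 0.5
```

The gap between the two numbers is the boundary effect that the paper's first-order corrected expansion is designed to capture.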