460,986 research outputs found

    Non-linear dimensionality reduction techniques for classification

    This thesis project concerns dimensionality reduction through manifold learning, with a focus on non-linear techniques. Dimension Reduction (DR) is the process of transforming a high-dimensional dataset with d features (dimensions) into one with a lower number of features p (p ≪ d) that preserves the information contained in the original higher-dimensional space. More generally, the concept of manifold learning is introduced: a generalized approach that encompasses algorithms for dimensionality reduction. Manifold learning can be divided into two main categories: linear and non-linear methods. Although linear methods such as Principal Component Analysis (PCA) and Multidimensional Scaling (MDS) are widely used and well known, there are plenty of non-linear techniques, e.g. Isometric Feature Mapping (Isomap), Locally Linear Embedding (LLE), and Local Tangent Space Alignment (LTSA), which in recent years have been the subject of study. This project is inspired by the work of [Bahadur et al., 2017], with the aim of estimating the dimensionality of the US market using the Russell 3000 as a proxy for the financial market. Since financial markets are high-dimensional and complex environments, an approach using non-linear techniques alongside linear ones is proposed.
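
    The thesis text here includes no code; the following is a minimal sketch of the kind of linear-versus-non-linear comparison it describes, using scikit-learn on a toy manifold (the Russell 3000 data are not reproduced here):

        # Contrast PCA (linear) with Isomap, LLE and LTSA (non-linear)
        # on a 3-D S-curve whose intrinsic dimension is 2.
        from sklearn.datasets import make_s_curve
        from sklearn.decomposition import PCA
        from sklearn.manifold import Isomap, LocallyLinearEmbedding

        X, _ = make_s_curve(n_samples=1000, random_state=0)   # d = 3
        reducers = {
            "PCA": PCA(n_components=2),
            "Isomap": Isomap(n_neighbors=10, n_components=2),
            "LLE": LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                                          random_state=0),
            "LTSA": LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                                           method="ltsa", random_state=0),
        }
        for name, reducer in reducers.items():
            Y = reducer.fit_transform(X)                      # p = 2, p << d
            print(name, Y.shape)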

    Non-linear dimension reduction in factor-augmented vector autoregressions

    This paper introduces non-linear dimension reduction in factor-augmented vector autoregressions to analyze the effects of different economic shocks. I argue that controlling for non-linearities between a large-dimensional dataset and the latent factors is particularly useful during turbulent times of the business cycle. In simulations, I show that non-linear dimension reduction techniques yield good forecasting performance, especially when data is highly volatile. In an empirical application, I identify a monetary policy shock as well as an uncertainty shock, both excluding and including observations from the COVID-19 pandemic. These two applications suggest that the non-linear FAVAR approaches are capable of dealing with the large outliers caused by the COVID-19 pandemic and yield reliable results in both scenarios.
    Comment: JEL: C11, C32, C40, C55, E37. Keywords: dimension reduction, machine learning, non-linear factor-augmented vector autoregression, monetary policy shock, uncertainty shock, impulse response analysis, COVID-19
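
    A hedged two-step sketch of the idea (the paper's exact estimator is not reproduced; kernel PCA stands in for the non-linear dimension reduction step, and the panel below is simulated):

        import numpy as np
        import pandas as pd
        from sklearn.preprocessing import StandardScaler
        from sklearn.decomposition import KernelPCA
        from statsmodels.tsa.api import VAR

        rng = np.random.default_rng(0)
        macro_panel = rng.standard_normal((300, 120))   # T x N predictor panel
        policy_rate = rng.standard_normal(300)          # observed policy variable

        # Step 1: compress the large panel into a few non-linear latent factors.
        Z = StandardScaler().fit_transform(macro_panel)
        factors = KernelPCA(n_components=3, kernel="rbf").fit_transform(Z)

        # Step 2: fit a VAR on the factors augmented with the policy variable,
        # then trace impulse responses to a shock.
        data = pd.DataFrame(factors, columns=["f1", "f2", "f3"])
        data["policy"] = policy_rate
        res = VAR(data).fit(maxlags=4, ic="aic")
        irf = res.irf(24)                               # 24-period responses
        print(irf.irfs.shape)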

    Training Process Reduction Based On Potential Weights Linear Analysis To Accelerate Back Propagation Network

    Learning is the most important property of a Back Propagation Network (BPN): finding suitable weights and thresholds during training improves training time as well as accuracy. Currently, data pre-processing techniques such as dimension reduction of input values and pre-training are the contributing factors in developing efficient techniques for reducing training time with high accuracy. Initialization of the weights is an important issue: random initialization creates a paradox and leads to low accuracy with high training time. One good data pre-processing technique for accelerating BPN classification is dimension reduction, but it suffers from the problem of missing data. In this paper, we study current pre-training techniques and a new pre-processing technique called Potential Weight Linear Analysis (PWLA), which combines normalization, dimension reduction of input values, and pre-training. In PWLA, data pre-processing is first performed to generate normalized input values, which are then used by a pre-training technique to obtain the potential weights. After these phases, the dimension of the input-value matrix is reduced using the real potential weights. The experimental evaluation uses the XOR problem and three datasets: SPECT Heart, SPECTF Heart, and Liver Disorders (BUPA). Our results show that the new PWLA technique turns a BPN into a new Supervised Multi-Layer Feed-Forward Neural Network (SMFFNN) model with high accuracy in one epoch, without a training cycle. PWLA also provides supervised and unsupervised non-linear dimension reduction, a property that other supervised multi-layer feed-forward neural network models can apply in future work.
    Comment: 11 pages, IEEE format, International Journal of Computer Science and Information Security, IJCSIS 2009, ISSN 1947-5500, impact factor 0.42
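
    PWLA itself is specific to the paper, but the normalize-then-reduce pre-processing pipeline it builds on can be sketched with scikit-learn stand-ins (the dataset and component counts below are illustrative assumptions, not the paper's setup):

        from sklearn.datasets import load_breast_cancer
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import MinMaxScaler
        from sklearn.decomposition import PCA
        from sklearn.neural_network import MLPClassifier

        X, y = load_breast_cancer(return_X_y=True)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        clf = make_pipeline(
            MinMaxScaler(),                 # normalization phase
            PCA(n_components=10),           # dimension-reduction phase
            MLPClassifier(hidden_layer_sizes=(20,), max_iter=500,
                          random_state=0),  # feed-forward network
        )
        clf.fit(X_tr, y_tr)
        print("test accuracy:", clf.score(X_te, y_te))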

    Non-linear, Sparse Dimensionality Reduction via Path Lasso Penalized Autoencoders

    High-dimensional data sets are often analyzed and explored via the construction of a latent low-dimensional space which enables convenient visualization and efficient predictive modeling or clustering. For complex data structures, linear dimensionality reduction techniques like PCA may not be sufficiently flexible to enable low-dimensional representation. Non-linear dimension reduction techniques, like kernel PCA and autoencoders, suffer from loss of interpretability since each latent variable is dependent on all input dimensions. To address this limitation, we here present path lasso penalized autoencoders. This structured regularization enhances interpretability by penalizing each path through the encoder from an input to a latent variable, thus restricting how many input variables are represented in each latent dimension. Our algorithm uses a group lasso penalty and non-negative matrix factorization to construct a sparse, non-linear latent representation. We compare the path lasso regularized autoencoder to PCA, sparse PCA, autoencoders, and sparse autoencoders on real and simulated data sets. We show that the algorithm exhibits much lower reconstruction errors than sparse PCA and parameter-wise lasso regularized autoencoders for low-dimensional representations. Moreover, path lasso representations provide a more accurate reconstruction match, i.e. better-preserved relative distances between objects in the original and reconstructed spaces.
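
    The authors' exact algorithm (group lasso plus non-negative matrix factorization) is not reproduced here; the following PyTorch sketch only illustrates the underlying idea of penalizing every input-to-latent path through the encoder:

        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        d, h, k = 50, 32, 5                    # input, hidden, latent sizes
        X = torch.randn(512, d)                # toy data

        enc1, enc2 = nn.Linear(d, h), nn.Linear(h, k)
        dec = nn.Sequential(nn.Linear(k, h), nn.ReLU(), nn.Linear(h, d))
        params = (list(enc1.parameters()) + list(enc2.parameters())
                  + list(dec.parameters()))
        opt = torch.optim.Adam(params, lr=1e-3)

        lam = 1e-3
        for _ in range(200):
            z = enc2(torch.relu(enc1(X)))      # latent representation
            recon = dec(z)
            # |W2| @ |W1| has entry (j, i) equal to the total absolute weight
            # of all encoder paths from input i to latent j; summing it gives
            # a simple path penalty that drives whole paths to zero.
            paths = enc2.weight.abs() @ enc1.weight.abs()
            loss = ((recon - X) ** 2).mean() + lam * paths.sum()
            opt.zero_grad()
            loss.backward()
            opt.step()

        print("inputs represented per latent:",
              (paths > 1e-3).sum(dim=1).tolist())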

    Effectiveness of landmark analysis for establishing locality in p2p networks

    Locality to other nodes on a peer-to-peer overlay network can be established by means of a set of landmarks shared among the participating nodes. Each node independently collects a set of latency measures to the landmark nodes, which are used as a multi-dimensional feature vector. Each peer node uses the feature vector to generate a unique scalar index which is correlated to its topological locality. A popular dimensionality reduction technique is the space-filling Hilbert's curve, as it possesses good locality-preserving properties. However, there exists little comparison between Hilbert's curve and other techniques for dimensionality reduction. This work carries out a quantitative analysis of their properties. Linear and non-linear techniques for scaling the landmark vectors to a single dimension are investigated. Hilbert's curve, Sammon's mapping, and Principal Component Analysis have been used to generate a 1d space with locality-preserving properties. This work provides empirical evidence to support the use of Hilbert's curve in the context of locality preservation when generating peer identifiers by means of landmark vector analysis. A comparative analysis is carried out with an artificial 2d network model and with a realistic network topology model with a typical power-law distribution of node connectivity in the Internet. Nearest-neighbour analysis confirms Hilbert's curve to be very effective in both artificial and realistic network topologies. Nevertheless, the results in the realistic network model show that there is scope for improvement, and better techniques to preserve locality information are required.
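
    For the two-dimensional case studied in the artificial network model, the Hilbert index can be computed directly with the classic iterative bit-manipulation algorithm; the latency range and grid resolution below are illustrative assumptions:

        import numpy as np

        def xy2d(n, x, y):
            """Map cell (x, y) on an n x n grid (n a power of two) to its
            1-D Hilbert-curve index (standard iterative algorithm)."""
            d = 0
            s = n // 2
            while s > 0:
                rx = 1 if (x & s) > 0 else 0
                ry = 1 if (y & s) > 0 else 0
                d += s * s * ((3 * rx) ^ ry)
                if ry == 0:                    # rotate/flip the quadrant
                    if rx == 1:
                        x, y = n - 1 - x, n - 1 - y
                    x, y = y, x
                s //= 2
            return d

        rng = np.random.default_rng(0)
        latencies = rng.uniform(0, 200, size=(5, 2))  # RTTs to 2 landmarks (ms)
        grid = 256                                    # 2^8 cells per axis
        cells = np.clip((latencies / 200 * grid).astype(int), 0, grid - 1)
        ids = [xy2d(grid, int(x), int(y)) for x, y in cells]
        print(ids)   # nearby nodes in latency space get nearby scalar indices

    Higher-dimensional landmark vectors require a general d-dimensional Hilbert mapping, typically taken from a library.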

    Methods for Estimation of Intrinsic Dimensionality

    Dimension reduction is an important tool used to describe the structure of complex data (explicitly or implicitly) through a small but sufficient number of variables, and thereby make data analysis more efficient. It is also useful for visualization purposes. Dimension reduction helps statisticians to overcome the 'curse of dimensionality'. However, most dimension reduction techniques require the intrinsic dimension of the low-dimensional subspace to be fixed in advance, so the availability of reliable intrinsic dimension (ID) estimation techniques is of major importance. The main goal of this thesis is to develop algorithms for determining the intrinsic dimensions of recorded data sets in a nonlinear context. Whilst this is a well-researched topic for linear planes, based mainly on principal components analysis, relatively little attention has been paid to ways of estimating this number for non-linear variable interrelationships. The algorithms proposed here are based on existing concepts that can be categorized into local methods, relying on randomly selected subsets of a recorded variable set, and global methods, utilizing the entire data set. This thesis provides an overview of ID estimation techniques, with special consideration given to recent developments in non-linear techniques, such as manifold charting and fractal-based methods. Despite their nominal existence, the practical implementation of these techniques is far from straightforward. The intrinsic dimension is estimated via Brand's algorithm by examining the growth point process, which counts the number of points in hyper-spheres. The estimation needs to determine the starting point for each hyper-sphere, and in this thesis we provide settings for selecting starting points which work well for most data sets. Additionally, we propose approaches for estimating dimensionality via Brand's algorithm, the Dip method, and the Regression method. Other approaches are proposed for estimating the intrinsic dimension by fractal dimension estimation methods, which exploit the intrinsic geometry of a data set. The most popular concept from this family of methods is the correlation dimension, which requires the estimation of the correlation integral for a ball of radius tending to 0. In this thesis we propose new approaches to approximate the correlation integral in this limit: the Intercept method, the Slope method, and the Polynomial method. In addition we propose a new approach, a localized global method, which could be defined as a local version of global ID methods. The objective of the localized global approach is to improve on algorithms based on a local ID method, which could significantly reduce the negative bias. Experimental results on real-world and simulated data are used to demonstrate the algorithms and compare them to other methodology. A simulation study which verifies the effectiveness of the proposed methods is also provided. Finally, these algorithms are contrasted using a recorded data set from an industrial melter process.
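
    As a concrete reference point, the textbook correlation-dimension estimator (the baseline the thesis refines, not its proposed Intercept/Slope/Polynomial methods) fits the slope of log C(r) against log r, since C(r) ~ r^D as r tends to 0:

        import numpy as np
        from scipy.spatial.distance import pdist

        rng = np.random.default_rng(0)
        # Toy data: a circle embedded in 3-D, so the intrinsic dimension is 1.
        t = rng.uniform(0, 2 * np.pi, 2000)
        X = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
        X += 0.001 * rng.standard_normal(X.shape)

        dists = pdist(X)                          # all pairwise distances
        radii = np.logspace(-2, -0.5, 20)         # small-r scaling regime
        # Correlation integral: fraction of point pairs within distance r.
        C = np.array([(dists < r).mean() for r in radii])

        # D is the slope in log-log coordinates.
        D, _ = np.polyfit(np.log(radii), np.log(C), 1)
        print(f"estimated intrinsic dimension: {D:.2f}")   # close to 1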

    Real-time Inflation Forecasting Using Non-linear Dimension Reduction Techniques

    In this paper, we assess whether using non-linear dimension reduction techniques pays off for forecasting inflation in real-time. Several recent methods from the machine learning literature are adopted to map a large-dimensional dataset into a lower-dimensional set of latent factors. We model the relationship between inflation and these latent factors using state-of-the-art time-varying parameter (TVP) regressions with shrinkage priors. Using monthly real-time data for the US, our results suggest that adding such non-linearities yields forecasts that are on average highly competitive with those obtained from methods using linear dimension reduction techniques. Zooming into model performance over time, moreover, reveals that controlling for non-linear relations in the data is of particular importance during recessionary episodes of the business cycle.
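
    A heavily simplified sketch of such a pipeline (kernel PCA standing in for the machine-learning reductions, ridge regression for the TVP model with shrinkage priors, and simulated data in place of the real-time US dataset):

        import numpy as np
        from sklearn.preprocessing import StandardScaler
        from sklearn.decomposition import KernelPCA
        from sklearn.linear_model import Ridge

        rng = np.random.default_rng(1)
        T, N = 360, 100
        panel = rng.standard_normal((T, N))     # monthly predictor panel
        inflation = rng.standard_normal(T)      # monthly inflation series

        preds, actuals = [], []
        for t in range(240, T - 1):             # expanding pseudo real-time loop
            Z = StandardScaler().fit_transform(panel[: t + 1])
            F = KernelPCA(n_components=5, kernel="rbf").fit_transform(Z)
            model = Ridge(alpha=1.0).fit(F[:-1], inflation[1 : t + 1])
            preds.append(model.predict(F[[-1]])[0])   # one-month-ahead forecast
            actuals.append(inflation[t + 1])

        rmse = np.sqrt(np.mean((np.array(preds) - np.array(actuals)) ** 2))
        print(f"out-of-sample RMSE: {rmse:.3f}")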

    Real-time inflation forecasting using non-linear dimension reduction techniques

    In this paper, we assess whether using non-linear dimension reduction techniques pays off for forecasting inflation in real-time. Several recent methods from the machine learning literature are adopted to map a large-dimensional dataset into a lower-dimensional set of latent factors. We model the relationship between inflation and the latent factors using constant and time-varying parameter (TVP) regressions with shrinkage priors. Our models are then used to forecast monthly US inflation in real-time. The results suggest that sophisticated dimension reduction methods yield inflation forecasts that are highly competitive with linear approaches based on principal components. Among the techniques considered, the Autoencoder and squared principal components yield factors that have high predictive power for one-month- and one-quarter-ahead inflation. Zooming into model performance over time reveals that controlling for non-linear relations in the data is of particular importance during recessionary episodes of the business cycle or the current COVID-19 pandemic.
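
    The abstract does not spell out how the squared-principal-components factors are built; one common reading, sketched below purely as an assumption, augments the standardized panel with its element-wise squares before extracting components:

        import numpy as np
        from sklearn.preprocessing import StandardScaler
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(2)
        panel = rng.standard_normal((300, 80))   # hypothetical predictor panel

        Z = StandardScaler().fit_transform(panel)
        augmented = np.hstack([Z, Z ** 2])       # predictors plus their squares
        factors = PCA(n_components=5).fit_transform(augmented)
        print(factors.shape)                     # (300, 5) factor matrix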