1,573 research outputs found

    Non-linear dimensionality reduction techniques for classification

    This thesis project concerns dimensionality reduction through manifold learning, with a focus on non-linear techniques. Dimensionality reduction (DR) is the process of mapping a high-dimensional dataset with d features (dimensions) to one with a lower number of features p (p ≪ d) while preserving the information contained in the original higher-dimensional space. More generally, the concept of manifold learning is introduced, a generalised framework that encompasses algorithms for dimensionality reduction. Manifold learning can be divided into two main categories: linear and non-linear methods. Although linear methods such as Principal Component Analysis (PCA) and Multidimensional Scaling (MDS) are widely used and well known, there are many non-linear techniques, e.g. Isometric Feature Mapping (Isomap), Locally Linear Embedding (LLE), and Local Tangent Space Alignment (LTSA), which have been the subject of study in recent years. This project is inspired by the work of [Bahadur et al., 2017], with the aim of estimating the dimensionality of the US market using the Russell 3000 as a proxy for the financial market. Since financial markets are high-dimensional and complex environments, an approach applying non-linear techniques alongside linear ones is proposed.
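    The methods named in this abstract are available off the shelf; the sketch below (not part of the thesis, using scikit-learn, an illustrative S-curve dataset, and assumed parameter values) contrasts a linear projection (PCA) with two of the non-linear techniques mentioned (Isomap and LTSA):

```python
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding

# Toy non-linear manifold: 1000 points sampled from a 3-D "S"-shaped surface.
X, color = make_s_curve(n_samples=1000, random_state=0)

# Linear method: PCA projects onto the directions of maximal variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Non-linear methods: Isomap preserves geodesic distances along the manifold;
# LTSA (an LLE variant) aligns local tangent-space coordinates.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
X_ltsa = LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                                method="ltsa").fit_transform(X)

print(X_pca.shape, X_iso.shape, X_ltsa.shape)  # each is (1000, 2)
```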

    Supervising Embedding Algorithms Using the Stress

    While classical scaling, just like principal component analysis, is parameter-free, most other methods for embedding multivariate data require the selection of one or several parameters. This tuning can be difficult due to the unsupervised nature of the situation. We propose a simple, almost obvious, approach to supervising the choice of tuning parameter(s): minimize a notion of stress. We substantiate this choice by reference to rigidity theory. We extend a result by Aspnes et al. (IEEE Mobile Computing, 2006), showing that general random geometric graphs are trilateration graphs with high probability. And we provide a stability result à la Anderson et al. (SIAM Discrete Mathematics, 2010). We illustrate this approach in the context of the MDS-MAP(P) algorithm of Shang and Ruml (IEEE INFOCOM, 2004). As a prototypical patch-stitching method, it requires the choice of a patch size, and we use the stress to make that choice data-driven. In this context, we perform a number of experiments to illustrate the validity of using the stress as the basis for tuning parameter selection. In so doing, we uncover a bias-variance tradeoff, a phenomenon that may have been overlooked in the multidimensional scaling literature. By turning MDS-MAP(P) into a method for manifold learning, we obtain a local version of Isomap for which the minimization of the stress may also be used for parameter tuning.
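    As an illustration of the general idea only (not of the MDS-MAP(P) algorithm itself), the sketch below selects the neighbourhood size of an Isomap embedding by minimising a normalised stress; the swiss-roll data, the candidate values, and the comparison against ambient rather than geodesic distances are all simplifying assumptions:

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=800, random_state=0)
D_high = pdist(X)  # pairwise distances in the original space

def stress(d_high, d_low):
    """Normalised (Kruskal-type) stress between two distance vectors."""
    return np.sqrt(np.sum((d_high - d_low) ** 2) / np.sum(d_high ** 2))

# Supervise the choice of n_neighbors by minimising the stress of the embedding.
candidates = [5, 10, 15, 20, 30]
scores = {}
for k in candidates:
    Y = Isomap(n_neighbors=k, n_components=2).fit_transform(X)
    scores[k] = stress(D_high, pdist(Y))

best_k = min(scores, key=scores.get)
print(scores, "-> chosen n_neighbors:", best_k)
```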

    Greedy routing and virtual coordinates for future networks

    At the core of the Internet, routers are continuously struggling with ever-growing routing and forwarding tables. Although hardware advances do accommodate such growth, we anticipate new requirements, e.g. in data-oriented networking, where each content piece has to be referenced instead of hosts, such that current approaches relying on global information will no longer be viable, regardless of hardware progress. In this thesis, we investigate greedy routing methods that can achieve routing performance similar to today's but use far fewer resources and rely on local information only. To this end, we add specially crafted name spaces to the network in which virtual coordinates represent the addressable entities. Our scheme enables participating routers to make forwarding decisions using only neighbourhood information, as the overarching pseudo-geometric name space structure already organizes and incorporates "vicinity" at a global level. A first challenge to the application of greedy routing on virtual coordinates to future networks is that of "routing dead-ends": local minima that arise from the difficulty of attributing consistent coordinates. In this context, we propose a routing recovery scheme based on a multi-resolution embedding of the network in low-dimensional Euclidean spaces. The recovery is performed by routing greedily on a blurrier view of the network. The different network detail levels are obtained through the embedding of clustering levels of the graph. When compared with higher-dimensional embeddings of a given network, our method shows a significant reduction in routing failures for similar header and control-state sizes. A second challenge to the application of virtual coordinates and greedy routing to future networks is the support of "customer-provider" as well as "peering" relationships between participants, resulting in a differentiated services environment. Although an application of greedy routing within such a setting would combine two very common fields of today's networking literature, such a scenario has, surprisingly, not been studied so far. In this context, we propose two approaches to address this scenario. In the first approach, we implement a path-vector protocol similar to that of BGP on top of a greedy embedding of the network. This allows each node to build a spatial map associated with each of its neighbours indicating the accessible regions. Routing is then performed through the use of a decision-tree classifier taking the destination coordinates as input. When applied to a real-world dataset (the CAIDA 2004 AS graph), we demonstrate a compression ratio of up to 40% for the routing control information at the network's core, as well as a computationally efficient decision process comparable to methods such as binary trees and tries. In the second approach, we take inspiration from consensus-finding in the social sciences and transform the three-dimensional distance data structure (where the third dimension encodes the service differentiation) into a two-dimensional matrix on which classical embedding tools can be used. This transformation is achieved by agreeing on a set of constraints on the inter-node distances that guarantee administratively correct greedy routing. The computed distances are also enhanced to encode multipath support. We demonstrate good greedy routing performance as well as above 90% satisfaction of the multipath constraints when relying directly on the obtained (non-embedded) distances on synthetic datasets. As various embeddings of the consensus distances do not fully exploit their multipath potential, using compression techniques such as transform coding to approximate the obtained distances allows for better routing performance.
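    The following toy sketch (not the thesis's scheme; node names, coordinates, and topology are made up) illustrates greedy forwarding on virtual coordinates using only neighbourhood information, and how a routing dead-end (local minimum) is detected, which is the situation the proposed recovery scheme addresses:

```python
import math

def greedy_route(coords, neighbours, src, dst):
    """Forward greedily towards dst's virtual coordinates using only
    per-node neighbourhood information; stop at a routing dead-end
    (local minimum) if no neighbour is closer to the destination."""
    def dist(a, b):
        return math.dist(coords[a], coords[b])

    path, current = [src], src
    while current != dst:
        # pick the neighbour geometrically closest to the destination
        nxt = min(neighbours[current], key=lambda n: dist(n, dst))
        if dist(nxt, dst) >= dist(current, dst):
            return path, False  # dead-end: a recovery scheme would take over here
        path.append(nxt)
        current = nxt
    return path, True

# Toy topology with hand-assigned 2-D virtual coordinates (illustrative only).
coords = {'a': (0, 0), 'b': (1, 0), 'c': (2, 0), 'd': (2, 1)}
neighbours = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b', 'd'], 'd': ['c']}
print(greedy_route(coords, neighbours, 'a', 'd'))  # (['a', 'b', 'c', 'd'], True)
```

    If no neighbour is closer to the destination, the function returns the partial path with a failure flag; in the thesis, this is where the recovery scheme routes greedily on a blurrier, lower-resolution view of the network.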