1,573 research outputs found
Non-linear dimensionality reduction techniques for classification
This thesis project concerns dimensionality reduction through
manifold learning, with a focus on non-linear techniques.
Dimension Reduction (DR) is the process of reducing a high-dimensional
dataset with d features (dimensions) to one with a lower number of features p (p ≪ d) that preserves the information contained in the original
higher-dimensional space. More generally, the concept of manifold
learning is introduced, a generalized approach that encompasses algorithms
for dimensionality reduction.
Manifold learning can be divided into two main categories: linear and
non-linear methods. Although linear methods such as Principal
Component Analysis (PCA) and Multidimensional Scaling (MDS) are
widely used and well known, there are many non-linear techniques,
e.g. Isometric Feature Mapping (Isomap), Locally Linear Embedding
(LLE), and Local Tangent Space Alignment (LTSA), which have been
the subject of study in recent years.
This project is inspired by the work of [Bahadur et al., 2017],
with the aim of estimating the dimensionality of the US market using the Russell
3000 as a proxy for the financial market.
Since financial markets are high-dimensional and complex environments,
an approach that uses non-linear techniques alongside linear ones is proposed.
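To illustrate why non-linear techniques can matter, here is a minimal sketch of the idea behind Isomap: straight-line distances are replaced by shortest-path ("geodesic") distances over a nearest-neighbour graph. The toy semicircle data, the function names `knn_graph` and `geodesic_distances`, and the choice k=2 are assumptions made for illustration; none of this is taken from the thesis.

```python
import math

def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph

def geodesic_distances(graph):
    """All-pairs shortest paths (Floyd-Warshall) over the neighbourhood graph."""
    n = len(graph)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        for j, w in graph[i].items():
            dist[i][j] = w
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via_m = dist[i][m] + dist[m][j]
                if via_m < dist[i][j]:
                    dist[i][j] = via_m
    return dist

# Toy data (assumed): 50 points on a unit semicircle, a 1-D manifold in 2-D.
n_points = 50
angles = [math.pi * t / (n_points - 1) for t in range(n_points)]
points = [(math.cos(a), math.sin(a)) for a in angles]

geo = geodesic_distances(knn_graph(points, k=2))
euclidean_end_to_end = math.dist(points[0], points[-1])  # cuts across the arc
geodesic_end_to_end = geo[0][-1]                         # follows the arc
print(euclidean_end_to_end, geodesic_end_to_end)
```

On this toy curve the end-to-end Euclidean distance is the diameter (about 2), while the graph distance follows the arc and comes out close to π. A linear method works with the former; Isomap applies MDS to the latter.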
Supervising Embedding Algorithms Using the Stress
While classical scaling, just like principal component analysis, is
parameter-free, most other methods for embedding multivariate data require the
selection of one or several parameters. This tuning can be difficult due to the
unsupervised nature of the situation. We propose a simple, almost obvious,
approach to supervise the choice of tuning parameter(s): minimize a notion of
stress. We substantiate this choice by reference to rigidity theory. We extend
a result by Aspnes et al. (IEEE Mobile Computing, 2006), showing that general
random geometric graphs are trilateration graphs with high probability. And we
provide a stability result à la Anderson et al. (SIAM Discrete Mathematics,
2010). We illustrate this approach in the context of the MDS-MAP(P) algorithm
of Shang and Ruml (IEEE INFOCOM, 2004). As a prototypical patch-stitching
method, it requires the choice of patch size, and we use the stress to make
that choice data-driven. In this context, we perform a number of experiments to
illustrate the validity of using the stress as the basis for tuning parameter
selection. In so doing, we uncover a bias-variance tradeoff, a
phenomenon that may have been overlooked in the multidimensional scaling
literature. By turning MDS-MAP(P) into a method for manifold learning, we
obtain a local version of Isomap for which the minimization of the stress may
also be used for parameter tuning.
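The selection rule described in this abstract can be sketched as a small loop: embed the data once per candidate parameter value and keep the value whose embedding has the lowest stress. The sketch below uses raw stress (the sum of squared differences between the given dissimilarities and the embedded distances), which is one common notion; the abstract does not say which notion the authors use. The `toy_embed` function and its `scale` parameter are hypothetical stand-ins for a real embedding method such as MDS-MAP(P) and its patch size.

```python
import math

def raw_stress(dissim, embedding):
    """Sum of squared gaps between dissimilarities and embedded distances."""
    total = 0.0
    n = len(embedding)
    for i in range(n):
        for j in range(i + 1, n):
            gap = dissim[i][j] - math.dist(embedding[i], embedding[j])
            total += gap ** 2
    return total

def select_by_stress(dissim, embed, candidates):
    """Run `embed` for each candidate parameter and keep the lowest-stress one."""
    scored = [(raw_stress(dissim, embed(dissim, p)), p) for p in candidates]
    best_stress, best_param = min(scored)
    return best_param, best_stress

def toy_embed(dissim, scale):
    """Hypothetical 1-D embedder: place point i at scale * (distance to point 0)."""
    return [(scale * d,) for d in dissim[0]]

# Toy dissimilarities (assumed): exact distances between points on a line.
positions = [0.0, 1.0, 2.5, 4.0]
dissim = [[abs(a - b) for b in positions] for a in positions]

best_param, best_stress = select_by_stress(dissim, toy_embed, [0.5, 1.0, 2.0])
print(best_param, best_stress)
```

In this toy setting only `scale = 1.0` reproduces the input distances, so it is the candidate the rule selects. With a real embedding method, the same loop would run over patch sizes or neighbourhood sizes instead.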
Greedy routing and virtual coordinates for future networks
At the core of the Internet, routers are continuously struggling with
ever-growing routing and forwarding tables. Although hardware advances
do accommodate such growth, we anticipate new requirements, e.g. in
data-oriented networking, where each piece of content has to be referenced
instead of each host, such that current approaches relying on global
information will no longer be viable, no matter the hardware
progress. In this thesis, we investigate greedy routing methods that
can achieve routing performance similar to today's while using far fewer
resources and relying on local information only. To this end, we
add specially crafted name spaces to the network in which virtual
coordinates represent the addressable entities. Our scheme enables participating
routers to make forwarding decisions using only neighbourhood information,
as the overarching pseudo-geometric name space structure already
organizes and incorporates "vicinity" at a global level.
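As a minimal sketch of greedy forwarding on virtual coordinates, the function below hands a packet to whichever neighbour is closest to the destination and stops when no neighbour is strictly closer than the current node (a local minimum). The node names, coordinates and links are made up for illustration, and Euclidean distance in the plane is an assumption; the thesis's actual name spaces and metrics are not specified in the abstract.

```python
import math

def greedy_route(coords, neighbours, source, destination):
    """Forward hop by hop to the neighbour nearest the destination.

    Returns (path, delivered). Each node uses only its own neighbours'
    coordinates and the destination's coordinates.
    """
    path = [source]
    current = source
    target = coords[destination]
    while current != destination:
        here = math.dist(coords[current], target)
        best = min(
            neighbours[current],
            key=lambda node: math.dist(coords[node], target),
            default=None,
        )
        # Stop if no neighbour makes strict progress towards the destination.
        if best is None or math.dist(coords[best], target) >= here:
            return path, False
        path.append(best)
        current = best
    return path, True

# Hypothetical toy network: a chain A-B-C-D plus a node E attached only to A.
coords = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0), "E": (2.5, 1)}
neighbours = {
    "A": ["B", "E"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["A"],
}

print(greedy_route(coords, neighbours, "B", "D"))  # delivered along the chain
print(greedy_route(coords, neighbours, "E", "D"))  # stuck: E's only neighbour is farther away
```

In the second call a path E-A-B-C-D exists in the graph, but greedy forwarding cannot find it because the first hop would move away from the destination.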
A first challenge to the application of greedy routing on virtual
coordinates to future networks is that of "routing dead-ends":
local minima that arise from the difficulty of assigning coordinates
consistently. In this context, we propose a routing recovery scheme
based on a multi-resolution embedding of the network in low-dimensional Euclidean spaces.
The recovery is performed by routing greedily on a blurrier view of the network. The
different levels of network detail are obtained through the embedding of
clustering levels of the graph. When compared with
higher-dimensional embeddings of a given network, our method shows a
significant reduction in routing failures for similar header and
control-state sizes.
A second challenge to the application of virtual coordinates and
greedy routing to future networks is the support of
"customer-provider" as well as "peering" relationships between
participants, resulting in a differentiated services
environment. Although an application of greedy routing within such a
setting would combine two very common fields of today's networking
literature, such a scenario has, surprisingly, not been studied so
far. In this context we propose two approaches to address this scenario.
In a first approach we implement a path-vector protocol similar to
that of BGP on top of a greedy embedding of the network. This allows
each node to build a spatial map associated with each of its
neighbours indicating the accessible regions. Routing is then
performed through the use of a decision-tree classifier taking the
destination coordinates as input. On a real-world dataset
(the CAIDA 2004 AS graph), we demonstrate a compression ratio of up to 40% for
the routing control information at the network's core, as well as a computationally efficient
decision process comparable to methods such as binary trees and tries.
In a second approach, we take inspiration from consensus-finding in social
sciences and transform the three-dimensional distance data structure
(where the third dimension encodes the service differentiation) into a
two-dimensional matrix on which classical embedding tools can be used.
This transformation is achieved by agreeing on a set of
constraints on the inter-node distances guaranteeing an
administratively-correct greedy routing. The computed distances are
also enhanced to encode multipath support. On synthetic datasets, we demonstrate good
greedy routing performance, as well as satisfaction of more than 90% of the multipath constraints,
when relying on the obtained distances without embedding.
As various embeddings of the consensus distances do not fully exploit their multipath potential, using compression techniques such as transform coding to
approximate the obtained distances allows for better routing performance.
- …