A Euclidean Distance Matrix Model for Convex Clustering
Clustering has been one of the most basic and essential problems in
unsupervised learning due to various applications in many critical fields. The
recently proposed sum-of-norms (SON) model by Pelckmans et al. (2005), Lindsten
et al. (2011) and Hocking et al. (2011) has received a lot of attention. The
advantage of the SON model is the theoretical guarantee in terms of perfect
recovery, established by Sun et al. (2018). It also provides great
opportunities for designing efficient algorithms for solving the SON model. The
semismooth Newton based augmented Lagrangian method by Sun et al. (2018) has
demonstrated its superior performance over the alternating direction method of
multipliers (ADMM) and the alternating minimization algorithm (AMA). In this
paper, we propose a Euclidean distance matrix model based on the SON model. An
efficient majorization penalty algorithm is proposed to solve the resulting
model. Extensive numerical experiments are conducted to demonstrate the
efficiency of the proposed model and the majorization penalty algorithm.
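For context, the sum-of-norms clustering objective underlying the SON model can be sketched as follows (the weights w_ij and the choice of norm vary across the cited papers):

\[
\min_{x_1,\dots,x_n \in \mathbb{R}^d} \; \frac{1}{2}\sum_{i=1}^{n} \|x_i - a_i\|^2 \;+\; \lambda \sum_{i<j} w_{ij}\, \|x_i - x_j\|,
\]

where the a_i are the data points and the x_i their assigned centroids; data points whose centroids coincide at the optimum are placed in the same cluster.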
Greedy routing and virtual coordinates for future networks
At the core of the Internet, routers are continuously struggling with
ever-growing routing and forwarding tables. Although hardware advances
do accommodate such growth, we anticipate new requirements, e.g., in
data-oriented networking where each content piece has to be referenced
instead of hosts, such that current approaches relying on global
information will not be viable anymore, no matter the hardware
progress. In this thesis, we investigate greedy routing methods that
can achieve similar routing performance as today but use much less
resources and which rely on local information only. To this end, we
add specially crafted name spaces to the network in which virtual
coordinates represent the addressable entities. Our scheme enables participating
routers to make forwarding decisions using only neighbourhood information,
as the overarching pseudo-geometric name space structure already
organizes and incorporates "vicinity" at a global level.
A first challenge to the application of greedy routing on virtual
coordinates to future networks is that of "routing dead-ends"
that are local minima due to the difficulty of consistent coordinate
attribution. In this context, we propose a routing recovery scheme
based on a multi-resolution embedding of the network in low-dimensional Euclidean spaces.
The recovery is performed by routing greedily on a blurrier view of the network. The
different network detail-levels are obtained through the embedding of
clustering-levels of the graph. When compared with
higher-dimensional embeddings of a given network, our method shows a
significant diminution of routing failures for similar header and
control-state sizes.
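As an illustration of the recovery idea (a minimal sketch, not the thesis implementation; the data structures coords, neighbors and levels are hypothetical):

import math

def dist(p, q):
    # Euclidean distance between two coordinate tuples
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def greedy_next_hop(node, dest, coords, neighbors, levels):
    """Pick the neighbour closest to the destination; on a dead-end,
    retry on progressively coarser (blurrier) embedding levels."""
    for level in levels:  # levels[0] = finest detail, last = coarsest
        here = dist(coords[level][node], dest[level])
        best = min(neighbors[node],
                   key=lambda n: dist(coords[level][n], dest[level]))
        if dist(coords[level][best], dest[level]) < here:
            return best  # strict progress at this detail level
    return None  # dead-end at every resolution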
A second challenge to the application of virtual coordinates and
greedy routing to future networks is the support of
"customer-provider" as well as "peering" relationships between
participants, resulting in a differentiated services
environment. Although an application of greedy routing within such a
setting would combine two very common fields of today's networking
literature, such a scenario has, surprisingly, not been studied so
far. In this context, we propose two approaches to address it.
In a first approach we implement a path-vector protocol similar to
that of BGP on top of a greedy embedding of the network. This allows
each node to build a spatial map associated with each of its
neighbours indicating the accessible regions. Routing is then
performed through the use of a decision-tree classifier taking the
destination coordinates as input. When applied to a real-world dataset
(the CAIDA 2004 AS graph), we demonstrate up to a 40% compression ratio of
the routing control information at the network's core as well as a computationally efficient
decision process comparable to methods such as binary trees and tries.
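A minimal sketch of that classification step (assuming scikit-learn; the coordinates and labels below are illustrative, not the thesis code):

from sklearn.tree import DecisionTreeClassifier

# Per router: destination coordinates learned from path-vector announcements,
# labelled with the neighbour through which each region is reachable.
X_train = [[0.1, 0.7], [0.9, 0.2], [0.4, 0.4]]           # destination coordinates
y_train = ["neighbour_A", "neighbour_B", "neighbour_A"]  # associated next hops

clf = DecisionTreeClassifier().fit(X_train, y_train)
next_hop = clf.predict([[0.2, 0.6]])[0]  # next hop for a new destination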
In a second approach, we take inspiration from consensus-finding in social
sciences and transform the three-dimensional distance data structure
(where the third dimension encodes the service differentiation) into a
two-dimensional matrix on which classical embedding tools can be used.
This transformation is achieved by agreeing on a set of
constraints on the inter-node distances guaranteeing an
administratively-correct greedy routing. The computed distances are
also enhanced to encode multipath support. On synthetic datasets, we
demonstrate good greedy routing performance as well as above 90% satisfaction
of the multipath constraints when relying directly on the obtained
(non-embedded) distances. As various embeddings of the consensus distances do
not fully exploit their multipath potential, approximating the obtained
distances with compression techniques such as transform coding allows for
better routing performance.
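Purely to picture the flattening step (the actual consensus constraints are the thesis's contribution and are not reproduced here; the max rule below is a hypothetical stand-in):

import numpy as np

n, s = 5, 3                    # nodes, service classes
d3 = np.random.rand(n, n, s)   # d3[i, j, k]: distance from i to j under service k
d2 = d3.max(axis=2)            # a single n-by-n matrix usable by embedding tools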
A Facial Reduction Approach for the Single Source Localization Problem
The single source localization problem (SSLP) appears in several fields such
as signal processing and global positioning systems. The optimization problem
of SSLP is nonconvex, and it is difficult to find its globally optimal solution. It can
be reformulated as a rank constrained Euclidean distance matrix (EDM)
completion problem with a number of equality constraints. In this paper, we
propose a facial reduction approach to solve such an EDM completion problem.
For the constraints of fixed distances between sensors, we reduce them to a
face of the EDM cone and derive a closed-form characterization of the face. We prove
constraint nondegeneracy for each feasible point of the resulting EDM
optimization problem without a rank constraint, which guarantees the quadratic
convergence of semismooth Newton's method. To tackle the nonconvex rank
constraint, we apply the majorized penalty approach developed by Zhou et al.
(IEEE Trans Signal Process 66(3):4331-4346, 2018). Numerical results verify the
fast speed of the proposed approach while giving solution quality comparable
to that of other methods.
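For orientation, a generic rank-constrained EDM completion problem of this type can be sketched as follows (the paper's exact constraint set differs):

\[
\min_{D} \;\|H \circ (D - \Delta)\|_F^2 \quad \text{s.t.} \quad D \in \mathcal{E}^n,\;\; \operatorname{rank}(JDJ) \le r,\;\; D_{ij} = \Delta_{ij} \ \text{for fixed sensor pairs } (i,j),
\]

where \mathcal{E}^n is the cone of n-by-n Euclidean distance matrices, J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top is the centering matrix, \Delta holds the given squared distances, and H is a 0-1 mask of the known entries.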
Euclidean distance geometry and applications
Euclidean distance geometry is the study of Euclidean geometry based on the
concept of distance. This is useful in several applications where the input
data consists of an incomplete set of distances, and the output is a set of
points in Euclidean space that realizes the given distances. We survey some of
the theory of Euclidean distance geometry and some of the most important
applications: molecular conformation, localization of sensor networks and
statics.
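As a concrete instance of realizing distances by points (classical multidimensional scaling, a standard tool in this area; a minimal sketch assuming a complete, exact distance matrix):

import numpy as np

def classical_mds(D, r):
    """Recover r-dimensional points from a complete matrix D of
    pairwise Euclidean distances (classical multidimensional scaling)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    G = -0.5 * J @ (D ** 2) @ J            # Gram matrix of centered points
    w, V = np.linalg.eigh(G)               # eigenvalues in ascending order
    w, V = w[::-1][:r], V[:, ::-1][:, :r]  # keep the top-r eigenpairs
    return V * np.sqrt(np.maximum(w, 0))   # one point per row

The output reproduces the input distances up to a rigid motion (rotation, reflection, translation) whenever D is a true r-dimensional Euclidean distance matrix.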
Optimization Methods for Tabular Data Protection
In this thesis we consider a minimum-distance Controlled Tabular Adjustment (CTA) model for statistical disclosure limitation (control) of tabular data. The goal of the CTA model is to find the closest safe table to some original tabular data set that contains sensitive information. Closeness is usually measured using the l1 or l2 norm, with each measure having its advantages and disadvantages. Depending on the chosen norm, CTA can be formulated as an optimization problem: Linear Programming (LP) for l1, Quadratic Programming (QP) for l2. In this thesis we present an alternative reformulation of l1-CTA as a Second-Order Cone (SOC) optimization problem. All three models can be solved using appropriate versions of Interior-Point Methods (IPM). The validity of the new approach was tested on randomly generated two-dimensional tabular data sets. It was shown numerically that the SOC formulation compares favorably to the QP and LP formulations.
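Schematically, a minimum-distance CTA problem has the form (a generic sketch; the thesis's precise protection constraints are not reproduced here):

\[
\min_{z}\; \|z - a\| \quad \text{s.t.} \quad M z = b, \quad l \le z \le u,
\]

where a is the original table, z the adjusted safe table, Mz = b the additivity relations among cells (e.g., marginal totals), and the bounds l, u encode the protection levels of the sensitive cells; taking the l1 norm yields the LP model and the l2 norm the QP model.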
Sparsity with sign-coherent groups of variables via the cooperative-Lasso
We consider the problems of estimation and selection of parameters endowed
with a known group structure, when the groups are assumed to be sign-coherent,
that is, gathering either nonnegative, nonpositive or null parameters. To
tackle this problem, we propose the cooperative-Lasso penalty. We derive the
optimality conditions defining the cooperative-Lasso estimate for generalized
linear models, and propose an efficient active set algorithm suited to
high-dimensional problems. We study the asymptotic consistency of the estimator
in the linear regression setup and derive its irrepresentable conditions, which
are milder than the ones of the group-Lasso regarding the matching of groups
with the sparsity pattern of the true parameters. We also address the problem
of model selection in linear regression by deriving an approximation of the
degrees of freedom of the cooperative-Lasso estimator. Simulations comparing
the proposed estimator to the group and sparse group-Lasso comply with our
theoretical results, showing consistent improvements in support recovery for
sign-coherent groups. We finally propose two examples illustrating the wide
applicability of the cooperative-Lasso: first to the processing of ordinal
variables, where the penalty acts as a monotonicity prior; second to the
processing of genomic data, where the set of differentially expressed probes is
enriched by incorporating all the probes of the microarray that are related to
the corresponding genes.
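For reference, the cooperative-Lasso penalty over a group structure {g} can be sketched as follows (up to per-group weights):

\[
\Omega(\beta) \;=\; \lambda \sum_{g} \left( \|\beta_g^{+}\|_2 + \|\beta_g^{-}\|_2 \right),
\]

where \beta_g^{+} = \max(\beta_g, 0) and \beta_g^{-} = \max(-\beta_g, 0) entrywise; a group mixing signs pays both terms, so sign-coherent groups are favoured.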
Insights into Ordinal Embedding Algorithms: A Systematic Evaluation
The objective of ordinal embedding is to find a Euclidean representation of a
set of abstract items, using only answers to triplet comparisons of the form
"Is item closer to the item or item ?". In recent years, numerous
algorithms have been proposed to solve this problem. However, there does not
exist a fair and thorough assessment of these embedding methods and therefore
several key questions remain unanswered: Which algorithms scale better with
increasing sample size or dimension? Which ones perform better when the
embedding dimension is small or few triplet comparisons are available? In our
paper, we address these questions and provide the first comprehensive and
systematic empirical evaluation of existing algorithms as well as a new neural
network approach. In the large triplet regime, we find that simple, relatively
unknown, non-convex methods consistently outperform all other algorithms,
including elaborate approaches based on neural networks or landmark approaches.
This finding can be explained by our insight that many of the non-convex
optimization approaches do not suffer from local optima. In the low triplet
regime, our neural network approach is either competitive or significantly
outperforms all the other methods. Our comprehensive assessment is enabled by
our unified library of popular embedding algorithms that leverages GPU
resources and allows for fast and accurate embeddings of millions of data
points.
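To make the setting concrete, here is a minimal non-convex triplet-embedding sketch (plain gradient descent on a hinge-type triplet loss; illustrative, not any specific algorithm from the evaluation):

import numpy as np

def triplet_embed(triplets, n_items, dim=2, lr=0.05, epochs=200, margin=1.0):
    """Fit points so that, for each triplet (i, j, k), item i ends up
    closer to item j than to item k (hinge loss, stochastic descent)."""
    X = np.random.default_rng(0).normal(size=(n_items, dim))
    for _ in range(epochs):
        for i, j, k in triplets:      # answer: "i is closer to j than to k"
            d_ij, d_ik = X[i] - X[j], X[i] - X[k]
            # violated margin: ||x_i - x_j||^2 + margin > ||x_i - x_k||^2
            if d_ij @ d_ij + margin > d_ik @ d_ik:
                X[i] -= lr * 2 * (d_ij - d_ik)
                X[j] += lr * 2 * d_ij
                X[k] -= lr * 2 * d_ik
    return X

# Example: answers consistent with three collinear items 0, 1, 2
X = triplet_embed([(0, 1, 2), (2, 1, 0)], n_items=3)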