17,719 research outputs found
Learning with Algebraic Invariances, and the Invariant Kernel Trick
When solving data analysis problems it is important to integrate prior
knowledge and/or structural invariances. This paper contributes by a novel
framework for incorporating algebraic invariance structure into kernels. In
particular, we show that algebraic properties such as sign symmetries in data,
phase independence, scaling etc. can be included easily by essentially
performing the kernel trick twice. We demonstrate the usefulness of our theory
in simulations on selected applications such as sign-invariant spectral
clustering and underdetermined ICA
Fast Approximate Spectral Clustering for Dynamic Networks
Spectral clustering is a widely studied problem, yet its complexity is
prohibitive for dynamic graphs of even modest size. We claim that it is
possible to reuse information of past cluster assignments to expedite
computation. Our approach builds on a recent idea of sidestepping the main
bottleneck of spectral clustering, i.e., computing the graph eigenvectors, by
using fast Chebyshev graph filtering of random signals. We show that the
proposed algorithm achieves clustering assignments with quality approximating
that of spectral clustering and that it can yield significant complexity
benefits when the graph dynamics are appropriately bounded
A Practical Algorithm for Reconstructing Level-1 Phylogenetic Networks
Recently much attention has been devoted to the construction of phylogenetic
networks which generalize phylogenetic trees in order to accommodate complex
evolutionary processes. Here we present an efficient, practical algorithm for
reconstructing level-1 phylogenetic networks - a type of network slightly more
general than a phylogenetic tree - from triplets. Our algorithm has been made
publicly available as the program LEV1ATHAN. It combines ideas from several
known theoretical algorithms for phylogenetic tree and network reconstruction
with two novel subroutines. Namely, an exponential-time exact and a greedy
algorithm both of which are of independent theoretical interest. Most
importantly, LEV1ATHAN runs in polynomial time and always constructs a level-1
network. If the data is consistent with a phylogenetic tree, then the algorithm
constructs such a tree. Moreover, if the input triplet set is dense and, in
addition, is fully consistent with some level-1 network, it will find such a
network. The potential of LEV1ATHAN is explored by means of an extensive
simulation study and a biological data set. One of our conclusions is that
LEV1ATHAN is able to construct networks consistent with a high percentage of
input triplets, even when these input triplets are affected by a low to
moderate level of noise
Robust hierarchical k-center clustering
One of the most popular and widely used methods for data clustering is hierarchical clustering. This clustering technique has proved useful to reveal interesting structure in the data in several applications ranging from computational biology to computer vision. Robustness is an important feature of a clustering technique if we require the clustering to be stable against small perturbations in the input data. In most applications, getting a clustering output that is robust against adversarial outliers or stochastic noise is a necessary condition for the applicability and effectiveness of the clustering technique. This is even more critical in hierarchical clustering where a small change at the bottom of the hierarchy may propagate all the way through to the top. Despite all the previous work [2, 3, 6, 8], our theoretical understanding of robust hierarchical clustering is still limited and several hierarchical clustering algorithms are not known to satisfy such robustness properties. In this paper, we study the limits of robust hierarchical k-center clustering by introducing the concept of universal hierarchical clustering and provide (almost) tight lower and upper bounds for the robust hierarchical k-center clustering problem with outliers and variants of the stochastic clustering problem. Most importantly we present a constant-factor approximation for optimal hierarchical k-center with at most z outliers using a universal set of at most O(z2) set of outliers and show that this result is tight. Moreover we show the necessity of using a universal set of outliers in order to compute an approximately optimal hierarchical k-center with a diffierent set of outliers for each k
Trajectory and Policy Aware Sender Anonymity in Location Based Services
We consider Location-based Service (LBS) settings, where a LBS provider logs
the requests sent by mobile device users over a period of time and later wants
to publish/share these logs. Log sharing can be extremely valuable for
advertising, data mining research and network management, but it poses a
serious threat to the privacy of LBS users. Sender anonymity solutions prevent
a malicious attacker from inferring the interests of LBS users by associating
them with their service requests after gaining access to the anonymized logs.
With the fast-increasing adoption of smartphones and the concern that historic
user trajectories are becoming more accessible, it becomes necessary for any
sender anonymity solution to protect against attackers that are
trajectory-aware (i.e. have access to historic user trajectories) as well as
policy-aware (i.e they know the log anonymization policy). We call such
attackers TP-aware.
This paper introduces a first privacy guarantee against TP-aware attackers,
called TP-aware sender k-anonymity. It turns out that there are many possible
TP-aware anonymizations for the same LBS log, each with a different utility to
the consumer of the anonymized log. The problem of finding the optimal TP-aware
anonymization is investigated. We show that trajectory-awareness renders the
problem computationally harder than the trajectory-unaware variants found in
the literature (NP-complete in the size of the log, versus PTIME). We describe
a PTIME l-approximation algorithm for trajectories of length l and empirically
show that it scales to large LBS logs (up to 2 million users)
- …