Low-Rank Matrices on Graphs: Generalized Recovery & Applications
Many real world datasets subsume a linear or non-linear low-rank structure in
a very low-dimensional space. Unfortunately, one often has very little or no
information about the geometry of the space, resulting in a highly
under-determined recovery problem. Under certain circumstances,
state-of-the-art algorithms provide an exact recovery for linear low-rank
structures, but at the expense of poorly scalable algorithms that rely on the
nuclear norm. However, the case of non-linear structures remains unresolved. We revisit
the problem of low-rank recovery from a totally different perspective,
involving graphs which encode pairwise similarity between the data samples and
features. Surprisingly, our analysis confirms that it is possible to recover
many approximate linear and non-linear low-rank structures with recovery
guarantees with a set of highly scalable and efficient algorithms. We call such
data matrices \textit{Low-Rank matrices on graphs} and show that many real
world datasets satisfy this assumption approximately due to underlying
stationarity. Our detailed theoretical and experimental analysis unveils the
power of the simple, yet very novel recovery framework \textit{Fast Robust PCA
on Graphs}
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is the problem of finding, in a
large database, the data items whose distances to a query item are the smallest.
Various methods have been developed to address this problem, and recently a lot
of effort has been devoted to approximate search. In this paper, we present a
survey on one of the main solutions, hashing, which has been widely studied
since the pioneering work on locality sensitive hashing. We divide the hashing
algorithms into two main categories: locality sensitive hashing, which designs hash
functions without exploring the data distribution, and learning to hash, which
learns hash functions according to the data distribution, and review them from
various aspects, including hash function design, distance measure, and search
scheme in the hash coding space
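As a rough illustration of the first category, the sketch below hashes vectors with random hyperplanes, one well-known data-independent locality sensitive scheme for angular similarity. It is not taken from the survey; the function names (`make_hyperplanes`, `hash_code`, `hamming`), the 16-bit code length, and the toy vectors are all assumptions chosen for the example.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw random Gaussian hyperplanes; no data is inspected (data-independent)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_code(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )

def hamming(code_a, code_b):
    """Distance measure in the hash coding space."""
    return sum(a != b for a, b in zip(code_a, code_b))

# Hypothetical toy data: `near` points in the same direction as `query`,
# `far` points in the opposite direction.
planes = make_hyperplanes(num_bits=16, dim=3)
query = [1.0, 0.5, -0.2]
near = [2.0, 1.0, -0.4]
far = [-1.0, -0.5, 0.2]

query_code = hash_code(query, planes)
near_distance = hamming(query_code, hash_code(near, planes))
far_distance = hamming(query_code, hash_code(far, planes))
print(near_distance, far_distance)
```

Vectors with a small angle between them agree on most bits, so a search scheme can rank candidates by Hamming distance instead of computing exact distances.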
Fast Robust PCA on Graphs
Mining useful clusters from high dimensional data has received significant
attention from the computer vision and pattern recognition community in
recent years. Linear and non-linear dimensionality reduction has played an
important role in overcoming the curse of dimensionality. However, such
methods are often accompanied by three different problems: high computational
complexity (usually associated with the nuclear norm minimization),
non-convexity (for matrix factorization methods) and susceptibility to gross
corruptions in the data. In this paper we propose a principal component
analysis (PCA) based solution that overcomes these three issues and
approximates a low-rank recovery method for high dimensional datasets. We
target the low-rank recovery by enforcing two types of graph smoothness
assumptions, one on the data samples and the other on the features by designing
a convex optimization problem. The resulting algorithm is fast, efficient and
scalable for huge datasets with O(n log(n)) computational complexity in the
number of data samples. It is also robust to gross corruptions in the dataset
as well as to the model parameters. Clustering experiments on 7 benchmark
datasets with different types of corruptions and background separation
experiments on 3 video datasets show that our proposed model outperforms 10
state-of-the-art dimensionality reduction models. Our theoretical analysis
proves that the proposed model is able to recover approximate low-rank
representations with a bounded error for clusterable data
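To make the notion of graph smoothness concrete, here is a minimal sketch of the standard graph Laplacian quadratic form, which sums the weighted squared differences of a signal across the edges of a graph. This is only the generic smoothness measure, not the paper's full convex objective; the function name `laplacian_quadratic`, the toy graph, and the two signals are assumptions made for illustration.

```python
def laplacian_quadratic(edges, signal):
    """Sum of w_ij * (s_i - s_j)^2 over weighted edges (i, j, w_ij).

    Small values mean the signal varies little between connected nodes.
    """
    return sum(w * (signal[i] - signal[j]) ** 2 for i, j, w in edges)

# Hypothetical graph: two tight clusters {0, 1, 2} and {3, 4, 5}
# joined by a single weak edge between nodes 2 and 3.
edges = [
    (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
    (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
    (2, 3, 0.1),
]

# Constant within each cluster: smooth on the graph.
smooth_signal = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
# Alternating sign inside the clusters: not smooth on the graph.
rough_signal = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

smooth_value = laplacian_quadratic(edges, smooth_signal)
rough_value = laplacian_quadratic(edges, rough_signal)
print(smooth_value, rough_value)
```

A signal that follows the cluster structure is penalized only on the weak inter-cluster edge, which is why a smoothness penalty of this kind favors clusterable representations.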
Hypergraph Modelling for Geometric Model Fitting
In this paper, we propose a novel hypergraph based method (called HF) to fit
and segment multi-structural data. The proposed HF formulates the geometric
model fitting problem as a hypergraph partition problem based on a novel
hypergraph model. In the hypergraph model, vertices represent data points and
hyperedges denote model hypotheses. The hypergraph, with large and
"data-determined" degrees of hyperedges, can express the complex relationships
between model hypotheses and data points. In addition, we develop a robust
hypergraph partition algorithm to detect sub-hypergraphs for model fitting. HF
can effectively and efficiently estimate the number of, and the parameters of,
model instances in multi-structural data heavily corrupted with outliers
simultaneously. Experimental results show the advantages of the proposed method
over previous methods on both synthetic data and real images. Comment: Pattern Recognition, 201
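The vertex and hyperedge roles described above can be sketched for 2D line fitting as follows. The abstract does not say how a data point is assigned to a hyperedge, so the inlier threshold used here is an assumption, as are the names `line_residual` and `build_hyperedges` and the toy points and hypotheses.

```python
def line_residual(point, line):
    """Distance from (x, y) to the line a*x + b*y + c = 0, with (a, b) of unit norm."""
    a, b, c = line
    x, y = point
    return abs(a * x + b * y + c)

def build_hyperedges(points, hypotheses, threshold):
    """One hyperedge per model hypothesis, holding the indices of its inlier points.

    The threshold rule is an assumed membership criterion for this sketch.
    """
    return [
        {i for i, p in enumerate(points) if line_residual(p, h) <= threshold}
        for h in hypotheses
    ]

# Hypothetical data: points near the x-axis, points on the y-axis, one outlier.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.1), (0.0, 1.0), (0.0, 2.0), (5.0, 5.0)]
# Hypothetical model hypotheses: the lines y = 0 and x = 0.
hypotheses = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]

hyperedges = build_hyperedges(points, hypotheses, threshold=0.2)
print(hyperedges)
```

Each hyperedge connects as many vertices as the data supports, which is one way to read the "data-determined" degrees mentioned above; the outlier at index 5 belongs to no hyperedge.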
Computationally Comparing Biological Networks and Reconstructing Their Evolution
Biological networks, such as protein-protein interaction, regulatory, or metabolic networks, provide information about biological function, beyond what can be gleaned from sequence alone. Unfortunately, most computational problems associated with these networks are NP-hard. In this dissertation, we develop algorithms to tackle numerous fundamental problems in the study of biological networks.
First, we present a system for classifying the binding affinity of peptides to a diverse array of immunoglobulin antibodies. Computational approaches to this problem are integral to virtual screening and modern drug discovery. Our system is based on an ensemble of support vector machines and exhibits state-of-the-art performance. It placed 1st in the 2010 DREAM5 competition.
Second, we investigate the problem of biological network alignment. Aligning the biological networks of different species allows for the discovery of shared structures and conserved pathways. We introduce an original procedure for network alignment based on a novel topological node signature. The pairwise global alignments of biological networks produced by our procedure, when evaluated under multiple metrics, are both more accurate and more robust to noise than those of previous work.
Next, we explore the problem of ancestral network reconstruction. Knowing the state of ancestral networks allows us to examine how biological pathways have evolved, and how pathways in extant species have diverged from those of their common ancestor. We describe a novel framework for representing the evolutionary histories of biological networks and present efficient algorithms for reconstructing either a single parsimonious evolutionary history, or an ensemble of near-optimal histories. Under multiple models of network evolution, our approaches are effective at inferring the ancestral network interactions. Additionally, the ensemble approach is robust to noisy input, and can be used to impute missing interactions in experimental data.
Finally, we introduce a framework, GrowCode, for learning network growth models. While previous work focuses on developing growth models manually, or on procedures for learning parameters for existing models, GrowCode learns fundamentally new growth models that match target networks in a flexible and user-defined way. We show that models learned by GrowCode produce networks whose target properties match those of real-world networks more closely than existing models
A Combinatorial Framework for Designing (Pseudoknotted) RNA Algorithms
We extend a hypergraph representation, introduced by Finkelstein and
Roytberg, to unify dynamic programming algorithms in the context of RNA folding
with pseudoknots. Classic applications of RNA dynamic programming (energy
minimization, partition function, base-pair probabilities...) are reformulated
within this framework, giving rise to very simple algorithms. This
reformulation allows one to conceptually detach the conformation space/energy
model -- captured by the hypergraph model -- from the specific application,
assuming unambiguity of the decomposition. To ensure the latter property, we
propose a new combinatorial methodology based on generating functions. We
extend the set of generic applications by proposing an exact algorithm for
extracting generalized moments in weighted distribution, generalizing a prior
contribution by Miklos et al. Finally, we illustrate our full-fledged
programme on three exemplary conformation spaces (secondary structures,
Akutsu's simple type pseudoknots and kissing hairpins). This readily gives sets
of algorithms that are either novel or have complexity comparable to classic
implementations for minimization and Boltzmann ensemble applications of dynamic
programming
Addressing Computational Bottlenecks in Higher-Order Graph Matching with Tensor Kronecker Product Structure
Graph matching, also known as network alignment, is the problem of finding a
correspondence between the vertices of two separate graphs with strong
applications in image correspondence and functional inference in protein
networks. One class of successful techniques is based on tensor Kronecker
products and tensor eigenvectors. A challenge with these techniques is that their
memory and computational demands are quadratic (or worse) in terms of problem
size. In this manuscript we present and apply a theory of tensor Kronecker
products to tensor based graph alignment algorithms to reduce their runtime
complexity from quadratic to linear with no appreciable loss of quality. In
terms of theory, we show that many matrix Kronecker product identities
generalize to straightforward tensor counterparts, which is rare in tensor
literature. Improved computation codes for two existing algorithms that utilize
this new theory achieve at least a 10-fold runtime improvement. Comment: 14 pages, 2 pages Supplemental, 5 figures
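The kind of saving that Kronecker structure allows can be seen with the classical matrix mixed-product identity, (A ⊗ B)(x ⊗ y) = (Ax) ⊗ (By). The sketch below checks it on a toy example; it covers only the matrix case, not the tensor generalization developed in the manuscript, and the helper names (`kron`, `kron_vec`, `matvec`) and the small matrices are assumptions for illustration.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]

def kron_vec(x, y):
    """Kronecker product of two vectors."""
    return [x_i * y_j for x_i in x for y_j in y]

def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Hypothetical small inputs with integer entries so equality is exact.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
x = [1, -1]
y = [2, 3]

# Explicit route: build the 4 x 4 Kronecker matrix, then multiply.
explicit = matvec(kron(A, B), kron_vec(x, y))
# Structured route: two 2 x 2 products, then one Kronecker product of vectors.
structured = kron_vec(matvec(A, x), matvec(B, y))
print(explicit, structured)  # both print [-3, -2, -3, -2]
```

For n-by-n factors the explicit route touches n^4 matrix entries, while the structured route needs only two n-by-n products, which is the flavor of reduction the abstract describes.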