Fast multi-image matching via density-based clustering
We consider the problem of finding consistent matches
across multiple images. Previous state-of-the-art solutions
use constraints on cycles of matches together with convex
optimization, leading to computationally intensive iterative
algorithms. In this paper, we propose a clustering-based
formulation. We first rigorously show its equivalence with
the previous one, and then propose QuickMatch, a novel
algorithm that identifies multi-image matches from a density
function in feature space. We use the density to order the
points in a tree, and then extract the matches by breaking this
tree using feature distances and measures of distinctiveness.
Our algorithm outperforms previous state-of-the-art methods
(such as MatchALS) in accuracy, is significantly faster
(up to 62 times faster on some benchmarks), and scales to
large datasets (with more than twenty thousand features).
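The density-tree idea in this abstract can be illustrated with a small sketch. This is not the QuickMatch implementation: the Gaussian kernel, the `bandwidth` and `max_dist` parameters, and the rule "never place two features from the same image in one cluster" are simplifying assumptions standing in for the paper's feature distances and distinctiveness measures.

```python
import math

def density_tree_matches(points, image_ids, bandwidth=1.0, max_dist=1.0):
    """Group feature points into multi-image matches (simplified sketch).

    points: list of coordinate tuples; image_ids: source image of each point.
    bandwidth and max_dist are hypothetical tuning parameters.
    """
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Gaussian kernel density estimate at every point.
    density = [
        sum(math.exp(-dist(i, j) ** 2 / (2 * bandwidth ** 2)) for j in range(n))
        for i in range(n)
    ]

    # Tree edges: each point links to its nearest neighbour of higher density
    # (ties broken by index, so the edges cannot form a cycle).
    edges = []
    for i in range(n):
        higher = [j for j in range(n) if (density[j], j) > (density[i], i)]
        if higher:
            parent = min(higher, key=lambda j: dist(i, j))
            edges.append((dist(i, parent), i, parent))

    # Union-find over clusters; each root remembers the images it contains.
    root = list(range(n))
    images = [{image_ids[i]} for i in range(n)]

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    # Visit edges from shortest to longest. An edge is broken (skipped) when
    # it is too long or would merge two features from the same image.
    for d, i, parent in sorted(edges):
        a, b = find(i), find(parent)
        if a == b or d > max_dist or images[a] & images[b]:
            continue
        root[a] = b
        images[b] |= images[a]

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Each returned cluster is a list of point indices interpreted as one multi-image match. The pairwise density computation here is quadratic in the number of points, so the sketch says nothing about the speed reported in the abstract.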
Automatic Image Segmentation by Dynamic Region Merging
This paper addresses the automatic image segmentation problem in a region
merging style. With an initially over-segmented image, in which many
regions (or super-pixels) with homogeneous color are detected, image
segmentation is performed by iteratively merging the regions according to a
statistical test. There are two essential issues in a region merging algorithm:
the order of merging and the stopping criterion. In the proposed algorithm, these
two issues are solved by a novel predicate, which is defined by the sequential
probability ratio test (SPRT) and the maximum likelihood criterion. Starting
from an over-segmented image, neighboring regions are progressively merged if
there is evidence for merging according to this predicate. We show that the
merging order follows the principle of dynamic programming. This formulates
image segmentation as an inference problem, where the final segmentation is
established based on the observed image. We also prove that the produced
segmentation satisfies certain global properties. In addition, a faster
algorithm, which maintains a nearest neighbor graph in each iteration, is
developed to accelerate the region merging process. Experiments on real
natural images are conducted to demonstrate the performance of the proposed
dynamic region merging algorithm.
Comment: 28 pages. This paper is under review in IEEE TI
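The merging predicate in this abstract builds on the sequential probability ratio test. As a reminder of how the generic test works, here is a minimal sketch of Wald's SPRT on scalar samples. The Gaussian model with known `sigma`, the two candidate means `mu0` and `mu1`, and the error rates `alpha` and `beta` are assumptions made for illustration; the abstract does not say how the paper forms its hypotheses from region cues.

```python
import math

def sprt(samples, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: mean = mu0 versus H1: mean = mu1.

    Assumes independent Gaussian samples with known standard deviation sigma.
    Returns (decision, number_of_samples_used).
    """
    upper = math.log((1 - beta) / alpha)  # reaching this accepts H1
    lower = math.log(beta / (1 - alpha))  # reaching this accepts H0
    llr = 0.0                             # running log-likelihood ratio
    for n, x in enumerate(samples, start=1):
        # Increment log p1(x) - log p0(x) for the Gaussian model.
        llr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", len(samples)
```

The test stops as soon as the accumulated evidence crosses either threshold, which is what makes it attractive as a stopping rule: consistent evidence leads to an early decision, while ambiguous evidence keeps the test running.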
Reconstruction of an in silico metabolic model of _Arabidopsis thaliana_ through database integration
The number of genome-scale metabolic models has been rising quickly in recent years, and the scope of their utilization encompasses a broad range of applications from metabolic engineering to biological discovery. However, the reconstruction of such models remains an arduous process requiring a high level of human intervention. Their utilization is further hampered by the absence of standardized data and annotation formats and the lack of recognized quality and validation standards.

Plants provide a particularly rich range of perspectives for applications of metabolic modeling. We here report the first effort toward reconstructing a genome-scale model of the metabolic network of the plant _Arabidopsis thaliana_, including over 2300 reactions and compounds. Our reconstruction was performed using a semi-automatic methodology based on the integration of two public genome-wide databases, significantly accelerating the process. Database entries were compared and integrated with each other, allowing us to resolve discrepancies and enhance the quality of the reconstruction. This process led to the construction of three models based on different quality and validation standards, giving users the possibility to choose the standard that is most appropriate for a given application. First, a _core metabolic model_ containing only consistent data provides a high-quality model that was shown to be stoichiometrically consistent. Second, an _intermediate metabolic model_ attempts to fill gaps and provides better continuity. Third, a _complete metabolic model_ contains the full set of known metabolic reactions and compounds in _Arabidopsis thaliana_.

We provide an annotated SBML file of our core model to enable the maximum level of compatibility with existing tools and databases. Finally, we discuss a series of principles to raise awareness of the need to develop coordinated efforts and common standards for the reconstruction of genome-scale metabolic models, with the aim of enabling their widespread diffusion, frequent update, maximum compatibility and convenience of use by the wider research community and industry.
Learning Generative Models across Incomparable Spaces
Generative Adversarial Networks have shown remarkable success in learning a
distribution that faithfully recovers a reference distribution in its entirety.
However, in some cases, we may want to only learn some aspects (e.g., cluster
or manifold structure), while modifying others (e.g., style, orientation or
dimension). In this work, we propose an approach to learn generative models
across such incomparable spaces, and demonstrate how to steer the learned
distribution towards target properties. A key component of our model is the
Gromov-Wasserstein distance, a notion of discrepancy that compares
distributions relationally rather than absolutely. While this framework
subsumes current generative models in identically reproducing distributions,
its inherent flexibility allows application to tasks in manifold learning,
relational learning and cross-domain learning.
Comment: International Conference on Machine Learning (ICML
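For readers unfamiliar with the Gromov-Wasserstein distance named in this abstract, its commonly used form is given below. The notation is an assumption introduced here: two metric measure spaces $(X, d_X, \mu)$ and $(Y, d_Y, \nu)$, the set $\Pi(\mu, \nu)$ of couplings with marginals $\mu$ and $\nu$, and an exponent $p \ge 1$. The paper may work with a different loss or normalisation.

```latex
\[
  GW_p(\mu, \nu)^p
  = \min_{\pi \in \Pi(\mu, \nu)}
    \int_{X \times Y} \int_{X \times Y}
      \bigl| d_X(x, x') - d_Y(y, y') \bigr|^p
    \, \mathrm{d}\pi(x, y) \, \mathrm{d}\pi(x', y')
\]
```

Only pairwise distances within each space enter the objective, never a distance between a point of $X$ and a point of $Y$. This is the sense in which the comparison is relational rather than absolute, and why the two spaces may have different dimensions.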
On NP-Hardness of the Paired de Bruijn Sound Cycle Problem
The paired de Bruijn graph is an extension of the de Bruijn graph incorporating
mate pair information for genome assembly proposed by Medvedev et al. However,
unlike in an ordinary de Bruijn graph, not every path or cycle in a paired de
Bruijn graph will spell a string, because there is an additional soundness
constraint on the path. In this paper we show that the problem of checking if
there is a sound cycle in a paired de Bruijn graph is NP-hard in the general case.
We also explore some of its special cases, as well as a modified version where
the cycle must also pass through every edge.
Comment: Peer-reviewed and presented as part of the 13th Workshop on
Algorithms in Bioinformatics (WABI2013
Matching Dynamics with Constraints
We study uncoordinated matching markets with additional local constraints
that capture, e.g., restricted information, visibility, or externalities in
markets. Each agent is a node in a fixed matching network and strives to be
matched to another agent. Each agent has a complete preference list over all
other agents it can be matched with. However, depending on the constraints and
the current state of the game, not all possible partners are available for
matching at all times. For correlated preferences, we propose and study a
general class of hedonic coalition formation games that we call coalition
formation games with constraints. This class includes and extends many recently
studied variants of stable matching, such as locally stable matching, socially
stable matching, or friendship matching. Perhaps surprisingly, we show that all
these variants are encompassed in a class of "consistent" instances that always
allow a polynomial improvement sequence to a stable state. In addition, we show
that for consistent instances there always exists a polynomial sequence to
every reachable state. Our characterization is tight in the sense that we
provide exponential lower bounds when each of the requirements for consistency
is violated. We also analyze matching with uncorrelated preferences, where we
obtain a larger variety of results. While socially stable matching always
allows a polynomial sequence to a stable state, for other classes different
additional assumptions are sufficient to guarantee the same results. For the
problem of reaching a given stable state, we show NP-hardness in almost all
considered classes of matching games.
Comment: Conference Version in WINE 201
Hybrid tractability of soft constraint problems
The constraint satisfaction problem (CSP) is a central generic problem in
computer science and artificial intelligence: it provides a common framework
for many theoretical problems as well as for many real-life applications. Soft
constraint problems are a generalisation of the CSP that allows the user to
model optimisation problems. Considerable effort has been made in identifying
properties which ensure tractability in such problems. In this work, we
initiate the study of hybrid tractability of soft constraint problems; that is,
properties which guarantee tractability of the given soft constraint problem,
but which do not depend only on the underlying structure of the instance (such
as being tree-structured) or only on the types of soft constraints in the
instance (such as submodularity). We present several novel hybrid classes of
soft constraint problems, which include a machine scheduling problem,
constraint problems of arbitrary arities with no overlapping nogoods, and the
SoftAllDiff constraint with arbitrary unary soft constraints. An important tool
in our investigation will be the notion of forbidden substructures.
Comment: A full version of a CP'10 paper, 26 page
Finding conserved patterns in biological sequences, networks and genomes
Biological patterns are widely used for identifying biologically interesting regions
within macromolecules, classifying biological objects, predicting functions and studying
evolution. Good pattern finding algorithms will help biologists to formulate and
validate hypotheses in an attempt to obtain important insights into the complex
mechanisms of living things.
In this dissertation, we aim to improve and develop algorithms for five biological
pattern finding problems. For the multiple sequence alignment problem, we propose
an alternative formulation in which a final alignment is obtained by preserving pairwise
alignments specified by edges of a given tree. In contrast with traditional NP-hard
formulations, our preserving alignment formulation can be solved in polynomial
time without using a heuristic, while having very good accuracy.
For the path matching problem, we take advantage of the linearity of the query
path to reduce the problem to finding a longest weighted path in a directed acyclic
graph. We can find the k top-scoring paths in a network for the query path in
polynomial time. As many biological pathways are not linear, our graph matching
approach allows a non-linear graph query to be given. Our graph matching formulation
overcomes the common weakness of previous approaches that there is no
guarantee on the quality of the results.
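The path matching reduction ends in a standard problem, the longest weighted path in a directed acyclic graph, which dynamic programming over a topological order solves in linear time. The sketch below covers only that final step and returns a single best path, not the top k. The function name and the edge-list input format are assumptions, and the construction of the graph from a query path and a network is not shown.

```python
from collections import deque

def longest_path(num_nodes, edges):
    """Return (weight, path) of a maximum-weight path in a DAG.

    edges: list of (u, v, weight) over nodes 0..num_nodes-1, num_nodes >= 1.
    A path may start and end at any node; a single node has weight 0.
    """
    adj = [[] for _ in range(num_nodes)]
    indeg = [0] * num_nodes
    for u, v, w in edges:
        adj[u].append((v, w))
        indeg[v] += 1

    # Kahn's algorithm yields a topological order and detects cycles.
    queue = deque(u for u in range(num_nodes) if indeg[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle")

    best = [0.0] * num_nodes    # best[v]: max weight of a path ending at v
    prev = [None] * num_nodes   # predecessor of v on that path
    for u in order:
        for v, w in adj[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                prev[v] = u

    # Walk back from the best end node to recover the path.
    node = max(range(num_nodes), key=best.__getitem__)
    weight, path = best[node], []
    while node is not None:
        path.append(node)
        node = prev[node]
    return weight, path[::-1]
```

For example, `longest_path(4, [(0, 1, 2), (1, 2, 3), (0, 2, 4), (2, 3, 1)])` prefers the route through node 1 (weight 2 + 3) over the direct edge from 0 to 2 (weight 4).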
For the gene cluster finding problem, we investigate a formulation based on constraining the overall size of a cluster and develop statistical significance estimates that
allow direct comparisons of clusters of different sizes. We explore both a restricted
version which requires that orthologous genes are strictly ordered within each cluster,
and the unrestricted problem that allows paralogous genes within a genome and clusters
that may not appear in every genome. We solve the first problem in polynomial
time and develop practical exact algorithms for the second one.
In the gene cluster querying problem, based on a querying strategy, we propose
an efficient approach for investigating clustering of related genes across multiple
genomes for a given gene cluster. By analyzing gene clustering in 400 bacterial
genomes, we show that our algorithm is efficient enough to study gene clusters across
hundreds of genomes.