45,317 research outputs found
Pattern vectors from algebraic graph theory
Graphstructures have proven computationally cumbersome for pattern analysis. The reason for this is that, before graphs can be converted to pattern vectors, correspondences must be established between the nodes of structures which are potentially of different size. To overcome this problem, in this paper, we turn to the spectral decomposition of the Laplacian matrix. We show how the elements of the spectral matrix for the Laplacian can be used to construct symmetric polynomials that are permutation invariants. The coefficients of these polynomials can be used as graph features which can be encoded in a vectorial manner. We extend this representation to graphs in which there are unary attributes on the nodes and binary attributes on the edges by using the spectral decomposition of a Hermitian property matrix that can be viewed as a complex analogue of the Laplacian. To embed the graphs in a pattern space, we explore whether the vectors of invariants can be embedded in a low- dimensional space using a number of alternative strategies, including principal components analysis ( PCA), multidimensional scaling ( MDS), and locality preserving projection ( LPP). Experimentally, we demonstrate that the embeddings result in well- defined graph clusters. Our experiments with the spectral representation involve both synthetic and real- world data. The experiments with synthetic data demonstrate that the distances between spectral feature vectors can be used to discriminate between graphs on the basis of their structure. The real- world experiments show that the method can be used to locate clusters of graphs
Phylogenetic mixtures on a single tree can mimic a tree of another topology
Phylogenetic mixtures model the inhomogeneous molecular evolution commonly
observed in data. The performance of phylogenetic reconstruction methods where
the underlying data is generated by a mixture model has stimulated considerable
recent debate. Much of the controversy stems from simulations of mixture model
data on a given tree topology for which reconstruction algorithms output a tree
of a different topology; these findings were held up to show the shortcomings
of particular tree reconstruction methods. In so doing, the underlying
assumption was that mixture model data on one topology can be distinguished
from data evolved on an unmixed tree of another topology given enough data and
the ``correct'' method. Here we show that this assumption can be false. For
biologists our results imply that, for example, the combined data from two
genes whose phylogenetic trees differ only in terms of branch lengths can
perfectly fit a tree of a different topology
Partial Homology Relations - Satisfiability in terms of Di-Cographs
Directed cographs (di-cographs) play a crucial role in the reconstruction of
evolutionary histories of genes based on homology relations which are binary
relations between genes. A variety of methods based on pairwise sequence
comparisons can be used to infer such homology relations (e.g.\ orthology,
paralogy, xenology). They are \emph{satisfiable} if the relations can be
explained by an event-labeled gene tree, i.e., they can simultaneously co-exist
in an evolutionary history of the underlying genes. Every gene tree is
equivalently interpreted as a so-called cotree that entirely encodes the
structure of a di-cograph. Thus, satisfiable homology relations must
necessarily form a di-cograph. The inferred homology relations might not cover
each pair of genes and thus, provide only partial knowledge on the full set of
homology relations. Moreover, for particular pairs of genes, it might be known
with a high degree of certainty that they are not orthologs (resp.\ paralogs,
xenologs) which yields forbidden pairs of genes. Motivated by this observation,
we characterize (partial) satisfiable homology relations with or without
forbidden gene pairs, provide a quadratic-time algorithm for their recognition
and for the computation of a cotree that explains the given relations
Vote-boosting ensembles
Vote-boosting is a sequential ensemble learning method in which the
individual classifiers are built on different weighted versions of the training
data. To build a new classifier, the weight of each training instance is
determined in terms of the degree of disagreement among the current ensemble
predictions for that instance. For low class-label noise levels, especially
when simple base learners are used, emphasis should be made on instances for
which the disagreement rate is high. When more flexible classifiers are used
and as the noise level increases, the emphasis on these uncertain instances
should be reduced. In fact, at sufficiently high levels of class-label noise,
the focus should be on instances on which the ensemble classifiers agree. The
optimal type of emphasis can be automatically determined using
cross-validation. An extensive empirical analysis using the beta distribution
as emphasis function illustrates that vote-boosting is an effective method to
generate ensembles that are both accurate and robust
Commutative combinatorial Hopf algebras
We propose several constructions of commutative or cocommutative Hopf
algebras based on various combinatorial structures, and investigate the
relations between them. A commutative Hopf algebra of permutations is obtained
by a general construction based on graphs, and its non-commutative dual is
realized in three different ways, in particular as the Grossman-Larson algebra
of heap ordered trees.
Extensions to endofunctions, parking functions, set compositions, set
partitions, planar binary trees and rooted forests are discussed. Finally, we
introduce one-parameter families interpolating between different structures
constructed on the same combinatorial objects.Comment: 29 pages, LaTEX; expanded and updated version of math.CO/050245
- …