Ternary Syndrome Decoding with Large Weight
The Syndrome Decoding problem is at the core of many code-based
cryptosystems. In this paper, we study ternary Syndrome Decoding in large
weight. This problem was introduced in the Wave signature scheme but has
never been thoroughly studied. We perform an algorithmic study of this problem,
which results in an update of the Wave parameters. On a more fundamental level,
we show that ternary Syndrome Decoding with large weight is a genuinely harder
problem than binary Syndrome Decoding, which could have several
applications for the design of code-based cryptosystems.
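To make the problem statement concrete, here is a brute-force sketch of ternary Syndrome Decoding: given a parity-check matrix H over F_3, a syndrome s and a weight w, find a vector e in F_3^n with exactly w nonzero entries such that He = s (mod 3). This is plain exhaustive search, not one of the algorithms studied in the paper, and the toy matrix H and syndrome s are hypothetical. Each nonzero ternary coordinate can be 1 or 2, so the search space has C(n, w) * 2^w candidates, which is why exhaustive search is hopeless at cryptographic sizes.

```python
from itertools import combinations, product


def syndrome(H, e, q=3):
    """Compute H * e over F_q (here F_3) as a tuple."""
    return tuple(sum(h * x for h, x in zip(row, e)) % q for row in H)


def ternary_sd_bruteforce(H, s, w):
    """Return some e in F_3^n of Hamming weight exactly w with H e = s, or None.

    Exhaustive search over all supports of size w and all nonzero values (1 or 2).
    """
    n = len(H[0])
    target = tuple(x % 3 for x in s)
    for support in combinations(range(n), w):
        for values in product((1, 2), repeat=w):
            e = [0] * n
            for i, v in zip(support, values):
                e[i] = v
            if syndrome(H, e) == target:
                return e
    return None


# Hypothetical toy instance: a 2 x 4 ternary parity-check matrix.
H = [[1, 0, 2, 1],
     [0, 1, 1, 2]]
s = (2, 0)
e = ternary_sd_bruteforce(H, s, 2)
```

The returned `e` is only one of possibly several solutions; the search stops at the first support and value pattern that matches.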
A Fast Quartet Tree Heuristic for Hierarchical Clustering
The Minimum Quartet Tree Cost problem is to construct an optimal weight tree
from the $3\binom{n}{4}$ weighted quartet topologies on $n$ objects, where
optimality means that the summed weight of the embedded quartet topologies is
optimal (so it can be the case that the optimal tree embeds all quartets as
nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized
hill climbing, for approximating the optimal weight tree, given the quartet
topology weights. The method repeatedly transforms a dendrogram, with all
objects involved as leaves, achieving a monotonic approximation to the exact
single globally optimal tree. The problem and the solution heuristic have been
extensively used for general hierarchical clustering of nontree-like
(non-phylogeny) data in various domains and across domains with heterogeneous
data. We also present a greatly improved heuristic, reducing the running time
by a factor of order a thousand to ten thousand. All this is implemented and
available as part of the CompLearn package. We compare performance and running
time of the original and improved versions with those of UPGMA, BioNJ, and NJ,
as implemented in the SplitsTree package on genomic data for which the latter
are optimized.
Keywords: Data and knowledge visualization, Pattern
matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering,
Global optimization, Quartet tree, Randomized hill-climbing
Comment: LaTeX, 40 pages, 11 figures; this paper has substantial overlap with
arXiv:cs/0606048 in cs.D
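A minimal sketch of randomized hill climbing on quartet trees, under simplifying assumptions: the tree shape is fixed to a caterpillar (an unrooted binary tree), the only mutation is swapping the objects at two leaves, and the weight of the quartet topology ab|cd is assumed to be d(a,b) + d(c,d) for a given distance matrix, so lower total cost is better. The heuristic described above may use more general tree transformations; this only illustrates the summed quartet cost and the never-accept-worse rule that makes the approximation monotonic. All function names and the toy distance matrix are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def caterpillar(n):
    """Unrooted binary tree: leaves 0..n-1, internal nodes n..2n-3 in a chain (n >= 4)."""
    adj = {v: [] for v in range(2 * n - 2)}

    def link(u, v):
        adj[u].append(v)
        adj[v].append(u)

    inner = list(range(n, 2 * n - 2))
    for a, b in zip(inner, inner[1:]):
        link(a, b)
    link(0, inner[0]); link(1, inner[0])
    for j in range(1, n - 3):
        link(j + 1, inner[j])
    link(n - 2, inner[-1]); link(n - 1, inner[-1])
    return adj


def path_nodes(adj, s, t):
    """Set of vertices on the unique s-t path (BFS with parent pointers)."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            break
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    nodes = set()
    while t is not None:
        nodes.add(t)
        t = parent[t]
    return nodes


def embedded_split(adj, a, b, c, d):
    """The quartet topology pq|rs embedded in the tree: the pairing with disjoint paths."""
    for (p, q), (r, s) in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
        if not path_nodes(adj, p, q) & path_nodes(adj, r, s):
            return (p, q), (r, s)
    raise ValueError("tree is not binary")


def tree_cost(splits, perm, dist):
    """Summed weight of embedded quartets; perm[pos] is the object at leaf position pos."""
    return sum(dist[perm[p]][perm[q]] + dist[perm[r]][perm[s]]
               for (p, q), (r, s) in splits)


def hill_climb(dist, steps=300, seed=0):
    n = len(dist)
    rng = random.Random(seed)
    adj = caterpillar(n)
    # The shape never changes, so each leaf-position quartet's topology is fixed.
    splits = [embedded_split(adj, *quad) for quad in combinations(range(n), 4)]
    perm = list(range(n))
    rng.shuffle(perm)
    start = best = tree_cost(splits, perm, dist)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]
        c = tree_cost(splits, perm, dist)
        if c < best:
            best = c  # keep the improving swap
        else:
            perm[i], perm[j] = perm[j], perm[i]  # undo: never accept a worse tree
    return perm, best, start


# Hypothetical distances: six objects forming three tight pairs.
dist = [[0.0 if i == j else (0.1 if i // 2 == j // 2 else 1.0) for j in range(6)]
        for i in range(6)]
perm, best, start = hill_climb(dist)
```

Precomputing the embedded splits is only valid because the shape is held fixed; a version that also rearranges subtrees would have to recompute them after each mutation.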
Inflammatory Bowel Disease Diagnosis Using Metagenomic Classification
Inflammatory bowel disease (IBD) is a set of disorders that involve chronic inflammation of the digestive tract, e.g., Crohn's disease (CD) and ulcerative colitis (UC). Millions of people around the world have IBD. However, IBD remains difficult to treat because its cause is unknown. Accurately diagnosing IBD can also be challenging, since some IBD symptoms can mimic those of other conditions. In this work, we apply classification methods to help improve the success rate of diagnosis. We study four formulations of IBD classification: i) IBD and non-IBD (binary classification), ii) CD and non-IBD (binary classification), iii) UC and non-IBD (binary classification), and iv) CD, UC, and non-IBD (ternary classification). We have applied a number of classification methods, including decision trees, Naive Bayes, K-nearest neighbor, and rule-based classifiers, to these IBD classification problems using a metagenomic dataset collected from stool samples. Our study shows that a rule-based classifier achieves the best combination of classification accuracy and readability. We also explore the roles of attributes in the diagnosis of IBD based on interpretation of the learned models. Studying the importance of specific attributes could lead to a better understanding of IBD by either discovering new connections or reinforcing known ones.
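The abstract does not say which rule learner was used, so the following sketch swaps in OneR (a one-attribute rule learner) purely to show why rule-based models are readable: the learned model is a single lookup table from one attribute's values to a class. The attribute names (`taxon_A`, `taxon_B`), their discretized levels, and the samples are hypothetical, not the paper's metagenomic data. The relabeling step shows how the ternary formulation (CD, UC, non-IBD) collapses into the IBD vs. non-IBD formulation.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """OneR: for each attribute, map each value to its majority class and keep
    the attribute whose rule makes the fewest training errors.
    Returns (attribute, rule_dict, training_errors)."""
    best = None
    for attr in rows[0]:
        by_value = defaultdict(Counter)
        for row, y in zip(rows, labels):
            by_value[row[attr]][y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in by_value.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


def predict(model, row, default):
    attr, rule, _ = model
    return rule.get(row[attr], default)  # unseen value -> default class


# Hypothetical discretized abundances of two taxa per stool sample.
rows = [
    {"taxon_A": "low", "taxon_B": "high"}, {"taxon_A": "low", "taxon_B": "low"},
    {"taxon_A": "mid", "taxon_B": "high"}, {"taxon_A": "mid", "taxon_B": "low"},
    {"taxon_A": "high", "taxon_B": "high"}, {"taxon_A": "high", "taxon_B": "low"},
]
ternary = ["CD", "CD", "UC", "UC", "non-IBD", "non-IBD"]
binary = ["IBD" if y in ("CD", "UC") else "non-IBD" for y in ternary]

ternary_model = one_r(rows, ternary)
binary_model = one_r(rows, binary)
```

On this toy data `taxon_A` separates the classes perfectly, so both models select it; real abundance data would need discretization and held-out evaluation before any such rule is trusted.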
On the inducibility of small trees
The inducibility of a given topological tree (a rooted tree with no vertices of
outdegree $1$) is the quantity that captures the asymptotic value of the maximum
number of its appearances in an arbitrary tree with a sufficiently large number
of leaves. Its precise value is known only for some specific families of trees,
most of them exhibiting a symmetrical configuration. In an attempt to answer a
recent question posed by Czabarka, Székely, and the second author of this
article, we provide bounds for the inducibility of the binary tree whose
branches are a single leaf and a complete binary tree. It was indicated before
that the inducibility of this tree appears to be 'close' to a specific value; we
make this precise by proving explicit lower and upper bounds. We also consider
the problem of determining the inducibility of a further small tree, which is
the only topological tree with its number of leaves whose inducibility is
unknown.
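To illustrate what "number of appearances" measures, this sketch counts, for one fixed host tree, the leaf subsets whose induced topological subtree (restrict to the subset, then suppress outdegree-1 vertices) has the same shape as a pattern tree, normalized by the number of subsets of that size. The inducibility is then the limiting maximum of this density over large host trees, which the sketch does not attempt to compute. Trees are written as nested tuples of leaf labels, and the pattern `A` (one leaf plus a complete binary tree of height 2) is chosen as an assumption for illustration; the abstract's own leaf count and height are not given here.

```python
from itertools import combinations
from math import comb


def leaves(t):
    return [t] if isinstance(t, str) else [x for c in t for x in leaves(c)]


def restrict(t, keep):
    """Subtree induced by the leaves in `keep`, suppressing outdegree-1 vertices."""
    if isinstance(t, str):
        return t if t in keep else None
    kids = [r for r in (restrict(c, keep) for c in t) if r is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def shape(t):
    """Canonical unlabeled form: a leaf is (), children are sorted."""
    return () if isinstance(t, str) else tuple(sorted(shape(c) for c in t))


def density(host, pattern):
    """Fraction of k-leaf subsets of `host` inducing a copy of `pattern`."""
    k = len(leaves(pattern))
    target = shape(pattern)
    host_leaves = leaves(host)
    hits = sum(1 for sub in combinations(host_leaves, k)
               if shape(restrict(host, set(sub))) == target)
    return hits / comb(len(host_leaves), k)


B = (("a", "b"), ("c", "d"))   # complete binary tree of height 2
A = ("e", B)                   # branches: a single leaf and B
host = (("p", "q"), (("r", "s"), ("t", "u")))
d_host = density(host, A)
```

In `host`, only deleting `p` or `q` leaves a copy of `A`, so two of the six 5-leaf subsets qualify.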
Are galaxy distributions scale invariant? A perspective from dynamical systems theory
Unless there is evidence for fractal scaling with a single exponent over
distances 0.1 <= r <= 100 h^-1 Mpc, the widely accepted notion of scale
invariance of the correlation integral for 0.1 <= r <= 10 h^-1 Mpc must be
questioned. The attempt to extract a scaling exponent \nu from the correlation
integral n(r) by plotting log(n(r)) vs. log(r) is unreliable unless the
underlying point set is approximately monofractal. The extraction of a spectrum
of generalized dimensions \nu_q from a plot of the correlation integral
generating function G_n(q) by a similar procedure is probably an indication
that G_n(q) does not scale at all. We explain these assertions after defining
the term multifractal, since mutually inconsistent definitions have been
conflated in the cosmology literature. Part of this confusion is traced to a
misleading speculation made earlier in the dynamical systems theory literature,
while other errors follow from conflating entirely different
definitions of "multifractal" from two different schools of thought. Most
important are serious errors in data analysis that follow from taking for
granted a largest term approximation that is inevitably advertised in the
literature on both fractals and dynamical systems theory.
Comment: 39 pages, LaTeX with 17 eps files, using epsf.sty and a4wide.sty
(included)
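The log-log procedure that the abstract cautions about can be sketched as follows: estimate the correlation integral n(r), taken here (one common normalization, assumed) as the mean number of other points within distance r of a point, and fit the slope of log n(r) against log r by least squares. The test data are hypothetical: uniform random points in the unit square, a monofractal set, for which the fitted slope should come out near 2, slightly lowered by edge effects. This is a generic illustration, not the authors' analysis; per the abstract, the same fit is unreliable when the point set is not approximately monofractal.

```python
import bisect
import math
import random


def correlation_integral(points, radii):
    """n(r): mean number of other points within distance r of a point."""
    dists = sorted(math.dist(p, q)
                   for i, p in enumerate(points) for q in points[i + 1:])
    n_pts = len(points)
    # Each close pair contributes one neighbor to each of its two points.
    return [2 * bisect.bisect_right(dists, r) / n_pts for r in radii]


def loglog_slope(radii, values):
    """Least-squares slope of log(values) against log(radii)."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(v) for v in values]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(500)]
radii = [0.02 * 5 ** (i / 7) for i in range(8)]   # 0.02 .. 0.1, log-spaced
nu = loglog_slope(radii, correlation_integral(pts, radii))
```

A single fitted slope is meaningful only if log n(r) is actually close to a straight line over the whole fitted range; inspecting the residuals is a cheap check before reporting an exponent.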