6,457 research outputs found

    Ternary Syndrome Decoding with Large Weight

    The Syndrome Decoding problem is at the core of many code-based cryptosystems. In this paper, we study ternary Syndrome Decoding in large weight. This problem was introduced in the Wave signature scheme but has never been thoroughly studied. We perform an algorithmic study of this problem, which results in an update of the Wave parameters. On a more fundamental level, we show that ternary Syndrome Decoding with large weight is a significantly harder problem than binary Syndrome Decoding, which could have several applications in the design of code-based cryptosystems.
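    As a rough illustration of the problem statement only, the sketch below uses the standard formulation of Syndrome Decoding: given a parity-check matrix H, a syndrome s, and a target weight w, find a vector e with H e^T = s and Hamming weight w. The toy matrix, the names `syndrome`, `weight`, and `brute_force_decode`, and the exhaustive search are all assumptions made for this example; they are not the algorithms studied in the paper and are far too small and slow for any cryptographic use.

```python
from itertools import product

def syndrome(H, e, q=3):
    """Return H * e^T with entries reduced modulo q."""
    return tuple(sum(h * x for h, x in zip(row, e)) % q for row in H)

def weight(e):
    """Hamming weight: the number of nonzero coordinates of e."""
    return sum(1 for x in e if x != 0)

def brute_force_decode(H, s, w, q=3):
    """Exhaustively search for e with H e^T = s and Hamming weight w."""
    n = len(H[0])
    for e in product(range(q), repeat=n):
        if weight(e) == w and syndrome(H, e, q) == tuple(s):
            return e
    return None

# Hypothetical toy parity-check matrix over F_3 (illustration only).
H = [
    [1, 0, 2, 1, 0],
    [0, 1, 1, 2, 2],
]

# "Large weight" here means that most coordinates are nonzero; w = 5 = n
# asks for a solution with every coordinate nonzero.
e = brute_force_decode(H, s=(1, 2), w=5)
```

    In the binary case a full-weight vector is unique, whereas over F_3 each nonzero coordinate can still take two values, which is one intuitive reason the large-weight regime is only interesting for ternary codes.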

    A Fast Quartet Tree Heuristic for Hierarchical Clustering

    The Minimum Quartet Tree Cost problem is to construct an optimal-weight tree from the $3{n \choose 4}$ weighted quartet topologies on $n$ objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized hill climbing, for approximating the optimal-weight tree, given the quartet topology weights. The method repeatedly transforms a dendrogram, with all objects involved as leaves, achieving a monotonic approximation to the exact single globally optimal tree. The problem and the solution heuristic have been used extensively for general hierarchical clustering of nontree-like (non-phylogeny) data in various domains and across domains with heterogeneous data. We also present a greatly improved heuristic, reducing the running time by a factor on the order of a thousand to ten thousand. All this is implemented and available as part of the CompLearn package. We compare the performance and running time of the original and improved versions with those of UPGMA, BioNJ, and NJ, as implemented in the SplitsTree package, on genomic data for which the latter are optimized.
    Keywords: Data and knowledge visualization, Pattern matching--Clustering--Algorithms/Similarity measures, Hierarchical clustering, Global optimization, Quartet tree, Randomized hill-climbing
    Comment: LaTeX, 40 pages, 11 figures; this paper has substantial overlap with arXiv:cs/0606048 in cs.D
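    To make the count of $3{n \choose 4}$ quartet topologies concrete, the sketch below enumerates them for a small set of labels: each 4-subset {a, b, c, d} can be split into two pairs in exactly three ways. The function name `quartet_topologies` is an assumption made for this example and is not taken from the CompLearn package; the hill-climbing heuristic itself is not shown.

```python
from itertools import combinations
from math import comb

def quartet_topologies(objects):
    """Yield every quartet topology ab|cd over the given objects.

    Each 4-subset {a, b, c, d} admits exactly three pairings:
    ab|cd, ac|bd, and ad|bc.
    """
    for a, b, c, d in combinations(objects, 4):
        yield ((a, b), (c, d))
        yield ((a, c), (b, d))
        yield ((a, d), (b, c))

# Enumerate the topologies for n = 6 objects labelled 0..5.
topologies = list(quartet_topologies(range(6)))

# The number of topologies matches 3 * C(n, 4).
assert len(topologies) == 3 * comb(6, 4)
```

    Because the count grows on the order of n^4, evaluating the summed weight over all embedded quartets becomes expensive quickly, which is consistent with the abstract's emphasis on reducing running time.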

    Inflammatory Bowel Disease Diagnosis Using Metagenomic Classification

    Inflammatory bowel disease (IBD) is a set of disorders that involve chronic inflammation of the digestive tract, e.g., Crohn's disease (CD) and ulcerative colitis (UC). Millions of people around the world have inflammatory bowel disease. However, IBD is still difficult to treat because its cause is unknown. Accurately diagnosing IBD can also be very challenging, since some IBD symptoms can mimic those of other conditions. In this work, we apply classification methods to help improve the success rate of diagnosis. We study four formulations of IBD classification: i) IBD and non-IBD (binary classification), ii) CD and non-IBD (binary classification), iii) UC and non-IBD (binary classification), and iv) CD, UC, and non-IBD (ternary classification). We have applied a number of classification methods, including decision tree, Naive Bayes, K-nearest neighbor, and rule-based classifier, to the two IBD classification problems using a metagenomic dataset collected from stool samples. Our study shows that a rule-based classifier achieves the best combination of classification accuracy and readability. We also explored the roles of attributes in the diagnosis of IBD based on interpretation of the learned models. Studying the importance of specific attributes could lead to a better understanding of IBD by either discovering new connections or reinforcing known ones.

    On the inducibility of small trees

    The quantity that captures the asymptotic value of the maximum number of appearances of a given topological tree (a rooted tree with no vertices of outdegree $1$) $S$ with $k$ leaves in an arbitrary tree with a sufficiently large number of leaves is called the inducibility of $S$. Its precise value is known only for some specific families of trees, most of them exhibiting a symmetrical configuration. In an attempt to answer a recent question posed by Czabarka, Székely, and the second author of this article, we provide bounds for the inducibility $J(A_5)$ of the $5$-leaf binary tree $A_5$ whose branches are a single leaf and the complete binary tree of height $2$. It was indicated before that $J(A_5)$ appears to be 'close' to $1/4$. We can make this precise by showing that $0.24707\ldots \leq J(A_5) \leq 0.24745\ldots$. Furthermore, we also consider the problem of determining the inducibility of the tree $Q_4$, which is the only tree among $4$-leaf topological trees for which the inducibility is unknown.

    Are galaxy distributions scale invariant? A perspective from dynamical systems theory

    Unless there is evidence for fractal scaling with a single exponent over distances $0.1 \leq r \leq 100\,h^{-1}$ Mpc, the widely accepted notion of scale invariance of the correlation integral for $0.1 \leq r \leq 10\,h^{-1}$ Mpc must be questioned. The attempt to extract a scaling exponent $\nu$ from the correlation integral $n(r)$ by plotting $\log(n(r))$ vs. $\log(r)$ is unreliable unless the underlying point set is approximately monofractal. The extraction of a spectrum of generalized dimensions $\nu_q$ from a plot of the correlation integral generating function $G_n(q)$ by a similar procedure is probably an indication that $G_n(q)$ does not scale at all. We explain these assertions after defining the term multifractal, since mutually inconsistent definitions have been confused in the cosmology literature. Part of this confusion is traced to a misleading speculation made earlier in the dynamical systems theory literature, while other errors follow from conflating entirely different definitions of "multifractal" from two different schools of thought. Most important are serious errors in data analysis that follow from taking for granted a largest-term approximation that is inevitably advertised in the literature on both fractals and dynamical systems theory.
    Comment: 39 pages, LaTeX with 17 eps-files, using epsf.sty and a4wide.sty (included)
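    The log-log procedure that the abstract cautions against can be sketched as follows: estimate a correlation integral at several radii and fit a straight line to the logarithms. This is a minimal illustration, not the authors' analysis. The pair-fraction normalization, the function names `correlation_integral` and `loglog_slope`, the evenly spaced test points, and the choice of radii are all assumptions made for this example.

```python
import math

def correlation_integral(points, r):
    """Fraction of distinct point pairs separated by at most r."""
    n = len(points)
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(points[i], points[j]) <= r
    )
    return close / (n * (n - 1) / 2)

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Hypothetical test set: 400 evenly spaced points on a line segment,
# a set with a single scaling exponent, so the fitted slope is near 1.
points = [(i / 400, 0.0) for i in range(400)]
radii = [0.011, 0.022, 0.044, 0.088]
counts = [correlation_integral(points, r) for r in radii]
nu = loglog_slope(radii, counts)
```

    A straight-line fit always returns some slope, whether or not the point set actually scales, which is why the abstract treats the fitted exponent as meaningful only for approximately monofractal sets.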