
    A Hall-type theorem for triplet set systems based on medians in trees

    Given a collection $\mathcal{C}$ of subsets of a finite set $X$, let $\bigcup \mathcal{C} = \bigcup_{S \in \mathcal{C}} S$. Philip Hall's celebrated theorem \cite{hall} concerning `systems of distinct representatives' tells us that for any collection $\mathcal{C}$ of subsets of $X$ there exists an injective (i.e. one-to-one) function $f: \mathcal{C} \to X$ with $f(S) \in S$ for all $S \in \mathcal{C}$ if and only if $\mathcal{C}$ satisfies the property that for all non-empty subsets $\mathcal{C}'$ of $\mathcal{C}$ we have $|\bigcup \mathcal{C}'| \geq |\mathcal{C}'|$. Here we show that if the condition $|\bigcup \mathcal{C}'| \geq |\mathcal{C}'|$ is replaced by the stronger condition $|\bigcup \mathcal{C}'| \geq |\mathcal{C}'| + 2$, then we obtain a characterization of this condition for a collection of 3-element subsets of $X$ in terms of the existence of an injective function from $\mathcal{C}$ to the vertices of a tree whose vertex set includes $X$ and that satisfies a certain median condition. We then describe an extension of this result to collections of arbitrary-cardinality subsets of $X$. (Comment: 6 pages, no figures)
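
    To make the two inequality conditions concrete, here is a minimal brute-force checker (a sketch of the stated conditions, not code from the paper; the function name, `slack` parameter, and example collection are our own). Slack 0 gives Hall's condition and slack 2 the strengthened one; the enumeration is exponential in the number of sets, so it is illustrative only.

    ```python
    from itertools import combinations

    def satisfies_condition(collection, slack=0):
        # Check |union(C')| >= |C'| + slack for every non-empty
        # subcollection C' (exponential; illustrative only).
        for r in range(1, len(collection) + 1):
            for subcoll in combinations(collection, r):
                union = set().union(*subcoll)
                if len(union) < len(subcoll) + slack:
                    return False
        return True

    # Hall's condition (slack=0) vs. the strengthened condition (slack=2):
    triplets = [{1, 2, 3}, {3, 4, 5}, {5, 6, 7}]
    print(satisfies_condition(triplets, slack=0))  # True
    print(satisfies_condition(triplets, slack=2))  # True
    ```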

    Minimum triplet covers of binary phylogenetic X-trees

    Trees with labelled leaves and with all other vertices of degree three play an important role in systematic biology and other areas of classification. A classical combinatorial result ensures that such trees can be uniquely reconstructed from the distances between the leaves (when the edges are given any strictly positive lengths). Moreover, a linear number of these pairwise distance values suffices to determine both the tree and its edge lengths. A natural set of pairs of leaves is provided by any `triplet cover' of the tree (based on the fact that each non-leaf vertex is the median vertex of three leaves). In this paper we describe a number of new results concerning triplet covers of minimum size. In particular, we characterize such covers in terms of an associated graph being a 2-tree. Also, we show that minimum triplet covers are `shellable' and thereby provide a set of pairs for which the inter-leaf distance values will uniquely determine the underlying tree and its associated branch lengths.
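
    The median vertex referred to above is the unique vertex lying on all three pairwise paths between the chosen leaves. A minimal sketch of computing it by path intersection (the adjacency-dict encoding and names are our own, not from the paper):

    ```python
    from collections import deque

    def tree_path(adj, u, v):
        # Unique u-v path in a tree (adjacency dict), via BFS parents.
        parent = {u: None}
        queue = deque([u])
        while queue:
            w = queue.popleft()
            if w == v:
                break
            for x in adj[w]:
                if x not in parent:
                    parent[x] = w
                    queue.append(x)
        path, w = [], v
        while w is not None:
            path.append(w)
            w = parent[w]
        return path[::-1]

    def median(adj, x, y, z):
        # The unique vertex common to the three pairwise paths.
        common = (set(tree_path(adj, x, y))
                  & set(tree_path(adj, y, z))
                  & set(tree_path(adj, x, z)))
        assert len(common) == 1
        return common.pop()

    # A binary tree on leaves a, b, c, d with internal vertices u, v:
    adj = {'a': ['u'], 'b': ['u'], 'u': ['a', 'b', 'v'],
           'v': ['u', 'c', 'd'], 'c': ['v'], 'd': ['v']}
    print(median(adj, 'a', 'b', 'c'))  # 'u'
    ```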

    On Patchworks and Hierarchies

    Motivated by questions in biological classification, we discuss some elementary combinatorial and computational properties of certain set systems that generalize hierarchies, namely 'patchworks', 'weak patchworks', 'ample patchworks' and 'saturated patchworks'. We also outline how these concepts relate to an apparently new 'duality theory' for cluster systems that is based on the fundamental concept of 'compatibility' of clusters. (Comment: 17 pages, 2 figures)
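
    For context, the notion of compatibility standardly used for cluster systems is that two clusters are compatible when they are disjoint or nested, and a hierarchy is a cluster system in which every pair of clusters is compatible. A minimal sketch of this standard notion (our own illustration; the paper's precise definitions may differ):

    ```python
    def compatible(a, b):
        # Standard notion: clusters are compatible iff disjoint or nested.
        a, b = set(a), set(b)
        return not (a & b) or a <= b or b <= a

    def is_hierarchy(clusters):
        # A cluster system is a hierarchy iff all pairs are compatible.
        cs = [set(c) for c in clusters]
        return all(compatible(a, b)
                   for i, a in enumerate(cs) for b in cs[i + 1:])

    print(is_hierarchy([{1, 2}, {1, 2, 3}, {4, 5}]))  # True
    print(is_hierarchy([{1, 2}, {2, 3}]))             # False: overlap, not nested
    ```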

    Combinatorial properties of triplet covers for binary trees

    It is a classical result that an unrooted tree $T$ having positive real-valued edge lengths and no vertices of degree two can be reconstructed from the induced distance between each pair of leaves. Moreover, if each non-leaf vertex of $T$ has degree 3 then the number of distance values required is linear in the number of leaves. A canonical candidate for such a set of pairs of leaves in $T$ is the following: for each non-leaf vertex $v$, choose a leaf in each of the three components of $T - v$, group these three leaves into three pairs, and take the union of this set over all choices of $v$. This forms a so-called `triplet cover' for $T$. In the first part of this paper we answer an open question (from 2012) by showing that the induced leaf-to-leaf distances for any triplet cover for $T$ uniquely determine $T$ and its edge lengths. We then investigate the finer combinatorial properties of triplet covers. In particular, we describe the structure of triplet covers that satisfy one or more of the following properties of being minimal, `sparse', and `shellable'.
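
    The construction just described translates directly into code. A sketch that builds one triplet cover of a small binary tree (the tree encoding and the rule for picking a leaf in each component are our own; the construction leaves the leaf choices arbitrary):

    ```python
    def components_after_removal(adj, v):
        # Connected components of T - v, as sets of vertices.
        seen, comps = {v}, []
        for start in adj[v]:
            comp, stack = set(), [start]
            while stack:
                w = stack.pop()
                if w in seen:
                    continue
                seen.add(w)
                comp.add(w)
                stack.extend(adj[w])
            comps.append(comp)
        return comps

    def triplet_cover(adj):
        # For each internal vertex v of a binary tree, pick one leaf in
        # each of the three components of T - v and take the three pairs
        # among those leaves; the union over all v is a triplet cover.
        leaves = {w for w, nbrs in adj.items() if len(nbrs) == 1}
        cover = set()
        for v in adj:
            if v in leaves:
                continue
            a, b, c = (min(comp & leaves)
                       for comp in components_after_removal(adj, v))
            cover |= {frozenset(p) for p in [(a, b), (b, c), (a, c)]}
        return cover

    adj = {'a': ['u'], 'b': ['u'], 'u': ['a', 'b', 'v'],
           'v': ['u', 'c', 'd'], 'c': ['v'], 'd': ['v']}
    print(sorted(tuple(sorted(p)) for p in triplet_cover(adj)))
    # [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('c', 'd')]
    ```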

    Phylogenetic Flexibility via Hall-Type Inequalities and Submodularity

    Given a collection τ of subsets of a finite set X, we say that τ is phylogenetically flexible if, for any collection R of rooted phylogenetic trees whose leaf sets comprise the collection τ, R is compatible (i.e. there is a rooted phylogenetic X-tree that displays each tree in R). We show that τ is phylogenetically flexible if and only if it satisfies a Hall-type inequality condition of being 'slim'. Using submodularity arguments, we show that there is a polynomial-time algorithm for determining whether or not τ is slim. This 'slim' condition reduces to a simpler inequality in the case where all of the sets in τ have size 3, a property we call 'thin'. Thin sets were recently shown to be equivalent to the existence of an (unrooted) tree for which the median function provides an injective mapping to its vertex set; we show here that the unrooted tree in this representation can always be chosen to be a caterpillar tree. We also characterise when a collection τ of subsets of size 2 is thin (in terms of the flexibility of total orders rather than phylogenies) and show that this holds if and only if an associated bipartite graph is a forest. The significance of our results for phylogenetics is in providing precise and efficiently verifiable conditions under which supertree methods that require consistent inputs of trees can be applied to any input trees on given subsets of species.
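
    The size-2 characterisation reduces the question to testing whether a graph is a forest, i.e. acyclic. The abstract does not spell out how the associated bipartite graph is built, so the sketch below only illustrates the generic forest test, via union-find cycle detection:

    ```python
    def is_forest(num_vertices, edges):
        # A graph is a forest iff no edge joins two vertices that are
        # already in the same connected component (union-find).
        parent = list(range(num_vertices))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                return False  # this edge closes a cycle
            parent[ru] = rv
        return True

    print(is_forest(4, [(0, 1), (1, 2), (2, 3)]))  # True: a path
    print(is_forest(3, [(0, 1), (1, 2), (2, 0)]))  # False: a triangle
    ```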

    Acta Cybernetica: Volume 14, Number 2.


    Some Bayesian and multivariate analysis methods in statistical machine learning and applications

    In this dissertation, we consider some Bayesian and multivariate analysis methods in statistical machine learning, as well as some applications of Bayesian methodology with differential equation models to study dynamics during co-infections by Leishmania major and Leishmania amazonensis based on longitudinal data. First, we developed a new MCMC algorithm that integrates the curvature information of a target distribution in order to sample it accurately and efficiently. We then introduced a Bayesian Hierarchical Topographic Clustering method (BHTC), motivated by the well-known self-organizing map (SOM), using stationary isotropic Gaussian processes and principal component approximations. We constructed a computationally tractable MCMC algorithm to sample the posterior distributions of the covariance matrices, as well as the posterior distributions of the remaining BHTC parameters. To summarize the posterior distributions of the BHTC parameters in a coherent fashion for the purpose of data clustering, we adopted a posterior risk framework that accounts for both data partitioning and topographic preservation. We also proposed a classification method based on the weighted bootstrap and an ensemble mechanism to deal with covariate shift in classification: the Active Set Selections based Classification (ASSC). This procedure can flexibly be combined with classification methods such as the support vector machine (SVM), classification trees, and Fisher's discriminant classifier (LDA) to improve their performance. Finally, we adopted Bayesian methodologies to study longitudinal data from co-infections by Leishmania major and Leishmania amazonensis. In the proposed Bayesian analysis, we modeled the immunobiological dynamics and the data variations by Lotka-Volterra equations and a linear mixed model, respectively. Using the posterior distributions of the differential equation parameters and the concept of an asymptotically stable equilibrium of differential equations, we successfully quantified the immune efficiency.
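
    As an illustration of the kind of differential equation model used in the final part, here is a minimal simulation of the classic two-species Lotka-Volterra system with SciPy; the parameter values and initial conditions are placeholders, not estimates from the dissertation:

    ```python
    import numpy as np
    from scipy.integrate import solve_ivp

    def lotka_volterra(t, y, alpha, beta, gamma, delta):
        # dx/dt = alpha*x - beta*x*z,  dz/dt = delta*x*z - gamma*z
        x, z = y
        return [alpha * x - beta * x * z, delta * x * z - gamma * z]

    params = (1.0, 0.1, 1.5, 0.075)        # illustrative values only
    sol = solve_ivp(lotka_volterra, (0.0, 30.0), [10.0, 5.0],
                    args=params, dense_output=True)
    print(sol.sol(np.linspace(0.0, 30.0, 5)))  # trajectories at 5 time points
    ```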

    16th Scandinavian Symposium and Workshops on Algorithm Theory: SWAT 2018, June 18-20, 2018, Malmö University, Malmö, Sweden


    Stealthy attacks and defense strategies in competing sensor networks

    The fundamental objective of sensor networks underpinning a variety of applications is the collection of reliable information from the surrounding environment. The correctness of the collected data is especially important in applications involving societal welfare and safety, in which the acquired information may be utilized by end-users for decision-making. The distributed nature of sensor networks and their deployment in unattended and potentially hostile environments, however, render this collection task challenging for both scalar and visual data. In this work we propose and address the twin problems of carrying out and defending against a stealthy attack on the information gathered by a sensor network at the physical sensing layer, as perpetrated by a competing hostile network. A stealthy attack in this context is an intelligent attempt to disinform a sensor network in a manner that mitigates attack discovery. In comparison with previous sensor network security studies, we explicitly model the attack scenario as an active competition between two networks, where difficulties arise from the pervasive nature of the attack, the possibility of tampering during data acquisition prior to encryption, and the lack of prior knowledge regarding the characteristics of the attack. We examine the problem from the perspectives of both the hostile and the legitimate network. The interaction between the networks is modeled as a game for which a stealth utility is derived and shown to be consistent for both players in the case of stealthy direct attacks and stealthy cross attacks. Based on the stealth utility, the optimal attack and defense strategies are obtained for each network. For the legitimate network, minimizing the attacker's stealth makes attack detection possible through established paradigms and allows the power of the attack to be mitigated. For the hostile network, maximizing the stealth utility translates into optimal attack avoidance. This attack avoidance does not require active communication among the hostile nodes but rather relies on a level of coordination which we quantify. We demonstrate the significance and effectiveness of the solution for sensor networks acquiring scalar and multidimensional data such as surveillance sequences, and relate the results to existing image sensor networks. Finally, we discuss the implications of these results for achieving secure event acquisition in unattended environments.

    Evolution of whole genomes through inversions:models and algorithms for duplicates, ancestors, and edit scenarios

    Advances in sequencing technology are yielding DNA sequence data at an alarming rate – a rate reminiscent of Moore's law. Biologists' abilities to analyze this data, however, have not kept pace. On the other hand, the discrete and mechanical nature of the cell life-cycle has been tantalizing to computer scientists. Thus in the 1980s, pioneers of the field now called Computational Biology began to uncover a wealth of computer science problems, some confronting modern biologists and some hidden in the annals of the biological literature. In particular, many interesting twists were introduced to classical string matching, sorting, and graph problems. One such problem, first posed in 1941 but rediscovered in the early 1980s, is that of sorting by inversions (also called reversals): given two permutations, find the minimum number of inversions required to transform one into the other, where an inversion inverts the order of a subpermutation. Indeed, many genomes have evolved mostly or only through inversions. Thus it becomes possible to trace evolutionary histories by inferring sequences of such inversions that led to today's genomes from a distant common ancestor. But unlike the classic edit distance problem, where string editing was relatively simple, editing permutations in this way has proved to be more complex. In this dissertation, we extend the theory so as to make these edit distances more broadly applicable and faster to compute, and work towards more powerful tools that can accurately infer evolutionary histories. In particular, we present work that for the first time considers genomic distances between any pair of genomes, with no limitation on the number of occurrences of a gene. Next we show that there are conditions under which an ancestral genome (or one close to the true ancestor) can be reliably reconstructed. Finally we present new methodology that computes a minimum-length sequence of inversions to transform one permutation into another in, on average, O(n log n) steps, whereas the best worst-case algorithm to compute such a sequence uses O(n√n log n) steps.
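
    To make the sorting-by-inversions problem concrete, here is a brute-force breadth-first search over permutations (our own illustration; it takes exponential time and space, unlike the algorithms developed in the dissertation):

    ```python
    from collections import deque

    def reversal_distance(start, target):
        # Minimum number of reversals (inverting a contiguous block)
        # transforming one permutation into the other; tiny n only.
        start, target = tuple(start), tuple(target)
        dist = {start: 0}
        queue = deque([start])
        while queue:
            p = queue.popleft()
            if p == target:
                return dist[p]
            n = len(p)
            for i in range(n):
                for j in range(i + 2, n + 1):   # reverse the block p[i:j]
                    q = p[:i] + p[i:j][::-1] + p[j:]
                    if q not in dist:
                        dist[q] = dist[p] + 1
                        queue.append(q)

    print(reversal_distance([3, 1, 2], [1, 2, 3]))  # 2
    ```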