
    On the occurrences of motifs in recursive trees, with applications to random structures

    Get PDF
    In this dissertation we study three problems related to motifs and recursive trees. In the first problem we consider a collection of uncorrelated motifs and their occurrences on the fringe of random recursive trees. We compute the exact mean and variance of the multivariate random vector of the counts of occurrences of the motifs. We further use the Cramér-Wold device and the contraction method to show asymptotic convergence in distribution to a multivariate normal random variable with this mean and variance. The second problem we study is the probability that a collection of motifs (of the same size) does not occur on the fringe of recursive trees. Here we use analytic and complex-valued methods to characterize this asymptotic probability. The asymptotics are complemented with human-assisted Maple computation. We are able to completely characterize the asymptotic probability for two families of growing motifs. In the third problem we introduce a new tree model in which, at each time step, a new block (motif) is joined to the tree. This is one of the earlier investigations in the random-tree literature of such a model, i.e., one in which trees grow from building blocks that are themselves trees. We consider building blocks of the same size and characterize the number of leaves, the depth of insertion, the total path length and the height of such trees. The tools used in this analysis include stochastic recurrences, Pólya urn theory, moment generating functions and martingales.
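    As a concrete illustration of the objects in this abstract (not the dissertation's method), here is a minimal Python sketch that grows a random recursive tree and counts fringe occurrences of one small motif, a "cherry": a node whose fringe subtree is exactly two leaf children. All function names are hypothetical.

```python
import random

def random_recursive_tree(n, seed=0):
    """Grow a random recursive tree: node i attaches uniformly to one of 0..i-1."""
    rng = random.Random(seed)
    children = {0: []}
    for i in range(1, n):
        parent = rng.randrange(i)
        children[parent].append(i)
        children[i] = []
    return children

def fringe_subtree_sizes(children):
    """Size of the fringe subtree rooted at each node.

    In a recursive tree labels increase away from the root, so processing
    nodes in decreasing label order visits children before parents.
    """
    size = {}
    for v in sorted(children, reverse=True):
        size[v] = 1 + sum(size[c] for c in children[v])
    return size

def count_motif_cherries(children):
    """Count fringe occurrences of the 'cherry' motif: two leaf children."""
    return sum(
        1 for v, ch in children.items()
        if len(ch) == 2 and all(not children[c] for c in ch)
    )
```

    Averaging `count_motif_cherries` over many independently grown trees gives a simulation estimate of the mean motif count whose limit behaviour the dissertation characterizes exactly.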

    Convex minorant trees associated with Brownian paths and the continuum limit of the minimum spanning tree

    Full text link
    We give an explicit construction of the scaling limit of the minimum spanning tree of the complete graph. The limit object is described using a recursive construction involving the convex minorants of a Brownian motion with parabolic drift (and countably many i.i.d. uniform random variables); we call it the Brownian parabolic tree. Aside from the new representation, this point of view has multiple consequences. For instance, it permits us to prove that its Hausdorff dimension is almost surely 3. It also intrinsically contains information related to some underlying dynamics: one notable by-product is the construction of a standard metric multiplicative coalescent which couples the scaling limits of random graphs at different points of the critical window in terms of the same simple building blocks. The above results actually fit in a more general framework. They result from the introduction of a new family of continuum random trees associated with functions via their convex minorants, which we call convex minorant trees. We initiate the study of these structures in the case of Brownian-like paths. In passing, we prove that the convex minorant tree of a Brownian excursion is a Brownian continuum random tree, and that it provides a coupling between the Aldous--Pitman fragmentation of the Brownian continuum random tree and its representation by Bertoin. Comment: 56 pages, 2 figures
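    The convex minorant at the heart of this construction is easy to compute for a discrete path. A hedged sketch, assuming a path given by its values at integer times, using the standard monotone-chain lower-hull cross-product test (a discrete stand-in, not the paper's continuum construction):

```python
def convex_minorant(ys):
    """Vertices of the convex minorant (lower convex hull) of points (i, ys[i])."""
    hull = []
    for p in enumerate(ys):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # pop the middle vertex if it lies on or above the chord hull[-2] -> p
            if (x2 - x1) * (p[1] - y1) <= (p[0] - x1) * (y2 - y1):
                hull.pop()
            else:
                break
        hull.append(p)
    return hull
```

    Feeding in a simulated random-walk approximation of Brownian motion with parabolic drift yields the finite-path analogue of the minorants the paper builds its trees from.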

    Automatic generation of hardware Tree Classifiers

    Full text link
    Machine learning is growing in popularity and spreading across different fields for various applications. Due to this trend, machine learning algorithms are being run on different hardware platforms and tuned to obtain high test accuracy and throughput. FPGAs are a well-suited hardware platform for machine learning because of their re-programmability and lower power consumption. Programming FPGAs for machine learning algorithms, however, requires substantial engineering time and effort compared to a software implementation. We propose a software-assisted design flow that programs an FPGA for machine learning algorithms using our hardware library. The hardware library is highly parameterized and accommodates tree classifiers. At present, the library consists of the components required to implement decision trees and random forests. The whole automation is wrapped in a Python script that takes the user from the first step of having a dataset and design choices to the last step of having hardware description code for the trained machine learning model.
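    The final step of such a flow, turning a trained tree into combinational logic, can be caricatured in a few lines. This toy emitter is not the authors' library: trees are hypothetical nested tuples (feature, threshold, left, right), and the output is a nested C/Verilog-style conditional expression that a synthesis tool could map to comparators and multiplexers.

```python
def tree_to_expr(node):
    """Emit a nested conditional expression for a decision tree.

    A node is either a class label (leaf) or a tuple
    (feature_name, threshold, left_subtree, right_subtree).
    """
    if not isinstance(node, tuple):
        return str(node)  # leaf: constant class label
    feat, thr, left, right = node
    return "(({} <= {}) ? {} : {})".format(
        feat, thr, tree_to_expr(left), tree_to_expr(right))
```

    A random forest would emit one such expression per tree plus a majority-vote reduction; the real library additionally parameterizes bit widths and pipelining.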

    Quantifying loopy network architectures

    Get PDF
    Biology presents many examples of planar distribution and structural networks having dense sets of closed loops. An archetype of this form of network organization is the vasculature of dicotyledonous leaves, which showcases a hierarchically nested architecture containing closed loops at many different levels. Although a number of methods have been proposed to measure aspects of the structure of such networks, a robust metric to quantify their hierarchical organization is still lacking. We present an algorithmic framework, the hierarchical loop decomposition, that allows mapping loopy networks to binary trees, preserving in the connectivity of the trees the architecture of the original graph. We apply this framework to investigate computer-generated graphs, such as artificial models and optimal distribution networks, as well as natural graphs extracted from digitized images of dicotyledonous leaves and of the vasculature of rat cerebral neocortex. We calculate various metrics based on the asymmetry, the cumulative size distribution and the Strahler bifurcation ratios of the corresponding trees, and discuss the relationship of these quantities to the architectural organization of the original graphs. This algorithmic framework decouples the geometric information (exact location of edges and nodes) from the metric topology (connectivity and edge weight) and ultimately allows us to perform a quantitative statistical comparison between predictions of theoretical models and naturally occurring loopy graphs. Comment: 17 pages, 8 figures. During preparation of this manuscript the authors became aware of the work of Mileyko et al., concurrently submitted for publication
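    One of the tree metrics mentioned, the Strahler number underlying the bifurcation ratios, is straightforward to compute on the binary trees the decomposition produces. A minimal sketch, assuming trees encoded as nested (left, right) tuples with None for leaves (a hypothetical encoding, not the paper's):

```python
def strahler(node):
    """Strahler number of a binary tree.

    A leaf has order 1; an internal node whose children have equal
    order o gets o + 1, otherwise it inherits the larger child order.
    """
    if node is None:  # leaf
        return 1
    a, b = strahler(node[0]), strahler(node[1])
    return a + 1 if a == b else max(a, b)
```

    The bifurcation ratio is then estimated from the counts of branches at successive Strahler orders across the whole tree.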

    Finiteness theorems in stochastic integer programming

    Full text link
    We study Graver test sets for families of linear multi-stage stochastic integer programs with varying number of scenarios. We show that these test sets can be decomposed into finitely many ``building blocks'', independent of the number of scenarios, and we give an effective procedure to compute these building blocks. The paper includes an introduction to Nash-Williams' theory of better-quasi-orderings, which is used to show termination of our algorithm. We also apply this theory to finiteness results for Hilbert functions. Comment: 36 pages

    Repeated patterns in tree genetic programming

    Get PDF
    We extend our analysis of repetitive patterns found in genetic programming genomes to tree-based GP. As in linear GP, repetitive patterns are present in large numbers. Size fair crossover limits bloat in automatic programming, preventing the evolution of recurring motifs. We examine these complex properties in detail: e.g. using depth v. size Catalan binary tree shape plots, subgraph and subtree matching, information entropy, syntactic and semantic fitness correlations and diffuse introns. We relate this emergent phenomenon to considerations about building blocks in GP and how GP works.
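    Repeated subtrees of the kind analyzed here can be counted by canonicalizing each subtree to a string and tallying duplicates. A hedged sketch (not the paper's matching procedure), with expression trees as hypothetical (op, child, ...) tuples:

```python
from collections import Counter

def canon(node):
    """Canonical string for an expression subtree; terminals are bare names."""
    if not isinstance(node, tuple):
        return str(node)
    return "(" + node[0] + " " + " ".join(canon(c) for c in node[1:]) + ")"

def repeated_subtrees(root):
    """Map each subtree pattern occurring more than once to its count."""
    counts = Counter()
    def walk(n):
        counts[canon(n)] += 1
        if isinstance(n, tuple):
            for c in n[1:]:
                walk(c)
    walk(root)
    return {s: c for s, c in counts.items() if c > 1}
```

    Running this over evolved GP genomes is one simple way to make the "repetitive patterns are present in large numbers" observation quantitative.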

    COMET: A Recipe for Learning and Using Large Ensembles on Massive Data

    Full text link
    COMET is a single-pass MapReduce algorithm for learning on large-scale data. It builds multiple random forest ensembles on distributed blocks of data and merges them into a mega-ensemble. This approach is appropriate when learning from massive-scale data that is too large to fit on a single machine. To get the best accuracy, IVoting should be used instead of bagging to generate the training subset for each decision tree in the random forest. Experiments with two large datasets (5GB and 50GB compressed) show that COMET compares favorably (in both accuracy and training time) to learning on a subsample of the data using a serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble evaluation which dynamically decides how many ensemble members to evaluate per data point; this can reduce evaluation cost by 100X or more.
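    The two ensemble-level ideas, merging block-trained forests and lazily evaluating members, can be sketched generically. Members are modeled as plain callables, and the stopping rule below is an exact cannot-be-overtaken vote test, a simplified stand-in for COMET's Gaussian criterion:

```python
from collections import Counter

def merge_ensembles(ensembles):
    """Mega-ensemble: concatenate members trained on separate data blocks."""
    return [m for ens in ensembles for m in ens]

def lazy_vote(members, x):
    """Majority vote with early stopping: quit once the current leader
    cannot be overtaken by the members not yet evaluated."""
    votes = Counter()
    for i, m in enumerate(members):
        votes[m(x)] += 1
        ranked = votes.most_common(2)
        lead = ranked[0][1] - (ranked[1][1] if len(ranked) > 1 else 0)
        if lead > len(members) - i - 1:  # remaining votes cannot flip it
            break
    return votes.most_common(1)[0][0]
```

    The exact rule never changes the predicted class; the paper's Gaussian variant trades a small error probability for stopping much earlier.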

    An Overview of Schema Theory

    Full text link
    The purpose of this paper is to give an introduction to the field of Schema Theory written by a mathematician and for mathematicians. In particular, we endeavor to highlight areas of the field which might be of interest to a mathematician, to point out some related open problems, and to suggest some large-scale projects. Schema theory seeks to give a theoretical justification for the efficacy of the field of genetic algorithms, so readers who have studied genetic algorithms stand to gain the most from this paper. However, nothing beyond basic probability theory is assumed of the reader, and for this reason we write in a fairly informal style. Because the mathematics behind the theorems in schema theory is relatively elementary, we focus more on the motivation and philosophy. Many of these results have been proven elsewhere, so this paper is designed to serve a primarily expository role. We attempt to cast known results in a new light, which makes the suggested future directions natural. This involves devoting a substantial amount of time to the history of the field. We hope that this exposition will entice some mathematicians to do research in this area, that it will serve as a road map for researchers new to the field, and that it will help explain how schema theory developed. Furthermore, we hope that the results collected in this document will serve as a useful reference. Finally, as far as the author knows, the questions raised in the final section are new. Comment: 27 pages. Originally written in 2009 and hosted on my website, I've decided to put it on the arXiv as a more permanent home. The paper is primarily expository, so I don't really know where to submit it, but perhaps one day I will find an appropriate journal
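    The basic schema-level quantities the theory manipulates (matching, order, defining length, mean schema fitness over a population) are easy to compute directly. A minimal sketch over bitstring populations, with all names hypothetical:

```python
def matches(schema, s):
    """Does bitstring s fit schema? '*' is a wildcard, e.g. '1**0'."""
    return all(c == '*' or c == b for c, b in zip(schema, s))

def order(schema):
    """Number of fixed (non-wildcard) positions."""
    return sum(c != '*' for c in schema)

def defining_length(schema):
    """Distance between the outermost fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0] if fixed else 0

def schema_stats(schema, population, fitness):
    """Count of schema instances in the population and their mean fitness."""
    inst = [s for s in population if matches(schema, s)]
    avg = sum(fitness(s) for s in inst) / len(inst) if inst else 0.0
    return len(inst), avg
```

    These are exactly the inputs to Holland's schema theorem, which lower-bounds the expected instance count in the next generation in terms of relative schema fitness, order, and defining length.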

    Coagulation--fragmentation duality, Poisson--Dirichlet distributions and random recursive trees

    Full text link
    In this paper we give a new example of duality between fragmentation and coagulation operators. Consider the space of partitions of mass (i.e., decreasing sequences of nonnegative real numbers whose sum is 1) and the two-parameter family of Poisson--Dirichlet distributions PD(α, θ) that take values in this space. We introduce families of random fragmentation and coagulation operators Frag_α and Coag_{α,θ}, respectively, with the following property: if the input to Frag_α has PD(α, θ) distribution, then the output has PD(α, θ+1) distribution, while the reverse is true for Coag_{α,θ}. This result may be proved using a subordinator representation and it provides a companion set of relations to those of Pitman between PD(α, θ) and PD(αβ, θ). Repeated application of the Frag_α operators gives rise to a family of fragmentation chains. We show that these Markov chains can be encoded naturally by certain random recursive trees, and use this representation to give an alternative and more concrete proof of the coagulation--fragmentation duality. Comment: Published at http://dx.doi.org/10.1214/105051606000000655 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
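    Samples from PD(α, θ) can be approximated via the standard two-parameter stick-breaking (GEM) construction, with V_i ~ Beta(1−α, θ+iα); note that sorting a finite truncation of the size-biased masses only approximates the ranked Poisson--Dirichlet sequence. A sketch of that construction, not of the paper's subordinator representation:

```python
import random

def sample_pd(alpha, theta, n, seed=0):
    """Approximate a PD(alpha, theta) sample by two-parameter stick-breaking.

    Breaks a unit stick n times with V_i ~ Beta(1 - alpha, theta + i*alpha)
    (requires 0 <= alpha < 1 and theta > -alpha), then ranks the masses.
    """
    rng = random.Random(seed)
    masses, stick = [], 1.0
    for i in range(1, n + 1):
        v = rng.betavariate(1 - alpha, theta + i * alpha)
        masses.append(stick * v)   # mass broken off this round
        stick *= 1 - v             # remaining stick
    return sorted(masses, reverse=True)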

    Prospects and limitations of full-text index structures in genome analysis

    Get PDF
    The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared
    corecore