On the occurrences of motifs in recursive trees, with applications to random structures
In this dissertation we study three problems related to motifs and recursive trees. In the first problem we consider a collection of uncorrelated motifs and their occurrences on the fringe of random recursive trees. We compute the exact mean and variance of the multivariate random vector of the counts of occurrences of the motifs. We further use the Cramér-Wold device and the contraction method to show asymptotic convergence in distribution to a multivariate normal random variable with this mean and variance.
The second problem we study is the probability that a collection of motifs (of the same size) does not occur on the fringe of recursive trees. Here we use analytic and complex-valued methods to characterize this probability asymptotically. The asymptotics are complemented with human-assisted Maple computations. We are able to completely characterize the asymptotic probability for two families of growing motifs.
In the third problem we introduce a new tree model in which, at each time step, a new block (motif) is joined to the tree. This is one of the earliest investigations in the random tree literature of such a model, i.e., one in which trees grow from building blocks that are themselves trees. We consider building blocks of the same size and characterize the number of leaves, the depth of insertion, the total path length and the height of such trees. The tools used in this analysis include stochastic recurrences, Pólya urn theory, moment generating functions and martingales.
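As a minimal illustration of fringe occurrences, the sketch below grows a random recursive tree (each new node attaches to a uniformly chosen earlier node) and counts fringe subtrees by size. Two simplifications are assumed here: a "motif" is reduced to a subtree size rather than a specific shape, and all function names are hypothetical. The averages can be compared with the known mean n/(k(k+1)) of size-k fringe subtrees for 1 ≤ k < n.

```python
import random
from collections import Counter


def random_recursive_tree(n: int, rng: random.Random) -> list:
    """Parent array of a random recursive tree on nodes 0..n-1 (node 0 is the root).

    Node i >= 1 attaches to a parent chosen uniformly from the earlier nodes 0..i-1."""
    parent = [-1] * n
    for i in range(1, n):
        parent[i] = rng.randrange(i)
    return parent


def fringe_subtree_counts(parent: list) -> Counter:
    """Map k -> number of fringe subtrees with k nodes (every node roots exactly one)."""
    size = [1] * len(parent)
    # A parent always has a smaller label than its children, so sweeping labels in
    # decreasing order finishes each subtree before it is added to its parent.
    for i in range(len(parent) - 1, 0, -1):
        size[parent[i]] += size[i]
    return Counter(size)


if __name__ == "__main__":
    rng = random.Random(2024)
    n, trials = 200, 1000
    totals = Counter()
    for _ in range(trials):
        totals.update(fringe_subtree_counts(random_recursive_tree(n, rng)))
    for k in (1, 2, 3):
        # empirical mean vs. n / (k (k + 1))
        print(k, totals[k] / trials, n / (k * (k + 1)))
```

Counting a specific shape rather than a size would replace the `size` array with a canonical encoding of each fringe subtree.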
Convex minorant trees associated with Brownian paths and the continuum limit of the minimum spanning tree
We give an explicit construction of the scaling limit of the minimum spanning
tree of the complete graph. The limit object is described using a recursive
construction involving the convex minorants of a Brownian motion with parabolic
drift (and countably many i.i.d. uniform random variables); we call it the
Brownian parabolic tree.
Aside from the new representation, this point of view has multiple
consequences. For instance, it permits us to prove that its Hausdorff dimension
is almost surely 3. It also intrinsically contains information related to some
underlying dynamics: one notable by-product is the construction of a standard
metric multiplicative coalescent which couples the scaling limits of random
graphs at different points of the critical window in terms of the same simple
building blocks.
The above results actually fit in a more general framework. They result from
the introduction of a new family of continuum random trees associated with
functions via their convex minorants, that we call convex minorant trees. We
initiate the study of these structures in the case of Brownian-like paths. In
passing, we prove that the convex minorant tree of a Brownian excursion is a
Brownian continuum random tree, and that it provides a coupling between the
Aldous--Pitman fragmentation of the Brownian continuum random tree and its
representation by Bertoin. Comment: 56 pages, 2 figures
Automatic generation of hardware Tree Classifiers
Machine learning is growing in popularity and spreading across many fields and applications. As a result, machine learning algorithms are being deployed and tested on a range of hardware platforms in pursuit of high test accuracy and throughput. FPGAs are a well-suited hardware platform for machine learning because of their reprogrammability and lower power consumption. However, programming FPGAs for machine learning algorithms requires substantially more engineering time and effort than a software implementation. We propose a software-assisted design flow that programs FPGAs for machine learning algorithms using our hardware library. The hardware library is highly parameterized and accommodates tree classifiers; at present, it contains the components required to implement decision trees and random forests. The whole flow is automated by a Python script that takes the user from the first step, a dataset and a set of design choices, to the last step, hardware description code for the trained machine learning model.
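The abstract does not show the library's interface. The following is therefore a hypothetical sketch of the last step of such a flow: turning one trained decision tree into combinational Verilog. The `Node` structure, the `to_expr`/`to_verilog` helpers, the port names `x0, x1, ...` and `class_out`, and the use of unsigned integer thresholds are all assumptions. None of them come from the authors' hardware library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical trained-tree node: a leaf if `label` is set, else a threshold test."""
    feature: int = -1
    threshold: int = 0
    left: Optional["Node"] = None   # taken when x[feature] <= threshold
    right: Optional["Node"] = None  # taken otherwise
    label: Optional[int] = None


def to_expr(node: Node, label_bits: int) -> str:
    """Nested Verilog conditional expression equivalent to the tree."""
    if node.label is not None:
        return f"{label_bits}'d{node.label}"
    left = to_expr(node.left, label_bits)
    right = to_expr(node.right, label_bits)
    return f"((x{node.feature} <= {node.threshold}) ? {left} : {right})"


def to_verilog(root: Node, n_features: int, feature_bits: int, n_classes: int,
               name: str = "tree_classifier") -> str:
    """Emit a purely combinational module with one unsigned input port per feature."""
    label_bits = max(1, (n_classes - 1).bit_length())
    ports = ",\n".join(f"    input  wire [{feature_bits - 1}:0] x{i}" for i in range(n_features))
    return (f"module {name} (\n{ports},\n"
            f"    output wire [{label_bits - 1}:0] class_out\n);\n"
            f"    assign class_out = {to_expr(root, label_bits)};\n"
            "endmodule\n")


if __name__ == "__main__":
    tree = Node(0, 12, Node(1, 5, Node(label=0), Node(label=1)), Node(label=2))
    print(to_verilog(tree, n_features=2, feature_bits=8, n_classes=3))
```

Each internal node becomes a comparator feeding a multiplexer. A production flow would likely pipeline deep trees and add a voting stage to combine the trees of a random forest.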
Quantifying loopy network architectures
Biology presents many examples of planar distribution and structural networks
having dense sets of closed loops. An archetype of this form of network
organization is the vasculature of dicotyledonous leaves, which showcases a
hierarchically-nested architecture containing closed loops at many different
levels. Although a number of methods have been proposed to measure aspects of
the structure of such networks, a robust metric to quantify their hierarchical
organization is still lacking. We present an algorithmic framework, the
hierarchical loop decomposition, that allows mapping loopy networks to binary
trees, preserving in the connectivity of the trees the architecture of the
original graph. We apply this framework to investigate computer generated
graphs, such as artificial models and optimal distribution networks, as well as
natural graphs extracted from digitized images of dicotyledonous leaves and
vasculature of rat cerebral neocortex. We calculate various metrics based on
the asymmetry, the cumulative size distribution and the Strahler bifurcation
ratios of the corresponding trees and discuss the relationship of these
quantities to the architectural organization of the original graphs. This
algorithmic framework decouples the geometric information (exact location of
edges and nodes) from the metric topology (connectivity and edge weight) and it
ultimately allows us to perform a quantitative statistical comparison between
predictions of theoretical models and naturally occurring loopy graphs. Comment: 17 pages, 8 figures. During preparation of this manuscript the
authors became aware of the work of Mileyko et al., concurrently submitted
for publication.
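Below is a simplified sketch of the loop-to-tree mapping. It assumes the loops (faces) and the weighted edges shared between adjacent loops are already known, meaning the planar dual has been extracted beforehand. It also assumes the hierarchy is built by removing shared edges from weakest to strongest, with each removal merging two loops into a new binary-tree node. The function names and this exact merge rule are illustrative assumptions. The Strahler order of the resulting tree is computed as one example metric.

```python
from typing import List, Optional, Tuple


def hierarchical_loop_decomposition(n_loops: int, shared_edges: List[Tuple[int, int, float]]):
    """Merge adjacent loops by removing shared edges from weakest to strongest.

    Leaves 0..n_loops-1 are the original loops; each merge appends a node whose
    children are the two merged subtrees. Returns (children, root); assumes the
    dual graph is connected."""
    parent = list(range(n_loops))      # union-find over loops
    tree_node = list(range(n_loops))   # tree node currently representing each set
    children: List[Optional[Tuple[int, int]]] = [None] * n_loops

    def find(u: int) -> int:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b, _w in sorted(shared_edges, key=lambda e: e[2]):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # both sides already belong to the same merged loop
        children.append((tree_node[ra], tree_node[rb]))
        parent[rb] = ra
        tree_node[ra] = len(children) - 1
    return children, len(children) - 1


def strahler(children, v: int) -> int:
    """Horton-Strahler order: leaves are 1; equal child orders increment, else take the max."""
    if children[v] is None:
        return 1
    a, b = (strahler(children, c) for c in children[v])
    return a + 1 if a == b else max(a, b)


if __name__ == "__main__":
    # 2 x 2 grid of loops: 0 1 / 2 3, with hypothetical shared-edge weights
    edges = [(0, 1, 1.0), (2, 3, 2.0), (0, 2, 3.0), (1, 3, 4.0)]
    children, root = hierarchical_loop_decomposition(4, edges)
    print(children[root], strahler(children, root))
```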
Finiteness theorems in stochastic integer programming
We study Graver test sets for families of linear multi-stage stochastic
integer programs with varying number of scenarios. We show that these test sets
can be decomposed into finitely many "building blocks", independent of the
number of scenarios, and we give an effective procedure to compute these
building blocks. The paper includes an introduction to Nash-Williams' theory of
better-quasi-orderings, which is used to show termination of our algorithm. We
also apply this theory to finiteness results for Hilbert functions. Comment: 36 pages
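For readers new to Graver test sets, the sketch below illustrates the underlying definition only. The Graver basis of an integer matrix A is the set of nonzero integer vectors in ker A that are minimal with respect to the conformal order ⊑, where u ⊑ v means u and v lie in the same orthant and |u_i| ≤ |v_i| for every i. The brute-force box search and the `bound` parameter are assumptions made for a toy example. This is not the paper's building-block procedure, which is what makes families with many scenarios tractable.

```python
from itertools import product


def conformal_leq(u, v) -> bool:
    """u ⊑ v: same orthant (u_i * v_i >= 0) and |u_i| <= |v_i| componentwise."""
    return all(ui * vi >= 0 and abs(ui) <= abs(vi) for ui, vi in zip(u, v))


def graver_basis_bruteforce(A, bound: int):
    """⊑-minimal nonzero integer kernel vectors of A with entries in [-bound, bound].

    Correct only if every Graver element fits in the box (true for the toy example below);
    the search is exponential in the number of columns."""
    n = len(A[0])
    kernel = [x for x in product(range(-bound, bound + 1), repeat=n)
              if any(x) and all(sum(a * xi for a, xi in zip(row, x)) == 0 for row in A)]
    return [x for x in kernel
            if not any(y != x and conformal_leq(y, x) for y in kernel)]


if __name__ == "__main__":
    # For A = (1 1 1) the Graver basis is {±(e_i - e_j)}: six vectors.
    print(sorted(graver_basis_bruteforce([[1, 1, 1]], 2)))
```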
Repeated patterns in tree genetic programming
We extend our analysis of repetitive patterns found in genetic programming genomes to tree based GP.
As in linear GP, repetitive patterns are present in large numbers. Size-fair crossover limits bloat in automatic programming, preventing the evolution of recurring motifs. We examine these complex properties in detail, e.g., using depth v. size Catalan binary tree shape plots, subgraph and subtree matching, information entropy, syntactic and semantic fitness correlations and diffuse introns. We relate this emergent phenomenon to considerations about building blocks in GP and how GP works.
COMET: A Recipe for Learning and Using Large Ensembles on Massive Data
COMET is a single-pass MapReduce algorithm for learning on large-scale data.
It builds multiple random forest ensembles on distributed blocks of data and
merges them into a mega-ensemble. This approach is appropriate when learning
from massive-scale data that is too large to fit on a single machine. To get
the best accuracy, IVoting should be used instead of bagging to generate the
training subset for each decision tree in the random forest. Experiments with
two large datasets (5GB and 50GB compressed) show that COMET compares favorably
(in both accuracy and training time) to learning on a subsample of data using a
serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble
evaluation which dynamically decides how many ensemble members to evaluate per
data point; this can reduce evaluation cost by 100X or more.
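The abstract does not give the exact rule behind the Gaussian lazy-evaluation approach. The following is therefore a hypothetical stopping rule in the same spirit, for binary classification. Ensemble members are evaluated one at a time. After each vote, the rule projects the final positive-vote total with a normal approximation and stops once the probability that the final majority differs from the projected one falls below `delta`. The parameters `delta` and `min_votes` and the function names are assumptions, not COMET's published method.

```python
import math
from typing import Callable, Sequence, Tuple


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lazy_majority_vote(members: Sequence[Callable[[object], int]], x: object,
                       delta: float = 0.01, min_votes: int = 5) -> Tuple[int, int]:
    """Evaluate 0/1 voters lazily; return (predicted_label, members_evaluated)."""
    n = len(members)
    threshold = n / 2.0
    positives = 0
    for m, member in enumerate(members, start=1):
        positives += member(x)
        if m < min_votes or m == n:
            continue
        remaining = n - m
        p_hat = positives / m
        # Normal approximation to the final number of positive votes.
        mean = positives + remaining * p_hat
        var = remaining * p_hat * (1.0 - p_hat)
        decision = 1 if mean > threshold else 0
        if var == 0.0:
            flip_prob = 0.0  # all votes so far agree; the projection is degenerate
        else:
            z = (threshold - mean) / math.sqrt(var)
            flip_prob = normal_cdf(z) if decision == 1 else 1.0 - normal_cdf(z)
        if flip_prob < delta:
            return decision, m
    return (1 if positives > threshold else 0), n


if __name__ == "__main__":
    unanimous = [lambda x: 1] * 100
    split = [(lambda x, v=v: v) for v in [1, 0] * 50]
    print(lazy_majority_vote(unanimous, None), lazy_majority_vote(split, None))
```

Confident points stop after a handful of members, while close votes fall back to evaluating the whole ensemble. This is where the evaluation savings would come from.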
An Overview of Schema Theory
The purpose of this paper is to give an introduction to the field of Schema
Theory written by a mathematician and for mathematicians. In particular, we
endeavor to highlight areas of the field which might be of interest to a
mathematician, to point out some related open problems, and to suggest some
large-scale projects. Schema theory seeks to give a theoretical justification
for the efficacy of the field of genetic algorithms, so readers who have
studied genetic algorithms stand to gain the most from this paper. However,
nothing beyond basic probability theory is assumed of the reader, and for this
reason we write in a fairly informal style.
Because the mathematics behind the theorems in schema theory is relatively
elementary, we focus more on the motivation and philosophy. Many of these
results have been proven elsewhere, so this paper is designed to serve a
primarily expository role. We attempt to cast known results in a new light,
which makes the suggested future directions natural. This involves devoting a
substantial amount of time to the history of the field.
We hope that this exposition will entice some mathematicians to do research
in this area, that it will serve as a road map for researchers new to the
field, and that it will help explain how schema theory developed. Furthermore,
we hope that the results collected in this document will serve as a useful
reference. Finally, as far as the author knows, the questions raised in the
final section are new. Comment: 27 pages. Originally written in 2009 and hosted on my website, I've
decided to put it on the arXiv as a more permanent home. The paper is
primarily expository, so I don't really know where to submit it, but perhaps
one day I will find an appropriate journal.
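The central result of the field is Holland's schema theorem. In its common textbook form for one-point crossover and bitwise mutation, it bounds the expected number of instances of a schema H in the next generation: E[m(H, t+1)] ≥ m(H, t) · f(H)/f̄ · [1 − p_c δ(H)/(l − 1) − o(H) p_m]. Here o(H) is the order (the number of fixed positions), δ(H) is the defining length, and l is the string length. The sketch below computes these quantities. It uses the linearized mutation term o(H) p_m rather than (1 − p_m)^{o(H)}, and the function names are assumptions for illustration.

```python
def order(schema: str) -> int:
    """o(H): number of fixed (non-'*') positions."""
    return sum(c != "*" for c in schema)


def defining_length(schema: str) -> int:
    """delta(H): distance between the first and last fixed positions (0 if fewer than two)."""
    fixed = [i for i, c in enumerate(schema) if c != "*"]
    return fixed[-1] - fixed[0] if fixed else 0


def matches(schema: str, chromosome: str) -> bool:
    """True if the chromosome is an instance of the schema."""
    return len(schema) == len(chromosome) and all(s in ("*", c) for s, c in zip(schema, chromosome))


def schema_theorem_bound(m: float, f_schema: float, f_mean: float, schema: str,
                         p_c: float, p_m: float) -> float:
    """Lower bound on E[m(H, t+1)] (one-point crossover, linearized mutation term)."""
    l = len(schema)
    disruption = p_c * defining_length(schema) / (l - 1) + order(schema) * p_m
    return m * (f_schema / f_mean) * (1.0 - disruption)


if __name__ == "__main__":
    H = "1**0*"
    print(order(H), defining_length(H), matches(H, "10101"))
    print(schema_theorem_bound(10, 1.2, 1.0, H, p_c=0.7, p_m=0.01))
```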
Coagulation--fragmentation duality, Poisson--Dirichlet distributions and random recursive trees
In this paper we give a new example of duality between fragmentation and
coagulation operators. Consider the space of partitions of mass (i.e.,
decreasing sequences of nonnegative real numbers whose sum is 1) and the
two-parameter family of Poisson--Dirichlet distributions $\operatorname{PD}(\alpha,\theta)$ that take values in this space. We introduce families of
random fragmentation and coagulation operators $\mathrm{Frag}_{\alpha}$ and $\mathrm{Coag}_{\alpha,\theta}$, respectively, with the following property: if
the input to $\mathrm{Frag}_{\alpha}$ has $\operatorname{PD}(\alpha,\theta)$
distribution, then the output has $\operatorname{PD}(\alpha,\theta+1)$
distribution, while the reverse is true for $\mathrm{Coag}_{\alpha,\theta}$.
This result may be proved using a subordinator representation and it provides a
companion set of relations to those of Pitman between $\operatorname{PD}(\alpha,\theta)$ and $\operatorname{PD}(\alpha\beta,\theta)$. Repeated
application of the $\mathrm{Frag}_{\alpha}$ operators gives rise to a family
of fragmentation chains. We show that these Markov chains can be encoded
naturally by certain random recursive trees, and use this representation to
give an alternative and more concrete proof of the coagulation--fragmentation
duality. Comment: Published at http://dx.doi.org/10.1214/105051606000000655 in the
Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute
of Mathematical Statistics (http://www.imstat.org).
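A standard way to sample the two-parameter Poisson-Dirichlet law PD(α, θ) uses stick-breaking. Draw independent V_i ~ Beta(1 − α, θ + iα), set P_i = V_i ∏_{j<i}(1 − V_j), and rank the P_i in decreasing order. The unranked sequence is GEM(α, θ), and its decreasing rearrangement is PD(α, θ). The sketch below truncates after `n_atoms` sticks, which is an approximation assumed for illustration. It produces inputs of the kind the fragmentation and coagulation operators act on but does not implement those operators.

```python
import random


def poisson_dirichlet_sample(alpha: float, theta: float, n_atoms: int, rng: random.Random):
    """Approximate PD(alpha, theta) sample: truncated GEM stick-breaking, ranked decreasingly.

    Requires 0 <= alpha < 1 and theta > -alpha; the unbroken leftover mass is dropped."""
    if not (0.0 <= alpha < 1.0 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    remaining = 1.0
    pieces = []
    for i in range(1, n_atoms + 1):
        v = rng.betavariate(1.0 - alpha, theta + i * alpha)  # V_i ~ Beta(1 - alpha, theta + i*alpha)
        pieces.append(remaining * v)
        remaining *= 1.0 - v
    return sorted(pieces, reverse=True)


if __name__ == "__main__":
    sample = poisson_dirichlet_sample(0.5, 1.0, 1000, random.Random(3))
    print(sample[:5], sum(sample))
```

For α > 0 the leftover mass decays only polynomially in the number of sticks, so the truncation level matters more than in the one-parameter case α = 0.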
Prospects and limitations of full-text index structures in genome analysis
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to interesting new results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less well understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article provides a comprehensive, state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, trade-offs and practical limitations are explained and compared.
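The suffix array is one of the most popular full-text index structures. The sketch below builds one naively by sorting suffix start positions. It then finds every occurrence of a pattern with two binary searches, which work because the suffixes sharing a prefix form one contiguous block of the array. The function names are assumptions for illustration. Genome-scale tools instead rely on linear-time construction algorithms and compressed variants such as FM-indexes, and the memory-time trade-offs between these are the kind of trade-offs the article compares.

```python
def suffix_array(text: str) -> list:
    """Naive construction: sort suffix start positions lexicographically (O(n^2 log n))."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list, pattern: str) -> list:
    """All start positions of `pattern`, via two binary searches over the suffix array."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    genome = "ACGTACGTTACG"
    sa = suffix_array(genome)
    print(sa, find_occurrences(genome, sa, "ACG"))
```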