On the sub-permutations of pattern avoiding permutations
There is a deep connection between permutations and trees. Certain
sub-structures of permutations, called sub-permutations, bijectively map to
sub-trees of binary increasing trees. This opens a powerful tool set to study
enumerative and probabilistic properties of sub-permutations and to investigate
the relationships between 'local' and 'global' features using the concept of
pattern avoidance. First, given a pattern μ, we study how the avoidance of
μ in a permutation π affects the presence of other patterns in the
sub-permutations of π. More precisely, considering patterns of length 3, we
solve instances of the following problem: given a class of permutations K and a
pattern μ, we ask for the number of permutations whose
sub-permutations in K satisfy certain additional constraints on their size.
Second, we study the probability for a generic pattern to be contained in a
random permutation π of size n without being present in the
sub-permutations of π generated by the entry . These
theoretical results can be useful to define efficient randomized pattern-search
procedures based on classical algorithms of pattern-recognition, while the
general problem of pattern-search is NP-complete.
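The notion of pattern containment and avoidance used in this abstract can be illustrated with a minimal brute-force sketch. It is not the authors' method: the function names `contains_pattern`, `avoids`, and `count_avoiders` are hypothetical, and the search tries every subsequence, so it is only practical for small sizes.

```python
from itertools import combinations, permutations


def contains_pattern(perm, pattern):
    """Return True if some subsequence of perm is order-isomorphic to pattern."""
    k = len(pattern)
    # Positions of the pattern listed from smallest value to largest value.
    # Two sequences of distinct values are order-isomorphic exactly when
    # this listing agrees.
    target = sorted(range(k), key=lambda i: pattern[i])
    for idx in combinations(range(len(perm)), k):
        values = [perm[i] for i in idx]
        if sorted(range(k), key=lambda i: values[i]) == target:
            return True
    return False


def avoids(perm, pattern):
    """Return True if perm contains no occurrence of pattern."""
    return not contains_pattern(perm, pattern)


def count_avoiders(n, pattern):
    """Count the permutations of 1..n that avoid pattern (exhaustive search)."""
    return sum(avoids(p, pattern) for p in permutations(range(1, n + 1)))
```

For example, `contains_pattern((2, 4, 1, 3), (1, 3, 2))` holds because the entries 2, 4, 3 appear in the same relative order as 1, 3, 2. The exhaustive search is exponential in the pattern length, which is consistent with the hardness of the general problem mentioned above.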
Maximal clades in random binary search trees
We study maximal clades in random phylogenetic trees with the Yule-Harding
model or, equivalently, in binary search trees. We use probabilistic methods to
reprove and extend earlier results on moment asymptotics and asymptotic
normality. In particular, we give an explanation of the curious phenomenon
observed by Drmota, Fuchs and Lee (2014) that asymptotic normality holds, but
one should normalize using half the variance. Comment: 25 pages
Limit Laws for Functions of Fringe trees for Binary Search Trees and Recursive Trees
We prove limit theorems for sums of functions of subtrees of binary search
trees and random recursive trees. In particular, we give simple new proofs of
the fact that the number of fringe trees of size in the binary search
tree and the random recursive tree (of total size ) asymptotically has a
Poisson distribution if , and that the distribution is
asymptotically normal for . Furthermore, we prove similar
results for the number of subtrees of size with some required property , for example the number of copies of a certain fixed subtree . Using
the Cramér-Wold device, we show also that these random numbers for different
fixed subtrees converge jointly to a multivariate normal distribution. As an
application of the general results, we obtain a normal limit law for the number
of -protected nodes in a binary search tree or random recursive tree.
The proofs use a new version of a representation by Devroye, and Stein's
method (for both normal and Poisson approximation) together with certain
couplings.
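The quantity studied in this abstract, the number of fringe trees of a given size in a binary search tree, can be explored empirically. The following is a minimal simulation sketch, not taken from the paper: the helper names `bst_subtree_sizes` and `count_fringe_trees` are hypothetical, keys are assumed distinct, and a fringe tree is taken to be a node together with all of its descendants.

```python
import random


def bst_subtree_sizes(keys):
    """Insert distinct keys into an unbalanced BST; return {key: subtree size}."""
    left, right = {}, {}
    root = None
    order = []
    for key in keys:
        order.append(key)
        if root is None:
            root = key
            continue
        node = root
        while True:
            child = left if key < node else right
            if node in child:
                node = child[node]
            else:
                child[node] = key
                break
    size = {}
    # A child is always inserted after its parent, so walking the keys in
    # reverse insertion order visits every child before its parent.
    for key in reversed(order):
        size[key] = 1 + size.get(left.get(key), 0) + size.get(right.get(key), 0)
    return size


def count_fringe_trees(keys, k):
    """Count the nodes whose subtree (node plus all descendants) has exactly k nodes."""
    return sum(1 for s in bst_subtree_sizes(keys).values() if s == k)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen only for reproducibility
    n, k, trials = 1000, 5, 200
    counts = []
    for _ in range(trials):
        keys = list(range(n))
        rng.shuffle(keys)  # a uniformly random insertion order gives a random BST
        counts.append(count_fringe_trees(keys, k))
    print("mean count of fringe trees of size", k, "=", sum(counts) / trials)
```

Collecting the counts over many trials gives an empirical distribution that could be compared against the Poisson and normal limits described in the abstract.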
Dynamic load balancing in parallel KD-tree k-means
One of the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been explored mainly in two directions. First, the amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.
The Case for Learned Index Structures
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the
position of a record within a sorted array, a Hash-Index as a model to map a
key to a position of a record within an unsorted array, and a BitMap-Index as a
model to indicate if a data record exists or not. In this exploratory research
paper, we start from this premise and posit that all existing index structures
can be replaced with other types of models, including deep-learning models,
which we term learned indexes. The key idea is that a model can learn the sort
order or structure of lookup keys and use this signal to effectively predict
the position or existence of records. We theoretically analyze under which
conditions learned indexes outperform traditional index structures and describe
the main challenges in designing learned index structures. Our initial results
show that by using neural nets we are able to outperform cache-optimized
B-Trees by up to 70% in speed while saving an order of magnitude in memory over
several real-world data sets. More importantly though, we believe that the idea
of replacing core components of a data management system through learned models
has far-reaching implications for future system designs and that this work
just provides a glimpse of what might be possible.
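The premise that an index is a model mapping a key to a position in a sorted array can be sketched in a few lines. The example below is an illustration under stated assumptions, not the architecture from the paper: it swaps the neural net for a single least-squares line, the class name `LinearLearnedIndex` is hypothetical, and the keys are assumed to be distinct numbers in a non-empty sorted list. The model predicts a position, and a binary search limited to the model's worst-case error window corrects the prediction.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: a linear model plus a bounded local search."""

    def __init__(self, sorted_keys):
        self.keys = list(sorted_keys)  # assumed non-empty, sorted, distinct
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_key = sum(self.keys) / n
        mean_pos = (n - 1) / 2
        var = sum((k - mean_key) ** 2 for k in self.keys)
        cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_pos - self.slope * mean_key
        # Worst prediction error over the stored keys bounds the search window.
        self.max_err = max(abs(i - self._predict(k)) for i, k in enumerate(self.keys))

    def _predict(self, key):
        return round(self.slope * key + self.intercept)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None
```

Because every stored key lies within `max_err` positions of its prediction, the bounded search always finds it. The closer the key distribution is to linear, the smaller the window, which is the intuition behind letting a model learn the sort order of the keys.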
Prospects and limitations of full-text index structures in genome analysis
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article gives a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.
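As one concrete instance of a full-text index structure, a suffix array supports substring search by binary search over sorted suffixes. The sketch below is an assumption-laden illustration, not drawn from the article: the function names are hypothetical, and the construction sorts whole suffixes naively, which is far too slow and memory-hungry for genome-scale input but keeps the idea visible.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order."""
    # Naive construction: compares whole suffixes, fine only for short texts.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions at which pattern occurs in text."""
    m = len(pattern)

    def prefix(rank):
        start = suffix_array[rank]
        return text[start:start + m]

    # First binary search: lowest rank whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Second binary search: lowest rank whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])
```

All suffixes that start with the pattern are adjacent in the sorted order, so two binary searches delimit every occurrence. The memory-time trade-off is visible even here: the index stores one integer per text position in exchange for search time that grows only logarithmically in the number of suffixes.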
Repeated patterns in tree genetic programming
We extend our analysis of repetitive patterns found in genetic programming genomes to tree-based GP.
As in linear GP, repetitive patterns are present in large numbers. Size fair crossover limits bloat in automatic programming, preventing the evolution of recurring motifs. We examine these complex properties in detail: e.g. using depth vs. size Catalan binary tree shape plots, subgraph and subtree matching, information entropy, syntactic and semantic fitness correlations and diffuse introns. We relate this emergent phenomenon to considerations about building blocks in GP and how GP works.