Patterns in random binary search trees

Author: Conrado Martínez
Philippe Flajolet
Xavier Gourdon
Publication venue: 'Wiley'
Publication date: 01/01/2002

On the sub-permutations of pattern avoiding permutations

Author: Disanto Filippo
Wiehe Thomas
Publication date: 01/01/2014

There is a deep connection between permutations and trees. Certain sub-structures of permutations, called sub-permutations, bijectively map to sub-trees of binary increasing trees. This opens a powerful tool set to study enumerative and probabilistic properties of sub-permutations and to investigate the relationships between 'local' and 'global' features using the concept of pattern avoidance. First, given a pattern $\mu$, we study how the avoidance of $\mu$ in a permutation $\pi$ affects the presence of other patterns in the sub-permutations of $\pi$. More precisely, considering patterns of length 3, we solve instances of the following problem: given a class of permutations $K$ and a pattern $\mu$, we ask for the number of permutations $\pi \in Av_n(\mu)$ whose sub-permutations in $K$ satisfy certain additional constraints on their size. Second, we study the probability for a generic pattern to be contained in a random permutation $\pi$ of size $n$ without being present in the sub-permutations of $\pi$ generated by the entry $1 \leq k \leq n$. These theoretical results can be useful to define efficient randomized pattern-search procedures based on classical algorithms of pattern-recognition, while the general problem of pattern-search is NP-complete.

arXiv.org e-Print Archive

Kölner UniversitätsPublikationsServer

Archivio della Ricerca - Università di Pisa
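
The avoidance class $Av_n(\mu)$ mentioned in the abstract above can be illustrated with a brute-force count. This is only a minimal sketch for small $n$, not the authors' method: the function names `contains_pattern` and `count_avoiders` are hypothetical, and the exhaustive search over all permutations is exponential in $n$.

```python
from itertools import combinations, permutations


def contains_pattern(perm, pattern):
    """Return True if some subsequence of perm is order-isomorphic to pattern."""
    k = len(pattern)
    for positions in combinations(range(len(perm)), k):
        values = [perm[i] for i in positions]
        # Order-isomorphic: every pair compares the same way as in the pattern.
        if all(
            (values[a] < values[b]) == (pattern[a] < pattern[b])
            for a in range(k)
            for b in range(a + 1, k)
        ):
            return True
    return False


def count_avoiders(n, pattern):
    """Count permutations of 1..n that avoid pattern (brute force)."""
    return sum(
        1
        for perm in permutations(range(1, n + 1))
        if not contains_pattern(perm, pattern)
    )


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, count_avoiders(n, (1, 3, 2)))
```

For any pattern of length 3 the counts are the Catalan numbers, which gives a convenient sanity check for the sketch.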

Maximal clades in random binary search trees

Author: Janson Svante
Publication date: 27/08/2014

We study maximal clades in random phylogenetic trees with the Yule-Harding model or, equivalently, in binary search trees. We use probabilistic methods to reprove and extend earlier results on moment asymptotics and asymptotic normality. In particular, we give an explanation of the curious phenomenon observed by Drmota, Fuchs and Lee (2014) that asymptotic normality holds, but one should normalize using half the variance. Comment: 25 pages.

arXiv.org e-Print Archive

CiteSeerX

Limit Laws for Functions of Fringe trees for Binary Search Trees and Recursive Trees

Author: Holmgren Cecilia
Janson Svante
Publication date: 26/06/2014

We prove limit theorems for sums of functions of subtrees of binary search trees and random recursive trees. In particular, we give simple new proofs of the fact that the number of fringe trees of size $k=k_n$ in the binary search tree and the random recursive tree (of total size $n$) asymptotically has a Poisson distribution if $k\rightarrow\infty$, and that the distribution is asymptotically normal for $k=o(\sqrt{n})$. Furthermore, we prove similar results for the number of subtrees of size $k$ with some required property $P$, for example the number of copies of a certain fixed subtree $T$. Using the Cramér-Wold device, we show also that these random numbers for different fixed subtrees converge jointly to a multivariate normal distribution. As an application of the general results, we obtain a normal limit law for the number of $\ell$-protected nodes in a binary search tree or random recursive tree. The proofs use a new version of a representation by Devroye, and Stein's method (for both normal and Poisson approximation) together with certain couplings.

arXiv.org e-Print Archive

CiteSeerX
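
The quantity studied in the abstract above, the number of fringe trees of size $k$ in a random binary search tree of size $n$, can be explored empirically. The following is a minimal simulation sketch, not the proof technique of the paper; the helper names (`bst_subtree_sizes`, `fringe_counts`, `mean_fringe_count`) are hypothetical. The comparison value $2(n+1)/((k+1)(k+2))$ for $k<n$ is the known expected count for random binary search trees and is included only as a reference point.

```python
import random
from collections import Counter


def bst_subtree_sizes(keys):
    """Insert distinct keys into an unbalanced BST; return {key: subtree size}."""
    left, right, parent = {}, {}, {}
    root = None
    for key in keys:
        if root is None:
            root = key
            continue
        node = root
        while True:
            child = left if key < node else right
            if node in child:
                node = child[node]
            else:
                child[node] = key
                parent[key] = node
                break
    # Every node is inserted after its parent, so walking the insertion
    # order backwards visits all descendants before their ancestor.
    size = {key: 1 for key in keys}
    for key in reversed(keys):
        if key in parent:
            size[parent[key]] += size[key]
    return size


def fringe_counts(keys):
    """Map each size k to the number of fringe subtrees with exactly k nodes."""
    return Counter(bst_subtree_sizes(keys).values())


def mean_fringe_count(n, k, trials=2000, seed=0):
    """Average number of size-k fringe subtrees over random BSTs of size n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        keys = list(range(n))
        rng.shuffle(keys)
        total += fringe_counts(keys)[k]
    return total / trials


if __name__ == "__main__":
    n, k = 200, 3
    print("simulated:", mean_fringe_count(n, k))
    print("reference:", 2 * (n + 1) / ((k + 1) * (k + 2)))
```

Subtree sizes are accumulated iteratively rather than recursively so that degenerate (path-like) trees do not hit Python's recursion limit.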

Dynamic load balancing in parallel KD-tree k-means

Author: Di Fatta Giuseppe
Pettinger David
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/06/2010

One of the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.

Central Archive at the University of Reading

Crossref

The Case for Learned Index Structures

Author: Abadi M.
Armbrust M.
Böhm M.
Chang F.
Goodfellow I.
Grossi R.
Lehman T. J.
Litwin W.
Magdon-Ismail M.
Miller D. J.
Moerkotte G.
Sutskever I.
You S.
Publication date: 30/04/2018

Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that by using neural nets we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order of magnitude in memory over several real-world data sets. More importantly though, we believe that the idea of replacing core components of a data management system through learned models has far-reaching implications for future systems designs and that this work just provides a glimpse of what might be possible.

arXiv.org e-Print Archive

Crossref
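
The premise of the abstract above, that an index is a model mapping a key to a position in a sorted array, can be illustrated with a deliberately simple stand-in. The sketch below swaps the neural models discussed in the paper for a single least-squares line with a recorded worst-case error; the class name `LinearLearnedIndex` and its methods are hypothetical, and the keys are assumed to be a non-empty sorted sequence of numbers.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: a fitted line predicts a key's position in a sorted
    array, and a bounded binary search corrects the prediction."""

    def __init__(self, sorted_keys):
        # Assumption: sorted_keys is non-empty and already sorted ascending.
        self.keys = list(sorted_keys)
        n = len(self.keys)
        mean_x = sum(self.keys) / n
        mean_y = (n - 1) / 2
        var_x = sum((x - mean_x) ** 2 for x in self.keys)
        cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(self.keys))
        self.slope = cov / var_x if var_x else 0.0
        self.intercept = mean_y - self.slope * mean_x
        # Worst prediction error over the stored keys bounds the search window.
        self.max_err = max(
            abs(i - self._predict(x)) for i, x in enumerate(self.keys)
        )

    def _predict(self, key):
        return round(self.slope * key + self.intercept)

    def lookup(self, key):
        """Return an index i with keys[i] == key, or None if key is absent."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None


if __name__ == "__main__":
    index = LinearLearnedIndex([3, 8, 15, 16, 23, 42, 99])
    print(index.lookup(23), index.lookup(4), index.max_err)
```

The search window grows with the model's worst-case error, so the sketch degrades to an ordinary binary search when the keys are poorly approximated by a line.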

Prospects and limitations of full-text index structures in genome analysis

Author: Dawyndt Peter
De Baets Bernard
Fack Veerle
Vyverman Michaël
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012

The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article presents a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.

Ghent University Academic Bibliography

PubMed Central
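
As a concrete instance of the full-text index structures the abstract above refers to, the following sketch builds a suffix array and uses it to locate a pattern. The choice of a suffix array is an assumption made here for illustration, since the abstract does not name specific structures. The function names are hypothetical, and the construction sorts whole suffixes, which is simple but far from the space- and time-efficient constructions used on genome-scale data.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order.

    Naive construction: sorts the suffixes directly, fine for short strings only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of pattern in text via binary search."""
    m = len(pattern)

    def prefix(rank):
        start = suffix_array[rank]
        return text[start:start + m]

    # First rank whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # First rank whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])


if __name__ == "__main__":
    genome = "GATTACAGATTACA"
    sa = build_suffix_array(genome)
    print(find_occurrences(genome, sa, "TTA"))
```

Once the array is built, each query costs a logarithmic number of string comparisons, which is the kind of memory-time trade-off the overview compares across index variants.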

Repeated patterns in tree genetic programming

Author: Banzhaf W.
Langdon W.B.
Publication venue: Springer-Verlag GmbH
Publication date: 01/01/2005

We extend our analysis of repetitive patterns found in genetic programming genomes to tree based GP. As in linear GP, repetitive patterns are present in large numbers. Size fair crossover limits bloat in automatic programming, preventing the evolution of recurring motifs. We examine these complex properties in detail: e.g. using depth vs. size Catalan binary tree shape plots, subgraph and subtree matching, information entropy, syntactic and semantic fitness correlations and diffuse introns. We relate this emergent phenomenon to considerations about building blocks in GP and how GP works.

UCL Discovery