Martingales and Profile of Binary Search Trees
We are interested in the asymptotic analysis of the binary search tree (BST)
under the random permutation model. Via an embedding in a continuous time
model, we get new results, in particular the asymptotic behavior of the
profile.
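To make the objects in this abstract concrete, here is a minimal sketch of the random permutation model: keys are inserted into a BST in uniformly random order, and the profile records how many nodes sit at each depth. The function name `bst_profile` and the dictionary-based tree are illustrative choices, not taken from the paper, and the sketch only simulates the discrete model; it does not implement the continuous-time embedding or the martingale arguments.

```python
import random


def bst_profile(keys):
    """Return a list whose entry d is the number of nodes at depth d
    in the BST obtained by inserting `keys` (assumed distinct) in order."""
    left, right = {}, {}   # child pointers, keyed by parent key
    profile = []
    root = None
    for k in keys:
        depth = 0
        if root is None:
            root = k
        else:
            node = root
            while True:
                depth += 1
                child = left if k < node else right
                if node not in child:
                    child[node] = k   # attach k as a new leaf
                    break
                node = child[node]
        if depth == len(profile):
            profile.append(0)
        profile[depth] += 1
    return profile


if __name__ == "__main__":
    n = 1000
    perm = random.sample(range(n), n)   # uniformly random permutation
    print(bst_profile(perm))
```

Averaging the returned lists over many random permutations gives an empirical estimate of the expected profile for a given n.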
Update statistics in conservative parallel discrete event simulations of asynchronous systems
We model the performance of an ideal closed chain of L processing elements
that work in parallel in an asynchronous manner. Their state updates follow a
generic conservative algorithm. The conservative update rule determines the
growth of a virtual time surface. The physics of this growth is reflected in
the utilization (the fraction of working processors) and in the interface
width. We show that it is possible to make an explicit connection between the
utilization and the macroscopic structure of the virtual time interface. We
exploit this connection to derive the theoretical probability distribution of
updates in the system within an approximate model. It follows that the
theoretical lower bound for the computational speed-up is s=(L+1)/4 for L>3.
Our approach uses simple statistics to count distinct surface configuration
classes consistent with the model growth rule. It enables one to compute
analytically microscopic properties of an interface, which are unavailable by
continuum methods.
Comment: 15 pages, 12 figures
From Hammersley's lines to Hammersley's trees
We construct a stationary random tree, embedded in the upper half plane, with
prescribed offspring distribution and whose vertices are the atoms of a unit
Poisson point process. This process, which we call Hammersley's tree process,
extends the usual Hammersley's line process. Just as Hammersley's process is
related to the problem of the longest increasing subsequence, this model also
has a combinatorial interpretation: it counts the number of heaps (i.e.
increasing trees) required to store a random permutation. This problem was
initially considered by Byers et al. (2011) and Istrate and Bonchis (2015) in
the case of regular trees. We show, in particular, that the number of heaps
grows logarithmically with the size of the permutation.
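The combinatorial quantity mentioned here, the number of heaps needed to store a permutation, can be explored with a small sketch. The abstract does not state an algorithm, so the greedy rule below is an assumption made for illustration: each arriving value becomes a child of the node with the largest smaller value that still has a free child slot, and a new heap is opened when no such node exists. The function name `greedy_heap_count` and the fixed `arity` parameter (regular trees only) are hypothetical, and no claim is made that this rule attains the minimum.

```python
import bisect
import random


def greedy_heap_count(seq, arity=2):
    """Count heaps opened when storing `seq` (distinct values, read left to
    right) in increasing trees where every node has `arity` child slots.
    Assumed greedy rule: attach to the largest smaller value with a free slot."""
    slots = []   # sorted multiset: one copy of a node's value per free slot
    heaps = 0
    for x in seq:
        i = bisect.bisect_left(slots, x)   # slots[:i] are the values < x
        if i == 0:
            heaps += 1                     # no admissible parent: new root
        else:
            slots.pop(i - 1)               # consume the chosen parent's slot
        for _ in range(arity):
            bisect.insort(slots, x)        # x offers `arity` free slots
    return heaps


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        perm = random.sample(range(n), n)
        print(n, greedy_heap_count(perm))
```

Running it for increasing n shows how the count produced by this particular rule grows on random permutations.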
Statistical methods of SNP data analysis with applications
Various statistical methods important for genetic analysis are considered and
developed. Namely, we concentrate on the multifactor dimensionality reduction,
logic regression, random forests and stochastic gradient boosting. These
methods and their new modifications, e.g., the MDR method with "independent
rule", are used to study the risk of complex diseases such as cardiovascular
ones. The roles of certain combinations of single nucleotide polymorphisms and
external risk factors are examined. The data analysis concerning
ischemic heart disease and myocardial infarction was performed on the SKIF
"Chebyshev" supercomputer of the Lomonosov Moscow State University.
Classification under Streaming Emerging New Classes: A Solution using Completely Random Trees
This paper investigates an important problem in stream mining, i.e.,
classification under streaming emerging new classes or SENC. The common
approach is to treat it as a classification problem and solve it using either a
supervised learner or a semi-supervised learner. We propose an alternative
approach by using unsupervised learning as the basis to solve this problem. The
SENC problem can be decomposed into three subproblems: detecting emerging new
classes, classifying for known classes, and updating models to enable
classification of instances of the new class and detection of more emerging new
classes. The proposed method employs completely random trees which have been
shown to work well in unsupervised learning and supervised learning
independently in the literature. This is the first time, as far as we know,
that completely random trees are used as a single common core to solve all
three subproblems: unsupervised learning, supervised learning and model update
in data streams. We show that the proposed unsupervised-learning-focused method
often achieves significantly better outcomes than existing
classification-focused methods.