66,629 research outputs found

    Martingales and Profile of Binary Search Trees

    Full text link
    We are interested in the asymptotic analysis of the binary search tree (BST) under the random permutation model. Via an embedding in a continuous time model, we get new results, in particular the asymptotic behavior of the profile

    Update statistics in conservative parallel discrete event simulations of asynchronous systems

    Full text link
    We model the performance of an ideal closed chain of L processing elements that work in parallel in an asynchronous manner. Their state updates follow a generic conservative algorithm. The conservative update rule determines the growth of a virtual time surface. The physics of this growth is reflected in the utilization (the fraction of working processors) and in the interface width. We show that it is possible to nake an explicit connection between the utilization and the macroscopic structure of the virtual time interface. We exploit this connection to derive the theoretical probability distribution of updates in the system within an approximate model. It follows that the theoretical lower bound for the computational speed-up is s=(L+1)/4 for L>3. Our approach uses simple statistics to count distinct surface configuration classes consistent with the model growth rule. It enables one to compute analytically microscopic properties of an interface, which are unavailable by continuum methods.Comment: 15 pages, 12 figure

    From Hammersley's lines to Hammersley's trees

    Full text link
    We construct a stationary random tree, embedded in the upper half plane, with prescribed offspring distribution and whose vertices are the atoms of a unit Poisson point process. This process which we call Hammersley's tree process extends the usual Hammersley's line process. Just as Hammersley's process is related to the problem of the longest increasing subsequence, this model also has a combinatorial interpretation: it counts the number of heaps (i.e. increasing trees) required to store a random permutation. This problem was initially considered by Byers et. al (2011) and Istrate and Bonchis (2015) in the case of regular trees. We show, in particular, that the number of heaps grows logarithmically with the size of the permutation

    Statistical methods of SNP data analysis with applications

    Get PDF
    Various statistical methods important for genetic analysis are considered and developed. Namely, we concentrate on the multifactor dimensionality reduction, logic regression, random forests and stochastic gradient boosting. These methods and their new modifications, e.g., the MDR method with "independent rule", are used to study the risk of complex diseases such as cardiovascular ones. The roles of certain combinations of single nucleotide polymorphisms and external risk factors are examined. To perform the data analysis concerning the ischemic heart disease and myocardial infarction the supercomputer SKIF "Chebyshev" of the Lomonosov Moscow State University was employed

    Classification under Streaming Emerging New Classes: A Solution using Completely Random Trees

    Get PDF
    This paper investigates an important problem in stream mining, i.e., classification under streaming emerging new classes or SENC. The common approach is to treat it as a classification problem and solve it using either a supervised learner or a semi-supervised learner. We propose an alternative approach by using unsupervised learning as the basis to solve this problem. The SENC problem can be decomposed into three sub problems: detecting emerging new classes, classifying for known classes, and updating models to enable classification of instances of the new class and detection of more emerging new classes. The proposed method employs completely random trees which have been shown to work well in unsupervised learning and supervised learning independently in the literature. This is the first time, as far as we know, that completely random trees are used as a single common core to solve all three sub problems: unsupervised learning, supervised learning and model update in data streams. We show that the proposed unsupervised-learning-focused method often achieves significantly better outcomes than existing classification-focused methods
    • …
    corecore