    The Bases of Association Rules of High Confidence

    We develop a new approach for distributed computing of the association rules of high confidence in a binary table. It is derived from the D-basis algorithm of K. Adaricheva and J.B. Nation (TCS 2017), which is performed on multiple sub-tables of a table obtained by removing several rows at a time. The set of rules is then aggregated using the same approach by which the D-basis is retrieved from a larger set of implications. This makes it possible to obtain a basis of association rules of high confidence, which can be used for ranking all attributes of the table with respect to a given fixed attribute using the relevance parameter introduced in K. Adaricheva et al. (Proceedings of ICFCA-2015). This paper focuses on the technical implementation of the new algorithm. Some tests are performed on transaction data and medical data. Comment: Presented at DTMN, Sydney, Australia, July 28, 201
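
    As a rough illustration of what "high confidence" means for a rule in a binary table, the sketch below computes the usual support and confidence of a rule X -> b. It is not the D-basis algorithm or the aggregation step from the abstract; the function names, the toy table, and the use of the standard definition conf(X -> b) = supp(X and b) / supp(X) are assumptions made for the example.

```python
from typing import List, Optional, Sequence


def support(table: Sequence[Sequence[int]], attrs: Sequence[int]) -> int:
    """Count the rows in which every attribute (column index) in attrs is 1."""
    return sum(1 for row in table if all(row[a] == 1 for a in attrs))


def confidence(
    table: Sequence[Sequence[int]], premise: Sequence[int], conclusion: int
) -> Optional[float]:
    """Confidence of the rule premise -> conclusion (standard definition).

    Returns None when no row satisfies the premise, since the ratio
    is undefined in that case.
    """
    premise_support = support(table, premise)
    if premise_support == 0:
        return None
    return support(table, list(premise) + [conclusion]) / premise_support


# Hypothetical toy table: rows are objects, columns are attributes 0..2.
table: List[List[int]] = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 1],
]

conf_0_to_1 = confidence(table, [0], 1)      # 2 of the 3 rows with attribute 0
conf_01_to_2 = confidence(table, [0, 1], 2)  # 1 of the 2 rows with attributes 0 and 1
```

    A rule would count as "high confidence" when this ratio exceeds a chosen threshold; removing rows from the table, as the abstract describes, changes both supports and therefore the ratio.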

    From aggressive driving to molecular motor traffic

    Motivated by recent experimental results for the step sizes of dynein motor proteins, we develop a cellular automata model for intra-cellular traffic of dynein motors incorporating special features of the hindrance-dependent step size of the individual motors. We begin by investigating the properties of the aggressive driving model (ADM), a simple cellular automata-based model of vehicular traffic, a unique feature of which is that it allows a natural extension to capture the essential features of dynein motor traffic. We first calculate several collective properties of the ADM, under both periodic and open boundary conditions, analytically using two different mean-field approaches as well as by carrying out computer simulations. Then we extend the ADM by incorporating the possibilities of attachment and detachment of motors on the track, which is a common feature of a large class of motor proteins that are collectively referred to as cytoskeletal motors. The interplay of the boundary and bulk dynamics of attachment and detachment of the motors to the track gives rise to a phase where high and low density phases separated by a stable domain wall coexist. We also compare and contrast our results with the model of Parmeggiani et al. (Phys. Rev. Lett. 90, 086601 (2003)), which can be regarded as a minimal model for traffic of a closely related family of motor proteins called kinesin. Finally, we compare the transportation efficiencies of dynein and kinesin motors over a range of values of the model parameters. Comment: Final Version Accepted for Publication in J. Phys. A (IOP, UK

    High performance subgraph mining in molecular compounds

    Structured data represented in the form of graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load-balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to-linear speedup in a network of workstations.

    Dynamic load balancing for the distributed mining of molecular structures

    In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load-balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
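
    The general idea of receiver-initiated load balancing over an irregular search tree can be sketched in a single-process simulation: a worker that runs out of work asks a donor for part of its pending nodes. This is only a toy model under stated assumptions, not the paper's algorithm. The tree generator `children`, the "most loaded donor" policy, and the "take half of the oldest nodes" split are all hypothetical choices made for the example.

```python
from collections import deque
from typing import Deque, List, Tuple

Node = Tuple[int, int]  # (depth, key); a stand-in for a search-tree node


def children(node: Node) -> List[Node]:
    """Hypothetical irregular tree: 0 to 3 children, cut off at depth 6."""
    depth, key = node
    if depth >= 6:
        return []
    fanout = (key * 7 + depth) % 4
    return [(depth + 1, key * 4 + c + 1) for c in range(fanout)]


def count_nodes(root: Node) -> int:
    """Sequential reference: size of the whole search tree."""
    stack, total = [root], 0
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(children(node))
    return total


def simulate(num_workers: int, root: Node) -> Tuple[List[int], int]:
    """Round-based simulation; returns (nodes processed per worker, transfers)."""
    stacks: List[Deque[Node]] = [deque() for _ in range(num_workers)]
    stacks[0].append(root)
    processed = [0] * num_workers
    transfers = 0
    while any(stacks):
        for w in range(num_workers):
            if not stacks[w]:
                # Receiver-initiated: the idle worker requests work from
                # the currently most loaded worker (assumed donor policy).
                donor = max(range(num_workers), key=lambda d: len(stacks[d]))
                take = len(stacks[donor]) // 2
                for _ in range(take):
                    # Oldest nodes sit closest to the root of the tree.
                    stacks[w].append(stacks[donor].popleft())
                if take:
                    transfers += 1
                continue  # the request costs this worker one round
            node = stacks[w].pop()
            processed[w] += 1
            stacks[w].extend(children(node))
    return processed, transfers


root: Node = (0, 1)
processed, transfers = simulate(4, root)
```

    Because work is only moved on request, no workload prediction is needed up front, which matches the motivation given in the abstract; every node is still expanded exactly once, so the per-worker counts add up to the sequential tree size.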

    Cluster counting: The Hoshen-Kopelman algorithm vs. spanning tree approaches

    Two basic approaches to the cluster counting task in percolation and related models are discussed. The Hoshen-Kopelman multiple labeling technique for cluster statistics is redescribed. Modifications for random and aperiodic lattices are sketched, and some parallelised versions of the algorithm are mentioned. The graph-theoretical basis for the spanning tree approaches is given by describing the "breadth-first search" and "depth-first search" procedures. Examples are given for extracting the elastic and geometric "backbone" of a percolation cluster. An implementation of the "pebble game" algorithm using a depth-first search method is also described. Comment: LaTeX, uses ijmpc1.sty (included), 18 pages, 3 figures, submitted to Intern. J. of Modern Physics
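
    For readers unfamiliar with the multiple labeling idea, a minimal sketch of Hoshen-Kopelman style cluster labeling on a square lattice follows. It assumes nearest-neighbour (up/left) connectivity with free boundaries and uses a union-find table for label equivalences; the function names and the example grid are illustrative and not taken from the paper.

```python
from typing import Dict, List, Tuple


def find(parent: List[int], x: int) -> int:
    """Return the root label of x, shortening the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def hoshen_kopelman(grid: List[List[int]]) -> Tuple[List[List[int]], Dict[int, int]]:
    """Label occupied sites (1) by cluster; return (labels, size per root label)."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # label 0 means "empty"; real labels start at 1
    # First pass: raster scan, looking only at the up and left neighbours.
    for i in range(rows):
        for j in range(cols):
            if not grid[i][j]:
                continue
            up = labels[i - 1][j] if i > 0 else 0
            left = labels[i][j - 1] if j > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))  # open a new cluster label
                labels[i][j] = len(parent) - 1
            elif up and left:
                root_up, root_left = find(parent, up), find(parent, left)
                root = min(root_up, root_left)
                parent[max(root_up, root_left)] = root  # record equivalence
                labels[i][j] = root
            else:
                labels[i][j] = find(parent, up or left)
    # Second pass: replace every label by its root and count cluster sizes.
    sizes: Dict[int, int] = {}
    for i in range(rows):
        for j in range(cols):
            if labels[i][j]:
                root = find(parent, labels[i][j])
                labels[i][j] = root
                sizes[root] = sizes.get(root, 0) + 1
    return labels, sizes


# Illustrative lattice: the U-shaped cluster forces two labels to be merged.
grid = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]
labels, sizes = hoshen_kopelman(grid)
```

    The scan touches each site a constant number of times, and the equivalence table is what lets two provisional labels that later meet be counted as one cluster.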