The Bases of Association Rules of High Confidence
We develop a new approach for distributed computation of association rules
of high confidence in a binary table. It is derived from the D-basis algorithm
described in K. Adaricheva and J.B. Nation (TCS 2017), which is run on multiple
sub-tables obtained by removing several rows at a time from the table. The resulting
rules are then aggregated in the same way that the D-basis is retrieved
from a larger set of implications. This makes it possible to obtain a basis of association
rules of high confidence, which can be used to rank all attributes of the
table with respect to a given fixed attribute using the relevance parameter
introduced in K. Adaricheva et al. (Proceedings of ICFCA-2015). This paper
focuses on the technical implementation of the new algorithm. Test
results are reported on transaction data and medical data. Comment: Presented at DTMN, Sydney, Australia, July 28, 201
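To make the notion of a high-confidence rule concrete, here is a minimal Python sketch under stated assumptions. The confidence of a rule X → y is taken as the fraction of rows containing X that also contain y. The brute-force search in `exact_rules` is only a stand-in for the D-basis algorithm, which is not reproduced here, and all names, the toy table, `k_removed` and `max_premise` are hypothetical. A rule that holds exactly once k rows are deleted can fail in at most k rows of the full table, so it has high (but not necessarily perfect) confidence there. Counting in how many sub-tables a rule appears is an illustrative aggregation, not the paper's.

```python
from collections import Counter
from itertools import combinations


def support(rows, items):
    """Number of rows (sets of attributes equal to 1) containing every attribute in items."""
    return sum(1 for r in rows if items <= r)


def confidence(rows, premise, target):
    """conf(premise -> target) = supp(premise + target) / supp(premise)."""
    s = support(rows, premise)
    return support(rows, premise | {target}) / s if s else 0.0


def exact_rules(rows, attributes, target, max_premise=2):
    """Hypothetical brute-force stand-in for an implication basis: minimal premises X
    (|X| <= max_premise, X occurring at least once) such that X -> target holds in every row."""
    others = [a for a in attributes if a != target]
    found = set()
    for k in range(1, max_premise + 1):
        for premise in combinations(others, k):
            p = frozenset(premise)
            if any(q <= p for q in found):
                continue  # a smaller premise already implies the target
            if support(rows, p) > 0 and confidence(rows, p, target) == 1.0:
                found.add(p)
    return found


def high_confidence_rules(rows, attributes, target, k_removed=1, max_premise=2):
    """Run exact_rules on every sub-table with k_removed rows deleted; return
    {premise: (number of sub-tables where it holds, confidence in the full table)}."""
    counts = Counter()
    for removed in combinations(range(len(rows)), k_removed):
        drop = set(removed)
        sub = [r for i, r in enumerate(rows) if i not in drop]
        counts.update(exact_rules(sub, attributes, target, max_premise))
    return {p: (n, confidence(rows, p, target)) for p, n in counts.items()}


# Toy binary table: each row lists the attributes (columns) equal to 1.
rows = [frozenset(r) for r in ({0, 1, 3}, {0, 3}, {1, 2}, {0, 1, 3}, {0, 2})]
rules = high_confidence_rules(rows, attributes=[0, 1, 2, 3], target=3, k_removed=1)
for premise, (n_subtables, conf) in rules.items():
    print(sorted(premise), n_subtables, round(conf, 2))
```

On the toy table, the premise {0, 1} holds exactly in three of the five sub-tables and with confidence 1.0 in the full table, while {0} and {1} each appear once, with full-table confidences 0.75 and about 0.67.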
From aggressive driving to molecular motor traffic
Motivated by recent experimental results for the step sizes of dynein motor
proteins, we develop a cellular automaton model for intracellular traffic of
dynein motors that incorporates the special features of the hindrance-dependent step
size of the individual motors. We begin by investigating the properties of the
aggressive driving model (ADM), a simple cellular-automaton model of
vehicular traffic whose distinctive feature is that it extends naturally
to capture the essential features of dynein motor traffic. We first
calculate several collective properties of the ADM, under both periodic and
open boundary conditions, analytically using two different mean-field
approaches as well as by computer simulation. We then extend the
ADM by incorporating the attachment and detachment of motors
on the track, a common feature of a large class of motor proteins that
are collectively referred to as cytoskeletal motors. The interplay of the
boundary and bulk dynamics of motor attachment to and detachment from the
track gives rise to a phase in which high- and low-density phases, separated by a
stable domain wall, coexist. We also compare and contrast our results with the
model of Parmeggiani et al. (Phys. Rev. Lett. 90, 086601 (2003)), which
can be regarded as a minimal model for traffic of a closely related family of
motor proteins called kinesin. Finally, we compare the transport
efficiencies of dynein and kinesin motors over a range of values of the model
parameters. Comment: Final Version Accepted for Publication in J. Phys. A (IOP, UK)
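The ADM update rules are not spelled out in this abstract, so the sketch below does not implement them. Instead it simulates the model of Parmeggiani et al. cited above, the totally asymmetric simple exclusion process (TASEP) with Langmuir kinetics: particles enter at the left end with rate alpha, hop one site to the right if the target is empty, leave at the right end with rate beta, and attach to or detach from bulk sites. The random-sequential update scheme, the scaling of the per-site attachment and detachment rates as Omega_A / L and Omega_D / L, the function name and all parameter values are assumptions of this sketch; boundary and bulk rates are only approximately matched in time units.

```python
import random


def tasep_lk(L=100, alpha=0.3, beta=0.3, Omega_A=0.1, Omega_D=0.1,
             sweeps=2000, warmup=1000, seed=0):
    """Time-averaged density profile of a TASEP with Langmuir kinetics
    (random-sequential update, open boundaries). Hypothetical helper."""
    if warmup >= sweeps:
        raise ValueError("warmup must be smaller than sweeps")
    rng = random.Random(seed)
    w_a, w_d = Omega_A / L, Omega_D / L   # per-site rates; Omega = w * L held fixed (assumed scaling)
    site = [0] * L                        # 1 = motor bound at this site
    profile = [0.0] * L
    samples = 0
    for sweep in range(sweeps):
        for _ in range(L + 1):
            i = rng.randrange(L + 1)      # 0: entry, L: exit, otherwise bond (i-1, i)
            if i == 0:
                if site[0] == 0 and rng.random() < alpha:
                    site[0] = 1           # injection at the left end
            elif i == L:
                if site[L - 1] == 1 and rng.random() < beta:
                    site[L - 1] = 0       # extraction at the right end
            elif site[i - 1] == 1 and site[i] == 0:
                site[i - 1], site[i] = 0, 1   # forward hop with exclusion
            j = rng.randrange(L)          # one Langmuir attempt per hopping attempt
            if site[j] == 0:
                if rng.random() < w_a:
                    site[j] = 1           # attachment from the surrounding medium
            elif rng.random() < w_d:
                site[j] = 0               # detachment into the medium
        if sweep >= warmup:               # accumulate only after relaxation
            for k in range(L):
                profile[k] += site[k]
            samples += 1
    return [p / samples for p in profile]


profile = tasep_lk(L=100, alpha=0.1, beta=0.1, Omega_A=0.1, Omega_D=0.1, seed=1)
print(round(sum(profile) / len(profile), 3))
```

Plotting the returned profile against site index for different alpha, beta, Omega_A and Omega_D is one way to look for the coexistence of a low-density and a high-density segment.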
High performance subgraph mining in molecular compounds
Structured data represented in the form of graphs arises in
several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining
problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main
aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing
algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to-linear speedup in a network
of workstations.
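The first level of a frequent subgraph search tree consists of single labeled edges. The following minimal Python sketch assumes a hypothetical toy encoding of molecules (a list of atom labels plus (i, j, bond label) triples). Support is the number of molecules containing a pattern at least once, and a pattern is frequent if its support reaches `min_support`. Growing these patterns into larger subgraphs requires canonical labeling and subgraph isomorphism tests, which are not shown. In pattern-growth miners of this kind, each frequent edge can root its own subtree of the search, which is one natural unit for partitioning the search space.

```python
from collections import Counter


def edge_patterns(atoms, bonds):
    """Set of canonical (atom, bond, atom) edge patterns occurring in one molecule."""
    patterns = set()
    for i, j, bond in bonds:
        a, b = sorted((atoms[i], atoms[j]))   # order endpoint labels so C-O equals O-C
        patterns.add((a, bond, b))
    return patterns


def frequent_edges(molecules, min_support):
    """Map each single-edge pattern found in at least min_support molecules to its support."""
    counts = Counter()
    for atoms, bonds in molecules:
        counts.update(edge_patterns(atoms, bonds))  # a set, so each molecule counts once
    return {p: n for p, n in counts.items() if n >= min_support}


# Hypothetical toy data set (hydrogens omitted)
molecules = [
    (["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")]),                    # ethanol skeleton
    (["C", "O"], [(0, 1, "=")]),                                      # formaldehyde
    (["C", "O"], [(0, 1, "-")]),                                      # methanol
    (["C", "C", "O", "O"], [(0, 1, "-"), (1, 2, "="), (1, 3, "-")]),  # acetic acid skeleton
]
print(frequent_edges(molecules, min_support=3))
```

With `min_support=3` only the single-bonded C-O edge survives; lowering the threshold to 2 also admits the C-C and C=O edges.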
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of
methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the
past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially
render sequential algorithms impractical. In this paper, we present a distributed approach to the frequent subgraph mining problem to
discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, for which no
reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic
partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated
load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer
Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed
approach also allows for dynamic resource aggregation in a nondedicated computational environment. These features make it suitable
for large-scale, multi-domain, heterogeneous environments, such as computational grids.
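Receiver-initiated load balancing can be illustrated with a small sequential simulation. This is not the authors' algorithm or peer-to-peer framework: the irregular tree produced by the hypothetical `children` function, the round-based scheduling, the random choice of donor, and the rule of donating the bottom half of the donor's stack are all assumptions of this sketch. Each worker expands nodes depth-first from its own stack; a worker with an empty stack asks a random peer for work, and the peer splits its stack only if it holds at least two nodes.

```python
import random


def children(node, max_depth=8):
    """Hypothetical irregular search tree: the branching factor varies between nodes."""
    depth, label = node
    if depth >= max_depth:
        return []
    k = (label * 7 + depth * 3) % 4   # 0 to 3 children
    return [(depth + 1, label * 4 + c) for c in range(k)]


def sequential_count(root):
    """Reference: number of tree nodes visited by a single depth-first worker."""
    stack, n = [root], 0
    while stack:
        node = stack.pop()
        n += 1
        stack.extend(children(node))
    return n


def simulate(root, n_workers=4, seed=0):
    """Round-based simulation of receiver-initiated work stealing.
    Returns (nodes expanded per worker, number of successful work transfers)."""
    if n_workers < 2:
        raise ValueError("need at least two workers")
    rng = random.Random(seed)
    stacks = [[] for _ in range(n_workers)]
    stacks[0].append(root)            # initially all work sits on one worker
    expanded = [0] * n_workers
    transfers = 0
    while any(stacks):
        for w in range(n_workers):
            if stacks[w]:
                node = stacks[w].pop()            # depth-first within a worker
                expanded[w] += 1
                stacks[w].extend(children(node))
            else:
                # receiver-initiated: the idle worker asks one random peer for work
                donor = rng.choice([p for p in range(n_workers) if p != w])
                if len(stacks[donor]) >= 2:
                    half = len(stacks[donor]) // 2
                    # donate the oldest entries (bottom of the stack), i.e. nodes
                    # closest to the root, which tend to root larger subtrees
                    stacks[w] = stacks[donor][:half]
                    stacks[donor] = stacks[donor][half:]
                    transfers += 1
    return expanded, transfers


expanded, transfers = simulate((0, 1), n_workers=4, seed=0)
print(expanded, sum(expanded) == sequential_count((0, 1)), transfers)
```

Because work is only redistributed, never duplicated, the per-worker counts always add up to the sequential node count; the `transfers` counter shows how often idle workers actually obtained work.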
Cluster counting: The Hoshen-Kopelman algorithm vs. spanning tree approaches
Two basic approaches to the cluster counting task in percolation and
related models are discussed. The Hoshen-Kopelman multiple labeling technique
for cluster statistics is reviewed. Modifications for random and aperiodic
lattices are sketched, and some parallelised versions of the algorithm
are mentioned. The graph-theoretical basis for the spanning tree approaches is
given by describing the "breadth-first search" and "depth-first search"
procedures. Examples are given for extracting the elastic and geometric
"backbone" of a percolation cluster. An implementation of the "pebble game"
algorithm using a depth-first search method is also described. Comment: LaTeX, uses ijmpc1.sty (included), 18 pages, 3 figures, submitted to
Intern. J. of Modern Physics
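Both approaches can be compared on a small site-percolation lattice. The sketch below assumes a 2D square lattice with nearest-neighbour connectivity and free boundaries. `hoshen_kopelman` follows the Hoshen-Kopelman raster scan, storing label equivalences in a union-find array with path halving (a common modern variant, not necessarily the bookkeeping used in the paper); `bfs_clusters` grows each cluster by breadth-first search, as in the spanning tree approaches. Both functions and the example grid are illustrative.

```python
from collections import deque


def hoshen_kopelman(grid):
    """Label occupied sites of a 2D square lattice (nearest-neighbour connectivity)
    by a raster scan with label equivalences kept in a union-find array.
    Returns (labels, number_of_clusters)."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]                        # parent[k] for provisional label k; index 0 unused

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]   # path halving
            k = parent[k]
        return k

    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))          # new provisional label
                labels[r][c] = len(parent) - 1
            elif up and left:
                a, b = find(up), find(left)
                parent[max(a, b)] = min(a, b)       # record that both labels are equivalent
                labels[r][c] = min(a, b)
            else:
                labels[r][c] = find(up or left)
    # second pass: replace provisional labels by consecutive cluster ids
    canonical = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = canonical.setdefault(root, len(canonical) + 1)
    return labels, len(canonical)


def bfs_clusters(grid):
    """Cluster sizes found by breadth-first search from each unvisited occupied site."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, size = deque([(r, c)]), 0
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                sizes.append(size)
    return sizes


grid = [
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
labels, n_clusters = hoshen_kopelman(grid)
print(n_clusters, bfs_clusters(grid))  # 3 [9, 1, 2]
```

On this grid the U-shaped cluster first receives three provisional labels (1, 2 and 3) that the scan then merges, so the Hoshen-Kopelman count and the breadth-first search agree on three clusters with sizes 9, 1 and 2.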
- …