High performance subgraph mining in molecular compounds
Structured data represented in the form of graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load-balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to-linear speedup in a network of workstations.
Visual and computational analysis of structure-activity relationships in high-throughput screening data
Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets
Efficient mining of discriminative molecular fragments
Frequent pattern discovery in structured data is receiving increasing attention in many areas of science. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context, high-performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimate is available for task workloads, which follow a power-law distribution over a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset.
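To illustrate why power-law task workloads make advance partitioning unreliable, the following minimal sketch compares a fixed round-robin assignment with an assignment in which each task goes to whichever worker becomes idle first. It is not the method of the paper: the Pareto workload generator, the parameter `alpha`, and all function names are assumptions introduced only for illustration.

```python
import heapq
import random


def pareto_workloads(n_tasks, alpha=1.2, seed=0):
    """Draw heavy-tailed (Pareto) task costs; alpha is an assumed shape parameter."""
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(n_tasks)]


def static_makespan(workloads, n_workers):
    """Round-robin assignment decided before any task cost is known."""
    loads = [0.0] * n_workers
    for i, cost in enumerate(workloads):
        loads[i % n_workers] += cost
    return max(loads)


def dynamic_makespan(workloads, n_workers):
    """Each task, taken in order, goes to the worker that becomes idle first."""
    finish_times = [0.0] * n_workers
    heapq.heapify(finish_times)
    for cost in workloads:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


if __name__ == "__main__":
    tasks = pareto_workloads(200)
    for workers in (2, 4, 8):
        print(workers,
              round(static_makespan(tasks, workers), 2),
              round(dynamic_makespan(tasks, workers), 2))
```

The dynamic variant is classic greedy list scheduling, whose completion time is bounded by the average load per worker plus the largest single task. The static variant has no such guarantee, because a few very large tasks may land on the same worker.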
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load-balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
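The general idea of receiver-initiated load balancing on an irregular search tree can be sketched as a small single-process simulation: a worker that runs out of nodes asks a randomly chosen peer for part of its pending work. This is only an illustration under stated assumptions, not the algorithm of the paper: the synthetic tree (`expand`), the random donor choice, and the "give away half" policy are all hypothetical.

```python
import random
from collections import deque


def expand(node, max_depth=8):
    """Children of a node in a synthetic, irregular tree (an assumed stand-in
    for a real subgraph search tree). A node is a (depth, key) pair."""
    depth, key = node
    if depth >= max_depth:
        return []
    # Fan-out depends deterministically on the key, so subtree sizes vary widely.
    fanout = 3 if depth == 0 else random.Random(key).choice([0, 0, 1, 2, 3])
    return [(depth + 1, key * 4 + i + 1) for i in range(fanout)]


def count_sequential(root):
    """Reference result: number of nodes visited by a plain depth-first search."""
    stack, visited = [root], 0
    while stack:
        node = stack.pop()
        visited += 1
        stack.extend(expand(node))
    return visited


def simulate(root, n_workers=4, seed=1):
    """Round-based simulation with n_workers >= 2. Busy workers expand one node
    per round; idle workers (the receivers) request work from a random peer."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(n_workers)]
    queues[0].append(root)
    processed = [0] * n_workers
    transfers = 0
    while any(queues):
        for worker in range(n_workers):
            queue = queues[worker]
            if queue:
                node = queue.pop()            # depth-first on local work
                processed[worker] += 1
                queue.extend(expand(node))
            else:
                donor = rng.choice([p for p in range(n_workers) if p != worker])
                donor_queue = queues[donor]
                if len(donor_queue) > 1:
                    # The donor hands over its oldest pending nodes, which sit
                    # nearest the root and tend to carry the larger subtrees.
                    for _ in range(len(donor_queue) // 2):
                        queue.append(donor_queue.popleft())
                    transfers += 1
    return processed, transfers


if __name__ == "__main__":
    root = (0, 0)
    per_worker, moved = simulate(root)
    print(per_worker, moved, count_sequential(root))
```

Because requests are issued only by idle workers, no worker needs a workload estimate in advance, which matches the setting in which no reliable workload prediction is available.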
Distributed mining of molecular fragments
In real-world applications, sequential algorithms for data mining and data exploration are often unsuitable for datasets of enormous size, high dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context, there is a need to develop high-performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large-scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well-known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.
Resistance-gene-directed discovery of a natural-product herbicide with a new mode of action.
Bioactive natural products have evolved to inhibit specific cellular targets and have served as lead molecules for health and agricultural applications for the past century [1-3]. The post-genomics era has brought a renaissance in the discovery of natural products using synthetic-biology tools [4-6]. However, compared to traditional bioactivity-guided approaches, genome mining of natural products with specific and potent biological activities remains challenging [4]. Here we present the discovery and validation of a potent herbicide that targets a critical metabolic enzyme that is required for plant survival. Our approach is based on the co-clustering of a self-resistance gene in the natural-product biosynthesis gene cluster [7-9], which provides insight into the potential biological activity of the encoded compound. We targeted dihydroxy-acid dehydratase in the branched-chain amino acid biosynthetic pathway in plants; the last step in this pathway is often targeted for herbicide development [10]. We show that the fungal sesquiterpenoid aspterric acid, which was discovered using the method described above, is a sub-micromolar inhibitor of dihydroxy-acid dehydratase that is effective as a herbicide in spray applications. The self-resistance gene astD was validated to be insensitive to aspterric acid and was deployed as a transgene in the establishment of plants that are resistant to aspterric acid. This herbicide-resistance gene combination complements the urgent ongoing efforts to overcome weed resistance [11]. Our discovery demonstrates the potential of using a resistance-gene-directed approach in the discovery of bioactive natural products.
A customizable multi-agent system for distributed data mining
We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design separates the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability.