14,473 research outputs found

    High performance subgraph mining in molecular compounds

    Structured data represented in the form of graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load-balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to-linear speedup in a network of workstations.

    Visual and computational analysis of structure-activity relationships in high-throughput screening data

    Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical and biological datasets.

    Efficient mining of discriminative molecular fragments

    Frequent pattern discovery in structured data is receiving increasing attention in many scientific application areas. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context, high-performance distributed computing becomes a promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimate is available for task workloads, which follow a power-law distribution over a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset.

    Dynamic load balancing for the distributed mining of molecular structures

    In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load-balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
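
    The receiver-initiated idea mentioned in the abstract above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the paper's algorithm: the toy search tree (`expand`), the choice of the most loaded peer as donor, and the rule of handing over half of the donor's queue are all hypothetical choices made for illustration, and no real peer-to-peer communication takes place.

```python
from collections import deque


def expand(node):
    """Hypothetical irregular search tree (an assumption, not the paper's).

    A node is (depth, key); the number of children depends only on the key,
    so the tree is the same no matter which worker expands it.
    """
    depth, key = node
    if depth == 0:
        return []
    return [(depth - 1, key * 3 + i + 1) for i in range(key % 4)]


def simulate(num_workers=4, root=(8, 3)):
    """Round-robin simulation of receiver-initiated load balancing."""
    queues = [deque() for _ in range(num_workers)]
    queues[0].append(root)            # all work starts on worker 0
    processed = [0] * num_workers     # nodes expanded per worker
    transfers = 0                     # number of successful work requests

    while any(queues):
        for w in range(num_workers):
            if not queues[w]:
                # Receiver-initiated: the idle worker asks a peer for work.
                # Assumed policy: pick the most loaded peer, take half its queue.
                donor = max(range(num_workers), key=lambda i: len(queues[i]))
                share = len(queues[donor]) // 2
                for _ in range(share):
                    queues[w].append(queues[donor].popleft())
                if share:
                    transfers += 1
                continue
            node = queues[w].pop()    # depth-first on local work
            processed[w] += 1
            queues[w].extend(expand(node))
    return processed, transfers


if __name__ == "__main__":
    per_worker, transfers = simulate()
    print("nodes per worker:", per_worker, "transfers:", transfers)
```

    Because the toy tree is deterministic, the total number of expanded nodes is the same for any number of workers; only how the work is spread across workers changes.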

    Resistance-gene-directed discovery of a natural-product herbicide with a new mode of action.

    Bioactive natural products have evolved to inhibit specific cellular targets and have served as lead molecules for health and agricultural applications for the past century [1-3]. The post-genomics era has brought a renaissance in the discovery of natural products using synthetic-biology tools [4-6]. However, compared to traditional bioactivity-guided approaches, genome mining of natural products with specific and potent biological activities remains challenging [4]. Here we present the discovery and validation of a potent herbicide that targets a critical metabolic enzyme that is required for plant survival. Our approach is based on the co-clustering of a self-resistance gene in the natural-product biosynthesis gene cluster [7-9], which provides insight into the potential biological activity of the encoded compound. We targeted dihydroxy-acid dehydratase in the branched-chain amino acid biosynthetic pathway in plants; the last step in this pathway is often targeted for herbicide development [10]. We show that the fungal sesquiterpenoid aspterric acid, which was discovered using the method described above, is a sub-micromolar inhibitor of dihydroxy-acid dehydratase that is effective as a herbicide in spray applications. The self-resistance gene astD was validated to be insensitive to aspterric acid and was deployed as a transgene in the establishment of plants that are resistant to aspterric acid. This herbicide-resistance gene combination complements the urgent ongoing efforts to overcome weed resistance [11]. Our discovery demonstrates the potential of using a resistance-gene-directed approach in the discovery of bioactive natural products.