113 research outputs found

    High performance subgraph mining in molecular compounds

    Structured data represented as graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load-balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to-linear speedup in a network of workstations.
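
    The receiver-initiated idea mentioned in this abstract can be illustrated with a minimal single-process simulation. This is only a sketch under stated assumptions, not the authors' algorithm: the toy search tree (`expand`), the "ask the busiest peer" donor choice, and the "give away half of the pending nodes" rule are all hypothetical choices made for illustration.

```python
from collections import deque


def expand(node):
    """Hypothetical irregular search tree: a node is a (depth, seed) pair."""
    depth, seed = node
    if depth == 0:
        return []
    # The branching factor varies from 0 to 3, so subtree sizes are hard to predict.
    fanout = (seed * 7 + depth) % 4
    return [(depth - 1, seed * 31 + i + 1) for i in range(fanout)]


def count_sequential(root):
    """Reference: count every node with a plain depth-first traversal."""
    stack, visited = [root], 0
    while stack:
        visited += 1
        stack.extend(expand(stack.pop()))
    return visited


def run_receiver_initiated(root, n_workers=4):
    """Simulate workers in rounds; an idle worker (the receiver) requests work."""
    queues = [deque() for _ in range(n_workers)]
    queues[0].append(root)  # all work starts on worker 0
    processed = [0] * n_workers
    while any(queues):
        for w, q in enumerate(queues):
            if not q:
                # Receiver-initiated step: the idle worker picks the busiest peer
                # (assumed policy) and takes half of that peer's pending nodes.
                donor = max(range(n_workers), key=lambda i: len(queues[i]))
                donor_q = queues[donor]
                for _ in range(len(donor_q) // 2):
                    # Oldest entries sit nearer the root of the search tree.
                    q.append(donor_q.popleft())
            if q:
                node = q.pop()
                processed[w] += 1
                q.extend(expand(node))
    return processed


if __name__ == "__main__":
    root = (8, 1)
    per_worker = run_receiver_initiated(root, n_workers=4)
    print("nodes per worker:", per_worker)
    print("total:", sum(per_worker), "sequential:", count_sequential(root))
```

    Because `expand` is deterministic per node, the simulated workers together visit exactly the nodes the sequential traversal visits; only the assignment of nodes to workers changes.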

    DirectLiNGAM: A Direct Method for Learning a Linear Non-Gaussian Structural Equation Model

    Structural equation models and Bayesian networks have been widely used to analyze causal relations between continuous variables. In such frameworks, linear acyclic models are typically used to model the data-generating process of variables. Recently, it was shown that the use of non-Gaussianity identifies the full structure of a linear acyclic model, i.e., a causal ordering of variables and their connection strengths, without using any prior knowledge on the network structure, which is not the case with conventional methods. However, existing estimation methods are based on iterative search algorithms and may not converge to a correct solution in a finite number of steps. In this paper, we propose a new direct method to estimate a causal ordering and connection strengths based on non-Gaussianity. In contrast to the previous methods, our algorithm requires no algorithmic parameters and is guaranteed to converge to the right solution within a small fixed number of steps if the data strictly follows the model.
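
    For orientation, the linear non-Gaussian acyclic model this abstract refers to is usually written as follows. The notation is an assumption, since the abstract itself gives no symbols: k(i) denotes the position of variable x_i in the causal ordering, b_ij the connection strengths, and e_i mutually independent non-Gaussian disturbances.

```latex
% Each variable is a linear function of variables earlier in the causal ordering k
x_i = \sum_{k(j) < k(i)} b_{ij}\, x_j + e_i, \qquad i = 1, \dots, p

% Matrix form: acyclicity means B can be permuted to a strictly lower
% triangular matrix by reordering the variables according to k
\mathbf{x} = \mathbf{B}\mathbf{x} + \mathbf{e}
```

    Estimating the model then amounts to recovering the ordering k and the nonzero entries of B, which correspond to the "causal ordering" and "connection strengths" named above.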

    Fine mapping of qSTV11KAS, a major QTL for rice stripe disease resistance

    Rice stripe disease, caused by rice stripe virus (RSV), is one of the most serious diseases in temperate rice-growing areas. In the present study, we performed quantitative trait locus (QTL) analysis for RSV resistance using 98 backcross inbred lines derived from the cross between the highly resistant variety, Kasalath, and the highly susceptible variety, Nipponbare. Under artificial inoculation in the greenhouse, two QTLs for RSV resistance, designated qSTV7 and qSTV11KAS, were detected on chromosomes 7 and 11, respectively, whereas only one QTL was detected in the same location of chromosome 11 under natural inoculation in the field. The stability of qSTV11KAS was validated using 39 established chromosome segment substitution lines. Fine mapping of qSTV11KAS was carried out using 372 BC3F2:3 recombinants and 399 BC3F3:4 lines selected from 7,018 BC3F2 plants of the cross SL-234/Koshihikari. qSTV11KAS was localized to a 39.2 kb region containing seven annotated genes. The most likely candidate gene, LOC_Os11g30910, is predicted to encode a sulfotransferase domain-containing protein. The predicted protein encoded by the Kasalath allele differs from Nipponbare by a single amino acid substitution and the deletion of two amino acids within the sulfotransferase domain. Marker-resistance association analysis revealed that the markers L104-155 bp and R48-194 bp were highly correlated with RSV resistance in the 148 landrace varieties. These results provide a basis for the cloning of qSTV11KAS, and the markers may be used for molecular breeding of RSV-resistant rice varieties.