4 research outputs found

    Finding unique PCR products on distributed databases

    Thanks to advances in genetic engineering, many kinds of genomic information are being unveiled, and it is now feasible to study molecular biology by analyzing entire genomes. At the same time, the quantity of genomic information stored in databases is increasing day after day. To process all of this information, we need effective methods for handling large amounts of data. Analyzing the biological information requires not only efficient and fast algorithms but also high-speed computing resources. For this purpose, the grid computing architecture has recently appeared as one of the most promising computing environments. The European Data Grid (EDG) is one such grid computing environment. In the first stage of designing hybridization probes and PCR primers, it is extremely important to find genuinely unique sequences on a target genome. We deployed a novel method to design PCR primers, which takes into account not only the specificity of the primer itself but also the uniqueness of the product length. In this paper, we improve our proposed method to find unique PCR products on distributed databases. We also show sequences found by our method on the S. cerevisiae genome that cannot be uniquely observed by any probe sequence but can be by a pair of PCR primers.

    Secret Sequence Comparison on Public Grid Computing Resources

    Once a new gene has been sequenced, it must be verified whether or not it is similar to previously sequenced genes. In many cases, the organization that sequenced a potentially novel gene needs to keep the sequence itself confidential. However, to compare the potentially novel sequence with known sequences, it must either be sent as a query to public databases, or these databases must be downloaded onto a local computer. In both cases, the potentially new sequence is exposed to the public. In this work, we propose a novel method to compare sequences without leaking any exact sequence information to the public. This method is based on our previously proposed method [1] for finding unique sequences on grid computing environments, which parallelizes well with reasonable performance. To keep the exact sequence information confidential, the method samples intervals (subsequences) from a sequence and hashes these intervals. No key-based cryptosystem is used. The hashed data are open to the public to verify the novelty of the sequence. The experimental results for 19797 H. sapiens genes show that the parallel implementation of this method performs reasonably well in terms of speed and memory usage. In this paper, the implementation on the world-wide testbeds of the European Data Grid (EDG) and its results are described.
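
The interval-sampling and hashing idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the interval length, the sampling scheme, or the hash function, so the fixed-stride sampling, the use of SHA-256, and the names `sample_intervals`, `hash_intervals`, and `shared_fraction` are all assumptions made here for illustration, not the authors' implementation.

```python
import hashlib


def sample_intervals(sequence, length, step):
    """Return subsequences of `length` taken every `step` positions.

    Fixed-stride sampling is an assumption; the abstract only says that
    intervals are sampled from the sequence.
    """
    last_start = len(sequence) - length
    return [sequence[i:i + length] for i in range(0, last_start + 1, step)]


def hash_intervals(intervals):
    """Hash each interval so that only digests, not bases, are published.

    SHA-256 is an assumed choice; no key is involved, matching the
    abstract's statement that no key-based cryptosystem is used.
    """
    return {hashlib.sha256(s.encode("ascii")).hexdigest() for s in intervals}


def shared_fraction(query_hashes, reference_hashes):
    """Fraction of the query digests that also occur in the reference set."""
    if not query_hashes:
        return 0.0
    return len(query_hashes & reference_hashes) / len(query_hashes)


# Hypothetical toy sequences, far shorter than real genes.
query = "ACGTACGTGGCCTTAA"
known = "TTTTACGTACGTGGCCTTAACCCC"

query_digests = hash_intervals(sample_intervals(query, length=8, step=1))
known_digests = hash_intervals(sample_intervals(known, length=8, step=1))
similarity = shared_fraction(query_digests, known_digests)
```

In this toy case every 8-base interval of `query` also occurs in `known`, so all query digests are found in the reference set. A real deployment would need to choose the interval length and sampling so that the published digests cannot simply be enumerated.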