1,468 research outputs found

    Two Poems

    Poetry by Brook Pearso

    Seismic/Ley lines

    Poetry by Brook Pearso

    Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty

    Background: Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets. Results: Results for a knowledge-discovery application of homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives better coverage than using a single parameter set, at least at some error levels. Further, the results of pairwise statistical significance using multiple parameter sets are shown to be significantly better than the database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable to, and at times significantly better than, SSEARCH. Using non-zero parameter set change penalty values gives better performance than a zero penalty. Conclusion: The fact that homology detection performance does not degrade when using multiple parameter sets is strong evidence for the validity of the assumption that the alignment score distribution follows an extreme value distribution even when multiple parameter sets are used. The parameter set change penalty is a useful parameter for alignment using multiple parameter sets. Pairwise statistical significance using multiple parameter sets can be used effectively to determine the relatedness of one (or a few) pair(s) of sequences without performing a time-consuming database search.
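    As a rough illustration of the core idea behind pairwise statistical significance (not the paper's actual method), one can fit an extreme value (Gumbel) distribution to alignment scores of shuffled sequences and read off a p-value for the observed score. The toy diagonal scorer and the example sequences below are assumptions added purely for self-containment:

```python
# Sketch: pairwise statistical significance via a Gumbel (extreme value)
# fit to a shuffled-sequence null distribution. The scorer is a toy
# ungapped diagonal match count, standing in for a real aligner.
import math
import random

def score(a, b):
    """Toy ungapped score: best diagonal match count over all offsets."""
    best = 0
    for off in range(-len(b) + 1, len(a)):
        run = sum(1 for i in range(len(a))
                  if 0 <= i - off < len(b) and a[i] == b[i - off])
        best = max(best, run)
    return best

def gumbel_pvalue(observed, null_scores):
    """Fit Gumbel parameters by the method of moments; return P(S >= observed)."""
    n = len(null_scores)
    mean = sum(null_scores) / n
    var = sum((s - mean) ** 2 for s in null_scores) / n
    beta = math.sqrt(6 * var) / math.pi       # scale parameter
    mu = mean - 0.5772156649 * beta           # location (Euler-Mascheroni constant)
    return 1.0 - math.exp(-math.exp(-(observed - mu) / beta))

random.seed(0)
q = "ACGTACGTGGTACC"
s = "TTACGTACGTGGTA"
# Null distribution: scores of the query against shuffled copies of the subject.
null = [score(q, "".join(random.sample(s, len(s)))) for _ in range(200)]
p = gumbel_pvalue(score(q, s), null)
print(f"pairwise p-value estimate: {p:.4f}")
```

    A small p-value indicates the observed score is unlikely under the shuffled null, i.e. the pair is probably related.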

    Testing statistical significance scores of sequence comparison methods with structure similarity

    BACKGROUND: In recent years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search are determined not only by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score over the e-value using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences. RESULTS: All experiments were performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics was evaluated using ROC, CVE and AP measures, with the BLAST and FASTA algorithms as references. We found that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score; SSEARCH in particular scores very highly. CONCLUSION: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations generally give better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
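    The Monte-Carlo Z-score contrasted with the e-value above can be sketched as follows; `align_score` is a placeholder for any real pairwise scorer (e.g. Smith-Waterman), and the trivial positional match count is an assumption for illustration only:

```python
# Sketch of a Monte-Carlo Z-score: standardise the true alignment score
# against scores obtained from shuffled copies of the subject sequence.
import random
import statistics

def align_score(a, b):
    # Placeholder scorer: positional matches only (illustration, not SW).
    return sum(x == y for x, y in zip(a, b))

def monte_carlo_zscore(query, subject, shuffles=100, seed=42):
    rng = random.Random(seed)
    observed = align_score(query, subject)
    null = [align_score(query, "".join(rng.sample(subject, len(subject))))
            for _ in range(shuffles)]
    return (observed - statistics.mean(null)) / statistics.stdev(null)

z = monte_carlo_zscore("ACGTACGTAC", "ACGTACGTAC")
print(f"Z-score: {z:.2f}")
```

    The "compute intensive" caveat in the abstract is visible here: every query-subject pair costs `shuffles + 1` alignments instead of one.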

    CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment

    Background: Searching for similarities in protein and DNA databases has become a routine procedure in molecular biology. The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all possible alignments between two sequences; as a result, it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the lengths of the two sequences. Furthermore, the exponential growth of protein and DNA databases makes the Smith-Waterman algorithm unrealistic for searching for similarities in large sets of sequences. For these reasons, heuristic approaches such as those implemented in FASTA and BLAST tend to be preferred, allowing faster execution times at the cost of reduced sensitivity. The main motivation of our work is to exploit the huge computational power of commonly available graphics cards to develop high-performance solutions for sequence alignment. Results: In this paper we present what we believe is the fastest implementation of the exact Smith-Waterman algorithm running on commodity hardware. It is implemented in the recently released CUDA programming environment by NVidia. CUDA allows direct access to the hardware primitives of the last-generation G80 Graphics Processing Units (GPUs). Speeds of more than 3.5 GCUPS (Giga Cell Updates Per Second) are achieved on a workstation running two GeForce 8800 GTX cards. Exhaustive tests were performed to compare our implementation to SSEARCH and BLAST running on a 3 GHz Intel Pentium IV processor. Our solution was also compared to a recently published GPU implementation and to a Single Instruction Multiple Data (SIMD) solution. These tests show that our implementation performs 2 to 30 times faster than any other previous attempt available on commodity hardware.
Conclusions: The results show that graphics cards are now sufficiently advanced to be used as efficient hardware accelerators for sequence alignment. Their performance is better than any alternative available on commodity hardware platforms. The solution presented in this paper allows large-scale alignments to be performed at low cost, using the exact Smith-Waterman algorithm instead of the widely adopted heuristic approaches.
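    To make the dynamic-programming recurrence concrete, here is a minimal scalar Smith-Waterman score computation; the scoring values (match +2, mismatch -1, linear gap -1) are illustrative assumptions, not the paper's parameters, and this plain-Python version has exactly the quadratic cost the abstract describes:

```python
# Minimal Smith-Waterman local alignment score. Each cell of the
# (len(a)+1) x (len(b)+1) matrix H is one "cell update" in the CUPS metric.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            H[i][j] = max(0,                  # local alignment: never below zero
                          diag,               # match/mismatch
                          H[i-1][j] + gap,    # gap in b
                          H[i][j-1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACACACTA", "AGCACACA"))  # → 12
```

    GPU implementations parallelise the anti-diagonals of `H`, since all cells on one anti-diagonal depend only on the previous two.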

    From Programme Theory to Logic Models for Multispecialty Community Providers: A Realist Evidence Synthesis

    Background: The NHS policy of constructing multispecialty community providers (MCPs) rests on a complex set of assumptions about how health systems can replace hospital use with enhanced primary care for people with complex, chronic or multiple health problems, while contributing savings to health-care budgets. Objectives: To use policy-makers’ assumptions to elicit an initial programme theory (IPT) of how MCPs can achieve their outcomes and to compare this with published secondary evidence and revise the programme theory accordingly. Design: Realist synthesis with a three-stage method: (1) from policy documents, elicit the IPT underlying the MCP policy, (2) review and synthesise secondary evidence relevant to those assumptions and (3) compare the programme theory with the secondary evidence and, when necessary, reformulate the programme theory in a more evidence-based way. Data sources: Systematic searches and data extraction using (1) the Health Management Information Consortium (HMIC) database for policy statements and (2) topically appropriate databases, including MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, PsycINFO, the Cumulative Index to Nursing and Allied Health Literature (CINAHL) and Applied Social Sciences Index and Abstracts (ASSIA). A total of 1319 titles and abstracts were reviewed in two rounds and 116 were selected for full-text data extraction. We extracted data using a formal data extraction tool and synthesised them using a framework reflecting the main policy assumptions. Results: The IPT of MCPs contained 28 interconnected context–mechanism–outcome relationships. Few policy statements specified what contexts the policy mechanisms required.
We found strong evidence supporting the IPT assumptions concerning organisational culture, interorganisational network management, multidisciplinary teams (MDTs), the uses and effects of health information technology (HIT) in MCP-like settings, planned referral networks, care planning for individual patients and the diversion of patients from inpatient to primary care. The evidence was weaker, or mixed (supporting some of the constituent assumptions but not others), concerning voluntary sector involvement, the effects of preventative care on hospital admissions and patient experience, planned referral networks and demand management systems. The evidence about the effects of referral reductions on costs was equivocal. We found no studies confirming that the development of preventative care would reduce demands on inpatient services. The IPT had overlooked certain mechanisms relevant to MCPs, mostly concerning MDTs and the uses of HITs. Limitations: The studies reviewed were limited to Organisation for Economic Co-operation and Development countries and, because of the large amount of published material, to the period 2014–16, assuming that later studies, especially systematic reviews, already include important earlier findings. No empirical studies of MCPs yet existed. Conclusions: Multidisciplinary teams are a central mechanism by which MCPs (and equivalent networks and organisations) work, provided that the teams include the relevant professions (hence, organisations) and, for care planning, individual patients. Further primary research would be required to test elements of the revised logic model, in particular about (1) how MDTs and enhanced general practice compare and interact, or can be combined, in managing referral networks and (2) under what circumstances diverting patients from inpatient to primary care reduces NHS costs and improves the quality of patient experience.

    Clustering exact matches of pairwise sequence alignments by weighted linear regression

    Background: At intermediate stages of genome assembly projects, when a number of contigs have been generated and their validity needs to be verified, it is desirable to align these contigs to a reference genome when one is available. The interest is not in analysing a detailed alignment between a contig and the reference genome at the base level, but rather in obtaining a rough estimate of where the contig aligns to the reference genome, specifically by identifying the starting and ending positions of such a region. This information is very useful in ordering the contigs, facilitating post-assembly analyses such as gap closure and repeat resolution. Programs exist, such as BLAST and MUMmer, that can quickly align and identify high-similarity segments between two sequences; when seen in a dot plot, these segments tend to agglomerate along a diagonal but can also be disrupted by gaps or shifted away from the main diagonal by mismatches between the contig and the reference. Visually inspecting the dot plot to identify the regions covered by a large number of contigs from sequence assembly projects is a tedious and practically impossible task, and a forced global alignment between a contig and the reference is not only time consuming but often meaningless. Results: We have developed an algorithm that takes the coordinates of all the exact matches or high-similarity local alignments, clusters them with respect to the main diagonal in the dot plot using a weighted linear regression technique, and identifies the starting and ending coordinates of the region of interest. Conclusion: This algorithm complements existing pairwise sequence alignment packages by replacing the time-consuming seed extension phase with a weighted linear regression over the alignment seeds. Experiments showed that the gain in execution time can be substantial without compromising accuracy. This method should be of great utility to sequence assembly and genome comparison projects.
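    A hedged sketch of the clustering idea (the paper's exact formulation is not reproduced here): each dot-plot seed is a (contig position, reference position, match length) tuple; a length-weighted regression estimates the main diagonal's offset, seeds far from it are discarded as noise, and the surviving seeds delimit the region. The seed list, slope-1 simplification and deviation threshold are assumptions for illustration:

```python
# Sketch: cluster exact-match seeds around the main dot-plot diagonal
# by a length-weighted regression of the offset c in y = x + c, with
# iterative removal of far-off seeds, then report the covered region.
def diagonal_region(seeds, max_dev=5.0):
    """seeds: (contig_pos, ref_pos, match_len) tuples.
    Returns (ref_start, ref_end) of the region on the main diagonal."""
    kept = list(seeds)
    while True:
        w = sum(L for _, _, L in kept)
        # Length-weighted fit of the offset (slope fixed at 1, since
        # seeds of a single alignment are near-collinear).
        c = sum(L * (y - x) for x, y, L in kept) / w
        inliers = [s for s in kept if abs((s[1] - s[0]) - c) <= max_dev]
        if len(inliers) == len(kept):
            break
        if not inliers:  # all far off: discard the single worst seed, refit
            worst = max(kept, key=lambda s: abs((s[1] - s[0]) - c))
            inliers = [s for s in kept if s is not worst]
        kept = inliers
    return min(y for _, y, _ in kept), max(y + L for _, y, L in kept)

# Three collinear seeds plus one spurious off-diagonal hit (made-up data).
seeds = [(0, 100, 30), (40, 141, 25), (80, 182, 40), (10, 900, 12)]
print(diagonal_region(seeds))
```

    The point of the regression is exactly what the abstract claims: no base-level seed extension is needed, only arithmetic over seed coordinates.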

    CBESW: Sequence Alignment on the Playstation 3

    Background: The exponential growth of available biological data has caused bioinformatics to move rapidly towards a data-intensive computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many bioinformatics applications, compared to current general-purpose platforms. In this paper, we demonstrate how the PlayStation® 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. Results: For large datasets, our implementation on the PlayStation® 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS. Conclusion: The results from our experiments demonstrate that the PlayStation® 3 console can be used as an efficient, low-cost computational platform for high-performance sequence alignment applications.
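    For readers unfamiliar with the MCUPS figures quoted in these abstracts, cell updates per second simply counts dynamic-programming matrix cells computed per unit time; the example numbers below are illustrative, not the paper's benchmark:

```python
# MCUPS (Million Cell Updates Per Second): the DP matrix for a query of
# length q against a database of d residues has q * d cells.
def mcups(query_len, db_residues, seconds):
    return query_len * db_residues / seconds / 1e6

# e.g. a 500-residue query against a 100M-residue database in 20 s:
print(f"{mcups(500, 100_000_000, 20):.0f} MCUPS")  # → 2500 MCUPS
```

    On this scale, the 3,646 MCUPS peak above and the 3.5 GCUPS (3,500 MCUPS) of the CUDA implementation are directly comparable.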