
Testing statistical significance scores of sequence comparison methods with structure similarity

Author: AA Schaffer
AD Kester
EV Kriventseva
G Salton
GA Price
HS Booth
J Park
Jack AM Leunissen
Jacob de Vlieg
JJ Codani
JP Comet
JT Reese
M Gribskov
O Bastien
P Agarwal
Peter MA Groenen
R Apweiler
RF Doolittle
S Henikoff
SE Brenner
SF Altschul
T Hulsen
T Rognes
TF Smith
Tim Hulsen
WR Pearson
Z Chen
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: In recent years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search are determined not only by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences. RESULTS: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. CONCLUSION: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
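The Monte-Carlo Z-score named in this abstract can be illustrated with a minimal Python sketch. It assumes the usual formulation, in which the observed alignment score is compared with the scores obtained after repeatedly shuffling one of the two sequences. The scoring parameters, the shuffle count, and the function names `sw_score` and `z_score` are illustrative assumptions and are not taken from CluSTr, Protein World, or the paper.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman with a linear gap penalty).

    The match/mismatch/gap values are illustrative, not a real substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def z_score(query, subject, shuffles=200, seed=0):
    """Z = (observed - mean) / sd, with mean and sd taken over shuffled subjects."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sw_score(query, subject)
    residues = list(subject)
    null_scores = []
    for _ in range(shuffles):
        rng.shuffle(residues)  # keeps composition, destroys residue order
        null_scores.append(sw_score(query, "".join(residues)))
    mean = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    if sd == 0:
        # Degenerate null distribution: no spread to normalise by.
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / sd
```

Each Z-score costs `shuffles` extra alignments per sequence pair, which is why the abstract calls the method compute-intensive when compared with an e-value derived from a single alignment score.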

Wageningen University & Research Publications

Radboud Repository

CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment

Author: A Bairoch
Giorgio Valle
KM Chao
M Farrar
M Gribskov
O Gotoh
S Henikoff
SB Needleman
SF Altschul
Svetlin A Manavski
T Rognes
TF Smith
W Liu
W Pearson
WR Pearson
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Searching for similarities in protein and DNA databases has become a routine procedure in molecular biology. The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two sequences; as a result it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the lengths of the two sequences. Furthermore, the exponential growth of protein and DNA databases makes the Smith-Waterman algorithm impractical for searching for similarities in large sets of sequences. For these reasons heuristic approaches such as those implemented in FASTA and BLAST tend to be preferred, allowing faster execution times at the cost of reduced sensitivity. The main motivation of our work is to exploit the huge computational power of commonly available graphics cards to develop high-performance solutions for sequence alignment. Results: In this paper we present what we believe is the fastest solution of the exact Smith-Waterman algorithm running on commodity hardware. It is implemented in the recently released CUDA programming environment by NVidia. CUDA allows direct access to the hardware primitives of the latest-generation Graphics Processing Units (GPU) G80. Speeds of more than 3.5 GCUPS (Giga Cell Updates Per Second) are achieved on a workstation running two GeForce 8800 GTX. Exhaustive tests have been done to compare our implementation to SSEARCH and BLAST, running on a 3 GHz Intel Pentium IV processor. Our solution was also compared to a recently published GPU implementation and to a Single Instruction Multiple Data (SIMD) solution. These tests show that our implementation performs from 2 to 30 times faster than any other previous attempt available on commodity hardware.
Conclusions: The results show that graphics cards are now sufficiently advanced to be used as efficient hardware accelerators for sequence alignment. Their performance is better than any alternative available on commodity hardware platforms. The solution presented in this paper allows large-scale alignments to be performed at low cost, using the exact Smith-Waterman algorithm instead of the widely adopted heuristic approaches.
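The GCUPS figure quoted in this abstract follows from the cost statement in its background: one Smith-Waterman comparison updates a number of matrix cells equal to the product of the two sequence lengths. The Python sketch below works through that arithmetic. The function names and the query and database sizes in the usage note are hypothetical and were chosen only to give round numbers.

```python
def cell_updates(query_len, db_residues):
    """Cells filled when one query is aligned against every database residue."""
    return query_len * db_residues


def gcups(query_len, db_residues, seconds):
    """Throughput in giga cell updates per second for a timed search."""
    return cell_updates(query_len, db_residues) / seconds / 1e9


def search_seconds(query_len, db_residues, rate_gcups):
    """Estimated wall time for a search at a sustained GCUPS rate."""
    return cell_updates(query_len, db_residues) / (rate_gcups * 1e9)
```

For example, `search_seconds(500, 70_000_000, 3.5)` returns `10.0`: a hypothetical 500-residue query against a hypothetical database of 70 million residues needs 3.5e10 cell updates, which takes about ten seconds at the 3.5 GCUPS reported above.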

Archivio istituzionale della ricerca - Università di Padova

From Programme Theory to Logic Models for Multispecialty Community Providers: A Realist Evidence Synthesis

Author: Brand S
Briscoe S
Byng R
Fornasiero M
Lloyd H
Pearson M
Sheaff WR
Valderas J
Wanner A
Publication venue: NIHR Health Services and Delivery Programme
Publication date: 01/03/2018

Background: The NHS policy of constructing multispecialty community providers (MCPs) rests on a complex set of assumptions about how health systems can replace hospital use with enhanced primary care for people with complex, chronic or multiple health problems, while contributing savings to health-care budgets. Objectives: To use policy-makers’ assumptions to elicit an initial programme theory (IPT) of how MCPs can achieve their outcomes and to compare this with published secondary evidence and revise the programme theory accordingly. Design: Realist synthesis with a three-stage method: (1) from policy documents, elicit the IPT underlying the MCP policy, (2) review and synthesise secondary evidence relevant to those assumptions and (3) compare the programme theory with the secondary evidence and, when necessary, reformulate the programme theory in a more evidence-based way. Data sources: Systematic searches and data extraction using (1) the Health Management Information Consortium (HMIC) database for policy statements and (2) topically appropriate databases, including MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, PsycINFO, the Cumulative Index to Nursing and Allied Health Literature (CINAHL) and Applied Social Sciences Index and Abstracts (ASSIA). A total of 1319 titles and abstracts were reviewed in two rounds and 116 were selected for full-text data extraction. We extracted data using a formal data extraction tool and synthesised them using a framework reflecting the main policy assumptions. Results: The IPT of MCPs contained 28 interconnected context–mechanism–outcome relationships. Few policy statements specified what contexts the policy mechanisms required.
We found strong evidence supporting the IPT assumptions concerning organisational culture, interorganisational network management, multidisciplinary teams (MDTs), the uses and effects of health information technology (HIT) in MCP-like settings, planned referral networks, care planning for individual patients and the diversion of patients from inpatient to primary care. The evidence was weaker, or mixed (supporting some of the constituent assumptions but not others), concerning voluntary sector involvement, the effects of preventative care on hospital admissions and patient experience, planned referral networks and demand management systems. The evidence about the effects of referral reductions on costs was equivocal. We found no studies confirming that the development of preventative care would reduce demands on inpatient services. The IPT had overlooked certain mechanisms relevant to MCPs, mostly concerning MDTs and the uses of HITs. Limitations: The studies reviewed were limited to Organisation for Economic Co-operation and Development countries and, because of the large amount of published material, the period 2014–16, assuming that later studies, especially systematic reviews, already include important earlier findings. No empirical studies of MCPs yet existed. Conclusions: Multidisciplinary teams are a central mechanism by which MCPs (and equivalent networks and organisations) work, provided that the teams include the relevant professions (hence, organisations) and, for care planning, individual patients. Further primary research would be required to test elements of the revised logic model, in particular about (1) how MDTs and enhanced general practice compare and interact, or can be combined, in managing referral networks and (2) under what circumstances diverting patients from inpatient to primary care reduces NHS costs and improves the quality of patient experience.

Repository@Hull - Worktribe

Online Research @ Cardiff

PEARL (Univ. of Plymouth)

Clustering exact matches of pairwise sequence alignments by weighted linear regression

Author: Alvaro J González
F Sanger
Li Liao
PA Pevzner
S Kurtz
SF Altschul
TF Smith
WJ Kent
WR Pearson
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: At intermediate stages of genome assembly projects, when a number of contigs have been generated and their validity needs to be verified, it is desirable to align these contigs to a reference genome when it is available. The interest is not to analyze a detailed alignment between a contig and the reference genome at the base level, but rather to have a rough estimate of where the contig aligns to the reference genome, specifically, by identifying the starting and ending positions of such a region. This information is very useful in ordering the contigs, facilitating post-assembly analysis such as gap closure and resolving repeats. There exist programs, such as BLAST and MUMmer, that can quickly align and identify high similarity segments between two sequences, which, when seen in a dot plot, tend to agglomerate along a diagonal but can also be disrupted by gaps or shifted away from the main diagonal due to mismatches between the contig and the reference. It is a tedious and practically impossible task to visually inspect the dot plot to identify the regions covered by a large number of contigs from sequence assembly projects. A forced global alignment between a contig and the reference is not only time consuming but often meaningless. Results: We have developed an algorithm that uses the coordinates of all the exact matches or high similarity local alignments, clusters them with respect to the main diagonal in the dot plot using a weighted linear regression technique, and identifies the starting and ending coordinates of the region of interest. Conclusion: This algorithm complements existing pairwise sequence alignment packages by replacing the time-consuming seed extension phase with a weighted linear regression for the alignment seeds. It was experimentally shown that the gain in execution time can be outstanding without compromising the accuracy. This method should be of great utility to sequence assembly and genome comparison projects.
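The idea of clustering alignment seeds around a diagonal by weighted linear regression can be sketched in Python as follows. This is not the authors' algorithm, whose details the abstract does not give. It assumes each seed is a `(contig_pos, ref_pos, length)` tuple, weights each seed by its length, and drops seeds whose residual from the fitted line exceeds a threshold. The names `weighted_fit` and `locate_contig`, the `tolerance` default, and the outlier rule are all hypothetical.

```python
import math


def weighted_fit(points):
    """Weighted least-squares line y = slope * x + intercept for (x, y, w) points."""
    total = sum(w for _, _, w in points)
    mx = sum(w * x for x, _, w in points) / total
    my = sum(w * y for _, y, w in points) / total
    sxx = sum(w * (x - mx) ** 2 for x, _, w in points)
    sxy = sum(w * (x - mx) * (y - my) for x, y, w in points)
    slope = sxy / sxx if sxx else 1.0  # single x value: fall back to a unit diagonal
    return slope, my - slope * mx


def locate_contig(matches, tolerance=50.0, rounds=5):
    """Estimate the reference region covered by a contig.

    matches: list of (contig_pos, ref_pos, length) seeds; length is the weight.
    Returns (ref_start, ref_end, kept_seeds).
    """
    kept = list(matches)
    for _ in range(rounds):
        slope, intercept = weighted_fit(kept)
        # Weighted RMS residual of the seeds used for the current fit.
        weight = sum(n for _, _, n in kept)
        rms = math.sqrt(
            sum(n * (r - (slope * c + intercept)) ** 2 for c, r, n in kept) / weight
        )
        # Loose cut while the fit is still distorted, tight cut once it settles.
        limit = max(tolerance, 2.0 * rms)
        inliers = [
            m for m in matches if abs(m[1] - (slope * m[0] + intercept)) <= limit
        ]
        if not inliers or inliers == kept:
            break
        kept = inliers
    ref_start = min(r for _, r, _ in kept)
    ref_end = max(r + n for _, r, n in kept)
    return ref_start, ref_end, kept
```

Weighting by seed length lets a few long matches outvote many short spurious ones, such as hits in repeats, so the fitted line settles on the dominant diagonal without any seed extension step.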

pp-Blast: a "pseudo-parallel" Blast

Author: A.C. Zaiats
Altschul S
Camargo AA
E.C. Osório
J.E. de Souza
P.S.L. de Oliveira
Pearson WR
S.J. de Souza
Publication venue: FapUNIFESP (SciELO)
Publication date

CBESW: Sequence Alignment on the Playstation 3

Author: A Stamatakis
A Wozniak
Adrianto Wirawan
Bertil Schmidt
Chee Keong Kwoh
D Pham
DA Benson
IBM
International Business Machines
ITS Li
JA Kahle
M Farrar
Nim Tri Hieu
O Gotoh
R Durbin
SA Manavski
T Rognes
T Smith
TF Oliver
V Pande
V Sachdeva
W Liu
WR Pearson
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The exponential growth of available biological data is rapidly turning bioinformatics into a data-intensive computational science. As a result, the computational power needed by bioinformatics applications is growing exponentially as well. The recent emergence of accelerator technologies has made it possible to achieve an excellent improvement in execution time for many bioinformatics applications, compared to current general-purpose platforms. In this paper, we demonstrate how the PlayStation® 3, powered by the Cell Broadband Engine, can be used as a computational platform to accelerate the Smith-Waterman algorithm. Results: For large datasets, our implementation on the PlayStation® 3 provides a significant improvement in running time compared to other implementations such as SSEARCH, Striped Smith-Waterman and CUDA. Our implementation achieves a peak performance of up to 3,646 MCUPS. Conclusion: The results from our experiments demonstrate that the PlayStation® 3 console can be used as an efficient low-cost computational platform for high-performance sequence alignment applications.