Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty
Background: Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance and database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets.
Results: Results for a knowledge discovery application of homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives better coverage than using a single parameter set, at least at some error levels. Further, pairwise statistical significance estimates using multiple parameter sets are shown to be significantly better than the database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable to, and at times significantly better than, those reported by SSEARCH. Using non-zero parameter set change penalty values gives better performance than using a zero penalty.
Conclusion: The fact that homology detection performance does not degrade when using multiple parameter sets is strong evidence for the validity of the assumption that the alignment score distribution follows an extreme value distribution even when multiple parameter sets are used. The parameter set change penalty is a useful parameter for alignment using multiple parameter sets. Pairwise statistical significance using multiple parameter sets can be used effectively to determine the relatedness of one or a few pairs of sequences without performing a time-consuming database search.
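The abstract relies on the assumption that pairwise alignment scores follow an extreme value (Gumbel) distribution. As a minimal sketch of how a p-value could be computed for a single pair, assuming one already has a sample of scores from aligning shuffled versions of that pair, the Python below fits the Gumbel location mu and scale beta by the method of moments and evaluates P(S >= x). The method-of-moments estimator, the function names and the demo sampler are illustrative choices, not the authors' estimator, and the parameter set change penalty is not modelled.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Estimate Gumbel location mu and scale beta from a sample of
    alignment scores using the method of moments (needs >= 2 scores)."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    beta = sd * math.sqrt(6.0) / math.pi  # scale; equals 1/lambda in Karlin-Altschul notation
    mu = mean - EULER_GAMMA * beta        # location
    return mu, beta


def gumbel_pvalue(score, mu, beta):
    """P(S >= score) = 1 - exp(-exp(-(score - mu) / beta)).
    expm1 keeps precision for very small p-values."""
    return -math.expm1(-math.exp(-(score - mu) / beta))


def sample_gumbel(mu, beta, n, seed=0):
    """Demo helper (hypothetical): draw n Gumbel variates by inverting the CDF."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:  # log(0) is undefined, so skip the (rare) zero draw
            out.append(mu - beta * math.log(-math.log(u)))
    return out
```

In practice the `scores` argument would come from re-aligning shuffled copies of the two sequences; with multiple parameter sets, one would presumably fit the scores produced by the multi-parameter-set aligner instead.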
Testing statistical significance scores of sequence comparison methods with structure similarity
BACKGROUND: In recent years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search are determined not only by the algorithm but also by the statistical significance testing of an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score over the e-value, using simulated data. We wanted to determine whether this could be validated when applied to existing, evolutionarily related protein sequences. RESULTS: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH in particular has very high scores. CONCLUSION: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations generally give better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons.
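The Monte-Carlo Z-score compares the observed score with the scores of shuffled sequences: Z = (S - mean of shuffled scores) / standard deviation of shuffled scores. The following is a minimal Python sketch, assuming a toy ungapped local scorer (+1 match, -1 mismatch) in place of the Smith-Waterman scoring used for CluSTr and Protein World; the scorer, the number of shuffles and the function names are hypothetical. Shuffling one sequence preserves its residue composition.

```python
import random
import statistics


def ungapped_local_score(a, b, match=1, mismatch=-1):
    """Toy scorer: best ungapped local alignment score over all diagonals,
    using a running maximum-subarray (Kadane) sum along each diagonal."""
    best = 0
    for offset in range(-(len(a) - 1), len(b)):
        run = 0
        i = max(0, -offset)
        j = i + offset
        while i < len(a) and j < len(b):
            run = max(0, run + (match if a[i] == b[j] else mismatch))
            best = max(best, run)
            i += 1
            j += 1
    return best


def monte_carlo_zscore(a, b, score_fn=ungapped_local_score, n_shuffles=200, seed=0):
    """Z-score of score_fn(a, b) against scores of a versus shuffled copies of b."""
    rng = random.Random(seed)
    observed = score_fn(a, b)
    chars = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(chars)  # same residues, random order
        shuffled_scores.append(score_fn(a, "".join(chars)))
    mean = statistics.fmean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - mean) / sd
```

Each Z-score costs roughly `n_shuffles` extra alignments per pair, which is why the Z-score is compute-intensive compared with an analytically computed e-value.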
From Programme Theory to Logic Models for Multispecialty Community Providers: A Realist Evidence Synthesis
Background:
The NHS policy of constructing multispecialty community providers (MCPs) rests on a complex set of assumptions about how health systems can replace hospital use with enhanced primary care for people with complex, chronic or multiple health problems, while contributing savings to health-care budgets.
Objectives:
To use policy-makers’ assumptions to elicit an initial programme theory (IPT) of how MCPs can achieve their outcomes and to compare this with published secondary evidence and revise the programme theory accordingly.
Design:
Realist synthesis with a three-stage method: (1) elicit the IPT underlying the MCP policy from policy documents, (2) review and synthesise secondary evidence relevant to those assumptions and (3) compare the programme theory with the secondary evidence and, when necessary, reformulate the programme theory in a more evidence-based way.
Data sources:
Systematic searches and data extraction using (1) the Health Management Information Consortium (HMIC) database for policy statements and (2) topically appropriate databases, including MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, PsycINFO, the Cumulative Index to Nursing and Allied Health Literature (CINAHL) and Applied Social Sciences Index and Abstracts (ASSIA). A total of 1319 titles and abstracts were reviewed in two rounds and 116 were selected for full-text data extraction. We extracted data using a formal data extraction tool and synthesised them using a framework reflecting the main policy assumptions.
Results:
The IPT of MCPs contained 28 interconnected context–mechanism–outcome relationships. Few policy statements specified what contexts the policy mechanisms required. We found strong evidence supporting the IPT assumptions concerning organisational culture, interorganisational network management, multidisciplinary teams (MDTs), the uses and effects of health information technology (HIT) in MCP-like settings, planned referral networks, care planning for individual patients and the diversion of patients from inpatient to primary care. The evidence was weaker, or mixed (supporting some of the constituent assumptions but not others), concerning voluntary sector involvement, the effects of preventative care on hospital admissions and patient experience, planned referral networks and demand management systems. The evidence about the effects of referral reductions on costs was equivocal. We found no studies confirming that the development of preventative care would reduce demands on inpatient services. The IPT had overlooked certain mechanisms relevant to MCPs, mostly concerning MDTs and the uses of HITs.
Limitations:
The studies reviewed were limited to Organisation for Economic Co-operation and Development countries and, because of the large amount of published material, to the period 2014–16, on the assumption that later studies, especially systematic reviews, already include important earlier findings. No empirical studies of MCPs yet existed.
Conclusions:
Multidisciplinary teams are a central mechanism by which MCPs (and equivalent networks and organisations) work, provided that the teams include the relevant professions (hence, organisations) and, for care planning, individual patients. Further primary research would be required to test elements of the revised logic model, in particular (1) how MDTs and enhanced general practice compare and interact, or can be combined, in managing referral networks and (2) under what circumstances diverting patients from inpatient to primary care reduces NHS costs and improves the quality of patient experience.
CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment
Background
Searching for similarities in protein and DNA databases has become a routine procedure in molecular biology. The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all possible alignments between two sequences and returns the optimal local alignment. Unfortunately, its computational cost is very high, requiring a number of operations proportional to the product of the lengths of the two sequences. Furthermore, the exponential growth of protein and DNA databases makes the Smith-Waterman algorithm impractical for searching for similarities in large sets of sequences. For these reasons, heuristic approaches such as those implemented in FASTA and BLAST tend to be preferred, allowing faster execution at the cost of reduced sensitivity. The main motivation of our work is to exploit the huge computational power of commonly available graphics cards to develop high-performance solutions for sequence alignment.
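To make the quadratic cost concrete, here is a minimal Python sketch of the Smith-Waterman recurrence that returns the optimal local alignment score. It assumes a simple match/mismatch score and a linear gap penalty (the values 2, -1 and -2 are arbitrary); production tools such as SSEARCH use substitution matrices and affine gap penalties. Each of the len(a) x len(b) cells is updated exactly once, which is the "cell update" counted by the GCUPS figure in this abstract.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal local alignment score with a linear gap penalty.

    Fills the (len(a)+1) x (len(b)+1) dynamic programming matrix row by row,
    keeping only two rows: O(len(a) * len(b)) time, O(len(b)) memory.
    """
    prev = [0] * (len(b) + 1)  # row i-1 of the matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # a[i-1] aligned against a gap
            left = curr[j - 1] + gap    # b[j-1] aligned against a gap
            curr[j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, `smith_waterman_score("ACGTACGT", "ACGTTACGT")` aligns all eight residues of the first sequence with one gap, giving 8 x 2 - 2 = 14.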
Results
In this paper we present what we believe is the fastest implementation of the exact Smith-Waterman algorithm running on commodity hardware. It is implemented in the recently released CUDA programming environment by NVIDIA, which allows direct access to the hardware primitives of the latest-generation G80 Graphics Processing Units (GPUs). Speeds of more than 3.5 GCUPS (Giga Cell Updates Per Second) are achieved on a workstation with two GeForce 8800 GTX cards. Extensive tests were carried out to compare our implementation with SSEARCH and BLAST running on a 3 GHz Intel Pentium IV processor. Our solution was also compared with a recently published GPU implementation and with a Single Instruction Multiple Data (SIMD) solution. These tests show that our implementation is 2 to 30 times faster than any previous implementation available on commodity hardware.
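GCUPS measures how many dynamic programming cells are updated per second, in billions. A minimal sketch of the arithmetic, assuming the usual convention that a database search updates (query length) x (total database residues) cells; the function name is hypothetical.

```python
def gcups(query_length, database_residues, seconds):
    """Giga Cell Updates Per Second: DP cells filled (query length times total
    database residues) divided by wall-clock time, in units of 10**9."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    cells = query_length * database_residues
    return cells / seconds / 1e9
```

Read the other way, at 3.5 GCUPS a hypothetical 500-residue query against a hypothetical database of 10^8 residues would need 500 x 10^8 / (3.5 x 10^9), or about 14.3 seconds.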
Conclusions
The results show that graphics cards are now sufficiently advanced to be used as efficient hardware accelerators for sequence alignment. Their performance is better than that of any alternative available on commodity hardware platforms. The solution presented in this paper allows large-scale alignments to be performed at low cost, using the exact Smith-Waterman algorithm instead of the widely adopted heuristic approaches.
Fluoromycobacteriophages for rapid, specific, and sensitive antibiotic susceptibility testing of Mycobacterium tuberculosis
Rapid antibiotic susceptibility testing of Mycobacterium tuberculosis is of paramount importance as multidrug-resistant and extensively drug-resistant strains of M. tuberculosis emerge and spread. We describe here a virus-based assay in which fluoromycobacteriophages are used to deliver a GFP or ZsYellow fluorescent marker gene to M. tuberculosis, which can then be monitored by fluorescence detection approaches including fluorescence microscopy and flow cytometry. Pre-clinical evaluations show that addition of either Rifampicin or Streptomycin at the time of phage addition obliterates fluorescence in susceptible cells but not in isogenic resistant bacteria, enabling drug sensitivity to be determined in less than 24 hours. Detection requires no substrate addition, fewer than 100 cells can be identified, and resistant bacteria can be detected within mixed populations. Fluorescence withstands fixation by paraformaldehyde, providing enhanced biosafety for testing MDR-TB and XDR-TB infections. © 2009 Piuri et al.
RegExpBlasting (REB), a Regular Expression Blasting algorithm based on multiply aligned sequences
Background: One of the most frequent uses of bioinformatics tools concerns the functional characterization of a newly produced nucleotide sequence (a query sequence) by applying BLAST or FASTA against a set of sequences (the subject sequences). However, in some specific contexts, it is useful to compare the query sequence against a cluster such as a MultiAlignment (MA). We present here the RegExpBlasting (REB) algorithm, which compares an unclassified sequence with a dataset of patterns defined by applying Regular Expression rules to an input dataset of MAs.
The REB algorithm workflow consists of:
i. the definition of a dataset of multialignments;
ii. the association of each MA with a pattern, defined by applying regular expression rules;
iii. automatic characterization of a submitted biosequence according to the function of the sequences described by the pattern that best matches the query sequence.
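The abstract does not spell out the regular expression rules REB applies, so the following Python is only a sketch of the three-step workflow under an assumed rule: each alignment column becomes a character class of the residues observed in it, made optional when the column contains a gap ('-'), and a query is assigned to the MA whose pattern matches the longest stretch of it. The rule, the scoring and all names (`ma_to_pattern`, `characterize`, the genus labels) are hypothetical.

```python
import re


def column_class(column):
    """Regex fragment for one alignment column (hypothetical rule): a character
    class of the observed residues, made optional if any row has a gap."""
    residues = sorted(set(column) - {"-"})
    if not residues:
        return ""  # an all-gap column contributes nothing
    if len(residues) == 1:
        fragment = re.escape(residues[0])
    else:
        fragment = "[" + "".join(re.escape(r) for r in residues) + "]"
    return fragment + "?" if "-" in column else fragment


def ma_to_pattern(alignment):
    """Step ii: compile one regex from equal-length aligned rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("aligned rows must have equal length")
    return re.compile("".join(column_class(col) for col in zip(*alignment)))


def characterize(query, patterns):
    """Step iii: return the label whose pattern matches the longest stretch
    of the query, or None if no pattern matches a non-empty stretch."""
    best_label, best_len = None, 0
    for label, pattern in patterns.items():
        m = pattern.search(query)
        if m and len(m.group(0)) > best_len:
            best_label, best_len = label, len(m.group(0))
    return best_label
```

For instance, the rows "ACGT-A", "ACCTTA" and "AC-TTA" yield the pattern `AC[CG]?TT?A`. A real rule set would need to cope with long variable regions, which this column-by-column rule handles only crudely.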
Results: An application of this algorithm is used in the "characterize your sequence" tool available in the PPNEMA resource. PPNEMA is a resource of Ribosomal Cistron sequences from various species, grouped according to nematode genera. It allows the retrieval of multialigned plant nematode sequences or the classification of new nematode rDNA sequences by applying REB. The same algorithm also supports automatic updating of the PPNEMA database. The present paper gives examples of the use of REB within PPNEMA.
Conclusion: The use of REB for PPNEMA updating and in the PPNEMA "characterize your sequence" option clearly demonstrates the power of the method. REB can also rapidly solve any other bioinformatics problem in which a new sequence must be added to a pre-existing cluster. The statistical tests carried out here show the flexibility of the method.
Java GUI for InterProScan (JIPS): A tool to help process multiple InterProScans and perform ortholog analysis
BACKGROUND: Recent, rapid growth in the quantity of available genomic data has generated many protein sequences that are not yet biochemically classified. Thus, the prediction of biochemical function based on structural motifs is an important task in post-genomic analysis. The InterPro databases are a major resource for protein function information. For optimal results, these databases should be searched at regular intervals, since they are frequently updated. RESULTS: We describe here a new program, JIPS (Java GUI for InterProScan), a tool for tracking and viewing results obtained from repeated InterProScan searches. JIPS stores, in a local database, the matches obtained from InterProScan searches performed with multiple versions of the InterPro database and highlights hits that have been added since the last search of the InterPro database. Results are displayed in an easy-to-use tabular format. JIPS also contains tools to assist with ortholog-based comparative studies of protein signatures. CONCLUSION: JIPS is an efficient tool for performing repeated InterProScans on large batches of protein sequences, tracking and viewing search results, and mining the collected data.
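A minimal sketch of the new-hit tracking idea, assuming matches are stored as (protein, signature, InterPro version) rows in a local SQLite database. JIPS itself is a Java tool and its actual schema is not described here, so the table layout, the version labels and the function names are hypothetical. A hit counts as new if it appears for the newer InterPro version but not for the older one.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the local match store; ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS matches ("
        " protein_id TEXT, signature TEXT, interpro_version TEXT,"
        " PRIMARY KEY (protein_id, signature, interpro_version))"
    )
    return conn


def record_scan(conn, interpro_version, matches):
    """Store (protein_id, signature) pairs found with one InterPro version;
    re-recording the same hit is ignored thanks to the primary key."""
    conn.executemany(
        "INSERT OR IGNORE INTO matches VALUES (?, ?, ?)",
        [(pid, sig, interpro_version) for pid, sig in matches],
    )
    conn.commit()


def hits_added(conn, old_version, new_version):
    """Hits present for new_version but absent for old_version, sorted."""
    rows = conn.execute(
        "SELECT protein_id, signature FROM matches WHERE interpro_version = ? "
        "EXCEPT "
        "SELECT protein_id, signature FROM matches WHERE interpro_version = ? "
        "ORDER BY 1, 2",
        (new_version, old_version),
    )
    return rows.fetchall()
```

Keeping every version's rows, rather than overwriting the previous scan, is what makes the set difference possible at any later time.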