1,259 research outputs found

    Seismic/Ley lines

    Poetry by Brook Pearso

    Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty

    Background: Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets. Results: Results for a knowledge discovery application of homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives better coverage than using a single parameter set, at least at some error levels. Further, the results of pairwise statistical significance using multiple parameter sets are shown to be significantly better than database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable to, and at times significantly better than, those of SSEARCH. Using non-zero parameter set change penalty values gives better performance than a zero penalty. Conclusion: The fact that homology detection performance does not degrade when using multiple parameter sets is strong evidence for the validity of the assumption that the alignment score distribution follows an extreme value distribution even when using multiple parameter sets. The parameter set change penalty is a useful parameter for alignment using multiple parameter sets. Pairwise statistical significance using multiple parameter sets can be effectively used to determine the relatedness of one pair (or a few pairs) of sequences without performing a time-consuming database search
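
For orientation, the extreme value form commonly assumed for local alignment scores can be written as below. The symbols are the standard Karlin-Altschul notation and are an assumption here, since the abstract does not define them: $S$ is the optimal local alignment score, $m$ and $n$ are the two sequence lengths, and $K$ and $\lambda$ are the distribution parameters. In a pairwise setting these parameters would be estimated for the specific sequence pair rather than taken from database-wide values.

```latex
\[
  P(S \ge x) \;\approx\; 1 - \exp\!\left(-K\, m\, n\, e^{-\lambda x}\right)
\]
```

A smaller value of this probability for the observed score indicates a more significant alignment.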

    Testing statistical significance scores of sequence comparison methods with structure similarity

    BACKGROUND: In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search are determined not only by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested in whether this could be validated when applied to existing, evolutionarily related protein sequences. RESULTS: All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. CONCLUSION: The compute-intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons
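
The Monte-Carlo Z-score mentioned in this abstract can be illustrated with a minimal sketch. It is not the implementation used by CluSTr, Protein World, or the paper. The scoring values (match 2, mismatch -1, linear gap -2), the number of shuffles, and the function names `local_score` and `monte_carlo_z_score` are all assumptions chosen for illustration. The idea is to compare the observed local alignment score with the scores obtained after shuffling one sequence, which keeps its composition but destroys its order.

```python
import random
import statistics


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty, assumed values)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            sub = match if ca == cb else mismatch
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def monte_carlo_z_score(a, b, shuffles=200, seed=0):
    """Z-score of the observed score against scores for shuffled copies of b."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    observed = local_score(a, b)
    letters = list(b)
    background = []
    for _ in range(shuffles):
        rng.shuffle(letters)  # same composition, random order
        background.append(local_score(a, "".join(letters)))
    mean = statistics.mean(background)
    spread = statistics.pstdev(background)
    if spread == 0:
        # Degenerate background: no variation to normalise against.
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / spread
```

Each Z-score needs one alignment per shuffle, which is why the abstract calls the approach compute-intensive compared with an e-value derived from a fitted distribution.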

    From Programme Theory to Logic Models for Multispecialty Community Providers: A Realist Evidence Synthesis

    Background: The NHS policy of constructing multispecialty community providers (MCPs) rests on a complex set of assumptions about how health systems can replace hospital use with enhanced primary care for people with complex, chronic or multiple health problems, while contributing savings to health-care budgets. Objectives: To use policy-makers’ assumptions to elicit an initial programme theory (IPT) of how MCPs can achieve their outcomes and to compare this with published secondary evidence and revise the programme theory accordingly. Design: Realist synthesis with a three-stage method: (1) for policy documents, elicit the IPT underlying the MCP policy, (2) review and synthesise secondary evidence relevant to those assumptions and (3) compare the programme theory with the secondary evidence and, when necessary, reformulate the programme theory in a more evidence-based way. Data sources: Systematic searches and data extraction using (1) the Health Management Information Consortium (HMIC) database for policy statements and (2) topically appropriate databases, including MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, PsycINFO, the Cumulative Index to Nursing and Allied Health Literature (CINAHL) and Applied Social Sciences Index and Abstracts (ASSIA). A total of 1319 titles and abstracts were reviewed in two rounds and 116 were selected for full-text data extraction. We extracted data using a formal data extraction tool and synthesised them using a framework reflecting the main policy assumptions. Results: The IPT of MCPs contained 28 interconnected context–mechanism–outcome relationships. Few policy statements specified what contexts the policy mechanisms required. 
We found strong evidence supporting the IPT assumptions concerning organisational culture, interorganisational network management, multidisciplinary teams (MDTs), the uses and effects of health information technology (HIT) in MCP-like settings, planned referral networks, care planning for individual patients and the diversion of patients from inpatient to primary care. The evidence was weaker, or mixed (supporting some of the constituent assumptions but not others), concerning voluntary sector involvement, the effects of preventative care on hospital admissions and patient experience, planned referral networks and demand management systems. The evidence about the effects of referral reductions on costs was equivocal. We found no studies confirming that the development of preventative care would reduce demands on inpatient services. The IPT had overlooked certain mechanisms relevant to MCPs, mostly concerning MDTs and the uses of HITs. Limitations: The studies reviewed were limited to Organisation for Economic Co-operation and Development countries and, because of the large amount of published material, the period 2014–16, assuming that later studies, especially systematic reviews, already include important earlier findings. No empirical studies of MCPs yet existed. Conclusions: Multidisciplinary teams are a central mechanism by which MCPs (and equivalent networks and organisations) work, provided that the teams include the relevant professions (hence, organisations) and, for care planning, individual patients. Further primary research would be required to test elements of the revised logic model, in particular about (1) how MDTs and enhanced general practice compare and interact, or can be combined, in managing referral networks and (2) under what circumstances diverting patients from in-patient to primary care reduces NHS costs and improves the quality of patient experience

    CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment

    Background: Searching for similarities in protein and DNA databases has become a routine procedure in molecular biology. The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two sequences; as a result it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the lengths of the two sequences. Furthermore, the exponential growth of protein and DNA databases makes the Smith-Waterman algorithm impractical for searching for similarities in large sets of sequences. For these reasons heuristic approaches such as those implemented in FASTA and BLAST tend to be preferred, allowing faster execution times at the cost of reduced sensitivity. The main motivation of our work is to exploit the huge computational power of commonly available graphics cards to develop high-performance solutions for sequence alignment. Results: In this paper we present what we believe is the fastest solution of the exact Smith-Waterman algorithm running on commodity hardware. It is implemented in the recently released CUDA programming environment by NVidia. CUDA allows direct access to the hardware primitives of the last-generation Graphics Processing Units (GPU) G80. Speeds of more than 3.5 GCUPS (Giga Cell Updates Per Second) are achieved on a workstation running two GeForce 8800 GTX cards. Exhaustive tests have been done to compare our implementation to SSEARCH and BLAST, running on a 3 GHz Intel Pentium IV processor. Our solution was also compared to a recently published GPU implementation and to a Single Instruction Multiple Data (SIMD) solution. These tests show that our implementation performs from 2 to 30 times faster than any other previous attempt available on commodity hardware. 
Conclusions: The results show that graphics cards are now sufficiently advanced to be used as efficient hardware accelerators for sequence alignment. Their performance is better than any alternative available on commodity hardware platforms. The solution presented in this paper allows large-scale alignments to be performed at low cost, using the exact Smith-Waterman algorithm instead of the widely adopted heuristic approaches
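
The dynamic programming recurrence and the cell-update cost described in this abstract can be sketched on the CPU as follows. This is a plain reference version, not the CUDA implementation from the paper. The scoring values (match 2, mismatch -1, linear gap -2) and the helper names `smith_waterman_score` and `gcups` are assumptions for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal local alignment score of a and b (linear gap penalty, assumed values).

    Fills one matrix cell per pair of positions, so the work is
    proportional to len(a) * len(b).
    """
    prev = [0] * (len(b) + 1)  # previous matrix row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment
                prev[j - 1] + sub,   # extend with a match or mismatch
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(len_a, len_b, seconds):
    """Giga cell updates per second for one alignment of the given sizes."""
    return (len_a * len_b) / seconds / 1e9
```

Because every cell on an anti-diagonal, and every pair of sequences in a database search, can be computed independently, the recurrence lends itself to the kind of parallel hardware the abstract describes.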

    Fluoromycobacteriophages for rapid, specific, and sensitive antibiotic susceptibility testing of Mycobacterium tuberculosis

    Rapid antibiotic susceptibility testing of Mycobacterium tuberculosis is of paramount importance as multidrug- and extensively drug-resistant strains of M. tuberculosis emerge and spread. We describe here a virus-based assay in which fluoromycobacteriophages are used to deliver a GFP or ZsYellow fluorescent marker gene to M. tuberculosis, which can then be monitored by fluorescence detection approaches including fluorescence microscopy and flow cytometry. Pre-clinical evaluations show that addition of either Rifampicin or Streptomycin at the time of phage addition obliterates fluorescence in susceptible cells but not in isogenic resistant bacteria, enabling drug sensitivity determination in less than 24 hours. Detection requires no substrate addition, fewer than 100 cells can be identified, and resistant bacteria can be detected within mixed populations. Fluorescence withstands fixation by paraformaldehyde, providing enhanced biosafety for testing MDR-TB and XDR-TB infections. © 2009 Piuri et al

    RegExpBlasting (REB), a Regular Expression Blasting algorithm based on multiply aligned sequences

    Background: One of the most frequent uses of bioinformatics tools concerns functional characterization of a newly produced nucleotide sequence (a query sequence) by applying Blast or FASTA against a set of sequences (the subject sequences). However, in some specific contexts, it is useful to compare the query sequence against a cluster such as a MultiAlignment (MA). We present here the RegExpBlasting (REB) algorithm, which compares an unclassified sequence with a dataset of patterns defined by applying Regular Expression rules to a set of MAs given as input. The REB algorithm workflow consists of (i) the definition of a dataset of multialignments; (ii) the association of each MA with a pattern, defined by application of regular expression rules; and (iii) automatic characterization of a submitted biosequence according to the function of the sequences described by the pattern best matching the query sequence. Results: An application of this algorithm is used in the "characterize your sequence" tool available in the PPNEMA resource. PPNEMA is a resource of Ribosomal Cistron sequences from various species, grouped according to nematode genera. It allows the retrieval of plant nematode multialigned sequences or the classification of new nematode rDNA sequences by applying REB. The same algorithm also supports automatic updating of the PPNEMA database. The present paper gives examples of the use of REB within PPNEMA. Conclusion: The use of REB in PPNEMA updating and in the PPNEMA "characterize your sequence" option clearly demonstrates the power of the method. REB can also rapidly solve any other bioinformatics problem in which a new sequence must be added to a pre-existing cluster. The statistical tests carried out here show the powerful flexibility of the method
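
The three-step workflow in this abstract can be illustrated with a toy sketch. The abstract does not state REB's actual regular expression rules, so the rule used here is an assumption: each alignment column becomes a character class of the residues seen in it, and a column containing a gap becomes optional. The names `alignment_to_regex` and `best_cluster`, and the choice of "longest match wins" as the scoring rule, are likewise hypothetical.

```python
import re


def alignment_to_regex(rows):
    """Turn equal-length aligned rows ('-' marks a gap) into a regex pattern.

    Assumed rule, not necessarily REB's: one character class per column,
    made optional when the column contains a gap.
    """
    parts = []
    for column in zip(*rows):
        residues = sorted(set(column) - {"-"})
        if not residues:
            continue  # column holds only gaps
        if len(residues) == 1:
            token = re.escape(residues[0])
        else:
            token = "[" + "".join(re.escape(r) for r in residues) + "]"
        if "-" in column:
            token += "?"
        parts.append(token)
    return "".join(parts)


def best_cluster(query, patterns):
    """Return the name of the pattern with the longest match in query, or None."""
    best_name, best_len = None, 0
    for name, pattern in patterns.items():
        for hit in re.finditer(pattern, query):
            if len(hit.group()) > best_len:
                best_name, best_len = name, len(hit.group())
    return best_name
```

For example, the rows `ACGT`, `AC-T` and `ATGT` give the pattern `A[CT]G?T`, and a query containing `ACGT` would be assigned to that cluster under this toy rule.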

    Java GUI for InterProScan (JIPS): A tool to help process multiple InterProScans and perform ortholog analysis

    BACKGROUND: Recent, rapid growth in the quantity of available genomic data has generated many protein sequences that are not yet biochemically classified. Thus, the prediction of biochemical function based on structural motifs is an important task in post-genomic analysis. The InterPro databases are a major resource for protein function information. For optimal results, these databases should be searched at regular intervals, since they are frequently updated. RESULTS: We describe here a new program JIPS (Java GUI for InterProScan), a tool for tracking and viewing results obtained from repeated InterProScan searches. JIPS stores matches (in a local database) obtained from InterProScan searches performed with multiple versions of the InterPro database and highlights hits that have been added since the last search of the InterPro database. Results are displayed in an easy-to-use tabular format. JIPS also contains tools to assist with ortholog-based comparative studies of protein signatures. CONCLUSION: JIPS is an efficient tool for performing repeated InterProScans on large batches of protein sequences, tracking and viewing search results, and mining the collected data