31 research outputs found

    Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST

    BACKGROUND: TBLASTN is a mode of operation for BLAST that aligns protein sequences to a nucleotide database translated in all six frames. We present the first description of the modern implementation of TBLASTN, focusing on new techniques that were used to implement composition-based statistics for translated nucleotide searches. Composition-based statistics use the composition of the sequences being aligned to generate more accurate E-values, which allows for a more accurate distinction between true and false matches. Until recently, composition-based statistics were available only for protein-protein searches. They are now available as a command line option for recent versions of TBLASTN and as an option for TBLASTN on the NCBI BLAST web server. RESULTS: We evaluate the statistical and retrieval accuracy of the E-values reported by a baseline version of TBLASTN and by two variants that use different types of composition-based statistics. To test the statistical accuracy of TBLASTN, we ran 1000 searches using scrambled proteins from the mouse genome and a database of human chromosomes. To test retrieval accuracy, we modernize and adapt to translated searches a test set previously used to evaluate the retrieval accuracy of protein-protein searches. We show that composition-based statistics greatly improve the statistical accuracy of TBLASTN, at a small cost to the retrieval accuracy. CONCLUSION: TBLASTN is widely used, as it is common to wish to compare proteins to chromosomes or to libraries of mRNAs. Composition-based statistics improve the statistical accuracy, and therefore the reliability, of TBLASTN results. The algorithms used by TBLASTN are not widely known, and some of the most important are reported here. The data used to test TBLASTN are available for download and may be useful in other studies of translated search algorithms.
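
As a rough illustration of what "translated in all six frames" means in the abstract above, the following Python sketch translates a nucleotide sequence in the three forward and three reverse-complement reading frames using the standard genetic code. It is not the TBLASTN implementation: the function names (`reverse_complement`, `translate`, `six_frame_translations`) and the choice to mark unrecognized codons with `X` are assumptions made for this example only.

```python
# Illustrative sketch only; names and conventions here are hypothetical,
# not taken from the BLAST code base.

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def six_frame_translations(dna: str) -> dict[int, str]:
    """Map frame (+1, +2, +3, -1, -2, -3) to its translated protein string."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rev[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translations("ATGGCCTAA").items()):
        print(f"{frame:+d}: {protein}")
```

A protein query would then be compared against each of the six translated strings. The composition-based statistics described in the abstract concern how the resulting alignments are scored, which this sketch does not attempt to reproduce.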

    Separated by a Common Language: Awareness of Term Usage Differences Between Languages and Disciplines in Biopreparedness

    Preparedness for bioterrorism is based on communication between people in organizations who are educated and trained in several disciplines, including law enforcement, health, and science. Various backgrounds, cultures, and vocabularies generate difficulties in understanding and interpreting terms and concepts, which may impair communication. This is especially true in emergency situations, in which the need for clarity and consistency is vital. The EU project AniBioThreat initiated methods and made a rough estimate of the terms and concepts that are crucial for an incident, and a pilot database with key terms and definitions has been constructed. Analysis of collected terms and sources has shown that many of the participating organizations use various international standards in their area of expertise. The same term often represents different concepts in the standards from different sectors, or, alternatively, different terms were used to represent the same or similar concepts. The use of conflicting terminology can be problematic for decision makers and communicators in planning and prevention or when handling an incident. Since the CBRN area has roots in multiple disciplines, each with its own evolving terminology, it may not be realistic to achieve unequivocal communication through a standardized vocabulary and joint definitions for words from common language. We suggest that a communication strategy should include awareness of alternative definitions and ontologies and the ability to talk and write without relying on the implicit knowledge underlying specialized jargon. Consequently, cross-disciplinary communication skills should be part of training of personnel in the CBRN field. In addition, a searchable repository of terms and definitions from relevant organizations and authorities would be a valuable addition to existing glossaries for improving awareness concerning bioterrorism prevention planning.

    Apoptosis. Searching for FLASH domains

    During programmed cell death (apoptosis), a protein named FLASH is required to regulate the proteolytic cascade that ends in the death of the cell. Imai and co-workers have reported [1] that FLASH appears to be a functional analogue of two other apoptotic proteins, mammalian Apaf-1 and its nematode homologue CED-4, and that FLASH contains an amino-acid sequence motif that is homologous to the ATPase domain of Apaf-1, to the CED-4 sequence, and to a family of plant stress-resistant proteins that are apoptotic ATPases [2]. Furthermore, FLASH contains two other domains (DRD) that are apparently related to the death-effector domain (DED) [1], an adaptor sequence that mediates interactions between proteins of the apoptosis machinery [2]. These findings should help to explain the mechanism of action of this important protein. However, we have been unable to confirm the existence of these domains after re-examining the FLASH sequence.