17,529 research outputs found

    Peptide mass fingerprinting using field-programmable gate arrays

    Get PDF
    The reconfigurable computing paradigm, which exploits the flexibility and versatility of field-programmable gate arrays (FPGAs), has emerged as a powerful solution for speeding up time-critical algorithms. This paper describes a reconfigurable computing solution for processing raw mass spectrometric data generated by MALDI-TOF instruments. The hardware-implemented algorithms for denoising, baseline correction, peak identification, and deisotoping, running on a Xilinx Virtex-2 FPGA at 180 MHz, generate a mass fingerprint that is over 100 times faster than an equivalent algorithm written in C, running on a Dual 3-GHz Xeon server. The results obtained using the FPGA implementation are virtually identical to those generated by a commercial software package MassLynx

    Efficient seeding techniques for protein similarity search

    Get PDF
    We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets.We then perform an analysis of seeds built over those alphabet and compare them with the standard Blastp seeding method [2,3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seed is less expressive (but less costly to implement) than the accumulative principle used in Blastp and vector seeds, our seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix

    Efficient seeding techniques for protein similarity search

    Get PDF
    We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets.We then perform an analysis of seeds built over those alphabet and compare them with the standard Blastp seeding method [2,3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seed is less expressive (but less costly to implement) than the accumulative principle used in Blastp and vector seeds, our seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix

    TRAPID : an efficient online tool for the functional and comparative analysis of de novo RNA-Seq transcriptomes

    Get PDF
    Transcriptome analysis through next-generation sequencing technologies allows the generation of detailed gene catalogs for non-model species, at the cost of new challenges with regards to computational requirements and bioinformatics expertise. Here, we present TRAPID, an online tool for the fast and efficient processing of assembled RNA-Seq transcriptome data, developed to mitigate these challenges. TRAPID offers high-throughput open reading frame detection, frameshift correction and includes a functional, comparative and phylogenetic toolbox, making use of 175 reference proteomes. Benchmarking and comparison against state-of-the-art transcript analysis tools reveals the efficiency and unique features of the TRAPID system

    Prospects and limitations of full-text index structures in genome analysis

    Get PDF
    The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared

    Towards Understanding the Origin of Genetic Languages

    Full text link
    Molecular biology is a nanotechnology that works--it has worked for billions of years and in an amazing variety of circumstances. At its core is a system for acquiring, processing and communicating information that is universal, from viruses and bacteria to human beings. Advances in genetics and experience in designing computers have taken us to a stage where we can understand the optimisation principles at the root of this system, from the availability of basic building blocks to the execution of tasks. The languages of DNA and proteins are argued to be the optimal solutions to the information processing tasks they carry out. The analysis also suggests simpler predecessors to these languages, and provides fascinating clues about their origin. Obviously, a comprehensive unraveling of the puzzle of life would have a lot to say about what we may design or convert ourselves into.Comment: (v1) 33 pages, contributed chapter to "Quantum Aspects of Life", edited by D. Abbott, P. Davies and A. Pati, (v2) published version with some editin

    The Parallelism Motifs of Genomic Data Analysis

    Get PDF
    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing
    corecore