
    Optimizing Splicing Junction Detection in Next Generation Sequencing Data on a Virtual-GRID Infrastructure

    The new protocol for sequencing the messenger RNA in a cell, named RNA-seq, produces millions of short sequence fragments. Next Generation Sequencing technology allows more accurate analysis but increases the need for computational resources. This paper describes the optimization of an RNA-seq analysis pipeline devoted to splicing-variant detection, aimed at reducing computation time and providing a multi-user/multi-sample environment. This work brings two main contributions. First, we optimized a well-known algorithm called TopHat by parallelizing some of its sequential mapping steps. Second, we designed and implemented a hybrid virtual GRID infrastructure that efficiently executes multiple instances of TopHat running on different samples or on behalf of different users, thus optimizing the overall execution time and enabling a flexible multi-user environment.
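    A minimal sketch of the multi-sample parallelization idea follows (illustrative code, not from the paper; the sample file names, Bowtie index prefix, and thread/worker counts are all hypothetical). Because each sample maps independently, separate TopHat instances can run concurrently, which is the essence of the multi-user/multi-sample optimization:

        import subprocess
        from concurrent.futures import ProcessPoolExecutor

        # Hypothetical inputs: one FASTQ file per sample and a prebuilt Bowtie index.
        SAMPLES = ["sample_A.fastq", "sample_B.fastq", "sample_C.fastq"]
        BOWTIE_INDEX = "genome_index"  # assumed index prefix

        def run_tophat(fastq):
            """Run one TopHat instance for a single sample in its own output directory."""
            out_dir = fastq.replace(".fastq", "_tophat_out")
            # Typical TopHat invocation: tophat -p <threads> -o <outdir> <index> <reads>
            subprocess.run(["tophat", "-p", "4", "-o", out_dir, BOWTIE_INDEX, fastq],
                           check=True)
            return out_dir

        if __name__ == "__main__":
            # Independent instances run side by side, mirroring the scheduling of
            # multiple TopHat jobs on the virtual GRID nodes.
            with ProcessPoolExecutor(max_workers=3) as pool:
                for out in pool.map(run_tophat, SAMPLES):
                    print("finished:", out)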

    Virtual Environment for Next Generation Sequencing Analysis

    Next Generation Sequencing technology, on the one hand, allows a more accurate analysis, and, on the other hand, increases the amount of data to process. A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or reads, can be used to measure levels of gene expression and to identify novel splice variants of genes. The proposed solution is a distributed architecture consisting of a Grid Environment and a Virtual Grid Environment, which reduces processing time by making the system scalable and flexible.
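    As a rough illustration of the scatter/gather pattern such a scalable architecture relies on, the sketch below (hypothetical file names and chunk size; not code from the paper) splits a FASTQ file into fixed-size chunks, each of which an independent grid node could align in parallel before the per-chunk results are merged:

        from itertools import islice

        def split_fastq(path, reads_per_chunk=250_000):
            """Split a FASTQ file into chunks; each chunk is one unit of grid work.

            FASTQ records are 4 lines each, so chunks are cut on record boundaries.
            """
            with open(path) as handle:
                chunk_id = 0
                while True:
                    lines = list(islice(handle, reads_per_chunk * 4))
                    if not lines:
                        break
                    chunk_path = f"{path}.chunk{chunk_id:04d}"
                    with open(chunk_path, "w") as out:
                        out.writelines(lines)
                    yield chunk_path  # a scheduler would submit this to a worker node
                    chunk_id += 1

        # Usage: each yielded chunk becomes one grid job; per-chunk alignments are
        # merged afterwards to reconstitute the full sample.
        # for chunk in split_fastq("reads.fastq"):
        #     submit_to_grid(chunk)  # hypothetical scheduler call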

    Metagenomics for Bacteriology

    The study of bacteria, or bacteriology, has gone through transformative waves since its inception in the 1600s. It all started with the visualization of bacteria using light microscopy by Antonie van Leeuwenhoek, when he first described “animalcules.” Direct cellular observation then evolved into utilizing different wavelengths on novel platforms such as electron, fluorescence, and even near-infrared microscopy. Understanding the link between microbes and disease (pathogenicity) began with the ability to isolate and cultivate organisms through aseptic methodologies starting in the 1700s. These techniques became more prevalent in the following centuries with the work of famous scientists such as Louis Pasteur and Robert Koch, among many others since then. The relationship between bacteria and the host’s immune system was first inferred in the 1800s, and to date it continues to unveil its mysteries. During the last century, researchers initiated the era of molecular genetics. The discovery of first-generation sequencing technology, the Sanger method, and, later, polymerase chain reaction technology propelled the molecular genetics field by exponentially expanding knowledge of the relationship between gene structure and function. The rise of commercially available next-generation sequencing methodologies at the beginning of this century is allowing drastically larger amounts of information to be acquired, in a manner open to the democratization of the approach.

    Steps in Metagenomics: Let’s Avoid Garbage in and Garbage Out

    Is metagenomics a revolution or a new fad? Metagenomics is tightly associated with the availability of next-generation sequencing in all its implementations. The key feature of these new technologies, moving beyond the Sanger-based DNA sequencing approach, is the depth of nucleotide sequencing per sample.1 Knowing much more about a sample changes the traditional paradigms of “What is the most abundant?” or “What is the most significant?” to “What is present and potentially significant that might influence the situation and outcome?” Let’s take the case of identifying proper biomarkers of disease state in the context of chronic disease prevention. Prevention has been deemed a viable option to avert human chronic diseases and to curb healthcare management costs.2 The actual implementation of any effective preventive measures has proven to be rather difficult. In addition to the typically poor compliance of the general public, the difficulty of validating the long-term effect of habit modification on risk points to the need to define new biomarkers of disease state. Scientists and the public are accepting the fact that humans are super-organisms, harboring both a human genome and a microbial genome, the latter being much bigger in size and diversity, and key for the health of individuals.3,4 It is time to investigate the intricate relationship between humans and their associated microbiota and how this relationship modulates or affects both partners.5 These remarks can be expanded to the animal and plant kingdoms, and holistically to the Earth’s biome. By its nature, the evolution and function of all the Earth’s biomes are influenced by a myriad of interactions between and among microbes (planktonic, in biofilms, or host-associated) and the surrounding physical environment. The general definition of metagenomics is the cultivation-independent analysis of the genetic information of the collective genomes of the microbes within a given environment based on its sampling. It focuses on the collection of genetic information through sequencing that can target DNA, RNA, or both. The subsequent analyses can be solely focused on sequence conservation, phylogenetics, phylogenomics, function, or the representation of genetic diversity, including yet-to-be-annotated genes. The diversity of hypotheses, questions, and goals to be accomplished is endless. The primary design is based on the nature of the material to be analyzed and its primary function.

    Structural Prediction of Protein–Protein Interactions by Docking: Application to Biomedical Problems

    A huge amount of genetic information is available thanks to the recent advances in sequencing technologies and larger computational capabilities, but the interpretation of such genetic data at the phenotypic level remains elusive. One of the reasons is that proteins do not act alone, but specifically interact with other proteins and biomolecules, forming intricate interaction networks that are essential for the majority of cell processes and pathological conditions. Thus, characterizing such interaction networks is an important step in understanding how information flows from gene to phenotype. Indeed, structural characterization of protein–protein interactions at atomic resolution has many applications in biomedicine, from diagnosis and vaccine design to drug discovery. However, despite the advances of experimental structural determination, the number of interactions for which structural data are available is still very small. In this context, a complementary approach is computational modeling of protein interactions by docking, which is usually composed of two major phases: (i) sampling of the possible binding modes between the interacting molecules and (ii) scoring for the identification of the correct orientations. In addition, prediction of interface and hot-spot residues is very useful in order to guide and interpret mutagenesis experiments, as well as to understand functional and mechanistic aspects of the interaction. Computational docking is already being applied to specific biomedical problems within the context of personalized medicine, for instance, helping to interpret pathological mutations involved in protein–protein interactions, or providing modeled structural data for drug discovery targeting protein–protein interactions. This work was supported by Spanish Ministry of Economy grant number BIO2016-79960-R; D.B.B. is supported by a predoctoral fellowship from CONACyT; M.R. is supported by an FPI fellowship from the Severo Ochoa program. We are grateful to the Joint BSC-CRG-IRB Programme in Computational Biology.
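    The two-phase sampling-and-scoring scheme can be sketched in a few lines. The toy example below is illustrative only (real docking engines use FFT-accelerated search and far richer energy functions): it generates random rigid-body placements of a “ligand” point cloud and ranks them with a crude contact score, with all coordinates and distance cutoffs hypothetical:

        import numpy as np

        rng = np.random.default_rng(0)

        # Toy "structures": pseudo-atom coordinates for receptor and ligand.
        receptor = rng.normal(size=(50, 3)) * 5.0
        ligand = rng.normal(size=(20, 3)) * 3.0

        def random_rotation(rng):
            """Uniform random rotation matrix built from a random unit quaternion."""
            q = rng.normal(size=4)
            w, x, y, z = q / np.linalg.norm(q)
            return np.array([
                [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
                [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
                [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
            ])

        def score(receptor, pose, contact=4.0, clash=2.0):
            """Phase (ii) scoring: reward atom pairs in contact, penalize clashes."""
            d = np.linalg.norm(receptor[:, None, :] - pose[None, :, :], axis=-1)
            return np.sum(d < contact) - 10 * np.sum(d < clash)

        # Phase (i): sample candidate binding modes as random rotations + translations;
        # phase (ii): score them and keep the best-ranked orientation.
        poses = (ligand @ random_rotation(rng).T + rng.uniform(-10, 10, size=3)
                 for _ in range(500))
        best = max(poses, key=lambda pose: score(receptor, pose))
        print("best pose score:", score(receptor, best))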