83,160 research outputs found

    A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing

    Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding basic cellular function in model organisms, to elucidating the biological events that govern the development and progression of human diseases, to exploring the mechanisms of survival, drug resistance and virulence of pathogens. Next-generation sequencing (NGS) technologies are contributing to a massive expansion of transcriptomics in all fields and are reducing the cost, time and performance barriers presented by conventional approaches. However, bioinformatic tools for the analysis of the sequence data sets produced by these technologies can be daunting to researchers with limited or no expertise in bioinformatics. Here, we constructed a semi-automated bioinformatic workflow system and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility for exploring differences in the transcriptomes among various stages and both sexes of an economically important parasitic worm (Oesophagostomum dentatum), as well as for predicting and prioritizing essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, including for researchers with limited bioinformatic expertise. The custom-written Perl, Python and Unix shell scripts used can be readily modified or adapted to suit many different applications. This system is now used routinely for the analysis of data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomics data sets from any organism.

    Bioinformatic analysis of proteomics data

    Most biochemical reactions in a cell are regulated by highly specialized proteins, which are the prime mediators of the cellular phenotype. Therefore, the identification, quantitation and characterization of all proteins in a cell are of utmost importance for understanding the molecular processes that mediate cellular physiology. With the advent of robust and reliable mass spectrometers that are able to analyze complex protein mixtures within a reasonable timeframe, the systematic analysis of all proteins in a cell becomes feasible. Besides the ongoing improvements in analytical hardware, standardized methods to analyze and study all proteins have to be developed that allow the generation of testable new hypotheses based on the enormous amount of pre-existing biological information. Here we discuss current strategies for gathering, filtering and analyzing proteomic data sets using available software packages.

    Lipoproteins of Mycobacterium tuberculosis : an abundant and functionally diverse class of cell envelope components

    Mycobacterium tuberculosis remains the predominant bacterial scourge of mankind. Understanding of its biology and pathogenicity has been greatly advanced by the determination of whole genome sequences for this organism. Bacterial lipoproteins are a functionally diverse class of membrane-anchored proteins. The signal peptides of these proteins direct their export and post-translational lipid modification. These signal peptides are amenable to bioinformatic analysis, allowing the lipoproteins encoded in whole genomes to be catalogued. This review applies bioinformatic methods to the identification and functional characterisation of the lipoproteins encoded in the M. tuberculosis genomes. Ninety-nine putative lipoproteins were identified, so this family of proteins represents ca. 2.5% of the M. tuberculosis predicted proteome. Thus, lipoproteins represent an important class of cell envelope proteins that may contribute to the virulence of this major pathogen.

    PinAPL-Py: A comprehensive web-application for the analysis of CRISPR/Cas9 screens.

    Large-scale genetic screens using CRISPR/Cas9 technology have emerged as a major tool for functional genomics. As these screens have grown in popularity, experimental biologists frequently acquire large sequencing datasets for which they often do not have an easy analysis option. While a few bioinformatic tools have been developed for this purpose, their utility is still hindered either by limited functionality or by the requirement of bioinformatic expertise. To make sequencing data analysis of CRISPR/Cas9 screens more accessible to a wide range of scientists, we developed a Platform-independent Analysis of Pooled Screens using Python (PinAPL-Py), which is operated as an intuitive web-service. PinAPL-Py implements state-of-the-art tools and statistical models, assembled in a comprehensive workflow covering sequence quality control, automated sgRNA sequence extraction, alignment, sgRNA enrichment/depletion analysis and gene ranking. The workflow is set up to use a variety of popular sgRNA libraries as well as custom libraries that can be easily uploaded. Various analysis options are offered, suitable for analyzing a large variety of CRISPR/Cas9 screening experiments. Analysis output includes ranked lists of sgRNAs and genes, and publication-ready plots. PinAPL-Py helps to advance genome-wide screening efforts by combining comprehensive functionality with user-friendly implementation. PinAPL-Py is freely accessible at http://pinapl-py.ucsd.edu with instructions and test datasets.

    Clinical findings associated with a de novo partial trisomy 10p11.22p15.3 and monosomy 7p22.3 detected by chromosomal microarray analysis.

    We present the case of an 18-month-old boy with dysmorphic facial features, developmental delay, growth retardation, bilateral clubfeet, thrombocytopenia, and strabismus, whose array CGH analysis revealed concurrent de novo trisomy 10p11.22p15.3 and monosomy 7p22.3. We describe the patient's clinical presentation, along with his cytogenetic analysis, and we compare the findings to those of similar case reports in the literature. We also perform a bioinformatic analysis of the chromosomal regions of segmental aneuploidy to find genes that could potentially explain the patient's phenotype.

    Public or private economies of knowledge: The economics of diffusion and appropriation of bioinformatics tools

    The past three decades have witnessed a period of great turbulence in the economies of biological knowledge, during which there has been great uncertainty as to how and where boundaries could be drawn between public and private knowledge, especially with regard to the explosive growth in biological databases and their related bioinformatic tools. This paper focuses on some of the key software tools developed in relation to bio-databases. It argues that bioinformatic tools are particularly economically unstable, and that there is a continuing tension and competition between their public and private modes of production, appropriation, distribution, and use. The paper adopts an 'instituted economic process' approach and elaborates on processes of making knowledge public in the creation of 'public goods'. The question is one of continuously creating and sustaining new institutions of the commons. We believe this is critical to an understanding of the division and interdependency between public and private economies of knowledge.

    Predicting the outer membrane proteome of Pasteurella multocida based on consensus prediction enhanced by results integration and manual confirmation

    Background: Outer membrane proteins (OMPs) of Pasteurella multocida have various functions related to virulence and pathogenesis and represent important targets for vaccine development. Various bioinformatic algorithms can predict outer membrane localization and discriminate OMPs by structure or function. The designation of a confident prediction framework by integrating different predictors followed by consensus prediction, results integration and manual confirmation will improve the prediction of the outer membrane proteome. Results: In the present study, we used 10 different predictors classified into three groups (subcellular localization, transmembrane β-barrel protein and lipoprotein predictors) to identify putative OMPs from two available P. multocida genomes: those of avian strain Pm70 and porcine non-toxigenic strain 3480. Predicted proteins in each group were filtered by optimized criteria for consensus prediction: at least two positive predictions for the subcellular localization predictors, three for the transmembrane β-barrel protein predictors and one for the lipoprotein predictors. The consensus predicted proteins were integrated from each group into a single list of proteins. We further incorporated a manual confirmation step including a public database search against PubMed and sequence analyses, e.g. sequence and structural homology, conserved motifs/domains, functional prediction, and protein-protein interactions, to enhance the confidence of prediction. As a result, we were able to confidently predict 98 putative OMPs from the avian strain genome and 107 OMPs from the porcine strain genome, with 83% overlap between the two genomes. Conclusions: The bioinformatic framework developed in this study has increased the number of putative OMPs identified in P. multocida and allowed these OMPs to be identified with a higher degree of confidence. Our approach can be applied to investigate the outer membrane proteomes of other Gram-negative bacteria.
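
The consensus filtering step described in this abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the group thresholds (two, three and one positive predictions) come from the abstract, while the function name `consensus_omps`, the group keys, the protein identifiers and the choice to integrate the groups as a simple union are all assumptions made for the example.

```python
from typing import Dict, Set

# Minimum number of positive predictions required within each predictor group
# (values taken from the abstract; the group key names are hypothetical).
THRESHOLDS = {"localization": 2, "beta_barrel": 3, "lipoprotein": 1}


def consensus_omps(votes: Dict[str, Dict[str, int]]) -> Set[str]:
    """Return proteins that pass the consensus threshold in at least one group.

    votes[protein_id][group] is the number of predictors in that group
    that called the protein an outer membrane protein.
    Assumption: per-group consensus lists are merged as a union.
    """
    selected = set()
    for protein, counts in votes.items():
        if any(counts.get(group, 0) >= minimum
               for group, minimum in THRESHOLDS.items()):
            selected.add(protein)
    return selected


# Toy data with made-up protein identifiers.
votes = {
    "protA": {"localization": 2, "beta_barrel": 0, "lipoprotein": 0},
    "protB": {"localization": 1, "beta_barrel": 2, "lipoprotein": 0},
    "protC": {"localization": 0, "beta_barrel": 0, "lipoprotein": 1},
}
print(sorted(consensus_omps(votes)))  # → ['protA', 'protC']
```

Here protB is dropped because it falls below the threshold in every group. The manual confirmation step the abstract describes (literature search and sequence analyses) is not modelled in this sketch.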