581 research outputs found

    Transkingdom Networks: A Systems Biology Approach to Identify Causal Members of Host-Microbiota Interactions

    Improvements in sequencing technologies and reduced experimental costs have resulted in a vast number of studies generating high-throughput data. Although the number of methods to analyze these "omics" data has also increased, computational complexity and lack of documentation hinder researchers from analyzing their high-throughput data to its true potential. In this chapter we detail our data-driven, transkingdom network (TransNet) analysis protocol to integrate and interrogate multi-omics data. This systems biology approach has allowed us to successfully identify important causal relationships between different taxonomic kingdoms (e.g. mammals and microbes) using diverse types of data

    A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing

    Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding the basic cellular function in model organisms, to the elucidation of the biological events that govern the development and progression of human diseases, and the exploration of the mechanisms of survival, drug-resistance and virulence of pathogens. Next-generation sequencing (NGS) technologies are contributing to a massive expansion of transcriptomics in all fields and are reducing the cost, time and performance barriers presented by conventional approaches. However, bioinformatic tools for the analysis of the sequence data sets produced by these technologies can be daunting to researchers with limited or no expertise in bioinformatics. Here, we constructed a semi-automated, bioinformatic workflow system, and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility for the exploration of differences in the transcriptomes among various stages and both sexes of an economically important parasitic worm (Oesophagostomum dentatum) as well as the prediction and prioritization of essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, also to researchers with a limited bioinformatic expertise. The custom-written Perl, Python and Unix shell computer scripts used can be readily modified or adapted to suit many different applications. This system is now utilized routinely for the analysis of data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomics data sets from any organism

    Engineered split in Pfu DNA polymerase fingers domain improves incorporation of nucleotide γ-phosphate derivative

    Using compartmentalized self-replication (CSR), we evolved a version of Pyrococcus furiosus (Pfu) DNA polymerase that tolerates modification of the γ-phosphate of an incoming nucleotide. A Q484R mutation in α-helix P of the fingers domain, coupled with an unintended translational termination-reinitiation (split) near the finger tip, dramatically improve incorporation of a bulky γ-phosphate-O-linker-dabcyl substituent. Whether synthesized by coupled translation from a bicistronic (−1 frameshift) clone, or reconstituted from separately expressed and purified fragments, split Pfu mutant behaves identically to wild-type DNA polymerase with respect to chromatographic behavior, steady-state kinetic parameters (for dCTP), and PCR performance. Although naturally-occurring splits have been identified previously in the finger tip region of T4 gp43 variants, this is the first time a split (in combination with a point mutation) has been shown to broaden substrate utilization. Moreover, this latest example of a split hyperthermophilic archaeal DNA polymerase further illustrates the modular nature of the Family B DNA polymerase structure

    Midgut Microbiota of the Malaria Mosquito Vector Anopheles gambiae and Interactions with Plasmodium falciparum Infection

    The susceptibility of Anopheles mosquitoes to Plasmodium infections relies on complex interactions between the insect vector and the malaria parasite. A number of studies have shown that the mosquito innate immune responses play an important role in controlling the malaria infection and that the strength of parasite clearance is under genetic control, but little is known about the influence of environmental factors on the transmission success. We present here evidence that the composition of the vector gut microbiota is one of the major components that determine the outcome of mosquito infections. A. gambiae mosquitoes collected in natural breeding sites from Cameroon were experimentally challenged with a wild P. falciparum isolate, and their gut bacterial content was submitted for pyrosequencing analysis. The meta-taxogenomic approach revealed a broader richness of the midgut bacterial flora than previously described. Unexpectedly, the majority of bacterial species were found in only a small proportion of mosquitoes, and only 20 genera were shared by 80% of individuals. We show that observed differences in gut bacterial flora of adult mosquitoes are a result of breeding in distinct sites, suggesting that the native aquatic source where larvae were grown determines the composition of the midgut microbiota. Importantly, the abundance of Enterobacteriaceae in the mosquito midgut correlates significantly with the Plasmodium infection status. This striking relationship highlights the role of natural gut environment in parasite transmission. Deciphering microbe-pathogen interactions offers new perspectives to control disease transmission.
    Funding: Institut de Recherche pour le Developpement (IRD); French Agence Nationale pour la Recherche [ANR-11-BSV7-009-01]; European Community [242095, 223601]

    Single-Nucleotide Polymorphism Genotyping Identifies a Locally Endemic Clone of Methicillin-Resistant Staphylococcus aureus

    We developed, tested, and applied a TaqMan real-time PCR assay for interrogation of three single-nucleotide polymorphisms that differentiate a clade (termed ‘t003-X’) within the radiation of methicillin-resistant Staphylococcus aureus (MRSA) ST225. The TaqMan assay achieved 98% typeability and results were fully concordant with DNA sequencing. By applying this assay to 305 ST225 isolates from an international collection, we demonstrate that clade t003-X is endemic in a single acute-care hospital in Germany at least since 2006, where it has caused a substantial proportion of infections. The strain was also detected in another hospital located 16 kilometers away. Strikingly, however, clade t003-X was not found in 62 other hospitals throughout Germany nor among isolates from other countries, and, hence, displayed a very restricted geographical distribution. Consequently, our results show that SNP-typing may be useful to identify and track MRSA clones that are specific to individual healthcare institutions. In contrast, the spatial dissemination pattern observed here had not been resolved by other typing procedures, including multilocus sequence typing (MLST), spa typing, DNA macrorestriction, and multilocus variable-number tandem repeat analysis (MLVA)

    Deep sequencing of virus-infected cells reveals HIV-encoded small RNAs

    Small virus-derived interfering RNAs (viRNAs) play an important role in antiviral defence in plants, insects and nematodes by triggering the RNA interference (RNAi) pathway. The role of RNAi as an antiviral defence mechanism in mammalian cells has been obscure due to the lack of viRNA detection. Although viRNAs from different mammalian viruses have recently been identified, their functions and possible impact on viral replication remain unknown. To identify viRNAs derived from HIV-1, we used the extremely sensitive SOLiD™ 3 Plus System to analyse viRNA accumulation in HIV-1-infected T lymphocytes. We detected numerous small RNAs that correspond to the HIV-1 RNA genome. The majority of these sequences have a positive polarity (98.1%) and could be derived from miRNAs encoded by structured segments of the HIV-1 RNA genome (vmiRNAs). A small portion of the viRNAs is of negative polarity and most of them are encoded within the 3′-UTR, which may represent viral siRNAs (vsiRNAs). The identified vsiRNAs can potently repress HIV-1 production, whereas suppression of the vsiRNAs by antagomirs stimulates virus production. These results suggest that HIV-1 triggers the production of vsiRNAs and vmiRNAs to modulate cellular and/or viral gene expression

    ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence

    Background: The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories. Moreover, the combination of NGS sequencing and affordable high-throughput genotyping technologies is facilitating the rapid discovery and use of SNPs in non-model species. However, this abundance of sequences and polymorphisms creates new software needs. To fulfill these needs, we have developed a powerful, yet easy-to-use application. Results: The ngs_backbone software is a parallel pipeline capable of analyzing Sanger, 454, Illumina and SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequence reads. Its main supported analyses are: read cleaning, transcriptome assembly and annotation, read mapping and single nucleotide polymorphism (SNP) calling and selection. In order to build a truly useful tool, the software development was paired with a laboratory experiment. All public tomato Sanger EST reads plus 14.2 million Illumina reads were employed to test the tool and predict polymorphism in tomato. The cleaned reads were mapped to the SGN tomato transcriptome obtaining a coverage of 4.2 for Sanger and 8.5 for Illumina. 23,360 single nucleotide variations (SNVs) were predicted. A total of 76 SNVs were experimentally validated, and 85% were found to be real. Conclusions: ngs_backbone is a new software package capable of analyzing sequences produced by NGS technologies and predicting SNVs with great accuracy. In our tomato example, we created a highly polymorphic collection of SNVs that will be a useful resource for tomato researchers and breeders. The software developed along with its documentation is freely available under the AGPL license and can be downloaded from http://bioinf.comav.upv.es/ngs_backbone/ or http://github.com/JoseBlanca/franklin.
    Blanca Postigo, JM.; Pascual Bañuls, L.; Ziarsolo Areitioaurtena, P.; Nuez Viñals, F.; Cañizares Sales, J. (2011). Ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence. BMC Genomics. 12:1-8. doi:10.1186/1471-2164-12-285

    Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets

    Current genome-wide association studies (GWAS) use commercial genotyping microarrays that can assay over a million single nucleotide polymorphisms (SNPs). The number of SNPs is further boosted by advanced statistical genotype-imputation algorithms and large SNP databases for reference human populations. The testing of a huge number of SNPs needs to be taken into account in the interpretation of statistical significance in such genome-wide studies, but this is complicated by the non-independence of SNPs because of linkage disequilibrium (LD). Several previous groups have proposed the use of the effective number of independent markers (Me) for the adjustment of multiple testing, but current methods of calculation for Me are limited in accuracy or computational speed. Here, we report a more robust and fast method to calculate Me. Applying this efficient method [implemented in a free software tool named Genetic type 1 error calculator (GEC)], we systematically examined the Me, and the corresponding p-value thresholds required to control the genome-wide type 1 error rate at 0.05, for 13 Illumina or Affymetrix genotyping arrays, as well as for HapMap Project and 1000 Genomes Project datasets which are widely used in genotype imputation as reference panels. Our results suggested the use of a p-value threshold of ~10⁻⁷ as the criterion for genome-wide significance for early commercial genotyping arrays, but slightly more stringent p-value thresholds ~5 × 10⁻⁸ for current or merged commercial genotyping arrays, ~10⁻⁸ for all common SNPs in the 1000 Genomes Project dataset and ~5 × 10⁻⁸ for the common SNPs only within genes
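
    The link between an effective number of independent tests and a significance threshold can be illustrated with a minimal sketch. It assumes a Bonferroni-style correction, threshold = alpha / Me, which the abstract does not spell out; the function name and the Me values below are hypothetical and are not taken from the paper or from the GEC tool.

```python
def genome_wide_threshold(effective_tests: float, alpha: float = 0.05) -> float:
    """Return a per-test p-value threshold for a family-wise error rate `alpha`.

    Assumption: a Bonferroni-style correction over `effective_tests`
    (the effective number of independent markers, Me).
    """
    if effective_tests <= 0:
        raise ValueError("effective_tests must be positive")
    return alpha / effective_tests


if __name__ == "__main__":
    # Hypothetical Me values, chosen only to show the order of magnitude.
    for me in (5e5, 1e6, 5e6):
        print(f"Me = {me:.0e} -> threshold = {genome_wide_threshold(me):.1e}")
```

    Under this assumption, an Me of about one million corresponds to a threshold of about 5 × 10⁻⁸, which is the order of magnitude quoted in the abstract.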

    Measuring, in solution, multiple-fluorophore labeling by combining Fluorescence Correlation Spectroscopy and photobleaching

    Determining the number of fluorescent entities that are coupled to a given molecule (DNA, protein, etc.) is a key point of numerous biological studies, especially those based on a single molecule approach. Reliable methods are important, in this context, not only to characterize the labeling process, but also to quantify interactions, for instance within molecular complexes. We combined Fluorescence Correlation Spectroscopy (FCS) and photobleaching experiments to measure the effective number of molecules and the molecular brightness as a function of the total fluorescence count rate on solutions of cDNA (containing a few percent of C bases labeled with Alexa Fluor 647). Here, photobleaching is used as a control parameter to vary the experimental outputs (brightness and number of molecules). Assuming a Poissonian distribution of the number of fluorescent labels per cDNA, the FCS-photobleaching data could be easily fit to yield the mean number of fluorescent labels per cDNA strand (≈ 2). This number could not be determined solely on the basis of the cDNA brightness, because of both the statistical distribution of the number of fluorescent labels and their unknown brightness when incorporated in cDNA. The statistical distribution of the number of fluorophores labeling cDNA was confirmed by analyzing the photon count distribution (with the cumulant method), which showed clearly that the brightness of cDNA strands varies from one molecule to the other.
    Comment: 38 pages (with figures)