
    Assessment of Next Generation Sequencing Technologies for De novo and Hybrid Assemblies of Challenging Bacterial Genomes

    In the past decade, tremendous progress has been made in DNA sequencing methodologies in terms of throughput, speed, and read length, along with a sharp decrease in per-base cost. These technologies, commonly referred to as next-generation sequencing (NGS), are complemented by the development of hybrid assembly approaches that can utilize multiple NGS platforms. In the first part of my dissertation I performed systematic evaluations and optimizations of nine de novo and hybrid assembly protocols across four novel microbial genomes. While each had strengths and weaknesses, optimization using multiple strategies yielded dramatic improvements in overall assembly size and quality. To select the best assembly, I also proposed a novel rDNA operon validation approach to evaluate assembly accuracy. Additionally, I investigated the capabilities of the third-generation PacBio sequencing platform and achieved automated finishing of the Clostridium autoethanogenum genome without any accessory data. These complete genome sequences facilitated comparisons that revealed rDNA operons as a major limitation for short-read technologies, and also enabled comparative and functional genomics analyses. To facilitate future assessment and algorithm development for NGS technologies, we publicly released the sequence datasets for C. autoethanogenum, which span three generations of sequencing technologies and contain six types of data from four NGS platforms. To assess the limitations of NGS technologies, unassembled regions within Illumina and PacBio assemblies were assessed using eight microbial genomes. This analysis confirmed rDNA operons as major breakpoints within Illumina assemblies, while gaps within PacBio assemblies appear to be an unaccounted-for event; assembly quality is a cumulative effect of read depth, read quality, sample DNA quality, and the presence of phage DNA or mobile genetic elements.
In a final collaborative study, an enrichment protocol was applied to isolate live endophytic bacteria from roots of the tree Populus deltoides. This protocol achieved a significant reduction in contaminating plant DNA and enabled the use of these samples for single-cell genomics analysis for the first time. Whole-genome sequencing of selected single-cell genomes was performed, assembly and contamination removal were optimized, and bioinformatic, phylogenetic, and comparative genomics analyses followed to identify unique characteristics of these uncultured microorganisms.

    The Role of Cyclin-Dependent Kinase 18 (CDK18) in Clear Cell Renal Cell Carcinoma

    The most common and one of the more aggressive types of kidney cancer is clear cell renal cell carcinoma. It is characterized by sporadic occurrence, poor prognosis, and high resistance to therapies, which necessitates the discovery of new biomarkers for improving diagnostics and prognostics. The aim of the study was to identify a panel of genes whose mRNA is strongly upregulated in clear cell carcinoma tissue, in order to develop a qPCR detection assay based on their potential differential expression in the blood of cancer patients compared to healthy individuals. A further aim was to functionally characterize a novel gene in cell lines representing this cancer. The gene panel was constructed by bioinformatic analysis of several databases containing tissue (tumor and normal) and blood (from healthy individuals) expression data for all genes in the genome. The expression of selected genes was tested in tissue and blood of patients and healthy individuals by RT-qPCR. The CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas9 system enabled the generation of stable knockout clones for a loss-of-function analysis, and RNA sequencing allowed for global transcriptome analysis of the knockout condition, revealing a possible mechanism of action of the investigated gene. A ranked list of genes overexpressed in clear cell carcinoma tissue compared to adjacent normal kidney tissue was produced, among them CDK18 (cyclin-dependent kinase 18), CCND1 and LOX. Two genes, CDK18 and CCND1, were underexpressed in the blood of clear cell carcinoma patients, and LOX showed a tendency towards upregulation in metastatic compared to non-metastatic blood samples.
CDK18 knockout in two renal cancer cell lines led to a reduced proliferation rate, possibly via effects on WDR77 and SOAT1, the former being downregulated and the latter showing a tendency towards downregulation in the knockout condition. Conclusions: This study illustrates the difficulty of detecting tumor-specific mRNAs in blood and, paradoxically, showed reduced expression of two genes in the blood of clear cell carcinoma patients, contrary to their overexpression in tissue. The study demonstrated the influence of CDK18 on tumor cell proliferation and pointed to a possible mechanism, which should be investigated further.

    Targeted Computational Approaches for Mining Functional Elements in Metagenomes

    Thesis (Ph.D.) - Indiana University, Informatics, 2012. Metagenomics enables the genomic study of uncultured microorganisms by directly extracting the genetic material from microbial communities for sequencing. Fueled by the rapid development of Next Generation Sequencing (NGS) technology, metagenomics research has been revolutionizing the field of microbiology, revealing the taxonomic and functional composition of many microbial communities and their impacts on almost every aspect of life on Earth. Analyzing metagenomes (a metagenome is the collection of genomic sequences of an entire microbial community) is challenging: metagenomic sequences are often extremely short and therefore lack the genomic context needed for annotating functional elements, while whole-metagenome assemblies are often poor because a metagenomic dataset contains reads from many different species. Novel computational approaches are still needed to get the most out of metagenomes. In this dissertation, I first developed a binning algorithm (AbundanceBin) for clustering metagenomic sequences into groups, each containing sequences from species of similar abundances. AbundanceBin provides accurate estimates of the abundances of the species in a microbial community and of their genome sizes. Applying AbundanceBin prior to assembly results in better assemblies of metagenomes, an outcome crucial to downstream analyses of metagenomic datasets. In addition, I designed three targeted computational approaches for assembling and annotating protein-coding genes and other functional elements from metagenomic sequences. GeneStitch is an approach for gene assembly that connects gene fragments scattered across different contigs into longer genes with the guidance of reference genes.
I also developed two specialized assembly methods: the targeted-assembly method for assembling CRISPRs (Clustered Regularly Interspaced Short Palindromic Repeats), and the constrained-assembly method for retrieving chromosomal integrons. Applications of these methods to the Human Microbiome Project (HMP) datasets show that human microbiomes are extremely dynamic, reflecting the interactions between community members (including bacteria and viruses).

    Genome-scale analysis identifies paralog lethality as a vulnerability of chromosome 1p loss in cancer.

    Functional redundancy shared by paralog genes may afford protection against genetic perturbations, but it can also result in genetic vulnerabilities due to mutual interdependency [1-5]. Here, we surveyed genome-scale short hairpin RNA and CRISPR screening data on hundreds of cancer cell lines and identified MAGOH and MAGOHB, core members of the splicing-dependent exon junction complex, as top-ranked paralog dependencies [6-8]. MAGOHB is the top gene dependency in cells with hemizygous MAGOH deletion, a pervasive genetic event that frequently occurs due to chromosome 1p loss. Inhibition of MAGOHB in a MAGOH-deleted context compromises viability by globally perturbing alternative splicing and RNA surveillance. Dependency on IPO13, an importin-β receptor that mediates nuclear import of the MAGOH/B-Y14 heterodimer [9], is highly correlated with dependency on both MAGOH and MAGOHB. Both MAGOHB and IPO13 represent dependencies in murine xenografts with hemizygous MAGOH deletion. Our results identify MAGOH and MAGOHB as reciprocal paralog dependencies across cancer types and suggest a rationale for targeting the MAGOHB-IPO13 axis in cancers with chromosome 1p deletion.

    A gap-free genome assembly of Chlamydomonas reinhardtii and detection of translocations induced by CRISPR-mediated mutagenesis

    Genomic assemblies of the unicellular green alga Chlamydomonas reinhardtii have provided important resources for researchers. However, assembly errors, large gaps, and unplaced scaffolds as well as strain-specific variants currently impede many types of analysis. By combining PacBio HiFi and Oxford Nanopore long-read technologies, we generated a de novo genome assembly for strain CC-5816, derived from crosses of strains CC-125 and CC-124. Multiple methods of evaluating genome completeness and base-pair error rate suggest that the final telomere-to-telomere assembly is highly accurate. The CC-5816 assembly enabled previously difficult analyses that include characterization of the 17 centromeres, rDNA arrays on three chromosomes, and 56 insertions of organellar DNA into the nuclear genome. Using Nanopore sequencing, we identified sites of cytosine (CpG) methylation, which are enriched at centromeres. We analyzed CRISPR-Cas9 insertional mutants in the PF23 gene. Two of the three alleles produced progeny that displayed patterns of meiotic inviability that suggested the presence of a chromosomal aberration. Mapping Nanopore reads from pf23-2 and pf23-3 onto the CC-5816 genome showed that these two strains each carry a translocation that was initiated at the PF23 gene locus on chromosome 11 and joined with chromosomes 5 or 3, respectively. The translocations were verified by demonstrating linkage between loci on the two translocated chromosomes in meiotic progeny. The three pf23 alleles display the expected short-cilia phenotype, and immunoblotting showed that pf23-2 lacks the PF23 protein. Our CC-5816 genome assembly will undoubtedly provide an important tool for the Chlamydomonas research community

    Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies

    Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many metagenomics statistical/computational tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also provide future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards

    Isolation and characterization of bacteriophages with therapeutic potential


    Complete genome sequence of Meiothermus silvanus type strain (VI-R2).

    Meiothermus silvanus (Tenreiro et al. 1995) Nobre et al. 1996 belongs to a thermophilic genus whose members share relatively low degrees of 16S rRNA gene sequence similarity. Meiothermus constitutes an evolutionary lineage separate from members of the genus Thermus, from which they can generally be distinguished by their slightly lower temperature optima. M. silvanus is of special interest as it causes colored biofilms in the paper making industry and may thus be of economic importance as a biofouler. This is the second completed genome sequence of a member of the genus Meiothermus and only the third genome sequence to be published from a member of the family Thermaceae. The 3,721,669 bp long genome with its 3,667 protein-coding and 55 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project