43 research outputs found

    A new implementation of high-throughput five-dimensional clone pooling strategy for BAC library screening

    Abstract. Background: A five-dimensional (5-D) clone pooling strategy for screening bacterial artificial chromosome (BAC) clones with molecular markers, using highly parallel Illumina GoldenGate assays and PCR, facilitates high-throughput anchoring of BAC clones and BAC contigs on a genetic map. However, this strategy occasionally needs manual PCR to deconvolute pools and identify truly positive clones. Results: A new implementation of our previously reported clone pooling strategy is reported here. Row and column pools of BAC clones are divided into sub-pools with 1-2× genome coverage. All BAC pools are screened with Illumina's GoldenGate assay, and the pools are deconvoluted to identify individual positive clones. Putative positive BAC clones are then analyzed further, and positive clones are identified on the basis of being neighbours in a contig. An exhaustive search (brute-force) algorithm was designed for this deconvolution and integrated into a newly developed software tool, FPCBrowser, for analyzing clone pooling data. The algorithm was applied to empirical data for 55 Illumina GoldenGate SNP assays detecting SNP markers mapped on Aegilops tauschii chromosome 2D and Ae. tauschii contig maps. Clones in single contigs were successfully assigned to 48 (87%) specific SNP markers on the map with 91% precision. Conclusion: A new implementation of the 5-D BAC clone pooling strategy, employing both GoldenGate assay screening and assembled BAC contigs, is shown here to be a high-throughput, low-cost, rapid, and feasible approach to screening BAC libraries and anchoring BAC clones and contigs on genetic maps. The software FPCBrowser, with the integrated clone deconvolution algorithm, is downloadable at http://avena.pw.usda.gov/wheatD/fpcbrowser.shtml.
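    The deconvolution idea in this abstract can be illustrated with a minimal Python sketch. This is not the FPCBrowser algorithm: the function name, clone names, pool labels, and contig assignments are all hypothetical, and the sketch assumes each clone sits in exactly one pool per pooling dimension. A clone counts as a putative positive when every pool containing it tested positive, and it is kept only if another putative positive lies in the same contig, which is a simplification of the neighbour criterion.

```python
from collections import defaultdict

def deconvolute(clone_pools, positive_pools, clone_contig):
    """Return putative positive clones that are supported by a contig mate.

    clone_pools: clone -> tuple of pool labels, one per pooling dimension
    positive_pools: set of pool labels that tested positive for one marker
    clone_contig: clone -> contig id (hypothetical assembly)
    """
    # Exhaustive pass: a clone is a candidate only if all of its pools are positive.
    candidates = [clone for clone, pools in clone_pools.items()
                  if all(pool in positive_pools for pool in pools)]

    # Group candidates by contig and keep those with at least one candidate mate.
    by_contig = defaultdict(list)
    for clone in candidates:
        if clone in clone_contig:
            by_contig[clone_contig[clone]].append(clone)
    return sorted(clone for group in by_contig.values()
                  if len(group) > 1 for clone in group)

# Hypothetical two-dimensional toy data (the strategy in the abstract uses five).
clone_pools = {
    "A": ("row1", "col1"),
    "B": ("row2", "col2"),
    "C": ("row1", "col2"),  # spurious candidate created by pool overlap
    "D": ("row2", "col1"),  # spurious candidate created by pool overlap
}
clone_contig = {"A": "ctg1", "B": "ctg1", "C": "ctg7", "D": "ctg9"}
positive_pools = {"row1", "row2", "col1", "col2"}

result = deconvolute(clone_pools, positive_pools, clone_contig)
```

    In this toy case all four clones pass the pool test, but only A and B share a contig, so the contig filter removes the two spurious candidates.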

    The Sequence of the Human Genome

    A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additiona

    Clonal structures and cell interactions in cancer

    Despite sharing an identical genome, cells of higher-order multicellular organisms display a large degree of phenotypic diversity. This diversity is maintained by a sophisticated regulatory machinery that integrates information from both intrinsic and extrinsic factors, ultimately coordinating the appropriate gene expression. Sequencing methods such as RNA and DNA sequencing have become indispensable tools in the pursuit to understand gene regulation. In recent years, the integration of single-cell sequencing techniques and CRISPR-based methods has ushered in a new era of genomic exploration, providing unprecedented opportunities to investigate the intricate interplay between genes, cellular processes, and disease progression. These cutting-edge advances have transformed the research landscape, enabling in-depth studies of gene regulation in single cells, and paving the way for future discoveries in both healthy and malignant tissues. While cancer has traditionally been studied as a genetic disease, it is now evident that mutations alone do not determine cancer initiation or progression. This notion is supported by two key observations: first, cancer-driving mutations do not always lead to malignancy; and second, identical mutations can yield different outcomes depending on the cell type in which they occur. Consequently, a deeper understanding of gene regulation and the various ways it is modulated is critical for deciphering the complex relationship between genetic changes and cancer initiation. In this thesis we aimed to develop novel single-cell methodologies applicable to studying complex biological systems. We have developed four techniques: CIM-seq, DNTR-seq, Smart3-ATAC, and ACTIseq, described in papers I-IV, respectively. The methods all capture additional modalities in combination with single-cell RNA-seq data, including spatial information, whole genome sequencing, accessible chromatin, and direct readout of guide RNAs. 
We applied these methods to investigate biological systems at the single-cell level, offering a more comprehensive understanding of cellular behavior in health and disease. Our approaches have allowed us to characterize stem cell niches and regeneration dynamics in the epithelial layer of the colon, and delve into the effects of gene dosage, quantifying how mutational changes impact transcriptional output. Furthermore, we have explored the complex landscape of gene regulation within pancreatic ductal adenocarcinomas, identifying mechanisms that enable cancer growth and proliferation. This body of work emphasizes the importance of multimodal and integrative approaches for unraveling the complexities of biological systems at a cellular level. The methods we've developed represent a significant step forward, promising to facilitate the discovery of molecular targets for cancer therapeutics

    G-Quadruplex (G4) Motifs in the Maize (Zea mays L.) Genome Are Enriched at Specific Locations in Thousands of Genes Coupled to Energy Status, Hypoxia, Low Sugar, and Nutrient Deprivation

    The G-quadruplex (G4) elements comprise a class of nucleic acid structures formed by stacking of guanine base quartets in a quadruple helix. This G4 DNA can form within or across single-stranded DNA molecules and is mutually exclusive with duplex B-form DNA. The reversibility and structural diversity of G4s make them highly versatile genetic structures, as demonstrated by their roles in various functions including telomere metabolism, genome maintenance, immunoglobulin gene diversification, transcription, and translation. Sequence motifs capable of forming G4 DNA are typically located in telomere repeat DNA and other non-telomeric genomic loci. To investigate their potential roles in a large-genome model plant species, we computationally identified 149,988 non-telomeric G4 motifs in maize (Zea mays L., B73 AGPv2), 29% of which were in non-repetitive genomic regions. G4 motif hotspots exhibited non-random enrichment in genes at two locations on the antisense strand, one in the 5′ UTR and the other at the 5′ end of the first intron. Several genic G4 motifs were shown to adopt sequence-specific and potassium-dependent G4 DNA structures in vitro. The G4 motifs were prevalent in key regulatory genes associated with hypoxia (group VII ERFs), oxidative stress (DJ-1/GATase1), and energy status (AMPK/SnRK) pathways. They also showed statistical enrichment for genes in metabolic pathways that function in glycolysis, sugar degradation, inositol metabolism, and base excision repair. Collectively, the maize G4 motifs may represent conditional regulatory elements that can aid in energy status gene responses. Such a network of elements could provide a mechanistic basis for linking energy status signals to gene regulation in maize, a model genetic system and major world crop species for feed, food, and fuel
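    How a computational G4 motif scan of this kind can work is sketched below in Python. The motif definition is an assumption, not taken from the abstract: four runs of at least three guanines separated by loops of 1 to 7 nucleotides, a commonly used pattern. The function names are hypothetical. Both strands are scanned because the abstract reports enrichment on the antisense strand.

```python
import re

# Assumed motif definition: four runs of >=3 G separated by loops of 1-7 nt.
# The lookahead lets finditer report hits that overlap one another.
G4_PATTERN = re.compile(
    r"(?=(G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}))"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_g4_motifs(seq):
    """Return (start, strand, motif) hits; start is 0-based on the scanned strand."""
    seq = seq.upper()
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in G4_PATTERN.finditer(strand_seq):
            hits.append((match.start(), strand, match.group(1)))
    return hits

plus_hits = find_g4_motifs("ATGGGTTAGGGTTAGGGTTAGGGAT")
minus_hits = find_g4_motifs("CCCTAACCCTAACCCTAACCC")
```

    Because overlapping hits are reported separately, a real analysis would likely merge or deduplicate them before counting motifs per gene region.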

    Rewriting the genome of Escherichia coli

    Our recently acquired ability to synthesize DNA at large scale is opening the door to writing entire genomes; this constitutes a powerful approach to address fundamental biological questions, and may enable the creation of designer organisms with useful properties. One interesting avenue for investigation is the creation of recoded genomes, where codons are substituted by their synonyms. Compression of synonymous codon boxes may provide blank spaces in the quasi-universal genetic code, and these may be amenable for reassignment into unnatural amino acids, in synergy with parallel efforts to engineer the protein translation machinery. Recoding genomes is subject to both biological and technical challenges. First, synonymous codon choice genome-wide is not trivial, and identifying suitable synonymous replacements is challenging. Second, synthetic DNA pieces need to be assembled into fragments of increasing size, and ultimately implemented inside a target host. Here, recently reported strategies for genome engineering in E. coli (REXER and GENESIS) are extended and used to create a synthetic, recoded E. coli genome. Chapter 2 describes a strategy for assembling large natural genomic DNA pieces into BACs that are substrates for genome replacement. Experiments with these BACs serve to validate and improve REXER and GENESIS, and lay out a strategy for genome replacement. Chapter 3 employs these strategies for the synthesis and assembly of a recoded E. coli genome where all annotated instances of serine codons TCG and TCA, and stop codon TAG, are systematically replaced by their synonyms. The resulting strain, Syn61, provides a unique platform for exploring sense codon reassignment in vivo, and reassignment of the TCG codon to both natural and unnatural amino acids is demonstrated. Finally, Chapter 4 extends the toolkit for genome engineering in E. 
coli, and provides technologies for splitting the genome into pairs of chromosomes, as well as performing precise inversions and translocations. These technologies are used to precisely combine synthetic sections from distinct strains into a single genome. Together, these technologies may provide a foundation for future genome synthesis endeavours
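    The systematic codon replacement described in this abstract can be sketched in Python for a single open reading frame. The abstract does not say which synonyms were chosen, so the mapping below (TCG to AGC, TCA to AGT, TAG to TAA) is an assumption for illustration, and `recode_orf` is a hypothetical helper. AGC and AGT both encode serine and TAA is a stop codon, so the encoded protein is unchanged.

```python
# Assumed synonymous replacements; the abstract does not name the target codons.
RECODE = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}

def recode_orf(orf):
    """Recode an in-frame ORF codon by codon, leaving other codons untouched."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = (orf[i:i + 3] for i in range(0, len(orf), 3))
    return "".join(RECODE.get(codon, codon) for codon in codons)

recoded = recode_orf("ATGTCGTCATAG")  # Met-Ser-Ser-stop before and after recoding
```

    Working codon by codon matters: a plain string replace would also change out-of-frame occurrences such as the TCG spanning two codons in ATG ATC GCA. Overlapping genes, which this sketch ignores, would need separate handling.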

    Chromatin Structure and Differential Accessibility of Homologous Human Mitotic Metaphase Chromosomes

    The human mitotic metaphase chromosome is a product of complex chromatin restructuring during interphase. Metaphase chromosomes exhibit considerable plasticity in condensation. This is evident as distinct regions of accessible and compact chromatin fiber or epigenetic differences in histone and non-histone proteins. Such differences in chromatin condensation have been extensively described along the length of individual mitotic chromosomes but have not been recognized between homologous loci during metaphase. This thesis characterizes localized differences in condensation of homologous metaphase chromosomes that are related to differences in accessibility (DA) of associated DNA probe targets. Reproducible DA was observed for ~10% of locus-specific, short (1.5-5 kb) single copy (SC) DNA probes used in fluorescence in situ hybridization. To investigate the physical and structural organization of chromatin at locus-specific sites, we developed correlated atomic force and fluorescence microscopy imaging. Comparison of centromeric DNA and protein distribution patterns in fixed homologous chromosomes indicated that CENP-B and α-satellite DNA were distributed distinctly from one another and relative to observed centromeric ridge topography. At non-centromeric locations, short DNA probes that did not exhibit DA showed greater accessibility to the accessible chromatin topography on both homologs. Localized differential accessibility between chromosome homologs in metaphase was non-random and reproducible but not unique to known imprinted regions or specific chromosomes. Second, non-random DA was shown to be heritable within a two-generation family. Third, DNA probe volume and depth measurements of hybridized metaphase chromosomes showed internal differences in chromatin accessibility of homologous regions by super-resolution 3D-structured illumination microscopy. 
Finally, genomic regions with equivalent accessibility were enriched for epigenetic marks of open interphase chromatin to a greater extent than regions with DA, suggesting that observed structural differences in accessibility may arise during or preceding metaphase chromosome compaction. Inhibition of the topoisomerase IIα-DNA cleavage complex mitigated DA by decreasing DNA superhelicity and axial metaphase chromosome condensation. Inter-homolog probe intensity ratios, depth, and volume in chromosomes treated with a catalytic inhibitor of topoisomerase IIα were equalized compared to untreated cells. Altogether, these data suggest that DA is a reflection of allelic differences in metaphase chromosome compaction, dictated by the catenation state of the chromosome

    Back to basics: simplifying microbial communities to interpret complex interactions (Tilbake til det grunnleggende : forenkling av mikrobielle samfunn for å tolke komplekse interaksjoner)

    Microbes are everywhere and contribute to many essential processes relevant for planet Earth, ranging from biogeochemical cycles to complex human behavior. The means to achieve these colossal tasks for such small and, at first glance, simple organisms rely on their ability to assemble in heterogeneous communities in which populations with different taxonomies and functions coexist and complement each other. Some microbes are of particular interest for human civilization and have long been used for everyday tasks, such as the production of bread and wine. More recently, large-scale industrial and civil projects have taken advantage of the transformative capabilities of microbial communities, with key examples being biogas reactors, mining and wastewater treatment. Decades of classical microbiology, based on pure culture isolates and their physiological characterization, have built the foundations of modern microbial ecology. Molecular analysis of microbes and microbial communities has generated an understanding that for many microbial populations cultivation is hard to achieve and that breaking a community apart impacts its function. These limitations have driven the development of technical tools that bring us directly in contact with communities in their natural environment. In the mid 2000’s the recently established “omics” techniques were quickly adapted to their “meta-omics” version, enabling direct analysis of the microbial samples without culture. Every class of molecules (DNA, RNA, protein, metabolite, etc.) can now theoretically be analyzed from the entire community within a given sample. Metagenomics uses community DNA to build the phylogenetic picture and the genetic potential, whereas metatranscriptomics and metaproteomics employ RNA and proteins respectively to inquire the gene expression of the community. Finally, meta-metabolomics can close the loop and describe the metabolic activity of the microbes. 
Here, we combined the four aforementioned major meta-omics disciplines in a gene- and population-centric perspective to re-iterate the same Aristotelian question underlying microbial ecology: how is it possible that the whole is more than the sum of its parts? Along with the detailed answers provided by the individual communities in various environments, we also tried to learn something about biology itself. We first addressed, in a saccharolytic and methane-producing minimalistic consortium (SEM1b), the strain-specific interplay engaged in (hemi)cellulose degradation, explaining the ubiquity of Coprothermobacter proteolyticus in biogas reactors. We showed, through the genetic potential of the C. proteolyticus-affiliated COPR1 population, the putative acquisition via horizontal gene transfer of a gene cassette for hemicellulose degradation. Moreover, we showed how the expression of these COPR1 genes was both coherent with the release of hemicellulose by another population of the community (RCLO1) and synced with the expression of the orthologous genes of an already known hemicellulolytic population (CLOS1). Finally, we demonstrated that the purified COPR1 protein (Glycosyl Hydrolase 16) showed endoglucanase activity on several hemicellulose substrates. Secondly, we explored the combined application of absolute omics-based quantification of RNA and proteins using SEM1b as a benchmark community, due to its lower complexity (fewer than 12 populations) and relatively resolved biology. We subsequently demonstrated that the uncultured bacterial populations in SEM1b followed the expected protein-to-RNA ratio (10^2-10^4) of previously analyzed cultured bacteria in exponential phase. In contrast, an archaeon population from SEM1b showed values in the range 10^3-10^5, the same as what has been reported for eukaryotes (yeast and human) in the literature. 
In addition, we modeled the linearity (k) between genome-centric transcriptomes and proteomes over time and used it to predict the essential metabolic populations of the SEM1b community through converging and parallel k-trends, which was subsequently confirmed via classical pathway analysis. Finally, we estimated the translation and protein degradation rates, concluding that some of the processes in the cell that require rapid tuning (e.g. metabolism and motility) are regulated (also) post-transcriptionally. Thirdly, we sought to apply our approach of collapsing complex datasets into simple metrics, in order to identify underlying community trends, to a more complex and "real-world" microbiome. To do this, we resolved more than one year of weekly sampling from a lipid-accumulating community (Shif-LAO) that inhabits a wastewater treatment plant in Shifflange (Luxembourg), and showed an extreme genetic redundancy and turnover in contrast to a more conservative trend in functions. Moreover, we demonstrated how the time patterns (e.g. seasonality) in both gene count and gene expression are linked with the physico-chemical parameters associated with the corresponding samples. Furthermore, we built the static reaction network underlying the whole community over the complete dataset (51 temporal samples). From this, we characterized the sub-network for lipid accumulation, and showed that its more highly expressed nodes were defined by resource competition between different taxa (deduced via inverse taxonomic richness and gene expression over time). In contrast, the nitrogen metabolism sub-network exhibited a dominant taxon and a keystone ammonia-oxidizing monooxygenase, the first enzyme of ammonia oxidation, which may lead to the production of nitrous oxide (a powerful greenhouse gas). 
Overall, the results presented in this thesis build a comprehensive repertoire of interactions in microbial communities ranging from a simple consortium (tens of populations) to a natural complex microbiome (hundreds of populations). These were ultimately uncovered using an array of techniques, including unsupervised gene expression clustering, pathway analysis, reaction networks, co-expression networks, eigengenes and linearity trends between transcriptome and proteome. Moreover, we learnt that to achieve a full understanding of microbial ecology and detailed interactions, we need to integrate all the meta-omics layers quantified with absolute measurements. However, when scaling these approaches to real-world communities, the massive amounts of generated data bring new challenges and necessitate simplifying strategies to reduce complexity and extrapolate ecological trends.

    Biodiversity and assembly processes of soil fungal communities in Chinese subtropical forests with variable tree diversity

    This dissertation investigated the influence of tree diversity on the soil fungal community in Chinese subtropical experimental forest plots