89 research outputs found

    Analisi della struttura genomica di Arabidopsis thaliana L. (Analysis of the genomic structure of Arabidopsis thaliana L.)

    This thesis provides insight into the genomic structure of plant species by examining the model organism Arabidopsis thaliana. The research followed three directions. First, the correlation between expression profile and structural properties such as sequence length and GC (guanine + cytosine) content was studied. The results revealed that in plants highly expressed genes undergo selection for miniaturization, probably because of the need to minimize the cost of transcription and translation. Next, the usage of synonymous codons (i.e. nucleotide triplets that code for the same amino acid) was investigated across 15 Arabidopsis tissues. The results showed that genes specifically expressed in certain tissues use a definite set of codons, whereas more widely expressed genes feature a codon composition that is, to a certain extent, a compromise between the codons used in the individual tissues. Finally, nucleotide composition as a function of position in the gene was studied in two monocots and two dicots. Compositional gradients were found for all the analyzed bases. The observed trends, mostly describable with a linear model, revealed marked differences between monocots and dicots.
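
As an illustration of the last point, a compositional gradient can be estimated by measuring GC content in consecutive windows along a gene and fitting a straight line to the result. The sketch below is a minimal example under stated assumptions: the function names, the window size and the toy sequence are hypothetical and are not taken from the thesis, which does not describe its implementation.

```python
def windowed_gc(seq: str, window: int) -> list[float]:
    """GC fraction in consecutive, non-overlapping windows along a sequence."""
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, window)
    ]


def linear_fit(ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept of ys against 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy sequence (hypothetical) whose GC content falls steadily from 5' to 3'.
toy_gene = "GGGG" + "GGGA" + "GGAA" + "GAAA" + "AAAA"
profile = windowed_gc(toy_gene, window=4)
slope, intercept = linear_fit(profile)
```

A negative slope indicates that GC content decreases along the gene; comparing slopes between species would be one simple way to contrast gradients.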

    GC3 biology in corn, rice, sorghum and other grasses

    Background: The third, or wobble, position in a codon provides a high degree of possible degeneracy and is an elegant fault-tolerance mechanism. Nucleotide biases between organisms at the wobble position have been documented and correlated with the abundances of the complementary tRNAs. We and others have noticed a bias for cytosine and guanine at the third position in a subset of transcripts within a single organism. The bias is present in some plant species and warm-blooded vertebrates but not in all plants, or in invertebrates or cold-blooded vertebrates. Results: Here we demonstrate that in certain organisms the amount of GC at the wobble position (GC3) can be used to distinguish two classes of genes. We highlight the following features of genes with high GC3 content: they (1) provide more targets for methylation, (2) exhibit more variable expression, (3) more frequently possess upstream TATA boxes, (4) are predominant in certain classes of genes (e.g., stress responsive genes) and (5) have a GC3 content that increases from 5' to 3'. These observations led us to formulate a hypothesis to explain GC3 bimodality in grasses. Conclusions: Our findings suggest that high levels of GC3 typify a class of genes whose expression is regulated through DNA methylation or are a legacy of accelerated evolution through gene conversion. We discuss the three most probable explanations for GC3 bimodality: biased gene conversion, transcriptional and translational advantage and gene methylation.
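
For readers unfamiliar with the measure, GC3 is simply the fraction of codons whose third base is G or C. The sketch below shows one way to compute it; the function name and the example sequences are hypothetical, and it assumes an in-frame coding sequence (any trailing partial codon is ignored, and stop codons are not treated specially).

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # drop a trailing partial codon, if any
    thirds = cds[2:usable:3]           # bases at positions 3, 6, 9, ...
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


# Hypothetical examples: codons ATG GCC AAG end in G, C, G;
# codons ATG GCT AAA end in G, T, A.
high = gc3("ATGGCCAAG")
low = gc3("ATGGCTAAA")
```

Computing this value for every gene in a genome and plotting the histogram is the usual way to check whether the distribution is unimodal or bimodal.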

    De novo sequencing, annotation, and characterization of the genome of Lavandula angustifolia (Lavender)

    Lavender (Lavandula angustifolia) is a perennial plant native to the Mediterranean region, best known for its essential oils (EOs), which have numerous applications in the pharmaceutical, cosmetic and perfume industries. We sequenced the L. angustifolia genome and report a detailed analysis of the assembled genome, focusing on genome size, ploidy, and repeat content. The lavender genome was estimated to be around 870 Mbp (1C=0.96 pg) using a quantitative PCR method. Genome size was further validated through analysis of raw genome sequences using Kmergenie, providing a conclusive end to the lavender genome size dispute. The repeat element composition of the genome was analyzed using de novo (RepeatModeler) and library-based (RepeatMasker) methods and was estimated to be around 45% of the full genome or ~57% of the non-gap genome sequences. Further characterization revealed long terminal repeat (LTR) retrotransposons as the major repeat type, contributing ~18% of the genome, followed by DNA transposons at ~8.5% of the genome. Interestingly, unlike most other plant genomes, the lavender genome has many more Copia than Gypsy elements, both showing a trend of recently increasing activity. Furthermore, these LTRs, especially Copia elements, participate actively in gene function, including in genes for essential oil production, with Copia elements contributing ~30% of the coding DNA sequence (CDS) regions, in addition to promoter, intron and untranslated (UTR) regions. The lavender genome also has an unusually high number of miniature inverted-repeat transposable elements (MITEs) compared to other model plant genomes: ~88,000, which is close to the number (~90,000) in the much larger maize genome. Analysis also revealed that a high proportion of the lavender genome is at a polyploid level, strongly biased towards regions containing essential oil genes, with polyploidization events having occurred between 16 and 41 Mya. In conclusion, our results reveal the lavender genome to be highly duplicated, with past and ongoing active retrotransposition, making the genome optimized for EO production.

    Power in numbers: in silico analysis of multigene families in Arabidopsis thaliana


    Comparative analysis of plant genomes through data integration

    When we started our research in 2008, several online resources for genomics existed, each with a different focus. TAIR (The Arabidopsis Information Resource) focuses on the plant model species Arabidopsis thaliana, with (at that time) little or no support for evolutionary or comparative genomics. Ensembl provided some basic tools and functioned as a data warehouse, but it would only start incorporating plant genomes in 2010. At that time, however, no online resource provided the data content and tools for plant comparative and evolutionary genomics that we required. As such, the plant community was missing an essential component for bringing its research to the same level as the biomedicine-oriented research communities. We started to work on PLAZA in order to provide such a data resource that could be accessed by the plant community, and which also contained the necessary data content to support our research group's focus on evolutionary genomics. The platform for comparative and evolutionary genomics, which we named PLAZA, was developed from scratch (i.e. not based on an existing database scheme, such as Ensembl). Gathering the data for all species, parsing this data into a common format and then uploading it into the database was the next step. We developed a processing pipeline, based on sequence similarity measurements, to group genes into gene families and subfamilies. Functional annotation was gathered both from the original data providers and through InterPro scans, combined with InterPro2GO. This primary data was then ready to be used in every subsequent analysis. Building such a database was good enough for research within our bioinformatics group, but the goal was to provide a comprehensive resource for all plant biologists with an interest in comparative and evolutionary genomics. Designing and creating a user-friendly, visually appealing web interface, connected to our database, was the next step. While the most detailed information is commonly presented in data tables, aesthetically pleasing graphics, images and charts are often used to visualize trends and general statistics, and are also used in specific tools. Design and development of these tools and visualizations is thus one of the core elements of my PhD. The PLAZA platform was designed as a gene-centric data resource, which is easily navigated when a biologist wants to study a relatively small number of genes. However, using the default PLAZA website to retrieve information for dozens of genes quickly becomes very tedious. Therefore a 'gene set'-centric extra layer was developed in which user-defined gene sets could be quickly analyzed. This extra layer, called the PLAZA workbench, functions on top of the normal PLAZA website, meaning that only gene sets from species present within the PLAZA database can be directly analyzed. The PLAZA resource for comparative and evolutionary genomics was a major success, but it still had several issues. We tried to solve at least two of these problems at the same time by creating a new platform. The first issue was the building procedure of PLAZA: adding a single species, or updating the structural annotation of an existing one, requires the total re-computation of the database content. The second issue was the restrictiveness of the PLAZA workbench: through a mapping procedure gene sets could be entered for species not present in the PLAZA database, but for species without a phylogenetically close relative this approach did not always yield satisfying results. Furthermore, the research in question might focus precisely on the difference between a species present in PLAZA and a close relative not present in PLAZA (e.g. to study adaptation to a different ecological niche). In such a case, the mapping procedure is in itself useless. With the advent of NGS transcriptome data sets for a growing number of species, it was clear that a new challenge had presented itself. We designed and developed a new platform, named TRAPID, which could automatically process entire transcriptome data sets, using a reference database. The goal was to have the processing done quickly, with the results containing both gene family oriented data (such as multiple sequence alignments and phylogenetic trees) and functional characterization of the transcripts. Major efforts went into designing the processing pipeline so that it would be reliable, fast and accurate.
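
The idea of grouping genes into families from sequence similarity can be illustrated with a very small sketch. The abstract does not say which clustering algorithm the pipeline uses, so the example below swaps in plain single-linkage clustering with a union-find structure purely for illustration; the function name, the gene identifiers and the list of similar pairs (imagined as the output of a similarity search above some threshold) are all hypothetical.

```python
def group_into_families(genes, similar_pairs):
    """Single-linkage grouping: genes connected by any chain of similar pairs share a family."""
    parent = {gene: gene for gene in genes}

    def find(gene):
        # Follow parent links to the family representative, compressing the path.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    families = {}
    for gene in genes:
        families.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in families.values())


# Hypothetical input: five genes and the pairs judged similar enough to link.
genes = ["g1", "g2", "g3", "g4", "g5"]
similar_pairs = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
families = group_into_families(genes, similar_pairs)
```

Single linkage tends to merge families through a single spurious hit, which is one reason production pipelines generally prefer graph-clustering methods that weigh all similarity scores.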

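    Évolution des chromosomes sexuels chez les plantes : développements méthodologiques et analyses de données NGS de Silènes (Evolution of sex chromosomes in plants: methodological developments and analyses of Silene NGS data)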

    In many organisms, sex is determined by sex chromosomes. However, studies have been greatly limited by the paucity of sex chromosome sequences. Indeed, sequencing and assembling sex chromosomes is very challenging because of the large quantity of repetitive DNA that these chromosomes comprise. In this PhD, a probabilistic method was developed to infer sex-linked genes from RNA-seq data of a family (parents and progeny of each sex). The method, called SEX-DETector, was tested on simulated and real data and should perform well on a wide variety of sex chromosome systems. This new method was applied to Silene latifolia, a dioecious plant with an XY system, for which partial sequence data on sex chromosomes are available (some of which were obtained during this PhD by BAC sequencing); SEX-DETector returned ∼1300 sex-linked genes. In S. latifolia, Y genes are less expressed than their X counterparts. Dosage compensation (a mechanism that corrects for reduced dosage due to Y degeneration in males) was previously tested in S. latifolia, but different studies returned conflicting results. The analysis of the new set of sex-linked genes confirmed the existence of dosage compensation in S. latifolia, which seems to be achieved by hyperexpression of the maternal X chromosome in males. An imprinting mechanism might underlie dosage compensation in that species. The RNA-seq data were also used to study the evolution of differential expression between the sexes in S. latifolia, and revealed that in this species most changes have affected the female sex. The implications of our results for the evolution of dioecy and sex chromosomes in plants are discussed.
    Despite their importance in sex determination in many organisms, sex chromosomes have been studied in only a few species because of the lack of available sequences. Indeed, sequencing and assembling sex chromosomes is made very difficult by their abundant repeated sequences. During this thesis, a probabilistic method was developed to infer sex-linked genes from RNA-seq data from a family. Tests of this method, called SEX-DETector, on real and simulated data suggest that it will work on a wide variety of systems. The method inferred ∼1300 sex-linked genes in Silene latifolia, a dioecious plant with XY sex chromosomes for which some sequence data are available (some of which were obtained during this thesis by BAC sequencing). Y genes are less expressed than X genes in S. latifolia, but the status of dosage compensation (a mechanism that corrects the under-expression of sex-linked genes in males) is still controversial. Analysis of the new sex-linked genes inferred by SEX-DETector confirmed dosage compensation in S. latifolia, which is achieved by overexpression of the maternal X, possibly via an epigenetic imprinting mechanism. The data were also used to study the evolution of sex-biased expression in S. latifolia and revealed that the majority of changes in expression levels occurred in females. The implications of our results for the evolution of dioecy and sex chromosomes are discussed.

    Generation and application of genomic tools as important prerequisites for sugar beet genome analyses.

    Genetic and physical maps of a genome are essential tools for structural, functional and applied genomics. Genetic maps allow the detection of quantitative trait loci (QTLs) and the characterisation of QTL effects, and they facilitate marker-assisted selection (MAS). The characterisation of genome structure and the analysis of evolution are augmented by physical maps. Whole-genome physical maps, or ultimately complete genomic sequences, of a species are frameworks that provide essential information for understanding processes related to physiology, morphology, development and genetics. However, the value of a genome sequence or physical map depends on comprehensive annotation. An important task of genome annotation is the linkage of genetic traits to the genome sequence, which is facilitated by integrated genetic and physical maps. In this study several sugar beet (Beta vulgaris L.) genomic tools were developed and applied to evolutionary studies and linkage analysis. A new technique allowing high-throughput identification and genotyping of genetic markers was developed, utilising representational oligonucleotide microarray analysis (ROMA). We tested the performance of the method in sugar beet as a model for crop plants with little sequence information available. Genomic representations of both parents of a mapping population were hybridised on microarrays containing custom oligonucleotides based on sugar beet bacterial artificial chromosome (BAC) end sequences (BESs) and expressed sequence tags (ESTs). Subsequent analysis identified potentially polymorphic oligonucleotides, which were placed on new microarrays used for screening 184 F2 individuals. Exploiting known co-dominant anchor markers, we obtained 511 new dominant markers distributed over all nine sugar beet linkage groups and calculated genetic maps. Besides the method's transferability to other species, the obtained genetic markers will be an asset for ordering sequence contigs in the ongoing sugar beet genome sequencing project. In addition, a possible linkage of physical and genetic maps was provided, since the genetic markers were based on source sequences that were also used for construction of a BAC-based physical map utilising a hybridisation approach. An example of the hybridisation-based approach for physical map construction and its application to synteny studies was demonstrated. Since little is known so far about synteny between rosids and Caryophyllales, we analysed the extent of synteny between the genomic sequences of two BAC clones derived from two different Beta vulgaris haplotypes and rosid genomes. To select the two BAC clones, we hybridised 30 oligonucleotide probes, based on ESTs corresponding to Arabidopsis orthologs on chromosomes 1 and 4 that were presumably co-localised in the reconstructed Arabidopsis pseudo ancestral genome (Blanc et al. 2003), on sugar beet BAC macroarrays comprising two different sugar beet libraries. A total of 27,648 clones were screened per sugar beet library, corresponding to 4.4-fold and 5.5-fold sugar beet genome coverage, respectively. On average, we obtained four and five positive clones per probe. Two clones, one from each haplotype, that were positive with the same five EST probes were selected, and their genomic sequences were determined, annotated and exploited for synteny studies. Furthermore, I constructed and characterised a sugar beet fosmid library from the doubled haploid accession KWS2320 encompassing 115,200 independent clones. The average insert size of the fosmid library was determined by pulsed field gel electrophoresis to be 39 kbp, thus representing 5.9-fold coverage of the sugar beet genome. Fosmids have the advantage of a narrowly defined insert size, so fosmid end sequences will contribute substantially to the future assembly and ordering of sequence contigs. Since repeats are a major obstacle to successful assembly of plant genome sequences, frequently causing gaps and misassembled contigs, I generated a genomic short-insert library. The short-insert library facilitated repeat identification within the sugar beet genome, which was shown by way of example for three miniature inverted-repeat transposable element (MITE) families. Altogether, this work contributed substantially to a deeper understanding of the genome structure of sugar beet and provided the basis for successful sequencing of the sugar beet genome.
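
The coverage figures quoted for the fosmid library follow from a simple relation: fold coverage equals the number of clones times the mean insert size, divided by the genome size. The sketch below applies that relation to the numbers given in the abstract (115,200 clones, 39 kbp inserts, 5.9-fold coverage) to back out the genome size they imply, roughly 760 Mbp. The function names are hypothetical, and the genome size is derived from the stated figures rather than quoted from the source.

```python
def library_coverage(n_clones: int, mean_insert_bp: float, genome_size_bp: float) -> float:
    """Fold coverage of a genome by a clone library."""
    return n_clones * mean_insert_bp / genome_size_bp


def implied_genome_size(n_clones: int, mean_insert_bp: float, coverage: float) -> float:
    """Genome size (bp) implied by a library's clone count, insert size and fold coverage."""
    return n_clones * mean_insert_bp / coverage


# Figures from the abstract: 115,200 fosmid clones, 39 kbp mean insert, 5.9-fold coverage.
genome_bp = implied_genome_size(115_200, 39_000, 5.9)
genome_mbp = round(genome_bp / 1e6)

# Round trip: plugging the implied genome size back in recovers the stated coverage.
check = library_coverage(115_200, 39_000, genome_bp)
```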