13 research outputs found

    Evolution of Protein Ductility in Duplicated Genes of Plants

    Previous work has shown that ductile/intrinsically disordered proteins (IDPs) and residues (IDRs) are found in all unicellular and multicellular organisms, wherein they are essential for basic cellular functions and complement the function of rigid proteins. In addition, computational studies of diverse phylogenetic lineages have revealed: (1) that protein ductility increases in concert with organismic complexity, and (2) that distributions of IDPs and IDRs along the chromosomes of plant species are non-random and correlate with variations in the rates of genetic recombination and chromosomal rearrangement. Here, we show that approximately 50% of aligned residues in paralogs across a spectrum of algae, bryophytes, monocots, and eudicots are IDRs and that a high proportion (ca. 60%) are in disordered segments greater than 30 residues. When three types of IDRs are distinguished (i.e., identical, similar, and variable IDRs), we find that species with large numbers of chromosomes and endoduplicated genes exhibit paralogous sequences with a higher frequency of identical IDRs, whereas species with small chromosome numbers exhibit paralogous sequences with a higher frequency of similar and variable IDRs. These results are interpreted to indicate that genome duplication events influence the distribution of IDRs along protein sequences and likely favor the presence of identical IDRs (compared to similar IDRs or variable IDRs). We discuss the evolutionary implications of gene duplication events in the context of ductile/disordered residues and segments, their conservation, and their effects on functionality.

    Synteny-based phylogenomic networks for comparative genomics

    For comparative genomics, relative gene order, or synteny, holds key information for assessing genomic innovations such as gene duplications, gene loss, or transpositions. While the number of reference genomes is growing exponentially, a major challenge is how to detect, represent, and visualize synteny relations of any genes of interest effectively across a large number of genomes. In this thesis, I present six chapters centering on a network approach for large-scale phylogenomic synteny analysis, and discuss how such a network approach can enhance our understanding of the evolutionary history of genes and genomes across broad phylogenetic groups and divergence times. In Chapter 1, I stress that synteny information is becoming more important in this genomics age of rapidly developing DNA sequencing technologies. It provides another layer of data beyond sequences alone and could potentially be used to improve phylogenetic inference. I also summarize currently available tools and give an example of popular websites for synteny detection. In Chapter 2, I propose an outline for performing synteny network analysis, which is based on three primary steps: pairwise whole-genome comparisons, syntenic block detection and data fusion, and network visualization. Then, by comparison with a previous synteny analysis that used traditional parallel coordinate plots, I show that the network approach can present a much clearer, stronger, and more systematic graph, with integrated synteny information from 101 broadly distributed species. In Chapter 3, we analyzed synteny networks of the entire MADS-box transcription factor gene family from fifty-one completed plant genomes. We applied a k-clique percolation method to cluster the synteny network. We found lineage-specific clusters that derive from transposition events for the regulators of floral development (APETALA3 and PI) and flowering time (FLC) in the Brassicales and for the regulators of root development (AGL17) in Poales. 
We also visualized large differences in synteny properties between Type I and Type II MADS-box genes. We identified two large gene clusters that jointly encompass many key phenotypic regulatory Type II MADS-box gene clades (SEP1, SQUA, TM8, SEP3, FLC, AGL6 and TM3). This allows for a better understanding of how evolution has acted on a key regulatory gene family in the plant kingdom. In Chapter 4, we performed synteny network analysis of the LEA gene families, which comprise eight different subfamilies (LEA_1 to LEA_6, SMP, and DHN) and have a relatively chaotic classification. Synteny clusters provide a better picture of genomic innovations and functional diversification. For example, recurrent tandem duplications contributed to LEA_2 family expansion, whereas synteny and protein sequence were highly conserved during the evolution of LEA_5. In Chapter 5, instead of analyzing a particular gene family, I scale up the analysis to all genes from all available genomes across kingdoms over significant evolutionary timescales. We used available genomes of 87 mammals and 107 flowering plants. We first compare synteny percentage with popular genome metrics such as BUSCO and N50, which reveals genomic architecture conservation and variation across kingdoms. We characterize and compare the properties of the whole network, using degree distribution and clustering results. Through phylogenomic profiling of the size, degree, and composition of all clusters, we identified many genomic innovations (i.e., duplications, gene transpositions, gene loss) at the individual gene level in the tested mammal and angiosperm genomes. In Chapter 6, I summarize the merits of taking a network-based approach to synteny comparisons and discuss current clustering methods for synteny data. I also mention several weaknesses, which could be addressed in the future.

    Doctor of Philosophy

    The MAKER genome annotation and curation software tool was developed in response to increased demand for genome annotation services, secondary to decreased genome sequencing costs. MAKER currently has over 1000 registered users throughout the world. This wide adoption of MAKER has uncovered the need for additional functionalities. Here I addressed moving MAKER into the domain of plant annotation, expanding MAKER to include new methods of gene and noncoding RNA annotation, and improving usability of MAKER through documentation and community outreach. To move MAKER into the plant annotation domain, I benchmarked MAKER on the well-annotated Arabidopsis thaliana genome. MAKER performs well on the Arabidopsis genome in de novo genome annotation and was able to improve the current TAIR10 gene models by incorporating mRNA-seq data not available during the original annotation efforts. In addition to this benchmarking, I annotated the genome of the sacred lotus Nelumbo nucifera. I enabled noncoding RNA annotation in MAKER by adding the ability for MAKER to run and process the outputs of tRNAscan-SE and snoscan. These functionalities were tested on the Arabidopsis genome, and MAKER was used to annotate tRNAs and snoRNAs in Zea mays. The resulting version of MAKER was named MAKER-P. I added the functionality of a combiner by adding EVidenceModeler to the MAKER code base. As the number of MAKER users has grown, so have the help requests sent to the MAKER developers list. Motivated by the belief that improving the MAKER documentation would obviate the need for many of these requests, I created a media wiki that was linked to the MAKER download page, and the MAKER developers list was made searchable. Additionally, I have written a unit on genome annotation using MAKER for Current Protocols in Bioinformatics. In response to these efforts I have seen a corresponding decrease in help requests, even though the number of registered MAKER users continues to increase. 
Taken together, these products and activities have moved MAKER into the domain of plant annotation, expanded MAKER to include new methods of gene and noncoding RNA annotation, and improved the usability of MAKER through documentation and community outreach.

    Tracing the Evolution of the Floral Homeotic B- and C-Function Genes through Genome Synteny

    The evolution of the floral homeotic genes has been characterized using phylogenetic and functional studies. It is possible to enhance these studies by comparing gene content and order between species to determine the evolutionary history of the regulatory genes. Here, we use a synteny-based approach to trace the evolution of the floral B- and C-function genes that are required for specification of the reproductive organs. Consistent with previous phylogenetic studies, we show that the euAP3–TM6 split occurred after the monocots and dicots diverged. The Arabidopsis TM6 and papaya euAP3 genes are absent from the respective genomes, and we have detected loci from which these genes were lost. These data indicate that either the TM6 or the euAP3 lineage genes can be lost without detriment to flower development. In contrast, PI is essential for male reproductive organ development; yet, contrary to predictions, complex genomic rearrangements have resulted in almost complete breakdown of synteny at the PI locus. In addition to showing the evolution of B-function genes through the prediction of ancestral loci, similar reconstructions reveal the origins of the C-function AG and PLE lineages in dicots, and show the shared ancestry with the monocot C-function genes. During our studies, we found that transposable elements (TEs) present in sequenced Antirrhinum genomic clones limited comparative studies. A pilot survey of the Antirrhinum data revealed that gene-rich regions contain an unusually high number of TEs of widely varied types, which will be an important consideration for future genome sequencing efforts.

    Generation and application of genomic tools as important prerequisites for sugar beet genome analyses.

    Genetic and physical maps of a genome are essential tools for structural, functional and applied genomics. Genetic maps allow the detection of quantitative trait loci (QTLs) and the characterisation of QTL effects, and facilitate marker-assisted selection (MAS). The characterisation of genome structure and analysis of evolution is augmented by physical maps. Whole-genome physical maps, or ultimately complete genomic sequences, of a species provide frameworks with essential information for understanding processes related to physiology, morphology, development and genetics. However, the value of a genome sequence or physical map rests on comprehensive annotation. An important task of genome annotation is the linkage of genetic traits to the genome sequence, which is facilitated by integrated genetic and physical maps. In the context of this study several sugar beet (Beta vulgaris L.) genomic tools were developed and applied for evolutionary studies and linkage analysis. A new technique allowing high-throughput identification and genotyping of genetic markers was developed, utilising representational oligonucleotide microarray analysis (ROMA). We tested the performance of the method in sugar beet as a model for crop plants with little sequence information available. Genomic representations of both parents of a mapping population were hybridised on microarrays containing custom oligonucleotides based on sugar beet bacterial artificial chromosome (BAC) end sequences (BESs) and expressed sequence tags (ESTs). Subsequent analysis identified potentially polymorphic oligonucleotides, which were placed on new microarrays used for screening of 184 F2 individuals. Exploiting known co-dominant anchor markers, we obtained 511 new dominant markers distributed over all nine sugar beet linkage groups and calculated genetic maps. 
Besides the method's transferability to other species, the obtained genetic markers will be an asset for ordering sequence contigs in the context of the ongoing sugar beet genome sequencing project. In addition, a possible link between the physical and genetic maps was provided, since the genetic markers were based on source sequences that were also used for construction of a BAC-based physical map utilising a hybridisation approach. An example of the hybridisation-based approach for physical map construction and its application to synteny studies was demonstrated. Since little is known about synteny between rosids and Caryophyllales so far, we analysed the extent of synteny between the genomic sequences of two BAC clones derived from two different Beta vulgaris haplotypes and rosid genomes. For selection of the two BAC clones we hybridised 30 oligonucleotide probes based on ESTs corresponding to Arabidopsis orthologs on chromosomes 1 and 4 that were presumably co-localised in the reconstructed Arabidopsis pseudo ancestral genome (Blanc et al. 2003) on sugar beet BAC macroarrays comprising two different sugar beet libraries. A total of 27,648 clones were screened per sugar beet library, corresponding to 4.4-fold and 5.5-fold sugar beet genome coverage, respectively. We obtained four and five positive clones for the probes on average. Two clones, one from each haplotype, that were positive with the same five EST probes were selected, and their genomic sequences were determined, annotated and exploited for synteny studies. Furthermore, I constructed and characterised a sugar beet fosmid library from the doubled haploid accession KWS2320 encompassing 115,200 independent clones. The insert size of the fosmid library was determined by pulsed field gel electrophoresis to be 39 kbp on average, thus representing 5.9-fold coverage of the sugar beet genome. 
Fosmids have the advantage of a narrowly defined insert size, so fosmid end sequences will contribute substantially to the future assembly and ordering of sequence contigs. Since repeats are a major obstacle to successful assembly of plant genome sequences, frequently causing gaps and misassembled contigs, I generated a genomic short-insert library. The short-insert library facilitated repeat identification within the sugar beet genome, which was shown by way of example for three miniature inverted-repeat transposable element (MITE) families. Altogether, this work contributed substantially to a deeper understanding of the genome structure of sugar beet and provided the basis for successful sequencing of the sugar beet genome.
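
The fold-coverage figure quoted for the fosmid library follows from simple arithmetic: coverage is the number of clones times the mean insert size, divided by the genome size. The abstract does not state the genome size it assumed, so the minimal Python sketch below works backwards from the three reported numbers (115,200 clones, 39 kbp mean insert, 5.9-fold coverage) to the genome size they imply. The function names are hypothetical and chosen only for illustration; this is not the author's own calculation.

```python
def implied_genome_size_mbp(n_clones, mean_insert_kbp, fold_coverage):
    """Genome size (Mbp) implied by a library's clone count, insert size and coverage."""
    total_insert_kbp = n_clones * mean_insert_kbp  # total cloned DNA in kbp
    return total_insert_kbp / fold_coverage / 1000.0  # kbp -> Mbp


def library_fold_coverage(n_clones, mean_insert_kbp, genome_size_mbp):
    """Fold coverage of a genome by a clone library (the inverse calculation)."""
    return n_clones * mean_insert_kbp / (genome_size_mbp * 1000.0)


# Values reported in the abstract for the fosmid library.
size_mbp = implied_genome_size_mbp(115_200, 39, 5.9)
print(f"Implied genome size: {size_mbp:.1f} Mbp")

# Round trip: feeding the implied size back in recovers the reported coverage.
print(f"Coverage check: {library_fold_coverage(115_200, 39, size_mbp):.1f}-fold")
```

The implied size is an inference from the reported figures, not a value given in the source, and it is sensitive to rounding in the stated insert size and coverage.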

    Phenotypic and Genotypic Studies in the Peach [Prunus persica (L.) Batsch] and Muscadine Grape (Vitis rotundifolia Michx.)

    Peach: Peaches [Prunus persica (L.) Batsch] are routinely chilled to increase shelf-life. Exposure to temperatures of 5 °C for two weeks can induce chilling injury (CI) symptoms, including flesh mealiness (or wooliness) and a lack of juiciness. Phenotypic data were collected on seven biparental F1 peach populations maintained at the University of Arkansas Fruit Research Station. A genome-wide association study (GWAS) was performed using TASSEL 5, which identified four quantitative trait loci (QTLs) associated with expressible juice, four QTLs for mealiness, five QTLs for soluble solids, and three QTLs for fruit weight. Exploiting these genetic markers could help breeders identify fruit quality traits in seedlings through marker-assisted selection (MAS). Muscadine: Two biparental F1 muscadine (Vitis rotundifolia Michx.) populations were phenotyped for flower sex and berry color, and genotyping-by-sequencing (GBS) was performed to produce high-density genetic linkage maps. A total of 1244 SNP markers in population Black Beauty [BB] x Nesbitt [N] and 2069 SNP markers in population Supreme [S] x Nesbitt [N] were mapped to 20 linkage groups (LG) for each population. The results support previous studies revealing an evolutionary bifurcation of V. vinifera chromosome 7 into two independently segregating linkage groups in the muscadine, or, conversely, a possible fusion of muscadine-derived chromosomes into chromosome 7 of V. vinifera. The locus controlling flower type in muscadine mapped to a region spanning 4.6-5.1 Mbp on chromosome 2, while the berry color locus mapped to a region spanning 11.1-11.9 Mbp on chromosome 4. These high-density linkage maps lay the groundwork for marker-assisted selection (MAS) in muscadine and provide clues to the evolutionary relationship of the muscadine with V. vinifera. Colorimetry: Precise color identification is critical in many scientific fields, and horticulture is no exception. 
Plant breeders must be able to effectively discern colors among plant parts and provide accurate descriptions when applying for legal protections. The RHS Colour Chart is currently recognized as the most universally accepted method of assigning color descriptions in horticulture. The RHS Colour Chart relies on manually matching plant parts with the labeled color chips provided. Color perception in humans is complicated by many factors, including the type and quantity of illumination available as well as the individual’s own physiological abilities and limitations. Scientific colorimeters have been developed to serve as an objective way to study color, and many hypothetical color space models have been created to enhance this field of study. The CIE 1976 L*a*b* (CIELAB) color space is widely recognized as a scientific standard and was used in this study. Traditional colorimeters have been bulky and expensive lab equipment, but a new, portable, inexpensive LED-based color scanner called the Nix Pro Color Sensor™ has recently become available. Multiple studies were conducted comparing the Nix Pro with the Konica Minolta CR-400 colorimeter and the RHS Colour Chart paint chip system. The results indicate that the Nix Pro, which is inexpensive, yields consistent results, and features built-in color matching capabilities, could be a very useful tool for horticulturists and plant breeders.