90 research outputs found

    Practical considerations for plant phylogenomics

    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/143756/1/aps31038_am.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/143756/2/aps31038.pd

    Sequencing, assembly and annotation of the mitochondrial and plastid genomes of Gelidium pristoides (Turner) Kützing from Kenton-on-Sea, South Africa

    The genome is the complete set of an organism's hereditary information and contains all the information necessary for the functioning of that organism. Complete nuclear, mitochondrial and plastid DNA constitute the three main types of genomes, which play interconnected roles in an organism. Genome sequencing enables researchers to understand the regulation and expression of the various genes and the proteins they encode. It allows researchers to extract and analyse genes of interest for a variety of studies, including molecular, biotechnological, bioinformatics, conservation and evolutionary studies. Genome sequencing of Rhodophyta has received little attention. To date, no published studies have focused on both whole-genome sequencing and sequencing of the organellar genomes of Rhodophyta species found along the South African coastline. This study focused on the sequencing, assembly and annotation of the mitochondrial and plastid genomes of Gelidium pristoides. Gelidium pristoides was collected from Kenton-on-Sea and was morphologically identified at Rhodes University. Its genomic DNA was extracted using the Nucleospin® Plant II kit and quantified using Qubit 2.0, Nanodrop and 1% agarose gel electrophoresis. The Ion Plus Fragment Library kit was used for the preparation of a 600 bp library, which was sequenced in two separate runs on the Ion S5 platform. The produced reads were quality-controlled through the Ion Torrent server version 5.6 and assessed using the FASTQC program. The SPAdes version 3.11.1 assembler was used to assemble the quality-controlled reads, and the resultant genome assembly was quality-assessed using the QUAST 4.1 software. The mitochondrial genome was selected from the produced Gelidium pristoides draft genome using mitochondrial genomes of other Gelidiales as search queries on the local BLAST algorithm of the BioEdit software. 
Contigs matching the organellar genomes were ordered according to the mitochondrial genomes of other Gelidiales using the trial version of Geneious R11.12 software. The plastid genome was selected following the same approach, but using plastid genomes of Gelidium elegans and Gelidium vagum as search queries. Gaps observed in the organellar genomes were closed by amplification of the relevant gap using polymerase chain reaction with newly designed primers and Sanger sequencing. Open reading frames for both organellar genomes were annotated using the NCBI ORF-Finder and alignments obtained from BlastN and BlastX searches of the NCBI database, while the tRNAs and rRNAs were identified using the tRNAscan-SE1.21 vi and the RNAmmer 1.2 servers. The circular physical map of the mitochondrial genome was constructed using the CGView server. Lastly, in silico analysis of cytochrome c oxidase 3 and Heat Shock Protein 70 was performed using the PRIMO and the SWISS-MODEL pipelines respectively. Their phylogenies were analysed through Clustal Omega and the trees viewed in TreeView 1.6.6 software. Qubit and Nanodrop genomic DNA qualification revealed A260/A280 and A230/A260 ratios of 1.81 and 1.52 respectively. The 1% agarose gel electrophoresis further confirmed the good quality of the genomic DNA used for library preparation and sequencing. Pre-assembly quality control of reads resulted in a total of 30 792 074 high-quality reads, which were assembled into a total of 94140 contigs, making up an estimated genome length of 217.06 Mb. The largest contig covered up to 13.17 kb of the draft genome, and an N50 statistic value of 3.17 kb was obtained. The G. pristoides mitochondrial genome mapped into a circular molecule of 25012 bp, with an overall GC content of 31.04% and a total of 45 genes distributed into 20 tRNA-coding genes, 2 rRNA-coding genes and 23 protein-coding genes, mostly adopting the modified genetic code of Rhodophyta. The SecY and rps12 genes overlapped by 41 bp. 
This study presents a partial plastid genome composed of 89 (38%) fully annotated genes, of which 71 are protein-coding and 18 are distributed among 15 tRNA-coding, 2 rRNA-coding and 1 RNaseP RNA-coding genes. Sixty-one (26%) partial protein-coding genes were predicted, while approximately 84 (36%) genes are not yet predicted. In silico analysis of the cytochrome c oxidase and heat shock protein 70 showed that the gene sequences obtained in this study and the resulting translated proteins have sequences and structures similar to those from several other species, thus validating the integrity of the genome sequences. This study provides genomic data necessary for understanding the genomic constituents of G. pristoides and serves as a foundation for studies of individual genes and for resolving evolutionary relationships
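
The abstract above reports an N50 of 3.17 kb and a mitochondrial GC content of 31.04%. As a minimal sketch of how these two summary statistics are commonly defined (this is not the code used in the thesis, which relied on QUAST; the function names `n50` and `gc_content` are hypothetical), N50 is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembly size:

```python
def n50(lengths):
    """Return the N50 of an iterable of contig lengths (hypothetical helper)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that takes the running sum to half the assembly size
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


def gc_content(sequence):
    """Return GC content in percent, counting only unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / called
```

Ignoring ambiguous bases such as N in the GC denominator is one possible convention, chosen here as an assumption; assembly tools may handle ambiguous bases differently.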

    Organelle Genetics in Plants

    Chloroplasts in photosynthetic organisms and mitochondria in the vast majority of eukaryotes contain part of the genetic material of a eukaryotic cell. The organisation and inheritance patterns of this organellar DNA are quite different from those of nuclear DNA. Present-day chloroplast and mitochondrial genomes contain only a few dozen genes. Nevertheless, these organelles harbor several thousand proteins, the vast majority of them encoded by the nucleus. As a result, the expression of nuclear and organelle genomes has to be very precisely coordinated. The selection of experimental and review papers in this book covers a wide range of topics related to chloroplast and plant mitochondria research, illustrating recent advances and diverse insights into the field of organelle genetics in plants. These works represent some of the latest research on the genetics, genomics, and biotechnology of plant mitochondria and chloroplasts, and they are of broad interest for the community of plant scientists, especially for those working in the subjects related to organelle genetic

    The genome and epigenome of the European ash tree (Fraxinus excelsior)

    PhD
    European ash trees (Fraxinus excelsior) are under threat from the fungal pathogen Hymenoscyphus fraxineus, which causes ash dieback disease (ADB). Previous research has shown heritable variation in ADB susceptibility in natural ash populations. Prior to this project, very little genetic data were available for ash, thus hampering efforts to identify markers associated with susceptibility. In this thesis, I have presented nuclear and organellar assemblies of the 880 Mbp F. excelsior genome, with a combined N50 scaffold size of over 100 kbp. Using Ks distributions for six plant species, I found evidence for two whole genome duplication (WGD) events in the history of the ash lineage, one potentially shared with olive (Ks 0.4), and one potentially with other members of the Lamiales order (Ks 0.7). Using a further 38 genome sequences from trees originating throughout Europe, I found little evidence of any population structure throughout the European range of F. excelsior, but find a substantial decrease in effective population size, both in the distant (from 10 mya) and recent past. Linkage disequilibrium is low at small distances between loci, with an r² of 0.15 at a few hundred bp, but decays slowly from this point. From whole genome DNA methylation data of twenty F. excelsior and F. mandshurica trees, I identified 665 Differentially Methylated Regions (DMRs) between those with high and low ADB susceptibility. Of genes putatively duplicated in historical WGD events, an average of 25.9% were differentially methylated in at least one cytosine context, possibly indicative of unequal silencing. Finally, I found some variability in methylation patterns among clonal replicates (Pearson's correlation coefficient 0.960), but this was less than the variability found between different genotypes (0.955). 
The results from this project, and the genome sequence especially, will be valuable to researchers aiming to breed or select ash trees with low susceptibility to ADB. Funded by EU FP7-PEOPLE project 'INTERCROSSING', ID: 289974. Sequencing of the reference tree was funded by NERC emergency grant NE/K01112X/1

    Applied Bioinformatics for ncRNA Characterization - Case Studies Combining Next Generation Sequencing & Genomics

    Non-coding RNAs (ncRNAs) represent a diverse class of functional molecules inherent in virtually all forms of cellular life. Besides the canonical protein-encoding mRNAs, the role of these abundant transcripts has been overlooked for decades. Defined by their highly conserved structure, ncRNAs are resistant to degradation and perform various regulatory functions. Despite their poor sequence conservation, comparative genomics can be employed to identify homologous ncRNAs in related species based on their structure. Through the availability of next generation sequencing techniques, a rich corpus of datasets is available which grants a detailed look into cellular processes. The combination of genomic and transcriptomic data allows for a detailed understanding of molecular mechanisms as well as characterization of individual gene functions and their evolution. However, analytical processing of modern high-throughput data is only made viable through optimized bioinformatic algorithms and reproducible automation pipelines. This thesis consists of four major parts highlighting the diverse roles of ncRNAs concerning the transcription process viewed from different vantage points. The first part concerns an unusually long untranslated region in Rhodobacter which harbors an ncRNA that regulates the expression of the downstream division cell wall cluster. Second, the degradation of 6S RNA in Bacillus subtilis is experimentally reconstructed to shed light on this final part of the RNA life cycle. This ncRNA is ubiquitous among bacteria and known to be a global transcription regulator itself. Next, the focus moves to the eukaryotic system and RNase P, an ancient ribozyme that is involved in tRNA maturation. Due to differences in composition with an optional RNA and multiple protein subunits, its phylogenetic distribution and deviant characteristics throughout the eukaryotic lineage are examined in order to trace its evolution. 
Finally, circRNAs are a diverse subgroup of non-translated RNAs that recently received increased attention due to their abundance in neural tissue. Resulting from post-transcriptional back-splicing events, circRNAs compete with their host gene for expression. In a zoological study of social insects, circRNAs were identified in honeybees for the first time. The goal was to find task-related differences in circRNA expression between nurse bees and foragers and thus pinpoint potential functions of these elusive ncRNAs. The combination of genomic methods and transcriptomic data makes in-depth functional analysis of ncRNAs possible and enables us to understand the molecular mechanisms on multiple levels. Through structural predictions, a riboswitch-like transcriptional control of UpsM was revealed that is unique to Rhodobacteraceae. Transcriptomic analysis revealed that 6S RNA is primarily processed by RNase J1 for maturation and degraded at internal loops by RNase Y. Evolutionary comparison of organellar RNase P revealed that the RNA subunit is potentially less conserved than thought, while organellar protein-only variants are widespread, potentially due to horizontal gene transfer. In the case of circRNA, an entire group of ncRNAs was characterized in the social model organism of honeybees, and evidence was found for at least one gene whose circRNA levels are significantly reduced during the nurse-to-forager transition. Moreover, an unexpected link between elevated DNA methylation and RNA circularization was discovered. The bioinformatic findings in all of these cases provide a foundation for further experimental research and illustrate how scientific endeavors cannot be automated completely but require rigorous investigation with customized tools

    Annotation of marine eukaryotic genomes


    An approach to improved microbial eukaryotic genome annotation

    New sequencing technologies have considerably accelerated the rate at which genomic data is being generated. One ongoing challenge is the accurate structural annotation of those novel genomes once sequenced and assembled, in particular if the organism does not have close relatives with well-annotated genomes. Whole-transcriptome sequencing (RNA-Seq) and assembly, which share similarities with whole-genome sequencing and assembly respectively, have been shown to dramatically increase the accuracy of gene annotation. Read coverage, inferred splice junctions and assembled transcripts can provide valuable information about gene structure. Several annotation pipelines have been developed to automate structural annotation by incorporating information from RNA-Seq, as well as protein sequence similarity data, with the goal of reaching the accuracy of an expert curator. Annotation pipelines follow a similar workflow. The first step is to identify repetitive regions to prevent misinformed sequence alignments and gene predictions. The next step is to construct a database of evidence from experimental data such as RNA-Seq mapping and assembly, and protein sequence alignments, which are used to inform the generalised Hidden Markov Models of gene prediction software. The final step is to consolidate sequence alignments and gene predictions into a high-confidence consensus set. 
Thus, automated pipelines are complex, and therefore susceptible to incomplete and erroneous use of information, which can poison gene predictions and consensus model building. Here, we present an improved approach to microbial eukaryotic genome annotation. Its conception was based on identifying and mitigating potential sources of error and bias that are present in available pipelines. Our approach has two main aspects. The first is to create a more complete and diverse set of extrinsic evidence to better inform gene predictions. The second is to use extrinsic evidence in tandem with predictions such that the influence of their respective biases in the consensus gene models is reduced. We benchmarked our new tool against three known pipelines, showing significant gains in gene, transcript, exon and intron sensitivity and specificity in the genome annotation of microbial eukaryotes
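
The benchmark above is stated in terms of sensitivity and specificity at the gene, transcript, exon and intron level. A minimal sketch of an exon-level comparison follows, under two assumptions that are not taken from the thesis: an exon counts as correct only if its coordinates match a reference exon exactly, and "specificity" follows the convention common in gene-prediction benchmarks, TP / (TP + FP), which is the same quantity as precision. The function name `exon_accuracy` and the tuple layout are hypothetical.

```python
def exon_accuracy(predicted, reference):
    """Compare predicted and reference exons given as (seqid, start, end, strand).

    Returns (sensitivity, specificity), where
      sensitivity = TP / (TP + FN)  -> share of reference exons recovered
      specificity = TP / (TP + FP)  -> share of predicted exons that are correct
    Only exact coordinate matches count as true positives (an assumption).
    """
    predicted = set(predicted)
    reference = set(reference)
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy data: two of four reference exons are predicted exactly,
# and a third prediction has a shifted start coordinate.
reference_exons = [
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 400, "+"),
    ("chr1", 500, 600, "+"),
    ("chr2", 50, 150, "-"),
]
predicted_exons = [
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 400, "+"),
    ("chr1", 505, 600, "+"),
]
sn, sp = exon_accuracy(predicted_exons, reference_exons)
```

The same set comparison can be applied to introns or whole transcripts by changing what each tuple represents.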

    ACARORUM CATALOGUS IX. Acariformes, Acaridida, Schizoglyphoidea (Schizoglyphidae), Histiostomatoidea (Histiostomatidae, Guanolichidae), Canestrinioidea (Canestriniidae, Chetochelacaridae, Lophonotacaridae, Heterocoptidae), Hemisarcoptoidea (Chaetodactylidae, Hyadesiidae, Algophagidae, Hemisarcoptidae, Carpoglyphidae, Winterschmidtiidae)

    The 9th volume of the series Acarorum Catalogus contains lists of mites of 13 families, 225 genera and 1268 species of the superfamilies Schizoglyphoidea, Histiostomatoidea, Canestrinioidea and Hemisarcoptoidea. Most of these mites live on insects or other animals (as parasites, phoretics or commensals); some inhabit rotten plant material, dung or fungi. Mites of the families Chetochelacaridae and Lophonotacaridae are specialised to live with myriapods (Diplopoda). The peculiar aquatic or intertidal mites of the families Hyadesiidae and Algophagidae are also included.

    Genomic studies on floral and vegetative development in the Genus Streptocarpus (Gesneriaceae)

    The genus Streptocarpus consists of around 180 species with diverse morphologies. At least three main types of vegetative growth forms can be distinguished: caulescent, rosulate (acaulescents with multiple leaves), and unifoliate (acaulescents with one leaf). Floral size, shape, and pigmentation pattern are also highly variable between species. Previous studies have suggested that some of the morphological characters are inherited as Mendelian traits. For instance, the rosulate growth form is dominant over the unifoliate, and the rosulate / unifoliate growth form was hypothesised to be determined by two genetic loci, based on the Mendelian segregation ratios recorded in backcross and F2 populations. However, the identity of the loci and the underlying molecular mechanisms remain unknown. In this study, Streptocarpus rexii (rosulate) and Streptocarpus grandis (unifoliate) were used to study the genetic basis of morphological variation in Streptocarpus. The aim is to use modern next generation sequencing (NGS) technologies to build draft genomes, transcriptomes, and genetic maps for the non-model Streptocarpus plants, and carry out quantitative trait loci (QTL) mapping to locate the causative loci. First, suitable DNA and RNA extraction methods for obtaining NGS-quality nucleic acids from Streptocarpus were established. For DNA extraction this was a modified protocol of the ChargeSwitch gDNA Plant Kit, and for RNA extraction a TRIzol reagent plus phenol:chloroform:isoamyl alcohol wash protocol was devised. The nucleic acid samples extracted were subsequently used for library preparation and NGS sequencing experiments. Whole genome shotgun sequencing was performed for S. rexii and S. grandis using Illumina HiSeq 4000 and HiSeq X. De novo assembly of the sequence data produced a S. rexii draft genome of 596,583,869 bp, with 95,845 scaffolds and an N50 value of 35,609 bp. The S. 
grandis draft genome had a total span of 843,329,708 bp, with 127,951 scaffolds and an N50 value of 31,638 bp. The genome assemblies served as references for subsequent NGS data analysis. The RNA samples derived from various vegetative and floral tissues of S. rexii and S. grandis were sequenced on MiSeq and HiSeq 4000 platforms. The transcriptome assembly was carried out using de novo and reference-based methods (i.e. mapped to the obtained draft genomes), followed by putative protein-coding open reading frame identification and annotation. For S. rexii, 60,500 and 53,322 transcripts were constructed in the de novo and reference-based assemblies respectively. For S. grandis, 51,267 and 46,429 transcripts were constructed respectively. A Streptocarpus genetic map was constructed using restriction-site associated DNA sequencing (RAD-Seq) genotyping of a backcross population ((S. grandis × S. rexii) × S. grandis). The RAD-Seq data were analysed using a de novo approach and reference-based approaches with two different aligners, and the RAD-markers recovered from the three approaches were combined to maximise the genetic map density. Different marker-filtering strategies with varying stringencies were also tested and compared. The results showed that the most stringently filtered map had 377 mapped markers in 17 linkage groups, and a total distance of 1,144.2 cM. On the other hand, the densest map consisted of 853 markers in 16 linkage groups (matching the basic haploid chromosome number of the Streptocarpus species used here), and a total distance of 1,389.9 cM. The maps constructed were used for QTL mapping of growth form variation, identifying up to 5 effective loci for the rosulate / unifoliate phenotypes, with two of the loci on LG2 and LG14 consistently found in all mapping attempts. The results suggest that the variation in growth form may be regulated by two major loci, but a few additional minor loci might also be associated with the trait. 
Several QTLs for floral dimension, flowering time, and floral pigmentation patterns were also found, and the genetic regions associated with the floral traits of Streptocarpus were revealed for the first time. During this study valuable genomic resources were generated for future research to identify the genes underlying different morphologies in the genus Streptocarpus. The reported QTLs narrow down the genetic region for fine-mapping studies, and the genome and transcriptome resources will aid the isolation of candidate gene sequences. Identifying the genetic loci and their crosstalk behind the variable morphologies in future work will greatly add to our knowledge on how the highly diverse genus Streptocarpus has evolved and on how fundamental developmental processes of plants are regulated