30 research outputs found
High-quality SNPs from genic regions highlight introgression patterns among European white oaks (Quercus petraea and Q. robur)
In the post-genomics era, non-model species like most Fagaceae still lack operational diversity resources for population genomics studies. Sequence data were produced from over 800 gene fragments covering ~530 kb across the genic partition of European oaks, in a discovery panel of 25 individuals from western and central Europe (11 Quercus petraea, 13 Q. robur, one Q. ilex as an outgroup). Regions targeted represented broad functional categories potentially involved in species ecological preferences, and a random set of genes. Using a high-quality dedicated pipeline, we provide a detailed characterization of these genic regions, which included over 14500 polymorphisms, with ~12500 SNPs (218 being triallelic), over 1500 insertion-deletions, and ~200 novel di- and tri-nucleotide SSR loci. This catalog also provides various summary statistics within and among species, gene ontology information, and standard formats to assist loci choice for genotyping projects. The distributions of nucleotide diversity (θπ) and differentiation (FST) across genic regions are also described for the first time in those species, with a mean θπ close to 0.0049 in Q. petraea and to 0.0045 in Q. robur across random regions, and a mean FST of ~0.13 across SNPs. The magnitude of diversity across genes is within the range estimated for long-term perennial outcrossers, and can be considered relatively high in the plant kingdom, with an estimate across the genome of 41 to 51 million SNPs expected in both species. Individuals with typical species morphology were more easily assigned to their corresponding genetic cluster for Q. robur than for Q. petraea, revealing higher or more recent introgression in Q. petraea and a stronger species integration in Q. robur in this particular discovery panel. We also observed robust patterns of a slightly but significantly higher diversity in Q. petraea, across a random gene set and in the abiotic stress functional category, and a heterogeneous landscape of both diversity and differentiation. To explain these patterns, we discuss an alternative and non-exclusive hypothesis of stronger selective constraints in Q. robur, the most pioneering species in oak forest stand dynamics, in addition to the recognized and documented introgression history in both species despite their strong reproductive barriers. The quality of the data provided here and their representativity in terms of species genomic diversity make them useful for possible applications in medium-scale landscape and molecular ecology projects. Moreover, they can serve as reference resources for validation purposes in larger-scale resequencing projects. This type of project is preferentially recommended in oaks in contrast to SNP array development, given the large nucleotide variation and the low levels of linkage disequilibrium revealed.
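As an illustration of the two summary statistics quoted in this abstract, the sketch below computes per-site nucleotide diversity (mean pairwise differences) and a Hudson-style FST (1 minus the ratio of within- to between-population diversity) on toy aligned sequences. The function names and the toy data are assumptions made for this example, and the estimators are common textbook choices, not necessarily those used by the authors' pipeline.

```python
from itertools import combinations, product


def pairwise_diff(a, b):
    """Count the sites at which two aligned, equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site within one sample."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_diff(a, b) for a, b in pairs)
    return total / (len(pairs) * n_sites)


def hudson_fst(pop1, pop2):
    """FST as 1 - (mean within-population diversity / between-population diversity)."""
    n_sites = len(pop1[0])
    within = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    cross = list(product(pop1, pop2))
    between = sum(pairwise_diff(a, b) for a, b in cross) / (len(cross) * n_sites)
    if between == 0:
        raise ValueError("no differences between populations; FST is undefined")
    return 1 - within / between


# Hypothetical toy alignments (not data from the study).
pop_a = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
pop_b = ["ACGAACGT", "ACGAACGT", "ACGAACGA"]

pi_a = nucleotide_diversity(pop_a)
fst_ab = hudson_fst(pop_a, pop_b)
```

On real data these quantities are normally computed per gene fragment from called genotypes, with corrections for missing data and sample size that this sketch leaves out.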
Tandem repeats structure of gel-forming mucin domains could be revealed by SMRT sequencing data
Mucins are large glycoproteins that cover and protect the epithelial surfaces of the body. Mucin domains of gel-forming mucins are rich in proline, threonine, and serine residues that are heavily glycosylated. These domains show great complexity with tandem repeats, which makes their sequences difficult to study. With the advent of single molecule real-time (SMRT) sequencing technologies, we present the sequence structure of mucin domains via SMRT long reads for the gel-forming mucins MUC2, MUC5AC, MUC5B and MUC6. Our study shows that, between individuals, single nucleotide polymorphisms can be found in the mucin domains of MUC2, MUC5AC, MUC5B and MUC6, while different numbers of tandem repeats can be found in the mucin domains of MUC2 and MUC6. Furthermore, we obtained the sequences of the MUC2, MUC5AC, and MUC5B mucin domains in a Chinese individual with a possible per-nucleotide accuracy of 99.98–99.99%, 99.93–99.99%, and 99.76–99.99%, respectively. We report a new method to obtain the DNA sequence of gel-forming mucin domains. This method will provide new insights into sequencing tandem repeat regions located in coding regions. The sequences obtained through this method provide more information for studying gel-forming mucin domains.
Evolution, diversification, and expression of KNOX proteins in plants
The KNOX (KNOTTED1-like homeobox) transcription factors play a pivotal role in leaf and meristem development. The majority of these proteins are characterized by the KNOX1, KNOX2, ELK, and homeobox domains whereas the proteins of the KNATM family contain only the KNOX domains. We carried out an extensive inventory of these proteins and here report on a total of 394 KNOX proteins from 48 species. The land plant proteins fall into two classes (I and II) as previously shown, where the class I family seems to be most closely related to the green algae homologs. The KNATM proteins are restricted to Eudicots and some species have multiple paralogs of this protein. Certain plants are characterized by a significant increase in the number of KNOX paralogs; one example is Glycine max. Through the analysis of public gene expression data we show that the class II proteins of this plant have a relatively broad expression specificity as compared to class I proteins, consistent with previous studies of other plants. In G. max, class I proteins are mainly distributed in axis tissues and KNATM paralogs are overall poorly expressed; the highest expression is in the early plumular axis. Overall, analysis of gene expression in G. max demonstrates clearly that the expansion in gene number is associated with functional diversification.
High-throughput transcriptome sequencing and preliminary functional analysis in four Neotropical tree species
Background: The Amazonian rainforest is predicted to suffer from ongoing environmental changes. Despite the need to evaluate the impact of such changes on tree genetic diversity, we almost entirely lack genomic resources.
Results: In this study, we analysed the transcriptome of four tropical tree species (Carapa guianensis, Eperua falcata, Symphonia globulifera and Virola michelii) with contrasting ecological features, belonging to four widespread botanical families (respectively Meliaceae, Fabaceae, Clusiaceae and Myristicaceae). We sequenced cDNA libraries from three organs (leaves, stems, and roots) using 454 pyrosequencing. We have developed an R and bioperl-based bioinformatic procedure for de novo assembly, gene functional annotation and marker discovery. Mismatch identification takes into account single-base quality values as well as the likelihood of false variants as a function of contig depth and number of sequenced chromosomes. Between 17103 (for Symphonia globulifera) and 23390 (for Eperua falcata) contigs were assembled. Organs varied in the numbers of unigenes they apparently express, with a higher number in roots. Patterns of gene expression were similar across species, with metabolism of aromatic compounds standing out as an overrepresented gene function. Transcripts corresponding to several gene functions were found to be over- or underrepresented in each organ. We identified between 4434 (for Symphonia globulifera) and 9076 (for Virola surinamensis) well-supported mismatches. The resulting overall mismatch density ranged between 0.89 (S. globulifera) and 1.05 (V. surinamensis) mismatches/100 bp in variation-containing contigs.
Conclusion: The relative representation of gene functions in the four transcriptomes suggests that secondary metabolism may be particularly important in tropical trees. The differential representation of transcripts among tissues suggests differential gene expression, which opens the way to functional studies in these non-model, ecologically important species. We found substantial amounts of mismatches in the four species. These newly identified putative variants are a first step towards acquiring much needed genomic resources for tropical tree species.
Data from: Outlier loci highlight the direction of introgression in oaks
Loci considered to be under selection are generally avoided in attempts to infer past demographic processes as they do not fit neutral model assumptions. However, opportunities to better reconstruct some aspects of past demography might thus be missed. Here we examined genetic differentiation between two sympatric European oak species with contrasting ecological dynamics (Quercus robur and Q. petraea) with both outlier (i.e. loci possibly affected by divergent selection between species or by hitchhiking effects with genomic regions under selection) and non-outlier loci. We sampled 855 individuals in six mixed forests in France and genotyped them with a set of 262 SNPs enriched with markers showing high interspecific differentiation, resulting in accurate species delimitation. We identified between 13 and 74 interspecific outlier loci, depending on the coalescent simulation models and parameters used. Greater genetic diversity was predicted in Q. petraea (a late successional species) than in Q. robur (an early successional species) as introgression should theoretically occur predominantly from the resident species to the invading species. Remarkably, this prediction was verified with outlier loci but not with non-outlier loci. We suggest that the lower effective interspecific gene flow at loci showing high interspecific divergence has better preserved the signal of past asymmetric introgression towards Q. petraea caused by the species' contrasting dynamics. Using markers under selection to reconstruct past demographic processes could therefore have broader potential than generally recognized.