    The zebrafish transcriptome during early development

    Background: The transition from fertilized egg to embryo is accompanied by a multitude of changes in gene expression, and the transcriptional events that underlie these processes have not yet been fully characterized. In this study RNA-Seq is used to compare the transcription profiles of four early developmental stages in zebrafish (Danio rerio) on a global scale. Results: An average of 79 M total reads were detected from the different stages. Of the total reads, 65%-73% were successfully mapped, and 36%-44% of those were uniquely mapped. The total number of detected unique gene transcripts was 11187, of which 10096 were present at the 1-cell stage. The largest number of common transcripts was observed between the 1-cell stage and the 16-cell stage. An enrichment of gene transcripts with molecular functions of DNA binding, protein folding and processing as well as metal ion binding was observed with progression of development. The sequence data (accession number ERP000635) is available at the European Nucleotide Archive. Conclusion: Clustering of expression profiles shows that a majority of the detected gene transcripts are present at steady levels, and thus a minority of the gene transcripts clusters as increasing or decreasing in expression over the four investigated developmental stages. The three earliest developmental stages were similar when comparing highly expressed genes, whereas the 50% epiboly stage differed from the other three stages in the identity of highly expressed genes, number of uniquely expressed genes and enrichment of GO molecular functions. Taken together, these observations indicate a major transition in gene regulation and transcriptional activity taking place between the 512-cell and 50% epiboly stages, in accordance with previous studies.

    Transposon- and Genome Dynamics in the Fungal Genus Neurospora: Insights from Nearly Gapless Genome Assemblies

    A large portion of nuclear DNA is composed of transposable element (TE) sequences, whose transposition is controlled by diverse host defense strategies in order to maintain genomic integrity. One such strategy is the fungal-specific Repeat-Induced Point mutation (RIP) that hyper-mutates repetitive DNA sequences. While RIP is found across Fungi, it has been shown to vary in efficiency. The filamentous ascomycete Neurospora crassa has been a pioneer in the study of RIP, but data on TEs and RIP from other species in the genus is limited. In this study, we investigated 18 nearly gapless genome assemblies of ten Neurospora species, which diverged from a common ancestor about 7 MYA, to determine and compare genome-wide TE distribution and their associated RIP patterns. Four of these assemblies, generated by PacBio technology, represent new genomic datasets. We showed that the TE contents between 8.7-18.9% covary with genome sizes that range between 37.8-43.9 Mb. Degraded copies of Long Terminal Repeat (LTR) retrotransposons were abundant among the identified TEs, and these are distributed across the genome at varying frequencies. In all investigated Neurospora genomes, TE sequences had signs of numerous C-to-T substitutions, suggesting that RIP occurred in all species, and accordingly, RIP signatures correlated with TE-dense regions in all genomes. In conclusion, essentially gapless genome assemblies allowed us to identify TEs in Neurospora genomes, and reveal that TEs contribute to genome size variation in this group. Our study suggests that TEs and RIP are highly correlated in each examined Neurospora species, and hence, the pattern of interaction is conserved over the investigated evolutionary timescale. Finally, with our results, we verify that RIP signatures can be used to facilitate the identification of TE-rich regions in the genome.
The comprehensive genomic dataset of Neurospora is a rich resource for further in-depth analyses of fungal genomes by the community.

    Dominant Mutations in GRHL3 Cause Van der Woude Syndrome and Disrupt Oral Periderm Development

    Mutations in interferon regulatory factor 6 (IRF6) account for ∼70% of cases of Van der Woude syndrome (VWS), the most common syndromic form of cleft lip and palate. In 8 of 45 VWS-affected families lacking a mutation in IRF6, we found coding mutations in grainyhead-like 3 (GRHL3). According to a zebrafish-based assay, the disease-associated GRHL3 mutations abrogated periderm development and were consistent with a dominant-negative effect, in contrast to haploinsufficiency seen in most VWS cases caused by IRF6 mutations. In mouse, all embryos lacking Grhl3 exhibited abnormal oral periderm and 17% developed a cleft palate. Analysis of the oral phenotype of double heterozygote (Irf6+/−;Grhl3+/−) murine embryos failed to detect epistasis between the two genes, suggesting that they function in separate but convergent pathways during palatogenesis. Taken together, our data demonstrated that mutations in two genes, IRF6 and GRHL3, can lead to nearly identical phenotypes of orofacial cleft. They supported the hypotheses that both genes are essential for the presence of a functional oral periderm and that failure of this process contributes to VWS.

    Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations

    Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. 
These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.

    Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

    The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.

    Computational approaches for in-depth analysis of cDNA sequence tags

    Major recent improvements in biotechnology have led to an accelerated production of DNA sequences. The completion of the human genome sequence, along with the genomes of more than two hundred other species, has marked the arrival of the genome era. The ultimate goal is to understand the structure and function of genomes and their genes. This thesis has focused on the computational analysis of complementary DNA (cDNA) sequences. These are copies of mRNA transcripts that correspond to the coding regions of genomes. Studying the expression patterns of genes is essential for understanding gene function. Many gene expression profiling techniques generate short sequence tags that derive from transcripts. A pilot study was performed to assess the feasibility of using the pyrosequencing platform for gene expression analysis. The sequences generated by pyrosequencing in most cases (≈ 85%) were long enough (> 18 nucleotides) to uniquely identify the corresponding transcripts through database searches. Aspects of transcript identification by short sequence tags were further investigated in a number of public databases, revealing that a tag length of 16-17 nucleotides was sufficient for unique identification. Longer transcript representations are obtained from expressed sequence tag (EST) sequencing. Method development for the analysis and maintenance of large EST data sets has been performed on data from poplar, which is a tree of commercial interest to the forest biotechnology industry. In 2003 a large EST sequencing project reached > 100 000 reads, providing a unique resource for tree biology research. ESTs have been grouped into clusters and singletons that represent potential genes. Preliminary analyses have estimated gene content in Populus to be very similar to that of the model organism Arabidopsis thaliana. EST data collections provide a rich source for mining polymorphisms.
A software application was developed and applied to EST data from two Populus species, and candidate single nucleotide polymorphisms (SNPs) were recorded. A study of genetic variation between the species revealed a striking similarity, with orthologous pairs being > 98% identical on the protein level. Keywords: cDNA, EST, gene expression, SNP, SAGE, polymorphism, assembly, clustering, DNA sequencing, pyrosequencing, mRNA transcript, orthology, tree biotechnology, restriction enzym

    Tentative mapping of transcription-induced interchromosomal interaction using chimeric EST and mRNA data

    Recent studies on chromosome conformation show that chromosomes colocalize in the nucleus, bringing together active genes in transcription factories. This spatial proximity of actively transcribing genes could provide a means for RNA interaction at the transcript level. We have screened public databases for chimeric EST and mRNA sequences with the intent of mapping transcription-induced interchromosomal interactions. We suggest that chimeric transcripts may be the result of close encounters of active genes, either as functional products or "noise" in the transcription process, and that they could be used as probes for chromosome interactions. We have found a total of 5,614 chimeric ESTs and 587 chimeric mRNAs that meet our selection criteria. Due to their higher quality, the mRNA findings are of particular interest and we hope that they may serve as food for thought for specialists in diverse areas of molecular biology.