The zebrafish transcriptome during early development
Background: The transition from fertilized egg to embryo is accompanied by a multitude of changes in gene expression, and the transcriptional events that underlie these processes have not yet been fully characterized. In this study RNA-Seq is used to compare the transcription profiles of four early developmental stages in zebrafish (Danio rerio) on a global scale. Results: An average of 79 M total reads were detected from the different stages. Of the total reads, 65% to 73% were successfully mapped, and 36% to 44% of those were uniquely mapped. The total number of detected unique gene transcripts was 11187, of which 10096 were present at 1-cell stage. The largest number of common transcripts was observed between 1-cell stage and 16-cell stage. An enrichment of gene transcripts with molecular functions of DNA binding, protein folding and processing as well as metal ion binding was observed with progression of development. The sequence data (accession number ERP000635) is available at the European Nucleotide Archive. Conclusion: Clustering of expression profiles shows that a majority of the detected gene transcripts are present at steady levels, and thus a minority of the gene transcripts clusters as increasing or decreasing in expression over the four investigated developmental stages. The three earliest developmental stages were similar when comparing highly expressed genes, whereas the 50% epiboly stage differed from the other three stages in the identity of highly expressed genes, number of uniquely expressed genes and enrichment of GO molecular functions. Taken together, these observations indicate a major transition in gene regulation and transcriptional activity taking place between the 512-cell and 50% epiboly stages, in accordance with previous studies.
Transposon- and Genome Dynamics in the Fungal Genus Neurospora: Insights from Nearly Gapless Genome Assemblies
A large portion of nuclear DNA is composed of transposable element (TE) sequences, whose transposition is controlled by diverse host defense strategies in order to maintain genomic integrity. One such strategy is the fungal-specific Repeat-Induced Point mutation (RIP) that hyper-mutates repetitive DNA sequences. While RIP is found across Fungi, it has been shown to vary in efficiency. The filamentous ascomycete Neurospora crassa has been a pioneer in the study of RIP, but data on TEs and RIP from other species in the genus is limited. In this study, we investigated 18 nearly gapless genome assemblies of ten Neurospora species, which diverged from a common ancestor about 7 MYA, to determine and compare genome-wide TE distribution and their associated RIP patterns. Four of these assemblies, generated by PacBio technology, represent new genomic datasets. We showed that TE contents, ranging from 8.7% to 18.9%, covary with genome sizes, which range from 37.8 to 43.9 Mb. Degraded copies of Long Terminal Repeat (LTR) retrotransposons were abundant among the identified TEs, and these are distributed across the genome at varying frequencies. In all investigated Neurospora genomes, TE sequences had signs of numerous C-to-T substitutions, suggesting that RIP occurred in all species, and accordingly, RIP signatures correlated with TE-dense regions in all genomes. In conclusion, essentially gapless genome assemblies allowed us to identify TEs in Neurospora genomes, and reveal that TEs contribute to genome size variation in this group. Our study suggests that TEs and RIP are highly correlated in each examined Neurospora species, and hence, the pattern of interaction is conserved over the investigated evolutionary timescale. Finally, with our results, we verify that RIP signatures can be used to facilitate the identification of TE-rich regions in the genome. The comprehensive genomic dataset of Neurospora is a rich resource for further in-depth analyses of fungal genomes by the community.
Differences in Gene Expression between Mouse and Human for Dynamically Regulated Genes in Early Embryo
Peer reviewed
Dominant Mutations in GRHL3 Cause Van der Woude Syndrome and Disrupt Oral Periderm Development
Mutations in interferon regulatory factor 6 (IRF6) account for ∼70% of cases of Van der Woude syndrome (VWS), the most common syndromic form of cleft lip and palate. In 8 of 45 VWS-affected families lacking a mutation in IRF6, we found coding mutations in grainyhead-like 3 (GRHL3). According to a zebrafish-based assay, the disease-associated GRHL3 mutations abrogated periderm development and were consistent with a dominant-negative effect, in contrast to haploinsufficiency seen in most VWS cases caused by IRF6 mutations. In mouse, all embryos lacking Grhl3 exhibited abnormal oral periderm and 17% developed a cleft palate. Analysis of the oral phenotype of double heterozygote (Irf6+/−;Grhl3+/−) murine embryos failed to detect epistasis between the two genes, suggesting that they function in separate but convergent pathways during palatogenesis. Taken together, our data demonstrated that mutations in two genes, IRF6 and GRHL3, can lead to nearly identical phenotypes of orofacial cleft. They supported the hypotheses that both genes are essential for the presence of a functional oral periderm and that failure of this process contributes to VWS.
Ecological genomics in the Northern krill uncovers loci for local adaptation across ocean basins
29 pages, 4 figures, supplementary information https://doi.org/10.1038/s41467-024-50239-7. Data availability: The sequence data generated in this study have been deposited in the public European Nucleotide Archive (ENA) database under accession code PRJEB61785. The processed data and results are available at the SciLifeLab Data Repository at https://doi.org/10.17044/scilifelab.c.6626216. The genome assembly is available in ENA and NCBI under accession code GCA_964058975.1 and SNP datasets are available under accession code PRJEB77093 in the European Variation Archive (EVA). Subsets of the data are provided in the Supplementary Information. Source data is provided as a Source Data file. This study made use of data from the following public databases: AlphaFold Protein Structure Database https://alphafold.ebi.ac.uk/; Climate Reanalyzer https://climatereanalyzer.org/; Dfam (Dfam_3.5) https://dfam.org/home; EggNOG (v5.0) http://eggnog5.embl.de/; FlyBase database (release FB2021_01) https://flybase.org/; GOrilla https://cbl-gorilla.cs.technion.ac.il/; GyDB2 https://gydb.org; HomeoDB https://homeodb.zoo.ox.ac.uk/; KrillDB2 https://krilldb2.bio.unipd.it/; MITOS2 http://mitos.bioinf.uni-leipzig.de/; NCBI Conserved Domain Database (CDD) https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi; NCBI Genome Database https://www.ncbi.nlm.nih.gov/genome/; NCBI RefSeq database (release 204) https://www.ncbi.nlm.nih.gov/refseq/; OrthoDB (v10.1) https://www.orthodb.org/; Pfam (release 34.0) http://pfam.xfam.org/; Repbase (RepBaseRepeatMaskerEdition 20181026) https://www.girinst.org/server/RepBase/; REXdb http://repeatexplorer.org/?page_id=918; ShinyGO 0.77 http://bioinformatics.sdstate.edu/go77/; SILVA rRNA database project (release 132) https://www.arb-silva.de/; The SWISS-MODEL Repository https://swissmodel.expasy.org/; TOPCONS https://topcons.cbr.su.se/; UniProtKB/Swiss-Prot https://www.uniprot.org/.
Tissue from the reference specimen is available in the LIB Biobank at Museum Koenig Bonn under accession ZFMK-TIS-82493. Three additional specimens are deposited under accessions ZFMK-TIS-82494 through ZFMK-TIS-82496. Source data are provided with this paper. Code availability: Public code is available at https://github.com/NBISweden/genecovr and https://github.com/andreaswallberg/Ecological-Genomics-Northern-Krill. A copy of the GitHub repositories is available on Zenodo: https://zenodo.org/doi/10.5281/zenodo.10827407
Krill are vital as food for many marine animals but are also impacted by global warming. To learn how they and other zooplankton may adapt to a warmer world we studied local adaptation in the widespread Northern krill (Meganyctiphanes norvegica). We assemble and characterize its large genome and compare genome-scale variation among 74 specimens from the colder Atlantic Ocean and warmer Mediterranean Sea. The 19 Gb genome likely evolved through proliferation of retrotransposons, now targeted for inactivation by extensive DNA methylation, and contains many duplicated genes associated with molting and vision. Analysis of 760 million SNPs indicates extensive homogenizing gene-flow among populations. Nevertheless, we detect signatures of adaptive divergence across hundreds of genes, implicated in photoreception, circadian regulation, reproduction and thermal tolerance, indicating polygenic adaptation to light and temperature. The top gene candidate for ecological adaptation was nrf-6, a lipid transporter with a Mediterranean variant that may contribute to early spring reproduction. Such variation could become increasingly important for fitness in Atlantic stocks. Our study underscores the widespread but uneven distribution of adaptive variation, necessitating characterization of genetic variation among natural zooplankton populations to understand their adaptive potential, predict risks and support ocean conservation in the face of climate change.
Open access funding provided by Uppsala University. With the institutional support of the ‘Severo Ochoa Centre of Excellence’ accreditation (CEX2019-000928-S). Peer reviewed
Dominant Mutations in GRHL3 Cause Van der Woude Syndrome and Disrupt Oral Periderm Development
Peer reviewed
Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations
Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. 
These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
Computational approaches for in-depth analysis of cDNA sequence tags
Major recent improvements in biotechnology have led to an accelerated production of DNA sequences. The completion of the human genome sequence, along with the genomes of more than two hundred other species, has marked the arrival of the genome era. The ultimate goal is to understand the structure and function of genomes and their genes. This thesis has focused on the computational analysis of complementary DNA (cDNA) sequences. These are copies of mRNA transcripts that correspond to the coding regions of genomes. Studying the expression patterns of genes is essential for understanding gene function. Many gene expression profiling techniques generate short sequence tags that derive from transcripts. A pilot study was performed to assess the feasibility of using the pyrosequencing platform for gene expression analysis. The sequences generated by pyrosequencing were in most cases (≈ 85%) long enough (> 18 nucleotides) to uniquely identify the corresponding transcripts through database searches. Aspects of transcript identification by short sequence tags were further investigated in a number of public databases, revealing that a tag length of 16-17 nucleotides was sufficient for unique identification. Longer transcript representations are obtained from expressed sequence tag (EST) sequencing. Method development for the analysis and maintenance of large EST data sets has been performed on data from poplar, which is a tree of commercial interest to the forest biotechnology industry. In 2003 a large EST sequencing project reached > 100 000 reads, providing a unique resource for tree biology research. ESTs have been grouped into clusters and singletons that represent potential genes. Preliminary analyses have estimated gene content in Populus to be very similar to that of the model organism Arabidopsis thaliana. EST data collections provide a rich source for mining polymorphisms.
A software application was developed and applied to EST data from two Populus species, and candidate single nucleotide polymorphisms (SNPs) were recorded. A study of genetic variation between the species revealed a striking similarity, with orthologous pairs being > 98% identical on the protein level. Keywords: cDNA, EST, gene expression, SNP, SAGE, polymorphism, assembly, clustering, DNA sequencing, pyrosequencing, mRNA transcript, orthology, tree biotechnology, restriction enzyme