    Stress-inducible protein 1: a bioinformatic analysis of the human, mouse and yeast STI1 gene structure

    Stress-inducible protein 1 (Sti1) is a 60 kDa eukaryotic protein that is important under stress and non-stress conditions. Human Sti1 is also known as the Hsp70/Hsp90 organising protein (Hop) that coordinates the functional cooperation of heat shock protein 70 (Hsp70) and heat shock protein 90 (Hsp90) during the folding of various transcription factors and kinases, including certain oncogenic proteins and prion proteins. Limited studies have been conducted on the STI1 gene structure. Thus, the aim of this study was to develop a comprehensive description of human STI1 (hSTI1), mouse STI1 (mSTI1), and yeast STI1 (ySTI1) genes, using a bioinformatic approach. Genes encoded near the STI1 loci were identified for the three organisms using National Centre for Biotechnology Information (NCBI) MapViewer and the Saccharomyces Genome Database. Exon/intron boundaries were predicted using Hidden Markov model gene prediction software (HMMGene) and Genscan, and by alignment of the mRNA sequence with the genomic DNA sequence. Transcription factor binding sites (TFBS) were predicted by scanning the region 1000 base pairs (bp) upstream of the STI1 orthologues’ transcription start site (TSS) with Alibaba, Transcription element search software (TESS) and Transcription factor search (TFSearch). The promoter region was defined by comparing the number, type and position of TFBS across the orthologous STI1 genes. Additional putative TFBS were identified for ySTI1 by searching with software that aligns nucleic acid conserved elements (AlignACE) for over-represented motifs in the region upstream of the TSS of genes thought to be co-regulated with ySTI1. This study showed that hSTI1 and mSTI1 occur in a region of synteny with a number of genes of related function. Both hSTI1 and mSTI1 comprised 14 putative exons, while ySTI1 was encoded on a single exon. 
    Human and mouse STI1 shared a perfectly conserved 55 bp region spanning their predicted TSS, although their TATA boxes were not conserved. A putative CpG island was identified in the region from -500 to +100 bp relative to the hSTI1 and mSTI1 TSS. This region overlapped with a region of high TFBS density, suggesting that the core promoter region was located in the region approximately 100 to 200 bp upstream of the TSS. Several conserved clusters of TFBS were also identified upstream of this promoter region, including binding sites for stimulatory protein 1 (Sp1), heat shock factor (HSF), nuclear factor kappa B (NF-kappaB), and the CCAAT/enhancer binding protein (C/EBP). Microarray data suggested that ySTI1 was co-regulated with several heat shock proteins and substrates of the Hsp70/Hsp90 heterocomplex, and several putative regulatory elements were identified in the upstream region of these co-regulated genes, including a motif for HSF binding. The results of this research suggest several avenues of future experimental work, including the confirmation of the proposed core promoter, upstream regulatory elements, and CpG island, and the investigation into the co-regulation of mammalian STI1 with its surrounding genes. These results could also be used to inform STI1 gene knockout experiments in mice, to assess the biological importance of mammalian STI1.

    From identification to validation to gene count

    The current GENCODE gene count of ~30,000, including 21,727 protein-coding and 8,483 RNA genes, is significantly lower than the 100,000 genes anticipated by early estimates. Accurate annotation of protein-coding and non-coding genes and pseudogenes is essential in calculating the true gene count and gaining insight into human evolution. As part of the GENCODE Consortium, the HAVANA team produces high quality manual gene annotation, which forms the basis for the reference gene set being used by the ENCODE project and provides a rich annotation of alternative splice variants and assignment of functional potential. However, the protein-coding potential of some splice variants is uncertain and valid splice variants can remain unannotated if they are absent from current cDNA libraries. Recent technological developments in sequencing and mass spectrometry have created a vast amount of new transcript and protein data that facilitate the identification and validation of new and existing transcripts, while harboring their own limitations and problems.

    The draft genomes of soft-shell turtle and green sea turtle yield insights into the development and evolution of the turtle-specific body plan

    The unique anatomical features of turtles have raised unanswered questions about the origin of their unique body plan. We generated and analyzed draft genomes of the soft-shell turtle (Pelodiscus sinensis) and the green sea turtle (Chelonia mydas); our results indicated the close relationship of the turtles to the bird-crocodilian lineage, from which they split ~267.9–248.3 million years ago (Upper Permian to Triassic). We also found extensive expansion of olfactory receptor genes in these turtles. Embryonic gene expression analysis identified an hourglass-like divergence of turtle and chicken embryogenesis, with maximal conservation around the vertebrate phylotypic period, rather than at later stages that show the amniote-common pattern. Wnt5a expression was found in the growth zone of the dorsal shell, supporting the possible co-option of limb-associated Wnt signaling in the acquisition of this turtle-specific novelty. Our results suggest that turtle evolution was accompanied by an unexpectedly conservative vertebrate phylotypic period, followed by turtle-specific repatterning of development to yield the novel structure of the shell.

    Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences

    We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation. ©2007 Nature Publishing Group

    Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

    The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Nucleic Acids Res 2018 Jan 4; 46(D1):D221-D228

    An improved pig reference genome sequence to enable pig genetics and genomics research.

    BACKGROUND: The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model given its similarity in size, anatomy, physiology, metabolism, pathology, and pharmacology to humans. The draft reference genome (Sscrofa10.2) of a purebred Duroc female pig established using older clone-based sequencing methods was incomplete, and unresolved redundancies, short-range order and orientation errors, and associated misassembled genes limited its utility. RESULTS: We present 2 annotated highly contiguous chromosome-level genome assemblies created with more recent long-read technologies and a whole-genome shotgun strategy, 1 for the same Duroc female (Sscrofa11.1) and 1 for an outbred, composite-breed male (USMARCv1.0). Both assemblies are of substantially higher (>90-fold) continuity and accuracy than Sscrofa10.2. CONCLUSIONS: These highly contiguous assemblies plus annotation of a further 11 short-read assemblies provide an unprecedented view of the genetic make-up of this important agricultural and biomedical model species. We propose that the improved Duroc assembly (Sscrofa11.1) become the reference genome for genomic research in pigs.

    Repeat associated mechanisms of genome evolution and function revealed by the Mus caroli and Mus pahari genomes

    Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades with a similar phylogenetic structure having high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus norvegicus genomes, this set of rodent genomes is similar in divergence times to the Hominidae (human-chimpanzee-gorilla-orangutan). By comparing the evolutionary dynamics between the Muridae and Hominidae, we identified punctate events of chromosome reshuffling that shaped the ancestral karyotype of Mus musculus and Mus caroli between 3 and 6 million yr ago, but that are absent in the Hominidae. Hominidae show between four- and sevenfold lower rates of nucleotide change and feature turnover in both neutral and functional sequences, suggesting an underlying coherence to the Muridae acceleration. Our system of matched, high-quality genome assemblies revealed how specific classes of repeats can play lineage-specific roles in related species. Recent LINE activity has remodeled protein-coding loci to a greater extent across the Muridae than the Hominidae, with functional consequences at the species level such as reproductive isolation. Furthermore, we charted a Muridae-specific retrotransposon expansion at unprecedented resolution, revealing how a single nucleotide mutation transformed a specific SINE element into an active CTCF binding site carrier specifically in Mus caroli, which resulted in thousands of novel, species-specific CTCF binding sites. Our results show that the comparison of matched phylogenetic sets of genomes will be an increasingly powerful strategy for understanding mammalian biology.

    Avianbase: a community resource for bird genomics

    Giving access to sequence and annotation data for genome assemblies is important because, while facilitating research, it places both assembly and annotation quality under scrutiny, resulting in improvements to both. Therefore, we announce Avianbase, a resource for bird genomics, which provides access to data released by the Avian Phylogenomics Consortium.