35 research outputs found

    ISSD Version 2.0: taxonomic range extended

    Two more organisms from different taxonomic groups were added to a new version of the Integrated Sequence-Structure Database (ISSD). ISSD serves as an integrated source of sequence and structure information for the analysis of correlations between mRNA synonymous codon usage and the three-dimensional structure of the encoded proteins. ISSD now holds 88 non-homologous Escherichia coli proteins and 25 yeast Saccharomyces cerevisiae proteins in addition to the expanded set of mammalian proteins, which includes 166 proteins (107 in ISSD Version 1.0). Comparison of ISSD sequences with organism-specific codon usage data derived from the CUTG database shows that it is a representative subset of the GenBank coding sequence data. Preliminary results of the statistical analysis confirm that the sequence-structure correlations we observed earlier are also present in the upgraded ISSD (Version 2.0), including bacterial and yeast proteins. The ISSD Version 2.0 release includes an improved web-based data search and retrieval system and is accessible via URL http://www.protein.bio.msu.su/issd/. ISSD can also be accessed at ExPASy, URL http://www.expasy.ch/swissmod/swiss-model.htm

    Transcriptomic insights into genetic diversity of protein-coding genes in X. laevis

    © The Author(s), 2017. This is the author's version of the work and is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Developmental Biology 424 (2017): 181-188, doi:10.1016/j.ydbio.2017.02.019. We characterize the genetic diversity of Xenopus laevis strains using RNA-seq data and allele-specific analysis. This data provides a catalogue of coding variation, which can be used for improving the genomic sequence, as well as for better sequence alignment, probe design, and proteomic analysis. In addition, we paint a broad picture of the genetic landscape of the species by functionally annotating different classes of mutations with a well-established prediction tool (PolyPhen-2). Further, we specifically compare the variation in the progeny of four crosses: inbred genomic (J)-strain, outbred albino (B)-strain, and two hybrid crosses of J and B strains. We identify a subset of mutations specific to the B strain, which allows us to investigate the selection pressures affecting duplicated genes in this allotetraploid. From these crosses we find the ratio of non-synonymous to synonymous mutations is lower in duplicated genes, which suggests that they are under greater purifying selection. Surprisingly, we also find that function-altering ("damaging") mutations constitute a greater fraction of the non-synonymous variants in this group, which suggests a role for subfunctionalization in coding variation affecting duplicated genes. L.P. was supported by the NIH grant R01HD073104; L.P., A.N. and V.S. were also supported by R21HD81675, and M.H. and E.P. by P40 OD010997. 2018-03-0

    Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project

    We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view about chromatin structure has emerged, including its interrelationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded novel mechanistic and evolutionary insights about the functional landscape of the human genome. Together, these studies are defining a path forward to pursue a more comprehensive characterisation of human genome function.

    De novo and biallelic DEAF1 variants cause a phenotypic spectrum.

    PURPOSE: To investigate the effect of different DEAF1 variants on the phenotype of patients with autosomal dominant and recessive inheritance patterns and on DEAF1 activity in vitro. METHODS: We assembled a cohort of 23 patients with de novo and biallelic DEAF1 variants, described the genotype-phenotype correlation, and investigated the differential effect of de novo and recessive variants on transcription assays using DEAF1 and Eif4g3 promoter luciferase constructs. RESULTS: The proportions of the most prevalent phenotypic features, including intellectual disability, speech delay, motor delay, autism, sleep disturbances, and a high pain threshold, were not significantly different in patients with biallelic and pathogenic de novo DEAF1 variants. However, microcephaly was exclusively observed in patients with recessive variants (p < 0.0001). CONCLUSION: We propose that different variants in the DEAF1 gene result in a phenotypic spectrum centered around neurodevelopmental delay. While a pathogenic de novo dominant variant would also incapacitate the product of the wild-type allele and result in a dominant-negative effect, a combination of two recessive variants would result in a partial loss of function. Because the clinical picture can be nonspecific, detailed phenotype information, segregation, and functional analysis are fundamental to determine the pathogenicity of novel variants and to improve the care of these patients.

    An Integrated Sequence-Structure Database incorporating matching mRNA sequence, amino acid sequence and protein three-dimensional structure data

    We have constructed a non-homologous database, termed the Integrated Sequence-Structure Database (ISSD), which comprises the coding sequences of genes, amino acid sequences of the corresponding proteins, their secondary structure and φ,ψ angle assignments, and polypeptide backbone coordinates. Each protein entry in the database holds the alignment of nucleotide sequence, amino acid sequence and the PDB three-dimensional structure data. The nucleotide and amino acid sequences for each entry are selected on the basis of exact matches of the source organism and cell environment. The current version 1.0 of ISSD is available on the WWW at http://www.protein.bio.msu.su/issd/ and includes 107 non-homologous mammalian proteins, of which 80 are human proteins. The database has been used by us for the analysis of synonymous codon usage patterns in mRNA sequences, showing their correlation with the three-dimensional structure features in the encoded proteins. Possible ISSD applications include optim..

    Small open reading frames: a comparative genetics approach to validation

    Open reading frames (ORFs) with fewer than 100 codons are generally not annotated in genomes, although bona fide genes of that size are known. Newer biochemical studies have suggested that thousands of small protein-coding ORFs (smORFs) may exist in the human genome, but the true number and the biological significance of the micropeptides they encode remain uncertain. Here, we used a comparative genomics approach to identify high-confidence smORFs that are likely protein-coding. We identified 3,326 high-confidence smORFs using constraint within human populations and evolutionary conservation as additional lines of evidence. Next, we validated that, as a group, our high-confidence smORFs are conserved at the amino-acid level rather than merely residing in highly conserved non-coding regions. Finally, we found that high-confidence smORFs are enriched among disease-associated variants from GWAS. Overall, our results highlight that smORF-encoded peptides likely have important functional roles in human disease.

    Protein identification pipeline for the homology-driven proteomics

    Homology-driven proteomics is a major tool to characterize proteomes of organisms with unsequenced genomes. This paper addresses practical aspects of automated homology-driven protein identifications by LC-MS/MS on a hybrid LTQ Orbitrap mass spectrometer. All essential software elements supporting the presented pipeline are either hosted at the publicly accessible web server, or are available for free download. © 2008 Elsevier B.V. All rights reserved. U.S. National Institutes of Health (NIH), NIGMS [1R01GM070986-01A1]

    Correction: Hypermutable Non-Synonymous Sites Are under Stronger Negative Selection

    This corrects the article on p. e1000281 in Vol. 4, PMID: 19043566. Hypermutable Non-Synonymous Sites Are Under Stronger Negative Selection