44 research outputs found

    Full-length transcript sequencing of human and mouse cerebral cortex identifies widespread isoform diversity and alternative splicing.

    Alternative splicing, a post-transcriptional regulatory mechanism that produces distinct mRNA molecules from a single pre-mRNA, plays a prominent role in the development and function of the central nervous system. We used long-read isoform sequencing to generate full-length transcript sequences from human and mouse cortex. We identify novel transcripts absent from existing genome annotations, including transcripts mapping to putative novel (unannotated) genes and fusion transcripts incorporating exons from multiple genes. Global patterns of transcript diversity are similar between human and mouse cortex, although certain genes show striking differences between species. We also identify developmental changes in alternative splicing, with differential transcript usage between human fetal and adult cortex. Our data confirm the importance of alternative splicing in the cortex, where it dramatically increases transcriptional diversity and represents an important mechanism underpinning gene regulation in the brain. We provide transcript-level data for human and mouse cortex as a resource to the scientific community.
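
    Differential transcript usage compares how a gene's expression is divided among its isoforms, rather than the gene's total expression. The minimal sketch below assumes simple per-isoform read counts. It computes each isoform's fraction of its gene's reads and the change in that fraction between two conditions. The function names, isoform labels, and counts are hypothetical placeholders, not data or code from the study. Real analyses also model biological replicates and apply a statistical test.

```python
# Minimal sketch of transcript usage: each isoform's share of its gene's
# reads, and how that share shifts between two conditions. Names and counts
# are hypothetical placeholders, not data from the study.


def transcript_usage(counts):
    """Map each isoform to its fraction of the gene's total read count."""
    total = sum(counts.values())
    if total == 0:
        return {isoform: 0.0 for isoform in counts}
    return {isoform: n / total for isoform, n in counts.items()}


def usage_shift(counts_a, counts_b):
    """Usage in condition b minus usage in condition a, per isoform.

    Isoforms missing from one condition are treated as having zero usage there.
    """
    usage_a = transcript_usage(counts_a)
    usage_b = transcript_usage(counts_b)
    isoforms = sorted(set(usage_a) | set(usage_b))
    return {iso: usage_b.get(iso, 0.0) - usage_a.get(iso, 0.0) for iso in isoforms}


# Hypothetical read counts for two isoforms of one gene.
fetal = {"isoform_1": 80, "isoform_2": 20}
adult = {"isoform_1": 30, "isoform_2": 70}
print(usage_shift(fetal, adult))
```

    In this toy example, the dominant isoform switches between conditions even though the gene's total read count (100 in each) is unchanged. A gene-level expression comparison would miss this kind of change.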

    Demonstration of Protein-Based Human Identification Using the Hair Shaft Proteome

    Human identification from biological material is largely dependent on the ability to characterize genetic polymorphisms in DNA. Unfortunately, DNA can degrade in the environment, sometimes below the level at which it can be amplified by PCR. Protein, however, is chemically more robust than DNA and can persist for longer periods. Protein also contains genetic variation in the form of single amino acid polymorphisms. These can be used to infer the status of non-synonymous single nucleotide polymorphism alleles. To demonstrate this, we used mass spectrometry-based shotgun proteomics to characterize hair shaft proteins in 66 European-American subjects. A total of 596 single nucleotide polymorphism alleles at 32 loci in 22 genes were correctly imputed and directly validated by Sanger sequencing of the subjects' DNA. Estimating the probability of each individual's resulting non-synonymous single nucleotide polymorphism allelic profile in the European population with the product rule gave a maximum power of discrimination of 1 in 12,500. Imputed non-synonymous single nucleotide polymorphism profiles from European-American subjects were considerably less frequent in the African population (maximum likelihood ratio = 11,000). The converse was true for hair shafts collected from an additional 10 subjects with African ancestry, where some profiles were more frequent in the African population. Genetically variant peptides were also identified in hair shaft datasets from six archaeological skeletal remains (up to 260 years old). This study demonstrates that quantifiable measures of identity discrimination and biogeographic background can be obtained from detecting genetically variant peptides in hair shaft protein, including hair from bioarchaeological contexts.

    The Technology Commercialization Innovation Program (Contracts #121668, #132043) of the Utah Governors Office of Commercial Development, the Scholarship Activitie
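
    The product rule mentioned in this abstract multiplies per-locus genotype frequencies to estimate how common a multi-locus profile is. The sketch below makes several assumptions that the abstract does not state. It assumes independent, biallelic loci in Hardy-Weinberg equilibrium, and it treats the likelihood ratio as the profile's frequency in one population divided by its frequency in another. These modelling choices, the function names, and all allele frequencies are illustrative assumptions, not the study's parameters or method.

```python
# Minimal sketch of the product rule for a multi-locus SNP profile.
# ASSUMPTIONS (not from the study): independent biallelic loci, Hardy-Weinberg
# equilibrium, and hypothetical allele frequencies for two populations.
from math import prod


def genotype_frequency(p, heterozygous):
    """Genotype frequency at a biallelic SNP under Hardy-Weinberg equilibrium.

    p is the frequency of the observed allele for a homozygote (p^2), or of
    either allele for a heterozygote (2p(1 - p)).
    """
    return 2 * p * (1 - p) if heterozygous else p * p


def profile_frequency(loci):
    """Product rule: multiply genotype frequencies across independent loci.

    loci is a list of (allele_frequency, heterozygous) tuples for one population.
    """
    return prod(genotype_frequency(p, het) for p, het in loci)


# Hypothetical three-locus profile, with allele frequencies in two populations.
pop_a = [(0.30, True), (0.10, False), (0.50, True)]
pop_b = [(0.05, True), (0.40, False), (0.20, True)]

freq_a = profile_frequency(pop_a)
freq_b = profile_frequency(pop_b)
print(f"profile seen in about 1 in {1 / freq_a:,.0f} individuals of population A")
print(f"likelihood ratio (A vs B): {freq_a / freq_b:.2f}")
```

    Under these assumptions, the reciprocal of the profile frequency gives the "1 in N" figure. A ratio far from 1 indicates that the profile is much more common in one population than in the other.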

    Widespread variation in molecular interactions and regulatory properties among transcription factor isoforms.

    Most human transcription factor (TF) genes encode multiple protein isoforms differing in DNA-binding domains, effector domains, or other protein regions. The global extent to which this results in functional differences between isoforms remains unknown. Here, we systematically compared 693 isoforms of 246 TF genes, assessing DNA binding, protein binding, transcriptional activation, subcellular localization, and condensate formation. Relative to reference isoforms, two-thirds of alternative TF isoforms exhibit differences in one or more molecular activities, which often could not be predicted from sequence. We observed two primary categories of alternative TF isoforms, "rewirers" and "negative regulators", both of which were associated with differentiation and cancer. Our results support a model wherein the relative expression levels of, and interactions involving, TF isoforms add an understudied layer of complexity to gene regulatory networks, demonstrating the importance of isoform-aware characterization of TF functions and providing a rich resource for further studies.

    The central nervous system transcriptome of the weakly electric brown ghost knifefish (Apteronotus leptorhynchus): de novo assembly, annotation, and proteomics validation


    Large-Scale Mass Spectrometric Detection of Variant Peptides Resulting from Nonsynonymous Nucleotide Differences

    Each individual carries thousands of nonsynonymous single nucleotide variants (nsSNVs) in their genome, each corresponding to a single amino acid polymorphism (SAP) in the encoded proteins. It is important to be able to directly detect and quantify these variations at the protein level to study post-transcriptional regulation, differential allelic expression, and other important biological processes. However, such variant peptides are not generally detected in standard proteomic analyses because they are absent from the generic databases used for mass spectrometry searching. Here we extend previous work that demonstrated the use of customized SAP databases constructed from sample-matched RNA-Seq data. We collected deep-coverage RNA-Seq data from the Jurkat cell line, compiled the set of expressed nsSNVs, used this information to construct a customized SAP database, and searched deep-coverage shotgun MS data obtained from the same sample against it. This approach enabled the detection of 421 SAP peptides mapping to 395 nsSNVs. We compared these peptides to peptides identified from a large generic search database containing all known nsSNVs (dbSNP) and found that more than 70% of the SAP peptides from this dbSNP-derived search were not supported by the RNA-Seq data and thus are likely false positives. Next, we increased the SAP coverage from the RNA-Seq-derived database by using multiple protease digestions, thereby increasing variant detection to 695 SAP peptides mapping to 504 nsSNV sites. These detected SAP peptides corresponded to moderate- to high-abundance transcripts (30+ transcripts per million, TPM). The SAP peptides included 192 allelic pairs; the relative expression levels of the two alleles were evaluated for 51 of those pairs and were found to be comparable in all cases.
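
    The customized SAP database described above can be pictured as three steps. First, each expressed SAP is applied to its reference protein. Second, the variant protein is digested in silico. Third, only the peptides that cover the substituted residue are kept. The sketch below is a simplified illustration, not the authors' pipeline. Its function names, test sequence, and minimum peptide length are hypothetical. It assumes a trypsin-like rule (cleave after K or R, but not before P) with no missed cleavages. Swapping in a different rule, such as a Lys-C-like cleavage after K only, stands in for the additional protease digestions.

```python
# Minimal sketch of a custom SAP peptide database: apply each single amino
# acid polymorphism (SAP) to a reference protein, digest in silico, and keep
# peptides covering the substituted residue. Function names, cleavage rules,
# and sequences are hypothetical simplifications, not the authors' pipeline.


def digest(seq, cleave_after="KR", blocked_before="P"):
    """Cleave after residues in cleave_after unless the next residue is in
    blocked_before (simplified trypsin-like rule, no missed cleavages).

    Returns a list of (start, peptide) tuples with 0-based start offsets.
    """
    peptides, start = [], 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        at_end = next_residue == ""
        if residue in cleave_after and (at_end or next_residue not in blocked_before):
            peptides.append((start, seq[start:i + 1]))
            start = i + 1
    if start < len(seq):
        peptides.append((start, seq[start:]))
    return peptides


def variant_peptides(protein, saps, min_length=6, **protease):
    """Return (label, peptide) pairs for variant peptides containing each SAP.

    saps holds (position, ref, alt) tuples with 1-based positions, assumed to
    be pre-filtered to variants expressed in sample-matched RNA-Seq data.
    """
    results = []
    for pos, ref, alt in saps:
        if not 1 <= pos <= len(protein):
            raise ValueError(f"position {pos} outside protein")
        idx = pos - 1
        if protein[idx] != ref:
            raise ValueError(f"reference residue mismatch at position {pos}")
        # Digest the variant protein, so SAPs that create or destroy a
        # cleavage site (e.g. a change to or from K/R) are handled correctly.
        variant = protein[:idx] + alt + protein[idx + 1:]
        for start, peptide in digest(variant, **protease):
            covers_site = start <= idx < start + len(peptide)
            if covers_site and len(peptide) >= min_length:
                results.append((f"{ref}{pos}{alt}", peptide))
    return results


protein = "AAAKLLLLLLRGGGGGGK"  # hypothetical reference sequence
print(variant_peptides(protein, [(6, "L", "V")]))
# A Lys-C-like rule (cleave after K only) yields a different, longer peptide
# covering the same site.
print(variant_peptides(protein, [(6, "L", "V")], cleave_after="K", blocked_before=""))
```

    Digesting the variant sequence, rather than the reference sequence, matters for this approach. A substitution to or from K or R changes where trypsin cuts. Such a substitution can therefore leave the site on a peptide too short to keep, as in the L6K case.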
