Full-length transcript sequencing of human and mouse cerebral cortex identifies widespread isoform diversity and alternative splicing.
Alternative splicing is a post-transcriptional regulatory mechanism producing distinct mRNA molecules from a single pre-mRNA with a prominent role in the development and function of the central nervous system. We used long-read isoform sequencing to generate full-length transcript sequences in the human and mouse cortex. We identify novel transcripts not present in existing genome annotations, including transcripts mapping to putative novel (unannotated) genes and fusion transcripts incorporating exons from multiple genes. Global patterns of transcript diversity are similar between human and mouse cortex, although certain genes are characterized by striking differences between species. We also identify developmental changes in alternative splicing, with differential transcript usage between human fetal and adult cortex. Our data confirm the importance of alternative splicing in the cortex, dramatically increasing transcriptional diversity and representing an important mechanism underpinning gene regulation in the brain. We provide transcript-level data for human and mouse cortex as a resource to the scientific community
Demonstration of Protein-Based Human Identification Using the Hair Shaft Proteome
Human identification from biological material is largely dependent on the ability to characterize genetic polymorphisms in DNA. Unfortunately, DNA can degrade in the environment, sometimes below the level at which it can be amplified by PCR. Protein, however, is chemically more robust than DNA and can persist for longer periods. Protein also contains genetic variation in the form of single amino acid polymorphisms. These can be used to infer the status of non-synonymous single nucleotide polymorphism alleles. To demonstrate this, we used mass spectrometry-based shotgun proteomics to characterize hair shaft proteins in 66 European-American subjects. A total of 596 single nucleotide polymorphism alleles were correctly imputed in 32 loci from 22 genes of subjects’ DNA and directly validated using Sanger sequencing. Estimates of the probability of resulting individual non-synonymous single nucleotide polymorphism allelic profiles in the European population, using the product rule, resulted in a maximum power of discrimination of 1 in 12,500. Imputed non-synonymous single nucleotide polymorphism profiles from European-American subjects were considerably less frequent in the African population (maximum likelihood ratio = 11,000). The converse was true for hair shafts collected from an additional 10 subjects with African ancestry, where some profiles were more frequent in the African population. Genetically variant peptides were also identified in hair shaft datasets from six archaeological skeletal remains (up to 260 years old). This study demonstrates that quantifiable measures of identity discrimination and biogeographic background can be obtained from detecting genetically variant peptides in hair shaft protein, including hair from bioarchaeological contexts. The Technology Commercialization Innovation Program (Contracts #121668, #132043) of the Utah Governors Office of Commercial Development, the Scholarship Activitie
Widespread variation in molecular interactions and regulatory properties among transcription factor isoforms.
Most human transcription factor (TF) genes encode multiple protein isoforms differing in DNA binding domains, effector domains, or other protein regions. The global extent to which this results in functional differences between isoforms remains unknown. Here, we systematically compared 693 isoforms of 246 TF genes, assessing DNA binding, protein binding, transcriptional activation, subcellular localization, and condensate formation. Relative to reference isoforms, two-thirds of alternative TF isoforms exhibit differences in one or more molecular activities, which often could not be predicted from sequence. We observed two primary categories of alternative TF isoforms: "rewirers" and "negative regulators", both of which were associated with differentiation and cancer. Our results support a model wherein the relative expression levels of, and interactions involving, TF isoforms add an understudied layer of complexity to gene regulatory networks, demonstrating the importance of isoform-aware characterization of TF functions and providing a rich resource for further studies.
Large-Scale Mass Spectrometric Detection of Variant Peptides Resulting from Nonsynonymous Nucleotide Differences
Each individual carries thousands of nonsynonymous single nucleotide variants (nsSNVs) in their genome, each corresponding to a single amino acid polymorphism (SAP) in the encoded proteins. It is important to be able to directly detect and quantify these variations at the protein level to study post-transcriptional regulation, differential allelic expression, and other important biological processes. However, such variant peptides are not generally detected in standard proteomic analyses due to their absence from the generic databases that are employed for mass spectrometry searching. Here we extend previous work that demonstrated the use of customized SAP databases constructed from sample-matched RNA-Seq data. We collected deep-coverage RNA-Seq data from the Jurkat cell line, compiled the set of nsSNVs that are expressed, used this information to construct a customized SAP database, and searched it against deep-coverage shotgun MS data obtained from the same sample. This approach enabled the detection of 421 SAP peptides mapping to 395 nsSNVs. We compared these peptides to peptides identified from a large generic search database containing all known nsSNVs (dbSNP) and found that more than 70% of the SAP peptides from this dbSNP-derived search were not supported by the RNA-Seq data and thus are likely false positives. Next, we increased the SAP coverage from the RNA-Seq derived database by utilizing multiple protease digestions, thereby increasing variant detection to 695 SAP peptides mapping to 504 nsSNV sites. These detected SAP peptides corresponded to moderate to high abundance transcripts (30+ transcripts per million, TPM). The SAP peptides included 192 allelic pairs; the relative expression levels of the two alleles were evaluated for 51 of those pairs and were found to be comparable in all cases.
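The idea of a customized SAP database can be illustrated with a minimal sketch: apply each expressed nsSNV to a reference protein as a single amino acid substitution, digest the result in silico, and keep only the peptides that do not occur in the reference digest. This is not the authors' pipeline. The function names, the toy protein sequence, the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), and the minimum peptide length are all assumptions made for illustration.

```python
def tryptic_digest(protein, min_len=6):
    """Simplified in silico trypsin digest (assumed rule):
    cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return [p for p in peptides if len(p) >= min_len]


def apply_sap(protein, position, ref_aa, alt_aa):
    """Return the protein with one substitution at a 1-based position."""
    if protein[position - 1] != ref_aa:
        raise ValueError(f"expected {ref_aa} at position {position}")
    return protein[:position - 1] + alt_aa + protein[position:]


def sap_peptides(protein, variants):
    """Peptides that appear only after applying a variant.
    Each variant is applied on its own (one substitution per entry)."""
    reference = set(tryptic_digest(protein))
    found = set()
    for position, ref_aa, alt_aa in variants:
        variant_protein = apply_sap(protein, position, ref_aa, alt_aa)
        found.update(set(tryptic_digest(variant_protein)) - reference)
    return found


# Hypothetical toy sequence, not a real protein.
protein = "MASTEVLKGDLPQAIRSSYTNWEKAAGHLLDR"

# A substitution inside a peptide yields one variant peptide.
print(sap_peptides(protein, [(11, "L", "V")]))   # {'GDVPQAIR'}

# A substitution that removes a cleavage site merges two peptides.
print(sap_peptides(protein, [(16, "R", "Q")]))   # {'GDLPQAIQSSYTNWEK'}
```

The second case shows why variant peptides cannot be derived by editing reference peptides alone: a substitution at K or R changes where the protease cuts, so the digest has to be recomputed on the full variant sequence.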