Latent rank change detection for analysis of splice-junction microarrays with nonlinear effects
Alternative splicing of gene transcripts greatly expands the functional capacity of the genome, and certain splice isoforms may indicate specific disease states such as cancer. Splice junction microarrays interrogate thousands of splice junctions, but data analysis is difficult and error prone because it is more complex than differential gene expression analysis. We present Rank Change Detection (RCD) as a method to identify differential splicing events based upon a straightforward probabilistic model comparing the over- or underrepresentation of two or more competing isoforms. RCD has advantages over commonly used methods because it is robust to false positive errors due to nonlinear trends in microarray measurements. Further, RCD does not depend on prior knowledge of splice isoforms, yet it takes advantage of the inherent structure of mutually exclusive junctions, and it is conceptually generalizable to other types of splicing arrays or RNA-Seq. RCD specifically identifies the biologically important cases in which a splice junction becomes more or less prevalent compared to other mutually exclusive junctions. The example data are from different cell lines of glioblastoma tumors assayed with Agilent microarrays.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS389 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
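The robustness claim for RCD rests on a simple property: the order of junction intensities within a sample does not change under any monotonic (possibly nonlinear) distortion of the measurement scale. The sketch below illustrates only that property by computing, for each mutually exclusive junction, the shift in mean within-sample rank between two groups of samples. It is not the latent probabilistic model of the paper, and the function names are hypothetical.

```python
from statistics import mean


def within_sample_ranks(intensities):
    """Rank junctions within one sample (0 = lowest intensity; ties keep input order)."""
    order = sorted(range(len(intensities)), key=lambda i: intensities[i])
    ranks = [0] * len(intensities)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def rank_change(group_a, group_b):
    """Mean within-sample rank of each junction in group B minus group A.

    group_a, group_b: lists of samples; each sample is a list of intensities
    for the same set of mutually exclusive junctions (hypothetical layout).
    A positive value means the junction became relatively more prevalent.
    """
    n_junctions = len(group_a[0])
    ranks_a = [within_sample_ranks(sample) for sample in group_a]
    ranks_b = [within_sample_ranks(sample) for sample in group_b]
    return [
        mean(r[j] for r in ranks_b) - mean(r[j] for r in ranks_a)
        for j in range(n_junctions)
    ]
```

Because only within-sample ranks enter the calculation, applying a monotonic transform such as a square root to every intensity leaves the result unchanged, which is the intuition behind resistance to nonlinear array trends.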
Over-represented sequences located on UTRs are potentially involved in regulatory functions
Eukaryotic gene expression must be coordinated for the proper functioning of biological processes. This coordination can be achieved at both the transcriptional and post-transcriptional levels. In both cases, regulatory sequences located in promoter regions or UTRs act as markers recognized by regulators, which can then activate or repress different groups of genes as needed. While regulatory sequences involved in transcription are well documented, information on sequence elements involved in post-transcriptional regulation is lacking. We used a statistical over-representation method to identify novel regulatory elements located on UTRs. An exhaustive search approach was used to calculate the frequency of all possible n-mers (short nucleotide sequences) in 16,160 human genes of NCBI RefSeq sequences and to identify any peculiar usage of n-mers on UTRs. After a stringent filtering process, we identified circa 4,000 highly over-represented n-mers on UTRs. We provide evidence that these n-mers are potentially involved in regulatory functions. Identified n-mers overlap with previously identified binding sites for HuR and Tia1, as well as with AU-rich and GU-rich sequences. We also determined that over-represented n-mers are particularly enriched in a group of 159 genes directly involved in tumor formation. Finally, a method for clustering n-mer groups allowed the identification of putative gene networks.
Keywords: over-represented sequences, UTRs, regulatory functions
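The exhaustive n-mer search can be sketched as counting every one of the 4^n possible n-mers in UTR sequences and comparing each count with the expectation from a background set. This is a minimal sketch assuming a binomial z-score with a pseudocount; the abstract does not give the actual statistic or filtering thresholds, so `z_min`, the scoring and the function names are illustrative assumptions.

```python
from collections import Counter
from itertools import product
from math import sqrt


def count_nmers(seqs, n):
    """Count overlapping n-mers across a list of nucleotide sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    return counts


def over_represented(utr_seqs, background_seqs, n, z_min=3.0):
    """Return {n-mer: z} for n-mers over-represented in UTRs vs. background.

    Expected UTR count = total UTR n-mers * background frequency, with a
    pseudocount so unseen background n-mers do not get probability zero.
    The binomial z-score and z_min are hypothetical choices.
    """
    utr = count_nmers(utr_seqs, n)
    bg = count_nmers(background_seqs, n)
    utr_total = sum(utr.values())
    bg_total = sum(bg.values())
    if utr_total == 0:
        return {}
    hits = {}
    # Exhaustive enumeration of all 4**n possible n-mers.
    for kmer in ("".join(p) for p in product("ACGT", repeat=n)):
        p = (bg[kmer] + 1) / (bg_total + 4 ** n)
        expected = utr_total * p
        sd = sqrt(utr_total * p * (1 - p))
        z = (utr[kmer] - expected) / sd
        if z >= z_min:
            hits[kmer] = z
    return hits
```

Exhaustive enumeration grows as 4^n, so in practice n is kept small; a stringent multiple-testing filter would also be needed on real genome-scale data.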
Before It Gets Started: Regulating Translation at the 5′ UTR
Translation regulation plays important roles in both normal physiological conditions and disease states. This regulation requires cis-regulatory elements, located mostly in the 5′ and 3′ UTRs, and trans-regulatory factors (e.g., RNA-binding proteins (RBPs)) that recognize specific RNA features and interact with the translation machinery to modulate its activity. In this paper, we discuss important aspects of 5′ UTR-mediated regulation, providing an overview of the characteristics and functions of the main elements present in this region, such as upstream open reading frames (uORFs), secondary structures, and RBP-binding motifs, as well as the different mechanisms of translation regulation and the impact they have on gene expression and human health when deregulated.
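As an illustration of one element discussed above, the sketch below scans a 5′ UTR for upstream open reading frames, defined here as an ATG followed by an in-frame stop codon. It assumes the input is the UTR sequence alone (DNA or RNA letters) and, for simplicity, ignores Kozak context and uORFs that run past the UTR into the main coding sequence; `find_uorfs` and `min_codons` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5, min_codons=1):
    """Return 0-based half-open (start, end) spans of uORFs in a 5' UTR.

    Each span starts at an ATG and ends after the first in-frame stop codon
    found inside the UTR. ORFs with no stop inside the UTR (e.g. those that
    overlap the main CDS) are not reported. min_codons counts codons from
    the ATG up to, but excluding, the stop.
    """
    seq = utr5.upper().replace("U", "T")
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    uorfs.append((start, pos + 3))
                break
    return uorfs
```

For example, `find_uorfs("GGAUGAAAUAGCC")` reports one uORF spanning positions 2 to 11 (AUG-AAA-UAG).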
A Two-Phase Innate Host Response to Alphavirus Infection Identified by mRNP-Tagging In Vivo
A concept fundamental to viral pathogenesis is that infection induces specific changes within the host cell, within specific tissues, or within the entire animal. These changes are reflected in a cascade of altered transcription patterns evident during infection. However, elucidation of this cascade in vivo has been limited by a general inability to distinguish changes occurring in the minority of infected cells from those in surrounding uninfected cells. To circumvent this inherent limitation of traditional gene expression profiling methods, an innovative mRNP-tagging technique was implemented to isolate host mRNA specifically from infected cells in vitro as well as in vivo following Venezuelan equine encephalitis virus (VEE) infection. This technique facilitated a direct characterization of the host defense response specifically within the first cells infected with VEE, while simultaneous total RNA analysis assessed the collective response of both the infected and uninfected cells. The result was a unique, multifaceted profile of the early response to VEE infection in primary dendritic cells, as well as in the draining lymph node, the initially targeted tissue in the mouse model. A dynamic environment of complex interactions was revealed, suggesting a two-step innate response in which activation of a subset of host genes in infected cells subsequently leads to activation of the surrounding uninfected cells. Our findings suggest that the application of viral mRNP-tagging systems, as introduced here, will facilitate a much more detailed understanding of the highly coordinated host response to infectious agents.
The RNA-Binding Protein Musashi1 Affects Medulloblastoma Growth via a Network of Cancer-Related Genes and Is an Indicator of Poor Prognosis
Musashi1 (Msi1) is a highly conserved RNA-binding protein that is required during the development of the nervous system. Msi1 has been characterized as a stem cell marker, controlling the balance between self-renewal and differentiation, and has also been implicated in tumorigenesis, being highly expressed in multiple tumor types. We analyzed Msi1 expression in a large cohort of medulloblastoma samples and found that Msi1 is highly expressed in tumor tissue compared with normal cerebellum. Notably, high Msi1 expression levels proved to be a sign of poor prognosis. Msi1 expression was particularly high in molecular subgroups 3 and 4 of medulloblastoma. We determined that Msi1 is required for tumorigenesis, as inhibition of Msi1 expression by small interfering RNAs reduced the growth of Daoy medulloblastoma cells in xenografts. To characterize the participation of Msi1 in medulloblastoma, we conducted several high-throughput analyses. Ribonucleoprotein immunoprecipitation followed by microarray analysis (RIP-chip) was used to identify mRNA species preferentially associated with Msi1 protein in Daoy cells. We also used cluster analysis to identify genes with expression patterns similar or opposite to that of Msi1 in our medulloblastoma cohort. A network study identified RAC1, CTGF, SDCBP, SRC, PRL, and SHC1 as major nodes of an Msi1-associated network. Our results suggest that Msi1 functions as a regulator of multiple processes in medulloblastoma formation and could become an important therapeutic target.
Mining gene functional networks to improve mass-spectrometry-based protein identification
Motivation: High-throughput protein identification experiments based on tandem mass spectrometry (MS/MS) often suffer from low sensitivity and low-confidence protein identifications. In a typical shotgun proteomics experiment, it is assumed that all proteins are equally likely to be present. However, other evidence often suggests that a protein is present, and confidence in individual protein identifications can be updated accordingly.
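One simple way to let functional-network evidence update identification confidence is to blend each protein's MS/MS-only probability with the average probability of its network neighbors, iterating so that evidence propagates. This is a hypothetical illustration, not the method of the paper (whose details the abstract does not give); the blending weight `alpha`, the iteration count and the function name are assumptions.

```python
def network_rescore(scores, neighbors, alpha=0.3, iterations=10):
    """Blend each protein's identification probability with its neighbors'.

    scores: dict protein -> probability of presence from MS/MS alone.
    neighbors: dict protein -> list of functionally linked proteins.
    alpha: weight given to network evidence (hypothetical default).
    Each round restarts from the original MS/MS score, so the result stays
    anchored to the spectra while borrowing support from linked proteins.
    """
    current = dict(scores)
    for _ in range(iterations):
        updated = {}
        for protein, own in scores.items():
            linked = [current[p] for p in neighbors.get(protein, []) if p in current]
            if linked:
                updated[protein] = (1 - alpha) * own + alpha * sum(linked) / len(linked)
            else:
                updated[protein] = own  # no network evidence: keep MS/MS score
        current = updated
    return current
```

A weakly identified protein whose functional partners are confidently identified is pulled upward, while proteins without network links keep their original score.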
Integrating shotgun proteomics and mRNA expression data to improve protein identification
Motivation: Tandem mass spectrometry (MS/MS) offers fast and reliable characterization of complex protein mixtures, but suffers from low sensitivity in protein identification. In a typical shotgun proteomics experiment, it is assumed that all proteins are equally likely to be present. However, other information is often available; e.g., the probability of a protein's presence is likely to correlate with its mRNA concentration.
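Replacing the "equally likely" assumption with an mRNA-informed prior can be expressed with Bayes' rule. In the sketch below, the prior is a hypothetical logistic function of mRNA level (the `slope`, `midpoint`, `floor` and `ceiling` values are assumptions, not taken from the paper), and the two likelihoods are assumed to come from a separate MS/MS scoring step.

```python
from math import exp, log


def mrna_prior(mrna_level, slope=1.0, midpoint=1.0, floor=0.05, ceiling=0.95):
    """Hypothetical prior P(protein present) that increases with mRNA level.

    Logistic in log(mRNA); midpoint is the mRNA level giving a prior of 0.5.
    floor/ceiling keep the prior away from 0 and 1 so that spectral evidence
    can always change the conclusion.
    """
    x = log(mrna_level + 1e-9) - log(midpoint)  # small offset avoids log(0)
    p = 1.0 / (1.0 + exp(-slope * x))
    return min(max(p, floor), ceiling)


def posterior_presence(likelihood_present, likelihood_absent, prior):
    """Bayes' rule: P(present | spectra) from MS/MS likelihoods and a prior."""
    numerator = likelihood_present * prior
    return numerator / (numerator + likelihood_absent * (1.0 - prior))
```

With `prior=0.5` the posterior reduces to the MS/MS-only ratio, which corresponds to the flat "equally likely" assumption; a higher mRNA level raises the prior and hence the posterior for the same spectral evidence.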
Site identification in high-throughput RNA-protein interaction data
Motivation: Post-transcriptional and co-transcriptional regulation is a crucial link between genotype and phenotype. The central players are RNA-binding proteins, and experimental technologies for probing their activities, such as cross-linking and immunoprecipitation (CLIP) and RIP-seq, have advanced rapidly over the past decade. However, statistically robust, flexible computational methods for identifying binding sites from high-throughput immunoprecipitation assays are largely lacking. Results: We introduce a method for site identification that provides four key advantages over previous methods: (i) it can be applied to all variations of CLIP and RIP-seq technologies, (ii) it accurately models the underlying read-count distributions, (iii) it allows external covariates, such as transcript abundance (which we demonstrate is highly correlated with read count), to inform the site identification process and (iv) it allows for direct comparison of site usage across cell types or conditions. © The Author 2012. Published by Oxford University Press. All rights reserved.
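To show how a covariate such as transcript abundance can enter site calling, the sketch below tests each window's read count against a Poisson background whose mean is scaled by the abundance of its transcript. The Poisson model, `base_rate`, `alpha` and the function names are simplifying assumptions for illustration only; the abstract states that the method models the read-count distributions accurately, and this sketch does not claim to reproduce that model.

```python
from math import exp


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower tail.

    Adequate for a sketch; very small tail probabilities lose precision.
    """
    term = exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def call_sites(window_counts, transcript_abundance, base_rate, alpha=1e-3):
    """Flag windows whose read count exceeds an abundance-scaled background.

    window_counts: list of (window_id, gene, read_count).
    transcript_abundance: dict gene -> relative abundance (the covariate).
    base_rate: expected reads per window at abundance 1.0 (hypothetical).
    Returns a list of (window_id, p_value) for windows with p < alpha.
    """
    sites = []
    for window, gene, count in window_counts:
        lam = base_rate * transcript_abundance.get(gene, 1.0)
        p = poisson_sf(count, lam)
        if p < alpha:
            sites.append((window, p))
    return sites
```

The same read count can be significant on a lowly expressed transcript but unremarkable on a highly expressed one, which is why ignoring abundance tends to over-call sites on abundant transcripts.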