73 research outputs found

    Massively Parallel Sequencing of Human Urinary Exosome/Microvesicle RNA Reveals a Predominance of Non-Coding RNA

    Intact RNA from exosomes/microvesicles (collectively referred to as microvesicles) has sparked much interest as a potential biomarker for the non-invasive analysis of disease. Here we use the Illumina Genome Analyzer to determine the comprehensive array of nucleic acid reads present in urinary microvesicles. Extraneous nucleic acids were digested using RNase and DNase treatment, and the microvesicle inner nucleic acid cargo was analyzed with and without DNase digestion to examine both DNA and RNA sequences contained in microvesicles. Results revealed that a substantial proportion (∼87%) of reads aligned to ribosomal RNA. Of the non-ribosomal RNA sequences, ∼60% aligned to non-coding RNA and repeat sequences including LINE, SINE, satellite repeats, and RNA repeats (tRNA, snRNA, scRNA and srpRNA). The remaining ∼40% of non-ribosomal RNA reads aligned to protein coding genes and splice sites encompassing approximately 13,500 of the known 21,892 protein coding genes of the human genome. Analysis of protein coding genes specific to the renal and genitourinary tract revealed that complete segments of the renal nephron and collecting duct as well as genes indicative of the bladder and prostate could be identified. This study reveals that the entire genitourinary system may be mapped using microvesicle transcript analysis and that the majority of non-ribosomal RNA sequences contained in microvesicles is potentially functional non-coding RNA, which plays an emerging role in cell regulation
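The percentages in this abstract are nested: the 60% and 40% figures are fractions of the non-ribosomal reads, not of all reads. The short calculation below is a sketch that converts them to shares of all reads and works out the gene coverage. It uses only the approximate values quoted above; the variable and function names are illustrative and do not come from the paper.

```python
# Approximate values quoted in the abstract (all marked with "~" there).
RRNA_FRACTION = 0.87          # share of all reads aligning to ribosomal RNA
NONCODING_OF_NON_RRNA = 0.60  # share of non-rRNA reads: non-coding RNA and repeats
CODING_OF_NON_RRNA = 0.40     # share of non-rRNA reads: protein-coding genes, splice sites
GENES_DETECTED = 13_500       # protein-coding genes with aligned reads
GENES_KNOWN = 21_892          # known protein-coding genes cited in the abstract


def share_of_all_reads(fraction_of_non_rrna: float,
                       rrna_fraction: float = RRNA_FRACTION) -> float:
    """Convert a fraction of non-rRNA reads into a fraction of all reads."""
    return (1.0 - rrna_fraction) * fraction_of_non_rrna


noncoding_total = share_of_all_reads(NONCODING_OF_NON_RRNA)
coding_total = share_of_all_reads(CODING_OF_NON_RRNA)
gene_coverage = GENES_DETECTED / GENES_KNOWN

print(f"non-coding/repeat share of all reads: {noncoding_total:.1%}")
print(f"protein-coding share of all reads:    {coding_total:.1%}")
print(f"protein-coding genes detected:        {gene_coverage:.1%}")
```

Under these rounded inputs, non-coding and repeat sequences make up roughly 8% of all reads, protein-coding sequences roughly 5%, and about 62% of the known protein-coding genes are represented.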

    RNA-SeQC: RNA-seq metrics for quality control and process optimization

    Summary: RNA-seq, the application of next-generation sequencing to RNA, provides transcriptome-wide characterization of cellular activity. Assessment of sequencing performance and library quality is critical to the interpretation of RNA-seq data, yet few tools exist to address this issue. We introduce RNA-SeQC, a program which provides key measures of data quality. These metrics include yield, alignment and duplication rates; GC bias, rRNA content, regions of alignment (exon, intron and intragenic), continuity of coverage, 3′/5′ bias and count of detectable transcripts, among others. The software provides multi-sample evaluation of library construction protocols, input materials and other experimental parameters. The modularity of the software enables pipeline integration and the routine monitoring of key measures of data quality such as the number of alignable reads, duplication rates and rRNA contamination. RNA-SeQC allows investigators to make informed decisions about sample inclusion in downstream analysis. In summary, RNA-SeQC provides quality control measures critical to experiment design, process optimization and downstream computational analysis
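To make a few of the listed metrics concrete, the sketch below derives alignment, duplication and rRNA rates from raw read counts. It is not RNA-SeQC's interface or its exact definitions: the `ReadCounts` class, the `qc_rates` function and the choice of denominators are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ReadCounts:
    """Hypothetical per-sample read counts (not an RNA-SeQC data structure)."""
    total: int       # reads sequenced (yield)
    aligned: int     # reads that aligned to the reference
    duplicates: int  # aligned reads flagged as duplicates
    rrna: int        # reads aligning to ribosomal RNA


def qc_rates(counts: ReadCounts) -> Dict[str, float]:
    """Compute simple rate metrics; the denominators are an assumption."""
    if counts.total <= 0:
        raise ValueError("total read count must be positive")
    return {
        "alignment_rate": counts.aligned / counts.total,
        # Duplication is taken relative to aligned reads here.
        "duplication_rate": (counts.duplicates / counts.aligned
                             if counts.aligned else 0.0),
        "rrna_rate": counts.rrna / counts.total,
    }


sample = ReadCounts(total=1000, aligned=900, duplicates=90, rrna=870)
rates = qc_rates(sample)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

Rates like these could then be compared across samples to decide which libraries to carry into downstream analysis, which is the use case the abstract describes.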

    Atlas of Signaling for Interpretation of Microarray Experiments

    Microarray-based expression profiling of living systems is a quick and inexpensive method to obtain insights into the nature of various diseases and phenotypes. A typical microarray profile can yield hundreds or even thousands of differentially expressed genes and finding biologically plausible themes or regulatory mechanisms underlying these changes is a non-trivial and daunting task. We describe a novel approach for systems-level interpretation of microarray expression data using a manually constructed “overview” pathway depicting the main cellular signaling channels (Atlas of Signaling). Currently, the developed pathway focuses on signal transduction from surface receptors to transcription factors and further transcriptional regulation of cellular “workhorse” proteins. We show how the constructed Atlas of Signaling in combination with an enrichment analysis algorithm allows quick identification and visualization of the main signaling cascades and cellular processes affected in a gene expression profiling experiment. We validate our approach using several publicly available gene expression datasets
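The abstract does not say which enrichment analysis algorithm is paired with the Atlas of Signaling. As an illustration of the general idea only, the sketch below uses a one-sided hypergeometric test, a common choice for asking whether a pathway contains more differentially expressed genes than chance would predict. The function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, pathway: int,
                           selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    universe: number of genes measured on the array
    pathway:  number of those genes annotated to the pathway
    selected: number of differentially expressed genes
    overlap:  differentially expressed genes that fall in the pathway
    """
    if not (0 <= pathway <= universe and 0 <= selected <= universe):
        raise ValueError("pathway and selected must lie within the universe")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    max_overlap = min(pathway, selected)
    total = comb(universe, selected)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(comb(pathway, k) * comb(universe - pathway, selected - k)
               for k in range(overlap, max_overlap + 1))
    return tail / total


# Made-up example: 40-gene pathway, 200 differentially expressed genes
# out of 10,000 measured, 6 of which belong to the pathway.
p_value = hypergeom_enrichment_p(universe=10_000, pathway=40,
                                 selected=200, overlap=6)
print(f"enrichment p-value: {p_value:.3g}")
```

In practice a p-value would be computed for each pathway or signaling cascade and corrected for multiple testing before ranking; that step is omitted here.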

    ProteoLens: a visual analytic tool for multi-scale database-driven biological network data mining

    Background: New systems biology studies require researchers to understand how interplay among myriads of biomolecular entities is orchestrated in order to achieve high-level cellular and physiological functions. Many software tools have been developed in the past decade to help researchers visually navigate large networks of biomolecular interactions with built-in template-based query capabilities. To further advance researchers' ability to interrogate global physiological states of cells through multi-scale visual network explorations, new visualization software tools still need to be developed to empower the analysis. A robust visual data analysis platform driven by database management systems to perform bi-directional data processing-to-visualizations with declarative querying capabilities is needed. Results: We developed ProteoLens as a Java-based visual analytic software tool for creating, annotating and exploring multi-scale biological networks. It supports direct database connectivity to either Oracle or PostgreSQL database tables/views, on which SQL statements using both Data Definition Language (DDL) and Data Manipulation Language (DML) may be specified. The robust query languages embedded directly within the visualization software help users to bring their network data into a visualization context for annotation and exploration. ProteoLens supports graph/network represented data in standard Graph Modeling Language (GML) formats, and this enables interoperation with a wide range of other visual layout tools. The architectural design of ProteoLens enables the de-coupling of complex network data visualization tasks into two distinct phases: 1) creating network data association rules, which are mapping rules between network node IDs or edge IDs and data attributes such as functional annotations, expression levels, scores, synonyms, descriptions etc.; 2) applying network data association rules to build the network and perform the visual annotation of graph nodes and edges according to associated data values. We demonstrated the advantages of these new capabilities through three biological network visualization case studies: human disease association network, drug-target interaction network and protein-peptide mapping network. Conclusion: The architectural design of ProteoLens makes it suitable for bioinformatics expert data analysts who are experienced with relational database management to perform large-scale integrated network visual explorations. ProteoLens is a promising visual analytic platform that will facilitate knowledge discoveries in future network and systems biology studies
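The two-phase design described above can be sketched in a few lines. This is not ProteoLens code: it uses Python's built-in `sqlite3` as a stand-in for the Oracle or PostgreSQL back ends the abstract names, and the table names, column names, and the `make_rule` and `build_network` helpers are all hypothetical.

```python
import sqlite3

# Stand-in database with made-up tables and values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interactions (source TEXT, target TEXT);
    CREATE TABLE expression (protein TEXT PRIMARY KEY, level REAL);
    INSERT INTO interactions VALUES ('P1', 'P2'), ('P2', 'P3');
    INSERT INTO expression VALUES ('P1', 2.5), ('P2', 0.4);
""")


def make_rule(connection, attribute, sql):
    """Phase 1: an association rule maps node IDs to one data attribute.

    The SQL query must return (node_id, value) rows.
    """
    return attribute, dict(connection.execute(sql).fetchall())


def build_network(connection, rules):
    """Phase 2: build the graph, then annotate nodes using the rules."""
    edges = connection.execute(
        "SELECT source, target FROM interactions ORDER BY source, target"
    ).fetchall()
    nodes = {node_id: {} for edge in edges for node_id in edge}
    for attribute, mapping in rules:
        for node_id, value in mapping.items():
            if node_id in nodes:
                nodes[node_id][attribute] = value
    return nodes, edges


expression_rule = make_rule(conn, "expression",
                            "SELECT protein, level FROM expression")
nodes, edges = build_network(conn, [expression_rule])
print(nodes)
print(edges)
```

Keeping the rules separate from the network means the same graph can be re-annotated with a different attribute (scores, synonyms, descriptions) by changing only the query, which appears to be the point of the de-coupling the abstract describes.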

    Comparative analysis of RNA sequencing methods for degraded or low-input samples

    Available in PMC 2014 January 01. RNA-seq is an effective method for studying the transcriptome, but it can be difficult to apply to scarce or degraded RNA from fixed clinical samples, rare cell populations or cadavers. Recent studies have proposed several methods for RNA-seq of low-quality and/or low-quantity samples, but the relative merits of these methods have not been systematically analyzed. Here we compare five such methods using metrics relevant to transcriptome annotation, transcript discovery and gene expression. Using a single human RNA sample, we constructed and sequenced ten libraries with these methods and compared them against two control libraries. We found that the RNase H method performed best for chemically fragmented, low-quality RNA, and we confirmed this through analysis of actual degraded samples. RNase H can even effectively replace oligo(dT)-based methods for standard RNA-seq. SMART and NuGEN had distinct strengths for measuring low-quantity RNA. Our analysis allows biologists to select the most suitable methods and provides a benchmark for future method development. Funding: National Institutes of Health (U.S.) (Pioneer Award DP1-OD003958-01); National Human Genome Research Institute (U.S.) (NHGRI 1P01HG005062-01); National Human Genome Research Institute (U.S.) (NHGRI Center of Excellence in Genome Science Award 1P50HG006193-01); Howard Hughes Medical Institute (Investigator); Merkin Family Foundation for Stem Cell Research; Broad Institute of MIT and Harvard (Klarman Cell Observatory); National Human Genome Research Institute (U.S.) (NHGRI grant HG03067); Fonds voor Wetenschappelijk Onderzoek--Vlaandere

    Inhibitor-Sensitive FGFR2 and FGFR3 Mutations in Lung Squamous Cell Carcinoma

    A comprehensive description of genomic alterations in lung squamous cell carcinoma (lung SqCC) has recently been reported, enabling the identification of genomic events that contribute to the oncogenesis of this disease. In lung SqCC, one of the most frequently altered receptor tyrosine kinase families is the fibroblast growth factor receptor (FGFR) family, with amplification or mutation observed in all four family members. Here, we describe the oncogenic nature of mutations observed in FGFR2 and FGFR3, which are each observed in 3% of samples, for a mutation rate of 6% across both genes. Using cell culture and xenograft models, we show that several of these mutations drive cellular transformation. Transformation can be reversed by small molecule FGFR inhibitors currently being developed for clinical use. We also show that mutations in the extracellular domains of FGFR2 lead to constitutive FGFR dimerization. Additionally, we report a patient with an FGFR2-mutated oral squamous cell carcinoma who responded to the multi-targeted tyrosine kinase inhibitor pazopanib. These findings provide new insights into driving oncogenic events in a subset of lung squamous cancers, and recommend future clinical studies with FGFR inhibitors in patients with lung and head and neck SqCC