In silico prediction of housekeeping long intergenic non-coding RNAs reveals HKlincR1 as an essential player in lung cancer cell survival
Prioritising long intergenic noncoding RNAs (lincRNAs) for functional characterisation is a significant challenge. Here we applied computational approaches to discover lincRNAs expected to play a critical housekeeping (HK) role within the cell. Using the Illumina Human BodyMap RNA sequencing dataset as a starting point, we first identified lincRNAs ubiquitously expressed across a panel of human tissues. This list was then further refined by reference to conservation score, secondary structure and promoter DNA methylation status. Finally, we used tumour expression and copy number data to identify lincRNAs rarely downregulated or deleted in multiple tumour types. The resulting list of candidate essential lincRNAs was then subjected to co-expression analyses using independent data from ENCODE and The Cancer Genome Atlas (TCGA). This identified a substantial subset with a predicted role in DNA replication and cell cycle regulation. One of these, HKlincR1, was selected for further characterisation. Depletion of HKlincR1 affected cell growth in multiple lung cancer cell lines and led to disruption of genes involved in cell growth and viability. In addition, HKlincR1 expression was correlated with overall survival in lung adenocarcinoma patients. Our in silico studies therefore reveal a set of housekeeping noncoding RNAs of interest both in terms of their role in normal homeostasis and their relevance in tumour growth and maintenance.
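As a rough illustration of the first filtering step only (ubiquitous expression across tissues), the Python sketch below keeps lincRNAs whose expression reaches a threshold in every tissue of a panel. The function name `ubiquitous_transcripts`, the dictionary layout and the default threshold of 1.0 (for example, FPKM) are hypothetical; the abstract does not state the cut-off actually applied to the Illumina Human BodyMap data.

```python
from typing import Dict, List


def ubiquitous_transcripts(expr: Dict[str, List[float]], min_level: float = 1.0) -> List[str]:
    """Return IDs of transcripts expressed at >= min_level in every tissue.

    expr maps a transcript ID to its expression values, one per tissue.
    min_level is a hypothetical threshold (e.g. FPKM), not the study's value.
    """
    return sorted(
        tid
        for tid, levels in expr.items()
        # An empty profile cannot be called ubiquitous, so require data first.
        if levels and min(levels) >= min_level
    )


if __name__ == "__main__":
    toy = {
        "lincA": [2.0, 3.5, 1.2],
        "lincB": [0.0, 5.0, 4.0],  # silent in the first tissue
        "lincC": [1.0, 1.0, 1.0],
    }
    print(ubiquitous_transcripts(toy))  # ['lincA', 'lincC']
```

The later refinements described above (conservation, secondary structure, promoter methylation, tumour expression and copy number) would act as further filters on the list this step returns.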
An annotation infrastructure for the analysis and interpretation of Affymetrix exon array data
An annotation database (X:MAP) and a Bioconductor/R package (exonmap) have been developed to support fine-grained analysis of exon array data.
Mutation pattern analysis reveals polygenic mini-drivers associated with relapse after surgery in lung adenocarcinoma
The genomic lesions found in malignant tumours exhibit a striking degree of heterogeneity. Many tumours lack a known driver mutation, and their genetic basis is unclear. By mapping the somatic mutations identified in primary lung adenocarcinomas onto an independent coexpression network derived from normal tissue, we identify a critical gene network enriched for metastasis-associated genes. While individual genes within this module were rarely mutated, a significant accumulation of mutations within this geneset was predictive of relapse in lung cancer patients who have undergone surgery. Since it is the density of mutations within this module that is informative, rather than the status of any individual gene, these data are in keeping with a ‘mini-driver’ model of tumorigenesis in which multiple mutations, each with a weak effect, combine to form a polygenic driver with sufficient power to significantly alter cell behaviour and ultimately patient outcome. These polygenic mini-drivers therefore provide a means by which heterogeneous mutation patterns can generate the consistent hallmark changes in phenotype observed across tumours.
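The abstract does not name the statistic used to call an accumulation of mutations in the module. As a hedged stand-in, the Python sketch below applies a one-sided hypergeometric test, a standard enrichment test, to ask whether a single tumour carries more mutated module genes than expected if its mutated genes were drawn at random from the genome. The function name `module_enrichment_p` and all counts are hypothetical.

```python
from math import comb


def module_enrichment_p(total_genes: int, module_size: int,
                        mutated_genes: int, mutated_in_module: int) -> float:
    """One-sided hypergeometric P(X >= mutated_in_module).

    X counts module genes among `mutated_genes` genes drawn without
    replacement from `total_genes`, of which `module_size` are in the module.
    """
    upper = min(module_size, mutated_genes)
    tail = sum(
        comb(module_size, k) * comb(total_genes - module_size, mutated_genes - k)
        for k in range(mutated_in_module, upper + 1)
    )
    # Exact integer arithmetic until this final division.
    return tail / comb(total_genes, mutated_genes)


if __name__ == "__main__":
    # Hypothetical tumour: 300 mutated genes out of 20,000, with 12 of them
    # in a 400-gene module (about 6 would be expected by chance).
    p = module_enrichment_p(20_000, 400, 300, 12)
    print(f"P = {p:.3g}")
```

Because the score depends on how many module genes are hit rather than on any one gene, it reflects the mini-driver idea; relating it to relapse would still need survival modelling, which is not sketched here.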
Publisher Correction: In silico prediction of housekeeping long intergenic non-coding RNAs reveals HKlincR1 as an essential player in lung cancer cell survival
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
A comparison of massively parallel nucleotide sequencing with oligonucleotide microarrays for global transcription profiling
Background: RNA-Seq exploits the rapid generation of gigabases of sequence data by massively parallel nucleotide sequencing, allowing for the mapping and digital quantification of whole transcriptomes. Whilst previous comparisons between RNA-Seq and microarrays have been performed at the level of gene expression, in this study we adopt a more fine-grained approach. Using RNA samples from a normal human breast epithelial cell line (MCF-10A) and a breast cancer cell line (MCF-7), we present a comprehensive comparison between RNA-Seq data generated on the Applied Biosystems SOLiD platform and data from Affymetrix Exon 1.0ST arrays. The use of Exon arrays makes it possible to assess the performance of RNA-Seq in two key areas: detection of expression at the granularity of individual exons, and discovery of transcription outside annotated loci. Results: We found a high degree of correspondence between the two platforms in terms of exon-level fold changes and detection. For example, over 80% of exons detected as expressed in RNA-Seq were also detected on the Exon array, and 91% of exons flagged as changing from Absent to Present on at least one platform had fold changes in the same direction. The greatest detection correspondence was seen when the read count threshold at which to flag exons Absent in the SOLiD data was set to t < 1, suggesting that the background error rate is extremely low in RNA-Seq. We also found RNA-Seq more sensitive than the Exon array in detecting differentially expressed exons, reflecting the wider dynamic range achievable on the SOLiD platform. In addition, we find significant evidence of novel protein-coding regions outside known exons, 93% of which map to Exon array probesets, and are able to infer the presence of thousands of novel transcripts through the detection of previously unreported exon-exon junctions. Conclusions: By focusing on exon-level expression, we present the most fine-grained comparison between RNA-Seq and microarrays to date. Overall, our study demonstrates that data from a SOLiD RNA-Seq experiment are sufficient to generate results comparable to those produced from Affymetrix Exon arrays, even using only a single replicate from each platform, and when presented with a large genome.
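The two concordance measures quoted in the Results (the share of RNA-Seq-detected exons also detected on the array, and the share of exons whose fold changes point the same way on both platforms) reduce to simple set and sign comparisons. The Python sketch below assumes per-exon read counts, a set of array Present calls and per-exon log fold changes held in plain dictionaries and sets; the function names are hypothetical. An exon is flagged Absent when its read count falls below t, so t = 1 means a single read is enough for a Present call.

```python
from typing import Dict, Set


def present_exons(read_counts: Dict[str, int], t: int = 1) -> Set[str]:
    """Flag an exon Present unless its read count is below t (t=1: any read)."""
    return {exon for exon, n in read_counts.items() if n >= t}


def detection_overlap(seq_present: Set[str], array_present: Set[str]) -> float:
    """Fraction of RNA-Seq-detected exons that are also detected on the array."""
    if not seq_present:
        return 0.0
    return len(seq_present & array_present) / len(seq_present)


def direction_agreement(lfc_seq: Dict[str, float], lfc_array: Dict[str, float]) -> float:
    """Fraction of shared exons whose log fold changes have the same sign.

    A log fold change of exactly 0 is treated as non-positive.
    """
    shared = lfc_seq.keys() & lfc_array.keys()
    if not shared:
        return 0.0
    same = sum((lfc_seq[e] > 0) == (lfc_array[e] > 0) for e in shared)
    return same / len(shared)


if __name__ == "__main__":
    seq = present_exons({"e1": 0, "e2": 3, "e3": 1, "e4": 7})
    print(sorted(seq))  # ['e2', 'e3', 'e4']
    print(round(detection_overlap(seq, {"e2", "e4", "e5"}), 3))  # 0.667
```

The toy numbers only show the mechanics; the 80% and 91% figures above come from the study's own data.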
The utility of MAS5 expression summary and detection call algorithms
Background: Used alone, the MAS5.0 algorithm for generating expression summaries has been criticized for high false positive rates resulting from exaggerated variance at low intensities. Results: Here we show, with replicated cell line data, that when used alongside detection calls, MAS5 can be both selective and sensitive. A set of transcripts was identified as differentially expressed by MAS5 but unchanged by RMA and GCRMA, and subsequent analysis by real-time PCR confirmed these changes. In addition, on the Latin square datasets often used to assess expression summary algorithms, filtered MAS5.0 was found to have performance approaching that of its peers. Conclusion: When used alongside detection calls, MAS5 is a sensitive and selective algorithm for identifying differentially expressed genes.
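The abstract reports that MAS5 expression summaries become selective when combined with detection calls, but not the exact filtering rule. The Python sketch below assumes one plausible, hypothetical rule: keep a probeset only if it is called Present ('P', as opposed to Marginal 'M' or Absent 'A') in at least `min_present` replicates of either condition, then compute a log2 fold change from mean MAS5 signals. Probeset IDs, signal values and function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def filter_by_detection(calls: Dict[str, Tuple[Sequence[str], Sequence[str]]],
                        min_present: int = 2) -> List[str]:
    """Keep probesets called 'P' in >= min_present replicates of either group."""
    keep = [
        probeset
        for probeset, (group_a, group_b) in calls.items()
        if group_a.count("P") >= min_present or group_b.count("P") >= min_present
    ]
    return sorted(keep)


def log2_fold_change(signal_a: Sequence[float], signal_b: Sequence[float]) -> float:
    """log2(mean B / mean A) from MAS5 signals (assumed positive)."""
    mean_a = sum(signal_a) / len(signal_a)
    mean_b = sum(signal_b) / len(signal_b)
    return math.log2(mean_b) - math.log2(mean_a)


if __name__ == "__main__":
    calls = {
        "ps1": (["P", "P", "A"], ["A", "A", "A"]),  # present in group A
        "ps2": (["A", "M", "A"], ["A", "A", "P"]),  # never reliably present
        "ps3": (["A", "A", "A"], ["P", "P", "P"]),  # switched on in group B
    }
    print(filter_by_detection(calls))  # ['ps1', 'ps3']
    print(log2_fold_change([64.0, 64.0], [256.0, 256.0]))  # 2.0
```

Filtering before testing discards probesets whose MAS5 signal sits in the low-intensity range where the Background notes exaggerated variance, which is the intuition behind pairing the two.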
Using large-scale genomics data to identify driver mutations in lung cancer: methods and challenges
Lung cancer is the commonest cause of cancer death in the world and carries a poor prognosis for most patients. While precision targeting of mutated proteins has given some successes for never- and light-smoking patients, there are no proven targeted therapies for the majority of smokers with the disease. Despite sequencing hundreds of lung cancers, known driver mutations are lacking for a majority of tumors. Distinguishing driver mutations from inconsequential passenger mutations in a given lung tumor is extremely challenging due to the high mutational burden of smoking-related cancers. Here we discuss the methods employed to identify driver mutations from these large datasets. We examine different approaches based on bioinformatics, in silico structural modeling and biological dependency screens and discuss the limitations of these approaches.
An Integrated Mass-Spectrometry Pipeline Identifies Novel Protein-Coding Regions in the Human Genome
Background: Most protein mass spectrometry (MS) experiments rely on searches against a database of known or predicted proteins, limiting their ability as a gene discovery tool. Results: Using a search against an in silico translation of the entire human genome, combined with a series of annotation filters, we identified 346 putative novel peptides (false discovery rate (FDR) < 5%) in an MS dataset derived from two human breast epithelial cell lines. A subset of these was then successfully validated by a different MS technique. Two of these correspond to novel isoforms of heterogeneous nuclear ribonucleoproteins, while the rest correspond to novel loci. Conclusions: MS technology can be used for ab initio gene discovery in human data and, because it rests on different underlying assumptions, identifies protein-coding genes not found by other techniques. As MS technology continues to evolve, such approaches will become increasingly powerful.
Symbolic powers of monomial ideals and Cohen-Macaulay vertex-weighted digraphs
In this paper we study irreducible representations and symbolic Rees algebras of monomial ideals. We then examine edge ideals associated to vertex-weighted oriented graphs. These are digraphs having no oriented cycles of length two with weights on the vertices. For a monomial ideal with no embedded primes we classify the normality of its symbolic Rees algebra in terms of its primary components. If the primary components of a monomial ideal are normal, we present a simple procedure to compute its symbolic Rees algebra using Hilbert bases, and give necessary and sufficient conditions for the equality between its ordinary and symbolic powers. We give an effective characterization of the Cohen-Macaulay vertex-weighted oriented forests. For edge ideals of transitive weighted oriented graphs we show that Alexander duality holds. It is shown that edge ideals of weighted acyclic tournaments are Cohen-Macaulay and satisfy Alexander duality. Comment: Special volume dedicated to Professor Antonio Campillo, Springer, to appear.