MageComet—web application for harmonizing existing large-scale experiment descriptions
Motivation: Meta-analysis of large gene expression datasets obtained from public repositories requires consistently annotated data. Curation of such experiments, however, is an expert activity that involves repetitive manipulation of text. Few tools exist for automated curation, and this shortage creates a bottleneck in the analysis pipeline.
ArrayExpress—a public database of microarray experiments and gene expression profiles
ArrayExpress is a public database for high-throughput functional genomics data. ArrayExpress consists of two parts: the ArrayExpress Repository, which is a MIAME-supportive public archive of microarray data, and the ArrayExpress Data Warehouse, which is a database of gene expression profiles selected from the repository and consistently re-annotated. Archived experiments can be queried by experiment attributes, such as keywords, species, array platform, authors, journals or accession numbers. Gene expression profiles can be queried by gene names and properties, such as Gene Ontology terms, and the profiles can be visualized. ArrayExpress is a rapidly growing database; it currently contains data from >50 000 hybridizations and >1 500 000 individual expression profiles. ArrayExpress supports community standards, including MIAME, MAGE-ML and, more recently, the proposal for a spreadsheet-based data exchange format, MAGE-TAB. Availability:
Fast approximate hierarchical clustering using similarity heuristics
© 2008 Kull and Vilo; licensee BioMed Central Ltd
ArrayExpress update—an archive of microarray and high-throughput sequencing-based functional genomics experiments
The ArrayExpress Archive (http://www.ebi.ac.uk/arrayexpress) is one of the three international public repositories of functional genomics data supporting publications. It includes data generated by sequencing or array-based technologies. Data are submitted by users and imported directly from the NCBI Gene Expression Omnibus. The ArrayExpress Archive is closely integrated with the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Advanced queries provided via ontology-enabled interfaces include queries based on technology and on sample attributes such as disease, cell type and anatomy.
Aberrant methylation of tRNAs links cellular stress to neuro-developmental disorders.
Mutations in the cytosine-5 RNA methyltransferase NSun2 cause microcephaly and other neurological abnormalities in mice and humans. How post-transcriptional methylation contributes to the human disease is currently unknown. By comparing gene expression data with global cytosine-5 RNA methylomes in patient fibroblasts and NSun2-deficient mice, we find that loss of cytosine-5 RNA methylation increases the angiogenin-mediated endonucleolytic cleavage of transfer RNAs (tRNA), leading to an accumulation of 5' tRNA-derived small RNA fragments. Accumulation of 5' tRNA fragments in the absence of NSun2 reduces protein translation rates and activates stress pathways, leading to reduced cell size and increased apoptosis of cortical, hippocampal and striatal neurons. Mechanistically, we demonstrate that angiogenin binds with higher affinity to tRNAs lacking site-specific NSun2-mediated methylation and that the presence of 5' tRNA fragments is sufficient and required to trigger cellular stress responses. Furthermore, the enhanced sensitivity of NSun2-deficient brains to oxidative stress can be rescued through inhibition of angiogenin during embryogenesis. In conclusion, failure in NSun2-mediated tRNA methylation contributes to human diseases via stress-induced RNA cleavage.
A global insight into a cancer transcriptional space using pancreatic data: importance, findings and flaws
Despite the increasing wealth of available data, the structure of cancer transcriptional space remains largely unknown. Analysis of this space would provide novel insights into the complexity of cancer, assess relative implications in complex biological processes and responses, evaluate the effectiveness of cancer models and help uncover vital facets of cancer biology not apparent from current small-scale studies. We conducted a comprehensive analysis of pancreatic cancer-expression space by integrating data from otherwise disparate studies. We found (i) a clear separation of profiles based on experimental type, with patient tissue samples, cell lines and xenograft models forming distinct groups; (ii) three subgroups within the normal samples adjacent to cancer showing disruptions to biofunctions previously linked to cancer; and (iii) that ectopic subcutaneous xenografts and cell line models do not effectively represent changes occurring in pancreatic cancer. All findings are available from our online resource for independent interrogation. As the most comprehensive analysis of pancreatic cancer to date, our study primarily serves to highlight the limitations that arise from a lack of raw data availability, insufficient clinical/histopathological information and ambiguous data processing. It stresses the importance of a global-systems approach to assess and maximise findings from expression profiling of malignant and non-malignant diseases.
Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.
BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies.
FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. This focus on estimation bias is the main difference from previous studies, which have mostly examined predictive performance and how it relates to the presence of batch effects.
DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects.
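The confounding scenario described here can be illustrated with a minimal sketch. It is not the authors' simulation: the study resamples real expression data, whereas this toy version draws Gaussian noise, and the function name and all parameters (`confounding`, `batch_shift`, `group_shift`, `n_diff`) are hypothetical choices made for illustration.

```python
import random


def simulate_confounded_batches(n_samples=100, n_features=20, n_diff=5,
                                confounding=0.8, batch_shift=1.0,
                                group_shift=1.0, seed=0):
    """Return (X, group, batch) for a toy two-group, two-batch data set.

    `confounding` is the probability that a sample's batch equals its group:
    0.5 makes group and batch independent, 1.0 makes them fully confounded.
    All names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    X, group, batch = [], [], []
    for i in range(n_samples):
        g = i % 2  # balanced groups: 0 = 'control', 1 = 'treated'
        b = g if rng.random() < confounding else 1 - g
        # Assumption: Gaussian noise stands in for real expression values.
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        if b == 1:
            # Batch effect: additive shift on every feature in batch 1.
            row = [v + batch_shift for v in row]
        if g == 1:
            # Group effect: the first n_diff features are differentially expressed.
            for j in range(n_diff):
                row[j] += group_shift
        X.append(row)
        group.append(g)
        batch.append(b)
    return X, group, batch
```

Varying `confounding` between 0.5 and 1.0 gives a range of scenarios from no confounding to complete confounding of group with batch.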
METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, are performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
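A nested cross-validation scheme of the kind described above can be sketched as follows. This is an illustrative assumption, not the study's code: a nearest-centroid classifier and a mean-difference feature ranking stand in for the classifiers and the Wilcoxon/lasso selection named in the abstract, and all function names are hypothetical. The point it shows is that feature selection and tuning only ever see the training portion of each outer fold.

```python
import random


def k_folds(indices, k, rng):
    """Shuffle the indices and split them into k roughly equal folds."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(X, y, train, test, n_feat):
    """Select n_feat features on `train` only, then score a nearest-centroid
    classifier on `test`. Assumes both classes (0 and 1) occur in `train`."""
    n_cols = len(X[0])

    def centroid(label):
        rows = [X[i] for i in train if y[i] == label]
        return [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]

    c0, c1 = centroid(0), centroid(1)
    # Stand-in for Wilcoxon/lasso: rank features by centroid difference.
    feats = sorted(range(n_cols), key=lambda j: -abs(c0[j] - c1[j]))[:n_feat]
    correct = 0
    for i in test:
        d0 = sum((X[i][j] - c0[j]) ** 2 for j in feats)
        d1 = sum((X[i][j] - c1[j]) ** 2 for j in feats)
        correct += (0 if d0 <= d1 else 1) == y[i]
    return correct / len(test)


def nested_cv(X, y, grid=(1, 5, 10), outer_k=5, inner_k=3, seed=0):
    """Outer loop estimates performance; inner loop tunes n_feat."""
    rng = random.Random(seed)
    outer = k_folds(range(len(X)), outer_k, rng)
    scores = []
    for f, test in enumerate(outer):
        train = [i for g, fold in enumerate(outer) if g != f for i in fold]
        inner = k_folds(train, inner_k, rng)

        def inner_score(n_feat):
            accs = []
            for h, val in enumerate(inner):
                fit = [i for g, fold in enumerate(inner) if g != h for i in fold]
                accs.append(accuracy(X, y, fit, val, n_feat))
            return sum(accs) / len(accs)

        best = max(grid, key=inner_score)  # tuned without touching `test`
        scores.append(accuracy(X, y, train, test, best))
    return sum(scores) / len(scores)
```

Comparing the value returned by `nested_cv` with the accuracy on a separately generated data set is one way to look at the estimation bias the abstract refers to.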
Pervasive lesion segregation shapes cancer genome evolution
Cancers arise through the acquisition of oncogenic mutations and grow through clonal expansion. Here we reveal that most mutagenic DNA lesions are not resolved as mutations within a single cell cycle. Instead, DNA lesions segregate unrepaired into daughter cells for multiple cell generations, resulting in the chromosome-scale phasing of subsequent mutations. We characterise this process in mutagen-induced mouse liver tumours and show that DNA replication across persisting lesions can produce multiple alternative alleles in successive cell divisions, thereby generating both multi-allelic and combinatorial genetic diversity. The phasing of lesions enables the accurate measurement of strand-biased repair processes, quantification of oncogenic selection, and fine mapping of sister chromatid exchange events. Finally, we demonstrate that lesion segregation is a unifying property of exogenous mutagens, including UV light and chemotherapy agents in human cells and tumours, which has profound implications for the evolution and adaptation of cancer genomes. This work was supported by: Cancer Research UK (20412, 22398), the European Research Council (615584, 682398), the Wellcome Trust (WT108749/Z/15/Z, WT106563/Z/14/A, WT202878/B/16/Z), the European Molecular Biology Laboratory, the MRC Human Genetics Unit core funding programme grants (MC_UU_00007/11, MC_UU_00007/16), and the ERDF/Spanish Ministry of Science, Innovation and Universities-Spanish State Research Agency/DamReMap Project (RTI2018-094095-B-I00).
Analysis of gene expression data from non-small cell lung carcinoma cell lines reveals distinct sub-classes from those identified at the phenotype level
Microarray data from cell lines of Non-Small Cell Lung Carcinoma (NSCLC) can be used to look for differences in gene expression between the cell lines derived from different tumour samples, and to investigate whether these differences can be used to cluster the cell lines into distinct groups. Dividing the cell lines into classes can help to improve diagnosis and the development of screens for new drug candidates. The microarray data is first subjected to quality control analysis and then normalised using three alternative methods to reduce the chance that differences are artefacts of the normalisation process. The final clustering into sub-classes was carried out in a conservative manner such that sub-classes were consistent across all three normalisation methods. If there is structure in the cell line population, it was expected that this would agree with histological classifications, but this was not found to be the case. To check the biological consistency of the sub-classes, the set of most strongly differentially expressed genes was identified for each pair of clusters to check whether the genes that most strongly define sub-classes have biological functions consistent with NSCLC.
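The abstract does not say how consistency across the three normalisation methods was enforced. One conservative reading, sketched below as an assumption rather than the authors' procedure, is to keep two cell lines in the same sub-class only if every normalisation method clusters them together. The function name and the cell line identifiers are hypothetical.

```python
from collections import defaultdict


def consensus_subclasses(labelings):
    """Group samples that share a cluster under every labeling.

    `labelings` is a list of dicts (one per normalisation method), each
    mapping a sample name to its cluster label under that method.
    """
    groups = defaultdict(list)
    for sample in sorted(labelings[0]):
        # Samples end up together only if their labels agree under all methods.
        key = tuple(labeling[sample] for labeling in labelings)
        groups[key].append(sample)
    return sorted(groups.values())


# Hypothetical cluster assignments from three normalisation methods.
method_1 = {"line_A": 0, "line_B": 0, "line_C": 1, "line_D": 1}
method_2 = {"line_A": 0, "line_B": 0, "line_C": 1, "line_D": 1}
method_3 = {"line_A": 0, "line_B": 0, "line_C": 1, "line_D": 0}

subclasses = consensus_subclasses([method_1, method_2, method_3])
# line_D is split off because the third method disagrees about it.
```

Cluster labels are compared within each method only, so the methods do not need to number their clusters in the same way.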
Differential expression of THOC1 and ALY mRNP biogenesis/export factors in human cancers
Background: One key step in gene expression is the biogenesis of mRNA ribonucleoparticle complexes (mRNPs). Formation of the mRNP requires the participation of a number of conserved factors such as the THO complex. THO interacts physically and functionally with the Sub2/UAP56 RNA-dependent ATPase, and the Yra1/REF1/ALY RNA-binding protein linking transcription, mRNA export and genome integrity. Given the link between genome instability and cancer, we have performed a comparative analysis of the expression patterns of THOC1, a THO complex subunit, and ALY in tumor samples.
Methods: The mRNA levels were measured by quantitative real-time PCR and hybridization of a tumor tissue cDNA array. The protein levels and distribution were measured by immunostaining of a custom tissue array containing a set of paraffin-embedded samples of different tumor and normal tissues, followed by statistical analysis.
Results: We show that the expression of two mRNP factors, THOC1 and ALY, is altered in several tumor tissues. THOC1 mRNA and protein levels are up-regulated in ovarian and lung tumors and down-regulated in those of testis and skin, whereas ALY is altered in a wide variety of tumors. In contrast to THOC1, ALY protein is highly detected in normal proliferative cells, but poorly in high-grade cancers.
Conclusions: These results suggest a differential connection between tumorigenesis and the expression levels of human THO and ALY. This study opens the possibility of defining mRNP biogenesis factors as putative players in cell proliferation that could contribute to tumor development.