Clustering of the SOM easily reveals distinct gene expression patterns: results of a reanalysis of lymphoma study
BACKGROUND: A method to evaluate and analyze the massive data generated by a series of microarray experiments is of utmost importance to reveal the hidden patterns of gene expression. Because of the complexity and high dimensionality of microarray gene expression profiles, the dimensional reduction of raw expression data and the feature selection necessary for, for example, classification of disease samples remain a challenge. To solve this problem, we propose a two-level analysis. First, a self-organizing map (SOM) is used. SOM is a vector quantization method that simplifies and reduces the dimensionality of the original measurements and visualizes individual tumor samples in SOM component planes. Next, hierarchical clustering and K-means clustering are used to identify patterns of gene expression useful for classification of samples. RESULTS: We tested the two-level analysis on public data from diffuse large B-cell lymphomas. The analysis easily distinguished major gene expression patterns without the need for supervision: a germinal center-related, a proliferation, an inflammatory and a plasma cell differentiation-related gene expression pattern. The first three patterns matched the patterns described in the original publication using supervised clustering analysis, whereas the fourth one was novel. CONCLUSIONS: Our study shows that by using SOM as an intermediate step to analyze genome-wide gene expression data, gene expression patterns can be revealed more easily. The "expression display" by the SOM component plane summarises the complicated data in a way that allows the clinician to evaluate the classification options rather than giving a fixed diagnosis.
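The two-level idea can be illustrated with a minimal, dependency-free Python sketch: a small rectangular SOM is trained on gene expression vectors (one dimension per tumor sample), and the resulting prototype vectors are then grouped by K-means. The grid size, learning-rate and neighbourhood schedules, and the toy data are assumptions chosen for illustration, not the settings used in the study; hierarchical clustering, the other second-level method mentioned, is not shown.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bmu(protos, x):
    """Best-matching unit: the grid position whose prototype is closest to x."""
    return min(protos, key=lambda unit: sq_dist(protos[unit], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rectangular SOM; returns {(row, col): prototype vector}."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialise every prototype with a randomly chosen input vector.
    protos = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5  # neighbourhood width shrinks towards 0.5
            win = bmu(protos, x)
            for (r, c), w in protos.items():
                # Gaussian neighbourhood on the map grid around the winning unit.
                h = math.exp(-((r - win[0]) ** 2 + (c - win[1]) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return protos


def component_plane(protos, rows, cols, j):
    """Values of input dimension j (one tumor sample) across the map grid."""
    return [[protos[(r, c)][j] for c in range(cols)] for r in range(rows)]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means; returns (labels, centres)."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centres[j])
        if new == centres:  # assignments have stabilised
            break
        centres = new
    return labels, centres


# Hypothetical toy data: 20 genes x 4 samples with two expression patterns.
toy_rng = random.Random(1)
genes = [[v + toy_rng.gauss(0, 0.1) for v in (2, 2, 0, 0)] for _ in range(10)] + \
        [[v + toy_rng.gauss(0, 0.1) for v in (0, 0, 2, 2)] for _ in range(10)]

# Level 1: SOM. Level 2: K-means on the SOM prototypes, not on the raw genes.
protos = train_som(genes, rows=3, cols=3)
units = sorted(protos)
unit_labels, _ = kmeans([protos[u] for u in units], k=2)
label_of = dict(zip(units, unit_labels))
gene_labels = [label_of[bmu(protos, g)] for g in genes]
```

Each gene inherits the cluster label of its best-matching unit, and `component_plane(protos, 3, 3, j)` gives the per-sample map display for sample `j`. Clustering the handful of prototypes rather than thousands of genes is what makes the second level cheap.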
Tumor classification and marker gene prediction by feature selection and fuzzy c-means clustering using microarray data
BACKGROUND: Using DNA microarrays, we have developed two novel models for tumor classification and target gene prediction. First, gene expression profiles are summarized by optimally selected Self-Organizing Maps (SOMs), followed by tumor sample classification by Fuzzy C-means clustering. Then, marker genes are predicted by either manual feature selection (visualizing the weighted/mean SOM component plane) or automatic feature selection (by pair-wise Fisher's linear discriminant). RESULTS: The proposed models were tested on four published datasets: (1) leukemia, (2) colon cancer, (3) brain tumors and (4) NCI cancer cell lines. The models gave class predictions with markedly reduced error rates compared to other class prediction approaches, and the importance of feature selection in microarray data analysis was also emphasized. CONCLUSIONS: Our models identify marker genes with predictive potential, often better than other available methods in the literature. The models are potentially useful for medical diagnostics and may provide some insight into cancer classification. Additionally, we illustrated two limitations in tumor classification from microarray data related to the biology underlying the data: (1) the class size of the data and (2) the internal structure of classes. These limitations are not specific to the classification models used.
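Automatic feature selection by a pair-wise Fisher criterion could be sketched as follows: for each pair of classes, every gene is scored by (mean_a - mean_b)^2 / (var_a + var_b), and the top-ranked genes of each pair are pooled. This is one common reading of "pair-wise Fisher's linear discriminant"; the exact scoring formula, the `top_n` cut-off and the toy data are assumptions, and the SOM and Fuzzy C-means steps are not shown.

```python
from itertools import combinations
from statistics import mean, pvariance


def fisher_score(a, b):
    """Fisher criterion for one gene between two classes: (mu_a - mu_b)^2 / (var_a + var_b)."""
    denom = pvariance(a) + pvariance(b)
    return (mean(a) - mean(b)) ** 2 / denom if denom > 0 else float("inf")


def pairwise_fisher_select(expr, labels, top_n):
    """Pool the top_n genes by Fisher score for every pair of classes.

    expr:   dict gene -> list of expression values, one per sample
    labels: class label of each sample, in the same order as the values
    """
    classes = sorted(set(labels))
    selected = set()
    for c1, c2 in combinations(classes, 2):
        scores = {}
        for gene, values in expr.items():
            a = [v for v, lab in zip(values, labels) if lab == c1]
            b = [v for v, lab in zip(values, labels) if lab == c2]
            scores[gene] = fisher_score(a, b)
        ranked = sorted(scores, key=scores.get, reverse=True)
        selected.update(ranked[:top_n])
    return selected


# Hypothetical toy data: 6 samples in three classes, 3 genes.
labels = ["A", "A", "B", "B", "C", "C"]
expr = {
    "g1": [5.0, 5.2, 1.0, 1.1, 1.0, 0.9],  # high in class A
    "g2": [1.0, 1.1, 1.0, 0.9, 6.0, 6.1],  # high in class C
    "g3": [2.0, 3.0, 2.5, 2.0, 3.0, 2.5],  # uninformative
}
selected = pairwise_fisher_select(expr, labels, top_n=1)
```

Scoring pairs separately keeps genes that discriminate only one class from another, which a single multi-class score could dilute.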
M-CGH: Analysing microarray-based CGH experiments
BACKGROUND: Microarray-based comparative genomic hybridisation (array CGH) is a technique by which variation in relative copy numbers between two genomes can be analysed by competitive hybridisation to DNA microarrays. This technology has most commonly been used to detect chromosomal amplifications and deletions in cancer. Dedicated tools are needed to analyse the results of such experiments; these tools must provide appropriate visualisation and take into account the physical relation in the genome between the probes on the array. RESULTS: M-CGH is a MATLAB toolbox with a graphical user interface designed specifically for the analysis of array CGH experiments, with multiple approaches to ratio normalization. Specifically, the distributions of three classes of DNA copy numbers (gains, normal and losses) can be estimated using a maximum likelihood method. Amplicon boundaries are computed by either the fuzzy K-nearest neighbour method or a wavelet approach. The program also allows each genomic clone to be linked with the corresponding genomic information in the Ensembl database. CONCLUSIONS: M-CGH, which encompasses the basic tools needed for analysing array CGH experiments, is freely available for academics and does not require any other MATLAB toolbox.
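The abstract does not specify the form of the three class distributions. Assuming, for illustration only, that the log2 ratios of losses, normal copy number and gains each follow a Gaussian, a maximum likelihood fit can be sketched with expectation-maximisation (EM) for a three-component mixture. The initial means, the initial SD and the toy ratios are hypothetical, and M-CGH's own estimator may differ.

```python
import math


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def fit_three_class_mixture(ratios, means=(-0.5, 0.0, 0.5), sd=0.2, iters=200):
    """EM for a 1-D Gaussian mixture with components (loss, normal, gain)."""
    mus, sds, weights = list(means), [sd] * 3, [1.0 / 3] * 3
    for _ in range(iters):
        # E-step: posterior probability of each class for each clone.
        resp = []
        for x in ratios:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: maximum likelihood updates of weights, means and SDs.
        for k in range(3):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue  # leave an empty component unchanged
            weights[k] = nk / len(ratios)
            mus[k] = sum(r[k] * x for r, x in zip(resp, ratios)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, ratios)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsing component
    return weights, mus, sds


def classify(x, weights, mus, sds):
    """Most probable copy-number class for one log2 ratio."""
    names = ("loss", "normal", "gain")
    dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sds)]
    return names[max(range(3), key=dens.__getitem__)]


# Hypothetical log2 ratios for 20 clones.
ratios = [-0.9, -0.8, -0.75, -0.85, -0.7,
          -0.1, 0.05, 0.0, 0.1, -0.05, 0.02, -0.02, 0.08, -0.08, 0.0,
          0.8, 0.95, 0.9, 1.0, 0.85]
w, mu, sd = fit_three_class_mixture(ratios)
```

Each clone can then be labelled by its most probable class; amplicon-boundary detection (fuzzy K-nearest neighbour or wavelets) is a separate step not covered by this sketch.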
Cancer Predisposition Sequencing Reporter (CPSR): A flexible variant report engine for high-throughput germline screening in cancer
The value of high-throughput germline genetic testing is increasingly recognized in clinical cancer care. Disease-associated germline variants in cancer patients are important for risk management, surveillance and surgical decisions, and can also have major implications for treatment strategies, since many are in DNA repair genes. With the increasing availability of high-throughput DNA sequencing in cancer clinics and research, there is thus a need to provide clinically oriented sequencing reports for germline variants and their potential therapeutic relevance on a per-patient basis. To meet this need, we have developed the Cancer Predisposition Sequencing Reporter (CPSR), an open-source computational workflow that generates a structured report of germline variants identified in known cancer predisposition genes, highlighting markers of therapeutic, prognostic and diagnostic relevance. A fully automated variant classification procedure based on more than 30 refined American College of Medical Genetics and Genomics (ACMG) criteria is an integral part of the workflow. Importantly, the set of cancer predisposition genes profiled in the report can be flexibly chosen from more than 40 virtual gene panels established by scientific experts, enabling customization of the report for different screening purposes and clinical contexts. The report can also be configured to list actionable secondary variant findings, as recommended by ACMG. CPSR demonstrates comparable sensitivity and specificity for the detection of pathogenic variants when compared to other algorithms in the field. Technically, the tool is implemented in Python/R and is freely available through Docker technology. Source code, documentation, example reports and installation instructions are accessible via the project GitHub page: https://github.com/sigven/cpsr.
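CPSR's refined criteria and its exact classification logic are not described here. As a simplified illustration of how ACMG evidence codes are combined into a verdict, the sketch below implements only the pathogenic-side combining rules of the 2015 ACMG/AMP variant interpretation guideline; benign criteria, conflicting evidence and CPSR's refinements are deliberately left out, so this is not CPSR's procedure.

```python
from collections import Counter

# Code prefix -> evidence strength for pathogenic criteria (PVS1, PS1-4, PM1-6, PP1-5).
LEVELS = (("PVS", "very_strong"), ("PS", "strong"), ("PM", "moderate"), ("PP", "supporting"))


def combine_pathogenic(evidence):
    """Simplified ACMG/AMP 2015 combining rules, pathogenic side only.

    Benign criteria, conflicting evidence and tool-specific refinements are
    ignored; unrecognised codes are skipped.
    """
    n = Counter()
    for code in evidence:
        for prefix, level in LEVELS:
            if code.startswith(prefix):
                n[level] += 1
                break
    vs, s, m, p = n["very_strong"], n["strong"], n["moderate"], n["supporting"]
    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s >= 1 and (m >= 3 or (m >= 2 and p >= 2) or (m >= 1 and p >= 4)))
    )
    if pathogenic:
        return "Pathogenic"
    likely = (
        (vs >= 1 and m >= 1)
        or (s >= 1 and (m >= 1 or p >= 2))
        or m >= 3
        or (m >= 2 and p >= 2)
        or (m >= 1 and p >= 4)
    )
    return "Likely pathogenic" if likely else "Uncertain significance"
```

For example, `combine_pathogenic(["PVS1", "PM2"])` yields "Likely pathogenic", whereas adding a strong criterion such as PS3 raises it to "Pathogenic".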
Limitations of mRNA amplification from small-size cell samples
BACKGROUND: Global mRNA amplification has become a widely used approach to obtain gene expression profiles from limited material. An important concern is whether the results faithfully reflect the starting material. This is especially important with extremely low quantities of input RNA, where stochastic effects due to template dilution may be present. This aspect remains under-documented in the literature, as quantitative measures of data reliability are most often lacking. To address this issue, we examined the sensitivity levels of each transcript in three different cell sample sizes. ANOVA was used to estimate the overall effects of reduced input RNA in our experimental design. To estimate the validity of decreasing sample sizes, we applied a novel model-based method, TransCount. RESULTS: From expression data, TransCount provided estimates of absolute transcript concentrations in each examined sample. The results from TransCount were used to calculate the Pearson correlation coefficient between transcript concentrations for different sample sizes. The correlations were clearly dependent on transcript copy number. A critical level was observed where stochastic fluctuations became significant. The analysis allowed us to pinpoint the gene-specific number of transcript templates that defined the limit of reliability with respect to the number of cells from that particular source. In the sample amplified from 1000 cells, transcripts expressed at 121 transcripts/cell or more were statistically reliable, and for 250 cells the limit was 1806 transcripts/cell. Above these thresholds, the correlation between our data sets reached acceptable values for reliable interpretation. CONCLUSION: These results imply that the reliability of any amplification experiment must be validated empirically to confirm that a given gene is present in sufficient quantity in the input material. This finding has important implications for any experiment where only extremely small samples are available, such as in single-cell analyses or with laser capture microdissected cells.
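TransCount itself is not reproduced here. Assuming its output is available as per-transcript concentration estimates for two sample sizes, together with an estimated copy number per cell (hypothetical inputs), the reliability check reduces to a Pearson correlation restricted to transcripts at or above a copy-number threshold, such as the 121 transcripts/cell reported for the 1000-cell samples.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def reliable_correlation(conc_a, conc_b, copies_per_cell, threshold):
    """Correlate concentration estimates from two sample sizes, keeping only
    transcripts whose estimated copies per cell reach the threshold."""
    keep = [i for i, c in enumerate(copies_per_cell) if c >= threshold]
    if len(keep) < 2:
        raise ValueError("need at least two transcripts above the threshold")
    return pearson([conc_a[i] for i in keep], [conc_b[i] for i in keep])


# Hypothetical estimates for eight transcripts in two sample sizes.
conc_a = [1, 2, 3, 4, 10, 20, 30, 40]
conc_b = [4, 1, 3, 2, 11, 19, 31, 40]
copies = [5, 5, 5, 5, 200, 200, 200, 200]
r_high = reliable_correlation(conc_a, conc_b, copies, threshold=121)
r_low = pearson(conc_a[:4], conc_b[:4])
```

With these toy numbers the high-copy transcripts correlate closely (r above 0.99), while the four low-copy transcripts do not (r = -0.4), mirroring the copy-number dependence described above.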
Profound influence of microarray scanner characteristics on gene expression ratios: analysis and procedure for correction
BACKGROUND: High-throughput gene expression data from spotted cDNA microarrays are collected by scanning the signal intensities of the corresponding spots with dedicated fluorescence scanners. The major scanner settings for increasing spot intensities are the laser power and the voltage of the photomultiplier tube (PMT). The expression ratios should be independent of these settings. We have investigated the relationships between PMT voltage, spot intensities and expression ratios for different scanners in order to define an optimal scanning procedure. RESULTS: All scanners showed a limited intensity range, from 200 to 50 000 (mean spot intensity), in which the expression ratios were independent of PMT voltage. This usable intensity range was considerably narrower than the maximum detection range of the PMTs. The use of spot and background intensities outside this range led to errors in the ratios. The errors at high intensities were caused by saturation of pixel intensities within the spots. An algorithm was developed to correct the intensities of these spots and hence extend the upper limit of the usable intensity range. CONCLUSIONS: We suggest increasing the PMT voltage so that the intensities of the weakest spots do not fall below the usable range, even if the brightest spots then reach saturation. Subsequently, a second set of images should be acquired with a lower PMT setting such that no pixels are saturated. Reliable data for spots with saturation in the first set of images can easily be extracted from the second set of images by the use of our algorithm. This procedure would increase both the accuracy of the data and the number of data points obtained in each experiment compared to traditional procedures.
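The correction algorithm is not detailed in the abstract. A minimal sketch of the two-scan procedure, assuming a linear (through-the-origin) relation between the high- and low-PMT scans and per-spot saturation flags supplied by the image analysis software, could look like this; the usable range defaults to the 200 to 50 000 interval reported above, and the function name and toy intensities are hypothetical.

```python
def merge_scans(high, low, saturated, usable=(200, 50_000)):
    """Combine a high-PMT scan with a low-PMT scan of the same array.

    high, low:  mean spot intensities from the high- and low-PMT scans
    saturated:  True for spots containing saturated pixels in the high-PMT scan
    usable:     intensity range in which ratios are independent of PMT voltage
    """
    lo_lim, hi_lim = usable
    # Fit the scale factor only on unsaturated spots inside the usable range in both scans.
    pairs = [(h, lw) for h, lw, sat in zip(high, low, saturated)
             if not sat and lo_lim <= h <= hi_lim and lo_lim <= lw <= hi_lim]
    if not pairs:
        raise ValueError("no spots fall inside the usable range of both scans")
    # Least-squares slope through the origin: high ~ slope * low.
    slope = sum(h * lw for h, lw in pairs) / sum(lw * lw for _, lw in pairs)
    # Replace saturated spots with the rescaled low-PMT intensity.
    return [slope * lw if sat else h for h, lw, sat in zip(high, low, saturated)]


corrected = merge_scans(
    high=[1000, 2000, 40000, 64000],
    low=[500, 1000, 20000, 45000],
    saturated=[False, False, False, True],
)
```

Here the unsaturated spots imply a factor of 2 between the scans, so the saturated spot is reported as 90 000 rather than its clipped value. A practical version would likely also handle background subtraction and each dye channel separately.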
DNA copy number changes in high-grade malignant peripheral nerve sheath tumors by array CGH
BACKGROUND: Malignant peripheral nerve sheath tumors (MPNSTs) are rare and highly aggressive soft tissue tumors showing complex chromosomal aberrations. In order to identify recurrent chromosomal regions of gain and loss, and thereby novel gene targets of potential importance for MPNST development and/or progression, we have analyzed DNA copy number changes in seven high-grade MPNSTs using microarray-based comparative genomic hybridization (array CGH). RESULTS: Considerably more gains than losses were observed, and the most frequent minimal recurrent regions of gain included 1q24.1-q24.2, 1q24.3-q25.1, 8p23.1-p12, 9q34.11-q34.13 and 17q23.2-q25.3, all gained in five of seven samples. The 17q23.2-q25.3 region was gained in all five patients with poor outcome and not in the two patients with disease-free survival. cDNA microarray analysis and quantitative real-time reverse transcription PCR were used to investigate the expression of genes located within these regions. The gene lysyl oxidase-like 2 (LOXL2) was identified as a candidate target for the 8p23.1-p12 gain. Within 17q, the genes topoisomerase II-α (TOP2A), ets variant gene 4 (E1A enhancer binding protein, E1AF) (ETV4) and baculoviral IAP repeat-containing 5 (survivin) (BIRC5) showed increased expression in all samples compared to two benign tumors. Increased expression of these genes has previously been associated with poor survival in other malignancies and, for TOP2A, in MPNSTs as well. In addition, we analyzed the expression of five microRNAs located within the 17q23.2-q25.3 region, but none of them showed high expression levels compared to the benign tumors. CONCLUSION: Our study shows the potential of using DNA copy number changes obtained by array CGH to predict the prognosis of MPNST patients. Although no clear correlations between the expression level and patient outcome were observed, the genes TOP2A, ETV4 and BIRC5 are interesting candidate targets for the 17q gain associated with poor survival.
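One simple way to find regions gained in at least k of n tumors (for example, five of seven) is to count, along a chromosome, how many samples carry a gain at each position and report the intervals where that count reaches k. The sketch below assumes gains have already been called per sample as half-open intervals on a single chromosome; the coordinates are hypothetical, and the study's own definition of minimal recurrent regions may differ.

```python
from itertools import groupby


def merge_intervals(segments):
    """Merge overlapping half-open [start, end) intervals from one sample."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def recurrent_regions(samples, min_samples):
    """Intervals gained in at least min_samples samples on one chromosome.

    samples: one list of (start, end) gained intervals per tumor.
    """
    events = []
    for segments in samples:
        # Merge first so overlapping calls within one tumor count only once.
        for start, end in merge_intervals(segments):
            events.append((start, +1))  # a gain begins
            events.append((end, -1))    # a gain ends (end itself is not covered)
    events.sort(key=lambda e: e[0])
    regions, depth, open_at = [], 0, None
    # Apply all events at the same position together, then test the sample count.
    for pos, group in groupby(events, key=lambda e: e[0]):
        depth += sum(delta for _, delta in group)
        if depth >= min_samples and open_at is None:
            open_at = pos
        elif depth < min_samples and open_at is not None:
            regions.append((open_at, pos))
            open_at = None
    return regions


# Hypothetical gain calls (Mb) for seven tumors on one chromosome.
samples = [
    [(10, 50)],
    [(20, 60)],
    [(15, 45), (40, 55)],  # overlapping calls in one tumor
    [(30, 70)],
    [(25, 40)],
    [],
    [(80, 90)],
]
regions_5_of_7 = recurrent_regions(samples, min_samples=5)
```

Raising `min_samples` narrows the reported interval towards the region shared by the most tumors, which is the sense in which such regions are "minimal".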
Chromosomal instability and a deregulated cell cycle are intrinsic features of high-risk gastrointestinal stromal tumours with a metastatic potential
Patients with localised, high-risk gastrointestinal stromal tumours (GIST) benefit from adjuvant imatinib treatment. Still, approximately 40% of patients relapse within 3 years after adjuvant therapy, and the clinical and histopathological features currently used for risk classification cannot precisely predict poor outcomes after standard treatment. This study aimed to identify genomic and transcriptomic profiles associated with disease relapse and thus a more aggressive phenotype. Using a multi-omics approach, we analysed a cohort of primary tumours from patients with untreated, resectable high-risk GISTs. We compared patients who developed metastatic disease within 3 years after finishing adjuvant imatinib treatment with patients without disease relapse after more than 5 years of follow-up. Combining genomics and transcriptomics data, we identified somatic mutations and deregulated mRNA and miRNA genes intrinsic to each group. Our study shows that increased chromosomal instability (CIN), including chromothripsis, and deregulated kinetochore and cell cycle signalling separate high-risk samples according to metastatic potential. Increased CIN appears to be an intrinsic feature of tumours that metastasise and should be further validated as a novel prognostic biomarker for high-risk GIST.
Clinical and molecular implications of NAB2-STAT6 fusion variants in solitary fibrous tumour
Solitary fibrous tumour (SFT) is a mesenchymal neoplasm characterised by pathognomonic NAB2-STAT6 gene fusions. The clinical implications and prognostic value of different fusion variants have not been clarified. In the current study, we explore the clinicopathological, prognostic and molecular differences between tumours with different fusions. Thirty-nine patients with localised, extrameningeal SFT were included, of whom 20 developed distant recurrence and 19 were without recurrence after long-term follow-up. Capture-based RNA sequencing identified 12 breakpoint variants, which were categorised into two groups based on the STAT6 domain composition in the predicted chimeric proteins. Twenty-one of 34 sequenced tumours (62%) had fusions with most of the STAT6 domains intact and were classified as STAT6-Full. Thirteen tumours (38%) contained only the transactivation domain of STAT6 and were classified as STAT6-TAD. Tumours with STAT6-TAD fusions had a higher mitotic count (p=0.016) and were associated with an inferior recurrence-free interval (p=0.004) and overall survival (p=0.012). Estimated 10-year recurrence-free survival was 25% for patients with STAT6-TAD tumours compared to 78% for the STAT6-Full group. Distinct transcriptional signatures between the fusion groups were identified, including higher expression of FGF2 in the STAT6-TAD group and of IGF2, EGR2, PDGFRB, STAT6 and several extracellular matrix genes in STAT6-Full tumours. In summary, we demonstrate that NAB2-STAT6 fusion variants are associated with distinct clinicopathological and molecular characteristics and have prognostic significance in extrameningeal SFT.
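The abstract does not state how the 10-year recurrence-free survival was estimated; figures of this kind are typically Kaplan-Meier estimates. The sketch below implements the Kaplan-Meier product-limit estimator on hypothetical follow-up data (times in years, 1 = recurrence observed, 0 = censored); it is illustrative only and is not the study's analysis.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. recurrence) was observed at that time, 0 if censored
    Returns (time, survival probability) steps at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        observed = censored = 0
        # Count events and censorings tied at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                observed += 1
            else:
                censored += 1
            i += 1
        if observed:
            surv *= 1.0 - observed / at_risk
            curve.append((t, surv))
        at_risk -= observed + censored  # all of them leave the risk set after t
    return curve


def survival_at(curve, t):
    """Step-function lookup of the estimated survival probability at time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical cohort of six patients.
curve = kaplan_meier([2, 3, 3, 5, 8, 12], [1, 1, 0, 1, 0, 0])
rfs_10y = survival_at(curve, 10)
```

Censored patients contribute to the risk set until they leave follow-up, which is why the estimate at 10 years here is 4/9 rather than the naive 3/6.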
Effects of mRNA amplification on gene expression ratios in cDNA experiments estimated by analysis of variance
BACKGROUND: A limiting factor of cDNA microarray technology is the need for a substantial amount of RNA per labeling reaction: typically, 20–200 µg of total RNA or 0.5–2 µg of poly(A) RNA is required for monitoring gene expression. In addition, gene expression profiles from large, heterogeneous cell populations provide complex patterns from which biological data for the target cells may be difficult to extract. In this study, we chose to investigate a widely used mRNA amplification protocol that allows gene expression studies to be performed on samples with limited starting material. We present a quantitative study of the variation and noise present in our data set obtained from experiments with either amplified or non-amplified material. RESULTS: Using analysis of variance (ANOVA) and multiple hypothesis testing, we estimated the impact of amplification on the preservation of gene expression ratios. Both methods showed that the gene expression ratios were not completely preserved between amplified and non-amplified material. We also compared, for each gene, the expression ratio between the two cell lines in the amplified material with the corresponding ratio in the non-amplified material. Using multiple t-tests with a false discovery rate of 5%, we found that 10% of the genes investigated showed significantly different expression ratios. CONCLUSION: Although the ratios were not fully preserved, amplification may prove to be extremely useful for characterizing low-expressing genes.
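The abstract reports a 5% false discovery rate without naming the procedure. Assuming the widely used Benjamini-Hochberg step-up procedure, the selection of genes with significantly different expression ratios from per-gene t-test p-values could be sketched as follows; the t-tests themselves are not shown, and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    # Step-up: find the largest rank k with p_(k) <= (k / m) * fdr.
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * fdr:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff])


# Hypothetical per-gene p-values from t-tests comparing expression ratios.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(pvals, fdr=0.05)
```

Because the procedure is step-up, a gene whose own p-value misses its rank threshold can still be called significant when a higher-ranked gene passes; with `[0.02, 0.03, 0.04, 0.045]` all four are rejected at 5%.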
- …