222 research outputs found

    Non-Negative Matrix Factorization for the Analysis of Complex Gene Expression Data: Identification of Clinically Relevant Tumor Subtypes

    Non-negative matrix factorization (NMF) is a relatively new approach to analyzing gene expression data that models the data as additive combinations of non-negative basis vectors (metagenes). The non-negativity constraint makes sense biologically, as genes may either be expressed or not, but never show negative expression. We applied NMF to five different microarray data sets. We estimated the appropriate number of metagenes by comparing the residual error of the NMF reconstruction of the data to that of the NMF reconstruction of permuted data, thus finding when a given solution contained more information than noise. This analysis also revealed that NMF could not factorize one of the data sets in a meaningful way. We used GO categories and predefined gene sets to evaluate the biological significance of the obtained metagenes. By analyzing metagenes specific for the same GO categories, we could show that individual metagenes activated different aspects of the same biological processes. Several of the obtained metagenes correlated with tumor subtypes and with tumors carrying characteristic chromosomal translocations, indicating that metagenes may correspond to specific disease entities. Hence, NMF extracts biologically relevant structures from microarray expression data and may thus contribute to a deeper understanding of tumor behavior.
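
The comparison of reconstruction error on real versus permuted data, as described in this abstract, might look roughly like the sketch below. It is an illustration only: the multiplicative update rule, the row-wise permutation scheme, and the names `nmf_residual` and `permute_rows` are assumptions, since the abstract does not say which NMF algorithm or permutation was used. The toy matrix is made up.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf_residual(v, k, n_iter=200, seed=0, eps=1e-12):
    """Factorize v (genes x samples) as w @ h with k metagenes using
    multiplicative updates; return the squared reconstruction error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(n) for j in range(m))

def permute_rows(v, seed=0):
    """Assumed permutation scheme: shuffle each gene's values across samples."""
    rng = random.Random(seed)
    return [rng.sample(row, len(row)) for row in v]

# Toy data with one underlying metagene (an exact rank-1 matrix).
toy = [[a * b for b in (1, 2, 3, 4, 5)] for a in (1, 2, 3, 4, 5, 6)]
shuffled = permute_rows(toy)
for k in (1, 2, 3):
    print(k, nmf_residual(toy, k), nmf_residual(shuffled, k))
```

A value of k is informative, in this reading, while the residual on the real data stays clearly below the residual on the permuted data.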

    On Molecular Classification of Bladder Cancer: Out of One, Many.

    Comparative analysis showed that bladder cancer classification systems identify overlapping subtypes but at different levels. Muscle-invasive bladder cancer shows remarkable heterogeneity, and six subtypes were identified that differ in transcriptional networks, marker profiles, and expression of actionable targets

    Genome-wide transcription factor binding site/promoter databases for the analysis of gene sets and co-occurrence of transcription factor binding motifs

    BACKGROUND: The use of global gene expression profiling is a well-established approach to understanding biological processes. One of the major goals of these investigations is to identify sets of genes with similar expression patterns. Such gene signatures may be very informative and reveal new aspects of particular biological processes. A logical and systematic next step is to reduce the identified gene signatures to the regulatory components that induce the relevant gene expression changes. A central issue in this context is to identify transcription factors, or transcription factor binding sites (TFBS), likely to be of importance for the expression of the gene signatures. RESULTS: We develop a strategy that efficiently produces TFBS/promoter databases based on user-defined criteria. The resulting databases contain all genes in the Santa Cruz database and the positions of all TFBS provided by the user as position weight matrices. These databases are then used for two purposes: to identify significant TFBS in the promoters of sets of genes, and to identify clusters of co-occurring TFBS. We use two criteria for significance: significantly enriched TFBS, in terms of the total number of binding sites in the promoters, and significantly present TFBS, in terms of the fraction of promoters with binding sites. Significant TFBS are identified by a re-sampling procedure in which the query gene set is compared with typically 10^5 gene lists of similar size randomly drawn from the TFBS/promoter database. We apply this strategy to a large number of published ChIP-Chip data sets and show that the proposed approach faithfully reproduces ChIP-Chip results. The strategy also identifies relevant TFBS when analyzing gene signatures obtained from the MSigDB database. In addition, we show that several TFBS are highly correlated and that co-occurring TFBS define functionally related sets of genes. CONCLUSIONS: The presented approach to promoter analysis faithfully reproduces the results from several ChIP-Chip and MSigDB-derived gene sets and hence may prove to be an important method in the analysis of gene signatures obtained through ChIP-Chip or global gene expression experiments. We show that TFBS are organized in clusters of co-occurring TFBS that together define highly coherent sets of genes.
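
The re-sampling procedure and the two significance criteria in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption: the `site_counts` dictionary is a hypothetical stand-in for a TFBS/promoter database, the function name is invented, the toy counts are made up, and the add-one correction is a common convention rather than something the abstract states.

```python
import random

def resampling_p_values(site_counts, query_genes, n_resamples=2000, seed=0):
    """Empirical p-values for one TFBS in a query gene set.

    site_counts: dict gene -> number of binding sites in its promoter
                 (hypothetical stand-in for a TFBS/promoter database).
    Returns (p_enriched, p_present):
      p_enriched is based on the total number of sites in the set,
      p_present on the fraction of promoters with at least one site.
    """
    rng = random.Random(seed)
    genes = list(site_counts)
    k = len(query_genes)

    def total_sites(gene_set):
        return sum(site_counts[g] for g in gene_set)

    def fraction_present(gene_set):
        return sum(1 for g in gene_set if site_counts[g] > 0) / len(gene_set)

    obs_total = total_sites(query_genes)
    obs_frac = fraction_present(query_genes)

    hits_total = hits_frac = 0
    for _ in range(n_resamples):
        # Random gene list of the same size as the query set.
        sample = rng.sample(genes, k)
        hits_total += total_sites(sample) >= obs_total
        hits_frac += fraction_present(sample) >= obs_frac

    # Add-one correction (an assumption) avoids reporting p = 0.
    return (hits_total + 1) / (n_resamples + 1), (hits_frac + 1) / (n_resamples + 1)

# Toy database: 200 genes, of which 10 carry three sites each.
toy_db = {f"gene{i}": (3 if i < 10 else 0) for i in range(200)}
target_set = [f"gene{i}" for i in range(10)]
print(resampling_p_values(toy_db, target_set))
```

The abstract mentions typically 10^5 random gene lists; the sketch defaults to far fewer only to keep the run short.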

    Referring physicians underestimate the extent of abnormalities in final reports from myocardial perfusion imaging

    BACKGROUND: It is important that referring physicians and other treating clinicians properly understand the final reports from diagnostic tests. The aim of the study was to investigate whether referring physicians interpret a final report for a myocardial perfusion scintigraphy (MPS) test in the same way that the reading nuclear medicine physician intended. METHODS: After viewing final reports containing only typical clinical verbiage and images, physicians in nuclear medicine and referring physicians (physicians in cardiology, internal medicine, and general practitioners) independently classified 60 MPS tests for the presence versus absence of ischemia/infarction according to objective grades of 1–5 (1 = No ischemia/infarction, 2 = Probably no ischemia/infarction, 3 = Equivocal, 4 = Probable ischemia/infarction, and 5 = Certain ischemia/infarction). When ischemia and/or infarction were thought to be present in the left ventricle, all physicians were also asked to mark the involved segments based on the 17-segment model. RESULTS: There was good diagnostic agreement between physicians in nuclear medicine and referring physicians when assessing the general presence versus absence of both ischemia and infarction (median squared kappa coefficient of 0.92 for both). However, when using the 17-segment model, compared to the physicians in nuclear medicine, 12 of 23 referring physicians underestimated the extent of the ischemic area, while 6 underestimated and 1 overestimated the extent of the infarcted area. CONCLUSIONS: Whereas referring physicians gain a good understanding of the general presence versus absence of ischemia and infarction from MPS test reports, they often underestimate the extent of any ischemic or infarcted areas. This may have adverse clinical consequences, and thus the language in final reports from MPS tests might be further improved and standardized.

    Independent component analysis reveals new and biologically significant structures in micro array data

    BACKGROUND: An alternative to standard approaches to uncovering biologically meaningful structures in microarray data is to treat the data as a blind source separation (BSS) problem. BSS attempts to separate a mixture of signals into their different sources and refers to the problem of recovering signals from several observed linear mixtures. In the context of microarray data, "sources" may correspond to specific cellular responses or to co-regulated genes. RESULTS: We applied independent component analysis (ICA) to three different microarray data sets: two tumor data sets and one time series experiment. To obtain reliable components we used iterated ICA to estimate component centrotypes. We found that many of the low-ranking components may indeed show strong biological coherence and hence be of biological significance. Generally, ICA achieved a higher resolution than results based on correlated expression and yielded a larger number of gene clusters significantly enriched for gene ontology (GO) categories. In addition, components characteristic of molecular subtypes and of tumors with specific chromosomal translocations were identified. ICA also identified more than one gene cluster significant for the same GO categories and hence disclosed a higher level of biological heterogeneity, even within coherent groups of genes. CONCLUSION: Although the ICA approach primarily detects hidden variables, these surfaced as highly correlated genes in the time series data and in one instance in the tumor data. This further strengthens the biological relevance of latent variables detected by ICA.

    Distinct evolutionary mechanisms for genomic imbalances in high-risk and low-risk neuroblastomas

    BACKGROUND: Neuroblastoma (NB) is the most common extracranial solid tumour of childhood. Several genomic imbalances correlate to prognosis in NB, with structural rearrangements, including gene amplification, in a near-diploid setting typically signifying high-risk tumours and numerical changes in a near-triploid setting signifying low-risk tumours. Little is known about the temporal sequence in which these imbalances occur during the carcinogenic process. METHODS: We have reconstructed the appearance of cytogenetic imbalances in 270 NBs by first grouping tumours and imbalances through principal component analysis and then using the number of imbalances in each tumour as an indicator of evolutionary progression. RESULTS: Tumours clustered in four sub-groups, dominated respectively by (1) gene amplification in double minute chromosomes and few other aberrations, (2) gene amplification and loss of 1p sequences, (3) loss of 1p and other structural aberrations including gain of 17q, and (4) whole-chromosome gains and losses. Temporal analysis showed that the structural changes in groups 1–3 were acquired in a step-wise fashion, with loss of 1p sequences and the emergence of double minute chromosomes as the earliest cytogenetic events. In contrast, the gains and losses of whole chromosomes in group 4 occurred through multiple simultaneous events leading to a near-triploid chromosome number. CONCLUSION: The finding of different temporal patterns for the acquisition of genomic imbalances in high-risk and low-risk NBs lends strong support to the hypothesis that these tumours are biologically diverse entities, evolving through distinct genetic mechanisms.

    Robust assignment of cancer subtypes from expression data using a uni-variate gene expression average as classifier

    BACKGROUND: Genome-wide gene expression data is a rich source for the identification of gene signatures suitable for clinical purposes, and a number of statistical algorithms have been described for both identification and evaluation of such signatures. Some of the algorithms employed are fairly complex and hence sensitive to over-fitting, whereas others are simpler and more straightforward. Here we present a new type of simple algorithm based on ROC analysis and the use of metagenes that we believe will be a good complement to existing algorithms. RESULTS: The basis for the proposed approach is the use of metagenes, instead of collections of individual genes, and a feature selection using AUC values obtained by ROC analysis. Each gene in a data set is assigned an AUC value relative to the tumor class under investigation and the genes are ranked according to these values. Metagenes are then formed by calculating the mean expression level for an increasing number of ranked genes, and the metagene expression value that optimally discriminates tumor classes in the training set is used for classification of new samples. The performance of the metagene is then evaluated using LOOCV and balanced accuracies. CONCLUSIONS: We show that the simple uni-variate gene expression average algorithm performs as well as several alternative algorithms such as discriminant analysis and more complex approaches such as SVM and neural networks. The R package rocc is freely available at http://cran.r-project.org/web/packages/rocc/index.html.
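
The steps in this abstract (per-gene AUC, ranking, averaging the top-ranked genes into a metagene, choosing a cut-off, LOOCV with balanced accuracy) can be sketched as below. This is not the rocc package and does not reproduce its API; the function names, the tie-breaking, the choice to rank only by descending AUC, and the toy expression values are all assumptions made for illustration.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def balanced_accuracy(pred, labels):
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, labels))
    n_pos = sum(labels)
    return 0.5 * (tp / n_pos + tn / (len(labels) - n_pos))

def best_threshold(meta, labels):
    """Midpoint cut-off that maximizes balanced accuracy on the training set."""
    values = sorted(set(meta))
    cuts = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    return max(cuts, key=lambda c: balanced_accuracy([int(m > c) for m in meta], labels))

def fit_metagene(expr, labels):
    """expr: dict gene -> list of expression values (one per sample)."""
    ranked = sorted(expr, key=lambda g: auc(expr[g], labels), reverse=True)
    best = None
    for n in range(1, len(ranked) + 1):
        top = ranked[:n]
        # Metagene = mean expression of the n top-ranked genes.
        meta = [sum(expr[g][i] for g in top) / n for i in range(len(labels))]
        score = auc(meta, labels)
        if best is None or score > best[0]:
            best = (score, top, meta)
    _, genes, meta = best
    return genes, best_threshold(meta, labels)

def classify(sample, genes, threshold):
    """sample: dict gene -> expression value for one new sample."""
    return int(sum(sample[g] for g in genes) / len(genes) > threshold)

def loocv_balanced_accuracy(expr, labels):
    preds = []
    for i in range(len(labels)):
        keep = [j for j in range(len(labels)) if j != i]
        train = {g: [v[j] for j in keep] for g, v in expr.items()}
        genes, thr = fit_metagene(train, [labels[j] for j in keep])
        preds.append(classify({g: v[i] for g, v in expr.items()}, genes, thr))
    return balanced_accuracy(preds, labels)

# Made-up toy data: two informative genes and one flat gene.
toy_expr = {"g1": [1, 2, 1, 5, 6, 5], "g2": [2, 1, 2, 6, 5, 7], "g3": [3, 3, 3, 3, 3, 3]}
toy_labels = [0, 0, 0, 1, 1, 1]
print(fit_metagene(toy_expr, toy_labels), loocv_balanced_accuracy(toy_expr, toy_labels))
```

Refitting the gene ranking inside each cross-validation fold, as done here, keeps the held-out sample from influencing feature selection.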

    CODUSA - Customize Optimal Donor Using Simulated Annealing In Heart Transplantation.

    In heart transplantation, selection of an optimal recipient-donor match has been constrained by the lack of individualized prediction models. Here we developed a customized donor-matching model (CODUSA) for patients requiring heart transplantation by combining simulated annealing and artificial neural networks. Using this approach to analyze 59,698 adult heart transplant patients, we found that donor age matching was the variable most strongly associated with long-term survival. Female hearts were given to 21% of the women and 0% of the men, and recipients with blood group B received an identical matched blood group in only 18% of best-case matches compared with 73% for the original match. By optimizing the donor profile, survival could be improved by 33 months. These findings strongly suggest that the CODUSA model can improve the ability to select an optimal match and avoid a worst-case match in the clinical setting. This is an important step towards personalized medicine.
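
As a rough illustration of the simulated annealing component only, the sketch below searches over a single donor variable. It is not the CODUSA model: the `toy_predicted_survival` function is a made-up placeholder for the neural network predictor, the recipient age of 40 and all numeric constants are invented, and the geometric cooling schedule is an assumption.

```python
import math
import random

def simulated_annealing(initial, neighbour, score, n_steps=5000,
                        t_start=1.0, t_end=1e-3, seed=0):
    """Maximize score(state) by simulated annealing with geometric cooling."""
    rng = random.Random(seed)
    current, current_score = initial, score(initial)
    best, best_score = current, current_score
    for step in range(n_steps):
        # Temperature falls geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        candidate = neighbour(current, rng)
        delta = score(candidate) - current_score
        # Always accept improvements; accept worse states with
        # probability exp(delta / t), which shrinks as t cools.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, current_score + delta
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score

RECIPIENT_AGE = 40  # hypothetical recipient

def toy_predicted_survival(donor_age):
    """Made-up placeholder for a trained survival predictor (months)."""
    return 120.0 - 0.5 * abs(donor_age - RECIPIENT_AGE)

def neighbour_age(donor_age, rng):
    """Move the donor age by one year, clamped to an assumed 18-70 range."""
    return min(70, max(18, donor_age + rng.choice((-1, 1))))

best_age, best_months = simulated_annealing(65, neighbour_age, toy_predicted_survival)
print(best_age, best_months)
```

In a realistic setting the state would hold several donor variables (for example age, sex, and blood group) and the score would come from a fitted model rather than a closed-form function.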

    Clinical and genetic studies of ETV6/ABL1-positive chronic myeloid leukaemia in blast crisis treated with imatinib mesylate.

    Most chronic myeloid leukaemia (CML) patients are genetically characterized by the t(9;22)(q34;q11), generating the BCR/ABL1 fusion gene. However, a few CML patients with rearrangements of 9q34 and 12p13, leading to ETV6/ABL1 chimaeras, have also been reported. Here we describe the clinical and genetic response to imatinib mesylate treatment of an ETV6/ABL1-positive CML patient diagnosed in blast crisis (BC). A chronic phase was achieved after acute myeloid leukaemia induction therapy. Then, treatment with imatinib mesylate (600 mg/d) was initiated and the effect was assessed clinically as well as genetically, including by repeated interphase fluorescence in situ hybridization studies. Until d 71 of imatinib mesylate therapy, stable improvements in the clinical and laboratory features were noted, and the frequency of ABL1-rearranged peripheral blood cells decreased from 56% to 11%. At d 92, an additional t(12;13)(p12;q13), with the 12p breakpoint proximal to ETV6, was found. The patient relapsed into BC 126 d after the start of the imatinib mesylate treatment and succumbed to the disease shortly afterwards. No mutations in the tyrosine kinase domain of ABL1 of the ETV6/ABL1 fusion were identified in the second BC. However, whereas the ETV6/ABL1 expression was seemingly the same at diagnosis and at second BC, the expression of ETV6 was markedly lower at the second BC. This decreased expression of wild-type ETV6 may have been a contributory factor for the relapse

    A pooled analysis of karyotypic patterns, breakpoints and imbalances in 783 cytogenetically abnormal multiple myelomas reveals frequently involved chromosome segments as well as significant age- and sex-related differences.

    The cytogenetic features (ploidy, complexity, breakpoints, imbalances) were ascertained in 783 abnormal multiple myeloma (MM) cases to identify frequently involved chromosomal regions as well as a possible impact of age/sex. The series included MM patients from the Mitelman Database of Chromosome Aberrations in Cancer and from our own laboratory. Hyperdiploidy was most common, followed by hypodiploidy, pseudodiploidy and tri-/tetraploidy. Most cases were complex, with a median of eight changes per patient. The distribution of modal numbers differed between younger and older patients, but was not related to sex. No sex- or age-related differences regarding the number of anomalies were found. The most frequent genomic breakpoints were 14q32, 11q13, 1q10, 8q24, 1p11, 1q21, 22q11, 1p13, 1q11, 19q13, 1p22, 6q21 and 17p11. Breaks in 1p13, 6q21 and 11q13 were more common in the younger age group. The most frequent imbalances were +9, -13, +15, +19, +11 and -Y. Trisomy 11 and monosomy 16 were more common among men, while -X was more frequent among women. Loss of Y as the sole change and +5 were more common in elderly patients, and -14 was more frequent in the younger age group. The present findings strongly suggest that some karyotypic features of MM are influenced by endogenous and/or exogenous factors.