6 research outputs found

    Tumor classification and marker gene prediction by feature selection and fuzzy c-means clustering using microarray data

    BACKGROUND: Using DNA microarrays, we have developed two novel models for tumor classification and target gene prediction. First, gene expression profiles are summarized by optimally selected Self-Organizing Maps (SOMs), followed by tumor sample classification by Fuzzy C-means clustering. Then, the prediction of marker genes is accomplished by either manual feature selection (visualizing the weighted/mean SOM component plane) or automatic feature selection (by pair-wise Fisher's linear discriminant). RESULTS: The proposed models were tested on four published datasets: (1) leukemia, (2) colon cancer, (3) brain tumors, and (4) NCI cancer cell lines. The models gave class prediction with markedly reduced error rates compared to other class prediction approaches, and the importance of feature selection in microarray data analysis was also emphasized. CONCLUSIONS: Our models identify marker genes with predictive potential, often better than other available methods in the literature. The models are potentially useful for medical diagnostics and may reveal some insights into cancer classification. Additionally, we illustrated two limitations in tumor classification from microarray data related to the biology underlying the data, in terms of (1) the class size of data, and (2) the internal structure of classes. These limitations are not specific to the classification models used.
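
    The fuzzy c-means step named in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the SOM summarization and feature-selection stages are omitted, the function name `fuzzy_c_means` and its parameters (`m`, `n_iter`, `tol`, `seed`) are hypothetical, and Euclidean distance with the common fuzzifier m = 2 is an assumption.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means sketch (hypothetical helper, not the paper's code).

    X has shape (n_samples, n_features). Returns (centers, U), where
    U[i, k] is the degree to which sample i belongs to cluster k.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]

    # Start from random memberships; each row is normalized to sum to 1.
    U = rng.random((n_samples, n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Cluster centers are membership-weighted means of the samples.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]

        # Euclidean distance from every sample to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # avoid division by zero

        # Standard update: u_ik proportional to d_ik ** (-2 / (m - 1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    # centers come from the last center update in the loop.
    return centers, U
```

    A hard class assignment for each sample can be read off with `U.argmax(axis=1)`, while the membership values themselves indicate how ambiguous a sample is between classes.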

    Genes of cell-cell interactions, chemotherapy detoxification and apoptosis are induced during chemotherapy of acute myeloid leukemia

    BACKGROUND: The molecular changes in vivo in acute myeloid leukemia cells early after start of conventional genotoxic chemotherapy are incompletely understood, and it is not known if early molecular modulations reflect clinical response. METHODS: The gene expression was examined by whole genome 44 k oligo microarrays and 12 k cDNA microarrays in peripheral blood leukocytes collected from seven leukemia patients before treatment, 2–4 h and 18–24 h after start of chemotherapy and validated by real-time quantitative PCR. Statistically significantly upregulated genes were classified using gene ontology (GO) terms. Parallel samples were examined by flow cytometry for apoptosis by annexin V-binding and the expression of selected proteins was confirmed by immunoblotting. RESULTS: Significant differential modulation of 151 genes was found at 4 h after start of induction therapy with cytarabine and anthracycline, including significant overexpression of 31 genes associated with p53 regulation. Within 4 h of chemotherapy the BCL2/BAX and BCL2/PUMA ratios were attenuated in proapoptotic direction. FLT3 mutations indicated that non-responders (5/7 patients, 8 versus 49 months survival) are characterized by a unique gene response profile before and at 4 h. At 18–24 h after chemotherapy, the gene expression of p53 target genes was attenuated, while genes involved in chemoresistance, cytarabine detoxification, chemokine networks and T cell receptor were prominent. No signs of apoptosis were observed in the collected cells, suggesting the treated patients as a physiological source of pre-apoptotic cells. CONCLUSION: Pre-apoptotic gene expression can be monitored within hours after start of chemotherapy in patients with acute myeloid leukemia, and may be useful in future determination of therapy responders. The low number of patients and the heterogeneity of acute myeloid leukemia limited the identification of gene expression predictive of therapy response. Therapy-induced gene expression reflects the complex biological processes involved in clinical cancer cell eradication and should be explored for future enhancement of therapy.

    LSimpute: accurate estimation of missing values in microarray data with least squares methods

    Microarray experiments generate data sets with information on the expression levels of thousands of genes in a set of biological samples. Unfortunately, such experiments often produce multiple missing expression values, normally due to various experimental problems. As many algorithms for gene expression analysis require a complete data matrix as input, the missing values have to be estimated in order to analyze the available data. Alternatively, genes and arrays can be removed until no missing values remain. However, for genes or arrays with only a small number of missing values, it is desirable to impute those values. For the subsequent analysis to be as informative as possible, it is essential that the estimates for the missing gene expression values are accurate. A small number of badly estimated missing values in the data might be enough for clustering methods, such as hierarchical clustering or K-means clustering, to produce misleading results. Thus, accurate methods for missing value estimation are needed. We present novel methods for estimation of missing values in microarray data sets that are based on the least squares principle, and that utilize correlations between both genes and arrays. For this set of methods, we use the common reference name LSimpute. We compare the estimation accuracy of our methods with the widely used KNNimpute on three complete data matrices from public data sets by randomly knocking out data (labeling as missing). From these tests, we conclude that our LSimpute methods produce estimates that are consistently more accurate than those obtained using KNNimpute. Additionally, we examine a more classic approach to missing value estimation based on expectation maximization (EM). We refer to our EM implementations as EMimpute, and the estimate errors using the EMimpute methods are compared with those our novel methods produce. The results indicate that on average, the estimates from our best performing LSimpute method are at least as accurate as those from the best EMimpute algorithm.
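
    The knock-out evaluation described in the abstract above (hide known values, impute them, measure the error) can be sketched as follows. This is a simplified illustration and not the LSimpute algorithm: it regresses each gene on only its single most correlated gene, and the baseline is row-mean imputation rather than KNNimpute. All function names (`knock_out`, `impute_row_mean`, `impute_ls_single`, `rmse`) are hypothetical, and rows are assumed to be genes and columns arrays.

```python
import numpy as np


def knock_out(X, frac, rng):
    """Copy X and set a random fraction of entries to NaN; also return the mask."""
    mask = rng.random(X.shape) < frac
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask


def impute_row_mean(X_missing):
    """Baseline: fill each missing value with the mean of its gene (row).

    Assumes every gene has at least one observed value.
    """
    filled = X_missing.copy()
    row_means = np.nanmean(X_missing, axis=1)
    rows, cols = np.where(np.isnan(X_missing))
    filled[rows, cols] = row_means[rows]
    return filled


def impute_ls_single(X_missing, min_overlap=3):
    """Simplified least-squares imputation (an assumption, not LSimpute itself).

    For a missing entry (g, a), find the gene h most correlated with g among
    genes observed in array a, fit y_g = slope * y_h + intercept on the arrays
    where both are observed, and predict the missing value from y_h[a].
    Falls back to the row mean when no usable gene is found.
    """
    filled = impute_row_mean(X_missing)
    observed = ~np.isnan(X_missing)
    n_genes = X_missing.shape[0]

    for g, a in zip(*np.where(~observed)):
        best_h, best_r = None, 0.0
        for h in range(n_genes):
            if h == g or not observed[h, a]:
                continue
            both = observed[g] & observed[h]
            if both.sum() < min_overlap:
                continue
            x, y = X_missing[h, both], X_missing[g, both]
            if x.std() == 0 or y.std() == 0:
                continue
            r = np.corrcoef(x, y)[0, 1]
            if abs(r) > abs(best_r):
                best_h, best_r = h, r
        if best_h is not None:
            both = observed[g] & observed[best_h]
            slope, intercept = np.polyfit(
                X_missing[best_h, both], X_missing[g, both], 1
            )
            filled[g, a] = slope * X_missing[best_h, a] + intercept
    return filled


def rmse(X_true, X_filled, mask):
    """Root mean squared error over the knocked-out entries only."""
    diff = X_true[mask] - X_filled[mask]
    return float(np.sqrt(np.mean(diff ** 2)))
```

    Starting from a complete matrix, one would call `knock_out`, run each imputation function on the result, and compare `rmse` values computed against the original matrix. On data with strongly correlated genes the regression-based estimate would be expected to beat the row mean, though the margin depends on the data set.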