672 research outputs found

    Gene expression profiling in acute leukemias: New insights into biology and a global approach to the diagnosis of leukemia using microarray technology

    Global gene expression profiling makes it possible to obtain detailed molecular fingerprints of the underlying gene expression in any cell of interest. In this work gene expression profiles were generated from a comprehensive cohort of leukemia patients and healthy donors referred to and diagnosed in the Laboratory for Leukemia Diagnostics, Munich, Germany, which is a nation-wide reference center for the diagnosis of hematologic malignancies. Thoroughly characterized clinical samples were analyzed by high-density microarrays interrogating the expression status of more than 33,000 transcripts. In one specific aspect of this work the potential application of gene expression signatures for the prediction and classification of specific leukemia subtypes was assessed. Today the diagnosis and subclassification of leukemias is based on a controlled application of various techniques including cytomorphology, cytogenetics, fluorescence in situ hybridization, multiparameter flow cytometry, and PCR-based methods. The diagnostic procedure is performed according to a specific algorithm, but is time-consuming, cost-intensive, and requires expert knowledge. Based on a very low number of candidate genes it is demonstrated in this work that prognostically relevant acute leukemia subtypes can be classified using microarray technology. Moreover, in an expanded analysis including 937 patient samples representing 12 distinct clinically relevant acute and chronic leukemia subtypes and healthy, non-leukemia bone marrow specimens a diagnostic prediction accuracy of ~95% was achieved. Given these results, it can be postulated that the observed gene expression patterns are robust enough to predict the leukemia subtype using global gene expression profiling technology. 
This finding is further substantiated through the demonstration that reported differentially expressed genes from the literature, namely pediatric gene expression signatures representing various acute lymphoblastic leukemia (ALL) subtypes, can be used to independently predict the corresponding adult ALL subtypes. Furthermore, it could be demonstrated that microarrays not only confirm and reproduce data from standard diagnostic procedures, but also provide very robust results. Parameters such as partial RNA degradation, shipment time of the samples, varying periods of storage of the samples, or target preparations at different time points from either bone marrow or peripheral blood specimens by different operators did not dramatically influence the diagnostic gene expression signatures. In another major aspect of this work gene expression signatures were examined in detail to obtain new insights into the underlying biology of acute promyelocytic leukemia (APL) and t(11q23)/MLL leukemias. In APL, microarrays led to a deeper understanding of morphological and clinical characteristics. Firstly, genes which have a functional relevance in blood coagulation were found to be differentially expressed when APL was compared to other acute myeloid leukemia (AML) subtypes. Secondly, a supervised pairwise comparison between the two different APL phenotypes, M3 and its variant M3v, for the first time revealed differentially expressed genes associated with biological functions and pathways such as granulation and maturation. With respect to 11q23 leukemias it could be demonstrated that leukemias with rearrangements of the MLL gene are characterized by a common specific gene expression signature. Additionally, in unsupervised and supervised data analyses, ALL and AML cases with t(11q23)/MLL segregated according to lineage, i.e., lymphoid or myeloid, respectively. This segregation could be explained by a highly differing transcriptional program. 
Through the use of biological network analyses essential regulators of early B cell development, PAX5 and EBF, were shown to be associated with a clear B-lineage commitment in lymphoblastic t(11q23)/MLL leukemias. Also, the influence of the different MLL translocation partners on the transcriptional program was directly assessed. Interestingly, however, gene expression profiles did not reveal a clear distinct pattern associated with one of the analyzed partner genes. Taken together, the identified molecular expression pattern of MLL fusion gene samples and biological networks revealed new insights into the aberrant transcriptional program in t(11q23)/MLL leukemias. In addition, a series of analyses was targeted to obtain new insights into the underlying biology in heterogeneous B-lineage leukemias not positive for BCR/ABL or MLL gene rearrangements. It could be demonstrated that the genetically more heterogeneous precursor B-ALL samples intercalate with BCR/ABL-positive cases, but their profiles were clearly distinct from T-ALL and t(11q23)/MLL cases. In conclusion, various unsupervised and supervised data analysis strategies demonstrated that defined leukemia subtypes can be characterized on the basis of distinct gene expression signatures. Specific gene expression patterns reproduced the taxonomy of this hematologic malignancy, provided new insights into different disease subtypes, and identified critical pathway components that might be considered for future therapeutic intervention. Based on these results it is now further possible to develop a one-step diagnostic approach for the diagnosis of leukemias using a customized microarray.

    Comparative Analysis of Data Mining Tools and Classification Techniques using WEKA in Medical Bioinformatics

    The availability of huge amounts of data has created a great need for data mining techniques that can generate useful knowledge. In the present study we provide detailed information about data mining techniques, with a focus on classification as an important supervised learning technique. We also discuss the WEKA software as a tool of choice for performing classification analysis on different kinds of available data. A detailed methodology is provided to make the software usable by a wide range of users. The main features of WEKA are 49 data preprocessing tools, 76 classification/regression algorithms, 8 clustering algorithms, 3 algorithms for finding association rules, and 15 attribute/subset evaluators plus 10 search algorithms for feature selection. WEKA extracts useful information from data and makes it possible to identify a suitable algorithm for generating an accurate predictive model from it. Moreover, medical bioinformatics analyses have been performed to illustrate the usage of WEKA in the diagnosis of leukemia. Keywords: Data mining, WEKA, Bioinformatics, Knowledge discovery, Gene Expression

    An integrated method for cancer classification and rule extraction from microarray data

    Different microarray techniques have recently been used successfully to extract information useful for cancer diagnosis at the gene expression level, owing to their ability to measure thousands of gene expression levels in a massively parallel way. One important issue is to improve the classification performance on microarray data. Ideally, however, influential genes and even interpretable rules would be identified at the same time to offer biological insight.

    Advance in the Treatment of Pediatric Leukemia

    The book gives an overview of the progress that has been made in the treatment of acute lymphoblastic leukemia (ALL), of acute and chronic myeloid leukemia (AML, CML) and of juvenile myelomonocytic leukemia (JMML). Leukemia is the most common malignant disease in children, and 80% of patients are diagnosed with ALL and 15–20% with AML, whereas CML and JMML are rather rare. Although ALL was considered an incurable disease until the early 1960s, with the availability of cytotoxic drugs and the start of clinical multicenter studies, ALL has become an almost curable disease with a survival rate exceeding 90% in high-income countries. These impressive results have mainly been achieved by a deeper understanding of the genomic landscape of the disease and the introduction of risk stratifications based on genetic features and response to chemotherapy as determined by the presence or absence of minimal residual disease (MRD). Immunotherapies including bispecific T-cell Engagers (BiTEs), Chimeric Antigen Receptor (CAR) T cells, monoclonal antibodies and improvements in the outcome of allogeneic stem cell transplantation (HSCT) have shown impressive results in chemorefractory or relapsed patients, and it is anticipated that the cure rate can be further increased. For countries with fewer resources, therapies have to be adapted to increase survival as well. This book also provides an update on the progress made in the treatment of AML. As in ALL, risk classification based on genetic factors and response to chemotherapy is most important for therapy guidance. The book also provides updates and guidance for the treatment of CML and JMML.

    Extreme Value Distribution Based Gene Selection Criteria for Discriminant Microarray Data Analysis Using Logistic Regression

    One important issue commonly encountered in the analysis of microarray data is to decide which and how many genes should be selected for further studies. For discriminant microarray data analyses based on statistical models, such as logistic regression models, gene selection can be accomplished by comparing the maximum likelihood of the model given the real data, $\hat{L}(D|M)$, with the expected maximum likelihood of the model given an ensemble of surrogate data with randomly permuted labels, $\hat{L}(D_0|M)$. Typically, the computational burden for obtaining $\hat{L}(D_0|M)$ is immense, often exceeding the limits of available computing resources by orders of magnitude. Here, we propose an approach that circumvents such heavy computations by mapping the simulation problem to an extreme-value problem. We present the derivation of an asymptotic distribution of the extreme value as well as its mean, median, and variance. Using this distribution, we propose two gene selection criteria, and we apply them to two microarray datasets and three classification tasks for illustration. Comment: to be published in Journal of Computational Biology (2004).

    Computational Biology-Driven Genomic and Epigenomic Delineation of Acute Myeloid Leukemia

    Hematopoiesis is the deterministic process of blood cell formation taking place in the bone marrow. Mature blood cells are produced by a tightly controlled mechanism from hematopoietic stem cells (HSCs) residing in the bone marrow. Upon maturation blood cells are released into the peripheral blood and from this point onward can be transported to the different locations of the body. The mature blood cells exert different functions dependent on a strictly controlled path of maturation. The distinct leukocytes comprising granulocytes, monocytes, macrophages, natural killer cells and lymphocytes are essential for the defense against pathogens and foreign invaders, erythrocytes play a pivotal role in the transportation of oxygen to remote organs, and platelets confer the process of blood clotting. Mature blood cells are short-lived and require continuous replenishment. The control of the production and the total number of blood cells is conferred by multipotent progenitors and a small population of pluripotent HSCs (Figure 1). HSCs reside in the bone marrow of adult mammals at the apex of a hierarchy of progenitors which become progressively restricted to several and eventually single lineages of blood cells. Additionally these pluripotent stem cells have the unique ability to self-renew, generating a source for continuous replenishment of the complete blood cell system. The hematopoietic stem cell compartment contains stem cells with progressively decreased self-renewal capacity with the retention of multi-lineage reconstitution. The rare long term HSC (LT-HSC) is at the pinnacle of the hematopoietic hierarchy and is mainly quiescent. With the most conserved rate of self-renewal it prevents the depletion of the stem cell pool. The less rare short term HSC (ST-HSC) still retains a minimal ability for self-renewal and is the more active effector cell for hematopoietic replenishment in normal situations. 
The main constituent of the hematopoietic stem cell compartment is the multipotent progenitor (MPP), which has lost its self-renewal capacity but has kept the ability to give rise to daughter cells of different lineages. The daughter cells, common myeloid progenitor (CMP) and common lymphoid progenitor (CLP), are still oligopotent as they give rise to multiple blood cell types, e.g., lymphocytes, granulocytes, platelets and erythrocytes. The production of mature blood cells is a strictly controlled process that adapts to the needs of human physiology, e.g., erythrocyte production after blood loss. The control is exerted mainly by external stimuli, e.g., hematopoietic cytokines or growth factors, which are produced by constituents of the regulatory microenvironment within the bone marrow niche, other blood cells or cytokine-secreting organs. The microenvironment plays a pivotal role in the formation of adequate numbers of blood cells of the correct type, and the hematopoietic cytokines it produces allow the hematopoietic system to dynamically adapt to extramedullary events, e.g., blood loss, infection or cancer immunoediting.

    Identification of Interaction Patterns and Classification with Applications to Microarray Data

    Emerging patterns represent a class of interaction structures which has been recently proposed as a tool in data mining. In this paper, a new and more general definition referring to underlying probabilities is proposed. The defined interaction patterns carry information about the relevance of combinations of variables for distinguishing between classes. Since they are formally quite similar to the leaves of a classification tree, we propose a fast and simple method which is based on the CART algorithm to find the corresponding empirical patterns in data sets. In simulations, it can be shown that the method is quite effective in identifying patterns. In addition, the detected patterns can be used to define new variables for classification. Thus, we propose a simple scheme to use the patterns to improve the performance of classification procedures. The method may also be seen as a scheme to improve the performance of CARTs concerning the identification of interaction patterns as well as the accuracy of prediction.

    Microarray-based Multiclass Classification using Relative Expression Analysis

    Microarray gene expression profiling has led to a proliferation of statistical learning methods proposed for a variety of problems related to biological and clinical discoveries. One major problem is to identify gene expression-based biological markers for class discovery and prediction of complex diseases such as cancer. For example, expression patterns of genes are discovered to be associated with phenotypes (e.g., classes of disease) through statistical learning models. Early hopes that well-developed methods such as support vector machines would completely revolutionize the field have been moderated by the difficulties of analyzing microarray data. Hence, new and effective approaches need to be developed to address some common limitations encountered by current methods. This thesis is focused on improving statistical learning on microarray data through rank-based methodologies. The relative expression analysis introduced in Chapter 1 is the central concept for methodological development where the relative expression ordering (i.e., the relative ranks of expression levels) of genes is investigated instead of analyzing the actual expression values of individual genes. Supervised learning problems are studied where classification models are built for differentiating disease states. An unsupervised learning task is also examined in which subclasses are discovered by cluster analysis at the molecular level. Both types of problems under study consist of multiple classes. In Chapter 2, a novel rank-based classifier named Top Scoring Set (TSS) is developed for microarray classification of multiple disease states. It generalizes the Top Scoring Pair (TSP) method for binary classification problems to the multiclass case. Its main advantage lies in the simplicity and power of its decision rule, which provides transparent boundaries and allows for potential biological interpretations. 
Since TSS requires a dimension reduction in the training process, a greedy search algorithm is proposed to perform a fast search over the feature space. In addition, ensemble classification based on TSS is also investigated. In Chapter 3, an efficient and biologically meaningful dimension reduction for the TSS classifier is introduced using publicly available pathway databases. Pre-defined functional gene groups are analyzed for microarray classification. The pathway-based TSS classifier is validated on an extremely large cohort of leukemia patients. Also, the unsupervised learning ability of relative expression analysis is studied and a rank-based clustering approach is introduced to identify molecularly distinct subtypes of breast cancer patients. Based on the clustering results, the TSP classifier is used for predicting the subtypes of individual breast cancer tumors. These rank-based methods provide an independent validation for the current identification of breast cancer subtypes. Overall, this thesis provides developments and validations of statistical learning methods based on relative expression analysis.
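
    The relative expression idea behind the binary Top Scoring Pair (TSP) method mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly described TSP score, the absolute difference between the two classes in the frequency with which gene i is expressed below gene j. The function names, the toy data, and the first-found tie-breaking are illustrative assumptions, not the thesis's implementation, and the multiclass TSS extension is not shown.

```python
from itertools import combinations


def tsp_fit(X, y):
    """Find the gene pair whose expression ordering best separates two classes.

    X: list of samples, each a list of expression values (one per gene).
    y: list of class labels, 0 or 1, one per sample.
    """
    class0 = [x for x, label in zip(X, y) if label == 0]
    class1 = [x for x, label in zip(X, y) if label == 1]
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        # Fraction of samples in each class where gene i is below gene j.
        p0 = sum(x[i] < x[j] for x in class0) / len(class0)
        p1 = sum(x[i] < x[j] for x in class1) / len(class1)
        score = abs(p0 - p1)
        # Assumption: ties keep the first pair found (no secondary criterion).
        if best is None or score > best["score"]:
            best = {"i": i, "j": j, "score": score, "lt_means_class0": p0 > p1}
    return best


def tsp_predict(model, x):
    """Classify one sample using only the ordering of the selected gene pair."""
    i_below_j = x[model["i"]] < x[model["j"]]
    return 0 if i_below_j == model["lt_means_class0"] else 1


# Hypothetical toy data: gene 0 is below gene 1 in class 0, above it in class 1.
X_train = [[1, 5, 3], [2, 6, 9], [1, 4, 0],
           [7, 2, 3], [8, 1, 9], [6, 3, 0]]
y_train = [0, 0, 0, 1, 1, 1]
model = tsp_fit(X_train, y_train)
```

    Because the decision rule depends only on which of two genes is more highly expressed, it is unaffected by any monotone per-sample normalization, which is one reason rank-based rules are considered easy to interpret.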

    Accurate molecular classification of cancer using simple rules

    Abstract. Background: One intractable problem with using microarray data analysis for cancer classification is how to reduce the extremely high-dimensionality gene feature data to remove the effects of noise. Feature selection is often used to address this problem by selecting informative genes from among thousands or tens of thousands of genes. However, most of the existing methods of microarray-based cancer classification utilize too many genes to achieve accurate classification, which often hampers the interpretability of the models. For a better understanding of the classification results, it is desirable to develop simpler rule-based models with as few marker genes as possible. Methods: We screened a small number of informative single genes and gene pairs on the basis of their depended degrees proposed in rough sets. Applying the decision rules induced by the selected genes or gene pairs, we constructed cancer classifiers. We tested the efficacy of the classifiers by leave-one-out cross-validation (LOOCV) of training sets and classification of independent test sets. Results: We applied our methods to five cancerous gene expression datasets: leukemia (acute lymphoblastic leukemia [ALL] vs. acute myeloid leukemia [AML]), lung cancer, prostate cancer, breast cancer, and leukemia (ALL vs. mixed-lineage leukemia [MLL] vs. AML). Accurate classification outcomes were obtained by utilizing just one or two genes. Some genes that correlated closely with the pathogenesis of relevant cancers were identified. In terms of both classification performance and algorithm simplicity, our approach outperformed or at least matched existing methods. Conclusion: In cancerous gene expression datasets, a small number of genes, even one or two if selected correctly, is capable of achieving an ideal cancer classification effect. This finding also means that very simple rules may perform well for cancerous class prediction.
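
    As a rough illustration of how single genes and gene pairs might be ranked with rough sets, the sketch below computes the textbook rough-set dependency degree: the fraction of samples whose discretized gene value (or pair of values) occurs in only one class. This is an assumption about the general technique; the paper's own "depended degree" and its discretization step may differ. The function name, the "high"/"low" levels, and the toy labels are hypothetical.

```python
from collections import defaultdict


def dependency_degree(condition_values, labels):
    """Fraction of samples lying in the positive region of the decision.

    condition_values: one hashable value per sample, e.g. a discretized
        expression level for a single gene, or a tuple of levels for a pair.
    labels: one class label per sample.
    """
    classes_seen = defaultdict(set)   # condition value -> set of class labels
    sample_count = defaultdict(int)   # condition value -> number of samples
    for value, label in zip(condition_values, labels):
        classes_seen[value].add(label)
        sample_count[value] += 1
    # A group of samples sharing a condition value is consistent when all of
    # its members carry the same class label.
    consistent = sum(sample_count[v] for v, seen in classes_seen.items()
                     if len(seen) == 1)
    return consistent / len(labels)


# Hypothetical discretized expression levels for five samples.
labels = ["ALL", "ALL", "AML", "AML", "AML"]
gene_a = ["high", "high", "low", "low", "low"]   # separates the classes fully
gene_b = ["high", "low", "low", "high", "low"]   # uninformative on its own
gene_c = ["high", "low", "low", "low", "low"]    # resolves a single sample

single_a = dependency_degree(gene_a, labels)
single_b = dependency_degree(gene_b, labels)
single_c = dependency_degree(gene_c, labels)
# A gene pair is scored by treating the two levels as one combined value.
pair_bc = dependency_degree(list(zip(gene_b, gene_c)), labels)
```

    A gene or gene pair with a dependency degree of 1 induces decision rules of the form "if level is X then class is Y" that are consistent on every training sample, which matches the abstract's emphasis on classifiers built from one or two genes.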