
    progenyClust: an R package for Progeny Clustering

    Identifying the optimal number of clusters is a common problem faced by data scientists in various research fields and industry applications. Though many clustering evaluation techniques have been developed to solve this problem, the recently developed algorithm Progeny Clustering is a much faster alternative and one that is relevant to biomedical applications. In this paper, we introduce an R package, progenyClust, that implements and extends the original Progeny Clustering algorithm for evaluating clustering stability and identifying the optimal cluster number. We illustrate its applicability using two examples: a simulated test dataset for proof-of-concept, and a cell imaging dataset for demonstrating its application potential in biomedical research. The progenyClust package is versatile in that it offers great flexibility for picking methods and tuning parameters. In addition, the default parameter setting as well as the plot and summary methods offered in the package make the application of Progeny Clustering straightforward and coherent.

    Proteomics in Acute Myeloid Leukemia

    Acute myeloid leukemia (AML) is an extremely heterogeneous and deadly hematological cancer. Cytogenetic abnormalities and genetic mutations, though well recognized and highly prognostic, do not fully capture the degree of heterogeneity manifested in AML clinically. Additionally, current treatment of AML still largely depends on chemotherapy and allogeneic stem cell transplantation, with few options for personalized and molecularly targeted therapies. Proteomics holds promise for unraveling biological heterogeneities in AML beyond the scope of cytogenetics and genomics. In recent years, proteomics has emerged as an important tool for discovering new diagnostic biomarkers, enabling more prognostic patient classifications, and identifying novel therapeutic targets. In this chapter, we review recent advances in proteomic studies of AML, including an overview of AML pathology, popular proteomic techniques, various applications of proteomics in AML from biomarker discovery to target identification, and challenges and future directions in this field.

    Shrinkage Clustering: A Fast and Size-Constrained Algorithm for Biomedical Applications

    Motivation: Many common clustering algorithms require a two-step process that limits their efficiency. The algorithms need to be performed repetitively and need to be implemented together with a model selection criterion, in order to determine both the number of clusters present in the data and the corresponding cluster memberships. As biomedical datasets increase in size and prevalence, there is a growing need for new methods that are more convenient to implement and are more computationally efficient. In addition, it is often essential to obtain clusters of sufficient sample size to make the clustering result meaningful and interpretable for subsequent analysis. Results: We introduce Shrinkage Clustering, a novel clustering algorithm based on matrix factorization that simultaneously finds the optimal number of clusters while partitioning the data. We report its performance across multiple simulated and actual datasets, and demonstrate its strength in accuracy and speed in application to subtyping cancer and brain tissues. In addition, the algorithm offers a straightforward solution to clustering with cluster size constraints. Given its ease of implementation, computing efficiency and extensible structure, we believe Shrinkage Clustering can be applied broadly to solve biomedical clustering tasks, especially when dealing with large datasets.

    Clinical relevance of proteomic profiling in de novo pediatric acute myeloid leukemia: a Children’s Oncology Group study

    Pediatric acute myeloid leukemia (AML) remains a fatal disease for at least 30% of patients, stressing the need for improved therapies and better risk stratification. As proteins are the unifying feature of (epi)genetic and environmental alterations, and are often targeted by novel chemotherapeutic agents, we studied the proteomic landscape of pediatric AML. Protein expression and activation levels were measured in 500 bulk leukemic patients’ samples and 30 control CD34(+) cell samples, using reverse phase protein arrays with 296 strictly validated antibodies. The multistep MetaGalaxy analysis methodology was applied and identified nine protein expression signatures (PrSIG), based on strong recurrent protein expression patterns. PrSIG were associated with cytogenetics and mutational state, and with favorable or unfavorable prognosis. Analysis based on treatment (i.e., ADE vs. ADE plus bortezomib) identified three PrSIG that did better with ADE plus bortezomib than with ADE alone. When PrSIG were studied in the context of cytogenetic risk groups, PrSIG were independently prognostic after multivariate analysis, suggesting a potential value for proteomics in combination with current classification systems. Proteins with universally increased (n=7) or decreased (n=17) expression were observed across PrSIG. Certain proteins significantly differentially expressed from normal could be identified, forming a hypothetical platform for personalized medicine.

    Inferring causal molecular networks: empirical assessment through a community-based effort.

    It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.

    Inferring causal molecular networks: empirical assessment through a community-based effort

    Inferring molecular networks is a central challenge in computational biology. However, it has remained unclear whether causal, rather than merely correlational, relationships can be effectively inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge that focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results constitute the most comprehensive assessment of causal network inference in a mammalian setting carried out to date and suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess the causal validity of inferred molecular networks.

    Shrinkage Clustering: a fast and size-constrained clustering algorithm for biomedical applications

    Background: Many common clustering algorithms require a two-step process that limits their efficiency. The algorithms need to be performed repetitively and need to be implemented together with a model selection criterion. These two steps are needed in order to determine both the number of clusters present in the data and the corresponding cluster memberships. As biomedical datasets increase in size and prevalence, there is a growing need for new methods that are more convenient to implement and are more computationally efficient. In addition, it is often essential to obtain clusters of sufficient sample size to make the clustering result meaningful and interpretable for subsequent analysis. Results: We introduce Shrinkage Clustering, a novel clustering algorithm based on matrix factorization that simultaneously finds the optimal number of clusters while partitioning the data. We report its performances across multiple simulated and actual datasets, and demonstrate its strength in accuracy and speed applied to subtyping cancer and brain tissues. In addition, the algorithm offers a straightforward solution to clustering with cluster size constraints. Conclusions: Given its ease of implementation, computing efficiency and extensible structure, Shrinkage Clustering can be applied broadly to solve biomedical clustering tasks, especially when dealing with large datasets.

    Progeny Clustering: A Method to Identify Biological Phenotypes

    Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally efficient in computing, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess the clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown to be successful and robust when applied to two synthetic datasets (datasets of two-dimensions and ten-dimensions containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods in the ten-dimensional synthetic dataset as well as in the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset.
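
    The stability idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation and not the progenyClust API. Everything in it is an assumption made for illustration: the function names (`kmeans_labels`, `sample_progeny`, `stability_score`), the use of k-means with farthest-first initialisation, the rule that each progeny draws every feature independently from the members of one cluster, and the score defined as within-cluster minus between-cluster co-occurrence. The reference-dataset correction mentioned in the abstract is omitted.

```python
import numpy as np

def kmeans_labels(X, k, rng, n_iter=100):
    """Lloyd's k-means with farthest-first initialisation; returns labels."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # next center: the point farthest from all centers chosen so far
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # an empty cluster keeps its previous center
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def sample_progeny(X, labels, n_per_cluster, rng):
    """Assumed sampling rule: each feature is drawn independently
    (with replacement) from the members of one cluster."""
    progeny, origin = [], []
    for j in np.unique(labels):
        members = X[labels == j]
        cols = [rng.choice(members[:, f], size=n_per_cluster)
                for f in range(X.shape[1])]
        progeny.append(np.column_stack(cols))
        origin.append(np.full(n_per_cluster, j))
    return np.vstack(progeny), np.concatenate(origin)

def stability_score(X, k, n_per_cluster=10, repeats=20, seed=0):
    """Within-cluster minus between-cluster co-occurrence, for k >= 2."""
    rng = np.random.default_rng(seed)
    labels = kmeans_labels(X, k, rng)
    co = None
    for _ in range(repeats):
        P, origin = sample_progeny(X, labels, n_per_cluster, rng)
        pl = kmeans_labels(P, k, rng)
        together = (pl[:, None] == pl[None, :]).astype(float)
        co = together if co is None else co + together
    co /= repeats  # co-occurrence probability matrix of the progeny
    same = origin[:, None] == origin[None, :]
    off_diag = ~np.eye(len(origin), dtype=bool)
    return co[same & off_diag].mean() - co[~same].mean()

# Toy data: three well-separated 2-D blobs (synthetic, for illustration only).
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in blob_centers])
scores = {k: stability_score(X, k) for k in (2, 3, 4)}
```

    In this sketch a higher score means that progeny from the same original cluster are re-clustered together more consistently. For the three well-separated blobs, the score at k = 3 should reach its maximum of 1, since the progeny of each blob stay inside that blob's range and are grouped identically in every repeat.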