34 research outputs found

    Hierarchy of Gene Expression Data is Predictive of Future Breast Cancer Outcome

    We calculate measures of hierarchy in gene and tissue networks of breast cancer patients. We find that the likelihood of future metastasis is positively correlated with network hierarchy in expression networks of cancer-associated genes, due to correlated expression of cancer-specific pathways. Conversely, future metastasis and quick relapse times are negatively correlated with network hierarchy in the expression network of all genes, due to dedifferentiation of gene pathways and circuits. These results suggest that hierarchy of gene expression may be useful as an additional biomarker for breast cancer prognosis. Comment: 14 pages, 5 figures

    Do logarithmic proximity measures outperform plain ones in graph clustering?

    We consider a number of graph kernels and proximity measures, including the commute time kernel, regularized Laplacian kernel, heat kernel, and exponential diffusion kernel (also called "communicability"), together with the corresponding distances, as applied to clustering nodes in random graphs and several well-known datasets. The model for generating random graphs assigns edge probabilities to pairs of nodes according to whether they belong to the same class or to different predefined classes. It turns out that in most cases, logarithmic measures (i.e., measures obtained by taking the logarithm of the proximities) distinguish the underlying classes better than the "plain" measures. A comparison in terms of reject curves of inter-class and intra-class distances confirms this conclusion. A similar conclusion can be drawn for several well-known datasets. A possible origin of this effect is that most kernels are multiplicative in nature, while the distances used in clustering algorithms are additive (cf. the triangle inequality). The logarithmic transformation converts the former into the latter. Moreover, some distances corresponding to the logarithmic measures possess a meaningful cutpoint additivity property. In our experiments, the leader is usually the logarithmic Communicability measure. However, we indicate some more complicated cases in which other measures, typically Communicability and plain Walk, can be the winners. Comment: 11 pages, 5 tables, 9 figures. Accepted for publication in the Proceedings of the 6th International Conference on Network Analysis, May 26-28, 2016, Nizhny Novgorod, Russia
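
    The logarithmic transformation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the 4-node path graph, and the kernel-to-distance formula d_ij = (k_ii + k_jj)/2 - k_ij are assumptions chosen for the example.

```python
import numpy as np

def communicability(adj, t=1.0):
    """Exponential diffusion kernel exp(t*A) for a symmetric adjacency matrix."""
    vals, vecs = np.linalg.eigh(adj)
    return (vecs * np.exp(t * vals)) @ vecs.T

def kernel_to_distance(kernel):
    """Assumed transformation: d_ij = (k_ii + k_jj) / 2 - k_ij."""
    diag = np.diag(kernel)
    return 0.5 * (diag[:, None] + diag[None, :]) - kernel

# Hypothetical example graph: the path 0 - 1 - 2 - 3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

K = communicability(adj)
plain_dist = kernel_to_distance(K)        # "plain" measure
# exp(t*A) is entrywise positive for a connected graph, so the log is defined.
log_dist = kernel_to_distance(np.log(K))  # logarithmic measure
```

    Either distance matrix can then be passed to any distance-based clustering routine; the comparison of the two is what the abstract reports.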

    Hierarchy in Gene Expression is Predictive for Adult Acute Myeloid Leukemia

    Cancer progresses with a change in the structure of the gene network in normal cells. We define a measure of organizational hierarchy in gene networks of affected cells in adult acute myeloid leukemia (AML) patients. With a retrospective cohort analysis based on the gene expression profiles of 116 AML patients, we find that the likelihood of future cancer relapse and the level of clinical risk are directly correlated with the level of organization in the cancer-related gene network. We also explore how the level of organization in the gene network varies with cancer progression. We find that this variation is non-monotonic, which implies that the fitness landscape in the evolution of AML cancer cells is nontrivial. We further find that the hierarchy in gene expression at the time of diagnosis may be a useful biomarker in AML prognosis. Comment: 18 pages, 5 figures, to appear in Physical Biology

    A Statistical Toolbox For Mining And Modeling Spatial Data

    Most data mining projects in spatial economics start with an evaluation of a set of attribute variables on a sample of spatial entities, looking for the existence and strength of spatial autocorrelation based on Moran's and Geary's coefficients. The adequacy of these coefficients is rarely challenged, even though many users seem likely to make mistakes and foster confusion when reporting on their properties. My paper begins with a critical appraisal of the classical definition and rationale of these indices. I argue that, while intuitively founded, they are plagued by an inconsistency in their conception. Then, I propose a principled small change leading to corrected spatial autocorrelation coefficients, which strongly simplifies their relationship and opens the way to an augmented toolbox of statistical methods for dimension reduction and data visualization, also useful for modeling purposes. A second section presents a formal framework, adapted from recent work in statistical learning, which gives theoretical support to our definition of corrected spatial autocorrelation coefficients. More specifically, the multivariate data mining methods presented here are easy to implement in existing (free) software and make the proposed corrections usable in spatial data analysis practice. From a mathematical point of view, their asymptotic behavior, already studied in a series of papers by Belkin & Niyogi, suggests that they are robust and have limited sensitivity to the Modifiable Areal Unit Problem (MAUP), which is valuable in exploratory spatial data analysis
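
    For reference, the classical Moran and Geary coefficients that this abstract critiques can be computed as in the sketch below. It shows only the textbook definitions, not the corrected coefficients the author proposes (the abstract does not give them); the function names and the 4-site example are assumptions for illustration.

```python
import numpy as np

def moran_i(x, w):
    """Classical Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return len(x) / w.sum() * (z @ w @ z) / (z @ z)

def geary_c(x, w):
    """Classical Geary's C: ((n-1) / (2 S0)) * sum_ij w_ij (x_i - x_j)^2 / sum_i z_i^2."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    sq_diff = (x[:, None] - x[None, :]) ** 2
    return (len(x) - 1) / (2.0 * w.sum()) * (w * sq_diff).sum() / (z @ z)

# Hypothetical data: four sites on a line, neighbours share a border.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = [1.0, 2.0, 3.0, 4.0]

print(moran_i(x, w))  # about 0.333: positive spatial autocorrelation
print(geary_c(x, w))  # 0.3: values below 1 also indicate positive autocorrelation
```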

    Improved Image Segmentation Algorithm Using Graph-Edges

    In this paper, an efficient algorithm for segmenting digital images is developed by measuring the evidence for a boundary between two regions in an image using graph edges. The regions in the image are treated as components, where each region in the image represents a component in the graph. The region comparison predicate evaluates whether there is evidence for a boundary between a pair of components by checking whether the difference between the components is large relative to the internal difference within at least one of the components. A threshold function controls the degree to which the difference between components must exceed the minimum internal difference. An important characteristic of the method is its ability to preserve detail in important image regions while ignoring detail in unimportant regions. Classical methods depend only on the external difference and ignore the internal difference when segmenting two neighboring regions
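
    The region comparison predicate described in this abstract can be sketched as below. The abstract does not state the threshold function, so the form tau(C) = k / |C| and the default k are assumptions borrowed from common graph-based segmentation practice; the function name and arguments are likewise hypothetical.

```python
def has_boundary(diff_between, internal_a, size_a, internal_b, size_b, k=300.0):
    """Return True if there is evidence for a boundary between components A and B.

    diff_between: smallest edge weight connecting the two components.
    internal_x:   largest edge weight inside component X (its internal difference).
    size_x:       number of pixels in component X.
    k:            assumed scale parameter of the threshold function tau(C) = k / |C|.
    """
    tau_a = k / size_a
    tau_b = k / size_b
    # Minimum internal difference, relaxed by the threshold function.
    min_internal = min(internal_a + tau_a, internal_b + tau_b)
    return diff_between > min_internal

# Usage: a strong edge between two fairly uniform regions signals a boundary,
# while a weak edge does not, so the regions would be merged.
print(has_boundary(10.0, 2.0, 100, 3.0, 50))  # True  (10 > min(5, 9))
print(has_boundary(4.0, 2.0, 100, 3.0, 50))   # False (4 <= 5)
```

    With this assumed threshold, small components need stronger evidence for a boundary, which is one way to ignore detail in unimportant regions.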

    Large Scale Spectral Clustering Using Approximate Commute Time Embedding

    Spectral clustering is a novel clustering method which can detect complex shapes of data clusters. However, it requires the eigendecomposition of the graph Laplacian matrix, whose cost is O(n^3), and thus it is not suitable for large-scale systems. Recently, many methods have been proposed to reduce the computational time of spectral clustering. These approximate methods usually involve sampling techniques, through which much of the information in the original data may be lost. In this work, we propose a fast and accurate spectral clustering approach using an approximate commute time embedding, which is similar to the spectral embedding. The method requires neither sampling nor the computation of any eigenvector. Instead, it uses random projection and a linear-time solver to find the approximate embedding. Experiments on several synthetic and real datasets show that the proposed approach has better clustering quality and is faster than state-of-the-art approximate spectral clustering methods
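
    The idea of an approximate commute time embedding via random projection can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it uses a dense pseudoinverse as a stand-in for the linear-time solver mentioned in the abstract, a random sign matrix for the projection, and hypothetical names throughout.

```python
import numpy as np

def approx_commute_embedding(adj, k, rng):
    """Return a k x n matrix whose column i embeds node i.

    Squared Euclidean distances between columns approximate commute times.
    """
    n = adj.shape[0]
    rows, cols = np.nonzero(np.triu(adj, 1))  # one entry per undirected edge
    weights = adj[rows, cols]
    m = len(weights)

    # Signed edge-node incidence matrix B (m x n).
    B = np.zeros((m, n))
    B[np.arange(m), rows] = 1.0
    B[np.arange(m), cols] = -1.0

    laplacian = np.diag(adj.sum(axis=1)) - adj
    volume = adj.sum()  # sum of all node degrees

    # Random projection with entries +-1/sqrt(k) (k x m).
    Q = rng.choice([-1.0, 1.0], size=(k, m)) / np.sqrt(k)
    Y = Q @ (np.sqrt(weights)[:, None] * B)

    # Stand-in for the fast solver: dense pseudoinverse of the Laplacian.
    return np.sqrt(volume) * Y @ np.linalg.pinv(laplacian)

# Hypothetical example: the path 0 - 1 - 2 - 3, whose end nodes have commute time 18.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
Z = approx_commute_embedding(adj, k=2000, rng=np.random.default_rng(0))
approx_commute = np.sum((Z[:, 0] - Z[:, 3]) ** 2)
```

    Clustering then amounts to running an ordinary algorithm such as k-means on the columns of Z. No eigenvectors are computed at any point, which is the property the abstract emphasizes.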