Towards Quantifying Vertex Similarity in Networks
Vertex similarity is a major problem in network science with a wide range of
applications. In this work we provide novel perspectives on finding
(dis)similar vertices within a network and across two networks with the same
number of vertices (graph matching). With respect to the former problem, we
propose to optimize a geometric objective which allows us to express each
vertex uniquely as a convex combination of a few extreme types of vertices. Our
method has the important advantage of efficiently supporting several types of
queries, such as "which other vertices are most similar to this vertex?", by
using appropriate data structures, and of mining interesting patterns in
the network. With respect to the latter problem (graph matching), we propose
the generalized condition number, a quantity widely used in numerical
analysis, of the Laplacian matrix representations of the two graphs of
interest as a measure of graph similarity. We
show that this objective has a solid theoretical basis and propose a
deterministic and a randomized graph alignment algorithm. We evaluate our
algorithms on both synthetic and real data. We observe that our proposed
methods achieve high-quality results and provide us with significant insights
into the network structure.
Comment: 16 pages, 5 figures, 2 tables
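As an illustration of the graph-similarity measure mentioned in the abstract above, the following is a minimal sketch of a generalized condition number of two graph Laplacians. It assumes the standard numerical-analysis definition (largest generalized eigenvalue divided by the smallest, taken on the subspace orthogonal to the all-ones vector), that both graphs are connected, and that they share a fixed vertex labeling. The function names are hypothetical, the paper's exact formulation may differ, and the alignment step (searching over vertex permutations) is not shown.

```python
import numpy as np

def laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj

def generalized_condition_number(adj_g, adj_h):
    """Ratio of the largest to the smallest generalized eigenvalue of
    (L_G, L_H), restricted to the complement of the all-ones vector.
    Assumption: both graphs are connected and have the same vertex set."""
    L_g, L_h = laplacian(adj_g), laplacian(adj_h)
    n = L_g.shape[0]

    # Orthonormal basis of the subspace orthogonal to the all-ones vector:
    # QR of [1, e_2, ..., e_n]; the first column of Q is parallel to ones.
    basis = np.eye(n)
    basis[:, 0] = 1.0
    q, _ = np.linalg.qr(basis)
    q = q[:, 1:]

    # Both projected Laplacians are positive definite for connected graphs.
    a = q.T @ L_g @ q
    b = q.T @ L_h @ q

    # Reduce a x = lambda b x to a symmetric problem via Cholesky b = c c^T.
    c = np.linalg.cholesky(b)
    x = np.linalg.solve(c, a)        # c^{-1} a
    m = np.linalg.solve(c, x.T)      # c^{-1} a c^{-T}, symmetric
    eigenvalues = np.linalg.eigvalsh(m)
    return eigenvalues[-1] / eigenvalues[0]

# Path on three vertices versus a triangle on the same vertices.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(generalized_condition_number(path, triangle))      # approximately 3
print(generalized_condition_number(triangle, triangle))  # approximately 1
```

Under this definition a value of 1 means the two Laplacians agree up to a positive scale factor, and larger values indicate less similar graphs for the given labeling.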
Robust unmixing of tumor states in array comparative genomic hybridization data
Motivation: Tumorigenesis is an evolutionary process by which tumor cells acquire sequences of mutations leading to increased growth, invasiveness and eventually metastasis. It is hoped that by identifying the common patterns of mutations underlying major cancer sub-types, we can better understand the molecular basis of tumor development and identify new diagnostics and therapeutic targets. This goal has motivated several attempts to apply evolutionary tree reconstruction methods to assays of tumor state. Inference of tumor evolution is in principle aided by the fact that tumors are heterogeneous, retaining remnant populations from different stages of their development along with contaminating healthy cell populations. In practice, though, this heterogeneity complicates interpretation of tumor data because distinct cell types are conflated by common methods for assaying the tumor state. We previously proposed a method to computationally infer cell populations from measures of tumor-wide gene expression through a geometric interpretation of mixture type separation, but this approach deals poorly with noisy and outlier data.