2,732 research outputs found

    On Spectral Graph Embedding: A Non-Backtracking Perspective and Graph Approximation

    Full text link
    Graph embedding has been proven to be efficient and effective in facilitating graph analysis. In this paper, we present a novel spectral framework called NOn-Backtracking Embedding (NOBE), which offers a new perspective that organizes graph data at a deep level by tracking the flow traversing on the edges with backtracking prohibited. Further, by analyzing the non-backtracking process, a technique called graph approximation is devised, which provides a channel to transform the spectral decomposition on an edge-to-edge matrix to that on a node-to-node matrix. Theoretical guarantees are provided by bounding the difference between the corresponding eigenvalues of the original graph and its graph approximation. Extensive experiments conducted on various real-world networks demonstrate the efficacy of our methods on both macroscopic and microscopic levels, including clustering and structural hole spanner detection.Comment: SDM 2018 (Full version including all proofs

    Off-Policy Evaluation of Probabilistic Identity Data in Lookalike Modeling

    Full text link
    We evaluate the impact of probabilistically-constructed digital identity data collected from Sep. to Dec. 2017 (approx.), in the context of Lookalike-targeted campaigns. The backbone of this study is a large set of probabilistically-constructed "identities", represented as small bags of cookies and mobile ad identifiers with associated metadata, that are likely all owned by the same underlying user. The identity data allows to generate "identity-based", rather than "identifier-based", user models, giving a fuller picture of the interests of the users underlying the identifiers. We employ off-policy techniques to evaluate the potential of identity-powered lookalike models without incurring the risk of allowing untested models to direct large amounts of ad spend or the large cost of performing A/B tests. We add to historical work on off-policy evaluation by noting a significant type of "finite-sample bias" that occurs for studies combining modestly-sized datasets and evaluation metrics involving rare events (e.g., conversions). We illustrate this bias using a simulation study that later informs the handling of inverse propensity weights in our analyses on real data. We demonstrate significant lift in identity-powered lookalikes versus an identity-ignorant baseline: on average ~70% lift in conversion rate. This rises to factors of ~(4-32)x for identifiers having little data themselves, but that can be inferred to belong to users with substantial data to aggregate across identifiers. This implies that identity-powered user modeling is especially important in the context of identifiers having very short lifespans (i.e., frequently churned cookies). Our work motivates and informs the use of probabilistically-constructed identities in marketing. It also deepens the canon of examples in which off-policy learning has been employed to evaluate the complex systems of the internet economy.Comment: Accepted by WSDM 201

    Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

    Full text link
    Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provable optimal recovery using the algorithm is analytically shown for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed.Comment: 13 figures, 35 reference

    Fourteenth Biennial Status Report: März 2017 - February 2019

    No full text
    • …
    corecore