On Spectral Graph Embedding: A Non-Backtracking Perspective and Graph Approximation
Graph embedding has been proven to be efficient and effective in facilitating
graph analysis. In this paper, we present a novel spectral framework called
NOn-Backtracking Embedding (NOBE), which offers a new perspective that
organizes graph data at a deep level by tracking the flow traversing the
edges with backtracking prohibited. Further, by analyzing the non-backtracking
process, a technique called graph approximation is devised, which provides a
channel to transform the spectral decomposition on an edge-to-edge matrix to
that on a node-to-node matrix. Theoretical guarantees are provided by bounding
the difference between the corresponding eigenvalues of the original graph and
its graph approximation. Extensive experiments conducted on various real-world
networks demonstrate the efficacy of our methods on both macroscopic and
microscopic levels, including clustering and structural hole spanner detection.

Comment: SDM 2018 (full version including all proofs)
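A minimal sketch of the non-backtracking construction described above, assuming an undirected simple graph and using networkx/scipy; this illustrates the general technique, not the authors' released NOBE implementation, and the paper's graph-approximation step (moving the decomposition to a node-to-node matrix) is omitted:

    # Sketch: spectral embedding from the non-backtracking edge-to-edge operator.
    import numpy as np
    import networkx as nx
    from scipy.sparse import lil_matrix
    from scipy.sparse.linalg import eigs

    def non_backtracking_embedding(G, dim=2):
        # Each undirected edge (u, v) becomes two directed edges u->v and v->u.
        directed = [(u, v) for u, v in G.edges()] + [(v, u) for u, v in G.edges()]
        index = {e: i for i, e in enumerate(directed)}
        m = len(directed)
        B = lil_matrix((m, m))
        # B[(u->v), (v->w)] = 1 whenever w != u: backtracking is prohibited.
        for (u, v), i in index.items():
            for w in G.neighbors(v):
                if w != u:
                    B[i, index[(v, w)]] = 1.0
        # Leading eigenvectors of the edge-to-edge matrix (assumes m > dim + 2).
        vals, vecs = eigs(B.tocsr(), k=dim + 1, which='LM')
        order = np.argsort(-np.abs(vals))
        vecs = vecs[:, order]
        # Fold edge coordinates back onto nodes via their incoming edges,
        # skipping the leading (Perron-like) eigenvector.
        emb = {n: np.zeros(dim) for n in G.nodes()}
        for (u, v), i in index.items():
            emb[v] += vecs[i, 1:dim + 1].real
        return emb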
Off-Policy Evaluation of Probabilistic Identity Data in Lookalike Modeling
We evaluate the impact of probabilistically-constructed digital identity data
collected from Sep. to Dec. 2017 (approx.), in the context of
Lookalike-targeted campaigns. The backbone of this study is a large set of
probabilistically-constructed "identities", represented as small bags of
cookies and mobile ad identifiers with associated metadata, that are likely all
owned by the same underlying user. The identity data allows us to generate
"identity-based", rather than "identifier-based", user models, giving a fuller
picture of the interests of the users underlying the identifiers. We employ
off-policy techniques to evaluate the potential of identity-powered lookalike
models without incurring the risk of allowing untested models to direct large
amounts of ad spend or the large cost of performing A/B tests. We add to
historical work on off-policy evaluation by noting a significant type of
"finite-sample bias" that occurs for studies combining modestly-sized datasets
and evaluation metrics involving rare events (e.g., conversions). We illustrate
this bias using a simulation study that later informs the handling of inverse
propensity weights in our analyses on real data. We demonstrate significant
lift in identity-powered lookalikes versus an identity-ignorant baseline: on
average ~70% lift in conversion rate. This rises to factors of ~(4-32)x for
identifiers having little data themselves, but that can be inferred to belong
to users with substantial data to aggregate across identifiers. This implies
that identity-powered user modeling is especially important in the context of
identifiers having very short lifespans (i.e., frequently churned cookies). Our
work motivates and informs the use of probabilistically-constructed identities
in marketing. It also deepens the canon of examples in which off-policy
learning has been employed to evaluate the complex systems of the internet
economy.

Comment: Accepted by WSDM 2018
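The weighting scheme at the heart of such an evaluation can be illustrated with a clipped inverse-propensity-scoring (IPS) estimator; the function, the clipping threshold, and the simulated logs below are hypothetical stand-ins, not the paper's exact procedure:

    # Sketch: IPS off-policy estimate with clipped importance weights.
    import numpy as np

    def ips_estimate(rewards, logging_probs, target_probs, clip=10.0):
        # rewards       : observed outcomes (e.g., 1 for a conversion, else 0)
        # logging_probs : probability the logging policy served each impression
        # target_probs  : probability the candidate policy would have served it
        # clip          : cap on the weights; rare events plus modest samples
        #                 make unclipped IPS estimates extremely noisy
        w = np.minimum(target_probs / logging_probs, clip)
        return float(np.mean(w * rewards))

    # Toy usage with simulated logs and a rare-event (conversion) reward.
    rng = np.random.default_rng(0)
    n = 100_000
    p_log = rng.uniform(0.05, 0.5, n)
    p_new = np.clip(p_log * rng.uniform(0.5, 2.0, n), 0.01, 1.0)
    r = rng.binomial(1, 0.002, n)
    print(ips_estimate(r, p_log, p_new))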
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provably optimal recovery by the algorithm is shown analytically
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.

Comment: 13 figures, 35 references
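A minimal sketch of the eigenspace-plus-mixture combination, assuming the normalized graph Laplacian and scikit-learn's Gaussian mixture as the finite-mixture component; the paper's own algorithm, guarantees, and scalability heuristics are not reproduced here:

    # Sketch: soft (fuzzy) node memberships from Laplacian eigenvectors + GMM.
    import networkx as nx
    from scipy.sparse.linalg import eigsh
    from sklearn.mixture import GaussianMixture

    def laplacian_mixture(G, k=2):
        L = nx.normalized_laplacian_matrix(G).astype(float)
        # Smallest k + 1 eigenpairs; drop the trivial constant eigenvector.
        vals, vecs = eigsh(L, k=k + 1, which='SM')
        X = vecs[:, 1:]
        gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
        return gmm.predict_proba(X)  # rows: nodes; columns: regions of influence

    memberships = laplacian_mixture(nx.karate_club_graph(), k=2)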
Finding event-specific influencers in dynamic social networks
Reputation models are widely used today on commercial transaction (eBay), product review (Amazon, Epinions), and news commentary (Slashdot) websites. The purpose of these reputation models is to provide behavioral or informational data that future users can consult to decide whether to trust the data. These models depend on explicit feedback mechanisms in which users rate products, other users, or information. However, for many popular social network information sources on the web, no such explicit feedback systems exist through which users rate information, so consumers of that information have no ready way to judge the trustworthiness of the data source or the data itself.
Here I describe the layers of the problem of determining reputation among users or data during events discussed on social networks, and evaluate data and network analysis methods from various disciplines that may implicitly infer user or data reputation from metadata, user relationships, and user actions in social networks. I demonstrate that the HITS algorithm is not effective at finding influential users, and propose a new algorithm, demonstrating its effectiveness for finding influential users during an event.
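The HITS baseline referred to above can be reproduced in a few lines with networkx; the directed interaction graph here (an edge u -> v meaning u mentioned or replied to v) is a hypothetical toy, and the author's proposed algorithm is not shown:

    # Sketch: hub/authority scores from HITS on a toy interaction graph.
    import networkx as nx

    G = nx.DiGraph([("alice", "bob"), ("carol", "bob"), ("bob", "dave")])
    hubs, authorities = nx.hits(G, max_iter=1000, tol=1e-8)
    # A high authority score marks a frequently pointed-to user; the abstract
    # argues this correlates poorly with event-specific influence.
    print(sorted(authorities, key=authorities.get, reverse=True))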
On the radius of centrality in evolving communication networks
In this article, we investigate how the choice of the attenuation factor in an extended version of Katz centrality influences the centrality of the nodes in evolving communication networks. For given snapshots of a network, observed over a period of time, recently developed communicability indices aim to identify the best broadcasters and listeners (receivers) in the network. Here we explore the constraint on the attenuation factor in relation to the spectral radius (the largest eigenvalue) of the network at any point in time, and its computation in the case of large networks. We compare three different communicability measures: standard, exponential, and relaxed (where the spectral radius bound on the attenuation factor is relaxed and the adjacency matrix is normalised in order to maintain the convergence of the measure). Furthermore, using a vitality-based measure of both standard and relaxed communicability indices, we look at ways of establishing the most important individuals for the broadcasting and receiving of messages related to community bridging roles. We compare those measures with the scores produced by an iterative version of the PageRank algorithm and illustrate our findings with three examples of real-life evolving networks: the MIT reality mining data set, consisting of daily communications between 106 individuals over a period of one year; a UK Twitter mentions network, constructed from the direct tweets between 12.4k individuals during one week; and a subset of the Enron email data set.
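The spectral-radius constraint on the attenuation factor can be made concrete with the standard resolvent form of Katz centrality on a single snapshot; this sketch assumes that form and does not reproduce the article's dynamic communicability indices:

    # Sketch: Katz centrality converges only for alpha < 1 / spectral_radius(A).
    import numpy as np
    import networkx as nx

    G = nx.karate_club_graph()
    A = nx.to_numpy_array(G)
    spectral_radius = max(abs(np.linalg.eigvals(A)))
    alpha = 0.85 / spectral_radius          # safely inside the convergence bound
    n = len(A)
    katz = np.linalg.solve(np.eye(n) - alpha * A, np.ones(n))
    # For evolving networks, one such resolvent per snapshot is chained into
    # broadcast/receive communicability scores.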