2,022 research outputs found
Ranking Semantics Based on Subgraphs Analysis
An abstract argumentation framework [15] consists of a direct graph where nodes represent arguments and arrows represent an attack relation among arguments. A semantics is used to evaluate arguments’ acceptability. In the labelling approach [7], this evaluation is done by assigning to each argument a label in, out or undec, meaning that the argument is considered consistently acceptable, non-acceptable or undecided (i.e. no decision can be taken on arguments’ acceptability)
HitFraud: A Broad Learning Approach for Collective Fraud Detection in Heterogeneous Information Networks
On electronic game platforms, different payment transactions have different
levels of risk. Risk is generally higher for digital goods in e-commerce.
However, it differs based on product and its popularity, the offer type
(packaged game, virtual currency to a game or subscription service), storefront
and geography. Existing fraud policies and models make decisions independently
for each transaction based on transaction attributes, payment velocities, user
characteristics, and other relevant information. However, suspicious
transactions may still evade detection and hence we propose a broad learning
approach leveraging a graph based perspective to uncover relationships among
suspicious transactions, i.e., inter-transaction dependency. Our focus is to
detect suspicious transactions by capturing common fraudulent behaviors that
would not be considered suspicious when being considered in isolation. In this
paper, we present HitFraud that leverages heterogeneous information networks
for collective fraud detection by exploring correlated and fast evolving
fraudulent behaviors. First, a heterogeneous information network is designed to
link entities of interest in the transaction database via different semantics.
Then, graph based features are efficiently discovered from the network
exploiting the concept of meta-paths, and decisions on frauds are made
collectively on test instances. Experiments on real-world payment transaction
data from Electronic Arts demonstrate that the prediction performance is
effectively boosted by HitFraud with fast convergence where the computation of
meta-path based features is largely optimized. Notably, recall can be improved
up to 7.93% and F-score 4.62% compared to baselines.Comment: ICDM 201
Discovering Informative Connection Subgraphs in Multi-Relational Graphs
Discovering patterns in graphs has long been an area of interest. In most approaches to such pattern discovery either quantitative anomalies, frequency of substructure or maximum flow is used to measure the interestingness of a pattern. In this paper we introduce heuristics that guide a subgraph discovery algorithm away from banal paths towards more informative ones. Given an RDF graph a user might pose a question of the form: What are the most relevant ways in which entity X is related to entity Y? the response to which is a subgraph connecting X to Y. We use our heuristics to discover informative subgraphs within RDF graphs. Our heuristics are based on weighting mechanisms derived from edge semantics suggested by the RDF schema. We present an analysis of the quality of the subgraphs generated with respect to path ranking metrics. We then conclude presenting intuitions about which of our weighting schemes and heuristics produce higher quality subgraphs
An approach to graph-based analysis of textual documents
In this paper a new graph-based model is proposed for the representation of textual documents. Graph-structures are obtained from textual documents by making use of the well-known Part-Of-Speech (POS) tagging technique. More specifically, a simple rule-based (re) classifier is used to map each tag onto graph vertices and edges. As a result, a decomposition of textual documents is obtained where tokens are automatically parsed and attached to either a vertex or an edge. It is shown how textual documents can be aggregated through their graph-structures and finally, it is shown how vertex-ranking methods can be used to find relevant tokens.(1)
Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs
Many problems in areas as diverse as recommendation systems, social network
analysis, semantic search, and distributed root cause analysis can be modeled
as pattern search on labeled graphs (also called "heterogeneous information
networks" or HINs). Given a large graph and a query pattern with node and edge
label constraints, a fundamental challenge is to nd the top-k matches ac-
cording to a ranking function over edge and node weights. For users, it is di
cult to select value k . We therefore propose the novel notion of an any-k
ranking algorithm: for a given time budget, re- turn as many of the top-ranked
results as possible. Then, given additional time, produce the next lower-ranked
results quickly as well. It can be stopped anytime, but may have to continues
until all results are returned. This paper focuses on acyclic patterns over
arbitrary labeled graphs. We are interested in practical algorithms that
effectively exploit (1) properties of heterogeneous networks, in particular
selective constraints on labels, and (2) that the users often explore only a
fraction of the top-ranked results. Our solution, KARPET, carefully integrates
aggressive pruning that leverages the acyclic nature of the query, and
incremental guided search. It enables us to prove strong non-trivial time and
space guarantees, which is generally considered very hard for this type of
graph search problem. Through experimental studies we show that KARPET achieves
running times in the order of milliseconds for tree patterns on large networks
with millions of nodes and edges.Comment: To appear in WWW 201
Integrating and Ranking Uncertain Scientific Data
Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainties, such multiple expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show that: (i) explicit modeling of uncertainties as probabilities increases our ability to predict less-known or previously unknown functions (though it does not improve predicting the well-known). This suggests that probabilistic uncertainty models offer utility for scientific knowledge discovery; (ii) small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings. This suggests that our methods are robust against slight variations in the way uncertainties are transformed into probabilities; and (iii) several techniques allow us to evaluate our probabilistic rankings efficiently. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates
- …