Automatic Metadata Generation using Associative Networks
In spite of its tremendous value, metadata is generally sparse and
incomplete, thereby hampering the effectiveness of digital information
services. Many of the existing mechanisms for the automated creation of
metadata rely primarily on content analysis, which can be costly and
inefficient. The automatic metadata generation system proposed in this article
leverages resource relationships generated from existing metadata as a medium
for propagation from metadata-rich to metadata-poor resources. Because of its
independence from content analysis, it can be applied to a wide variety of
resource media types and is shown to be computationally inexpensive. The
proposed method operates in two distinct phases. First, occurrence and
co-occurrence algorithms generate an associative network of repository
resources from existing repository metadata. Second, using the
associative network as a substrate, metadata associated with metadata-rich
resources is propagated to metadata-poor resources by means of a discrete-form
spreading activation algorithm. This article discusses the general framework
for building associative networks, an algorithm for disseminating metadata
through such networks, and the results of an experiment and validation of the
proposed method using a standard bibliographic dataset.
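The two phases described above can be sketched roughly as follows. This is not the authors' algorithm: the abstract does not specify the occurrence/co-occurrence weighting or the activation rule, so the sketch assumes that resources are linked when they share a metadata token (e.g. an author), that edges are weighted by a cosine-normalized count of shared tokens, and that activation spreads for a fixed number of discrete steps with a decay factor. All function names, parameters, and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_associative_network(links):
    """Phase 1 (assumed form): link resources that share metadata tokens.

    links: resource id -> set of tokens from existing metadata (e.g. authors).
    Edge weight = shared tokens / sqrt(|tokens_a| * |tokens_b|).
    """
    index = defaultdict(set)  # occurrence: token -> resources carrying it
    for rid, tokens in links.items():
        for token in tokens:
            index[token].add(rid)
    shared = defaultdict(int)  # co-occurrence: resource pair -> shared tokens
    for rids in index.values():
        for a, b in combinations(sorted(rids), 2):
            shared[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), n in shared.items():
        w = n / (len(links[a]) * len(links[b])) ** 0.5
        graph[a][b] = w
        graph[b][a] = w  # undirected network
    return {node: dict(nbrs) for node, nbrs in graph.items()}


def spread_activation(graph, source, steps=2, decay=0.5):
    """Discrete spreading activation: energy moves one hop per step."""
    activation = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, energy in frontier.items():
            for nbr, w in graph.get(node, {}).items():
                nxt[nbr] += energy * w * decay
        for node, energy in nxt.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = nxt
    return activation


def suggest_keywords(graph, keywords, target, steps=2, decay=0.5, top_k=3):
    """Phase 2: score metadata of activated (metadata-rich) resources."""
    scores = defaultdict(float)
    for node, energy in spread_activation(graph, target, steps, decay).items():
        if node == target:
            continue  # the metadata-poor target contributes nothing
        for kw in keywords.get(node, ()):
            scores[kw] += energy
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [kw for kw, _ in ranked[:top_k]]


# Hypothetical toy repository: r2 has authors but no subject keywords.
LINKS = {"r1": {"alice", "bob"}, "r2": {"alice", "carol"},
         "r3": {"bob"}, "r4": {"dave"}}
KEYWORDS = {"r1": {"networks", "metadata"}, "r3": {"metadata", "retrieval"},
            "r4": {"biology"}}

graph = build_associative_network(LINKS)
print(suggest_keywords(graph, KEYWORDS, "r2"))  # → ['metadata', 'networks', 'retrieval']
```

Here r2 inherits keywords from r1 (a direct co-author neighbour) and, more weakly, from r3 (two hops away), while the unconnected r4 contributes nothing. Because only existing metadata is used, no content analysis of the resources themselves is needed.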
The Child is Father of the Man: Foresee the Success at the Early Stage
Understanding the dynamic mechanisms that drive high-impact scientific
work (e.g., research papers, patents) is a long-debated research topic and has
many important implications, ranging from personal career development and
recruitment search to the allocation of research resources. Recent advances
in characterizing and modeling scientific success have made it possible to
forecast the long-term impact of scientific work, where data mining techniques,
supervised learning in particular, play an essential role. Despite much
progress, several key algorithmic challenges in predicting
long-term scientific impact remain largely open. In this paper, we
propose a joint predictive model to forecast the long-term scientific impact at
the early stage, which simultaneously addresses a number of these open
challenges, including the scholarly feature design, the non-linearity, the
domain heterogeneity, and dynamics. In particular, we formulate the task as a
regularized optimization problem and propose effective and scalable algorithms
to solve it. We perform extensive empirical evaluations on large, real-world
scholarly datasets to validate the effectiveness and efficiency of our
method. Comment: Correct some typos in our KDD paper.
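The abstract frames impact prediction as a regularized optimization problem. The paper's joint model also handles non-linearity, domain heterogeneity, and dynamics, none of which is shown here; the sketch below only illustrates the basic building block, plain ridge regression, which minimizes ||Xw - y||^2 + lambda * ||w||^2 and has the closed-form solution w = (X^T X + lambda * I)^(-1) X^T y. The feature names and data are hypothetical, and no intercept term is fitted.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X^T X + lam*I)^(-1) X^T y."""
    d = len(X[0])
    # X^T X + lam*I is positive definite for lam > 0, so the system is solvable.
    gram = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(gram, rhs)


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical early-stage features per paper: [early citations, venue score];
# target: a long-term impact score observed years later.
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
y = [2, 3, 5, 7]
w = fit_ridge(X, y, lam=1.0)
print(w, predict(w, [1, 2]))
```

Larger values of `lam` shrink the weight vector toward zero, trading some fit on the training papers for stability on unseen ones; a domain-aware variant could, for instance, fit one weight vector per research field and penalize its distance from a shared one, though that is an assumption rather than the paper's stated formulation.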
HitFraud: A Broad Learning Approach for Collective Fraud Detection in Heterogeneous Information Networks
On electronic game platforms, different payment transactions have different
levels of risk. Risk is generally higher for digital goods in e-commerce.
However, it varies with the product and its popularity, the offer type
(packaged game, virtual currency for a game, or subscription service), the
storefront, and the geography. Existing fraud policies and models make decisions independently
for each transaction based on transaction attributes, payment velocities, user
characteristics, and other relevant information. However, suspicious
transactions may still evade detection; hence, we propose a broad learning
approach leveraging a graph-based perspective to uncover relationships among
suspicious transactions, i.e., inter-transaction dependency. Our focus is to
detect suspicious transactions by capturing common fraudulent behaviors that
would not appear suspicious when considered in isolation. In this
paper, we present HitFraud, which leverages heterogeneous information networks
for collective fraud detection by exploring correlated and fast-evolving
fraudulent behaviors. First, a heterogeneous information network is designed to
link entities of interest in the transaction database via different semantics.
Then, graph-based features are efficiently discovered from the network by
exploiting the concept of meta-paths, and fraud decisions are made
collectively on test instances. Experiments on real-world payment transaction
data from Electronic Arts demonstrate that HitFraud effectively boosts
prediction performance and converges quickly, while the computation of
meta-path-based features is largely optimized. Notably, recall can be improved
by up to 7.93% and F-score by up to 4.62% compared to baselines. Comment: ICDM 201
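The combination of meta-path features and collective decisions can be sketched as follows. HitFraud's actual network schema, feature definitions, and classifier are not given in the abstract, so this sketch assumes three meta-paths (transaction-user-transaction, transaction-card-transaction, transaction-ip-transaction), one feature per meta-path equal to the mean current fraud score of the transactions reachable along it, and a plain logistic regression that is retrained each round as test-instance scores are updated. All entity names, parameters, and data are hypothetical.

```python
import math
from collections import defaultdict

ENTITY_TYPES = ("user", "card", "ip")  # hypothetical meta-paths T-E-T


def meta_path_neighbors(txns):
    """For each entity type E, map transaction -> transactions sharing its E."""
    nbrs = {}
    for etype in ENTITY_TYPES:
        by_entity = defaultdict(set)
        for tid, ents in txns.items():
            by_entity[ents[etype]].add(tid)
        nbrs[etype] = {tid: by_entity[ents[etype]] - {tid} for tid, ents in txns.items()}
    return nbrs


def features(tid, nbrs, scores):
    """Bias term plus, per meta-path, the mean fraud score of its neighbours."""
    feats = [1.0]
    for etype in ENTITY_TYPES:
        ns = nbrs[etype][tid]
        feats.append(sum(scores[n] for n in ns) / len(ns) if ns else 0.0)
    return feats


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Logistic regression fitted by per-sample gradient ascent on the log-likelihood."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, t in zip(X, y):
            p = sigmoid(dot(w, x))
            w = [wi + lr * (t - p) * xi for wi, xi in zip(w, x)]
    return w


def collective_classify(txns, labels, iters=10):
    """labels: tid -> 0/1 for training transactions; the rest are test instances."""
    nbrs = meta_path_neighbors(txns)
    train = list(labels)
    test = [t for t in txns if t not in labels]
    # Assumption: test transactions start from an uninformative score of 0.5.
    scores = {t: float(labels.get(t, 0.5)) for t in txns}
    for _ in range(iters):
        X = [features(t, nbrs, scores) for t in train]
        w = train_logistic(X, [labels[t] for t in train])
        # Score all test transactions from the same snapshot, then update together.
        new = {t: sigmoid(dot(w, features(t, nbrs, scores))) for t in test}
        scores.update(new)
    return {t: scores[t] for t in test}


# Hypothetical data: t1, t2 (fraud) and t5 share an IP; t1 and t5 share a user.
TXNS = {
    "t1": {"user": "u1", "card": "c1", "ip": "ip1"},
    "t2": {"user": "u2", "card": "c2", "ip": "ip1"},
    "t3": {"user": "u3", "card": "c3", "ip": "ip3"},
    "t4": {"user": "u4", "card": "c4", "ip": "ip4"},
    "t5": {"user": "u1", "card": "c5", "ip": "ip1"},
    "t6": {"user": "u6", "card": "c6", "ip": "ip6"},
}
LABELS = {"t1": 1, "t2": 1, "t3": 0, "t4": 0}
print(collective_classify(TXNS, LABELS))
```

Neither test transaction has suspicious attributes of its own, yet t5 receives a higher fraud score than t6 purely because of the transactions it is linked to, which is the inter-transaction dependency the abstract describes. Re-scoring test instances over several rounds lets each round's predictions feed into the next round's meta-path features.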