31,811 research outputs found
Knowledge-rich Image Gist Understanding Beyond Literal Meaning
We investigate the problem of understanding the message (gist) conveyed by
images and their captions as found, for instance, on websites or news articles.
To this end, we propose a methodology to capture the meaning of image-caption
pairs on the basis of large amounts of machine-readable knowledge that has
previously been shown to be highly effective for text understanding. Our method
identifies the connotation of objects beyond their denotation: where most
approaches to image understanding focus on the denotation of objects, i.e.,
their literal meaning, our work addresses the identification of connotations,
i.e., iconic meanings of objects, to understand the message of images. We view
image understanding as the task of representing an image-caption pair on the
basis of a wide-coverage vocabulary of concepts such as the one provided by
Wikipedia, and cast gist detection as a concept-ranking problem with
image-caption pairs as queries. To enable a thorough investigation of the
problem of gist understanding, we produce a gold standard of over 300
image-caption pairs and over 8,000 gist annotations covering a wide variety of
topics at different levels of abstraction. We use this dataset to
experimentally benchmark the contribution of signals from heterogeneous
sources, namely image and text. The best result with a Mean Average Precision
(MAP) of 0.69 indicate that by combining both dimensions we are able to better
understand the meaning of our image-caption pairs than when using language or
vision information alone. We test the robustness of our gist detection approach
when receiving automatically generated input, i.e., using automatically
generated image tags or generated captions, and prove the feasibility of an
end-to-end automated process
Integrating and Ranking Uncertain Scientific Data
Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainties, such multiple expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show that: (i) explicit modeling of uncertainties as probabilities increases our ability to predict less-known or previously unknown functions (though it does not improve predicting the well-known). This suggests that probabilistic uncertainty models offer utility for scientific knowledge discovery; (ii) small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings. This suggests that our methods are robust against slight variations in the way uncertainties are transformed into probabilities; and (iii) several techniques allow us to evaluate our probabilistic rankings efficiently. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates
HitFraud: A Broad Learning Approach for Collective Fraud Detection in Heterogeneous Information Networks
On electronic game platforms, different payment transactions have different
levels of risk. Risk is generally higher for digital goods in e-commerce.
However, it differs based on product and its popularity, the offer type
(packaged game, virtual currency to a game or subscription service), storefront
and geography. Existing fraud policies and models make decisions independently
for each transaction based on transaction attributes, payment velocities, user
characteristics, and other relevant information. However, suspicious
transactions may still evade detection and hence we propose a broad learning
approach leveraging a graph based perspective to uncover relationships among
suspicious transactions, i.e., inter-transaction dependency. Our focus is to
detect suspicious transactions by capturing common fraudulent behaviors that
would not be considered suspicious when being considered in isolation. In this
paper, we present HitFraud that leverages heterogeneous information networks
for collective fraud detection by exploring correlated and fast evolving
fraudulent behaviors. First, a heterogeneous information network is designed to
link entities of interest in the transaction database via different semantics.
Then, graph based features are efficiently discovered from the network
exploiting the concept of meta-paths, and decisions on frauds are made
collectively on test instances. Experiments on real-world payment transaction
data from Electronic Arts demonstrate that the prediction performance is
effectively boosted by HitFraud with fast convergence where the computation of
meta-path based features is largely optimized. Notably, recall can be improved
up to 7.93% and F-score 4.62% compared to baselines.Comment: ICDM 201
- …