Visualizing Image Content to Explain Novel Image Discovery
The initial analysis of any large data set can be divided into two phases:
(1) the identification of common trends or patterns and (2) the identification
of anomalies or outliers that deviate from those trends. We focus on the goal
of detecting observations with novel content, which can alert us to artifacts
in the data set or, potentially, the discovery of previously unknown phenomena.
To aid in interpreting and diagnosing the novel aspect of these selected
observations, we recommend the use of novelty detection methods that generate
explanations. In the context of large image data sets, these explanations
should highlight what aspect of a given image is new (color, shape, texture,
content) in a human-comprehensible form. We propose DEMUD-VIS, the first method
for providing visual explanations of novel image content by employing a
convolutional neural network (CNN) to extract image features, a method that
uses reconstruction error to detect novel content, and an up-convolutional
network to convert CNN feature representations back into image space. We
demonstrate this approach on diverse images from ImageNet, freshwater streams,
and the surface of Mars.
Comment: Under Review
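The core detection step described above scores each new observation by how poorly a low-rank model of previously seen data can reconstruct it. The sketch below is a simplified, hypothetical illustration of that idea using an SVD basis over random stand-in vectors in place of real CNN features; it is not the authors' DEMUD-VIS implementation, and the dimensions and rank are arbitrary choices.

```python
import numpy as np

def reconstruction_error(basis, x):
    """Norm of the residual after projecting x onto the learned basis."""
    proj = basis @ (basis.T @ x)
    return float(np.linalg.norm(x - proj))

def rank_by_novelty(seen, candidates, k=5):
    """Rank candidate feature vectors (columns) by reconstruction error
    against a rank-k SVD model of previously seen data (columns)."""
    U, _, _ = np.linalg.svd(seen, full_matrices=False)
    basis = U[:, :k]
    scores = [reconstruction_error(basis, x) for x in candidates.T]
    order = np.argsort(scores)[::-1]  # most novel (largest error) first
    return order, scores

rng = np.random.default_rng(0)
seen = rng.normal(size=(64, 100))         # stand-in for CNN feature vectors
normal = rng.normal(size=(64, 1))         # resembles the seen data
outlier = normal + 10 * np.ones((64, 1))  # shifted, hence novel
candidates = np.hstack([normal, outlier])
order, scores = rank_by_novelty(seen, candidates)
print(order[0])  # index of the most novel candidate
```

In the paper's pipeline, the residual (the part of the feature vector the model cannot explain) is what gets mapped back into image space by the up-convolutional network to visualize *what* is novel.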
Scalable Privacy-Compliant Virality Prediction on Twitter
The digital town hall of Twitter has become a preferred medium of communication
for individuals and organizations across the globe. Some reach audiences of
millions, while others struggle to get noticed. Given the impact of social
media, the question remains more relevant than ever: how can we model the
dynamics of attention on Twitter? Researchers around the world turn to machine
learning to predict the most influential tweets and authors, navigating the
volume, velocity, and variety of social big data, often with many compromises. In
this paper, we revisit content popularity prediction on Twitter. We argue that
strict alignment of data acquisition, storage and analysis algorithms is
necessary to avoid the common trade-offs between scalability, accuracy and
privacy compliance. We propose a new framework for the rapid acquisition of
large-scale datasets, a high-accuracy supervisory signal, and multilanguage
sentiment prediction while respecting every applicable privacy request. We then
apply a novel gradient boosting framework to achieve state-of-the-art results
in virality ranking, even before including a tweet's visual or propagation
features. Our Gradient Boosted Regression Tree is the first to offer
explainable, strong ranking performance on benchmark datasets. Since the
analysis focuses on features available early, the model is immediately
applicable to incoming tweets in 18 languages.
Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective Content Analysis
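Gradient-boosted regression trees of the kind the abstract describes fit an ensemble stage by stage, where each stage regresses on the residual of the current ensemble. The minimal sketch below implements that idea from scratch with depth-1 trees (stumps) under squared loss; it is a generic illustration, not the paper's framework, and the feature/target names are hypothetical stand-ins for early tweet features and a virality score.

```python
import numpy as np

class StumpBooster:
    """Minimal gradient boosting with regression stumps under squared
    loss: each stage fits the residual of the ensemble so far."""

    def __init__(self, n_stages=50, lr=0.1):
        self.n_stages, self.lr = n_stages, lr
        self.stumps = []

    def _fit_stump(self, X, r):
        # Exhaustively pick the (feature, threshold) split minimizing SSE.
        best = None
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                left = X[:, j] <= t
                if left.all() or (~left).all():
                    continue
                lv, rv = r[left].mean(), r[~left].mean()
                err = ((r - np.where(left, lv, rv)) ** 2).sum()
                if best is None or err < best[0]:
                    best = (err, j, t, lv, rv)
        return best[1:]

    def fit(self, X, y):
        self.base = y.mean()
        r = y - self.base
        for _ in range(self.n_stages):
            j, t, lv, rv = self._fit_stump(X, r)
            self.stumps.append((j, t, lv, rv))
            r -= self.lr * np.where(X[:, j] <= t, lv, rv)
        return self

    def predict(self, X):
        pred = np.full(len(X), self.base)
        for j, t, lv, rv in self.stumps:
            pred += self.lr * np.where(X[:, j] <= t, lv, rv)
        return pred

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 3))                 # stand-in early features
y = 5 * X[:, 0] + 0.1 * rng.normal(size=200)   # stand-in virality signal
pred = StumpBooster().fit(X, y).predict(X)
```

Ranking tweets by predicted score then amounts to sorting `pred`; stump ensembles also expose which features drive each split, which is one route to the explainability the abstract claims.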
Seeing Behind the Camera: Identifying the Authorship of a Photograph
We introduce the novel problem of identifying the photographer behind a
photograph. To explore the feasibility of current computer vision techniques to
address this problem, we created a new dataset of over 180,000 images taken by
41 well-known photographers. Using this dataset, we examined the effectiveness
of a variety of features (low and high-level, including CNN features) at
identifying the photographer. We also trained a new deep convolutional neural
network for this task. Our results show that high-level features greatly
outperform low-level features. We provide qualitative results using these
learned models that give insight into our method's ability to distinguish
between photographers, and allow us to draw interesting conclusions about what
specific photographers shoot. We also demonstrate two applications of our
method.
Comment: Dataset downloadable at http://www.cs.pitt.edu/~chris/photographer To
Appear in CVPR 201
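The finding that high-level features outperform low-level ones suggests each photographer occupies a distinct region of feature space. A minimal, hypothetical sketch of that intuition is a nearest-centroid classifier over per-photographer mean feature vectors, shown below with synthetic stand-ins for CNN features; the paper's trained models are more sophisticated than this.

```python
import numpy as np

def fit_centroids(features, labels):
    """Mean high-level feature vector per photographer."""
    classes = sorted(set(labels))
    centroids = np.stack(
        [features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_photographer(classes, centroids, x):
    """Assign a photo's features to the nearest photographer centroid."""
    dists = np.linalg.norm(centroids - x, axis=1)
    return classes[int(np.argmin(dists))]

rng = np.random.default_rng(2)
# Two photographers, each clustered around a distinct "style" vector.
style_a, style_b = rng.normal(size=64), rng.normal(size=64)
feats = np.vstack([style_a + 0.1 * rng.normal(size=(30, 64)),
                   style_b + 0.1 * rng.normal(size=(30, 64))])
labels = np.array(["photographer_a"] * 30 + ["photographer_b"] * 30)
classes, centroids = fit_centroids(feats, labels)
guess = predict_photographer(classes, centroids, style_a)
```

Inspecting which feature dimensions separate the centroids is one simple way to probe what a given photographer tends to shoot.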