Web-Scale Training for Face Identification
Scaling machine learning methods to very large datasets has attracted
considerable attention in recent years, thanks to easy access to ubiquitous
sensing and data from the web. We study face recognition and show that three
distinct properties have surprising effects on the transferability of deep
convolutional networks (CNNs): (1) the bottleneck of the network serves as an
important transfer learning regularizer; (2) contrary to common wisdom,
performance saturation may exist in CNNs (as the number of training samples
grows), and we propose alleviating it by replacing the naive random
subsampling of the training set with a bootstrapping process; and (3) there is
a link between the representation norm and the ability to
discriminate in a target domain, which sheds light on how such networks
represent faces. Based on these discoveries, we are able to improve face
recognition accuracy on the widely used LFW benchmark, both in the verification
(1:1) and identification (1:N) protocols, and, for the first time, compare
directly against the state-of-the-art Commercial Off-The-Shelf (COTS) system,
showing a sizable leap in performance.
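The contrast between naive random subsampling and a bootstrapping process can be illustrated with a minimal sketch. The abstract does not say how the bootstrapped samples are chosen, so the criterion used here (keep the samples on which a previously trained model has the highest loss) is an assumption, and the function names and loss values are hypothetical.

```python
import random


def random_subsample(losses, k, seed=0):
    """Baseline: pick k sample indices uniformly at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(losses)), k))


def bootstrap_subsample(losses, k):
    """Assumed bootstrapping rule: keep the k samples with the highest loss."""
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical per-sample losses produced by a previously trained model.
losses = [0.05, 1.20, 0.10, 0.90, 0.02, 2.30]

random_indices = random_subsample(losses, 3)
hard_indices = bootstrap_subsample(losses, 3)
```

In this sketch the next training round would use `hard_indices` instead of `random_indices`, so the added data concentrates on examples the current model still gets wrong.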
Analysis and Forecasting of Trending Topics in Online Media Streams
Among the vast information available on the web, social media streams capture
what people currently pay attention to and how they feel about certain topics.
Awareness of such trending topics plays a crucial role in multimedia systems
such as trend-aware recommendation and automatic vocabulary selection for video
concept detection systems.
Correctly utilizing trending topics requires a better understanding of their
various characteristics in different social media streams. To this end, we
present the first comprehensive study across three major online and social
media streams, Twitter, Google, and Wikipedia, covering thousands of trending
topics during an observation period of an entire year. Our results indicate
that, depending on one's requirements, one does not necessarily have to turn to
Twitter for information about current events, and that some media streams
strongly emphasize content of specific categories. As our second key
contribution, we further present a novel approach for the challenging task of
forecasting the life cycle of trending topics at the very moment they emerge.
Our fully automated approach is based on a nearest neighbor forecasting
technique exploiting our assumption that semantically similar topics exhibit
similar behavior.
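The nearest neighbor idea can be sketched as follows: find the historical topics most similar to a newly emerging one and average their observed view counts as the forecast. This is an illustration under stated assumptions, not the authors' implementation. The feature vectors, the cosine similarity measure, the plain averaging, and all names below are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def forecast(query_vec, history, k=2):
    """Average the view series of the k historical topics most similar to the query.

    history is a list of (feature_vector, daily_views) pairs.
    """
    ranked = sorted(history, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    neighbors = [series for _, series in ranked[:k]]
    horizon = min(len(series) for series in neighbors)
    return [sum(s[t] for s in neighbors) / len(neighbors) for t in range(horizon)]


# Hypothetical past topics: (semantic feature vector, daily page views).
history = [
    ([1.0, 0.0], [100, 80, 60]),
    ([0.9, 0.1], [120, 100, 40]),
    ([0.0, 1.0], [10, 500, 900]),
]

predicted = forecast([1.0, 0.05], history, k=2)
```

Here the first two topics are the nearest neighbors of the query, so the forecast is the element-wise mean of their series. The third topic, which is semantically dissimilar, does not influence the result.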
We demonstrate on a large-scale dataset of Wikipedia page view statistics
that forecasts by the proposed approach are about 9-48k views closer to the
actual viewing statistics compared to baseline methods and achieve a mean
average percentage error of 45-19% for time periods of up to 14 days.
Comment: ACM Multimedia 201
- …