Improving Small Object Proposals for Company Logo Detection
Many modern approaches for object detection are two-staged pipelines. The
first stage identifies regions of interest which are then classified in the
second stage. Faster R-CNN is such an approach for object detection which
combines both stages into a single pipeline. In this paper we apply Faster
R-CNN to the task of company logo detection. Motivated by its weak performance
on small object instances, we examine in detail both the proposal and the
classification stage with respect to a wide range of object sizes. We
investigate the influence of feature map resolution on the performance of those
stages.
Based on theoretical considerations, we introduce an improved scheme for
generating anchor proposals and propose a modification to Faster R-CNN which
leverages higher-resolution feature maps for small objects. We evaluate our
approach on the FlickrLogos dataset improving the RPN performance from 0.52 to
0.71 (MABO) and the detection performance from 0.52 to 0.67 (mAP). Comment: 8 pages, ICMR 201
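The abstract above describes an improved anchor-generation scheme for small objects but does not give its exact parameters. As a minimal illustrative sketch, the snippet below shows how RPN-style anchors are typically enumerated over scales and aspect ratios; extending the scale list toward smaller values (as done here) is one way to cover small logo instances. All scale and ratio values are assumptions, not the paper's settings.

```python
# Illustrative sketch of multi-scale anchor generation for an RPN.
# The extra small scales (2, 4) are an assumption meant to cover
# small object instances; they are not the paper's exact values.
import itertools

def generate_anchors(base_size=16, scales=(2, 4, 8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Return (width, height) anchor shapes, one per scale/ratio pair."""
    anchors = []
    for scale, ratio in itertools.product(scales, ratios):
        area = float(base_size * scale) ** 2   # target anchor area
        w = (area / ratio) ** 0.5              # width so that w * h == area
        h = w * ratio                          # height from aspect ratio
        anchors.append((w, h))
    return anchors

anchors = generate_anchors()  # 5 scales x 3 ratios -> 15 anchor shapes
```

Each anchor shape would then be replicated at every feature-map position; using a higher-resolution feature map for the smallest scales is the kind of modification the paper proposes.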
Ranking News-Quality Multimedia
News editors need to find the photos that best illustrate a news piece and
fulfill news-media quality standards, while being pressed to also find the most
recent photos of live events. Recently, it became common to use social-media
content in the context of news media for its unique value in terms of immediacy
and quality. Consequently, the amount of images to be considered and filtered
through is now too much to be handled by a person. To aid the news editor in
this process, we propose a framework designed to deliver high-quality,
news-press type photos to the user. The framework, composed of two parts, is
based on a ranking algorithm tuned to rank professional media highly and a
visual SPAM detection module designed to filter-out low-quality media. The core
ranking algorithm is leveraged by aesthetic, social and deep-learning semantic
features. Evaluation showed that the proposed framework is effective at finding
high-quality photos (true-positive rate) achieving a retrieval MAP of 64.5% and
a classification precision of 70%. Comment: To appear in ICMR'1
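The framework above combines a SPAM filter with a ranker driven by aesthetic, social, and deep-learning semantic features. As a hedged sketch (not the authors' implementation), one simple way to realize this pipeline is a weighted linear combination of per-feature scores applied after spam filtering; the field names and weights below are hypothetical.

```python
# Illustrative sketch: filter out visual SPAM, then rank the remaining
# photos by a weighted sum of aesthetic, social and semantic scores.
# Feature names, score ranges and weights are assumptions.
def rank_photos(photos, weights=(0.4, 0.3, 0.3)):
    """photos: list of dicts with 'aesthetic', 'social', 'semantic'
    scores in [0, 1] and a boolean 'is_spam' flag."""
    wa, ws, wd = weights
    kept = [p for p in photos if not p["is_spam"]]      # SPAM filtering stage
    scored = [(wa * p["aesthetic"] + ws * p["social"] + wd * p["semantic"], p)
              for p in kept]                            # core ranking score
    scored.sort(key=lambda pair: pair[0], reverse=True) # best photo first
    return [p for _, p in scored]
```

In practice each score would come from a trained model (e.g. a CNN for the semantic features), and the weights would be tuned on labeled news-media data.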
VideoAnalysis4ALL: An On-line Tool for the Automatic Fragmentation and Concept-based Annotation, and the Interactive Exploration of Videos.
This paper presents the VideoAnalysis4ALL tool that supports the automatic fragmentation and concept-based annotation of videos, and the exploration of the annotated video fragments through an interactive user interface. The developed web application decomposes the video into two different granularities, namely shots and scenes, and annotates each fragment by evaluating the existence of a number (several hundreds) of high-level visual concepts in the keyframes extracted from these fragments. Through this analysis the tool enables the identification and labeling of semantically coherent video fragments, while its user interfaces allow the discovery of these fragments with the help of human-interpretable concepts. The integrated state-of-the-art video analysis technologies perform very well and, by exploiting the processing capabilities of multi-thread / multi-core architectures, reduce the time required for analysis to approximately one third of the video’s duration, thus making the analysis three times faster than real-time processing.
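The annotation step described above evaluates hundreds of concept detectors on the keyframes of each fragment. As a minimal sketch under stated assumptions (score aggregation by maximum over keyframes, and a fixed decision threshold, neither of which is specified in the abstract), fragment labeling could look like this:

```python
# Hypothetical sketch: label a video fragment with the high-level
# concepts whose detector scores on its keyframes pass a threshold.
# Max-pooling over keyframes and the 0.5 threshold are assumptions.
def annotate_fragment(keyframe_scores, threshold=0.5):
    """keyframe_scores: one {concept: score} dict per extracted keyframe.
    Returns the sorted list of concepts labeling the fragment."""
    best = {}
    for scores in keyframe_scores:
        for concept, score in scores.items():
            best[concept] = max(best.get(concept, 0.0), score)  # max over keyframes
    return sorted(c for c, s in best.items() if s >= threshold)
```

The same routine would be run once per shot and once per scene to produce annotations at both granularities.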
PANEL: Challenges for multimedia/multimodal research in the next decade
The multimedia and multimodal community has witnessed an
explosive transformation in recent years, with major
societal impact. Given the unprecedented deployment of
multimedia devices and systems, multimedia research is
critical to our ability and prospects to advance state-of-the-art
technologies and solve real-world challenges facing
society and the nation. To respond to these challenges and
further advance the frontiers of the field of multimedia, this
panel will discuss the challenges and visions that may guide
research in the next ten years.
Exquisitor: Breaking the Interaction Barrier for Exploration of 100 Million Images
In this demonstration, we present Exquisitor, a media explorer capable of learning user preferences in real-time during interactions with the 99.2 million images of YFCC100M. Exquisitor owes its efficiency to innovations in data representation, compression, and indexing. Exquisitor can complete each interaction round, including learning preferences and presenting the most relevant results, in less than 30 ms using only a single CPU core and modest RAM. In short, Exquisitor can bring large-scale interactive learning to standard desktops and laptops, and even high-end mobile devices.
A benchmark of visual storytelling in social media
Media editors in the newsroom are constantly pressed to provide a "like-being there" coverage of live events. Social media provides a disorganised collection of images and videos that media professionals need to grasp before publishing their latest news update. Automated news visual storyline editing with social media content can be very challenging, as it not only entails the task of finding the right content but also making sure that news content evolves coherently over time. To tackle these issues, this paper proposes a benchmark for assessing social media visual storylines. The SocialStories benchmark, comprising a total of 40 curated stories covering sports and cultural events, provides the experimental setup and introduces novel quantitative metrics to perform a rigorous evaluation of visual storytelling with social media data.