2,018 research outputs found
Intelligent visual media processing: when graphics meets vision
The computer graphics and computer vision communities have been working closely together in recent
years, and a variety of algorithms and applications have been developed to analyze and manipulate the visual media
around us. There are three major driving forces behind this phenomenon: i) the availability of big data from the
Internet has created a demand for dealing with the ever-increasing, vast amount of resources; ii) powerful processing
tools, such as deep neural networks, provide effective ways for learning how to deal with heterogeneous visual data;
iii) new data capture devices, such as the Kinect, bridge between algorithms for 2D image understanding and
3D model analysis. These driving forces have emerged only recently, and we believe that the computer graphics
and computer vision communities are still in the beginning of their honeymoon phase. In this work we survey
recent research on how computer vision techniques benefit computer graphics techniques and vice versa, and cover
research on analysis, manipulation, synthesis, and interaction. We also discuss existing problems and suggest
possible further research directions.
RXFOOD: Plug-in RGB-X Fusion for Object of Interest Detection
The emergence of different sensors (Near-Infrared, Depth, etc.) is a remedy
for the limited application scenarios of traditional RGB cameras. The RGB-X
tasks, which rely on RGB input and another type of data input to resolve
specific problems, have become a popular research topic in multimedia. A
crucial part in two-branch RGB-X deep neural networks is how to fuse
information across modalities. Given the tremendous information inside RGB-X
networks, previous works typically apply naive fusion (e.g., average or max
fusion) or only focus on feature fusion at the same scale(s). In this
paper, we propose a novel method called RXFOOD for the fusion of features
across different scales within the same modality branch and from different
modality branches simultaneously in a unified attention mechanism. An Energy
Exchange Module is designed for the interaction of each feature map's energy
matrix, which reflects the inter-relationship of different positions and
different channels inside a feature map. The RXFOOD method can be easily
incorporated into any dual-branch encoder-decoder network as a plug-in module,
and help the original backbone network better focus on important positions and
channels for object of interest detection. Experimental results on RGB-NIR
salient object detection, RGB-D salient object detection, and RGB-Frequency
image manipulation detection demonstrate the clear effectiveness of the
proposed RXFOOD.
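The cross-branch energy exchange described in the abstract can be sketched roughly as follows. This is an illustrative assumption, not the paper's actual implementation: the "energy matrix" is stood in for by a normalized Gram matrix over channels, and each branch's features are re-weighted by attention derived from the other branch's energy matrix, with a residual connection. All function names are hypothetical.

```python
import numpy as np

def energy_matrix(feat):
    # feat: (C, H*W) feature map flattened over spatial positions.
    # Gram-style matrix over channels, capturing channel
    # inter-relationships (a stand-in for the paper's energy matrix).
    g = feat @ feat.T                        # (C, C)
    return g / np.linalg.norm(g)             # normalize by Frobenius norm

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def energy_exchange(feat_rgb, feat_x):
    # Exchange energy across modality branches: each branch is
    # re-weighted by attention computed from the *other* branch's
    # energy matrix, plus a residual (illustrative design only).
    e_rgb = energy_matrix(feat_rgb)
    e_x = energy_matrix(feat_x)
    out_rgb = softmax(e_x, axis=-1) @ feat_rgb + feat_rgb
    out_x = softmax(e_rgb, axis=-1) @ feat_x + feat_x
    return out_rgb, out_x
```

In a real network this operation would act on learned feature tensors at several encoder scales simultaneously; the sketch shows only the single-scale, two-branch interaction.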
Information extraction from multimedia web documents: an open-source platform and testbed
The LivingKnowledge project aimed to enhance the current state of the art in search, retrieval and knowledge management on the web by advancing the use of sentiment and opinion analysis within multimedia applications. To achieve this aim, a diverse set of novel and complementary analysis techniques has been integrated into a single but extensible software platform on which such applications can be built. The platform combines state-of-the-art techniques for extracting facts, opinions and sentiment from multimedia documents, and unlike earlier platforms, it exploits both visual and textual techniques to support multimedia information retrieval. Foreseeing the usefulness of this software in the wider community, the platform has been made generally available as an open-source project. This paper describes the platform design, gives an overview of the analysis algorithms integrated into the system and describes two applications that utilise the system for multimedia information retrieval.
- …