48,052 research outputs found
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we have identified and analyzed gaps within European research effort during our second year.
In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio-
economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown
of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on
requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the
community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our
Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as
National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core
technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research
challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal
challenges
DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image
3D reconstruction from a single image is a key problem in multiple
applications ranging from robotic manipulation to augmented reality. Prior
methods have tackled this problem through generative models which predict 3D
reconstructions as voxels or point clouds. However, these methods can be
computationally expensive and miss fine details. We introduce a new
differentiable layer for 3D data deformation and use it in DeformNet to learn a
model for 3D reconstruction-through-deformation. DeformNet takes an image
input, searches the nearest shape template from a database, and deforms the
template to match the query image. We evaluate our approach on the ShapeNet
dataset and show that - (a) the Free-Form Deformation layer is a powerful new
building block for Deep Learning models that manipulate 3D data (b) DeformNet
uses this FFD layer combined with shape retrieval for smooth and
detail-preserving 3D reconstruction of qualitatively plausible point clouds
with respect to a single query image (c) compared to other state-of-the-art 3D
reconstruction methods, DeformNet quantitatively matches or outperforms their
benchmarks by significant margins. For more information, visit:
https://deformnet-site.github.io/DeformNet-website/ .Comment: 11 pages, 9 figures, NIP
Sketch-based 3D Shape Retrieval using Convolutional Neural Networks
Retrieving 3D models from 2D human sketches has received considerable
attention in the areas of graphics, image retrieval, and computer vision.
Almost always in state of the art approaches a large amount of "best views" are
computed for 3D models, with the hope that the query sketch matches one of
these 2D projections of 3D models using predefined features.
We argue that this two stage approach (view selection -- matching) is
pragmatic but also problematic because the "best views" are subjective and
ambiguous, which makes the matching inputs obscure. This imprecise nature of
matching further makes it challenging to choose features manually. Instead of
relying on the elusive concept of "best views" and the hand-crafted features,
we propose to define our views using a minimalism approach and learn features
for both sketches and views. Specifically, we drastically reduce the number of
views to only two predefined directions for the whole dataset. Then, we learn
two Siamese Convolutional Neural Networks (CNNs), one for the views and one for
the sketches. The loss function is defined on the within-domain as well as the
cross-domain similarities. Our experiments on three benchmark datasets
demonstrate that our method is significantly better than state of the art
approaches, and outperforms them in all conventional metrics.Comment: CVPR 201
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework which puts in evidence the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypothesis assumed and thus, the constraints imposed on the type of video
that each technique is able to address. Expliciting the hypothesis and
constraints makes the framework particularly useful to select a method, given
an application. Another advantage of the proposed organization is that it
allows categorizing newest approaches seamlessly with traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion in the end of
the paper, where we also present the main open issues in the area.Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
table
- …