MirBot: A collaborative object recognition system for smartphones using convolutional neural networks
MirBot is a collaborative application for smartphones that allows users to
perform object recognition. This app can be used to take a photograph of an
object, select the region of interest and obtain the most likely class (dog,
chair, etc.) by means of similarity search using features extracted from a
convolutional neural network (CNN). The answers provided by the system can be
validated by the user so as to improve the results for future queries. All the
images are stored together with a series of metadata, thus enabling a
multimodal incremental dataset labeled with synset identifiers from the WordNet
ontology. This dataset grows continuously thanks to the users' feedback, and is
publicly available for research. This work details the MirBot object
recognition system, analyzes the statistics gathered after more than four years
of usage, describes the image classification methodology, and performs an
exhaustive evaluation using handcrafted features, convolutional neural codes
and different transfer learning techniques. A comparison of various models and
transformation methods shows that the CNN features keep the accuracy of
MirBot constant over time, despite the increasing number of new
classes. The app is freely available at the Apple and Google Play stores.
Comment: Accepted in Neurocomputing, 201
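The similarity-search step described in the abstract above can be illustrated with a minimal sketch. It is not the MirBot implementation: the abstract does not state the distance measure, the number of neighbours, or how votes are combined, so cosine similarity, k-nearest-neighbour voting, the function names, and the toy three-dimensional vectors (standing in for CNN feature vectors) are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_by_similarity(query, gallery, k=3):
    """Return the majority label among the k stored items most similar to query.

    gallery is a list of (feature_vector, label) pairs. In a real system the
    vectors would come from a CNN; here they are hand-written toy values.
    """
    ranked = sorted(
        gallery,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def add_validated_example(gallery, features, label):
    """Store a user-validated example so that later queries can match it."""
    gallery.append((features, label))


# Toy gallery: assumed stand-ins for CNN features of labeled images.
gallery = [
    ([1.0, 0.0, 0.0], "dog"),
    ([0.9, 0.1, 0.0], "dog"),
    ([0.0, 1.0, 0.0], "chair"),
    ([0.0, 0.9, 0.1], "chair"),
]

prediction = classify_by_similarity([1.0, 0.05, 0.0], gallery, k=3)

# User feedback introduces a new class without retraining anything.
add_validated_example(gallery, [0.0, 0.0, 1.0], "lamp")
new_class_prediction = classify_by_similarity([0.0, 0.0, 1.0], gallery, k=1)
```

Because classification here is a lookup over stored examples, adding a validated example is just an append, which is one plausible way to support an incrementally growing set of classes.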
3D Face Tracking and Texture Fusion in the Wild
We present a fully automatic approach to real-time 3D face reconstruction
from monocular in-the-wild videos. Using cascaded-regressor-based face
tracking and 3D Morphable Face Model shape fitting, we obtain a
semi-dense 3D face shape. We further use the texture information from multiple
frames to build a holistic 3D face representation from the video frames. Our
system is able to capture facial expressions and does not require any
person-specific training. We demonstrate the robustness of our approach on the
challenging 300 Videos in the Wild (300-VW) dataset. Our real-time fitting
framework is available as an open source library at http://4dface.org