18,581 research outputs found
MegDet: A Large Mini-Batch Object Detector
The improvements in recent CNN-based object detection works, from R-CNN [11],
Fast/Faster R-CNN [10, 31] to recent Mask R-CNN [14] and RetinaNet [24], mainly
come from new network, new framework, or novel loss design. But mini-batch
size, a key factor in the training, has not been well studied. In this paper,
we propose a Large MiniBatch Object Detector (MegDet) to enable the training
with much larger mini-batch size than before (e.g. from 16 to 256), so that we
can effectively utilize multiple GPUs (up to 128 in our experiments) to
significantly shorten the training time. Technically, we suggest a learning
rate policy and Cross-GPU Batch Normalization, which together allow us to
successfully train a large mini-batch detector in much less time (e.g., from 33
hours to 4 hours), and achieve even better accuracy. The MegDet is the backbone
of our submission (mmAP 52.5%) to COCO 2017 Challenge, where we won the 1st
place of Detection task
Deep Feature-based Face Detection on Mobile Devices
We propose a deep feature-based face detector for mobile devices to detect
user's face acquired by the front facing camera. The proposed method is able to
detect faces in images containing extreme pose and illumination variations as
well as partial faces. The main challenge in developing deep feature-based
algorithms for mobile devices is the constrained nature of the mobile platform
and the non-availability of CUDA enabled GPUs on such devices. Our
implementation takes into account the special nature of the images captured by
the front-facing camera of mobile devices and exploits the GPUs present in
mobile devices without CUDA-based frameorks, to meet these challenges.Comment: ISBA 201
Strategies for Searching Video Content with Text Queries or Video Examples
The large number of user-generated videos uploaded on to the Internet
everyday has led to many commercial video search engines, which mainly rely on
text metadata for search. However, metadata is often lacking for user-generated
videos, thus these videos are unsearchable by current search engines.
Therefore, content-based video retrieval (CBVR) tackles this metadata-scarcity
problem by directly analyzing the visual and audio streams of each video. CBVR
encompasses multiple research topics, including low-level feature design,
feature fusion, semantic detector training and video search/reranking. We
present novel strategies in these topics to enhance CBVR in both accuracy and
speed under different query inputs, including pure textual queries and query by
video examples. Our proposed strategies have been incorporated into our
submission for the TRECVID 2014 Multimedia Event Detection evaluation, where
our system outperformed other submissions in both text queries and video
example queries, thus demonstrating the effectiveness of our proposed
approaches
Deep Learning in the Automotive Industry: Applications and Tools
Deep Learning refers to a set of machine learning techniques that utilize
neural networks with many hidden layers for tasks, such as image
classification, speech recognition, language understanding. Deep learning has
been proven to be very effective in these domains and is pervasively used by
many Internet services. In this paper, we describe different automotive uses
cases for deep learning in particular in the domain of computer vision. We
surveys the current state-of-the-art in libraries, tools and infrastructures
(e.\,g.\ GPUs and clouds) for implementing, training and deploying deep neural
networks. We particularly focus on convolutional neural networks and computer
vision use cases, such as the visual inspection process in manufacturing plants
and the analysis of social media data. To train neural networks, curated and
labeled datasets are essential. In particular, both the availability and scope
of such datasets is typically very limited. A main contribution of this paper
is the creation of an automotive dataset, that allows us to learn and
automatically recognize different vehicle properties. We describe an end-to-end
deep learning application utilizing a mobile app for data collection and
process support, and an Amazon-based cloud backend for storage and training.
For training we evaluate the use of cloud and on-premises infrastructures
(including multiple GPUs) in conjunction with different neural network
architectures and frameworks. We assess both the training times as well as the
accuracy of the classifier. Finally, we demonstrate the effectiveness of the
trained classifier in a real world setting during manufacturing process.Comment: 10 page
Activity recognition from videos with parallel hypergraph matching on GPUs
In this paper, we propose a method for activity recognition from videos based
on sparse local features and hypergraph matching. We benefit from special
properties of the temporal domain in the data to derive a sequential and fast
graph matching algorithm for GPUs.
Traditionally, graphs and hypergraphs are frequently used to recognize
complex and often non-rigid patterns in computer vision, either through graph
matching or point-set matching with graphs. Most formulations resort to the
minimization of a difficult discrete energy function mixing geometric or
structural terms with data attached terms involving appearance features.
Traditional methods solve this minimization problem approximately, for instance
with spectral techniques.
In this work, instead of solving the problem approximatively, the exact
solution for the optimal assignment is calculated in parallel on GPUs. The
graphical structure is simplified and regularized, which allows to derive an
efficient recursive minimization algorithm. The algorithm distributes
subproblems over the calculation units of a GPU, which solves them in parallel,
allowing the system to run faster than real-time on medium-end GPUs
- …