Deep Learning Networks for Real-time Object Detection on Mobile Devices
Activis aims to provide practical help for visually impaired people by developing an Android application for indoor navigation. An object detection model is clearly needed for this purpose. This thesis proposes and analyzes the performance of two models: SSD-Lite with MobileNet V2 and Tiny-DSOD. It also proposes a new dataset, called the Office dataset, for testing the two algorithms.
Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
To address the challenging task of instance-aware human part parsing, a new
bottom-up regime is proposed to learn category-level human semantic
segmentation as well as multi-person pose estimation in a joint and end-to-end
manner. It is a compact, efficient and powerful framework that exploits
structural information over different human granularities and eases the
difficulty of person partitioning. Specifically, a dense-to-sparse projection
field, which allows explicitly associating dense human semantics with sparse
keypoints, is learnt and progressively improved over the network feature
pyramid for robustness. Then, the difficult pixel grouping problem is cast as
an easier, multi-person joint assembling task. By formulating joint association
as maximum-weight bipartite matching, a differentiable solution is developed that
exploits projected gradient descent and Dykstra's cyclic projection algorithm.
This makes our method end-to-end trainable and allows back-propagating the
grouping error to directly supervise multi-granularity human representation
learning. This is distinguished from current bottom-up human parsers or pose
estimators which require sophisticated post-processing or heuristic greedy
algorithms. Experiments on three instance-aware human parsing datasets show
that our model outperforms other bottom-up alternatives with much more
efficient inference. Comment: CVPR 2021 (Oral). Code: https://github.com/tfzhou/MG-HumanParsin
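To make the matching step concrete, here is a minimal NumPy sketch of the general technique the abstract names, not the authors' implementation. It assumes joint association is relaxed to maximizing <W, X> over doubly stochastic matrices X (the Birkhoff polytope) for a hypothetical square score matrix W. The problem is solved by projected gradient descent on -<W, X>, and each projection is computed with Dykstra's cyclic projection onto the row-sum, column-sum, and nonnegativity constraints. The function names, step size, and iteration counts are illustrative assumptions; the actual method runs inside a network and back-propagates through these steps.

```python
import numpy as np


def project_row_sums(x):
    # Euclidean projection onto the affine set {X : every row sums to 1}
    return x - (x.sum(axis=1, keepdims=True) - 1.0) / x.shape[1]


def project_col_sums(x):
    # Euclidean projection onto the affine set {X : every column sums to 1}
    return x - (x.sum(axis=0, keepdims=True) - 1.0) / x.shape[0]


def project_nonneg(x):
    # Euclidean projection onto the nonnegative orthant
    return np.maximum(x, 0.0)


def dykstra_project(z, n_iter=200):
    """Dykstra's cyclic projection of z onto the intersection of the three
    convex sets above, i.e. the set of doubly stochastic matrices."""
    projections = (project_row_sums, project_col_sums, project_nonneg)
    increments = [np.zeros_like(z) for _ in projections]
    x = z.copy()
    for _ in range(n_iter):
        for i, project in enumerate(projections):
            y = project(x + increments[i])
            increments[i] = x + increments[i] - y  # Dykstra correction term
            x = y
    return x


def relaxed_matching(weights, step=0.5, n_steps=50):
    """Projected gradient descent on -<W, X> over doubly stochastic X.

    weights: square (n, n) array of association scores (hypothetical input).
    Returns a soft assignment matrix; the row-wise argmax gives the matching.
    """
    n = weights.shape[0]
    x = np.full((n, n), 1.0 / n)  # start at the barycenter of the polytope
    for _ in range(n_steps):
        # The gradient of -<W, X> w.r.t. X is -W, so a descent step adds step * W
        x = dykstra_project(x + step * weights)
    return x
```

For W = [[0.9, 0.1, 0.2], [0.1, 0.3, 0.8], [0.2, 0.7, 0.1]], the row-wise argmax of relaxed_matching(W) is [0, 2, 1], the maximum-weight assignment. Every step is a composition of additions, subtractions, and clipping. This composition is what makes such a relaxation amenable to back-propagation, unlike a greedy discrete assignment.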
Integral Human Pose Regression
State-of-the-art human pose estimation methods are based on heat map
representation. Despite its good performance, the representation has some
inherent issues, such as non-differentiability and quantization error. This work
shows that a simple integral operation relates and unifies the heat map
representation and joint regression, thus avoiding the above issues. It is
differentiable, efficient, and compatible with any heat-map-based method. Its
effectiveness is convincingly validated via comprehensive ablation experiments
under various settings, specifically on 3D pose estimation, for the first time.
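A minimal sketch of the integral operation follows. It assumes a single 2D heat map of raw scores that is normalized with a softmax. The joint location is then the expectation of the pixel coordinates under the normalized map, an operation often called soft-argmax. The function name and array layout are illustrative assumptions, not the paper's code.

```python
import numpy as np


def integral_2d(logits):
    """Integral (soft-argmax) regression of one joint from a 2D heat map.

    logits: (H, W) array of unnormalized heat-map scores (assumed input).
    Returns the expected (x, y) location in pixel units.
    """
    h, w = logits.shape
    # Softmax normalization so the heat map sums to 1
    p = np.exp(logits - logits.max())
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]  # row (y) and column (x) index grids
    # Expectation of the coordinates under the normalized heat map
    return float((p * xs).sum()), float((p * ys).sum())
```

Because the output is a weighted sum, it is differentiable and not restricted to integer positions. Two equal peaks at columns 4 and 5 of row 2 yield x = 4.5, y = 2. A plain argmax over the heat map cannot produce that sub-pixel estimate.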
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold-weapon era. This paper extensively reviews 400+
papers on object detection in the light of its technical evolution, spanning
over a quarter century (from the 1990s to 2019). A number of topics are
covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, and text detection, and provides an in-depth
analysis of their challenges as well as technical improvements in recent years. Comment: This work has been submitted to the IEEE TPAMI for possible
publication.
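Detection metrics of the kind this survey covers rest on intersection over union (IoU) between a predicted box and a ground-truth box. PASCAL VOC mAP, for example, counts a detection as correct when IoU is at least 0.5. Below is a minimal sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates; the function name is illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp negative widths/heights to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, iou((0, 0, 2, 2), (1, 1, 3, 3)) is 1/7. The overlap is 1 and the union is 4 + 4 - 1 = 7, so this pair would not count as a match at the 0.5 threshold.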
Deep Learning-Based Action Recognition
The classification of human action or behavior patterns is very important for analyzing situations in the field and maintaining social safety. This book focuses on recent research findings on recognizing human action patterns. Technologies for recognizing human action patterns include processing human behavior data for learning, representing image features, extracting spatiotemporal information from images, recognizing human posture, and recognizing gestures. Recent research on these technologies has used general deep learning network models, and notable research results are included in this edition.
Advanced Biometrics with Deep Learning
Biometrics such as fingerprint, iris, face, hand print, hand vein, speech, and gait recognition have become commonplace as a means of identity management in various applications. Biometric systems typically follow a pipeline composed of separate preprocessing, feature extraction, and classification stages. Deep learning, as a data-driven representation learning approach, has been shown to be a promising alternative to conventional data-agnostic and handcrafted preprocessing and feature extraction for biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm that unifies preprocessing, feature extraction, and recognition, based solely on biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into four categories according to biometric modality, namely face biometrics, medical electronic signals (EEG and ECG), voice print, and others.
A Review of Object Visual Detection for Intelligent Vehicles
This paper describes different object detection (OD) techniques. Because of object detection's close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection strategies are built on handcrafted features and shallow trainable models. This review paper presents one such strategy, the Optical Flow Method (OFM). This method is found to be more robust and effective for moving object detection, as demonstrated by an experiment in this review paper. Applying optical flow to an image yields flow vectors for the points corresponding to moving objects. The subsequent step of marking the required moving object of interest belongs to post-processing, and post-processing is the main contribution of this review paper to moving object detection problems. The performance of such traditional methods easily stagnates, even when complex ensembles are built that combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development of deep learning, more powerful tools, which can learn semantic, high-level, deeper features, have been introduced to address the problems existing in traditional architectures. These models differ in network architecture, training strategy, optimization function, and so on. This paper reviews deep learning-based object detection frameworks. Our review begins with a brief introduction to the history of deep learning and its representative tools, namely the Convolutional Neural Network (CNN) and region-based convolutional neural networks (R-CNN).
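The moving-object step described above can be sketched as follows: flow vectors are computed, then a post-processing pass marks the object of interest. The sketch assumes a dense flow field of shape (H, W, 2) has already been computed, for example with OpenCV's calcOpticalFlowFarneback. The magnitude threshold and bounding-box step are hypothetical stand-ins for post-processing, not the review's own method. A practical pipeline would typically also apply morphological filtering to remove noise.

```python
import numpy as np


def moving_object_box(flow, threshold=1.0):
    """Return (x_min, y_min, x_max, y_max) of pixels whose flow magnitude
    exceeds threshold, or None if nothing moves.

    flow: array of shape (H, W, 2) holding per-pixel (dx, dy) displacements.
    """
    # Per-pixel motion magnitude from the flow vectors
    magnitude = np.hypot(flow[..., 0], flow[..., 1])
    # Hypothetical post-processing: threshold the magnitude into a motion mask
    mask = magnitude > threshold
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    # Tightest axis-aligned box around the moving pixels (inclusive bounds)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

As an example, a 6 x 8 flow field that is zero except for a horizontal displacement of 2 pixels in rows 2 to 3 and columns 3 to 5 yields the box (3, 2, 5, 3).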