A Survey of the Recent Architectures of Deep Convolutional Neural Networks
The deep convolutional neural network (CNN) is a special type of neural network that has shown exemplary performance in several competitions related to computer vision and image processing. Exciting application areas of CNNs include image classification and segmentation, object detection, video processing, natural language processing, and speech recognition. The powerful
learning ability of deep CNN is primarily due to the use of multiple feature
extraction stages that can automatically learn representations from the data.
The availability of large amounts of data and improvements in hardware technology have accelerated research in CNNs, and recently many interesting deep CNN architectures have been reported. Several inspiring ideas to bring
advancements in CNNs have been explored, such as the use of different
activation and loss functions, parameter optimization, regularization, and
architectural innovations. However, the significant improvement in the
representational capacity of the deep CNN is achieved through architectural
innovations. Notably, the ideas of exploiting spatial and channel information,
depth and width of architecture, and multi-path information processing have
gained substantial attention. Similarly, the idea of using a block of layers as
a structural unit is also gaining popularity. This survey thus focuses on the
intrinsic taxonomy present in the recently reported deep CNN architectures and,
consequently, classifies the recent innovations in CNN architectures into seven
different categories. These seven categories are based on spatial exploitation,
depth, multi-path, width, feature-map exploitation, channel boosting, and
attention. Additionally, the elementary understanding of CNN components,
current challenges, and applications of CNN are also provided.
Comment: Number of Pages: 70, Number of Figures: 11, Number of Tables: 11. Artif Intell Rev (2020).
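The "multiple feature extraction stages" the abstract credits for CNNs' learning ability can be sketched minimally: each stage is a convolution followed by a nonlinearity, and stages are stacked so later feature maps are computed from earlier ones. A toy NumPy illustration (the kernels here are hypothetical hand-set examples, not learned weights):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: one feature-extraction filter."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Elementwise nonlinearity applied after each convolution."""
    return np.maximum(x, 0.0)

# Two stacked feature-extraction stages, as in a (very shallow) deep CNN.
image = np.random.rand(8, 8)
k1 = np.array([[1.0, 0.0], [0.0, -1.0]])  # hypothetical edge-like kernel
k2 = np.ones((2, 2)) / 4.0                # hypothetical smoothing kernel
stage1 = relu(conv2d(image, k1))          # 7x7 feature map
stage2 = relu(conv2d(stage1, k2))         # 6x6 feature map
print(stage2.shape)  # (6, 6)
```

In a real CNN the kernels are learned by backpropagation and each stage holds many channels; the stacking principle is the same.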
Towards real-time object recognition and pose estimation in point clouds
Object recognition and 6DoF pose estimation are quite challenging tasks in
computer vision applications. Despite efficiency in such tasks, standard
methods deliver far from real-time processing rates. This paper presents a
novel pipeline to estimate a fine 6DoF pose of objects, applied to realistic
scenarios in real time. Our proposal is split into three main parts. First, a color-feature classification module leverages CNN color features pre-trained on ImageNet for object detection. A feature-based registration module then conducts a coarse pose estimation, and finally, a fine-adjustment step performs an ICP-based dense registration. Our proposal
achieves, in the best case, an accuracy of almost 83% on the RGB-D
Scenes dataset. Regarding processing time, the object detection task is done at
a frame processing rate up to 90 FPS, and the pose estimation at almost 14 FPS
in a full execution strategy. We discuss how, owing to the proposal's modularity, the full execution can be run only when necessary, with a scheduled execution that unlocks real-time processing even in multitask situations.
Comment: Accepted as Full paper at VISAPP202
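The fine-adjustment stage described above relies on ICP-style dense registration. One ICP iteration alternates nearest-neighbour matching with a best-fit rigid transform computed via SVD (the Kabsch step). A minimal NumPy sketch, a generic illustration rather than the paper's implementation:

```python
import numpy as np

def icp_step(src, dst):
    """One ICP iteration: match each src point to its nearest dst point,
    then compute the best-fit rigid transform and apply it to src."""
    # Nearest neighbour in dst for every src point (brute force)
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=2)
    matched = dst[np.argmin(d, axis=1)]
    # Best-fit rotation/translation (Kabsch via SVD)
    mu_s, mu_m = src.mean(axis=0), matched.mean(axis=0)
    H = (src - mu_s).T @ (matched - mu_m)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:   # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_m - R @ mu_s
    return src @ R.T + t

# Toy check: recover a small pure translation on well-separated points.
dst = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
src = dst - np.array([0.1, 0.05, -0.05])
aligned = icp_step(src, dst)
print(np.allclose(aligned, dst))  # True
```

In practice the matching uses a k-d tree and the step is iterated until the mean residual stops decreasing, which is where the "fine" 6DoF pose comes from.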
A Survey on Content-Aware Video Analysis for Sports
Sports data analysis is becoming increasingly large-scale, diversified, and
shared, but difficulty persists in rapidly accessing the most crucial
information. Previous surveys have focused on the methodologies of sports video
analysis from the spatiotemporal viewpoint instead of a content-based
viewpoint, and few of these studies have considered semantics. This study
develops a deeper interpretation of content-aware sports video analysis by
examining the insight offered by research into the structure of content under
different scenarios. On the basis of this insight, we provide an overview of
the themes particularly relevant to the research on content-aware systems for
broadcast sports. Specifically, we focus on the video content analysis
techniques applied in sportscasts over the past decade from the perspectives of
fundamentals and general review, a content hierarchical model, and trends and
challenges. Content-aware analysis methods are discussed with respect to
object-, event-, and context-oriented groups. In each group, the gap between
sensation and content excitement must be bridged using proper strategies. In
this regard, a content-aware approach is required to determine user demands.
Finally, the paper summarizes the future trends and challenges for sports video
analysis. We believe that our findings can advance the field of research on
content-aware video analysis for broadcast sports.
Comment: Accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT).
A Survey on Deep Learning Methods for Robot Vision
Deep learning has allowed a paradigm shift in pattern recognition, from using
hand-crafted features together with statistical classifiers to using
general-purpose learning procedures for learning data-driven representations,
features, and classifiers together. The application of this new paradigm has
been particularly successful in computer vision, in which the development of
deep learning methods for vision applications has become a hot research topic.
Given that deep learning has already attracted the attention of the robot
vision community, the main purpose of this survey is to address the use of deep
learning in robot vision. To achieve this, a comprehensive overview of deep
learning and its usage in computer vision is given, including a description
of the most frequently used neural models and their main application areas.
Then, the standard methodology and tools used for designing deep-learning based
vision systems are presented. Afterwards, a review of the principal work using
deep learning in robot vision is presented, as well as current and future
trends related to the use of deep learning in robotics. This survey is intended
to be a guide for the developers of robot vision systems.
A Cluster-Based Opposition Differential Evolution Algorithm Boosted by a Local Search for ECG Signal Classification
Electrocardiogram (ECG) signals, which capture the heart's electrical
activity, are used to diagnose and monitor cardiac problems. The accurate
classification of ECG signals, particularly for distinguishing among various
types of arrhythmias and myocardial infarctions, is crucial for the early
detection and treatment of heart-related diseases. This paper proposes a novel approach based on an improved differential evolution (DE) algorithm to enhance ECG signal classification performance. In the initial stages of
our approach, the preprocessing step is followed by the extraction of several
significant features from the ECG signals. These extracted features are then
provided as inputs to an enhanced multi-layer perceptron (MLP). While MLPs are still widely used for ECG signal classification, gradient-based training methods, the most common choice for the training process, have significant disadvantages, such as the possibility of getting stuck in local optima. This paper employs an enhanced differential evolution (DE) algorithm
for the training process as one of the most effective population-based
algorithms. To this end, we improved DE based on a clustering-based strategy,
opposition-based learning, and a local search. Clustering-based strategies can
act as crossover operators, while the goal of the opposition operator is to
improve the exploration of the DE algorithm. The weights and biases found by
the improved DE algorithm are then fed into six gradient-based local search
algorithms. In other words, the weights found by the DE are employed as an
initialization point. Therefore, we introduced six different algorithms for the
training process (in terms of different local search algorithms). In an
extensive set of experiments, we showed that our proposed training algorithm
could provide better results than the conventional training algorithms.
Comment: 44 pages, 9 figures.
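Opposition-based learning, one of the improvements listed above, evaluates for each candidate x its opposite lo + hi - x and keeps the fitter half, widening exploration at low cost. A minimal DE/rand/1/bin sketch with opposition-based initialisation follows; it is a generic illustration, not the authors' exact algorithm (the clustering-based crossover and the six local-search variants are omitted):

```python
import numpy as np

def de_with_opposition(f, lo, hi, pop_size=20, iters=300, F=0.5, CR=0.9, seed=0):
    """Minimal DE/rand/1/bin minimiser with opposition-based initialisation."""
    rng = np.random.default_rng(seed)
    dim = lo.size
    pop = rng.uniform(lo, hi, (pop_size, dim))
    opp = lo + hi - pop                         # opposition-based learning
    both = np.vstack([pop, opp])
    fit = np.array([f(x) for x in both])
    order = np.argsort(fit)[:pop_size]          # keep the fitter half
    pop, fit = both[order], fit[order]
    for _ in range(iters):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True     # force at least one gene
            trial = np.where(cross, mutant, pop[i])
            ft = f(trial)
            if ft < fit[i]:                     # greedy selection
                pop[i], fit[i] = trial, ft
    best = np.argmin(fit)
    return pop[best], fit[best]

# Toy run on the sphere function (stand-in for the MLP training loss).
lo, hi = -5.0 * np.ones(5), 5.0 * np.ones(5)
x, fx = de_with_opposition(lambda v: float(np.sum(v * v)), lo, hi)
print(fx < 1e-2)
```

In the paper's setting, f would be the MLP's classification loss over the weight-and-bias vector, and the best vector found would seed the gradient-based local search.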
Automatic human face detection in color images
Automatic human face detection in digital images has been an active area of research over the past decade. Among its numerous applications, face detection plays a key role in face recognition systems for biometric personal identification, face tracking for intelligent human-computer interfaces (HCI), and face segmentation for object-based video coding. Despite significant progress in the field in recent years, detecting human faces in unconstrained and complex images remains a challenging problem in computer vision. An automatic system that possesses a capability similar to that of the human vision system in detecting faces is still a far-reaching goal. This thesis focuses on the problem of detecting human faces in color images. Although many early face detection algorithms were designed to work on gray-scale images, strong evidence exists to suggest face detection can be done more efficiently by taking into account color characteristics of the human face. In this thesis, we present a complete and systematic face detection algorithm that combines the strengths of both analytic and holistic approaches to face detection. The algorithm is developed to detect quasi-frontal faces in complex color images. This face class, which represents typical detection scenarios in most practical applications of face detection, covers a wide range of face poses, including all in-plane rotations and some out-of-plane rotations. The algorithm is organized into a number of cascading stages including skin region segmentation, face candidate selection, and face verification. In each of these stages, various visual cues are utilized to narrow the search space for faces. In this thesis, we present a comprehensive analysis of skin detection using color pixel classification, and the effects of factors such as the color space and the color classification algorithm on segmentation performance.
We also propose a novel and efficient face candidate selection technique that is based on color-based eye region detection and a geometric face model. This candidate selection technique eliminates the computation-intensive step of window scanning often employed in holistic face detection, and simplifies the task of detecting rotated faces. Besides various heuristic techniques for face candidate verification, we develop face/nonface classifiers based on the naive Bayesian model, and investigate three feature extraction schemes, namely intensity, projection onto a face subspace, and edge-based features. Techniques for improving face/nonface classification are also proposed, including bootstrapping, classifier combination and using contextual information. On a test set of face and nonface patterns, the combination of three Bayesian classifiers has a correct detection rate of 98.6% at a false positive rate of 10%. Extensive testing results have shown that the proposed face detector achieves good performance in terms of both detection rate and alignment between the detected faces and the true faces. On a test set of 200 images containing 231 faces taken from the ECU face detection database, the proposed face detector has a correct detection rate of 90.04% and makes 10 false detections. We have found that the proposed face detector is more robust in detecting in-plane rotated faces, compared to existing face detectors.
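The first cascade stage, skin-region segmentation by color pixel classification, can be illustrated with a simple per-pixel RGB rule. The thresholds below follow a commonly cited explicit-RGB heuristic and are not necessarily the classifier studied in the thesis:

```python
import numpy as np

def skin_mask(rgb):
    """Per-pixel skin classification with an explicit RGB rule (heuristic)."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    mx = np.maximum(np.maximum(r, g), b)
    mn = np.minimum(np.minimum(r, g), b)
    # Skin tends to be red-dominant, bright, and not gray (max-min spread).
    return ((r > 95) & (g > 40) & (b > 20) &
            (mx - mn > 15) & (np.abs(r - g) > 15) &
            (r > g) & (r > b))

# Toy image: one skin-like pixel, one dark gray pixel.
img = np.array([[[200, 120, 90], [30, 30, 30]]], dtype=np.uint8)
print(skin_mask(img))  # [[ True False]]
```

The resulting binary mask would then feed connected-component analysis to produce the candidate skin regions that later stages verify.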
Deep Learning for Generic Object Detection: A Survey
Object detection, one of the most fundamental and challenging problems in
computer vision, seeks to locate object instances from a large number of
predefined categories in natural images. Deep learning techniques have emerged
as a powerful strategy for learning feature representations directly from data
and have led to remarkable breakthroughs in the field of generic object
detection. Given this period of rapid evolution, the goal of this paper is to
provide a comprehensive survey of the recent achievements in this field brought
about by deep learning techniques. More than 300 research contributions are
included in this survey, covering many aspects of generic object detection:
detection frameworks, object feature representation, object proposal
generation, context modeling, training strategies, and evaluation metrics. We
finish the survey by identifying promising directions for future research.
Comment: IJCV Mino
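Among the evaluation metrics such surveys cover, intersection-over-union (IoU) is the standard overlap score used to decide whether a predicted box matches a ground-truth box. A minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)   # zero if boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.142857... (= 1/7)
```

A detection is conventionally counted as a true positive when its IoU with a ground-truth box exceeds a threshold, commonly 0.5.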
Accurate Online Video Tagging via Probabilistic Hybrid Modeling
Smart Cameras
We review camera architecture in the age of artificial intelligence. Modern
cameras use physical components and software to capture, compress and display
image data. Over the past 5 years, deep learning solutions have become superior
to traditional algorithms for each of these functions. Deep learning enables
10-100x reduction in electrical sensor power per pixel, 10x improvement in
depth of field and dynamic range and 10-100x improvement in image pixel count.
Deep learning enables multiframe and multiaperture solutions that fundamentally
shift the goals of physical camera design. Here we review the state of the art
of deep learning in camera operations and consider the impact of AI on the
physical design of cameras.