816 research outputs found
A Taxonomy of Deep Convolutional Neural Nets for Computer Vision
Traditional architectures for solving computer vision problems and the degree
of success they enjoyed have been heavily reliant on hand-crafted features.
However, of late, deep learning techniques have offered a compelling
alternative -- that of automatically learning problem-specific features. With
this new paradigm, every problem in computer vision is now being re-examined
from a deep learning perspective. Therefore, it has become important to
understand what kind of deep networks are suitable for a given problem.
Although general surveys of this fast-moving paradigm (i.e. deep-networks)
exist, a survey specific to computer vision is missing. We specifically
consider one form of deep networks widely used in computer vision -
convolutional neural networks (CNNs). We start with "AlexNet" as our base CNN
and then examine the broad variations proposed over time to suit different
applications. We hope that our recipe-style survey will serve as a guide,
particularly for novice practitioners intending to use deep-learning techniques
for computer vision.Comment: Published in Frontiers in Robotics and AI (http://goo.gl/6691Bm
Memory-Efficient Deep Salient Object Segmentation Networks on Gridized Superpixels
Computer vision algorithms with pixel-wise labeling tasks, such as semantic
segmentation and salient object detection, have gone through a significant
accuracy increase with the incorporation of deep learning. Deep segmentation
methods slightly modify and fine-tune pre-trained networks that have hundreds
of millions of parameters. In this work, we question the need to have such
memory demanding networks for the specific task of salient object segmentation.
To this end, we propose a way to learn a memory-efficient network from scratch
by training it only on salient object detection datasets. Our method encodes
images to gridized superpixels that preserve both the object boundaries and the
connectivity rules of regular pixels. This representation allows us to use
convolutional neural networks that operate on regular grids. By using these
encoded images, we train a memory-efficient network using only 0.048\% of the
number of parameters that other deep salient object detection networks have.
Our method shows comparable accuracy with the state-of-the-art deep salient
object detection methods and provides a faster and a much more memory-efficient
alternative to them. Due to its easy deployment, such a network is preferable
for applications in memory limited devices such as mobile phones and IoT
devices.Comment: 6 pages, submitted to MMSP 201
Face Centered Image Analysis Using Saliency and Deep Learning Based Techniques
Image analysis starts with the purpose of configuring vision machines that can perceive like human to intelligently infer general principles and sense the surrounding situations from imagery. This dissertation studies the face centered image analysis as the core problem in high level computer vision research and addresses the problem by tackling three challenging subjects: Are there anything interesting in the image? If there is, what is/are that/they? If there is a person presenting, who is he/she? What kind of expression he/she is performing? Can we know his/her age? Answering these problems results in the saliency-based object detection, deep learning structured objects categorization and recognition, human facial landmark detection and multitask biometrics.
To implement object detection, a three-level saliency detection based on the self-similarity technique (SMAP) is firstly proposed in the work. The first level of SMAP accommodates statistical methods to generate proto-background patches, followed by the second level that implements local contrast computation based on image self-similarity characteristics. At last, the spatial color distribution constraint is considered to realize the saliency detection. The outcome of the algorithm is a full resolution image with highlighted saliency objects and well-defined edges.
In object recognition, the Adaptive Deconvolution Network (ADN) is implemented to categorize the objects extracted from saliency detection. To improve the system performance, L1/2 norm regularized ADN has been proposed and tested in different applications. The results demonstrate the efficiency and significance of the new structure.
To fully understand the facial biometrics related activity contained in the image, the low rank matrix decomposition is introduced to help locate the landmark points on the face images. The natural extension of this work is beneficial in human facial expression recognition and facial feature parsing research.
To facilitate the understanding of the detected facial image, the automatic facial image analysis becomes essential. We present a novel deeply learnt tree-structured face representation to uniformly model the human face with different semantic meanings. We show that the proposed feature yields unified representation in multi-task facial biometrics and the multi-task learning framework is applicable to many other computer vision tasks
Statistical analysis driven optimized deep learning system for intrusion detection
Attackers have developed ever more sophisticated and intelligent ways to hack
information and communication technology systems. The extent of damage an
individual hacker can carry out upon infiltrating a system is well understood.
A potentially catastrophic scenario can be envisaged where a nation-state
intercepting encrypted financial data gets hacked. Thus, intelligent
cybersecurity systems have become inevitably important for improved protection
against malicious threats. However, as malware attacks continue to dramatically
increase in volume and complexity, it has become ever more challenging for
traditional analytic tools to detect and mitigate threat. Furthermore, a huge
amount of data produced by large networks has made the recognition task even
more complicated and challenging. In this work, we propose an innovative
statistical analysis driven optimized deep learning system for intrusion
detection. The proposed intrusion detection system (IDS) extracts optimized and
more correlated features using big data visualization and statistical analysis
methods (human-in-the-loop), followed by a deep autoencoder for potential
threat detection. Specifically, a pre-processing module eliminates the outliers
and converts categorical variables into one-hot-encoded vectors. The feature
extraction module discard features with null values and selects the most
significant features as input to the deep autoencoder model (trained in a
greedy-wise manner). The NSL-KDD dataset from the Canadian Institute for
Cybersecurity is used as a benchmark to evaluate the feasibility and
effectiveness of the proposed architecture. Simulation results demonstrate the
potential of our proposed system and its outperformance as compared to existing
state-of-the-art methods and recently published novel approaches. Ongoing work
includes further optimization and real-time evaluation of our proposed IDS.Comment: To appear in the 9th International Conference on Brain Inspired
Cognitive Systems (BICS 2018
Saliency Prediction for Mobile User Interfaces
We introduce models for saliency prediction for mobile user interfaces. A
mobile interface may include elements like buttons, text, etc. in addition to
natural images which enable performing a variety of tasks. Saliency in natural
images is a well studied area. However, given the difference in what
constitutes a mobile interface, and the usage context of these devices, we
postulate that saliency prediction for mobile interface images requires a fresh
approach. Mobile interface design involves operating on elements, the building
blocks of the interface. We first collected eye-gaze data from mobile devices
for free viewing task. Using this data, we develop a novel autoencoder based
multi-scale deep learning model that provides saliency prediction at the mobile
interface element level. Compared to saliency prediction approaches developed
for natural images, we show that our approach performs significantly better on
a range of established metrics.Comment: Paper accepted at WACV 201
- …