Adversarial Generation of Training Examples: Applications to Moving Vehicle License Plate Recognition
Generative Adversarial Networks (GANs) have attracted much research attention
recently, leading to impressive results for natural image generation. However,
to date little success has been observed in using GAN-generated images to
improve classification tasks. Here we explore, in the context of car license
plate recognition, whether it is possible to generate synthetic training data
using a GAN to improve recognition accuracy. With a carefully designed pipeline,
we show that the answer is affirmative. First, a large-scale image set is
generated using the generator of the GAN, without manual annotation. Then, these
images are fed to a deep convolutional neural network (DCNN) followed by a
bidirectional recurrent neural network (BRNN) with long short-term memory
(LSTM), which performs the feature learning and sequence labelling. Finally,
the pre-trained model is fine-tuned on real images. Our experimental results on
several datasets demonstrate the effectiveness of using GAN images: an
improvement of 7.5% over a strong baseline when moderate-sized real data are
available. We show that the proposed framework achieves competitive recognition
accuracy on challenging test datasets. We also leverage depthwise separable
convolutions to construct a lightweight convolutional RNN, which is about half
the size and 2x faster on CPU. Combining this framework and the proposed
pipeline, we make progress towards accurate recognition on mobile and embedded
devices.
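The lightweight model above relies on depthwise separable convolutions, which split a standard convolution into a per-channel (depthwise) filter followed by a 1x1 (pointwise) channel mix, cutting parameters from roughly C_in·C_out·K to C_in·K + C_in·C_out. A minimal 1D sketch in plain Python; the names and shapes are illustrative, not the paper's implementation:

```python
def conv1d_valid(signal, kernel):
    """Plain 1D valid-padding convolution (correlation form)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def depthwise_separable_conv1d(x, dw_kernels, pw_weights):
    """x: list of input channels; dw_kernels: one kernel per channel;
    pw_weights: one weight row per output channel (the 1x1 mix)."""
    # Depthwise stage: filter each channel independently.
    depth = [conv1d_valid(ch, k) for ch, k in zip(x, dw_kernels)]
    # Pointwise stage: mix channels at every position.
    length = len(depth[0])
    return [[sum(w[c] * depth[c][t] for c in range(len(depth)))
             for t in range(length)]
            for w in pw_weights]
```

With two input channels of length 3, length-2 depthwise kernels and one pointwise output channel, the two stages together use 2·2 + 2·1 = 6 weights, versus 2·1·2 = 4 for a full convolution here; the savings grow quickly with channel count.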
A Survey of the Recent Architectures of Deep Convolutional Neural Networks
A Deep Convolutional Neural Network (CNN) is a special type of neural network
that has shown exemplary performance in several competitions related to
computer vision and image processing. Some of the exciting application areas of
CNNs include image classification and segmentation, object detection, video
processing, natural language processing, and speech recognition. The powerful
learning ability of deep CNNs is primarily due to the use of multiple feature
extraction stages that can automatically learn representations from the data.
The availability of large amounts of data and improvements in hardware
technology have accelerated research in CNNs, and recently interesting deep
CNN architectures have been reported. Several inspiring ideas for bringing
advancements to CNNs have been explored, such as the use of different
activation and loss functions, parameter optimization, regularization, and
architectural innovations. However, the most significant improvement in the
representational capacity of deep CNNs has been achieved through architectural
innovations. Notably, the ideas of exploiting spatial and channel information,
depth and width of architecture, and multi-path information processing have
gained substantial attention. Similarly, the idea of using a block of layers as
a structural unit is also gaining popularity. This survey thus focuses on the
intrinsic taxonomy present in recently reported deep CNN architectures and,
consequently, classifies the recent innovations in CNN architectures into seven
different categories, based on spatial exploitation, depth, multi-path, width,
feature-map exploitation, channel boosting, and attention. Additionally, an
elementary understanding of CNN components, current challenges, and
applications of CNNs is also provided.
Comment: Number of Pages: 70, Number of Figures: 11, Number of Tables: 11. Artif Intell Rev (2020)
Context-Aware Mixed Reality: A Framework for Ubiquitous Interaction
Mixed Reality (MR) is a powerful interactive technology that yields new types
of user experience. We present a semantics-based interactive MR framework that
goes beyond current geometry-level approaches, a step change towards generating
high-level context-aware interactions. Our key insight is that building semantic
understanding into MR not only greatly enhances user experience through
object-specific behaviours, but also paves the way for solving complex
interaction design challenges. The framework generates semantic properties of
the real-world environment through dense scene reconstruction and deep image
understanding. We demonstrate our approach with a material-aware prototype
system for generating context-aware physical interactions between real and
virtual objects. Quantitative and qualitative evaluations are carried out, and
the results show that the framework delivers accurate and fast semantic
information in an interactive MR environment, providing effective semantic-level
interactions.
Convolutional Neural Networks for Automatic Meter Reading
In this paper, we tackle Automatic Meter Reading (AMR) by leveraging the high
capability of Convolutional Neural Networks (CNNs). We design a two-stage
approach that employs the Fast-YOLO object detector for counter detection and
evaluates three different CNN-based approaches for counter recognition. In the
AMR literature, most datasets are not available to the research community since
the images belong to service companies. In this sense, we introduce a new
public dataset, called the UFPR-AMR dataset, with 2,000 fully and manually
annotated images. This dataset is, to the best of our knowledge, three times
larger than the largest public dataset found in the literature and contains a
well-defined evaluation protocol to assist the development and evaluation of
AMR methods. Furthermore, we propose the use of a data augmentation technique
to generate a balanced training set with many more examples for training the
CNN models for counter recognition. On the proposed dataset, impressive results
were obtained, and a detailed speed/accuracy trade-off evaluation of each model
was performed. On a public dataset, state-of-the-art results were achieved
using fewer than 200 images for training.
V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation
Convolutional Neural Networks (CNNs) have recently been employed to solve
problems in both the computer vision and medical image analysis fields.
Despite their popularity, most approaches can only process 2D images, while
most medical data used in clinical practice consists of 3D volumes. In
this work we propose an approach to 3D image segmentation based on a
volumetric, fully convolutional neural network. Our CNN is trained end-to-end
on MRI volumes depicting the prostate, and learns to predict segmentations for
the whole volume at once. We introduce a novel objective function, optimised
during training, based on the Dice coefficient. In this way we can deal with
situations where there is a strong imbalance between the number of foreground
and background voxels. To cope with the limited number of annotated volumes
available for training, we augment the data by applying random non-linear
transformations and histogram matching. Our experimental evaluation shows
that our approach achieves good performance on challenging test data while
requiring only a fraction of the processing time needed by previous
methods.
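The Dice-based objective mentioned in this abstract can be sketched as a differentiable "soft Dice" loss over predicted foreground probabilities. Below is a minimal flat-list version; the `eps` smoothing term is our addition, and V-Net's exact formulation operates on voxel tensors and may differ in details such as squared terms in the denominator:

```python
def soft_dice_loss(pred, target, eps=1e-6):
    """1 minus the soft Dice overlap, 2|P∩G| / (|P| + |G|).

    pred:   flat list of predicted foreground probabilities in [0, 1]
    target: flat list of binary ground-truth labels (0 or 1)
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    # eps keeps the loss defined when both prediction and target are empty
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)
```

Because the loss measures overlap over the whole volume rather than summing per-voxel errors, a huge background region contributes nothing on its own, which is exactly how a Dice objective sidesteps the foreground/background imbalance the abstract describes.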
Hough-CNN: Deep Learning for Segmentation of Deep Brain Regions in MRI and Ultrasound
In this work we propose a novel approach to segmentation that leverages
the abstraction capabilities of convolutional neural networks
(CNNs). Our method is based on Hough voting, a strategy that allows for fully
automatic localisation and segmentation of the anatomies of interest. This
approach not only uses the CNN classification outcomes, but also
implements voting by exploiting the features produced by the deepest portion of
the network. We show that this learning-based segmentation method is robust,
multi-region, flexible and can be easily adapted to different modalities. To
illustrate the capabilities and behaviour of CNNs when applied to medical
image analysis, we perform a systematic study of the performance of six
different network architectures, conceived according to state-of-the-art
criteria, in various situations. We evaluate the impact of both different
amounts of training data and different data dimensionalities (2D, 2.5D and 3D)
on the final results. We show results on both MRI and transcranial US volumes
depicting, respectively, 26 regions of the basal ganglia and the midbrain.
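The Hough-voting strategy can be illustrated in miniature: each classified patch casts a vote for the anatomy centre via a predicted offset, and the peak of the vote accumulator gives the localisation. A toy integer-grid sketch (function and argument names are hypothetical; the actual method votes using deep features over 3D volumes):

```python
from collections import Counter

def hough_localise(patch_centres, offsets):
    """Accumulate centre votes (patch position + predicted offset)
    and return the accumulator peak."""
    votes = Counter()
    for (x, y), (dx, dy) in zip(patch_centres, offsets):
        votes[(x + dx, y + dy)] += 1
    # most_common(1) returns the single bin with the most votes
    return votes.most_common(1)[0][0]
```

Patches that disagree simply scatter their votes, so the peak stays robust to individual misclassifications, which is the appeal of voting-based localisation.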
SCOPS: Self-Supervised Co-Part Segmentation
Parts provide a good intermediate representation of objects that is robust
to camera, pose and appearance variations. Existing work on
part segmentation is dominated by supervised approaches that rely on large
amounts of manual annotations and cannot generalize to unseen object
categories. We propose a self-supervised deep learning approach for part
segmentation, in which we devise several loss functions that aid in predicting
part segments that are geometrically concentrated, robust to object variations,
and semantically consistent across different object instances.
Extensive experiments on different types of image collections demonstrate that
our approach can produce part segments that adhere to object boundaries and
are more semantically consistent across object instances than existing
self-supervised techniques.
Comment: Accepted in CVPR 2019. Project page: http://varunjampani.github.io/scop
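The geometric-concentration idea above can be sketched as the spatial variance of a part's soft response map about its own centroid: a low value means the part's mass is tightly clustered. This is our illustrative reading of the idea, not the paper's exact formulation:

```python
def concentration_loss(resp):
    """Mean squared distance of a part's response mass from its centroid.

    resp: 2D list of non-negative soft responses for one part.
    """
    h, w = len(resp), len(resp[0])
    total = sum(sum(row) for row in resp)
    # Response-weighted centroid of the part
    cy = sum(y * resp[y][x] for y in range(h) for x in range(w)) / total
    cx = sum(x * resp[y][x] for y in range(h) for x in range(w)) / total
    # Penalise response mass that lies far from the centroid
    return sum(resp[y][x] * ((y - cy) ** 2 + (x - cx) ** 2)
               for y in range(h) for x in range(w)) / total
```

Minimising such a term pushes each part's predicted pixels into one compact blob, while other losses (equivariance, semantic consistency) keep the blobs meaningful across instances.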
AON: Towards Arbitrarily-Oriented Text Recognition
Recognizing text in natural images is a hot research topic in computer
vision due to its various applications. Despite several decades of enduring
research on optical character recognition (OCR), recognizing text in
natural images remains a challenging task. This is because scene text often
appears in irregular (e.g. curved, arbitrarily oriented or seriously distorted)
arrangements, which have not yet been well addressed in the literature.
Existing text recognition methods mainly work with regular (horizontal and
frontal) text and cannot be trivially generalized to handle irregular text.
In this paper, we develop the arbitrary orientation network (AON) to directly
capture the deep features of irregular text, which are combined in an
attention-based decoder to generate character sequences. The whole network can
be trained end-to-end using only images and word-level annotations.
Extensive experiments on various benchmarks, including the CUTE80,
SVT-Perspective, IIIT5k, SVT and ICDAR datasets, show that the proposed
AON-based method achieves state-of-the-art performance on irregular
datasets and is comparable to major existing methods on regular datasets.
Comment: Accepted by CVPR201
URBAN-i: From urban scenes to mapping slums, transport modes, and pedestrians in cities using deep learning and computer vision
Amid the burgeoning expansion of deep learning and computer vision across
the different fields of science, when it comes to urban development, deep
learning and computer vision applications are still largely limited to the
notions of smart cities and autonomous vehicles. Indeed, a wide knowledge gap
appears when it comes to cities and urban regions in less developed countries,
where the chaos of informality is the dominant scheme. How can deep learning
and Artificial Intelligence (AI) untangle the complexities of informality to
advance urban modelling and our understanding of cities? Various questions and
debates can be raised concerning the future of cities of the North and the
South in the paradigm of AI and computer vision. In this paper, we introduce a
new method for multipurpose realistic-dynamic urban modelling relying on deep
learning and computer vision, using deep Convolutional Neural Networks (CNNs),
to sense and detect informality and slums in urban scenes from aerial and
street-view images, in addition to detecting pedestrians and transport modes.
The model has been trained on images of urban scenes in cities across the
globe. The model shows good validation in understanding a wide spectrum of
nuances among planned and unplanned regions, including informal and
slum areas. We attempt to advance urban modelling for a better understanding of
the dynamics of city development. We also aim to exemplify the significant
impact of AI in cities beyond how smart cities are discussed and perceived in
the mainstream. The algorithms of the URBAN-i model are fully coded in Python,
with the pre-trained deep learning models, to be used as a tool for
mapping and city modelling in the various corners of the globe, including
informal settlements and slum regions.
Comment: 12 pages, 9 figures
SGDN: Segmentation-Based Grasp Detection Network For Unsymmetrical Three-Finger Gripper
In this paper, we present the Segmentation-Based Grasp Detection Network (SGDN)
to predict feasible robotic grasps for an unsymmetrical three-finger robotic
gripper using RGB images. A feasible grasp of a target should be a
collection of grasp regions with the same grasp angle and width. In other
words, a simplified planar grasp representation should be pixel-level rather
than region-level, such as the five-dimensional grasp representation. Therefore,
we propose a pixel-level grasp representation, the oriented base-fixed triangle.
It is also more suitable for an unsymmetrical three-finger gripper, which cannot
grasp symmetrically when grasping some objects: the grasp angle lies in
[0, 2{\pi}) instead of the [0, {\pi}) of a parallel-plate gripper. In order to
predict the appropriate grasp region and its corresponding grasp angle and
width in the RGB image, SGDN uses DeepLabv3+ as a feature extractor and a
three-channel grasp predictor to predict the feasible oriented base-fixed
triangle grasp representation at each pixel. On the re-annotated Cornell Grasp
Dataset, our model achieves accuracies of 96.8% and 92.27% on the image-wise
and object-wise splits respectively, and obtains accurate predictions
consistent with the state-of-the-art methods.
Comment: 9 pages, 8 figures. arXiv admin note: text overlap with arXiv:1803.02209 by another author
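A per-pixel grasp representation like the one above is typically decoded by taking the pixel with the highest predicted grasp quality and reading the angle and width channels at that location. A minimal sketch; the channel names are hypothetical stand-ins, not SGDN's actual outputs:

```python
def decode_grasp(quality, angle, width):
    """Pick the best-scoring pixel and return its (row, col), angle, width.

    quality, angle, width: 2D lists of per-pixel predictions.
    """
    # max over (score, row, col) tuples finds the highest-quality pixel
    best = max(
        (quality[y][x], y, x)
        for y in range(len(quality)) for x in range(len(quality[0]))
    )
    _, y, x = best
    return (y, x), angle[y][x], width[y][x]
```

Decoding at the pixel level is what lets a dense predictor output many candidate grasps per object while still reducing to a single executable grasp pose.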