End-to-End Text Recognition with Hybrid HMM Maxout Models
The problem of detecting and recognizing text in natural scenes has proved to
be more challenging than its counterpart in documents, with most of the
previous work focusing on a single part of the problem. In this work, we
propose new solutions to the character and word recognition problems and then
show how to combine these solutions in an end-to-end text-recognition system.
We do so by leveraging the recently introduced Maxout networks along with
hybrid HMM models that have proven useful for voice recognition. Using these
elements, we build a tunable and highly accurate recognition system that beats
state-of-the-art results on all the sub-problems for both the ICDAR 2003 and
SVT benchmark datasets. Comment: 9 pages, 7 figures
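The system above builds on Maxout networks, whose units output the maximum over several learned affine pieces of the input instead of applying a fixed activation. The sketch below shows the forward pass of a single maxout unit in plain Python. The weights are made-up values for illustration, and this is not the authors' character or word recognizer.

```python
from typing import Sequence


def maxout(x: Sequence[float],
           weights: Sequence[Sequence[float]],
           biases: Sequence[float]) -> float:
    """Return max_j (w_j . x + b_j) over the k affine pieces of one maxout unit."""
    if len(weights) != len(biases):
        raise ValueError("need exactly one bias per affine piece")
    # Evaluate every affine piece, then keep the largest response.
    pieces = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
              for w, b in zip(weights, biases)]
    return max(pieces)


# Two hypothetical pieces (w=[1], b=0) and (w=[-1], b=0) make the unit compute |x|.
print(maxout([-3.0], [[1.0], [-1.0]], [0.0, 0.0]))  # 3.0
```

With enough pieces, a maxout unit can approximate any convex activation, which is why it is described as a learned activation rather than a fixed one.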
Approximation Algorithms for Cascading Prediction Models
We present an approximation algorithm that takes a pool of pre-trained models
as input and produces from it a cascaded model with similar accuracy but lower
average-case cost. Applied to state-of-the-art ImageNet classification models,
this yields up to a 2x reduction in floating point multiplications, and up to a
6x reduction in average-case memory I/O. The auto-generated cascades exhibit
intuitive properties, such as using lower-resolution input for easier images
and requiring higher prediction confidence when using a computationally cheaper
model.
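A cascade like the ones described above evaluates cheaper models first and falls through to more expensive models only when a prediction is not confident enough. The sketch below shows inference through such a cascade using assumed per-stage confidence thresholds and costs. It does not reproduce the paper's approximation algorithm for *building* the cascade, and the `Stage` type and the models in the usage lines are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model interface: input -> (predicted label, confidence in [0, 1]).
Model = Callable[[object], Tuple[int, float]]


@dataclass
class Stage:
    model: Model
    threshold: float  # minimum confidence required to stop at this stage
    cost: float       # cost of running this stage (e.g. multiplications), arbitrary units


def run_cascade(stages: List[Stage], x: object) -> Tuple[int, float]:
    """Return (label, total cost paid). The last stage always answers."""
    total = 0.0
    for i, stage in enumerate(stages):
        label, conf = stage.model(x)
        total += stage.cost
        if conf >= stage.threshold or i == len(stages) - 1:
            return label, total
    raise ValueError("cascade must have at least one stage")


# Usage with two made-up models: a cheap one that is only sure about "easy" inputs.
cheap: Model = lambda x: (0, 0.9 if x == "easy" else 0.4)
big: Model = lambda x: (1, 0.99)
stages = [Stage(cheap, threshold=0.8, cost=1.0), Stage(big, threshold=0.0, cost=10.0)]
print(run_cascade(stages, "easy"))  # (0, 1.0)
print(run_cascade(stages, "hard"))  # (1, 11.0)
```

Giving each stage its own threshold is what allows a cheaper model to demand higher confidence before it is trusted, matching the property the abstract mentions.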
Adaptive Neural Networks for Efficient Inference
We present an approach to adaptively utilize deep neural networks in order to
reduce the evaluation time on new examples without loss of accuracy. Rather
than attempting to redesign or approximate existing networks, we propose two
schemes that adaptively utilize networks. We first pose an adaptive network
evaluation scheme, where we learn a system to adaptively choose the components
of a deep network to be evaluated for each example. By allowing examples
correctly classified using early layers of the system to exit, we avoid the
computational time associated with full evaluation of the network. We extend
this to learn a network selection system that adaptively selects the network to
be evaluated for each example. We show that computational time can be
dramatically reduced by exploiting the fact that many examples can be correctly
classified using relatively efficient networks and that complex,
computationally costly networks are only necessary for a small fraction of
examples. We pose a global objective for learning an adaptive early exit or
network selection policy and solve it by reducing the policy learning problem
to a layer-by-layer weighted binary classification problem. Empirically, these
approaches yield dramatic reductions in computational cost, with up to a 2.8x
speedup on state-of-the-art networks from the ImageNet image recognition
challenge with minimal (<1%) loss of top-5 accuracy.
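The speedup from adaptive evaluation comes from averaging cost over examples. Assume a two-network selection setup where a fraction p of examples is handled by a cheap network of cost c1, and the rest additionally pay for a full network of cost c2. The expected cost is then c1 + (1 - p) c2, and the speedup over always running the full network is c2 / (c1 + (1 - p) c2). The helper below is a hypothetical illustration of that arithmetic, and the numbers in the usage line are made up, not the paper's measurements.

```python
def early_exit_speedup(p_exit: float, cheap_cost: float, full_cost: float) -> float:
    """Speedup of a two-network system over always running the full network.

    Examples that are not resolved by the cheap network pay for both networks.
    """
    if not 0.0 <= p_exit <= 1.0:
        raise ValueError("p_exit must be a probability")
    expected_cost = cheap_cost + (1.0 - p_exit) * full_cost
    return full_cost / expected_cost


# Hypothetical: the cheap net costs 20% of the full net, and 75% of examples stop early.
print(round(early_exit_speedup(0.75, 0.2, 1.0), 3))  # 2.222
```

The formula shows why the abstract stresses that costly networks must be needed only for a small fraction of examples: when p is small, paying c1 on top of c2 can make the system slower than the full network alone.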
Real time face recognition using adaboost improved fast PCA algorithm
This paper presents an automated system for real-time human face recognition
against real-world backgrounds, using a large homemade dataset of persons' faces.
The task is very difficult because real-time background subtraction in an image
is still a challenge. In addition, human face images vary widely in terms of
size, pose, and expression. The proposed system collapses most of this
variance. To detect human faces in real time, AdaBoost with a Haar cascade is
used, and simple, fast PCA and LDA are used to recognize the detected faces. In
our case, the matched face is then used to mark attendance in the laboratory.
This biometric system is a real-time attendance system based on human face
recognition with simple and fast algorithms, and it achieves a high accuracy
rate. Comment: 14 pages; ISSN: 0975-900X (Online), 0976-2191 (Print)
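The PCA recognition step can be sketched as the classic "eigenfaces" approach. It projects faces onto the top principal components of a gallery and returns the label of the nearest gallery face. This sketch assumes faces have already been detected, cropped, and flattened to equal-length vectors. The AdaBoost/Haar detection and the LDA step from the abstract are omitted, the function names are hypothetical, and the data in the usage lines is synthetic.

```python
import numpy as np


def fit_pca(faces: np.ndarray, n_components: int):
    """faces: (n_samples, n_pixels). Returns (mean, components), components (k, n_pixels)."""
    mean = faces.mean(axis=0)
    # Rows of vt are orthonormal principal directions ("eigenfaces"), largest variance first.
    _, _, vt = np.linalg.svd(faces - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Coordinates of one face (or a batch of faces) in the PCA subspace."""
    return (x - mean) @ components.T


def recognize(query, gallery, labels, mean, components):
    """Nearest-neighbour match of the query against the gallery in PCA space."""
    dists = np.linalg.norm(project(gallery, mean, components)
                           - project(query, mean, components), axis=1)
    return labels[int(np.argmin(dists))]


# Synthetic "faces": two identities, three noisy 64-pixel images each.
rng = np.random.default_rng(0)
bases = rng.normal(size=(2, 64))
gallery = np.vstack([bases[i] + 0.05 * rng.normal(size=(3, 64)) for i in range(2)])
labels = ["alice"] * 3 + ["bob"] * 3
mean, comps = fit_pca(gallery, n_components=2)
query = bases[1] + 0.05 * rng.normal(size=64)
print(recognize(query, gallery, labels, mean, comps))  # bob
```

Matching in a low-dimensional subspace keeps the per-frame cost small, which fits the "simple and fast" goal of a real-time attendance system.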
Text Flow: A Unified Text Detection System in Natural Scene Images
The prevalent scene text detection approach follows four sequential steps
comprising character candidate detection, false character candidate removal,
text line extraction, and text line verification. However, errors occur and
accumulate throughout these sequential steps, which often leads to low
detection performance. To address these issues, we propose a unified scene text
detection system, namely Text Flow, by utilizing the minimum cost (min-cost)
flow network model. With character candidates detected by cascade boosting, the
min-cost flow network model integrates the last three sequential steps into a
single process which solves the error accumulation problem at both character
level and text line level effectively. The proposed technique has been tested
on three public datasets, i.e., the ICDAR2011 dataset, the ICDAR2013 dataset,
and a multilingual dataset, and it outperforms the state-of-the-art methods on
all three datasets with much higher recall and F-score. The good performance on
the multilingual dataset shows that the proposed technique can be used to
detect text in different languages. Comment: 9 pages, ICCV 201
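With a single unit of flow and no negative cycles, a min-cost flow from a source to a sink reduces to a shortest path. This gives a small picture of how a flow network can select one chain of character candidates as a text line. The sketch below covers only that simplified case. Character candidates are nodes sorted left to right, and the entry, exit, and pairwise costs are hypothetical, for example negative for confident, well-aligned neighbours. The paper's actual network construction, which extracts multiple text lines, is not reproduced here.

```python
import math
from typing import Dict, List, Tuple


def min_cost_text_line(n: int,
                       edge_cost: Dict[Tuple[int, int], float],
                       entry_cost: List[float],
                       exit_cost: List[float]) -> Tuple[float, List[int]]:
    """Route one unit of flow source -> candidates -> sink at minimum cost.

    Candidates 0..n-1 (n >= 1) are assumed sorted left to right, and edges only
    go from i to j > i, so the graph is a DAG and one forward pass is optimal.
    """
    best = [math.inf] * n  # best[j]: cheapest source-to-j cost
    prev = [-1] * n        # predecessor candidate on that path (-1: came from source)
    for j in range(n):
        best[j] = entry_cost[j]
        for i in range(j):
            c = edge_cost.get((i, j))
            if c is not None and best[i] + c < best[j]:
                best[j] = best[i] + c
                prev[j] = i
    end = min(range(n), key=lambda j: best[j] + exit_cost[j])
    path = []
    j = end
    while j != -1:
        path.append(j)
        j = prev[j]
    return best[end] + exit_cost[end], path[::-1]


# Candidates 0, 1, 2 are well aligned; candidate 3 is an outlier (made-up costs).
entry = [1.0, 1.0, 1.0, 1.0]
exit_ = [1.0, 1.0, 1.0, 1.0]
pairs = {(0, 1): -2.0, (1, 2): -2.0, (0, 2): 0.5, (1, 3): 3.0, (2, 3): 3.0}
print(min_cost_text_line(4, pairs, entry, exit_))  # (-2.0, [0, 1, 2])
```

Scoring candidate removal and line grouping inside one optimization is the idea behind merging the last three pipeline steps. The outlier is dropped because including it raises the total cost.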
Deep Learning for Generic Object Detection: A Survey
Object detection, one of the most fundamental and challenging problems in
computer vision, seeks to locate object instances from a large number of
predefined categories in natural images. Deep learning techniques have emerged
as a powerful strategy for learning feature representations directly from data
and have led to remarkable breakthroughs in the field of generic object
detection. Given this period of rapid evolution, the goal of this paper is to
provide a comprehensive survey of the recent achievements in this field brought
about by deep learning techniques. More than 300 research contributions are
included in this survey, covering many aspects of generic object detection:
detection frameworks, object feature representation, object proposal
generation, context modeling, training strategies, and evaluation metrics. We
finish the survey by identifying promising directions for future research. Comment: IJCV Mino
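Among the evaluation metrics a survey like this covers, most detection benchmarks rely on intersection-over-union (IoU) between a predicted box and a ground-truth box to decide whether a detection counts as a true positive. PASCAL VOC, for example, uses an IoU threshold of 0.5. Below is a minimal sketch that assumes axis-aligned boxes given as (x1, y1, x2, y2). The `Box` alias and the function name are illustrative choices.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```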
A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities
The explosive growth in fake news and its erosion of democracy, justice, and
public trust has increased the demand for fake news detection and intervention.
This survey reviews and evaluates methods that can detect fake news from four
perspectives: (1) the false knowledge it carries, (2) its writing style, (3)
its propagation patterns, and (4) the credibility of its source. The survey
also highlights some potential research tasks based on the review. In
particular, we identify and detail related fundamental theories across various
disciplines to encourage interdisciplinary research on fake news. We hope this
survey can facilitate collaborative efforts among experts in computer and
information sciences, social sciences, political science, and journalism to
research fake news, where such efforts can lead to fake news detection that is
not only efficient but, more importantly, explainable. Comment: ACM Computing Surveys (CSUR), 37 pages
Machine Learning for Heterogeneous Ultra-Dense Networks with Graphical Representations
The heterogeneous ultra-dense network (H-UDN) is envisioned as a promising
solution to sustain the explosive mobile traffic demand through network
densification. By placing access points, processors, and storage units as close
as possible to mobile users, H-UDNs bring forth a number of advantages,
including high spectral efficiency, high energy efficiency, and low latency.
Nonetheless, the high density and diversity of network entities in H-UDNs
introduce formidable design challenges in collaborative signal processing and
resource management. This article illustrates the great potential of machine
learning techniques in solving these challenges. In particular, we show how to
utilize graphical representations of H-UDNs to design efficient machine
learning algorithms.
DN-ResNet: Efficient Deep Residual Network for Image Denoising
A deep learning approach to blind denoising of images without complete
knowledge of the noise statistics is considered. We propose DN-ResNet, which is
a deep convolutional neural network (CNN) consisting of several residual blocks
(ResBlocks). With cascade training, DN-ResNet is more accurate and more
computationally efficient than state-of-the-art denoising networks. An
edge-aware loss function is further utilized in training DN-ResNet, so that the
denoising results have better perceptual quality than those obtained with a
conventional loss function. Next, we introduce the depthwise separable DN-ResNet
(DS-DN-ResNet), which uses the proposed Depthwise Separable ResBlock
(DS-ResBlock) instead of the standard ResBlock and has a much lower
computational cost. DS-DN-ResNet is incrementally evolved by replacing the
ResBlocks in DN-ResNet with DS-ResBlocks stage by stage. As a result, high
accuracy and good computational efficiency are achieved concurrently. Whereas
previous state-of-the-art deep learning methods focused on denoising either
Gaussian- or Poisson-corrupted images, we also consider denoising images
corrupted by the more practical combination of Poisson noise and additive
Gaussian noise. The results show that DN-ResNets are more efficient and robust,
and perform better denoising than current state-of-the-art deep learning
methods, as well as the popular variants of the BM3D algorithm, in cases of
blind and non-blind denoising of images corrupted with Poisson, Gaussian, or
Poisson-Gaussian noise. Our network also works well for other image enhancement
tasks, such as compressed image restoration.
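The computational saving of a depthwise separable block can be checked by counting multiplications. Consider a k x k convolution that maps C_in to C_out channels on an H x W feature map, with stride 1 and "same" padding. A standard layer needs k^2 * C_in * C_out * H * W multiplications. A depthwise k x k convolution followed by a 1 x 1 pointwise convolution needs k^2 * C_in * H * W + C_in * C_out * H * W. The layer sizes below are hypothetical, and the exact DS-ResBlock layout in the paper may differ.

```python
def conv_mults(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    """Multiplications in a standard k x k convolution (stride 1, same padding)."""
    return k * k * c_in * c_out * h * w


def separable_mults(k: int, c_in: int, c_out: int, h: int, w: int) -> int:
    """Depthwise k x k conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 64 -> 64 channels, 128 x 128 feature map.
std = conv_mults(3, 64, 64, 128, 128)
sep = separable_mults(3, 64, 64, 128, 128)
print(std, sep, round(std / sep, 2))  # 603979776 76546048 7.89
```

The ratio simplifies to 1 / (1/C_out + 1/k^2). This explains why the gain for 3x3 kernels approaches, but never exceeds, 9x as the channel count grows.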
Stability Properties of Graph Neural Networks
Graph neural networks (GNNs) have emerged as a powerful tool for nonlinear
processing of graph signals, exhibiting success in recommender systems, power
outage prediction, and motion planning, among others. GNNs consist of a
cascade of layers, each of which applies a graph convolution, followed by a
pointwise nonlinearity. In this work, we study the impact that changes in the
underlying topology have on the output of the GNN. First, we show that GNNs are
permutation equivariant, which implies that they effectively exploit internal
symmetries of the underlying topology. Then, we prove that graph convolutions
with integral Lipschitz filters, in combination with the frequency mixing
effect of the corresponding nonlinearities, yield an architecture that is both
stable to small changes in the underlying topology and discriminative of
information located at high frequencies. These are two properties that cannot
simultaneously hold when using only linear graph filters, which are either
discriminative or stable, thus explaining the superior performance of GNNs. Comment: Submitted to IEEE Transactions on Signal Processing
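The layer structure described above, a graph convolution followed by a pointwise nonlinearity, and the permutation equivariance claim can both be checked numerically. This sketch assumes the common polynomial form of a graph convolution, the sum over k of h_k S^k x for a graph shift operator S. It uses a random graph and made-up filter taps. Relabeling nodes (S -> P^T S P, x -> P^T x) should only permute the output. The sketch does not demonstrate the stability result, which concerns the filters' frequency response.

```python
import numpy as np


def graph_filter(S: np.ndarray, x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Polynomial graph convolution: sum_k h[k] * S^k x."""
    out = np.zeros_like(x, dtype=float)
    z = x.astype(float)
    for hk in h:
        out += hk * z
        z = S @ z  # diffuse the signal one more hop over the graph
    return out


def gnn_layer(S: np.ndarray, x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """One GNN layer: graph convolution followed by a pointwise ReLU."""
    return np.maximum(graph_filter(S, x, h), 0.0)


rng = np.random.default_rng(1)
n = 5
S = np.triu(rng.random((n, n)), 1)
S = S + S.T                          # symmetric weighted adjacency as the shift operator
x = rng.normal(size=n)
h = np.array([0.5, -1.0, 0.25])      # hypothetical filter taps
P = np.eye(n)[rng.permutation(n)]    # random permutation matrix

lhs = gnn_layer(P.T @ S @ P, P.T @ x, h)
rhs = P.T @ gnn_layer(S, x, h)
print(np.allclose(lhs, rhs))  # True
```

The check passes because (P^T S P)^k P^T x = P^T S^k x and a pointwise nonlinearity commutes with any permutation.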