Missing Data: A Comparison of Neural Network and Expectation Maximisation Techniques
The estimation of missing input-vector elements in real-time processing
applications requires a system that possesses knowledge of characteristics
inherent in the input space, such as correlations between variables.
Computational intelligence and maximum likelihood techniques capture such
characteristics and are therefore well suited to the
imputation of missing data. This paper compares two approaches to the problem
of missing data estimation. The first technique is based on the current state
of the art approach to this problem, namely Maximum Likelihood (ML) with
Expectation Maximisation (EM). The second approach uses a
system based on auto-associative neural networks and the Genetic Algorithm as
discussed by Abdella and Marwala. The estimation ability of both techniques
is compared on three datasets, and conclusions are drawn.
Comment: 24 pages, 7 figures, 4 tables
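The Abdella and Marwala approach treats imputation as an optimisation problem: hold the observed components of the input vector fixed and let a genetic algorithm search for missing components that minimise the auto-associative network's reconstruction error. Below is a minimal numpy sketch of that idea; the rank-1 linear "autoencoder", the bounds, and all GA hyperparameters are illustrative placeholders, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def ga_impute(autoencoder, x, missing_idx, pop_size=40, generations=200,
              bounds=(-3.0, 3.0), sigma=0.1):
    """Search for x[missing_idx] minimising autoencoder reconstruction error,
    using a plain GA: truncation selection, uniform crossover, Gaussian mutation."""
    def error(candidate):
        trial = x.copy()
        trial[missing_idx] = candidate
        return np.sum((autoencoder(trial) - trial) ** 2)

    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, len(missing_idx)))
    for _ in range(generations):
        fitness = np.array([error(c) for c in pop])
        parents = pop[np.argsort(fitness)[: pop_size // 2]]   # keep the best half
        mates = parents[rng.permutation(len(parents))]
        mask = rng.random(parents.shape) < 0.5                # uniform crossover
        children = np.where(mask, parents, mates)
        children = children + rng.normal(0.0, sigma, children.shape)  # mutation
        pop = np.vstack([parents, children])
    fitness = np.array([error(c) for c in pop])
    return pop[np.argmin(fitness)]

# Toy stand-in for a trained auto-associative network: a rank-1 linear
# bottleneck. In the paper this would be a trained neural network.
w = rng.normal(size=4)
w /= np.linalg.norm(w)
autoencoder = lambda v: w * (w @ v)

x = np.array([0.5, -0.2, 0.1, 0.0])        # last element is "missing"
print(ga_impute(autoencoder, x, missing_idx=[3]))
```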
Joint Estimation of Age and Gender from Unconstrained Face Images using Lightweight Multi-task CNN for Mobile Applications
Automatic age and gender classification from unconstrained images has become
an essential technique on mobile devices. With limited computing power,
developing a robust system is a challenging task. In this paper, we
present an efficient convolutional neural network (CNN) called lightweight
multi-task CNN for simultaneous age and gender classification. Lightweight
multi-task CNN uses depthwise separable convolution to reduce the model size
and save inference time. On the challenging public Adience dataset, the
accuracy of age and gender classification is better than baseline multi-task
CNN methods.
Comment: To appear in the First IEEE International Conference on Multimedia
Information Processing and Retrieval, 2018 (IEEE MIPR 2018).
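The core of such a lightweight design is the depthwise separable convolution, which factorises a standard convolution into a per-channel spatial filter and a 1x1 channel mixer. A minimal PyTorch sketch follows; the BN/ReLU placement and block structure are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 conv (one filter per channel) followed by a pointwise
    1x1 conv that mixes channels. Parameters drop from k*k*C_in*C_out to
    k*k*C_in + C_in*C_out."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, kernel_size=3, stride=stride,
                                   padding=1, groups=c_in, bias=False)
        self.pointwise = nn.Conv2d(c_in, c_out, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

x = torch.randn(1, 32, 64, 64)
y = DepthwiseSeparableConv(32, 64)(x)   # -> (1, 64, 64, 64)
```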
Data-driven Prognostics with Predictive Uncertainty Estimation using Ensemble of Deep Ordinal Regression Models
Prognostics or Remaining Useful Life (RUL) Estimation from multi-sensor time
series data is useful to enable condition-based maintenance and ensure high
operational availability of equipment. We propose a novel deep learning based
approach for Prognostics with Uncertainty Quantification that is useful in
scenarios where: (i) access to labeled failure data is scarce due to the
rarity of failures, (ii) future operational conditions are unobserved, and
(iii) inherent noise is present in the sensor readings. All three are
unavoidable sources of uncertainty in the RUL estimation process, often
resulting in unreliable RUL estimates. To address (i), we formulate RUL
estimation as an Ordinal Regression (OR) problem and propose LSTM-OR, a deep
Long Short-Term Memory (LSTM) network based approach to learning the OR function.
We show that LSTM-OR naturally allows for incorporation of censored operational
instances in training along with the failed instances, leading to more robust
learning. To address (ii), we propose a simple yet effective approach to
quantify predictive uncertainty in the RUL estimation models by training an
ensemble of LSTM-OR models. Through empirical evaluation on C-MAPSS turbofan
engine benchmark datasets, we demonstrate that LSTM-OR is significantly better
than the commonly used deep metric regression based approaches for RUL
estimation, especially when failed training instances are scarce. Further, our
uncertainty quantification approach yields high quality predictive uncertainty
estimates while also leading to improved RUL estimates compared to single best
LSTM-OR models.
Comment: Accepted at the International Journal of Prognostics and Health
Management (IJPHM), 201
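The key idea can be sketched in a few lines: RUL is encoded as a vector of binary "does RUL exceed threshold k?" targets, censored (still-running) instances supervise only the thresholds their observed lifetime already guarantees, and the spread of an ensemble provides the uncertainty estimate. A numpy sketch under those assumptions; the paper's exact encoding, masking, and ensembling details may differ.

```python
import numpy as np

def ordinal_targets(rul, bin_size, num_bins, censored=False):
    """Encode RUL as K binary targets: t[k] = 1 iff RUL > k*bin_size.
    For a censored instance, RUL is only a lower bound: thresholds below it
    are known 1s; the rest are unknown (mask = 0)."""
    thresholds = bin_size * np.arange(1, num_bins + 1)
    targets = (rul > thresholds).astype(float)
    mask = np.ones(num_bins)
    if censored:
        mask = (rul > thresholds).astype(float)  # only lower-bound bits supervise
    return targets, mask

def rul_from_probs(probs, bin_size):
    """Expected RUL = bin_size * sum_k P(RUL > k*bin_size)."""
    return bin_size * probs.sum(-1)

def ensemble_predict(prob_list, bin_size):
    """Ensemble of M models: mean RUL as estimate, std as uncertainty."""
    ruls = np.array([rul_from_probs(p, bin_size) for p in prob_list])
    return ruls.mean(0), ruls.std(0)

# Train an LSTM with K sigmoid outputs against `targets`, weighting the
# per-threshold binary cross-entropy by `mask`:
targets, mask = ordinal_targets(rul=75.0, bin_size=10.0, num_bins=12, censored=True)
```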
Learn Stereo, Infer Mono: Siamese Networks for Self-Supervised, Monocular, Depth Estimation
The field of self-supervised monocular depth estimation has seen huge
advancements in recent years. Most methods assume stereo data is available
during training but usually under-utilize it and only treat it as a reference
signal. We propose a novel self-supervised approach which uses both left and
right images equally during training, but can still be used with a single input
image at test time, for monocular depth estimation. Our Siamese network
architecture consists of two twin networks, each of which learns to predict
a disparity map from a single image. At test time, however, only one of these
networks is
used in order to infer depth. We show state-of-the-art results on the standard
KITTI Eigen split benchmark as well as being the highest scoring
self-supervised method on the new KITTI single view benchmark. To demonstrate
the ability of our method to generalize to new data sets, we further provide
results on the Make3D benchmark, which was not used during training.
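Conceptually, each twin sees only one image of the stereo pair and predicts a disparity map; the other image is used purely to build a photometric reconstruction target. A minimal PyTorch sketch of that training signal, where `net` stands in for any single-image disparity network; the names and sign conventions here are assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(right, disp_left):
    """Reconstruct the left view by sampling the right image at x - d(x).
    right: (B, C, H, W); disp_left: (B, 1, H, W) disparity in pixels."""
    b, _, h, w = right.shape
    xs = torch.linspace(-1, 1, w).view(1, 1, w).expand(b, h, w)
    ys = torch.linspace(-1, 1, h).view(1, h, 1).expand(b, h, w)
    xs = xs - 2.0 * disp_left.squeeze(1) / w      # shift by normalized disparity
    grid = torch.stack([xs, ys], dim=-1)          # (B, H, W, 2) in [-1, 1]
    return F.grid_sample(right, grid, align_corners=True)

def photometric_loss(left, right, net):
    """Twin networks share weights; each sees ONE image. The stereo partner
    only provides the reconstruction target."""
    disp_l = net(left)                            # disparity from left image alone
    return F.l1_loss(warp_right_to_left(right, disp_l), left)

# Dummy check with a constant-disparity "network"; at test time a single
# image suffices: depth ~ baseline * focal / net(image).
left, right = torch.rand(1, 3, 32, 64), torch.rand(1, 3, 32, 64)
net = lambda im: torch.full((im.shape[0], 1, im.shape[2], im.shape[3]), 2.0)
print(photometric_loss(left, right, net))
```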
Missing Data using Decision Forest and Computational Intelligence
An autoencoder neural network is implemented to estimate the missing data. A
genetic algorithm is used to optimise the network and to estimate the missing
values. Missing data are treated under the Missing At Random mechanism via a
maximum likelihood algorithm. Network performance is measured by the mean
square error of the network's predictions. The network is further optimised
with a decision forest. The impact of missing data is then investigated, and
decision forests are found to improve the results.
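As a point of comparison, a forest-based imputer in the spirit of this pipeline can be sketched with scikit-learn's IterativeImputer, which regresses each incomplete feature on the others in round-robin fashion. This is an illustrative stand-in, not the paper's autoencoder-GA-forest system.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_true = rng.normal(size=(200, 5))
X = X_true.copy()
X[rng.random(X.shape) < 0.1] = np.nan          # knock out ~10% Missing At Random

# Forest-based round-robin imputation: each feature with missing values is
# regressed on the remaining features, iterated until convergence.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)

# Evaluate as in the abstract: mean square error on the imputed entries.
mask = np.isnan(X)
print(np.mean((X_filled[mask] - X_true[mask]) ** 2))
```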
Improved graph-based SFA: Information preservation complements the slowness principle
Slow feature analysis (SFA) is an unsupervised-learning algorithm that
extracts slowly varying features from a multi-dimensional time series. A
supervised extension to SFA for classification and regression is graph-based
SFA (GSFA). GSFA is based on the preservation of similarities, which are
specified by a graph structure derived from the labels. It has been shown that
hierarchical GSFA (HGSFA) allows learning from images and other
high-dimensional data. The feature space spanned by HGSFA is complex due to the
composition of the nonlinearities of the nodes in the network. However, we show
that the network discards useful information prematurely before it reaches
higher nodes, resulting in suboptimal global slowness and an under-exploited
feature space.
To counteract these problems, we propose an extension called hierarchical
information-preserving GSFA (HiGSFA), where information preservation
complements the slowness-maximization goal. We build a 10-layer HiGSFA network
to estimate human age from facial photographs of the MORPH-II database,
achieving a mean absolute error of 3.50 years, improving the state-of-the-art
performance. HiGSFA and HGSFA support multiple labels and offer a rich feature
space, feed-forward training, and linear complexity in the number of samples
and dimensions. Furthermore, HiGSFA outperforms HGSFA in terms of feature
slowness, estimation accuracy and input reconstruction, giving rise to a
promising hierarchical supervised-learning approach.
Comment: 40 pages, 9 figures, 9 tables, submitted to Pattern Recognition
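The slowness objective at the heart of (G)SFA is easy to state in the linear case: whiten the data, then keep the directions along which the temporal derivative has the smallest variance. A numpy sketch of one linear SFA node follows; GSFA would replace the plain temporal differences with graph-weighted ones, and the hierarchical variants stack such nodes.

```python
import numpy as np

def linear_sfa(x, n_features):
    """Linear SFA on a time series x of shape (T, D): whiten, then take the
    directions in which the time derivative varies least (slowest features)."""
    x = x - x.mean(0)
    cov = np.cov(x, rowvar=False)
    d, E = np.linalg.eigh(cov)
    d = np.clip(d, 1e-9, None)          # guard tiny eigenvalues
    z = x @ (E / np.sqrt(d))            # whitened signal
    z_dot = np.diff(z, axis=0)          # temporal derivative
    _, V = np.linalg.eigh(np.cov(z_dot, rowvar=False))
    return z @ V[:, :n_features]        # eigh sorts ascending: slowest first

# Example: recover a slow sine linearly mixed with faster ones.
t = np.linspace(0, 4 * np.pi, 1000)
x = np.column_stack([np.sin(t), np.sin(11 * t), np.sin(23 * t)])
x = x @ np.random.default_rng(0).normal(size=(3, 3))   # random mixing
slow = linear_sfa(x, n_features=1)                     # ~ +/- sin(t), rescaled
```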
Multi-Expert Gender Classification on Age Group by Integrating Deep Neural Networks
Generally, facial age variations significantly affect gender classification
accuracy, because facial shape and skin texture change as a person grows
older. This calls for re-examining gender classification systems so that they
take facial age information into account. In this paper, we propose
Multi-expert Gender Classification on Age Group (MGA), an end-to-end
multi-task learning scheme for
age estimation and gender classification. First, two types of deep neural
networks are utilized: a Convolutional Appearance Network (CAN) for facial
appearance features and a Deep Geometry Network (DGN) for facial geometric
features. Then, CAN and DGN are integrated by the proposed model integration
strategy and fine-tuned in order to improve age and gender classification
accuracy. The facial images are categorized into one of three age groups
(young, adult, and elder) based on their estimated age, and the system makes
a gender prediction by averaging the outputs of three gender
classification experts, which are trained to fit gender characteristics of each
age group. Rigorous experiments conducted on challenging databases suggest
that the proposed MGA outperforms several state-of-the-art methods at a
smaller computational cost.
Comment: 12 pages
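The fusion step can be sketched as a gating computation: an age branch produces probabilities over the three groups, and the per-group gender experts' outputs are averaged under those weights. A PyTorch sketch assuming soft gating; the paper's own fusion may instead assign each face a hard group first, and the layer sizes here are placeholders.

```python
import torch
import torch.nn as nn

def mga_predict(feat, age_net, experts):
    """Age branch -> P(young/adult/elder); each expert -> P(male) for its
    group; fuse by probability-weighted averaging."""
    group_probs = torch.softmax(age_net(feat), dim=-1)                  # (B, 3)
    expert_scores = torch.stack(
        [torch.sigmoid(e(feat)).squeeze(-1) for e in experts], dim=-1)  # (B, 3)
    return (group_probs * expert_scores).sum(-1)                        # (B,)

feat = torch.randn(4, 128)                       # shared image features
age_net = nn.Linear(128, 3)                      # stand-in for the age branch
experts = [nn.Linear(128, 1) for _ in range(3)]  # stand-ins for gender experts
print(mga_predict(feat, age_net, experts))
```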
3D Interpreter Networks for Viewer-Centered Wireframe Modeling
Understanding 3D object structure from a single image is an important but
challenging task in computer vision, mostly due to the lack of 3D object
annotations for real images. Previous research tackled this problem by either
searching for a 3D shape that best explains 2D annotations, or training purely
on synthetic data with ground truth 3D information. In this work, we propose 3D
INterpreter Networks (3D-INN), an end-to-end trainable framework that
sequentially estimates 2D keypoint heatmaps and 3D object skeletons and poses.
Our system learns from both 2D-annotated real images and synthetic 3D data.
This is made possible mainly by two technical innovations. First, heatmaps of
2D keypoints serve as an intermediate representation to connect real and
synthetic data. 3D-INN is trained on real images to estimate 2D keypoint
heatmaps from an input image; it then predicts 3D object structure from
heatmaps using knowledge learned from synthetic 3D shapes. By doing so, 3D-INN
benefits from the variation and abundance of synthetic 3D objects, without
suffering from the domain difference between real and synthesized images, often
due to imperfect rendering. Second, we propose a Projection Layer, mapping
estimated 3D structure back to 2D. During training, it ensures that 3D-INN
predicts 3D structures whose projections are consistent with the 2D
annotations of real images. Experiments show that the proposed system performs
well on both 2D
keypoint estimation and 3D structure recovery. We also demonstrate that the
recovered 3D information has wide vision applications, such as image retrieval.
Comment: Journal preprint of arXiv:1604.08685 (IJCV, 2018). The first two
authors contributed equally to this work. Project page:
http://3dinterpreter.csail.mit.edu
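The projection layer is the piece that lets 2D-annotated real images supervise a 3D prediction: project the estimated skeleton back to the image plane and penalise deviation from the annotated keypoints. A PyTorch sketch with a simple pinhole camera; the paper's exact camera parameterisation may differ.

```python
import torch

def project_points(X3d, R, t, f, c):
    """Pinhole projection of 3D keypoints to 2D.
    X3d: (B, K, 3); R: (B, 3, 3); t: (B, 3); f: focal length; c: (2,)."""
    Xc = torch.einsum('bij,bkj->bki', R, X3d) + t[:, None, :]   # camera frame
    return f * Xc[..., :2] / Xc[..., 2:3] + c                   # perspective divide

def projection_loss(X3d, R, t, f, c, kp2d):
    """Supervise predicted 3D structure only through its 2D projection,
    so real images need nothing beyond 2D keypoint annotations."""
    return torch.mean((project_points(X3d, R, t, f, c) - kp2d) ** 2)

X3d = torch.randn(2, 8, 3) + torch.tensor([0.0, 0.0, 5.0])  # points in front of camera
R, t = torch.eye(3).expand(2, 3, 3), torch.zeros(2, 3)
loss = projection_loss(X3d, R, t, f=500.0, c=torch.tensor([112.0, 112.0]),
                       kp2d=torch.randn(2, 8, 2))
```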
Acoustics-guided evaluation (AGE): a new measure for estimating performance of speech enhancement algorithms for robust ASR
One challenging problem of robust automatic speech recognition (ASR) is how
to measure the goodness of a speech enhancement algorithm (SEA) without
calculating the word error rate (WER) due to the high costs of manual
transcriptions, language modeling, and decoding. Traditional measures of
speech quality and intelligibility, such as PESQ and STOI, have been shown to
correlate only weakly with WER. In this study, a novel
acoustics-guided evaluation (AGE) measure is proposed for estimating
performance of SEAs for robust ASR. AGE consists of three consecutive steps:
low-level representations via feature extraction, high-level representations
via a nonlinear mapping with the acoustic model (AM), and a final AGE
calculation between the representations of clean and degraded
speech. Specifically, state posterior probabilities from neural network based
AM are adopted for the high-level representations and the cross-entropy
criterion is used to calculate AGE. Experiments demonstrate that AGE yields
consistently the highest correlations with WER and the most accurate
estimation of ASR performance compared with PESQ, STOI, and an entropy-based
acoustic confidence measure. Potentially, AGE could be adopted to guide the
parameter optimization of deep learning based SEAs to further improve
recognition performance.
Comment: Submitted to ICASSP 201
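In other words, the measure runs clean and processed speech through the same acoustic model and compares the resulting state-posterior sequences with cross-entropy. A numpy sketch of that final step; frame alignment and the exact averaging are assumptions here.

```python
import numpy as np

def age_measure(post_clean, post_degraded, eps=1e-12):
    """Frame-level cross-entropy between acoustic-model state posteriors of
    clean speech and of (enhanced) degraded speech, averaged over frames.
    post_*: (T, S) posterior matrices from the same AM; lower = closer to clean."""
    post_degraded = np.clip(post_degraded, eps, 1.0)
    return float(-(post_clean * np.log(post_degraded)).sum(axis=1).mean())
```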
Attended End-to-end Architecture for Age Estimation from Facial Expression Videos
The main challenges of age estimation from facial expression videos lie not
only in the modeling of the static facial appearance, but also in the capturing
of the temporal facial dynamics. Traditional approaches to this problem focus
on constructing handcrafted features to capture the discriminative information
contained in facial appearance and dynamics separately, which relies on
sophisticated feature refinement and framework design. In this paper, we
present an end-to-end architecture for age estimation, called Spatially-Indexed
Attention Model (SIAM), which is able to simultaneously learn both the
appearance and dynamics of age from raw videos of facial expressions.
Specifically, we employ convolutional neural networks to extract effective
latent appearance representations and feed them into recurrent networks to
model the temporal dynamics. More importantly, we propose to leverage attention
models for salience detection in both the spatial domain (for each single
image) and the temporal domain (for the whole video). We design a specific
spatially-indexed attention mechanism among the convolutional layers to extract
the salient facial regions in each individual image, and a temporal attention
layer to assign attention weights to each frame. This two-pronged approach not
only improves the performance by allowing the model to focus on informative
frames and facial areas, but also offers an interpretable correspondence
between spatial facial regions, temporal frames, and the task of age
estimation. We demonstrate the strong performance of our model in experiments
on a large, gender-balanced database of 400 subjects whose ages span 8 to 76
years. Experiments reveal that our model exhibits
significant superiority over the state-of-the-art methods given sufficient
training data.
Comment: Accepted by Transactions on Image Processing (TIP)
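The temporal half of this attention scheme is compact enough to sketch: score each recurrent output, softmax the scores over time, and pool the frames under those weights. A PyTorch sketch; the single-linear scoring function and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Score each frame's feature vector, softmax over time, and pool the
    sequence into one attended video descriptor."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h):                                     # h: (B, T, dim)
        w = torch.softmax(self.score(h).squeeze(-1), dim=1)   # (B, T) weights
        return (w.unsqueeze(-1) * h).sum(1), w                # pooled, weights

feats = torch.randn(2, 30, 256)            # e.g. 30 recurrent outputs per video
pooled, weights = TemporalAttention(256)(feats)
```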