137,009 research outputs found

    Missing Data: A Comparison of Neural Network and Expectation Maximisation Techniques

    The estimation of missing input vector elements in real-time processing applications requires a system that possesses knowledge of certain characteristics, such as correlations between variables, which are inherent in the input space. Computational intelligence techniques and maximum likelihood techniques do possess such characteristics and as a result are important for imputation of missing data. This paper compares two approaches to the problem of missing data estimation. The first technique is based on the current state-of-the-art approach to this problem, namely the use of Maximum Likelihood (ML) and Expectation Maximisation (EM). The second approach is the use of a system based on auto-associative neural networks and the Genetic Algorithm, as discussed by Abdella and Marwala [3]. The estimation ability of both techniques is compared on three datasets, and conclusions are drawn. Comment: 24 pages, 7 figures, 4 tables

    Joint Estimation of Age and Gender from Unconstrained Face Images using Lightweight Multi-task CNN for Mobile Applications

    Automatic age and gender classification based on unconstrained images has become an essential technique on mobile devices. With limited computing power, developing a robust system becomes a challenging task. In this paper, we present an efficient convolutional neural network (CNN), called lightweight multi-task CNN, for simultaneous age and gender classification. Lightweight multi-task CNN uses depthwise separable convolution to reduce the model size and save inference time. On the public, challenging Adience dataset, the accuracy of age and gender classification is better than that of baseline multi-task CNN methods. Comment: To be published in the first IEEE International Conference on Multimedia Information Processing and Retrieval, 2018 (IEEE MIPR 2018)
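
    As a rough illustration of why depthwise separable convolution shrinks a model, the sketch below compares weight counts for a single layer. The layer sizes (3x3 kernel, 32 input channels, 64 output channels) are assumptions chosen for the example and are not taken from the paper; biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one kernel x kernel x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (depthwise step plus 1x1 pointwise step)."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    k, c_in, c_out = 3, 32, 64
    std = standard_conv_params(k, c_in, c_out)
    sep = separable_conv_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
```

    The ratio equals 1/c_out + 1/kernel^2, so the saving grows with the number of output channels and the kernel size.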

    Data-driven Prognostics with Predictive Uncertainty Estimation using Ensemble of Deep Ordinal Regression Models

    Prognostics, or Remaining Useful Life (RUL) estimation, from multi-sensor time series data is useful to enable condition-based maintenance and ensure high operational availability of equipment. We propose a novel deep learning based approach for prognostics with uncertainty quantification that is useful in scenarios where: (i) access to labeled failure data is scarce due to the rarity of failures, (ii) future operational conditions are unobserved, and (iii) inherent noise is present in the sensor readings. All three scenarios are unavoidable sources of uncertainty in the RUL estimation process, often resulting in unreliable RUL estimates. To address (i), we formulate RUL estimation as an Ordinal Regression (OR) problem, and propose LSTM-OR: a deep Long Short Term Memory (LSTM) network based approach to learn the OR function. We show that LSTM-OR naturally allows for incorporation of censored operational instances in training along with the failed instances, leading to more robust learning. To address (ii), we propose a simple yet effective approach to quantify predictive uncertainty in the RUL estimation models by training an ensemble of LSTM-OR models. Through empirical evaluation on C-MAPSS turbofan engine benchmark datasets, we demonstrate that LSTM-OR is significantly better than the commonly used deep metric regression based approaches for RUL estimation, especially when failed training instances are scarce. Further, our uncertainty quantification approach yields high-quality predictive uncertainty estimates while also leading to improved RUL estimates compared to single best LSTM-OR models. Comment: Accepted at International Journal of Prognostics and Health Management (IJPHM), 201
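
    The abstract does not spell out how ordinal targets or ensemble uncertainty are computed, so the following is only a minimal sketch under stated assumptions: RUL is compared against a hypothetical list of thresholds, a censored instance contributes only the comparisons its known lower bound can answer (the rest are masked out), and ensemble uncertainty is summarised by the mean and standard deviation of member predictions. The function names and threshold values are invented for illustration.

```python
import statistics


def ordinal_targets(rul, thresholds, censored=False):
    """Encode RUL as binary 'is RUL greater than threshold?' targets plus a training mask.

    For a censored instance, `rul` is only a lower bound on the true RUL, so
    comparisons against thresholds at or above it are unknown and masked (mask = 0).
    """
    targets, mask = [], []
    for t in thresholds:
        if rul > t:
            targets.append(1)
            mask.append(1)
        else:
            targets.append(0)
            mask.append(0 if censored else 1)
    return targets, mask


def ensemble_summary(predictions):
    """Mean and population standard deviation of RUL estimates from ensemble members."""
    return statistics.mean(predictions), statistics.pstdev(predictions)


if __name__ == "__main__":
    thresholds = [10, 20, 30, 40]  # hypothetical RUL thresholds (cycles)
    print(ordinal_targets(25, thresholds))                 # failed instance
    print(ordinal_targets(25, thresholds, censored=True))  # censored instance
    print(ensemble_summary([100.0, 110.0, 90.0]))
```

    A masked loss would then skip the entries where the mask is 0, which is one way censored instances can contribute to training without a known failure time.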

    Learn Stereo, Infer Mono: Siamese Networks for Self-Supervised, Monocular, Depth Estimation

    The field of self-supervised monocular depth estimation has seen huge advancements in recent years. Most methods assume stereo data is available during training but usually under-utilize it and only treat it as a reference signal. We propose a novel self-supervised approach which uses both left and right images equally during training, but can still be used with a single input image at test time, for monocular depth estimation. Our Siamese network architecture consists of two twin networks, each of which learns to predict a disparity map from a single image. At test time, however, only one of these networks is used to infer depth. We show state-of-the-art results on the standard KITTI Eigen split benchmark, and our method is the highest-scoring self-supervised method on the new KITTI single view benchmark. To demonstrate the ability of our method to generalize to new data sets, we further provide results on the Make3D benchmark, which was not used during training.

    Missing Data using Decision Forest and Computational Intelligence

    An autoencoder neural network is implemented to estimate the missing data. A genetic algorithm is implemented for network optimization and for estimating the missing data. Missing data is treated under the Missing At Random mechanism by implementing a maximum likelihood algorithm. The network performance is determined by calculating the mean square error of the network prediction. The network is further optimized by implementing a Decision Forest. The impact of missing data is then investigated, and decision forests are found to improve the results.

    Improved graph-based SFA: Information preservation complements the slowness principle

    Slow feature analysis (SFA) is an unsupervised-learning algorithm that extracts slowly varying features from a multi-dimensional time series. A supervised extension to SFA for classification and regression is graph-based SFA (GSFA). GSFA is based on the preservation of similarities, which are specified by a graph structure derived from the labels. It has been shown that hierarchical GSFA (HGSFA) allows learning from images and other high-dimensional data. The feature space spanned by HGSFA is complex due to the composition of the nonlinearities of the nodes in the network. However, we show that the network discards useful information prematurely before it reaches higher nodes, resulting in suboptimal global slowness and an under-exploited feature space. To counteract these problems, we propose an extension called hierarchical information-preserving GSFA (HiGSFA), where information preservation complements the slowness-maximization goal. We build a 10-layer HiGSFA network to estimate human age from facial photographs of the MORPH-II database, achieving a mean absolute error of 3.50 years, improving the state-of-the-art performance. HiGSFA and HGSFA support multiple labels and offer a rich feature space, feed-forward training, and linear complexity in the number of samples and dimensions. Furthermore, HiGSFA outperforms HGSFA in terms of feature slowness, estimation accuracy and input reconstruction, giving rise to a promising hierarchical supervised-learning approach. Comment: 40 pages, 9 figures, 9 tables, submitted to Pattern Recognition

    Multi-Expert Gender Classification on Age Group by Integrating Deep Neural Networks

    Generally, facial age variations affect gender classification accuracy significantly, because facial shape and skin texture change as people grow old. This calls for re-examining gender classification systems to take facial age information into account. In this paper, we propose Multi-expert Gender Classification on Age Group (MGA), an end-to-end multi-task learning scheme for age estimation and gender classification. First, two types of deep neural networks are utilized: a Convolutional Appearance Network (CAN) for facial appearance features and a Deep Geometry Network (DGN) for facial geometric features. Then, CAN and DGN are integrated by the proposed model integration strategy and fine-tuned in order to improve age and gender classification accuracy. The facial images are categorized into one of three age groups (young, adult and elder) based on their estimated age, and the system makes a gender prediction according to an average fusion strategy over three gender classification experts, which are trained to fit the gender characteristics of each age group. Rigorous experiments conducted on challenging databases suggest that the proposed MGA outperforms several state-of-the-art methods at a smaller computational cost. Comment: 12 pages

    3D Interpreter Networks for Viewer-Centered Wireframe Modeling

    Understanding 3D object structure from a single image is an important but challenging task in computer vision, mostly due to the lack of 3D object annotations for real images. Previous research tackled this problem by either searching for a 3D shape that best explains 2D annotations, or training purely on synthetic data with ground truth 3D information. In this work, we propose 3D INterpreter Networks (3D-INN), an end-to-end trainable framework that sequentially estimates 2D keypoint heatmaps and 3D object skeletons and poses. Our system learns from both 2D-annotated real images and synthetic 3D data. This is made possible mainly by two technical innovations. First, heatmaps of 2D keypoints serve as an intermediate representation to connect real and synthetic data. 3D-INN is trained on real images to estimate 2D keypoint heatmaps from an input image; it then predicts 3D object structure from heatmaps using knowledge learned from synthetic 3D shapes. By doing so, 3D-INN benefits from the variation and abundance of synthetic 3D objects, without suffering from the domain difference between real and synthesized images, often due to imperfect rendering. Second, we propose a Projection Layer, mapping estimated 3D structure back to 2D. During training, it ensures that 3D-INN predicts 3D structure whose projection is consistent with the 2D annotations of real images. Experiments show that the proposed system performs well on both 2D keypoint estimation and 3D structure recovery. We also demonstrate that the recovered 3D information has wide vision applications, such as image retrieval. Comment: Journal preprint of arXiv:1604.08685 (IJCV, 2018). The first two authors contributed equally to this work. Project page: http://3dinterpreter.csail.mit.ed

    Acoustics-guided evaluation (AGE): a new measure for estimating performance of speech enhancement algorithms for robust ASR

    One challenging problem in robust automatic speech recognition (ASR) is how to measure the goodness of a speech enhancement algorithm (SEA) without calculating the word error rate (WER), given the high costs of manual transcription, language modeling and decoding. Traditional measures for evaluating speech quality and intelligibility, such as PESQ and STOI, have been shown to have relatively low correlations with WER. In this study, a novel acoustics-guided evaluation (AGE) measure is proposed for estimating the performance of SEAs for robust ASR. AGE consists of three consecutive steps: low-level representations via feature extraction, high-level representations via nonlinear mapping with the acoustic model (AM), and the final AGE calculation between the representations of clean speech and degraded speech. Specifically, state posterior probabilities from a neural network based AM are adopted for the high-level representations, and the cross-entropy criterion is used to calculate AGE. Experiments demonstrate that AGE consistently yields the highest correlations with WER and gives the most accurate estimation of ASR performance compared with PESQ, STOI, and an acoustic confidence measure using entropy. Potentially, AGE could be adopted to guide the parameter optimization of deep learning based SEAs to further improve recognition performance. Comment: Submitted to ICASSP 201
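
    The final step described above, a cross-entropy between state posteriors of clean and degraded speech, can be sketched as follows. This is an assumed formulation (frame-averaged cross-entropy with the clean posteriors as the reference); the abstract does not give the exact formula, and the function names and toy posteriors are hypothetical.

```python
import math


def frame_cross_entropy(p_clean, p_degraded, eps=1e-12):
    """Cross-entropy of one frame: -sum_s p_clean[s] * log(p_degraded[s])."""
    return -sum(p * math.log(max(q, eps)) for p, q in zip(p_clean, p_degraded))


def age_score(clean_posteriors, degraded_posteriors):
    """Average the per-frame cross-entropy over all frames (lower means closer to clean)."""
    if not clean_posteriors or len(clean_posteriors) != len(degraded_posteriors):
        raise ValueError("need two non-empty posterior sequences of equal length")
    frames = [
        frame_cross_entropy(p, q)
        for p, q in zip(clean_posteriors, degraded_posteriors)
    ]
    return sum(frames) / len(frames)


if __name__ == "__main__":
    # Toy posteriors over two acoustic states for two frames (made-up numbers).
    clean = [[0.9, 0.1], [0.2, 0.8]]
    degraded = [[0.6, 0.4], [0.5, 0.5]]
    print(age_score(clean, clean), age_score(clean, degraded))
```

    Under this assumed definition, the score is smallest when the degraded posteriors match the clean ones, which is the property a proxy for WER would need.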

    Attended End-to-end Architecture for Age Estimation from Facial Expression Videos

    The main challenges of age estimation from facial expression videos lie not only in modeling the static facial appearance, but also in capturing the temporal facial dynamics. Traditional approaches to this problem focus on constructing handcrafted features to explore the discriminative information contained in facial appearance and dynamics separately. This relies on sophisticated feature refinement and framework design. In this paper, we present an end-to-end architecture for age estimation, called Spatially-Indexed Attention Model (SIAM), which is able to simultaneously learn both the appearance and dynamics of age from raw videos of facial expressions. Specifically, we employ convolutional neural networks to extract effective latent appearance representations and feed them into recurrent networks to model the temporal dynamics. More importantly, we propose to leverage attention models for salience detection in both the spatial domain for each single image and the temporal domain for the whole video. We design a specific spatially-indexed attention mechanism among the convolutional layers to extract the salient facial regions in each individual image, and a temporal attention layer to assign attention weights to each frame. This two-pronged approach not only improves performance by allowing the model to focus on informative frames and facial areas, but also offers an interpretable correspondence between the spatial facial regions and temporal frames on the one hand and the task of age estimation on the other. We demonstrate the strong performance of our model in experiments on a large, gender-balanced database of 400 subjects with ages spanning from 8 to 76 years. Experiments reveal that our model exhibits significant superiority over state-of-the-art methods given sufficient training data. Comment: Accepted by Transactions on Image Processing (TIP)
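
    The temporal attention layer mentioned above assigns a weight to each frame. A minimal sketch of that idea, assuming the weights come from a softmax over per-frame scores and are used to pool frame features, is given below. In the paper the scores would be produced by the network; here they are plain inputs, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of per-frame scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frame_features, frame_scores):
    """Return attention weights and the weighted sum of per-frame feature vectors."""
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, pooled


if __name__ == "__main__":
    # Two frames with 2-dimensional features; the second frame gets a higher score.
    weights, pooled = attend([[1.0, 0.0], [3.0, 2.0]], [0.0, 1.0])
    print(weights, pooled)
```

    Because the weights sum to one, they can be read directly as the relative importance of each frame, which is the kind of interpretability the abstract refers to.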