
    Label-Dependencies Aware Recurrent Neural Networks

    In the last few years, Recurrent Neural Networks (RNNs) have proved effective on several NLP tasks. Despite this success, their ability to model sequence labeling is still limited. This has led research toward solutions where RNNs are combined with models that have already proved effective in this domain, such as CRFs. In this work we propose a far simpler but very effective solution: an evolution of the simple Jordan RNN, where labels are re-injected as input into the network and converted into embeddings, in the same way as words. We compare this RNN variant to the other main RNN models, the Elman and Jordan RNNs, LSTM and GRU, on two well-known Spoken Language Understanding (SLU) tasks. Thanks to label embeddings and their combination at the hidden layer, the proposed variant, which uses more parameters than Elman and Jordan RNNs but far fewer than LSTM and GRU, is not only more effective than the other RNNs but also outperforms sophisticated CRF models. Comment: 22 pages, 3 figures. Accepted at the CICLing 2017 conference. Best Verifiability, Reproducibility, and Working Description award.
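
    As a rough illustration of the mechanism described above (not the authors' code), the PyTorch sketch below embeds the previous label like a word and combines it with the word embedding at the hidden layer; all names and sizes are illustrative placeholders.

        import torch
        import torch.nn as nn

        class LabelFeedbackRNN(nn.Module):
            def __init__(self, vocab_size, n_labels, emb_dim=64, hid_dim=128):
                super().__init__()
                self.word_emb = nn.Embedding(vocab_size, emb_dim)
                self.label_emb = nn.Embedding(n_labels, emb_dim)  # labels embedded like words
                self.hidden = nn.Linear(2 * emb_dim, hid_dim)
                self.out = nn.Linear(hid_dim, n_labels)

            def forward(self, words):                    # words: (seq_len,) LongTensor
                prev_label = torch.zeros(1, dtype=torch.long)   # conventional start label
                logits = []
                for w in words:
                    x = torch.cat([self.word_emb(w.view(1)),
                                   self.label_emb(prev_label)], dim=-1)
                    h = torch.tanh(self.hidden(x))       # combine word + label at hidden layer
                    o = self.out(h)
                    prev_label = o.argmax(dim=-1)        # re-inject predicted label as input
                    logits.append(o)                     # (training would typically feed the
                return torch.cat(logits, dim=0)          #  gold previous label instead)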

    Inducing Language Networks from Continuous Space Word Representations

    Recent advances in unsupervised feature learning have produced powerful latent representations of words. However, it is still not clear what makes one representation better than another, or how we can learn the ideal representation. Understanding the structure of the latent spaces obtained is key to any future advance in unsupervised learning. In this work, we introduce a new view of continuous-space word representations as language networks. We explore two techniques for creating language networks from learned features, inducing them for two popular word representation methods and examining the properties of the resulting networks. We find that the induced networks differ from those produced by other methods of creating language networks, and that they contain meaningful community structure. Comment: 14 pages.
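
    One natural way to induce such a network, sketched below under the assumption that dense word vectors are already loaded, is to connect each word to its k nearest neighbours under cosine similarity; the construction here is illustrative, not necessarily the paper's exact procedure.

        import numpy as np

        def knn_language_network(vectors, k=5):
            """vectors: (n_words, dim) array -> adjacency dict {word_id: [neighbour_ids]}."""
            norms = np.linalg.norm(vectors, axis=1, keepdims=True)
            unit = vectors / np.clip(norms, 1e-12, None)
            sims = unit @ unit.T                     # cosine similarity matrix
            np.fill_diagonal(sims, -np.inf)          # exclude self-loops
            nbrs = np.argsort(-sims, axis=1)[:, :k]  # top-k neighbours per word
            return {i: list(map(int, nbrs[i])) for i in range(len(vectors))}

        # toy usage with random stand-in "embeddings"
        graph = knn_language_network(np.random.randn(100, 50), k=5)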

    Denoising Autoencoders for fast Combinatorial Black Box Optimization

    Estimation of Distribution Algorithms (EDAs) require flexible probability models that can be efficiently learned and sampled. Autoencoders (AEs) are generative stochastic networks with these desired properties. We integrate a special type of AE, the Denoising Autoencoder (DAE), into an EDA and evaluate the performance of DAE-EDA on several single-objective combinatorial optimization problems. We assess the number of fitness evaluations as well as the required CPU time. We compare the results to the performance of the Bayesian Optimization Algorithm (BOA) and RBM-EDA, another EDA based on a generative neural network that has proven competitive with BOA. For the considered problem instances, DAE-EDA is considerably faster than BOA and RBM-EDA, sometimes by orders of magnitude. The number of fitness evaluations is higher than for BOA, but competitive with RBM-EDA. These results show that DAEs can be useful tools for problems with low but non-negligible fitness evaluation costs. Comment: corrected typos and small inconsistencies.
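
    A compact numpy sketch of the DAE-EDA loop on the OneMax toy problem may make the idea concrete: each generation, a small denoising autoencoder is fit to the fittest bitstrings, then new candidates are produced by corrupting parents and pushing them through the trained network. All sizes, rates and corruption levels are illustrative placeholders, not the paper's setup.

        import numpy as np

        rng = np.random.default_rng(0)
        N, POP, HID = 40, 100, 20                    # bits, population size, hidden units
        sigm = lambda x: 1.0 / (1.0 + np.exp(-x))
        W1 = rng.normal(0, 0.1, (N, HID)); b1 = np.zeros(HID)
        W2 = rng.normal(0, 0.1, (HID, N)); b2 = np.zeros(N)

        pop = (rng.random((POP, N)) < 0.5).astype(float)
        for gen in range(30):
            elite = pop[np.argsort(-pop.sum(axis=1))[:POP // 2]]  # OneMax fitness = count of ones
            for _ in range(50):                      # fit the DAE to the elite by gradient descent
                noisy = np.where(rng.random(elite.shape) < 0.1, 1 - elite, elite)
                h = sigm(noisy @ W1 + b1)
                out = sigm(h @ W2 + b2)
                d2 = out - elite                     # cross-entropy gradient at the output
                dh = (d2 @ W2.T) * h * (1 - h)
                W2 -= 0.1 * h.T @ d2 / len(elite); b2 -= 0.1 * d2.mean(axis=0)
                W1 -= 0.1 * noisy.T @ dh / len(elite); b1 -= 0.1 * dh.mean(axis=0)
            parents = elite[rng.integers(0, len(elite), POP)]     # corrupt, denoise, resample
            noisy = np.where(rng.random(parents.shape) < 0.2, 1 - parents, parents)
            probs = sigm(sigm(noisy @ W1 + b1) @ W2 + b2)
            pop = (rng.random(probs.shape) < probs).astype(float)
        print("best OneMax fitness:", int(pop.sum(axis=1).max()), "out of", N)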

    ShapeCodes: Self-Supervised Feature Learning by Lifting Views to Viewgrids

    We introduce an unsupervised feature learning approach that embeds 3D shape information into a single-view image representation. The main idea is a self-supervised training objective that, given only a single 2D image, requires all unseen views of the object to be predictable from the learned features. We implement this idea as an encoder-decoder convolutional neural network. The network maps an input image of an unknown category and unknown viewpoint to a latent space, from which a deconvolutional decoder can best "lift" the image to its complete viewgrid showing the object from all viewing angles. Our class-agnostic training procedure encourages the representation to capture fundamental shape primitives and semantic regularities in a data-driven manner, without manual semantic labels. Our results on two widely used shape datasets show 1) our approach successfully learns to perform "mental rotation" even for objects unseen during training, and 2) the learned latent space is a powerful representation for object recognition, outperforming several existing unsupervised feature learning methods. Comment: To appear at ECCV 2018.
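
    A minimal PyTorch sketch of the lifting idea (not the authors' architecture): an encoder maps one 2D view to a latent code, and a deconvolutional decoder predicts a full grid of V views from that code alone. Layer sizes, image resolution and the number of views are illustrative placeholders.

        import torch
        import torch.nn as nn

        class ViewgridLifter(nn.Module):
            def __init__(self, n_views=24):              # e.g. a 4x6 grid of viewpoints
                super().__init__()
                self.encoder = nn.Sequential(             # one 1x32x32 view -> latent vector
                    nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
                    nn.Flatten(), nn.Linear(32 * 8 * 8, 128))
                self.decoder = nn.Sequential(             # latent -> n_views images at once
                    nn.Linear(128, 32 * 8 * 8), nn.ReLU(),
                    nn.Unflatten(1, (32, 8, 8)),
                    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
                    nn.ConvTranspose2d(16, n_views, 4, stride=2, padding=1), nn.Sigmoid())

            def forward(self, view):                      # view: (B, 1, 32, 32)
                return self.decoder(self.encoder(view))   # (B, n_views, 32, 32)

        # the self-supervised signal: reconstruction loss against the true viewgrid
        model = ViewgridLifter()
        pred = model(torch.rand(2, 1, 32, 32))
        loss = nn.functional.mse_loss(pred, torch.rand(2, 24, 32, 32))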

    Collaborative Deep Learning for Recommender Systems

    Collaborative filtering (CF) is a successful approach commonly used by many recommender systems. Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendations. However, the ratings are often very sparse in many applications, causing CF-based methods to degrade significantly in their recommendation performance. To address this sparsity problem, auxiliary information such as item content may be utilized. Collaborative topic regression (CTR) is an appealing recent method that takes this approach and tightly couples two components that learn from the two different sources of information. Nevertheless, the latent representation learned by CTR may not be very effective when the auxiliary information is itself very sparse. To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input, and propose in this paper a hierarchical Bayesian model called collaborative deep learning (CDL), which jointly performs deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix. Extensive experiments on three real-world datasets from different domains show that CDL can significantly advance the state of the art.
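
    The paper's full model is a hierarchical Bayesian one; the heavily simplified PyTorch sketch below only captures the coupling at the heart of CDL: item content passes through an autoencoder, and the item factors used for rating prediction are tied to the encoder's middle layer by a regularizer. Dimensions, data and weights are toy placeholders.

        import torch
        import torch.nn as nn

        n_users, n_items, content_dim, k = 50, 80, 100, 16
        enc = nn.Sequential(nn.Linear(content_dim, 64), nn.ReLU(), nn.Linear(64, k))
        dec = nn.Sequential(nn.Linear(k, 64), nn.ReLU(), nn.Linear(64, content_dim))
        U = nn.Parameter(torch.randn(n_users, k) * 0.1)   # user latent factors
        V = nn.Parameter(torch.randn(n_items, k) * 0.1)   # item latent factors
        opt = torch.optim.Adam([U, V, *enc.parameters(), *dec.parameters()], lr=1e-2)

        content = torch.rand(n_items, content_dim)        # toy item content matrix
        R = torch.rand(n_users, n_items).round()          # toy binary feedback matrix
        for step in range(200):
            z = enc(content)
            loss = (nn.functional.mse_loss(dec(z), content)    # reconstruct item content
                    + nn.functional.mse_loss(U @ V.T, R)        # fit the ratings matrix
                    + 0.1 * nn.functional.mse_loss(V, z))       # couple V to encoder output
            opt.zero_grad(); loss.backward(); opt.step()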

    A framework for selecting deep learning hyper-parameters

    Recent research has found that deep learning architectures show significant improvements over traditional shallow algorithms when mining high-dimensional datasets. When the choice of algorithm, hyper-parameter settings, and the number of hidden layers and of nodes within a layer are combined, identifying an optimal configuration can be a lengthy process. Our work provides a framework for building deep learning architectures via a stepwise approach, together with an evaluation methodology for quickly identifying poorly performing architectural configurations. Using a dataset with high dimensionality, we illustrate how different architectures perform and how one algorithm configuration can provide input for fine-tuning more complex models.
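
    The stepwise idea can be sketched with scikit-learn (this is a toy illustration, not the authors' framework): grow the architecture one hidden layer at a time, score each candidate quickly on a validation split, and stop extending once adding a layer no longer helps. Data, widths and iteration budget below are placeholders.

        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPClassifier

        X, y = make_classification(n_samples=2000, n_features=100, random_state=0)
        X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

        layers, best = [], 0.0
        for width in (64, 64, 64):                   # try deepening, up to 3 hidden layers
            candidate = layers + [width]
            clf = MLPClassifier(hidden_layer_sizes=tuple(candidate),
                                max_iter=50, random_state=0)  # short budget for quick triage
            clf.fit(X_tr, y_tr)
            score = clf.score(X_va, y_va)
            if score <= best:                        # early discard: deeper did not help
                break
            layers, best = candidate, score
        print("selected architecture:", tuple(layers), "val accuracy:", round(best, 3))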

    Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition

    Good old on-line back-propagation for plain multi-layer perceptrons yields a very low 0.35% error rate on the famous MNIST handwritten digits benchmark. All we need to achieve this best result so far are many hidden layers, many neurons per layer, numerous deformed training images, and graphics cards to greatly speed up learning. Comment: 14 pages, 2 figures, 4 listings.
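
    For flavor, a minimal PyTorch sketch of that recipe: a plain, wide and deep MLP plus random affine image deformations as a stand-in for the paper's elastic distortions. The layer widths are in the spirit of the large nets the paper describes but are illustrative here, and the random tensor stands in for an MNIST batch.

        import torch
        import torch.nn as nn
        from torchvision import transforms

        deform = transforms.RandomAffine(degrees=15, translate=(0.1, 0.1),
                                         scale=(0.9, 1.1))  # deformed training images

        mlp = nn.Sequential(                                # many layers, many neurons
            nn.Flatten(),
            nn.Linear(28 * 28, 2500), nn.ReLU(),
            nn.Linear(2500, 2000), nn.ReLU(),
            nn.Linear(2000, 1500), nn.ReLU(),
            nn.Linear(1500, 1000), nn.ReLU(),
            nn.Linear(1000, 500), nn.ReLU(),
            nn.Linear(500, 10))

        imgs = torch.rand(8, 1, 28, 28)                     # stand-in for an MNIST batch
        logits = mlp(deform(imgs))                          # train with plain back-propagation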

    A Recurrent Neural Network Survival Model: Predicting Web User Return Time

    The size of a website's active user base directly affects its value. Thus, it is important to monitor and influence a user's likelihood to return to a site. Essential to this is predicting when a user will return. Current state-of-the-art approaches to this problem come in two flavors: (1) Recurrent Neural Network (RNN) based solutions and (2) survival analysis methods. We observe that both techniques are severely limited when applied to this problem. Survival models can only incorporate aggregate representations of users instead of automatically learning a representation directly from a raw time series of user actions. RNNs can automatically learn features, but cannot be directly trained with examples of non-returning users, who have no target value for their return time. We develop a novel RNN survival model that removes the limitations of the state-of-the-art methods. We demonstrate that this model can successfully be applied to return time prediction on a large e-commerce dataset, with a better ability to discriminate between returning and non-returning users than either method applied in isolation. Comment: Accepted into ECML PKDD 2018; 8 figures and 1 table.
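
    A minimal PyTorch sketch of the general idea (not the paper's exact model): a GRU summarises a user's action sequence and emits the parameters of a Weibull return-time distribution, and censored (non-returning) users contribute to the loss through the survival function instead of the density. Feature counts and sizes are placeholders.

        import torch
        import torch.nn as nn

        class RNNSurvival(nn.Module):
            def __init__(self, n_feats=8, hid=32):
                super().__init__()
                self.rnn = nn.GRU(n_feats, hid, batch_first=True)
                self.head = nn.Linear(hid, 2)            # Weibull scale and shape

            def forward(self, x):                         # x: (B, T, n_feats) action series
                _, h = self.rnn(x)
                lam, k = torch.exp(self.head(h[-1])).unbind(-1)  # keep parameters positive
                return lam, k

        def neg_log_likelihood(lam, k, t, observed):
            """Weibull log-likelihood; observed=1 for returns at time t, 0 for censored."""
            log_surv = -(t / lam) ** k                   # log S(t), used for censored users
            log_pdf = torch.log(k / lam) + (k - 1) * torch.log(t / lam) + log_surv
            return -(observed * log_pdf + (1 - observed) * log_surv).mean()

        model = RNNSurvival()
        lam, k = model(torch.rand(16, 10, 8))
        loss = neg_log_likelihood(lam, k, torch.rand(16) + 0.1,
                                  torch.randint(0, 2, (16,)).float())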

    Deep Learning to Analyze RNA-Seq Gene Expression Data

    Deep learning models are currently being applied in several areas with great success. However, their application to the analysis of high-throughput sequencing data remains a challenge for the research community, because this family of models is known to work best on big datasets with many samples available, the opposite of the scenario typically found in biomedical areas. In this work, a first approximation to the use of deep learning for the analysis of RNA-Seq gene expression profiles is provided. Three public cancer-related databases are analyzed using a regularized linear model (standard LASSO) as a baseline, together with two deep learning models that differ in the feature selection technique applied before the deep neural net. The results indicate that a straightforward application of the deep net implementations available in public scientific tools, under the conditions described in this work, is not enough to outperform simpler models such as LASSO. Therefore, smarter and more complex ways of incorporating prior biological knowledge into the estimation procedure of deep learning models may be necessary in order to obtain better results in terms of predictive performance.
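
    The kind of baseline the abstract refers to can be sketched with scikit-learn: an L1-regularised logistic model (a LASSO-style classifier) on wide, small-sample data. The data below is synthetic stand-in noise with signal planted in a few features; the real inputs would be RNA-Seq expression profiles.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 5000))             # few samples, many gene-like features
        y = (X[:, :10].sum(axis=1) > 0).astype(int)  # signal in a handful of features

        lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
        print("CV accuracy:", cross_val_score(lasso, X, y, cv=5).mean())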