    HoME: a Household Multimodal Environment

    We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more. We hope HoME better enables artificial agents to learn as humans do: in an interactive, multimodal, and richly contextualized setting.
    Comment: Presented at NIPS 2017's Visually-Grounded Interaction and Language Workshop.
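    Since HoME is described as OpenAI Gym-compatible, a standard Gym interaction loop is the natural way to drive it. The sketch below shows that loop with a random policy; the environment id "Home-v0" and the observation details are illustrative assumptions, not HoME's documented API.

        # Minimal Gym-style control loop; the environment id is hypothetical.
        import gym

        env = gym.make("Home-v0")  # assumed id for a HoME-style environment
        obs = env.reset()
        done = False
        while not done:
            action = env.action_space.sample()          # random placeholder policy
            obs, reward, done, info = env.step(action)  # obs would be multimodal in HoME's case
        env.close()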

    Learning Vision-and-Language Navigation from YouTube Videos

    Vision-and-language navigation (VLN) requires an embodied agent to navigate in realistic 3D environments using natural language instructions. Existing VLN methods suffer from training on small-scale environments or unreasonable path-instruction datasets, limiting their generalization to unseen environments. There are massive numbers of house tour videos on YouTube, providing abundant real navigation experiences and layout information, yet these videos have not been explored for VLN before. In this paper, we propose to learn an agent from such videos by creating a large-scale dataset of reasonable path-instruction pairs from house tour videos and pre-training the agent on it. To achieve this, we must tackle the challenges of automatically constructing path-instruction pairs and exploiting real layout knowledge from raw, unlabeled videos. To address these, we first leverage an entropy-based method to construct the nodes of a path trajectory. Then, we propose an action-aware generator for generating instructions from unlabeled trajectories. Last, we devise a trajectory judgment pretext task to encourage the agent to mine layout knowledge. Experimental results show that our method achieves state-of-the-art performance on two popular benchmarks (R2R and REVERIE). Code is available at https://github.com/JeremyLinky/YouTube-VLN
    Comment: Accepted by ICCV 2023.
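    The entropy-based node construction can be pictured as keeping only the video frames where a classifier is confident. The sketch below selects low-entropy frames as trajectory nodes; the room-type classifier, the threshold, and the exact criterion are assumptions for illustration, not the authors' precise procedure.

        # Hedged sketch: pick trajectory nodes from frames whose class
        # distribution has low entropy (i.e., a confident prediction).
        import numpy as np

        def select_nodes(frame_probs, threshold=0.5):
            """frame_probs: (num_frames, num_classes) softmax outputs per frame.
            Returns indices of frames with prediction entropy below threshold."""
            eps = 1e-12
            entropy = -np.sum(frame_probs * np.log(frame_probs + eps), axis=1)
            return np.where(entropy < threshold)[0]

        # Example: 4 frames over 3 hypothetical room types.
        probs = np.array([[0.90, 0.05, 0.05],   # confident -> node
                          [0.40, 0.30, 0.30],   # ambiguous -> skipped
                          [0.05, 0.90, 0.05],   # confident -> node
                          [0.34, 0.33, 0.33]])  # ambiguous -> skipped
        print(select_nodes(probs))  # [0 2]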

    Universal Language Model Fine-tuning for Text Classification

    Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model. Our method significantly outperforms the state of the art on six text classification tasks, reducing the error by 18-24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100x more data. We open-source our pretrained models and code.
    Comment: ACL 2018; fixed denominator in Equation 3.
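    Among the fine-tuning techniques ULMFiT introduces is the slanted triangular learning rate (STLR) schedule: a short linear warm-up followed by a long linear decay. The sketch below implements that schedule with the paper's reported defaults; how it is wired into an actual training loop is left as an assumption.

        # STLR: lr rises linearly for the first cut_frac of training,
        # then decays linearly back toward lr_max / ratio.
        def stlr(t, T, lr_max=0.01, cut_frac=0.1, ratio=32):
            """Learning rate at step t of T total training steps."""
            cut = int(T * cut_frac)
            if t < cut:
                p = t / cut                                      # linear warm-up
            else:
                p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))   # linear decay
            return lr_max * (1 + p * (ratio - 1)) / ratio

        # Example: the peak lr_max is reached at step cut = 0.1 * T.
        T = 1000
        print(stlr(0, T), stlr(100, T), stlr(999, T))  # low -> lr_max -> low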

    Deep learning for supervised classification

    One of the most recent areas in Machine Learning research is Deep Learning. Deep Learning algorithms have been applied successfully to computer vision, automatic speech recognition, natural language processing, audio recognition, and bioinformatics. The key idea of Deep Learning is to combine the best techniques from Machine Learning to build powerful general-purpose learning algorithms. It is a mistake, however, to identify Deep Learning exclusively with Deep Neural Networks: other approaches are possible, and in this paper we illustrate a generalization of Stacking which achieves very competitive performance. In particular, we show an application of this approach to a real classification problem, where a three-stage Stacking proved to be very effective.
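    For readers unfamiliar with Stacking, the baseline this paper generalizes, the sketch below shows a plain two-stage stacked classifier: base learners produce out-of-fold predictions that become features for a second-stage (meta) learner. The model choices and synthetic dataset are illustrative, not the paper's three-stage setup.

        # Two-stage stacked generalization with scikit-learn.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import StackingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=500, n_features=20, random_state=0)
        stack = StackingClassifier(
            estimators=[("svm", SVC(probability=True)),
                        ("tree", DecisionTreeClassifier(max_depth=5))],
            final_estimator=LogisticRegression(),  # second-stage learner
            cv=5,                                  # out-of-fold base predictions
        )
        print(cross_val_score(stack, X, y, cv=3).mean())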