54 research outputs found
Boosting Multi-Task Weak Learners with Applications to Textual and Social Data
Learning multiple related tasks from data simultaneously can improve predictive performance relative to learning these tasks independently. In this paper we propose a novel multi-task learning algorithm called MT-Adaboost: it extends the Adaboost algorithm to the multi-task setting and uses a multi-task decision stump as its multi-task weak classifier. This allows learning different dependencies between tasks for different regions of the learning space. Thus, we relax the conventional hypothesis that tasks behave similarly in the whole learning space. Moreover, MT-Adaboost can learn multiple tasks without imposing the constraint of sharing the same label set and/or examples between tasks. A theoretical analysis is derived from the analysis of the original Adaboost. Experiments for multiple tasks over large-scale textual data sets with social context (Enron and Tobacco) give rise to very promising results.
End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon
Most recent work in goal oriented visual navigation resorts to large-scale
machine learning in simulated environments. The main challenge lies in learning
compact representations generalizable to unseen environments and in learning
high-capacity perception modules capable of reasoning on high-dimensional
input. The latter is particularly difficult when the goal is not given as a
category ("ObjectNav") but as an exemplar image ("ImageNav"), as the perception
module needs to learn a comparison strategy that requires solving an underlying
visual correspondence problem. This has been shown to be difficult from reward
alone or with standard auxiliary tasks. We address this problem through a
sequence of two pretext tasks, which serve as a prior for what we argue is one
of the main bottlenecks in perception: extremely wide-baseline relative pose
estimation and visibility prediction in complex scenes. The first pretext task,
cross-view completion, is a proxy for the underlying visual correspondence
problem, while the second task addresses goal detection and finding directly.
We propose a new dual encoder with a large-capacity binocular ViT model and
show that correspondence solutions naturally emerge from the training signals.
Experiments show significant improvements and SOTA performance on the two
benchmarks, ImageNav and the Instance-ImageNav variant, where camera intrinsics
and height differ between observation and goal.
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
Despite impressive performance for high-level downstream tasks,
self-supervised pre-training methods have not yet fully delivered on dense
geometric vision tasks such as stereo matching or optical flow. The application
of self-supervised concepts, such as instance discrimination or masked image
modeling, to geometric tasks is an active area of research. In this work, we
build on the recent cross-view completion framework, a variation of masked
image modeling that leverages a second view from the same scene, which makes it
well suited for binocular downstream tasks. The applicability of this concept
has so far been limited in at least two ways: (a) by the difficulty of
collecting real-world image pairs -- in practice only synthetic data have been
used -- and (b) by the lack of generalization of vanilla transformers to dense
downstream tasks for which relative position is more meaningful than absolute
position. We explore three avenues of improvement. First, we introduce a method
to collect suitable real-world image pairs at large scale. Second, we
experiment with relative positional embeddings and show that they enable vision
transformers to perform substantially better. Third, we scale up vision
transformer-based cross-completion architectures, which is made possible by the
use of large amounts of data. With these improvements, we show for the first
time that state-of-the-art results on stereo matching and optical flow can be
reached without using any classical task-specific techniques like correlation
volume, iterative estimation, image warping or multi-scale reasoning, thus
paving the way towards universal vision models. Comment: ICCV 202
The IPIN 2019 Indoor Localisation Competition—Description and Results
The IPIN 2019 Competition, the sixth in a series of IPIN competitions, was held at the CNR Research Area of Pisa (IT), integrated into the program of the IPIN 2019 Conference. It included two on-site real-time Tracks and three off-site Tracks. The four Tracks presented in this paper were set in the same environment, made of two buildings close together for a total usable area of 1000 m² outdoors and 6000 m² indoors over three floors, with a total path length exceeding 500 m. IPIN competitions, based on the EvAAL framework, have aimed at comparing the accuracy performance of personal positioning systems in fair and realistic conditions: past editions of the competition were carried out in big conference settings, university campuses and a shopping mall. Positioning accuracy is computed while the person carrying the system under test walks at normal walking speed, uses lifts and goes up and down stairs or briefly stops at given points. Results presented here are a showcase of state-of-the-art systems tested side by side in real-world settings as part of the on-site real-time competition Tracks. Results for off-site Tracks allow a detailed and reproducible comparison of the most recent positioning and tracking algorithms in the same environment as the on-site Tracks.