54 research outputs found

    Boosting Multi-Task Weak Learners with Applications to Textual and Social Data

    Learning multiple related tasks from data simultaneously can improve predictive performance relative to learning these tasks independently. In this paper we propose a novel multi-task learning algorithm called MT-Adaboost: it extends the Adaboost algorithm to the multi-task setting, using a multi-task decision stump as its multi-task weak classifier. This makes it possible to learn different dependencies between tasks for different regions of the learning space. Thus, we relax the conventional hypothesis that tasks behave similarly in the whole learning space. Moreover, MT-Adaboost can learn multiple tasks without requiring them to share the same label set and/or examples. A theoretical analysis is derived from the analysis of the original Adaboost. Experiments on multiple tasks over large-scale textual data sets with social context (Enron and Tobacco) give very promising results.
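For readers unfamiliar with the base algorithm that the abstract builds on, the following is a minimal sketch of ordinary single-task AdaBoost with decision stumps. It is not the MT-Adaboost method from the paper: the multi-task stump and its per-task outputs are not described in the abstract and are not reproduced here. The function names (`train_stump`, `adaboost`, `predict`) and the toy data are illustrative assumptions.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump.

    A stump predicts `polarity` when x[feature] <= threshold, else -polarity.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                preds = [pol if row[j] <= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(stump, row):
    _, j, thr, pol = stump
    return pol if row[j] <= thr else -pol

def adaboost(X, y, rounds=10):
    """Standard AdaBoost for labels in {-1, +1}; returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform example weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so alpha stays finite when a stump is perfect.
        err = min(max(stump[0], 1e-12), 1 - 1e-12)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified examples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

if __name__ == "__main__":
    # Toy 1-D data that no single stump can separate (assumed for illustration).
    X = [[1], [2], [3], [4], [5], [6]]
    y = [1, 1, -1, -1, 1, 1]
    model = adaboost(X, y, rounds=3)
    print([predict(model, row) for row in X])
```

The reweighting step is the part a multi-task variant would have to generalise, since each example would then carry a weight relative to its own task.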

    End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon

    Most recent work in goal-oriented visual navigation relies on large-scale machine learning in simulated environments. The main challenge lies in learning compact representations generalizable to unseen environments and in learning high-capacity perception modules capable of reasoning on high-dimensional input. The latter is particularly difficult when the goal is not given as a category ("ObjectNav") but as an exemplar image ("ImageNav"), as the perception module needs to learn a comparison strategy that requires solving an underlying visual correspondence problem. This has been shown to be difficult from reward alone or with standard auxiliary tasks. We address this problem through a sequence of two pretext tasks, which serve as a prior for what we argue is one of the main bottlenecks in perception: extremely wide-baseline relative pose estimation and visibility prediction in complex scenes. The first pretext task, cross-view completion, is a proxy for the underlying visual correspondence problem, while the second task addresses goal detection and finding directly. We propose a new dual encoder with a large-capacity binocular ViT model and show that correspondence solutions naturally emerge from the training signals. Experiments show significant improvements and SOTA performance on the two benchmarks, ImageNav and the Instance-ImageNav variant, where camera intrinsics and height differ between observation and goal.

    CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow

    Despite impressive performance for high-level downstream tasks, self-supervised pre-training methods have not yet fully delivered on dense geometric vision tasks such as stereo matching or optical flow. The application of self-supervised concepts, such as instance discrimination or masked image modeling, to geometric tasks is an active area of research. In this work, we build on the recent cross-view completion framework, a variation of masked image modeling that leverages a second view from the same scene, which makes it well suited for binocular downstream tasks. The applicability of this concept has so far been limited in at least two ways: (a) by the difficulty of collecting real-world image pairs -- in practice only synthetic data have been used -- and (b) by the lack of generalization of vanilla transformers to dense downstream tasks for which relative position is more meaningful than absolute position. We explore three avenues of improvement. First, we introduce a method to collect suitable real-world image pairs at large scale. Second, we experiment with relative positional embeddings and show that they enable vision transformers to perform substantially better. Third, we scale up vision-transformer-based cross-completion architectures, which is made possible by the use of large amounts of data. With these improvements, we show for the first time that state-of-the-art results on stereo matching and optical flow can be reached without using any classical task-specific techniques like correlation volume, iterative estimation, image warping or multi-scale reasoning, thus paving the way towards universal vision models. Comment: ICCV 202

    The IPIN 2019 Indoor Localisation Competition—Description and Results

    The IPIN 2019 Competition, the sixth in a series of IPIN competitions, was held at the CNR Research Area of Pisa (IT), integrated into the program of the IPIN 2019 Conference. It included two on-site real-time Tracks and three off-site Tracks. The four Tracks presented in this paper were set in the same environment, made of two buildings close together for a total usable area of 1000 m² outdoors and 6000 m² indoors over three floors, with a total path length exceeding 500 m. IPIN competitions, based on the EvAAL framework, have aimed at comparing the accuracy of personal positioning systems in fair and realistic conditions: past editions of the competition were carried out in big conference settings, university campuses and a shopping mall. Positioning accuracy is computed while the person carrying the system under test walks at normal walking speed, uses lifts and goes up and down stairs or briefly stops at given points. Results presented here are a showcase of state-of-the-art systems tested side by side in real-world settings as part of the on-site real-time competition Tracks. Results for off-site Tracks allow a detailed and reproducible comparison of the most recent positioning and tracking algorithms in the same environment as the on-site Tracks.
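The abstract does not say which statistic summarises positioning accuracy, so the following is only a sketch of one plausible evaluation: it assumes a per-checkpoint horizontal Euclidean error in metres and a percentile summary. The function names, the choice of the 75th percentile, and the sample coordinates are illustrative assumptions, not details taken from the competition rules.

```python
import math

def horizontal_errors(estimates, ground_truth):
    """Euclidean distance (metres) between each estimate and its reference point."""
    return [math.hypot(ex - gx, ey - gy)
            for (ex, ey), (gx, gy) in zip(estimates, ground_truth)]

def percentile(values, q):
    """Linearly interpolated percentile of `values`, with q in [0, 100]."""
    if not values:
        raise ValueError("no error samples")
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

if __name__ == "__main__":
    # Hypothetical (x, y) positions at four checkpoints along the path.
    est = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (6.0, 8.0)]
    ref = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (0.0, 0.0)]
    errs = horizontal_errors(est, ref)
    print(errs, percentile(errs, 75))
```

A percentile is used here instead of the mean because it is less sensitive to a few very large errors; a real evaluation over three floors would also have to account for floor mistakes, which this sketch ignores.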