
    MSPlayer: Multi-Source and multi-Path LeverAged YoutubER

    Online video streaming through mobile devices has become extremely popular. YouTube, for example, reported that the share of its traffic streamed to mobile devices soared from 6% to more than 40% over the past two years. Moreover, people constantly seek to stream high-quality video for a better experience while often suffering from limited bandwidth. Thanks to the rapid deployment of content delivery networks (CDNs), popular videos are now replicated at different sites, and users can stream videos from nearby locations with low latency. As mobile devices are now equipped with multiple wireless interfaces (e.g., WiFi and 3G/4G), aggregating bandwidth for high-definition video streaming has become possible. We propose a client-based video streaming solution, MSPlayer, that takes advantage of multiple video sources as well as multiple network paths through different interfaces. MSPlayer reduces start-up latency and provides high-quality video streaming and robust data transport in mobile scenarios. We experimentally demonstrate our solution on a testbed and with the YouTube video service.
    Comment: accepted to ACM CoNEXT'14
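
    The abstract does not spell out MSPlayer's scheduler, so the snippet below is only a minimal sketch of the general multi-source/multi-path idea: split the stream into chunks and greedily assign each chunk to whichever (interface, CDN source) pair is expected to finish downloading it first. The `Path` class, throughput figures, and chunk size are illustrative assumptions, not MSPlayer's actual design.

```python
# Hypothetical sketch of multi-source, multi-path chunk scheduling:
# each "path" pairs a local interface with a CDN replica, and chunks
# go to the path with the earliest estimated finish time.
from dataclasses import dataclass


@dataclass
class Path:
    name: str                 # e.g. "WiFi -> CDN-A" (assumed labeling)
    throughput_mbps: float    # current estimate from recent downloads
    busy_until: float         # time (s) when this path's queue drains


def schedule_chunks(paths, num_chunks, chunk_mbits=8.0):
    """Greedily assign each chunk to the path that finishes it soonest."""
    assignment = []
    for chunk_id in range(num_chunks):
        best = min(paths,
                   key=lambda p: p.busy_until + chunk_mbits / p.throughput_mbps)
        best.busy_until += chunk_mbits / best.throughput_mbps
        assignment.append((chunk_id, best.name, round(best.busy_until, 2)))
    return assignment


if __name__ == "__main__":
    paths = [
        Path("WiFi -> CDN-A", throughput_mbps=20.0, busy_until=0.0),
        Path("LTE  -> CDN-B", throughput_mbps=8.0, busy_until=0.0),
    ]
    for chunk_id, path, finish in schedule_chunks(paths, num_chunks=6):
        print(f"chunk {chunk_id} -> {path} (est. done at {finish}s)")
```

    Because the faster path accumulates queueing time as chunks are assigned, the slower interface still receives work, which is what lets bandwidth aggregate across interfaces rather than idling the weaker one.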

    Scaling Egocentric Vision: The EPIC-KITCHENS Dataset

    First-person vision is gaining interest as it offers a unique viewpoint on people's interaction with objects, their attention, and even intention. However, progress in this challenging domain has been relatively slow due to the lack of sufficiently large datasets. In this paper, we introduce EPIC-KITCHENS, a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. Our videos depict non-scripted daily activities: we simply asked each participant to start recording every time they entered their kitchen. Recording took place in 4 cities (in North America and Europe) by participants belonging to 10 different nationalities, resulting in highly diverse cooking styles. Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labeled for a total of 39.6K action segments and 454.3K object bounding boxes. Our annotation is unique in that we had the participants narrate their own videos (after recording), thus reflecting true intention, and we crowd-sourced ground truths based on these narrations. We describe our object, action and anticipation challenges, and evaluate several baselines over two test splits: seen and unseen kitchens. Dataset and project page: http://epic-kitchens.github.io
    Comment: European Conference on Computer Vision (ECCV) 2018. Dataset and project page: http://epic-kitchens.github.io
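
    To make the two annotation types concrete, here is a hypothetical sketch of what one labeled record of each kind might look like. The field names and bounding-box layout are assumptions inferred from the abstract; the authoritative schema is on the project page.

```python
# Illustrative (assumed) record shapes for the two annotation types the
# abstract describes: narrated action segments and object bounding boxes.
from dataclasses import dataclass


@dataclass
class ActionSegment:
    video_id: str
    start_frame: int
    stop_frame: int
    narration: str        # participant's own post-recording narration


@dataclass
class ObjectAnnotation:
    video_id: str
    frame: int
    noun: str
    bbox: tuple           # (x, y, width, height) in pixels -- assumed layout


seg = ActionSegment("P01_01", start_frame=1020, stop_frame=1380,
                    narration="open the fridge")
obj = ObjectAnnotation("P01_01", frame=1100, noun="fridge",
                       bbox=(12, 40, 300, 560))
print(seg)
print(obj)
```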

    EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

    Video-language pre-training (VLP) has become increasingly important due to its ability to generalize to various vision and language tasks. However, existing egocentric VLP frameworks use separate video and language encoders and learn task-specific cross-modal information only during fine-tuning, limiting the development of a unified system. In this work, we introduce the second generation of egocentric video-language pre-training (EgoVLPv2), a significant improvement over the previous generation, by incorporating cross-modal fusion directly into the video and language backbones. EgoVLPv2 learns strong video-text representations during pre-training and reuses the cross-modal attention modules to support different downstream tasks in a flexible and efficient manner, reducing fine-tuning costs. Moreover, our proposed fusion-in-the-backbone strategy is more lightweight and compute-efficient than stacking additional fusion-specific layers. Extensive experiments on a wide range of VL tasks demonstrate the effectiveness of EgoVLPv2, which achieves consistent state-of-the-art performance over strong baselines across all downstream tasks. Our project page can be found at https://shramanpramanick.github.io/EgoVLPv2/.
    Comment: Published in ICCV 2023
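
    The contrast the abstract draws is between stacking fusion layers on top of two finished encoders versus inserting cross-attention inside the backbone blocks themselves. The PyTorch sketch below illustrates only that general idea; the module sizes, zero-initialized gate, and skip-when-unimodal behavior are my assumptions for illustration, not EgoVLPv2's actual architecture.

```python
# Minimal sketch of "fusion in the backbone": a transformer block that
# mixes in the other modality via gated cross-attention, and degrades to
# a plain unimodal encoder layer when no second modality is given.
import torch
import torch.nn as nn


class FusedBackboneBlock(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        # Gate starts at 0 so fusion is a no-op at initialization (assumed
        # design choice so pre-trained unimodal behavior is preserved).
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, x, other=None):
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h)[0]
        if other is not None:  # fusion path, reused across downstream tasks
            x = x + self.gate * self.cross_attn(self.norm2(x), other, other)[0]
        x = x + self.mlp(self.norm3(x))
        return x


video = torch.randn(2, 16, 256)   # (batch, video tokens, dim)
text = torch.randn(2, 8, 256)     # (batch, text tokens, dim)
block = FusedBackboneBlock()
fused = block(video, other=text)  # cross-modal pre-training path
unimodal = block(video)           # encoder-only path, fusion skipped
print(fused.shape, unimodal.shape)
```

    Because the cross-attention lives inside the block, the same weights serve both fused and encoder-only inference, which is how one backbone can support varied downstream tasks without a separate fusion stack.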