Search CORE

221 research outputs found

Deep Architectures and Ensembles for Semantic Video Classification

Author: Bober Miroslaw
Bober-Irizar Mikel
Husain Sameed
Ong Eng-Jon
Publication venue
Publication date: 01/01/2018
Field of study

This work addresses the problem of accurate semantic labelling of short videos. To this end, a multitude of different deep nets, ranging from traditional recurrent neural networks (LSTM, GRU), temporal agnostic networks (FV,VLAD,BoW), fully connected neural networks mid-stage AV fusion and others. Additionally, we also propose a residual architecture-based DNN for video classification, with state-of-the art classification performance at significantly reduced complexity. Furthermore, we propose four new approaches to diversity-driven multi-net ensembling, one based on fast correlation measure and three incorporating a DNN-based combiner. We show that significant performance gains can be achieved by ensembling diverse nets and we investigate factors contributing to high diversity. Based on the extensive YouTube8M dataset, we provide an in-depth evaluation and analysis of their behaviour. We show that the performance of the ensemble is state-of-the-art achieving the highest accuracy on the YouTube-8M Kaggle test data. The performance of the ensemble of classifiers was also evaluated on the HMDB51 and UCF101 datasets, and show that the resulting method achieves comparable accuracy with state-of-the-art methods using similar input features

arXiv.org e-Print Archive

University of Surrey

Surrey Research Insight

Apprentissage vidéo et langage naturel à grande échelle

Author: Miech Antoine
Publication venue: HAL CCSD
Publication date: 14/10/2020
Field of study

The goal of this thesis is to build and train machine learning models capable of understanding the content of videos. Current video understanding approaches mainly rely on large-scale manually annotated video datasets for training. However, collecting and annotating such dataset is cumbersome, expensive and time-consuming. To address this issue, this thesis focuses on leveraging large amounts of readily-available, but noisy annotations in the form of natural language. In particular, we exploit a diverse corpus of textual metadata such as movie scripts, web video titles and descriptions or automatically transcribed speech obtained from narrated videos. Training video models on such readily-available textual data is challenging as such annotation is often imprecise or wrong. In this thesis, we introduce learning approaches to deal with weak annotation and design specialized training objectives and neural network architectures.Nous nous intéressons à l’apprentissage automatique d’algorithmes pour la compréhension automatique de vidéos. Une majorité des approaches en compréhension de vidéos dépend de large base de données de vidéos manuellement annotées pour l’entraînement. Cependant, la collection et l’annotation de telles base de données est fastidieuse, coûte cher et prend du temps. Pour palier à ce problème, cette thèse se concentre sur l’exploitation de large quantité d’annotations publiquement disponible, cependant bruitées, sous forme de language naturel. En particulier, nous nous intéressons à un corpus divers de métadonnées textuelles incluant des scripts de films, des titres et descriptions de vidéos internet ou encore des transcriptions de paroles. L’usage de ce type de données publiquement disponibles est difficile car l’annotation y est faible. Pour cela, nous introduisons différentes approches d’apprentissage telles que de nouvelles fonctions de coûts ou architectures de réseaux de neurones, adaptées à de faibles annotations

INRIA a CCSD electronic archive server

Systems and Methods for Measuring and Improving End-User Application Performance on Mobile Devices

Author: Nikravesh Ashkan
Publication venue
Publication date: 01/01/2018
Field of study

In today's rapidly growing smartphone society, the time users are spending on their smartphones is continuing to grow and mobile applications are becoming the primary medium for providing services and content to users. With such fast paced growth in smart-phone usage, cellular carriers and internet service providers continuously upgrade their infrastructure to the latest technologies and expand their capacities to improve the performance and reliability of their network and to satisfy exploding user demand for mobile data. On the other side of the spectrum, content providers and e-commerce companies adopt the latest protocols and techniques to provide smooth and feature-rich user experiences on their applications. To ensure a good quality of experience, monitoring how applications perform on users' devices is necessary. Often, network and content providers lack such visibility into the end-user application performance. In this dissertation, we demonstrate that having visibility into the end-user perceived performance, through system design for efficient and coordinated active and passive measurements of end-user application and network performance, is crucial for detecting, diagnosing, and addressing performance problems on mobile devices. My dissertation consists of three projects to support this statement. First, to provide such continuous monitoring on smartphones with constrained resources that operate in such a highly dynamic mobile environment, we devise efficient, adaptive, and coordinated systems, as a platform, for active and passive measurements of end-user performance. Second, using this platform and other passive data collection techniques, we conduct an in-depth user trial of mobile multipath to understand how Multipath TCP (MPTCP) performs in practice. Our measurement study reveals several limitations of MPTCP. Based on the insights gained from our measurement study, we propose two different schemes to address the identified limitations of MPTCP. Last, we show how to provide visibility into the end- user application performance for internet providers and in particular home WiFi routers by passively monitoring users' traffic and utilizing per-app models mapping various network quality of service (QoS) metrics to the application performance.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/146014/1/ashnik_1.pd

Deep Blue Documents at the University of Michigan

Integrating knowledge of hands and objects into egocentric action recognition

Author: Ma Jian
Publication venue
Publication date: 05/12/2023
Field of study

Explore Bristol Research