Search CORE

60,580 research outputs found

SoccerDB: A Large-Scale Database for Comprehensive Video Understanding

Author: Chen Leilei
Cui Kaixu
Jiang Yudong
Wang Canjin
Xu Changliang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 08/09/2020
Field of study

Soccer videos can serve as a perfect research object for video understanding because soccer games are played under well-defined rules while complex and intriguing enough for researchers to study. In this paper, we propose a new soccer video database named SoccerDB, comprising 171,191 video segments from 346 high-quality soccer games. The database contains 702,096 bounding boxes, 37,709 essential event labels with time boundary and 17,115 highlight annotations for object detection, action recognition, temporal action localization, and highlight detection tasks. To our knowledge, it is the largest database for comprehensive sports video understanding on various aspects. We further survey a collection of strong baselines on SoccerDB, which have demonstrated state-of-the-art performances on independent tasks. Our evaluation suggests that we can benefit significantly when jointly considering the inner correlations among those tasks. We believe the release of SoccerDB will tremendously advance researches around comprehensive video understanding. {\itshape Our dataset and code published on https://github.com/newsdata/SoccerDB.}Comment: accepted by MM2020 sports worksho

arXiv.org e-Print Archive

Crossref

Learning without Prejudice: Avoiding Bias in Webly-Supervised Action Recognition

Author: Ballan Lamberto
Kapil Ansh
Liu Nan
Rupprecht Christian
Tombari Federico
Publication venue: 'Elsevier BV'
Publication date: 14/06/2017
Field of study

Webly-supervised learning has recently emerged as an alternative paradigm to traditional supervised learning based on large-scale datasets with manual annotations. The key idea is that models such as CNNs can be learned from the noisy visual data available on the web. In this work we aim to exploit web data for video understanding tasks such as action recognition and detection. One of the main problems in webly-supervised learning is cleaning the noisy labeled data from the web. The state-of-the-art paradigm relies on training a first classifier on noisy data that is then used to clean the remaining dataset. Our key insight is that this procedure biases the second classifier towards samples that the first one understands. Here we train two independent CNNs, a RGB network on web images and video frames and a second network using temporal information from optical flow. We show that training the networks independently is vastly superior to selecting the frames for the flow classifier by using our RGB network. Moreover, we show benefits in enriching the training set with different data sources from heterogeneous public web databases. We demonstrate that our framework outperforms all other webly-supervised methods on two public benchmarks, UCF-101 and Thumos'14.Comment: Submitted to CVIU SI: Computer Vision and the We

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos

Author: Chan Jonathan
Chang Shih-Fu
Miyazawa Kazuyuki
Shou Zheng
Zareian Alireza
Publication venue
Publication date: 13/06/2017
Field of study

Temporal action localization is an important yet challenging problem. Given a long, untrimmed video consisting of multiple action instances and complex background contents, we need not only to recognize their action categories, but also to localize the start time and end time of each instance. Many state-of-the-art systems use segment-level classifiers to select and rank proposal segments of pre-determined boundaries. However, a desirable model should move beyond segment-level and make dense predictions at a fine granularity in time to determine precise temporal boundaries. To this end, we design a novel Convolutional-De-Convolutional (CDC) network that places CDC filters on top of 3D ConvNets, which have been shown to be effective for abstracting action semantics but reduce the temporal length of the input data. The proposed CDC filter performs the required temporal upsampling and spatial downsampling operations simultaneously to predict actions at the frame-level granularity. It is unique in jointly modeling action semantics in space-time and fine-grained temporal dynamics. We train the CDC network in an end-to-end manner efficiently. Our model not only achieves superior performance in detecting actions in every frame, but also significantly boosts the precision of localizing temporal boundaries. Finally, the CDC network demonstrates a very high efficiency with the ability to process 500 frames per second on a single GPU server. We will update the camera-ready version and publish the source codes online soon.Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 201

arXiv.org e-Print Archive

Crossref

Ego-Downward and Ambient Video based Person Location Association

Author: Huo Zhouyuan
Jiang Hao
Xiao Jizhong
Yang Liang
Publication venue
Publication date: 02/12/2018
Field of study

Using an ego-centric camera to do localization and tracking is highly needed for urban navigation and indoor assistive system when GPS is not available or not accurate enough. The traditional hand-designed feature tracking and estimation approach would fail without visible features. Recently, there are several works exploring to use context features to do localization. However, all of these suffer severe accuracy loss if given no visual context information. To provide a possible solution to this problem, this paper proposes a camera system with both ego-downward and third-static view to perform localization and tracking in a learning approach. Besides, we also proposed a novel action and motion verification model for cross-view verification and localization. We performed comparative experiments based on our collected dataset which considers the same dressing, gender, and background diversity. Results indicate that the proposed model can achieve

18.32 \%

improvement in accuracy performance. Eventually, we tested the model on multi-people scenarios and obtained an average

67.767 \%

accuracy

arXiv.org e-Print Archive

Crossref

Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks

Author: Chen Kwang-Cheng
Hanzo Lajos
Jiang Chunxiao
Ren Yong
Wang Jingjing
Zhang Haijun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/01/2019
Field of study

Future wireless networks have a substantial potential in terms of supporting a broad range of complex compelling applications both in military and civilian fields, where the users are able to enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making because of the complex heterogeneous nature of the network structures and wireless services. Machine learning (ML) algorithms have great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of ML by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning. Furthermore, we investigate their employment in the compelling applications of wireless networks, including heterogeneous networks (HetNets), cognitive radios (CR), Internet of things (IoT), machine to machine networks (M2M), and so on. This article aims for assisting the readers in clarifying the motivation and methodology of the various ML algorithms, so as to invoke them for hitherto unexplored services as well as scenarios of future wireless networks.Comment: 46 pages, 22 fig

arXiv.org e-Print Archive

Southampton (e-Prints Soton)