An overview on the evaluated video retrieval tasks at TRECVID 2022
The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis
and retrieval evaluation with the goal of promoting progress in research and
development of content-based exploitation and retrieval of information from
digital video via open, tasks-based evaluation supported by metrology. Over the
last twenty-one years this effort has yielded a better understanding of how
systems can effectively accomplish such processing and how one can reliably
benchmark their performance. TRECVID has been funded by NIST (National
Institute of Standards and Technology) and other US government agencies. In
addition, many organizations and individuals worldwide contribute significant
time and effort. TRECVID 2022 planned for the following six tasks: Ad-hoc video
search, Video to text captioning, Disaster scene description and indexing,
Activity in extended videos, Deep video understanding, and Movie summarization.
In total, 35 teams from various research organizations worldwide signed up to
join the evaluation campaign this year. This paper introduces the tasks,
datasets used, evaluation frameworks and metrics, as well as a high-level
results overview.
Comment: arXiv admin note: substantial text overlap with arXiv:2104.13473, arXiv:2009.0998
Long-term Leap Attention, Short-term Periodic Shift for Video Classification
A video transformer naturally incurs a heavier computation burden than a static
vision transformer, as the former processes a sequence $T$ times longer than the
latter's (for $T$ frames of $N$ tokens each) under the current attention of
quadratic complexity $\mathcal{O}(T^2N^2)$. Existing works treat the temporal
axis as a simple extension of the spatial axes, focusing on shortening the
spatio-temporal sequence by either generic pooling or local windowing, without
utilizing temporal redundancy.
However, videos naturally contain redundant information between neighboring
frames; we could therefore potentially suppress attention on visually similar
frames in a dilated manner. Based on this hypothesis, we propose LAPS, a
long-term "Leap Attention" (LA), short-term "Periodic Shift" (P-Shift) module
for video transformers, with $\mathcal{O}(2TN^2)$ complexity. Specifically, the
LA groups long-term frames into pairs, then refactors each discrete pair via
attention. The P-Shift exchanges features between temporal neighbors to counter
the loss of short-term dynamics. By replacing a vanilla 2D attention with LAPS,
we can adapt a static transformer into a video one, with zero extra parameters
and negligible computation overhead (about 2.6%).
Experiments on the standard Kinetics-400 benchmark demonstrate that our LAPS
transformer could achieve competitive performances in terms of accuracy, FLOPs,
and Params among CNN and transformer SOTAs. We open-source our project at
https://github.com/VideoNetworks/LAPS-transformer.
Comment: Accepted by ACM Multimedia 2022; 10 pages, 4 figures
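To make the two operations concrete, here is a minimal sketch of how a leap-attention pairing and a periodic channel shift could be written. This is an illustration under simplifying assumptions, not the authors' implementation: the (batch, frames, tokens, channels) layout, the shifted channel fraction, the pairing offset, and the omission of query/key/value projections and multi-head attention are all choices made for brevity; the linked repository contains the real code.

import torch

def periodic_shift(x, shift_frac=0.125):
    """Short-term P-Shift (sketch): swap a small fraction of channels between
    temporally adjacent frames to recover short-term dynamics.
    x: (B, T, N, C) = batch, frames, tokens, channels."""
    c = int(x.shape[-1] * shift_frac)
    out = x.clone()
    out[:, 1:, :, :c] = x[:, :-1, :, :c]              # channels pulled from the previous frame
    out[:, :-1, :, c:2 * c] = x[:, 1:, :, c:2 * c]    # channels pulled from the next frame
    return out

def leap_attention(x, leap=None):
    """Long-term Leap Attention (sketch): group temporally distant frames into
    pairs and attend within each pair only, instead of over all T*N tokens.
    Projections and heads are omitted; T is assumed even."""
    B, T, N, C = x.shape
    leap = leap or T // 2
    pairs = torch.stack([x[:, :leap], x[:, leap:]], dim=2)   # (B, leap, 2, N, C): frame t paired with frame t+leap
    tokens = pairs.reshape(B * leap, 2 * N, C)                # each pair forms one short sequence
    attn = torch.softmax(tokens @ tokens.transpose(1, 2) / C ** 0.5, dim=-1)
    out = (attn @ tokens).reshape(B, leap, 2, N, C)
    return torch.cat([out[:, :, 0], out[:, :, 1]], dim=1)    # back to (B, T, N, C)

# Toy usage: 8 frames of 196 tokens with 64 channels.
x = torch.randn(2, 8, 196, 64)
y = leap_attention(periodic_shift(x))
print(y.shape)  # torch.Size([2, 8, 196, 64])

Grouping frames into pairs means each attention runs over only 2N tokens for T/2 pairs, which is where a cost on the order of 2TN^2, rather than T^2N^2, comes from.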
KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration
In the realm of facial analysis, accurate landmark detection is crucial for
various applications, ranging from face recognition and expression analysis to
animation. Conventional heatmap or coordinate regression-based techniques,
however, often face challenges in terms of computational burden and
quantization errors. To address these issues, we present the KeyPoint
Positioning System (KeyPosS) - a groundbreaking facial landmark detection
framework that stands out from existing methods. The framework utilizes a fully
convolutional network to predict a distance map, which computes the distance
between a Point of Interest (POI) and multiple anchor points. These anchor
points are ingeniously harnessed to triangulate the POI's position through the
True-range Multilateration algorithm. Notably, the plug-and-play nature of
KeyPosS enables seamless integration into any decoding stage, ensuring a
versatile and adaptable solution. We conducted a thorough evaluation of
KeyPosS's performance by benchmarking it against state-of-the-art models on
four different datasets. The results show that KeyPosS substantially
outperforms leading methods in low-resolution settings while requiring a
minimal time overhead. The code is available at
https://github.com/zhiqic/KeyPosS.
Comment: Accepted to ACM Multimedia 2023; 10 pages, 7 figures, 6 tables; the code is at https://github.com/zhiqic/KeyPosS
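To illustrate the decoding idea, the following is a rough sketch of plain true-range multilateration: given a few anchor points and a predicted distance from the landmark to each anchor, the landmark is recovered by linearizing the circle equations and solving a small least-squares system. The anchor layout, the noise-free distances, and the NumPy-only setting are assumptions made for the example; the paper's distance-map prediction and its exact decoding are in the linked repository.

import numpy as np

def multilaterate(anchors, dists):
    """Estimate a 2-D point from its distances to known anchors
    (true-range multilateration, linearized and solved by least squares).
    anchors: (K, 2) anchor coordinates; dists: (K,) predicted distances."""
    anchors = np.asarray(anchors, dtype=float)
    dists = np.asarray(dists, dtype=float)
    x0, y0 = anchors[0]
    d0 = dists[0]
    # Subtract the first circle equation from the others to cancel the
    # quadratic terms, leaving a linear system A p = b in the unknown p = (x, y).
    A = 2.0 * (anchors[1:] - anchors[0])
    b = (d0 ** 2 - dists[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - (x0 ** 2 + y0 ** 2))
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

# Toy usage: a landmark at (3.0, 4.0) observed from four anchors.
anchors = [(0, 0), (10, 0), (0, 10), (10, 10)]
target = np.array([3.0, 4.0])
dists = [np.linalg.norm(target - np.array(a)) for a in anchors]
print(multilaterate(anchors, dists))  # approximately [3. 4.]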
Machine Learning Architectures for Video Annotation and Retrieval
PhD
In this thesis we design machine learning methodologies for solving the problem
of video annotation and retrieval using either pre-defined semantic concepts or ad-hoc
queries. Concept-based video annotation refers to the annotation of video fragments
with one or more semantic concepts (e.g. hand, sky, running) chosen from a pre-defined
concept list. Ad-hoc queries refer to textual descriptions that may contain
objects, activities, locations, etc., and combinations thereof. Our contributions
are: i) A thorough analysis of extending and using different local descriptors towards
improved concept-based video annotation, together with a stacking architecture that uses,
in its first layer, concept classifiers trained on local descriptors and improves their
prediction accuracy in its last layer by implicitly capturing concept relations. ii)
A cascade architecture that orders and combines many classifiers, trained on different
visual descriptors, for the same concept. iii) A deep learning architecture that exploits
concept relations at two different levels. At the first level, we build on ideas from
multi-task learning, and propose an approach to learn concept-specific representations
that are sparse, linear combinations of representations of latent concepts. At a second
level, we build on ideas from structured output learning, and propose the introduction,
at training time, of a new cost term that explicitly models the correlations between
the concepts. By doing so, we explicitly model the structure in the output space
(i.e., the concept labels). iv) A fully-automatic ad-hoc video search architecture that
combines concept-based video annotation and textual query analysis, and transforms
concept-based keyframe and query representations into a common semantic embedding
space. Our architectures have been extensively evaluated on the TRECVID SIN 2013,
the TRECVID AVS 2016, and other large-scale datasets, demonstrating their effectiveness
compared to similar approaches.
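As a hypothetical illustration of the structured-output idea in contribution iii), the sketch below adds a correlation cost term to a standard multi-label concept loss, nudging the model's pairwise prediction statistics toward the concept correlations observed in the training labels. It is not the formulation used in the thesis: the loss shape, the weighting factor alpha, and the co-occurrence estimate are illustrative assumptions.

import torch
import torch.nn.functional as F

def correlation_aware_loss(logits, labels, corr, alpha=0.1):
    """Sketch: multi-label concept loss plus a cost term that pushes
    pairwise prediction co-activations toward the label correlations.
    logits: (B, C) concept scores; labels: (B, C) binary targets;
    corr:   (C, C) empirical concept co-occurrence matrix."""
    bce = F.binary_cross_entropy_with_logits(logits, labels.float())
    probs = torch.sigmoid(logits)
    pred_corr = (probs.T @ probs) / probs.shape[0]   # batch estimate of co-activation
    corr_cost = F.mse_loss(pred_corr, corr)
    return bce + alpha * corr_cost

# Toy usage: 4 keyframes, 3 concepts (e.g. hand, sky, running).
logits = torch.randn(4, 3, requires_grad=True)
labels = torch.tensor([[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 1, 0]])
corr = (labels.float().T @ labels.float()) / labels.shape[0]
loss = correlation_aware_loss(logits, labels, corr)
loss.backward()
print(float(loss))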