8 research outputs found
An overview on the evaluated video retrieval tasks at TRECVID 2022
The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis
and retrieval evaluation with the goal of promoting progress in research and
development of content-based exploitation and retrieval of information from
digital video via open, task-based evaluation supported by metrology. Over the
last twenty-one years this effort has yielded a better understanding of how
systems can effectively accomplish such processing and how one can reliably
benchmark their performance. TRECVID has been funded by NIST (National
Institute of Standards and Technology) and other US government agencies. In
addition, many organizations and individuals worldwide contribute significant
time and effort. TRECVID 2022 planned for the following six tasks: ad-hoc video
search, video-to-text captioning, disaster scene description and indexing,
activity in extended videos, deep video understanding, and movie summarization.
In total, 35 teams from various research organizations worldwide signed up to
join the evaluation campaign this year. This paper introduces the tasks,
datasets used, evaluation frameworks and metrics, as well as a high-level
results overview.
TRECVID 2014 -- An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics
The TREC Video Retrieval Evaluation (TRECVID) 2014 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation. Over the last dozen years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID is funded by NIST with support from other US government agencies. Many organizations and individuals worldwide contribute significant time and effort.
TRECVID 2018: Benchmarking Video Activity Detection, Video Captioning and Matching, Video Storytelling Linking and Video Search
TRECVID 2015 – An Overview of the Goals, Tasks, Data, Evaluation Mechanisms, and Metrics
Event detection in crowd sequences
The goal of this project is the implementation of a system for automatic recognition of anomalous events in video sequences involving a large number of people. The system was developed and tested on the PETS Dataset S3, High Level database, which contains seven video sequences featuring the following events:
- Walking: a significant number of people moving slowly.
- Running: a significant number of people moving quickly.
- Evacuation: the rapid dispersion of a crowd in different directions.
- Crowd Formation: a large number of individuals converging into a group from different directions.
- Crowd Splitting: a group of individuals dividing into two or more groups that take different directions.
- Local Dispersion: the dispersion of a small group of individuals within a crowd.
Intelligent Data Analytics using Deep Learning for Data Science
Nowadays, data science stimulates the interest of academics and practitioners because it can assist in the extraction of significant insights from massive amounts of data. According to the International Data Corporation, the Global Datasphere is expected to grow from 33 zettabytes in 2018 to 175 zettabytes by 2025. This dissertation proposes an intelligent data analytics framework that uses deep learning to tackle several difficulties that arise when implementing a data science application. These difficulties include dealing with high inter-class similarity, the availability and quality of hand-labeled data, and designing a feasible approach for modeling significant correlations in features gathered from various data sources. The proposed intelligent data analytics framework employs a novel strategy for improving data representation learning by incorporating supplemental data from various sources and structures. First, the research presents a multi-source fusion approach that utilizes confident learning techniques to improve data quality from many noisy sources. Meta-learning methods based on advanced techniques such as the mixture of experts and differential evolution combine the predictive capacity of individual learners with a gating mechanism, ensuring that only the most trustworthy features or predictions are integrated to train the model. Then, a Multi-Level Convolutional Fusion is presented to train a model on the correspondence between local-global deep feature interactions to identify easily confused samples of different classes. The convolutional fusion is further enhanced with the power of Graph Transformers, aggregating the relevant neighboring features in graph-based input data structures and achieving state-of-the-art performance on a large-scale building damage dataset.
Finally, weakly supervised strategies, noise regularization, and label propagation are proposed to train a model on sparse labeled input data, ensuring the model's robustness to errors and supporting automatic expansion of the training set. The suggested approaches outperformed competing strategies in effectively training a model on a large-scale dataset of 500k photos, with only about 7% of the images annotated by a human. The proposed framework's capabilities have benefited various data science applications, including fluid dynamics, geometric morphometrics, building damage classification from satellite pictures, disaster scene description, and storm-surge visualization.
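The label-propagation idea mentioned in the abstract above (spreading a few seed labels over a similarity graph to expand the training set) can be illustrated with a minimal sketch. This is not the dissertation's actual implementation; the function name `propagate_labels` and the damping parameter `alpha` are hypothetical, and the algorithm shown is standard iterative propagation on a row-normalized affinity matrix:

```python
import numpy as np

def propagate_labels(W, y, n_classes, alpha=0.8, iters=50):
    """Iterative label propagation on a similarity graph.

    W: (n, n) symmetric non-negative affinity matrix.
    y: (n,) integer labels, with -1 marking unlabeled nodes.
    Returns an (n,) array of predicted labels for every node.
    """
    n = len(y)
    # Row-normalize the affinity matrix into a transition matrix.
    S = W / W.sum(axis=1, keepdims=True)
    # One-hot seed matrix; unlabeled rows start at zero.
    F = np.zeros((n, n_classes))
    labeled = y >= 0
    F[labeled, y[labeled]] = 1.0
    Y0 = F.copy()
    for _ in range(iters):
        # Blend information diffused from neighbors with the original seeds.
        F = alpha * (S @ F) + (1 - alpha) * Y0
    return F.argmax(axis=1)

# Tiny example: nodes 0 and 3 are labeled; 1 and 2 inherit labels
# from their strongest neighbors in the affinity graph.
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.1, 0.0],
              [0.1, 0.1, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
y = np.array([0, -1, -1, 1])
print(propagate_labels(W, y, n_classes=2))  # → [0 0 1 1]
```

In a weakly supervised pipeline of the kind the abstract describes, the propagated labels for high-confidence unlabeled nodes would then be added back into the training set.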