Context-Aware Neural Video Compression on Solar Dynamics Observatory
NASA's Solar Dynamics Observatory (SDO) mission collects large data volumes
of the Sun's daily activity. Data compression is crucial for space missions to
reduce data storage and video bandwidth requirements by eliminating
redundancies in the data. In this paper, we present a novel neural
Transformer-based video compression approach specifically designed for the SDO
images. Our primary objective is to efficiently exploit the temporal and
spatial redundancies inherent in solar images to obtain a high compression
ratio. Our proposed architecture benefits from a novel Transformer block called
Fused Local-aware Window (FLaWin), which incorporates window-based
self-attention modules and an efficient fused local-aware feed-forward (FLaFF)
network. This architectural design allows us to simultaneously capture
short-range and long-range information while facilitating the extraction of
rich and diverse contextual representations. Moreover, this design choice
results in reduced computational complexity. Experimental results demonstrate
the significant contribution of the FLaWin Transformer block to the compression
performance, outperforming conventional hand-engineered video codecs such as
H.264 and H.265 in terms of rate-distortion trade-off.
Comment: Accepted to the 22nd IEEE International Conference on Machine Learning and Applications (ICMLA 2023); selected for oral presentation.
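The FLaWin block described above pairs window-based self-attention with a fused local-aware feed-forward network. As a rough sketch of the window-partitioning idea only (not the paper's architecture: the parameter-free identity Q/K/V projections, window size, and function name are illustrative assumptions), attention can be restricted to non-overlapping windows so its cost scales with window size rather than sequence length:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, window=4):
    """Self-attention computed independently inside non-overlapping
    windows of the sequence. x: (seq_len, dim); seq_len must be a
    multiple of `window`. Identity Q/K/V projections keep the sketch
    parameter-free (a real block would learn these projections)."""
    seq_len, dim = x.shape
    assert seq_len % window == 0
    # Partition the sequence into windows: (num_windows, window, dim)
    w = x.reshape(seq_len // window, window, dim)
    scores = w @ w.transpose(0, 2, 1) / np.sqrt(dim)  # (nw, window, window)
    attn = softmax(scores, axis=-1)
    out = attn @ w  # each position attends only within its own window
    return out.reshape(seq_len, dim)

x = np.random.default_rng(0).normal(size=(8, 16))
y = window_self_attention(x, window=4)
print(y.shape)  # (8, 16)
```

A consequence of the partitioning, and the reason it is cheap, is strict locality: outputs in one window are unaffected by changes in another, which is why such designs interleave a second mechanism (here, the local-aware feed-forward path) to mix information across windows.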
Extraction of activity patterns on large video recordings
Extracting the hidden and useful knowledge embedded within video sequences, and thereby discovering relations between the various elements to support efficient decision-making, is a challenging task. This kind of knowledge discovery and information analysis is possible because of recent advancements in object detection and tracking. The authors present how video information is processed with the ultimate aim of discovering knowledge about people's activity, and also extract the relationships between the people and contextual objects in the scene. First, the objects of interest and their semantic characteristics are derived in real time. The semantic information related to the objects is represented in a format suitable for knowledge discovery. Next, two clustering processes are applied to derive knowledge from the video data: agglomerative hierarchical clustering is used to find the main trajectory patterns of people, and relational analysis clustering is employed to extract the relationships between people, contextual objects, and events. Finally, the authors evaluate the proposed activity-extraction model using real video sequences from underground metro networks (CARETAKER) and a building hall (CAVIAR).
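The trajectory-clustering step above can be illustrated with a minimal single-linkage agglomerative procedure (a generic sketch, not the authors' implementation; the equal-length trajectories, the mean pointwise distance, and the function names are simplifying assumptions):

```python
import math

def traj_dist(a, b):
    """Mean pointwise Euclidean distance between two equal-length trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def agglomerative(trajs, k):
    """Naive single-linkage agglomerative clustering: start with one
    cluster per trajectory, repeatedly merge the closest pair of
    clusters until k clusters remain. Returns lists of trajectory indices."""
    clusters = [[i] for i in range(len(trajs))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(traj_dist(trajs[a], trajs[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters

# Two groups of trajectories: people walking along y=0 vs along y=10
trajs = [[(t, 0) for t in range(5)],
         [(t, 0.5) for t in range(5)],
         [(t, 10) for t in range(5)],
         [(t, 10.4) for t in range(5)]]
print(agglomerative(trajs, 2))  # → [[0, 1], [2, 3]]
```

Each resulting cluster then stands in for one "main trajectory pattern" of the kind the abstract describes; a production system would use a proper linkage structure rather than this O(n³) loop.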
Contextual cropping and scaling of TV productions
This is the author's accepted manuscript. The final publication is available at Springer via http://dx.doi.org/10.1007/s11042-011-0804-3. Copyright @ Springer Science+Business Media, LLC 2011.
In this paper, an application is presented which automatically adapts SDTV (Standard Definition Television) sports productions to smaller displays through intelligent cropping and scaling. It crops regions of interest of sports productions based on a smart combination of production metadata and systematic video-analysis methods. This approach allows a context-based composition of cropped images and provides a differentiation between the original SD version of the production and the processed one, adapted to the requirements of mobile TV. The system has been comprehensively evaluated by comparing the outcome of the proposed method with manually and statically cropped versions, as well as with non-cropped versions. Integration of the tool into post-production and live workflows is envisaged.
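The core geometric step in such cropping, fitting a target-aspect-ratio rectangle around a detected region of interest, can be sketched as follows (a hypothetical helper, not the paper's system; the name `crop_for_display` and the ROI-centred placement are assumptions):

```python
def crop_for_display(roi, frame_w, frame_h, target_ar):
    """Compute a crop rectangle around a region of interest that matches
    a target aspect ratio (e.g. 4/3 for a small mobile display) while
    staying inside the frame. roi = (x, y, w, h); returns (x, y, w, h)."""
    x, y, w, h = roi
    # Grow the deficient dimension so the crop hits the target ratio.
    if w / h < target_ar:
        w = h * target_ar
    else:
        h = w / target_ar
    # If the crop outgrows the frame, shrink it while keeping the ratio.
    if w > frame_w:
        w, h = frame_w, frame_w / target_ar
    if h > frame_h:
        h, w = frame_h, frame_h * target_ar
    # Re-centre on the ROI centre, then clamp inside the frame.
    cx, cy = x + roi[2] / 2, y + roi[3] / 2
    nx = min(max(cx - w / 2, 0), frame_w - w)
    ny = min(max(cy - h / 2, 0), frame_h - h)
    return nx, ny, w, h

# A tall ROI in a 720x576 SDTV frame, cropped for a 4:3 display
print(crop_for_display((100, 100, 50, 100), 720, 576, 4 / 3))
```

The clamping near frame borders is what makes static centre-crops inadequate for sports content, and why the paper combines this geometry with production metadata to decide *which* ROI to follow.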
Multimedia search without visual analysis: the value of linguistic and contextual information
This paper addresses the focus of this special issue by analyzing the potential contribution of linguistic content and other non-image aspects to the processing of audiovisual data. It summarizes the various ways in which linguistic content analysis contributes to enhancing the semantic annotation of multimedia content and, as a consequence, to improving the effectiveness of conceptual media access tools. A number of techniques are presented, including the time-alignment of textual resources, audio and speech processing, content reduction and reasoning tools, and the exploitation of surface features.
Unsupervised video anomaly detection in UAVs: a new approach based on learning and inference
In this paper, an unsupervised approach to detecting anomalous events in video data is introduced. It leverages contextual information derived from visual features and addresses the semantic gap between visual information and the interpretation of atypical incidents. Our work uses Unmanned Aerial Vehicles (UAVs) to capture video from a different perspective and to provide a unique set of visual features. Specifically, we propose a technique for discerning context through scene understanding, which entails the construction of a spatio-temporal contextual graph representing various aspects of the visual information: the appearance of objects, their interrelations within the spatio-temporal domain, and the categorization of the scenes captured by the UAVs. To encode context information, we use a Transformer with message passing to update the graph's nodes and edges. Furthermore, we design a graph-oriented deep Variational Autoencoder (VAE) for unsupervised categorization of scenes, enabling extraction of the spatio-temporal context graph across diverse settings. Finally, using the contextual data, we compute frame-level anomaly scores to identify atypical events. We evaluated the proposed approach on three challenging datasets, UCF-Crime, Avenue, and ShanghaiTech, with results demonstrating its effectiveness.
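The last step of such pipelines, turning a model's per-frame error into a frame-level anomaly score, is commonly done by min-max normalising the errors. A minimal sketch of that scoring step only (the errors here are hand-supplied stand-ins; the real method derives them from the graph VAE, and the threshold of 0.5 is an assumption):

```python
import numpy as np

def frame_anomaly_scores(errors):
    """Min-max normalise per-frame reconstruction errors to [0, 1].
    Frames the model reconstructs poorly (high error) score near 1
    and are treated as likely anomalies. The epsilon guards against
    division by zero when all errors are equal."""
    e = np.asarray(errors, dtype=float)
    return (e - e.min()) / (e.max() - e.min() + 1e-8)

errors = [0.10, 0.12, 0.11, 0.95, 0.13]   # frame 3 reconstructs badly
scores = frame_anomaly_scores(errors)
flagged = np.where(scores > 0.5)[0]
print(flagged)  # frame 3 is flagged as anomalous
```

Because the normalisation is per-video, the score expresses how unusual a frame is relative to its own recording, which suits surveillance footage where absolute error magnitudes vary between scenes.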