Towards key-frame extraction methods for 3D video: a review
The increasing rate of creation and use of 3D video content leads to a pressing need for methods capable of lowering
the cost of 3D video searching, browsing and indexing operations, with improved content selection performance.
Video summarisation methods specifically tailored for 3D video content fulfil these requirements. This paper presents
a review of the state-of-the-art of a crucial component of 3D video summarisation algorithms: the key-frame
extraction methods. The methods reviewed cover 3D video key-frame extraction as well as shot boundary detection
methods specific to 3D video. The performance metrics used to evaluate the key-frame extraction methods
and the summaries derived from those key-frames are presented and discussed. The applications of these methods
are also presented and discussed, followed by a discussion of current research challenges in 3D video
summarisation methods.
Saliency-based Video Summarization for Face Anti-spoofing
Due to the growing availability of face anti-spoofing databases, researchers
are increasingly focusing on video-based methods that use hundreds to thousands
of images to assess their impact on performance. However, there is no clear
consensus on the exact number of frames in a video required to improve the
performance of face anti-spoofing tasks. Inspired by the visual saliency
theory, we present a video summarization method for face anti-spoofing tasks
that aims to enhance the performance and efficiency of deep learning models by
leveraging visual saliency. In particular, saliency information is extracted
from the differences between the Laplacian and Wiener filter outputs of the
source images, enabling identification of the most visually salient regions
within each frame. Subsequently, the source images are decomposed into base and
detail layers, enhancing representation of important information. The weighting
maps are then computed based on the saliency information, indicating the
importance of each pixel in the image. By linearly combining the base and
detail layers using the weighting maps, the method fuses the source images to
create a single representative image that summarizes the entire video. The key
contribution of our proposed method lies in demonstrating how visual saliency
can be used as a data-centric approach to improve the performance and
efficiency of face presentation attack detection models. By focusing on the
most salient images or regions within the images, a more representative and
diverse training set can be created, potentially leading to more effective
models. To validate the method's effectiveness, a simple deep learning
architecture (CNN-RNN) was used, and the experimental results showcased
state-of-the-art performance on five challenging face anti-spoofing datasets.
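
The fusion pipeline described in this abstract (saliency, base/detail decomposition, weighting maps, linear combination) can be sketched roughly as follows. This is only an illustrative approximation, not the authors' implementation: a box blur stands in for the Wiener filter, the base layer is taken to be a blurred copy of the frame, and all function names and parameters (`box_blur`, `fuse_frames`, `eps`) are hypothetical.

```python
import numpy as np

def box_blur(img, k=3):
    """Mean filter with edge padding (assumed stand-in for the smoothing filter)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def laplacian(img):
    """4-neighbour discrete Laplacian with edge padding."""
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img

def saliency(img):
    """Absolute difference between the Laplacian and smoothed outputs.
    Assumption: the box blur replaces the Wiener filter named in the abstract."""
    return np.abs(laplacian(img) - box_blur(img))

def normalise(maps, eps=1e-8):
    """Turn per-frame maps into per-pixel weights that sum to 1 across frames."""
    maps = maps + eps
    return maps / maps.sum(axis=0, keepdims=True)

def fuse_frames(frames):
    """Fuse a list of 2-D grayscale frames into one representative image."""
    frames = [np.asarray(f, dtype=float) for f in frames]
    bases = np.stack([box_blur(f, 5) for f in frames])       # base layers
    details = np.stack(frames) - bases                        # detail layers
    sal = np.stack([saliency(f) for f in frames])
    w_detail = normalise(sal)                                 # sharp weights for details
    w_base = normalise(np.stack([box_blur(s, 5) for s in sal]))  # smooth weights for bases
    return (w_base * bases).sum(axis=0) + (w_detail * details).sum(axis=0)
```

Because both sets of weights sum to one at every pixel, fusing identical frames returns that frame unchanged, which is a quick sanity check on the sketch.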
Statistical and Dynamical Modeling of Riemannian Trajectories with Application to Human Movement Analysis
The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomena: gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect, which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e. the metric is the standard Euclidean distance governed by the L2 norm. However, in many cases this assumption is violated because the data lies on non-Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to represent, compare, and manipulate such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification and other applications more complicated. Exploiting statistical properties can help us understand the true space of such trajectories. In applications such as stroke rehabilitation, where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective. In this regard, we propose a generalization of the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression.
Dissertation/Thesis. Doctoral Dissertation, Electrical Engineering, 201
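
To illustrate why the Euclidean assumption can mislead for manifold-valued data, the sketch below compares straight-line (chord) distance with geodesic (arc) distance for a trajectory on the unit sphere. The sphere is an assumed stand-in manifold chosen for simplicity; the abstract does not name a specific manifold, and the function names are hypothetical.

```python
import math

def geodesic_distance(p, q):
    """Great-circle distance between two unit vectors on the sphere."""
    dot = sum(a * b for a, b in zip(p, q))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot)))

def euclidean_distance(p, q):
    """Ordinary L2 distance, which cuts through the sphere's interior."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def trajectory_length(points, dist):
    """Sum of distances between consecutive samples of a trajectory."""
    return sum(dist(a, b) for a, b in zip(points, points[1:]))

# A coarse trajectory: north pole -> x-axis -> y-axis, all on the unit sphere.
trajectory = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
arc = trajectory_length(trajectory, geodesic_distance)
chord = trajectory_length(trajectory, euclidean_distance)
```

Here the geodesic length is two quarter-circles, while the chord length underestimates the distance actually travelled along the manifold.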
NeurJSCC Enabled Semantic Communications: Paradigms, Applications, and Potentials
Recent advances in deep learning have led to increased interest in solving
high-efficiency end-to-end transmission problems using methods that employ the
nonlinear property of neural networks. These techniques, which we call neural joint
source-channel coding (NeurJSCC), extract latent semantic features of the
source signal across space and time, and design corresponding variable-length
NeurJSCC approaches to transmit latent features over wireless communication
channels. Rapid progress has led to numerous research papers, but a
consolidation of the discovered knowledge has not yet emerged. In this article,
we gather diverse ideas to categorize the expansive aspects of NeurJSCC into two
paradigms, i.e., explicit and implicit NeurJSCC. We first focus on those two
paradigms of NeurJSCC by identifying their common and different components in
building end-to-end communication systems. We then focus on typical
applications of NeurJSCC to various communication tasks. Our article highlights
the improved quality, flexibility, and capability brought by NeurJSCC, and we
also point out future directions.
Symbolic and Visual Retrieval of Mathematical Notation using Formula Graph Symbol Pair Matching and Structural Alignment
Large data collections containing millions of math formulas in different formats are available online. Retrieving math expressions from these collections is challenging. We propose a framework for retrieving mathematical notation using symbol pairs extracted from visual and semantic representations of mathematical expressions, first in the symbolic domain for retrieval of text documents. We further adapt our model for retrieval of mathematical notation in images and lecture videos. Graph-based representations are used in each modality to describe math formulas. For symbolic formula retrieval, where the structure is known, we use symbol layout trees and operator trees. For image-based formula retrieval, where the structure is unknown, we use a more general Line of Sight graph representation. Paths in these graphs define symbol-pair tuples that serve as the entries of our inverted index of mathematical notation. Our retrieval framework uses a three-stage approach: a fast selection of candidates in the first stage, a more detailed matching algorithm with similarity metric computation in the second stage, and, when relevance assessments are available, an optional third stage that uses linear regression to estimate relevance from multiple similarity scores for final re-ranking. Our model has been evaluated using large collections of documents, and preliminary results are presented for videos and cross-modal search. The proposed framework can be adapted to other domains, such as chemistry or technical diagrams, where two visually similar elements from a collection are usually related to each other.
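
The idea of indexing formulas by symbol pairs taken from tree paths can be sketched as below. This is a minimal illustration of the first (candidate selection) stage only, under several assumptions not stated in the abstract: the tree encoding `(symbol, [(edge_label, subtree), ...])`, the edge labels `"n"` (next) and `"a"` (above), the Dice-style score, and the `PairIndex` class are all hypothetical.

```python
from collections import defaultdict

def symbol_pairs(tree):
    """Return (ancestor, descendant, path) tuples for every ancestor/descendant pair."""
    pairs = []

    def walk(node):
        symbol, children = node
        below = []  # (descendant symbol, edge-label path from this node)
        for label, child in children:
            below.append((child[0], label))
            for sym, path in walk(child):
                below.append((sym, label + path))
        for sym, path in below:
            pairs.append((symbol, sym, path))
        return below

    walk(tree)
    return pairs

class PairIndex:
    """Inverted index from symbol-pair tuples to formula identifiers."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.sizes = {}

    def add(self, formula_id, tree):
        pairs = set(symbol_pairs(tree))
        self.sizes[formula_id] = len(pairs)
        for pair in pairs:
            self.postings[pair].add(formula_id)

    def search(self, tree):
        """Rank candidates by a Dice coefficient over shared pair tuples (assumed metric)."""
        query = set(symbol_pairs(tree))
        hits = defaultdict(int)
        for pair in query:
            for formula_id in self.postings.get(pair, ()):
                hits[formula_id] += 1
        scored = [(2 * m / (len(query) + self.sizes[fid]), fid) for fid, m in hits.items()]
        return sorted(scored, key=lambda s: (-s[0], s[1]))

# Toy layout trees: "a" marks a superscript edge, "n" the next symbol on the baseline.
x_squared = ("x", [("a", ("2", []))])
x_squared_plus_1 = ("x", [("a", ("2", [])), ("n", ("+", [("n", ("1", []))]))])
y_plus_1 = ("y", [("n", ("+", [("n", ("1", []))]))])

index = PairIndex()
index.add("x^2", x_squared)
index.add("x^2+1", x_squared_plus_1)
index.add("y+1", y_plus_1)
results = index.search(x_squared)
```

Only formulas sharing at least one pair tuple with the query are ever scored, which is what makes this kind of lookup suitable as a fast first pass before more detailed matching.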