
    Sparse Modeling for Image and Vision Processing

    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection, that is, automatically selecting a simple model from a large collection of candidates. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as neuroscience, bioinformatics, and computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts. Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Vision.

    An HMM-Based Framework for Video Semantic Analysis

    Video semantic analysis is essential in video indexing and structuring. However, due to the lack of robust and generic algorithms, most existing work on semantic analysis is limited to specific domains. In this paper, we present a novel hidden Markov model (HMM)-based framework as a general solution to video semantic analysis. In the proposed framework, semantics at different granularities are mapped to a hierarchical model space, which is composed of detectors and connectors. In this manner, our model decomposes a complex analysis problem into simpler subproblems during the training process and automatically integrates those subproblems for recognition. The proposed framework is not only suitable for a broad range of applications, but also capable of modeling semantics at different semantic granularities. We also present a new motion representation scheme, which is robust to different motion vector sources. Applications of the proposed framework to basketball event detection, soccer shot classification, and volleyball sequence analysis have demonstrated its effectiveness for video semantic analysis.

    Character Recognition

    Character recognition is one of the pattern recognition technologies most widely used in practical applications. This book presents recent advances relevant to character recognition, from technical topics such as image processing, feature extraction, and classification to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field.

    Affect recognition & generation in-the-wild

    Affect recognition based on a subject’s facial expressions has been a topic of major research in the attempt to generate machines that can understand the way subjects feel, act and react. In the past, due to the unavailability of large amounts of data captured in real-life situations, research has mainly focused on controlled environments. However, recently, social media and platforms have been widely used. Moreover, deep learning has emerged as a means to solve visual analysis and recognition problems. This Ph.D. thesis exploits these advances and makes significant contributions to affect analysis and recognition in-the-wild. We tackle affect analysis and recognition as a dual knowledge generation problem: i) we create new, large and rich in-the-wild databases and ii) we design and train novel deep neural architectures that are able to analyse affect over these databases and to successfully generalise their performance to other datasets. First, we present the creation of the Aff-Wild database, annotated according to valence-arousal, and an end-to-end CNN-RNN architecture, AffWildNet. Then we use AffWildNet as a robust prior for dimensional and categorical affect recognition and extend it by extracting low-/mid-/high-level latent information and analysing this via multiple RNNs. Additionally, we propose a novel loss function for DNN-based categorical affect recognition. Next, we generate Aff-Wild2, the first database containing annotations for all main behavior tasks: estimating valence-arousal, classifying into basic expressions, and detecting action units. We develop multi-task and multi-modal extensions of AffWildNet by fusing these tasks and propose a novel holistic approach that utilises all existing databases with non-overlapping annotations and couples them through co-annotation and distribution matching. Finally, we present an approach for facial affect synthesis in terms of valence-arousal or basic expressions. We generate an image with a given affect, or a sequence of images with evolving affect, by annotating a 4-D database and utilising a 3-D morphable model.