    Human Motion Capture Data Tailored Transform Coding

    Human motion capture (mocap) is a widely used technique for digitizing human movements. With its growing usage, compressing mocap data has received increasing attention, since a compact data size enables efficient storage and transmission. Our analysis shows that mocap data have some unique characteristics that distinguish them from images and videos. Therefore, directly borrowing image or video compression techniques, such as the discrete cosine transform, does not work well. In this paper, we propose a novel mocap-tailored transform coding algorithm that takes advantage of these features. Our algorithm segments the input mocap sequences into clips, which are represented as 2D matrices. It then computes a set of data-dependent orthogonal bases to transform the matrices to the frequency domain, in which the transform coefficients have significantly less dependency. Finally, compression is obtained by entropy coding of the quantized coefficients and the bases. Our method has low computational cost and can be easily extended to compress mocap databases. It also requires neither training nor complicated parameter setting. Experimental results demonstrate that the proposed scheme significantly outperforms state-of-the-art algorithms in terms of compression performance and speed.
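The transform-coding pipeline described above (clip matrix, data-dependent orthogonal basis, quantized coefficients) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's method. The clip layout (frames as rows, channels as columns) is assumed. The orthogonal basis is taken as the leading eigenvectors of the channel covariance, a truncated SVD/PCA found by power iteration. The uniform quantizer step is also assumed. The entropy-coding stage is omitted, and the helper names `orthonormal_basis`, `encode_clip`, and `decode_clip` are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthonormal_basis(clip, k, iters=100, seed=0):
    """k leading eigenvectors of the channel covariance clip^T clip, found by
    power iteration with Gram-Schmidt re-orthogonalization (an assumed
    truncated-SVD/PCA stand-in for the paper's data-dependent bases)."""
    cov = matmul(transpose(clip), clip)
    n = len(cov)
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
            for b in basis:  # remove components along vectors already found
                dot = sum(x * y for x, y in zip(w, b))
                w = [x - dot * y for x, y in zip(w, b)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # fewer than k significant directions: stop early
                return basis
            v = [x / norm for x in w]
        basis.append(v)
    return basis


def encode_clip(clip, k, step):
    """Transform a frames x channels clip with the basis, then quantize uniformly."""
    basis = orthonormal_basis(clip, k)
    coeffs = matmul(clip, transpose(basis))  # frames x k transform coefficients
    quantized = [[round(c / step) for c in row] for row in coeffs]
    return basis, quantized  # an entropy coder would compress both


def decode_clip(basis, quantized, step):
    """Dequantize and apply the inverse transform (the basis is orthonormal)."""
    coeffs = [[q * step for q in row] for row in quantized]
    return matmul(coeffs, basis)


# Hypothetical 8-frame clip with 3 channels that all move in lockstep (rank 1).
clip = [[math.sin(0.3 * t) * w for w in (1.0, 2.0, 3.0)] for t in range(8)]
basis, quantized = encode_clip(clip, k=1, step=0.01)
recon = decode_clip(basis, quantized, step=0.01)
err = max(abs(a - b) for ra, rb in zip(clip, recon) for a, b in zip(ra, rb))
```

Because the basis is fitted to each clip, a clip whose channels move coherently is captured by very few coefficients. A real coder would also quantize and entropy-code the basis itself, as the abstract notes.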

    Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives

    Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. Particular emphasis is placed on the tensor train (TT) and Hierarchical Tucker (HT) decompositions and their physically meaningful interpretations, which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated across a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher-order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone texts or together as a comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions. Comment: 232 pages
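To make the storage argument concrete, the sketch below stores a tensor in TT format, evaluates an entry by multiplying one matrix slice per core, and counts the stored parameters. It is an illustration, not code from the monograph. The nested-list core layout `(r_{k-1}, n_k, r_k)` and the helper names `tt_entry`, `tt_storage`, and `sum_tensor_tt` are hypothetical. The example tensor X[i_1, ..., i_d] = i_1 + ... + i_d was chosen because it has an exact TT representation of rank 2.

```python
from itertools import product


def tt_entry(cores, index):
    """Evaluate X[i_1, ..., i_d] for a tensor stored in tensor-train (TT) format.

    cores[k] is a nested list of shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1,
    and the entry is the product of the matrices cores[k][:, i_k, :].
    """
    row = [1.0]  # running 1 x r_k row vector
    for core, i in zip(cores, index):
        r_in, r_out = len(core), len(core[0][0])
        row = [sum(row[a] * core[a][i][b] for a in range(r_in)) for b in range(r_out)]
    return row[0]


def tt_storage(cores):
    """Parameters stored in TT format: the sum of r_{k-1} * n_k * r_k over cores."""
    return sum(len(c) * len(c[0]) * len(c[0][0]) for c in cores)


def sum_tensor_tt(n, d):
    """Exact TT-rank-2 cores for X[i_1, ..., i_d] = i_1 + ... + i_d (needs d >= 2)."""
    first = [[[float(i), 1.0] for i in range(n)]]                      # 1 x n x 2
    middle = [[[1.0, 0.0] for _ in range(n)],
              [[float(j), 1.0] for j in range(n)]]                     # 2 x n x 2
    last = [[[1.0] for _ in range(n)], [[float(k)] for k in range(n)]]  # 2 x n x 1
    return [first] + [middle] * (d - 2) + [last]  # middle cores share one read-only list


cores = sum_tensor_tt(n=4, d=3)
print(tt_entry(cores, (1, 2, 3)))  # 6.0
print(tt_storage(cores), 4 ** 3)  # 32 64
# Every one of the 4**3 entries is recovered exactly from the 32 stored numbers.
print(all(tt_entry(cores, idx) == sum(idx) for idx in product(range(4), repeat=3)))  # True
```

The full tensor has n^d entries, while these TT cores hold only 4n(d - 1) numbers. For n = 4 and d = 10 that is 144 parameters instead of 1,048,576.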

    Robust subspace learning for static and dynamic affect and behaviour modelling

    Machine analysis of human affect and behavior in naturalistic contexts has attracted growing attention in the last decade from disciplines ranging from the social and cognitive sciences to machine learning and computer vision. Endowing machines with the ability to seamlessly detect, analyze, model, and predict, as well as simulate and synthesize, manifestations of internal emotional and behavioral states in real-world data is deemed essential for the deployment of next-generation, emotionally and socially competent human-centered interfaces. In this thesis, we are primarily motivated by the problem of modeling, recognizing, and predicting spontaneous expressions of non-verbal human affect and behavior manifested through either low-level facial attributes in static images or high-level semantic events in image sequences. Both visual data and annotations of naturalistic affect and behavior naturally contain noisy measurements of unbounded magnitude at random locations, commonly referred to as ‘outliers’. We present machine learning methods that are robust to such gross, sparse noise. First, we deal with the static analysis of face images, viewing them as a superposition of mutually incoherent, low-complexity components corresponding to facial attributes such as facial identity, expressions, and the activation of atomic facial muscle actions. We develop a robust, discriminant dictionary learning framework to extract these components from grossly corrupted training data and combine it with sparse representation to recognize the associated attributes. We demonstrate that our framework can jointly address interrelated classification tasks such as face and facial expression recognition. Inspired by the well-documented importance of the temporal aspect in perceiving affect and behavior, we direct the bulk of our research efforts toward continuous-time modeling of dimensional affect and social behavior.
Having identified a gap in the literature, namely the lack of data containing annotations of social attitudes in continuous time and scale, we first curate a new audio-visual database of multi-party conversations from political debates, annotated frame by frame in terms of real-valued conflict intensity, and use it to conduct the first study on continuous-time conflict intensity estimation. Our experimental findings corroborate previous evidence indicating that existing classifiers are unable to capture the hidden temporal structures of affective and behavioral displays. We present a novel dynamic behavior analysis framework that models temporal dynamics explicitly, based on the natural assumption that continuous-time annotations of smoothly varying affect or behavior can be viewed as the outputs of a low-complexity linear dynamical system whose inputs are behavioral cues (features). A novel robust structured rank minimization framework is proposed to estimate the system parameters in the presence of gross corruptions and partially missing data. Experiments on the prediction of dimensional conflict and affect, as well as on multi-object tracking from detection, validate the effectiveness of our predictive framework and demonstrate, for the first time, that complex human behavior and affect can be learned and predicted from small training sets of person(s)-specific observations.
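The modeling assumption above, in which features drive a low-order linear dynamical system whose outputs are the continuous annotations, can be simulated directly. This is a minimal forward-simulation sketch only. It does not implement the thesis's robust structured rank minimization for estimating the parameters. The matrices `A`, `B`, `C`, `D`, the initial state, and the helper `simulate_lds` are hypothetical.

```python
def mat_vec(m, v):
    """Matrix (list of rows) times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]


def simulate_lds(A, B, C, D, x0, inputs):
    """Run x_{t+1} = A x_t + B u_t, y_t = C x_t + D u_t.

    inputs: list of feature vectors u_t (the behavioural cues);
    returns the list of output vectors y_t (the predicted annotations).
    """
    x, outputs = list(x0), []
    for u in inputs:
        outputs.append(vec_add(mat_vec(C, x), mat_vec(D, u)))
        x = vec_add(mat_vec(A, x), mat_vec(B, u))
    return outputs


# Hypothetical first-order system: one constant feature drives one annotation.
ys = simulate_lds([[0.5]], [[1.0]], [[1.0]], [[0.0]], [0.0], [[1.0]] * 5)
print([y[0] for y in ys])  # [0.0, 1.0, 1.5, 1.75, 1.875]
```

The single-state example shows the smoothing behavior the assumption relies on: a step in the feature produces a gradually rising annotation rather than an abrupt jump.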

    Taming Wild Faces: Web-Scale, Open-Universe Face Identification in Still and Video Imagery

    With the increasing pervasiveness of digital cameras, the Internet, and social networking, there is a growing need to catalog and analyze large collections of photos and videos. In this dissertation, we explore unconstrained still-image and video-based face recognition in real-world scenarios, e.g. social photo sharing and movie trailers, where people of interest are recognized and all others are ignored. In such a scenario, we must obtain high precision in recognizing the known identities while accurately rejecting those of no interest. Recent advances in face recognition research have brought Sparse Representation-based Classification (SRC) to the forefront of competing methods. However, its drawbacks, namely slow speed and sensitivity to variations in pose, illumination, and occlusion, have hindered its widespread applicability. The contributions of this dissertation are threefold: 1. For still-image data, we propose a novel Linearly Approximated Sparse Representation-based Classification (LASRC) algorithm that uses linear regression to perform sample selection for l1-minimization, thus harnessing the speed of least squares and the robustness of SRC. On our large dataset collected from Facebook, LASRC performs on par with standard SRC with a speedup of 100-250x. 2. For video, applying the popular l1-minimization for face recognition on a frame-by-frame basis is computationally prohibitive, so we propose a new algorithm, Mean Sequence SRC (MSSRC), that performs video face recognition using a joint optimization that leverages all of the available video data and the knowledge that the frames of a face track belong to the same individual. MSSRC achieves an average speedup of 5x over frame-by-frame SRC. 3. Finally, we observe that MSSRC sometimes assigns inconsistent identities to the same individual in a scene, which could be corrected based on their visual similarity.
Therefore, we construct a probabilistic affinity graph combining appearance and co-occurrence similarities to model the relationship between face tracks in a video. Using this relationship graph, we employ random walk analysis to propagate strong class predictions among similar face tracks while damping weak predictions. Our method yields a 15.8% gain in average precision over using MSSRC alone.
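The propagation step can be illustrated with a generic random walk with restart (label propagation) over a small affinity graph. This is a sketch under assumptions, not the dissertation's exact formulation. The affinity matrix is taken as given, so the appearance and co-occurrence terms are not modeled, and the per-track class scores stand in for MSSRC outputs. The update rule F <- alpha P F + (1 - alpha) Y, the restart weight `alpha`, the example numbers, and the helper names are all choices made for this example.

```python
def row_normalize(w):
    """Turn a non-negative affinity matrix into a row-stochastic transition matrix P."""
    out = []
    for row in w:
        s = sum(row)
        out.append([x / s for x in row] if s > 0 else [0.0] * len(row))
    return out


def propagate(affinity, scores, alpha=0.8, iters=100):
    """Iterate F <- alpha * P F + (1 - alpha) * Y, where Y holds the initial scores."""
    p = row_normalize(affinity)
    n, c = len(scores), len(scores[0])
    f = [list(r) for r in scores]
    for _ in range(iters):
        f = [[alpha * sum(p[i][j] * f[j][k] for j in range(n)) + (1 - alpha) * scores[i][k]
              for k in range(c)] for i in range(n)]
    return f


# Hypothetical face tracks: 0 and 1 look alike, 2 is weakly connected to both.
affinity = [[1.0, 1.0, 0.05],
            [1.0, 1.0, 0.05],
            [0.05, 0.05, 1.0]]
# Hypothetical per-track class scores (columns = identities); track 1 is a weak guess.
scores = [[0.9, 0.1],
          [0.4, 0.6],
          [0.2, 0.8]]
f = propagate(affinity, scores)
labels = [max(range(len(r)), key=r.__getitem__) for r in f]
print(labels)  # [0, 0, 1]
```

The weak prediction for track 1 is pulled toward the confident label of its look-alike, track 0. Track 2 is only weakly connected and keeps its own label, because most of its transition mass stays on itself.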

    Intra-Key-Frame Coding and Side Information Generation Schemes in Distributed Video Coding

    In this thesis, improved schemes are proposed for intra-key-frame coding and side information (SI) generation in a distributed video coding (DVC) framework. DVC developments over the last few years have placed increasing emphasis on intra-frame coding and on the generation of better-quality SI. The two are interrelated, since SI generation depends on the quality of the decoded key frames. Hence, higher-quality key frames produced by intra-key-frame coding are in turn used to generate better-quality SI frames. As a result, DVC needs fewer parity bits to reconstruct the Wyner-Ziv (WZ) frames at the decoder. With this in mind, we propose two schemes for intra-key-frame coding: (a) Burrows-Wheeler Transform based H.264/AVC (Intra) intra-frame coding (BWT-H.264/AVC (Intra)) and (b) dictionary-based H.264/AVC (Intra) intra-frame coding using orthogonal matching pursuit (DBOMP-H.264/AVC (Intra)). The BWT-H.264/AVC (Intra) scheme is a modified version of H.264/AVC (Intra) in which a regularized bit stream is generated prior to compression. This scheme achieves higher compression efficiency as well as high-quality decoded key frames. The DBOMP-H.264/AVC (Intra) scheme is based on an adaptive dictionary and H.264/AVC (Intra) intra-frame coding. The traditional transform is replaced with a dictionary trained with the K-singular value decomposition (K-SVD) algorithm, and the dictionary elements are coded using orthogonal matching pursuit (OMP). Further, two SI generation schemes are proposed: (a) multilayer perceptron based SI generation (MLP-SI) and (b) multivariable support vector regression based SI generation (MSVR-SI). The MLP-SI scheme uses a multilayer perceptron (MLP) to estimate SI frames from the decoded key frames block by block. The network is trained offline using training patterns from different frames collected from standard video sequences.
The MSVR-SI scheme uses an optimized multivariable support vector regression (M-SVR) to generate SI frames from the decoded key frames block by block. As with the MLP, the M-SVR is trained offline with training patterns known a priori. Both the intra-key-frame coding and the SI generation schemes are embedded in the Stanford-based DVC architecture and studied individually to compare their performance with competing schemes. Both visual and quantitative evaluations are made to show the efficacy of the schemes. To exploit the benefit of the intra-frame coding schemes for SI generation, four hybrid schemes are formulated by combining the proposed schemes as follows: (a) BWT-MLP, which uses BWT-H.264/AVC (Intra) intra-frame coding with MLP-SI side information generation; (b) BWT-MSVR, which uses BWT-H.264/AVC (Intra) intra-frame coding followed by MSVR-SI side information generation; (c) DBOMP-MLP, which combines DBOMP-H.264/AVC (Intra) intra-frame coding with MLP-SI side information generation; and (d) DBOMP-MSVR, which combines DBOMP-H.264/AVC (Intra) intra-frame coding with MSVR-SI side information generation. The hybrid schemes are also incorporated into the Stanford-based DVC architecture, and simulations are carried out on standard video sequences. Performance is analyzed in terms of overall rate distortion, number of requests per SI frame, temporal evaluation, and decoding time to derive an overall conclusion.
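The BWT-H.264/AVC (Intra) scheme relies on the Burrows-Wheeler transform to regularize the bit stream before compression. The abstract does not say how the transform is applied to H.264/AVC data. The sketch below therefore shows only a naive textbook BWT and its inverse on a string. The sentinel character and the function names are assumptions made for illustration.

```python
def bwt(s, sentinel="\0"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations of s + sentinel."""
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(r[-1] for r in rotations)


def inverse_bwt(b, sentinel="\0"):
    """Invert the naive BWT by repeatedly prepending the column and re-sorting."""
    table = [""] * len(b)
    for _ in range(len(b)):
        table = sorted(c + row for c, row in zip(b, table))
    # The only rotation ending in the unique sentinel is the original string.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


encoded = bwt("banana")
print(repr(encoded))  # 'annb\x00aa'
print(inverse_bwt(encoded))  # banana
```

The transform is reversible and tends to group identical symbols into runs (here `nn` and `aa`), which makes the output easier for a subsequent entropy coder to compress. Sorting all rotations costs O(n^2 log n) and is only suitable for illustration; practical implementations build the transform from a suffix array.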

    Tensor Networks for Big Data Analytics and Large-Scale Optimization Problems

    In this paper, we review basic and emerging models and associated algorithms for large-scale tensor networks, especially Tensor Train (TT) decompositions, using novel mathematical and graphical representations. We discuss the concept of tensorization (i.e., creating very high-order tensors from lower-order original data) and the super compression of data achieved via quantized tensor train (QTT) networks. The purpose of tensorization and quantization is to achieve, via low-rank tensor approximations, "super" compression and a meaningful, compact representation of structured data. The main objective of this paper is to show how tensor networks can be used to solve a wide class of big data optimization problems (which are far from tractable by classical numerical methods) by applying tensorization, performing all operations on relatively small matrices and tensors, and applying iteratively optimized, approximate tensor contractions. Keywords: Tensor networks, tensor train (TT) decompositions, matrix product states (MPS), matrix product operators (MPO), basic tensor operations, tensorization, distributed representation of data, optimization problems for very large-scale problems: generalized eigenvalue decomposition (GEVD), PCA/SVD, canonical correlation analysis (CCA). Comment: arXiv admin note: text overlap with arXiv:1403.204
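Tensorization and QTT compression can be illustrated with a classic example. A vector of length 2^d is reshaped into a d-order 2 x 2 x ... x 2 tensor by writing each index in binary. The geometric sequence x_i = q^i then has QTT rank 1, because q^i = q^{b_0} q^{2 b_1} ... q^{2^{d-1} b_{d-1}}. The sketch below is illustrative only. The helper names are hypothetical, and the rank-1 cores (each of size 1 x 2 x 1) are stored as plain pairs.

```python
import math


def to_bits(i, d):
    """Binary digits (b_0, ..., b_{d-1}) of index i, least significant first."""
    return [(i >> k) & 1 for k in range(d)]


def geometric_qtt_cores(q, d):
    """Rank-1 QTT cores for x_i = q**i, i = 0, ..., 2**d - 1.

    Since i = sum_k b_k 2**k, q**i = prod_k q**(b_k 2**k); core k is 1 x 2 x 1,
    stored here simply as the pair (q**0, q**(2**k)) indexed by the bit b_k.
    """
    return [[1.0, q ** (2 ** k)] for k in range(d)]


def qtt_entry(cores, i):
    """Recover x_i by picking one entry from each core and multiplying."""
    value = 1.0
    for core, b in zip(cores, to_bits(i, len(cores))):
        value *= core[b]
    return value


d, q = 10, 0.9
cores = geometric_qtt_cores(q, d)
ok = all(math.isclose(qtt_entry(cores, i), q ** i, rel_tol=1e-12) for i in range(2 ** d))
print(ok, 2 * d, 2 ** d)  # True 20 1024
```

Storage grows as 2d rather than 2^d. Sequences that are not exactly rank 1 would need the low-rank approximations discussed in the paper.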