701 research outputs found
Canonical Correlation Analysis of Video Volume Tensors for Action Categorization and Detection
Abstract: This paper addresses a spatiotemporal pattern recognition problem. The main purpose of this study is to find a suitable representation and matching of action video volumes for categorization. A novel method is proposed to measure video-to-video volume similarity by extending Canonical Correlation Analysis (CCA), a principled tool to inspect linear relations between two sets of vectors, to that of two multiway data arrays (or tensors). The proposed method analyzes video volumes as inputs, avoiding the difficult problem of explicit motion estimation required in traditional methods, and provides a way of spatiotemporal pattern matching that is robust to intraclass variations of actions. The proposed matching is demonstrated for action classification by a simple Nearest Neighbor classifier. We, moreover, propose an automatic action detection method, which performs 3D window search over an input video with action exemplars. The search is sped up by dynamic learning of subspaces in the proposed CCA. Experiments on a public action data set (KTH) and a self-recorded hand gesture data set showed that the proposed method is significantly better than various state-of-the-art methods with respect to accuracy. Our method has low time complexity and does not require any major tuning parameters. Index Terms: Action categorization, gesture recognition, canonical correlation analysis, tensor, action detection, incremental subspace learning, spatiotemporal pattern classification.
On-line Learning of Mutually Orthogonal Subspaces for Face Recognition by Image Sets
We address the problem of face recognition by matching image sets. Each set of face images is represented by a subspace (or linear manifold) and recognition is carried out by subspace-to-subspace matching. In this paper, 1) a new discriminative method that maximises orthogonality between subspaces is proposed. The method improves the discrimination power of the subspace angle based face recognition method by maximising the angles between different classes. 2) We propose a method for updating the discriminative subspaces on-line as a mechanism for continuously improving recognition accuracy. 3) A further enhancement called the locally orthogonal subspace method is presented to maximise the orthogonality between competing classes. Experiments using 700 face image sets have shown that the proposed method outperforms relevant prior art and effectively boosts its accuracy by on-line learning. It is shown that the method for on-line learning delivers the same solution as the batch computation at far lower computational cost and the locally orthogonal method exhibits improved accuracy. We also demonstrate the merit of the proposed face recognition method on portal scenarios of the Multiple Biometric Grand Challenge
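The subspace-to-subspace matching mentioned above is commonly scored with canonical correlations, that is, the cosines of the principal angles between two subspaces. The following is a minimal NumPy sketch of that generic computation, not the discriminative or on-line method of the paper. The function name `canonical_correlations` and the toy matrices are assumptions made for illustration, and each input is assumed to have full column rank.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Cosines of the principal angles between span(X) and span(Y).

    X and Y hold basis vectors as columns (assumed full column rank).
    """
    Qx, _ = np.linalg.qr(X)  # orthonormal basis of span(X)
    Qy, _ = np.linalg.qr(Y)  # orthonormal basis of span(Y)
    # Singular values of Qx^T Qy are the canonical correlations,
    # returned in descending order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1

# Toy example: two planes in R^3 that share the first axis only.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # span{e1, e2}
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])  # span{e1, e3}
corr = canonical_correlations(A, B)
```

Here the first correlation is 1 (the shared axis) and the second is 0 (the remaining directions are orthogonal). A method that maximises orthogonality between classes aims to push such correlations towards 0 for subspaces of different identities.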
Large scale joint semantic re-localisation and scene understanding via globally unique instance coordinate regression
In this work we present a novel approach to joint semantic localisation and
scene understanding. Our work is motivated by the need for localisation
algorithms which not only predict 6-DoF camera pose but also simultaneously
recognise surrounding objects and estimate 3D geometry. Such capabilities are
crucial for computer vision guided systems which interact with the environment:
autonomous driving, augmented reality and robotics. In particular, we propose a
two step procedure. During the first step we train a convolutional neural
network to jointly predict per-pixel globally unique instance labels and
corresponding local coordinates for each instance of a static object (e.g. a
building). During the second step we obtain scene coordinates by combining
object center coordinates and local coordinates and use them to perform 6-DoF
camera pose estimation. We evaluate our approach on real-world (CamVid-360) and
artificial (SceneCity) autonomous driving datasets. We obtain smaller mean
distance and angular errors than state-of-the-art 6-DoF pose estimation
algorithms based on direct pose regression and pose estimation from scene
coordinates on all datasets. Our contributions include: (i) a novel formulation
of scene coordinate regression as two separate tasks of object instance
recognition and local coordinate regression and a demonstration that our
proposed solution makes it possible to predict accurate 3D geometry of static objects and
estimate the 6-DoF camera pose on (ii) maps larger by several orders of
magnitude than previously attempted by scene coordinate regression methods, as
well as on (iii) lightweight, approximate 3D maps built from 3D primitives such
as building-aligned cuboids. Toyota Corporatio
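The second step described in the abstract above, combining object centre coordinates with predicted local coordinates, can be illustrated with a short sketch. It is only an illustration under assumptions: the function name `scene_coordinates`, the dictionary of instance centres and the sample values are hypothetical, and the network prediction and the subsequent pose estimation are not shown.

```python
def scene_coordinates(instance_ids, local_coords, centers):
    """Per-pixel scene coordinate = centre of the predicted instance + local offset.

    instance_ids: predicted globally unique instance label for each pixel
    local_coords: predicted (x, y, z) offset of each pixel within its instance
    centers:      mapping from instance label to its (x, y, z) centre in the map
    """
    return [
        tuple(c + l for c, l in zip(centers[i], local))
        for i, local in zip(instance_ids, local_coords)
    ]

# Hypothetical values: two buildings and three pixels.
centers = {7: (100.0, 0.0, 50.0), 9: (-20.0, 0.0, 10.0)}
ids = [7, 7, 9]
local = [(1.0, 2.0, 0.5), (-1.0, 0.0, 0.0), (0.0, 3.0, -2.0)]
points = scene_coordinates(ids, local, centers)
```

Because each local offset is expressed relative to its own instance, the regression target stays bounded by the object size even when the map itself is very large.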
Creatures Great and SMAL: Recovering the Shape and Motion of Animals from Video
We present a system to recover the 3D shape and motion of a wide variety of
quadrupeds from video. The system comprises a machine learning front-end which
predicts candidate 2D joint positions, a discrete optimization which finds
kinematically plausible joint correspondences, and an energy minimization stage
which fits a detailed 3D model to the image. In order to overcome the limited
availability of motion capture training data from animals, and the difficulty
of generating realistic synthetic training images, the system is designed to
work on silhouette data. The joint candidate predictor is trained on
synthetically generated silhouette images, and at test time, deep learning
methods or standard video segmentation tools are used to extract silhouettes
from real data. The system is tested on animal videos from several species, and
shows accurate reconstructions of 3D shape and pose. GlaxoSmithKlin
Face recognition with image sets using manifold density divergence
In many automatic face recognition applications, a set of a person's face images is available rather than a single image. In this paper, we describe a novel method for face recognition using image sets. We propose a flexible, semi-parametric model for learning probability densities confined to highly non-linear but intrinsically low-dimensional manifolds. The model leads to a statistical formulation of the recognition problem in terms of minimizing the divergence between densities estimated on these manifolds. The proposed method is evaluated on a large data set, acquired in realistic imaging conditions with severe illumination variation. Our algorithm is shown to match the best and outperform other state-of-the-art algorithms in the literature, achieving 94% recognition rate on average
Large Scale Labelled Video Data Augmentation for Semantic Segmentation in Driving Scenarios
In this paper we present an analysis of the effect of large scale video data augmentation for semantic segmentation in driving scenarios. Our work is motivated by a strong correlation between the high performance of most recent deep learning based methods and the availability of large volumes of ground truth labels. To generate additional labelled data, we make use of an occlusion-aware and uncertainty-enabled label propagation algorithm. As a result we increase the availability of high-resolution labelled frames by a factor of 20, yielding a 6.8% to 10.8% rise in average classification accuracy and/or IoU scores for several semantic segmentation networks. Our key contributions include: (a) augmented CityScapes and CamVid datasets providing 56.2K and 6.5K additional labelled frames of object classes respectively, (b) a detailed empirical analysis of the effect of the use of augmented data, as well as (c) an extension of the proposed framework to instance segmentation
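For readers unfamiliar with the IoU score cited above, the following sketch shows the standard per-class intersection-over-union computed from flat lists of predicted and ground-truth labels. It is a generic illustration, not the evaluation code of the paper. The function name `iou_per_class` and the toy labels are assumptions.

```python
def iou_per_class(pred, truth, num_classes):
    """Intersection over union for each class from per-pixel labels.

    Returns None for a class that appears in neither prediction nor ground truth.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, truth):
        if p == t:
            inter[p] += 1  # pixel counted in both sets for class p
            union[p] += 1
        else:
            union[p] += 1  # predicted as p but labelled t
            union[t] += 1  # labelled t but predicted as p
    return [i / u if u else None for i, u in zip(inter, union)]

# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
truth = [0, 1, 1, 1, 2, 0]
scores = iou_per_class(pred, truth, num_classes=3)
mean_iou = sum(scores) / len(scores)
```

In this toy case the per-class scores are 1/3, 2/3 and 1/2, so the mean IoU is 0.5.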
Cardiovascular toxicity induced by chemotherapy, targeted agents and radiotherapy: ESMO Clinical Practice Guidelines
Cardiovascular (CV) toxicity is a potential short- or long-term complication of various anticancer therapies. Some drugs, such as anthracyclines or other biological agents, have been implicated in causing potentially irreversible clinically important cardiac dysfunction. Although targeted therapies are considered less toxic and better tolerated by patients compared with classic chemotherapy agents, rare but serious complications have been described, and longer follow-up is needed to determine the exact profile and outcomes of related cardiac side-effects. Some of these side-effects are irreversible, leading to progressive CV disease, and some others induce reversible dysfunction with no long-term cardiac damage to the patient. Assessment of the prevalence, type and severity of cardiac toxicity caused by various cancer treatments is a breakthrough topic for patient management. Guidelines for preventing, monitoring and treating cardiac side-effects are a major medical need. Efforts are needed to promote strategies for cardiac risk prevention, detection and management, avoiding unintended consequences that can impede development, regulatory approval and patient access to novel therapy. These new ESMO Clinical Practice Guidelines are the result of a multidisciplinary cardio-oncology review of current evidence with the ultimate goal of providing strict criteria-based recommendations on CV risk prevention, assessment, monitoring and management during anticancer treatment.
- …