272 research outputs found

    Anomaly detection in moving-camera videos with sparse and low-rank matrix decompositions

    Get PDF
    This work presents two methods based on sparse decompositions that can detect anomalies in video sequences obtained from moving cameras. The first method starts by computing the union of subspaces (UoS) that best represents all the frames from a reference (anomaly-free) video as a low-rank projection plus a sparse residue. Then it performs a low-rank representation of the target (possibly anomalous) video by taking advantage of both the UoS and the sparse residue computed from the reference video. The anomalies are extracted after post-processing this video with these residual data. Such algorithm provides good detection results while at the same time obviating the need for previous video synchronization. However, this technique looses its detection efficiency when target and reference videos presents more severe misalignments. This may happen due to small uncontrolled camera moviment and shaking during the acquisition phase, which is often common in realworld situations. To extend its applicability, a second contribution is proposed in order to cope with these possible pose misalignments. This is done by modeling the target-reference pose discrepancy as geometric transformations acting on the domain of frames of the target video. A complete matrix decomposition algorithm is presented in order to perform a sparse representation of the target video as a sparse combination of the reference video plus a sparse residue, while taking into account the transformation acting on it. Our method is then verified and compared against state-of-the-art techniques using a challenging video dataset, that comprises recordings presenting the described misalignments. Under the evaluation metrics used, the second proposed method exhibits an improvement of at least 16% over the first proposed one, and 22% over the next best rated method.Apresentamos dois métodos baseados em decomposições esparsas que podem detectar anomalias em sequências de vídeo obtidas por câmeras em movimento. O primeiro método estima a união de subespaços (UdS) que melhor representa todos os quadros de um vídeo de referência (livre de anomalias) como uma projeção de baixo-posto mais um resíduo esparso. Em seguida, é realizada uma representação de baixo-posto do vídeo alvo (possivelmente anômalo) aproveitando a UdS e o resíduo esparso calculado a partir do vídeo de referência. As anomalias são extraídas após o pós-processamento destas informações residuais. Esse algoritmo fornece bons resultados de detecção, além de eliminar a necessidade de uma sincronização prévia dos vídeos. No entanto, essa técnica perde eficiência quando os vídeos de referência e alvo apresentam desalinhamentos mais graves entre si. Isso pode ocorrer devido a pequenos movimentos descontrolados da câmera e tremores durante a fase de aquisição. Para estender sua aplicabilidade, uma segunda contribuição é proposta a fim de lidar com esse possível desalinhamento. Isso é feito modelando a discrepância de pose de câmera entre os vídeos de referência e alvo com transformações geométricas agindo no domínio dos quadros do vídeo alvo. Um algoritmo completo de decomposição de matrizes é apresentado para realizar uma representação esparsa do vídeo alvo como uma combinação esparsa do vídeo de referência, levando em consideração as transformações que atuam sobre seus quadros. Nosso método é então verificado e comparado com técnicas de última geração com auxílio de vídeos de uma base desafiadora, apresentando os desalinhamentos em questão. Sob as métricas de avaliação usadas, o segundo método proposto exibe uma melhoria de pelo menos 16% em relação ao primeiro, e 22% sobre o método melhor avaliado logo em seguida

    Robust Subspace Estimation via Low-Rank and Sparse Decomposition and Applications in Computer Vision

    Get PDF
    PhDRecent advances in robust subspace estimation have made dimensionality reduction and noise and outlier suppression an area of interest for research, along with continuous improvements in computer vision applications. Due to the nature of image and video signals that need a high dimensional representation, often storage, processing, transmission, and analysis of such signals is a difficult task. It is therefore desirable to obtain a low-dimensional representation for such signals, and at the same time correct for corruptions, errors, and outliers, so that the signals could be readily used for later processing. Major recent advances in low-rank modelling in this context were initiated by the work of Cand`es et al. [17] where the authors provided a solution for the long-standing problem of decomposing a matrix into low-rank and sparse components in a Robust Principal Component Analysis (RPCA) framework. However, for computer vision applications RPCA is often too complex, and/or may not yield desirable results. The low-rank component obtained by the RPCA has usually an unnecessarily high rank, while in certain tasks lower dimensional representations are required. The RPCA has the ability to robustly estimate noise and outliers and separate them from the low-rank component, by a sparse part. But, it has no mechanism of providing an insight into the structure of the sparse solution, nor a way to further decompose the sparse part into a random noise and a structured sparse component that would be advantageous in many computer vision tasks. As videos signals are usually captured by a camera that is moving, obtaining a low-rank component by RPCA becomes impossible. In this thesis, novel Approximated RPCA algorithms are presented, targeting different shortcomings of the RPCA. The Approximated RPCA was analysed to identify the most time consuming RPCA solutions, and replace them with simpler yet tractable alternative solutions. The proposed method is able to obtain the exact desired rank for the low-rank component while estimating a global transformation to describe camera-induced motion. Furthermore, it is able to decompose the sparse part into a foreground sparse component, and a random noise part that contains no useful information for computer vision processing. The foreground sparse component is obtained by several novel structured sparsity-inducing norms, that better encapsulate the needed pixel structure in visual signals. Moreover, algorithms for reducing complexity of low-rank estimation have been proposed that achieve significant complexity reduction without sacrificing the visual representation of video and image information. The proposed algorithms are applied to several fundamental computer vision tasks, namely, high efficiency video coding, batch image alignment, inpainting, and recovery, video stabilisation, background modelling and foreground segmentation, robust subspace clustering and motion estimation, face recognition, and ultra high definition image and video super-resolution. The algorithms proposed in this thesis including batch image alignment and recovery, background modelling and foreground segmentation, robust subspace clustering and motion segmentation, and ultra high definition image and video super-resolution achieve either state-of-the-art or comparable results to existing methods

    A framework of face recognition with set of testing images

    Get PDF
    We propose a novel framework to solve the face recognition problem base on set of testing images. Our framework can handle the case that no pose overlap between training set and query set. The main techniques used in this framework are manifold alignment, face normalization and discriminant learning. Experiments on different databases show our system outperforms some state of the art methods

    Occlusion-Aware Multi-View Reconstruction of Articulated Objects for Manipulation

    Get PDF
    The goal of this research is to develop algorithms using multiple views to automatically recover complete 3D models of articulated objects in unstructured environments and thereby enable a robotic system to facilitate further manipulation of those objects. First, an algorithm called Procrustes-Lo-RANSAC (PLR) is presented. Structure-from-motion techniques are used to capture 3D point cloud models of an articulated object in two different configurations. Procrustes analysis, combined with a locally optimized RANSAC sampling strategy, facilitates a straightforward geometric approach to recovering the joint axes, as well as classifying them automatically as either revolute or prismatic. The algorithm does not require prior knowledge of the object, nor does it make any assumptions about the planarity of the object or scene. Second, with such a resulting articulated model, a robotic system is then able to manipulate the object either along its joint axes at a specified grasp point in order to exercise its degrees of freedom or move its end effector to a particular position even if the point is not visible in the current view. This is one of the main advantages of the occlusion-aware approach, because the models capture all sides of the object meaning that the robot has knowledge of parts of the object that are not visible in the current view. Experiments with a PUMA 500 robotic arm demonstrate the effectiveness of the approach on a variety of real-world objects containing both revolute and prismatic joints. Third, we improve the proposed approach by using a RGBD sensor (Microsoft Kinect) that yield a depth value for each pixel immediately by the sensor itself rather than requiring correspondence to establish depth. KinectFusion algorithm is applied to produce a single high-quality, geometrically accurate 3D model from which rigid links of the object are segmented and aligned, allowing the joint axes to be estimated using the geometric approach. The improved algorithm does not require artificial markers attached to objects, yields much denser 3D models and reduces the computation time

    Video Motion: Finding Complete Motion Paths for Every Visible Point

    Get PDF
    <p>The problem of understanding motion in video has been an area of intense research in computer vision for decades. The traditional approach is to represent motion using optical flow fields, which describe the two-dimensional instantaneous velocity at every pixel in every frame. We present a new approach to describing motion in video in which each visible world point is associated with a sequence-length video motion path. A video motion path lists the location where a world point would appear if it were visible in every frame of the sequence. Each motion path is coupled with a vector of binary visibility flags for the associated point that identify the frames in which the tracked point is unoccluded.</p><p>We represent paths for all visible points in a particular sequence using a single linear subspace. The key insight we exploit is that, for many sequences, this subspace is low-dimensional, scaling with the complexity of the deformations and the number of independent objects in the scene, rather than the number of frames in the sequence. Restricting all paths to lie within a single motion subspace provides strong regularization that allows us to extend paths through brief occlusions, relying on evidence from the visible frames to hallucinate the unseen locations.</p><p>This thesis presents our mathematical model of video motion. We define a path objective function that optimizes a set of paths given estimates of visible intervals, under the assumption that motion is generally spatially smooth and that the appearance of a tracked point remains constant over time. We estimate visibility based on global properties of all paths, enforcing the physical requirement that at least one tracked point must be visible at every pixel in the video. The model assumes the existence of an appropriate path motion basis; we find a sequence-specific basis through analysis of point tracks from a frame-to-frame tracker. Tracking failures caused by image noise, non-rigid deformations, or occlusions complicate the problem by introducing missing data. We update standard trackers to aggressively reinitialize points lost in earlier frames. Finally, we improve on standard Principal Component Analysis with missing data by introducing a novel compaction step that associates these relocalized points, reducing the amount of missing data that must be overcome. The full system achieves state-of-the-art results, recovering dense, accurate, long-range point correspondences in the face of significant occlusions.</p>Dissertatio

    Robust density modelling using the student's t-distribution for human action recognition

    Full text link
    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model since it is highly sensitive to outliers. The Gaussian distribution is also often used as base component of graphical models for recognising human actions in the videos (hidden Markov model and others) and the presence of outliers can significantly affect the recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM which uses mixtures of t-distributions as observation probabilities and show how experiments over two well-known datasets (Weizmann, MuHAVi) reported a remarkable improvement in classification accuracy. © 2011 IEEE

    Proceedings of the 2015 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory

    Get PDF
    This book is a collection of proceedings of the talks given at the 2015 annual joint workshop of Fraunhofer IOSB and the Vision and Fusion Laboratory (IES) by the doctoral students of both institutions. The topics of individual contributions range from computer vision, optical metrology, and world modelling to data fusion and human-machine interaction

    Methods for Structure from Motion

    Get PDF

    QUIS-CAMPI: Biometric Recognition in Surveillance Scenarios

    Get PDF
    The concerns about individuals security have justified the increasing number of surveillance cameras deployed both in private and public spaces. However, contrary to popular belief, these devices are in most cases used solely for recording, instead of feeding intelligent analysis processes capable of extracting information about the observed individuals. Thus, even though video surveillance has already proved to be essential for solving multiple crimes, obtaining relevant details about the subjects that took part in a crime depends on the manual inspection of recordings. As such, the current goal of the research community is the development of automated surveillance systems capable of monitoring and identifying subjects in surveillance scenarios. Accordingly, the main goal of this thesis is to improve the performance of biometric recognition algorithms in data acquired from surveillance scenarios. In particular, we aim at designing a visual surveillance system capable of acquiring biometric data at a distance (e.g., face, iris or gait) without requiring human intervention in the process, as well as devising biometric recognition methods robust to the degradation factors resulting from the unconstrained acquisition process. Regarding the first goal, the analysis of the data acquired by typical surveillance systems shows that large acquisition distances significantly decrease the resolution of biometric samples, and thus their discriminability is not sufficient for recognition purposes. In the literature, diverse works point out Pan Tilt Zoom (PTZ) cameras as the most practical way for acquiring high-resolution imagery at a distance, particularly when using a master-slave configuration. In the master-slave configuration, the video acquired by a typical surveillance camera is analyzed for obtaining regions of interest (e.g., car, person) and these regions are subsequently imaged at high-resolution by the PTZ camera. Several methods have already shown that this configuration can be used for acquiring biometric data at a distance. Nevertheless, these methods failed at providing effective solutions to the typical challenges of this strategy, restraining its use in surveillance scenarios. Accordingly, this thesis proposes two methods to support the development of a biometric data acquisition system based on the cooperation of a PTZ camera with a typical surveillance camera. The first proposal is a camera calibration method capable of accurately mapping the coordinates of the master camera to the pan/tilt angles of the PTZ camera. The second proposal is a camera scheduling method for determining - in real-time - the sequence of acquisitions that maximizes the number of different targets obtained, while minimizing the cumulative transition time. In order to achieve the first goal of this thesis, both methods were combined with state-of-the-art approaches of the human monitoring field to develop a fully automated surveillance capable of acquiring biometric data at a distance and without human cooperation, designated as QUIS-CAMPI system. The QUIS-CAMPI system is the basis for pursuing the second goal of this thesis. The analysis of the performance of the state-of-the-art biometric recognition approaches shows that these approaches attain almost ideal recognition rates in unconstrained data. However, this performance is incongruous with the recognition rates observed in surveillance scenarios. Taking into account the drawbacks of current biometric datasets, this thesis introduces a novel dataset comprising biometric samples (face images and gait videos) acquired by the QUIS-CAMPI system at a distance ranging from 5 to 40 meters and without human intervention in the acquisition process. This set allows to objectively assess the performance of state-of-the-art biometric recognition methods in data that truly encompass the covariates of surveillance scenarios. As such, this set was exploited for promoting the first international challenge on biometric recognition in the wild. This thesis describes the evaluation protocols adopted, along with the results obtained by the nine methods specially designed for this competition. In addition, the data acquired by the QUIS-CAMPI system were crucial for accomplishing the second goal of this thesis, i.e., the development of methods robust to the covariates of surveillance scenarios. The first proposal regards a method for detecting corrupted features in biometric signatures inferred by a redundancy analysis algorithm. The second proposal is a caricature-based face recognition approach capable of enhancing the recognition performance by automatically generating a caricature from a 2D photo. The experimental evaluation of these methods shows that both approaches contribute to improve the recognition performance in unconstrained data.A crescente preocupação com a segurança dos indivíduos tem justificado o crescimento do número de câmaras de vídeo-vigilância instaladas tanto em espaços privados como públicos. Contudo, ao contrário do que normalmente se pensa, estes dispositivos são, na maior parte dos casos, usados apenas para gravação, não estando ligados a nenhum tipo de software inteligente capaz de inferir em tempo real informações sobre os indivíduos observados. Assim, apesar de a vídeo-vigilância ter provado ser essencial na resolução de diversos crimes, o seu uso está ainda confinado à disponibilização de vídeos que têm que ser manualmente inspecionados para extrair informações relevantes dos sujeitos envolvidos no crime. Como tal, atualmente, o principal desafio da comunidade científica é o desenvolvimento de sistemas automatizados capazes de monitorizar e identificar indivíduos em ambientes de vídeo-vigilância. Esta tese tem como principal objetivo estender a aplicabilidade dos sistemas de reconhecimento biométrico aos ambientes de vídeo-vigilância. De forma mais especifica, pretende-se 1) conceber um sistema de vídeo-vigilância que consiga adquirir dados biométricos a longas distâncias (e.g., imagens da cara, íris, ou vídeos do tipo de passo) sem requerer a cooperação dos indivíduos no processo; e 2) desenvolver métodos de reconhecimento biométrico robustos aos fatores de degradação inerentes aos dados adquiridos por este tipo de sistemas. No que diz respeito ao primeiro objetivo, a análise aos dados adquiridos pelos sistemas típicos de vídeo-vigilância mostra que, devido à distância de captura, os traços biométricos amostrados não são suficientemente discriminativos para garantir taxas de reconhecimento aceitáveis. Na literatura, vários trabalhos advogam o uso de câmaras Pan Tilt Zoom (PTZ) para adquirir imagens de alta resolução à distância, principalmente o uso destes dispositivos no modo masterslave. Na configuração master-slave um módulo de análise inteligente seleciona zonas de interesse (e.g. carros, pessoas) a partir do vídeo adquirido por uma câmara de vídeo-vigilância e a câmara PTZ é orientada para adquirir em alta resolução as regiões de interesse. Diversos métodos já mostraram que esta configuração pode ser usada para adquirir dados biométricos à distância, ainda assim estes não foram capazes de solucionar alguns problemas relacionados com esta estratégia, impedindo assim o seu uso em ambientes de vídeo-vigilância. Deste modo, esta tese propõe dois métodos para permitir a aquisição de dados biométricos em ambientes de vídeo-vigilância usando uma câmara PTZ assistida por uma câmara típica de vídeo-vigilância. O primeiro é um método de calibração capaz de mapear de forma exata as coordenadas da câmara master para o ângulo da câmara PTZ (slave) sem o auxílio de outros dispositivos óticos. O segundo método determina a ordem pela qual um conjunto de sujeitos vai ser observado pela câmara PTZ. O método proposto consegue determinar em tempo-real a sequência de observações que maximiza o número de diferentes sujeitos observados e simultaneamente minimiza o tempo total de transição entre sujeitos. De modo a atingir o primeiro objetivo desta tese, os dois métodos propostos foram combinados com os avanços alcançados na área da monitorização de humanos para assim desenvolver o primeiro sistema de vídeo-vigilância completamente automatizado e capaz de adquirir dados biométricos a longas distâncias sem requerer a cooperação dos indivíduos no processo, designado por sistema QUIS-CAMPI. O sistema QUIS-CAMPI representa o ponto de partida para iniciar a investigação relacionada com o segundo objetivo desta tese. A análise do desempenho dos métodos de reconhecimento biométrico do estado-da-arte mostra que estes conseguem obter taxas de reconhecimento quase perfeitas em dados adquiridos sem restrições (e.g., taxas de reconhecimento maiores do que 99% no conjunto de dados LFW). Contudo, este desempenho não é corroborado pelos resultados observados em ambientes de vídeo-vigilância, o que sugere que os conjuntos de dados atuais não contêm verdadeiramente os fatores de degradação típicos dos ambientes de vídeo-vigilância. Tendo em conta as vulnerabilidades dos conjuntos de dados biométricos atuais, esta tese introduz um novo conjunto de dados biométricos (imagens da face e vídeos do tipo de passo) adquiridos pelo sistema QUIS-CAMPI a uma distância máxima de 40m e sem a cooperação dos sujeitos no processo de aquisição. Este conjunto permite avaliar de forma objetiva o desempenho dos métodos do estado-da-arte no reconhecimento de indivíduos em imagens/vídeos capturados num ambiente real de vídeo-vigilância. Como tal, este conjunto foi utilizado para promover a primeira competição de reconhecimento biométrico em ambientes não controlados. Esta tese descreve os protocolos de avaliação usados, assim como os resultados obtidos por 9 métodos especialmente desenhados para esta competição. Para além disso, os dados adquiridos pelo sistema QUIS-CAMPI foram essenciais para o desenvolvimento de dois métodos para aumentar a robustez aos fatores de degradação observados em ambientes de vídeo-vigilância. O primeiro é um método para detetar características corruptas em assinaturas biométricas através da análise da redundância entre subconjuntos de características. O segundo é um método de reconhecimento facial baseado em caricaturas automaticamente geradas a partir de uma única foto do sujeito. As experiências realizadas mostram que ambos os métodos conseguem reduzir as taxas de erro em dados adquiridos de forma não controlada
    corecore