145 research outputs found

    Background foreground segmentation with RGB-D Kinect data: An efficient combination of classifiers

    Low-cost RGB-D cameras such as Microsoft’s Kinect or Asus’s Xtion Pro are changing the computer vision world, as they are being successfully used in several applications and research areas. Depth data are particularly attractive and suitable for applications based on moving object detection through foreground/background segmentation; the RGB-D applications proposed in the literature generally employ state-of-the-art foreground/background segmentation techniques based on depth information without taking color information into account. The novel approach that we propose is based on a combination of classifiers that improves background subtraction accuracy with respect to state-of-the-art algorithms by jointly considering color and depth data. In particular, the combination of classifiers is based on a weighted average that adaptively modifies the support of each classifier in the ensemble by considering foreground detections in the previous frames and the depth and color edges. In this way, it is possible to reduce false detections due to critical issues that cannot be tackled by the individual classifiers, such as shadows and illumination changes, color and depth camouflage, moved background objects, and noisy depth measurements. Moreover, we propose, to the best of the authors’ knowledge, the first publicly available RGB-D benchmark dataset with hand-labeled ground truth of several challenging scenarios to test background/foreground segmentation algorithms.
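The weighted-average combination described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fuse_foreground`, the per-pixel weight map `w_depth`, and the 0.5 decision threshold are hypothetical, and the paper's adaptive weighting rule (driven by previous detections and by color and depth edges) is not reproduced here.

```python
import numpy as np

def fuse_foreground(p_color, p_depth, w_depth, threshold=0.5):
    """Combine two per-pixel foreground probabilities with a weighted average.

    p_color, p_depth: arrays in [0, 1] from a color and a depth classifier.
    w_depth: per-pixel weight in [0, 1] given to the depth classifier
             (hypothetical; e.g. lowered where depth is noisy or missing).
    Returns the fused probability map and the thresholded foreground mask.
    """
    p_color = np.asarray(p_color, dtype=float)
    p_depth = np.asarray(p_depth, dtype=float)
    w_depth = np.clip(np.asarray(w_depth, dtype=float), 0.0, 1.0)
    # Convex combination: the color classifier receives the remaining weight.
    fused = w_depth * p_depth + (1.0 - w_depth) * p_color
    return fused, fused > threshold

# Toy 2x2 frame: the depth classifier is trusted more at pixel (0, 1)
# and ignored at pixel (1, 0), where its reading is assumed invalid.
fused, mask = fuse_foreground(
    p_color=[[0.9, 0.2], [0.4, 0.1]],
    p_depth=[[0.8, 0.9], [0.0, 0.2]],
    w_depth=[[0.5, 0.8], [0.0, 0.5]],
)
print(mask)
```

Keeping the weights per pixel rather than global is what lets one classifier dominate locally, for example depth under a cast shadow and color where depth measurements are missing.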

    Multi-sensor background subtraction by fusing multiple region-based probabilistic classifiers

    In recent years, the computer vision community has shown great interest in depth-based applications thanks to the performance and flexibility of the new generation of RGB-D imagery. In this paper, we present an efficient background subtraction algorithm based on the fusion of multiple region-based classifiers that processes depth and color data provided by RGB-D cameras. Foreground objects are detected by combining a region-based foreground prediction (based on depth data) with different background models (based on a Mixture of Gaussians algorithm) providing color and depth descriptions of the scene at pixel and region level. The information given by these modules is fused in a mixture-of-experts fashion to improve the foreground detection accuracy. The main contributions of the paper are the region-based models of both background and foreground, built from the depth and color data. The results obtained using different database sequences demonstrate that the proposed approach leads to a higher detection accuracy than existing state-of-the-art techniques.

    Selective eigenbackgrounds method for background subtraction in crowed scenes


    Moving Object Detection using Lab2000HL Color Space with Spatial and Temporal Smoothing


    VISUAL TRACKING AND ILLUMINATION RECOVERY VIA SPARSE REPRESENTATION

    Compressive sensing, or sparse representation, has played a fundamental role in many fields of science. It shows that the signals and images can be reconstructed from far fewer measurements than what is usually considered to be necessary. Sparsity leads to efficient estimation, efficient compression, dimensionality reduction, and efficient modeling. Recently, there has been a growing interest in compressive sensing in computer vision and it has been successfully applied to face recognition, background subtraction, object tracking and other problems. Sparsity can be achieved by solving the compressive sensing problem using L1 minimization. In this dissertation, we present the results of a study of applying sparse representation to illumination recovery, object tracking, and simultaneous tracking and recognition. Illumination recovery, also known as inverse lighting, is the problem of recovering an illumination distribution in a scene from the appearance of objects located in the scene. It is used for Augmented Reality, where the virtual objects match the existing image and cast convincing shadows on the real scene rendered with the recovered illumination. Shadows in a scene are caused by the occlusion of incoming light, and thus contain information about the lighting of the scene. Although shadows have been used in determining the 3D shape of the object that casts shadows onto the scene, few studies have focused on the illumination information provided by the shadows. In this dissertation, we recover the illumination of a scene from a single image with cast shadows given the geometry of the scene. The images with cast shadows can be quite complex and therefore cannot be well approximated by low-dimensional linear subspaces. However, in this study we show that the set of images produced by a Lambertian scene with cast shadows can be efficiently represented by a sparse set of images generated by directional light sources. 
We first model an image with cast shadows as composed of a diffusive part (without cast shadows) and a residual part that captures cast shadows. Then, we express the problem in an L1-regularized least squares formulation, with nonnegativity constraints (as light has to be nonnegative at any point in space). This sparse representation enjoys an effective and fast solution, thanks to recent advances in compressive sensing. In experiments on both synthetic and real data, our approach performs favorably in comparison to several previously proposed methods. Visual tracking, which consistently infers the motion of a desired target in a video sequence, has been an active and fruitful research topic in computer vision for decades. It has many practical applications such as surveillance, human-computer interaction, medical imaging and so on. Many of the challenges in designing a robust tracking algorithm come from the enormous, unpredictable variations in the target, such as deformations, fast motion, occlusions, background clutter, and lighting changes. To tackle the challenges posed by tracking, we propose a robust visual tracking method by casting tracking as a sparse approximation problem in a particle filter framework. In this framework, occlusion, noise and other challenging issues are addressed seamlessly through a set of trivial templates. Specifically, to find the tracking target in a new frame, each target candidate is sparsely represented in the space spanned by target templates and trivial templates. The sparsity is achieved by solving an L1-regularized least squares problem. Then the candidate with the smallest projection error is taken as the tracking target. After that, tracking is continued using a Bayesian state inference framework in which a particle filter is used for propagating sample distributions over time. 
Three additional components further improve the robustness of our approach: 1) a velocity incorporated motion model that helps concentrate the samples on the true target location in the next frame, 2) the nonnegativity constraints that help filter out clutter that is similar to tracked targets in reversed intensity patterns, and 3) a dynamic template update scheme that keeps track of the most representative templates throughout the tracking procedure. We test the proposed approach on many challenging sequences involving heavy occlusions, drastic illumination changes, large scale changes, non-rigid object movement, out-of-plane rotation, and large pose variations. The proposed approach shows excellent performance in comparison with four previously proposed trackers. We also extend the work to simultaneous tracking and recognition in vehicle classification in IR video sequences. We attempt to resolve the uncertainties in tracking and recognition at the same time by introducing a static template set that stores target images in various conditions such as different poses, lighting, and so on. The recognition results at each frame are propagated to produce the final result for the whole video. The tracking result is evaluated at each frame and low confidence in tracking performance initiates a new cycle of tracking and classification. We demonstrate the robustness of the proposed method on vehicle tracking and classification using outdoor IR video sequences
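The L1-regularized least squares problem with nonnegativity constraints mentioned in this abstract can be sketched as below. This is only an illustration under stated assumptions: the solver is a plain projected proximal-gradient (ISTA) iteration swapped in for simplicity, since the abstract does not say which solver the dissertation uses, and the names `nonneg_l1_least_squares`, `reconstruction_error`, `lam` and `n_iter` are hypothetical. The dictionary is assumed to stack target templates with positive and negative trivial (identity) templates.

```python
import numpy as np

def nonneg_l1_least_squares(A, y, lam=0.01, n_iter=2000):
    """Minimise 0.5*||A c - y||^2 + lam*sum(c) subject to c >= 0.

    Uses projected ISTA: a gradient step on the quadratic term followed by
    a shift of lam*step and clipping at zero (the proximal step for an L1
    penalty restricted to the nonnegative orthant).
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ c - y)
        c = np.maximum(c - step * (grad + lam), 0.0)
    return c

def reconstruction_error(templates, candidate, lam=0.01):
    """Code a candidate over [T, I, -I]; return its residual on the target templates."""
    T = np.asarray(templates, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    d = T.shape[0]
    B = np.hstack([T, np.eye(d), -np.eye(d)])  # target + trivial templates
    c = nonneg_l1_least_squares(B, candidate, lam)
    a = c[: T.shape[1]]                        # coefficients of the target templates
    return np.linalg.norm(candidate - T @ a), c

# Two toy target templates in R^4 and two candidates: one equal to a
# template, one resembling a single-pixel outlier.
T = np.array([[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]])
err_match, _ = reconstruction_error(T, T[:, 0])
err_clutter, _ = reconstruction_error(T, np.array([1.0, 0.0, 0.0, 0.0]))
print(err_match < err_clutter)
```

In a tracker of this kind, the candidate with the smallest residual on the target templates would be selected; the trivial templates absorb pixels the target templates cannot explain, which is how occlusion and noise are handled in the formulation described above.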

    Advanced Biometrics with Deep Learning

    Biometrics, such as fingerprint, iris, face, hand print, hand vein, speech and gait recognition, have become commonplace as a means of identity management in various applications. Biometric systems follow a typical pipeline composed of separate preprocessing, feature extraction and classification stages. Deep learning, as a data-driven representation learning approach, has been shown to be a promising alternative to conventional data-agnostic and handcrafted preprocessing and feature extraction for biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm to unify preprocessing, feature extraction, and recognition, based solely on biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into 4 categories according to biometric modality, namely face biometrics, medical electronic signals (EEG and ECG), voice print, and others.

    Consensus ou fusion de segmentation pour quelques applications de détection ou de classification en imagerie

    Recently, true metrics in the sense of a given criterion (with good asymptotic properties) were introduced between data partitions (or clusterings), including spatially indexed data such as image segmentations. From these metrics, the notion of average clustering (or consensus segmentation) was then proposed in image processing as the solution of an optimization problem, and as a simple and effective way to improve the final segmentation or classification result by averaging (or fusing) different segmentations of the same scene, each roughly estimated by one of several simple segmentation models (or by the same model with different internal parameters). This principle, which can be conceived as a denoising of high-abstraction data, has recently proved to be an effective and highly parallelizable alternative to methods using ever more complex and time-consuming segmentation models. The principle of distance between segmentations, and of averaging or fusing segmentations, can be exploited directly, or easily adapted, by any algorithm or method used in digital imaging where the data can in fact be replaced by segmented images. 
This thesis aims to demonstrate this assertion and to present original applications in various fields: visualization and indexing in large image databases in the sense of the segmented content of each image, rather than in the usual sense of color and texture; image processing to improve significantly and easily the performance of motion detection methods in an image sequence; and analysis and classification in medical imaging, with an application allowing the automatic detection and quantification of Alzheimer’s disease from magnetic resonance images of the brain.
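The idea of a consensus segmentation as the solution of an optimization problem can be illustrated in its simplest special case. The sketch below is not the thesis's method: it assumes binary masks (for instance motion detection maps) and uses the Hamming distance, for which the mask minimising the summed distance to all inputs is the pixel-wise majority vote. The function names `majority_vote_consensus` and `total_hamming` are hypothetical; the metrics studied in the thesis are more general.

```python
import numpy as np

def majority_vote_consensus(masks):
    """Return the binary mask minimising the summed Hamming distance to all inputs.

    masks: sequence of equally shaped binary masks.
    Ties (possible with an even number of masks) are resolved as background.
    """
    stack = np.asarray(masks, dtype=bool)
    votes = stack.sum(axis=0)            # number of masks marking each pixel
    return votes * 2 > stack.shape[0]    # strict majority

def total_hamming(candidate, masks):
    """Sum over all input masks of the number of pixels that disagree."""
    stack = np.asarray(masks, dtype=bool)
    return int((stack != np.asarray(candidate, dtype=bool)).sum())

# Three rough detections of the same 2x2 scene.
masks = [
    [[1, 1], [0, 0]],
    [[1, 0], [0, 1]],
    [[1, 1], [1, 0]],
]
consensus = majority_vote_consensus(masks)
print(consensus.astype(int))
print(total_hamming(consensus, masks))
```

Each pixel is decided independently, which is one way to see why this kind of fusion parallelizes well.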

    Optimization Algorithms for Integrating Advanced Facility-Level Healthcare Technologies into Personal Healthcare Devices

    Healthcare is one of the most important services for preserving the quality of our daily lives, and it must now address issues such as global aging, rising healthcare costs, and a change in the medical paradigm from in-facility cure to prevention and cure outside the facility. Accordingly, there has been growing interest in smart and personalized healthcare systems that let people diagnose and care for themselves. Such systems are capable of providing facility-level diagnosis services by using smart devices (e.g., smartphones, smart watches, and smart glasses). However, in realizing smart healthcare systems, it is very difficult, if not impossible, to directly integrate high-precision healthcare technologies or scientific theories into smart devices due to stringent limitations in computing power and battery lifetime, as well as environmental constraints. In this dissertation, we propose three optimization methods in the fields of cell counting systems and gait-aid systems for Parkinson’s disease patients that address the problems that arise when integrating a specialized healthcare system used in facilities into mobile or wearable devices. First, we present an optimized cell counting algorithm based on heuristic optimization, which is a key building block for realizing mobile point-of-care platforms. Second, we develop a learning-based cell counting algorithm that guarantees high performance and efficiency despite the presence of cells blurred by defocus and the varying background brightness caused by the limitations of the lens-free in-line holographic apparatus. Finally, we propose smart gait-aid glasses for Parkinson’s disease patients based on mathematical optimization. ⓒ 2017 DGIST
Contents: I. Introduction: 1.1 Global Healthcare Trends; 1.2 Smart Healthcare System; 1.3 Benefits of Smart Healthcare System; 1.4 Challenges of Smart Healthcare; 1.5 Optimization; 1.6 Aims of the Dissertation; 1.7 Dissertation Organization. II. Optimization of a cell counting algorithm for mobile point-of-care testing platforms: 2.1 Introduction; 2.2 Materials and Methods (2.2.1 Experimental Setup; 2.2.2 Overview of Cell Counting; 2.2.3 Cell Library Optimization; 2.2.4 NCC Approximation); 2.3 Results (2.3.1 Cell Library Optimization; 2.3.2 NCC Approximation; 2.3.3 Measurement Using an Android Device); 2.4 Summary. III. Human-level Blood Cell Counting System using NCC-Deep learning algorithm on Lens-free Shadow Image: 3.1 Introduction; 3.2 Cell Counting Architecture; 3.3 Methods (3.3.1 Candidate Point Selection based on NCC; 3.3.2 Reliable Cell Counting using CNN); 3.4 Results (3.4.1 Subjects; 3.4.2 Evaluation for the cropped cell image; 3.4.3 Evaluation on the blood sample image; 3.4.4 Elapsed-time evaluation); 3.5 Summary. IV. Smart Gait-Aid Glasses for Parkinson’s Disease Patients: 4.1 Introduction; 4.2 Related Works (4.2.1 Existing FOG Detection Methods; 4.2.2 Existing Gait-Aid Systems); 4.3 Methods (4.3.1 Movement Recognition; 4.3.2 FOG Detection On Glasses; 4.3.3 Generation of Visual Patterns); 4.4 Experiments; 4.5 Results (4.5.1 FOG Detection Performance; 4.5.2 Gait-Aid Performance); 4.6 Summary. V. Conclusion. Reference. Summary in Korean.
Summary (translated from Korean): This dissertation addresses the optimization problems involved in applying specialized healthcare systems used in medical research facilities, hospitals, and laboratories to smart healthcare systems that individuals can use in their daily lives. With rising medical costs and global aging in modern society, the medical paradigm has shifted from treatment inside a facility after a disease occurs to one in which patients or members of the public interested in disease or health management access medical services through portable personal devices and use them to prevent disease in advance. Accordingly, smart healthcare, which delivers hospital-level prevention and diagnosis anytime and anywhere using smart devices (smartphones, smart watches, smart glasses, etc.), is attracting attention. However, integrating existing specialized healthcare equipment and scientific theories into smart devices to realize smart healthcare services is hindered by the limited computing power and battery of smart devices, and by environmental constraints that did not arise in research institutes or laboratories. Optimization is therefore needed so that such systems can operate in their intended usage environment. This dissertation presents three problems that arise when integrating specialized healthcare systems into smart healthcare in the fields of cell counting and gait assistance for Parkinson’s disease patients, and proposes three optimization algorithms (heuristic optimization, learning-based optimization, and mathematical optimization) and systems based on them to solve these problems.

    Minimising Human Annotation for Scalable Person Re-Identification

    Among the diverse tasks performed by an intelligent distributed multi-camera surveillance system, person re-identification (re-id) is one of the most essential. Re-id refers to associating an individual or a group of people across non-overlapping cameras at different times and locations, and forms the foundation of a variety of applications ranging from security and forensic search to quotidian retail and health care. Although re-id has attracted rapidly increasing academic interest over the past decade, launching a practical re-id system in real-world environments remains a non-trivial and unsolved problem, due to the ambiguous and noisy nature of surveillance data and the potentially dramatic changes in visual appearance caused by uncontrolled variations in human pose and divergent viewing conditions across distributed camera views. To mitigate such visual ambiguity and appearance variations, most existing re-id approaches rely on constructing fully supervised machine learning models with extensively labelled training datasets, which is unscalable for practical real-world applications. In particular, human annotators must exhaustively search over a vast quantity of offline collected data and manually label cross-view matched images of a large population between every possible camera pair. Moreover, even after such prohibitively expensive human effort has been spent, a trained re-id model is often not easily generalisable or transferable, due to the elastic and dynamic operating conditions of a surveillance system. With such motivations, this thesis proposes several scalable re-id approaches with significantly reduced human supervision, readily applicable to practical applications. More specifically, this thesis develops and investigates four new approaches for reducing human labelling effort in real-world re-id, as follows. Chapter 3: The first approach is affinity mining from unlabelled data. 
Unlike most existing supervised approaches, this work aims to model discriminative information for re-id without exploiting human annotations, relying instead on the vast amount of unlabelled person image data, and is thus applicable to both semi-supervised and unsupervised re-id. This is non-trivial, since human-annotated identity matching correspondence is often the key to discriminative re-id modelling. In this chapter, an alternative strategy is explored by specifically mining two types of affinity relationships among unlabelled data: (1) inter-view data affinity and (2) intra-view data affinity. In particular, with such affinity information encoded as constraints, a Regularised Kernel Subspace Learning model is developed to explicitly reduce inter-view appearance variations while enhancing intra-view appearance disparity for more discriminative re-id matching. Consequently, annotation costs can be greatly reduced, and a scalable re-id model can readily exploit plentiful unlabelled data, which is inexpensive to collect. Chapter 4: The second approach is saliency discovery from unlabelled data. This chapter continues to investigate what can be learned from unlabelled images without human-annotated identity labels. In place of the affinity mining proposed in Chapter 3, a different solution is proposed: discovering localised visual saliency in person appearances. Intuitively, salient and atypical appearances of a person can uniquely and representatively describe and identify an individual, whilst also often being robust to view changes and detection variances. Motivated by this, an unsupervised Generative Topic Saliency model is proposed to jointly perform foreground extraction, saliency detection, and discriminative re-id matching. This approach completely avoids the exhaustive annotation effort for model training, and thus scales better to real-world applications. 
Moreover, its automatically discovered re-id saliency representations are shown to be semantically interpretable, and suitable for generating useful visual analysis for deployable user-oriented software tools. Chapter 5: The third approach is incremental learning from actively labelled data. Since learning from unlabelled data alone yields less discriminative matching results, and in some cases limited human labelling resources will be available for re-id modelling, this chapter investigates how to maximise a model’s discriminative capability with minimal labelling effort. The challenges are to (1) automatically select the most representative data from a vast amount of noisy or ambiguous unlabelled data in order to maximise model discrimination capacity; and (2) incrementally update the model parameters to accelerate machine responses and reduce human waiting time. To that end, this thesis proposes a regression-based re-id model, characterised by very fast and efficient incremental model updates. Furthermore, an effective active data sampling algorithm with three novel joint exploration-exploitation criteria is designed to make automatic data selection feasible with notably reduced human labelling costs. Such an approach ensures that annotation effort is spent only on the few data samples that are most critical to the model’s generalisation capability, instead of being exhausted by blindly labelling many noisy and redundant training samples. Chapter 6: The last technical area of this thesis is human-in-the-loop learning from relevance feedback. Whilst the former chapters mainly investigate techniques to reduce human supervision for model training, this chapter motivates a novel research area to further minimise the human effort spent in the re-id deployment stage. 
In real-world applications where the camera network and potential gallery size increase dramatically, even state-of-the-art re-id models deliver much poorer re-id performance, and human involvement at the deployment stage is inevitable. To minimise such human effort and maximise re-id performance, this thesis explores an alternative approach to re-id by formulating a hybrid human-computer learning paradigm with humans in the model matching loop. Specifically, a Human Verification Incremental Learning model is formulated which does not require any pre-labelled training data and is therefore scalable to new camera pairs. Moreover, the proposed model learns cumulatively from human feedback to provide an instant, on-the-fly improvement to the re-id ranking of each probe, and is thus scalable to large gallery sizes. It has been demonstrated that the proposed re-id model achieves significantly superior re-id results whilst consuming much less human supervision effort. To facilitate a holistic understanding of this thesis, the main studies are summarised and framed into a graphical abstract as shown in Figure …