30 research outputs found

    Video-based face recognition using multiple face orientations

    This work is focused on designing and implementing a real-time video-based face identification system with low memory and computational requirements and high recognition rates. Since profile features are stronger, and therefore more discriminative when characterising faces, than frontal features, the system will detect and identify not only purely frontal but also profile faces. This property of profile faces will help to improve face recognition rates, depending on the strategy used for fusing results. Dimensionality reduction techniques will also be studied and tested in order to find the fastest and most effective one. The k-Nearest-Neighbour classifier will be modified to add a penalisation factor as a function of distance, improving classification accuracy and strictness. Several simulations will be performed in order to find the best options for reducing the computational requirements of a face identification system. Among many others, these simulations will look for optimal values of the k parameter in k-Nearest Neighbour, the number of transformed coefficients kept in a feature vector, and the minimum size of face images, and will test dimensionality reduction of images, variation of the number of models, and fusion of results. Finally, this work will show how a real-time system can be implemented on an ordinary computer, obtaining successful results in real-time, adverse, and controlled-condition environments.
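The distance-based penalisation of the k-Nearest-Neighbour vote described above can be sketched as follows. The abstract does not give the exact penalisation function, so the weight 1/(1 + alpha*d), the function name, and the parameters are illustrative assumptions:

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3, alpha=1.0):
    """Classify x by its k nearest neighbours, weighting each vote by an
    assumed distance-penalisation factor 1 / (1 + alpha * d), so that
    farther neighbours contribute less to the decision."""
    d = np.linalg.norm(np.asarray(train_X, dtype=float) - x, axis=1)
    idx = np.argsort(d)[:k]                 # indices of the k closest samples
    votes = {}
    for i in idx:
        w = 1.0 / (1.0 + alpha * d[i])      # penalise by distance
        votes[train_y[i]] = votes.get(train_y[i], 0.0) + w
    return max(votes, key=votes.get)        # label with the largest weighted vote
```

With such a weighting, a single far-away neighbour of the wrong class is much less likely to outvote two close neighbours of the correct class.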

    SYMMETRY IN HUMAN MOTION ANALYSIS: THEORY AND EXPERIMENTS

    Video-based human motion analysis has been actively studied over the past decades. We propose novel approaches that are able to analyze human motion under challenging conditions and apply them to surveillance and security applications. Part I analyzes the cyclic property of human motion and presents algorithms to classify humans in videos by their gait patterns. Two approaches are proposed. The first employs the computationally efficient periodogram to characterize periodicity. In order to integrate shape and motion, we convert the cyclic pattern into a binary sequence using the angle between the two legs when the toe-to-toe distance is maximized during walking. Part II further extends the previous approaches to analyze the symmetry in articulation within a stride. A feature that has been shown in our work to be a particularly strong indicator of the presence of pedestrians is the X-junction generated by the bipedal swing of body limbs. The proposed algorithm extracts these patterns from spatio-temporal surfaces. In Part III, we present a compact characterization of human gait and activities. Our approach is based on decomposing an image sequence into x-t slices, which generate twisted patterns defined as the Double Helical Signature (DHS). It is shown that these patterns sufficiently characterize human gait and a class of activities. The features of the DHS are: (1) it naturally codes appearance and kinematic parameters of human motion; (2) it reveals an inherent geometric symmetry (Frieze group); and (3) it is effective and efficient for recovering gait and activity parameters. Finally, we use the DHS to classify activities such as carrying a backpack, a briefcase, etc. The advantage of using the DHS is that we only need a small portion of the 3D data to recognize various symmetries.
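The periodogram-based periodicity estimate from Part I can be illustrated on a one-dimensional gait signal. The function name and the simple peak-picking rule below are assumptions for illustration, not the thesis implementation:

```python
import numpy as np

def dominant_period(signal, fs=1.0):
    """Estimate the dominant period of a cyclic 1-D signal via the
    periodogram (squared magnitude of the real FFT)."""
    sig = np.asarray(signal, dtype=float)
    sig = sig - sig.mean()                      # remove the DC component
    power = np.abs(np.fft.rfft(sig)) ** 2       # periodogram
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fs)
    peak = freqs[1:][np.argmax(power[1:])]      # strongest non-zero frequency
    return 1.0 / peak                           # period in samples (or seconds)
```

Applied to, say, the inter-leg angle over time, the reciprocal of the periodogram's peak frequency gives the stride period.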

    Advanced machine learning approaches for target detection, tracking and recognition

    This dissertation addresses the key technical components of an Automatic Target Recognition (ATR) system, namely: target detection, tracking, learning and recognition. Novel solutions are proposed for each component of the ATR system based on several new advances in the field of computer vision and machine learning. Firstly, we introduce a simple and elegant feature, RelCom, and a boosted feature selection method to achieve a target detector with very low computational complexity. Secondly, we present a particle filter based target tracking algorithm that uses a quad histogram based appearance model along with online feature selection. Further, we improve the tracking performance by means of online appearance learning, where appearance learning is cast as an Adaptive Kalman Filtering (AKF) problem which we formulate using both covariance matching and, for the first time in a visual tracking application, the recent autocovariance least-squares (ALS) method. Then, we introduce an integrated tracking and recognition system that uses two generative models to accommodate the pose variations and maneuverability of different ground targets. Specifically, a tensor-based generative model is used for multi-view target representation that can synthesize unseen poses and can be trained from a small set of signatures. In addition, a target-dependent kinematic model is invoked to characterize the target dynamics. Both generative models are integrated in a graphical framework for joint estimation of the target's kinematics, pose, and discrete-valued identity. Finally, for target recognition we advocate the concept of a continuous identity manifold that captures both inter-class and intra-class shape variability among training targets. A hemispherical view manifold is used for modeling the view-dependent appearance. In addition to being able to deal with arbitrary view variations, this model can determine the target identity at both class and sub-class levels, for targets not present in the training data. The proposed components of the ATR system enable us to perform target detection with low computational complexity and low false alarm rates, robust tracking of targets under challenging circumstances, and recognition of target identities at both class and sub-class levels. Experiments on real and simulated data confirm the performance of the proposed components, with promising results.
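The appearance-learning step above casts template updating as Kalman filtering. A minimal scalar predict/update cycle, the building block that the covariance-matching and ALS methods tune by estimating the noise covariances Q and R online, might look like this (the identity dynamics and all names are illustrative assumptions, not the dissertation's formulation):

```python
def kalman_update(x, P, z, Q, R):
    """One predict/update cycle of a scalar Kalman filter with identity
    dynamics, as might be applied to each appearance-template coefficient.
    Q (process noise) and R (measurement noise) are the quantities an
    adaptive scheme such as AKF/ALS would estimate online."""
    # predict: the template is assumed constant, so only uncertainty grows
    x_pred, P_pred = x, P + Q
    # update: blend the prediction with the new measurement z
    K = P_pred / (P_pred + R)           # Kalman gain
    x_new = x_pred + K * (z - x_pred)
    P_new = (1.0 - K) * P_pred
    return x_new, P_new
```

With Q and R fixed this is an ordinary Kalman filter; the adaptive variants differ only in how those two covariances are re-estimated from the innovation sequence.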

    Biometric Systems

    Because of the accelerating progress in biometrics research and the latest nation-state threats to security, this book's publication is not only timely but also much needed. This volume contains seventeen peer-reviewed chapters reporting the state of the art in biometrics research: security issues, signature verification, fingerprint identification, wrist vascular biometrics, ear detection, face detection and identification (including a new survey of face recognition), person re-identification, electrocardiogram (ECG) recognition, and several multi-modal systems. This book will be a valuable resource for graduate students, engineers, and researchers interested in understanding and investigating this important field of study.

    Ubiquitous Technologies for Emotion Recognition

    Emotions play a very important role in how we think and behave. As such, the emotions we feel every day can compel us to act and influence the decisions and plans we make about our lives. Being able to measure, analyze, and better comprehend how or why our emotions may change is thus of much relevance to understanding human behavior and its consequences. Despite the great efforts made in the past in the study of human emotions, it is only now, with the advent of wearable, mobile, and ubiquitous technologies, that we can aim to sense and recognize emotions continuously and in real time. This book brings together the latest experiences, findings, and developments regarding ubiquitous sensing, modeling, and the recognition of human emotions.

    Real-time object detection using monocular vision for low-cost automotive sensing systems

    This work addresses the problem of real-time object detection in automotive environments using monocular vision. The focus is on real-time feature detection, tracking, depth estimation using monocular vision and, finally, object detection by fusing visual saliency and depth information. Firstly, a novel feature detection approach is proposed for extracting stable and dense features even in images with very low signal-to-noise ratio. This methodology is based on image gradients, which are redefined to take account of noise as part of their mathematical model. Each gradient is based on a vector connecting a negative to a positive intensity centroid, where both centroids are symmetric about the centre of the area for which the gradient is calculated. Multiple gradient vectors define a feature, with its strength being proportional to the underlying gradient vector magnitude. The evaluation of the Dense Gradient Features (DeGraF) shows superior performance over other contemporary detectors in terms of keypoint density, tracking accuracy, illumination invariance, rotation invariance, noise resistance and detection time. The DeGraF features form the basis for two new approaches that perform dense 3D reconstruction from a single vehicle-mounted camera. The first approach tracks DeGraF features in real-time while performing image stabilisation with minimal computational cost. This means that, despite camera vibration, the algorithm can accurately predict the real-world coordinates of each image pixel in real-time by comparing each motion vector to the ego-motion vector of the vehicle. The performance of this approach has been compared to different 3D reconstruction methods in order to determine their accuracy, depth-map density, noise resistance and computational complexity. The second approach proposes the use of local frequency analysis of gradient features for estimating relative depth. This novel method is based on the fact that DeGraF gradients can accurately measure local image variance with sub-pixel accuracy. It is shown that the local frequency by which the centroid oscillates around the gradient window centre is proportional to the depth of each gradient centroid in the real world. The lower computational complexity of this methodology comes at the expense of depth-map accuracy as the camera velocity increases, but it is at least five times faster than the other evaluated approaches. This work also proposes a novel technique for deriving visual saliency maps by using Division of Gaussians (DIVoG). In this context, saliency maps express how different each image pixel is from its surrounding pixels across multiple pyramid levels. This approach is shown to be both fast and accurate when evaluated against other state-of-the-art approaches. Subsequently, the saliency information is combined with depth information to identify salient regions close to the host vehicle. The fused map allows faster detection of high-risk areas where obstacles are likely to exist. As a result, existing object detection algorithms, such as the Histogram of Oriented Gradients (HOG), can execute at least five times faster. In conclusion, through a step-wise approach, computationally expensive algorithms have been optimised or replaced by novel methodologies to produce a fast object detection system that is aligned to the requirements of the automotive domain.
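The gradient construction described above connects a negative to a positive intensity centroid within each window. A hypothetical sketch of that idea follows; the centroid weighting scheme here is an assumption for illustration, not the paper's exact DeGraF formulation:

```python
import numpy as np

def centroid_gradient(patch):
    """Assumed sketch of a centroid-based gradient for one window: the
    vector from the centroid weighted by inverted intensities (the
    'negative' centroid) to the centroid weighted by the intensities
    themselves (the 'positive' centroid)."""
    p = np.asarray(patch, dtype=float)
    eps = 1e-12                                 # guard against flat patches
    ys, xs = np.mgrid[0:p.shape[0], 0:p.shape[1]]
    pos_w = p / max(p.sum(), eps)               # bright pixels pull the + centroid
    neg = p.max() - p
    neg_w = neg / max(neg.sum(), eps)           # dark pixels pull the - centroid
    pos_c = np.array([(pos_w * xs).sum(), (pos_w * ys).sum()])
    neg_c = np.array([(neg_w * xs).sum(), (neg_w * ys).sum()])
    return pos_c - neg_c                        # direction and magnitude of the gradient
```

For a patch that is dark on the left and bright on the right, the vector points right; its magnitude shrinks as the patch becomes uniform, which is what makes the construction tolerant of noise.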

    From pixels to people : recovering location, shape and pose of humans in images

    Humans are at the centre of a significant amount of research in computer vision. Endowing machines with the ability to perceive people from visual data is an immense scientific challenge with a high degree of direct practical relevance. Success in automatic perception can be measured at different levels of abstraction, and this will depend on which intelligent behaviour we are trying to replicate: the ability to localise persons in an image or in the environment, understanding how persons are moving at the skeleton and at the surface level, interpreting their interactions with the environment including with other people, and perhaps even anticipating future actions. In this thesis we tackle different sub-problems of the broad research area referred to as "looking at people", aiming to perceive humans in images at different levels of granularity. We start with bounding box-level pedestrian detection: We present a retrospective analysis of methods published in the decade preceding our work, identifying various strands of research that have advanced the state of the art. With quantitative experiments, we demonstrate the critical role of developing better feature representations and having the right training distribution. We then contribute two methods based on the insights derived from our analysis: one that combines the strongest aspects of past detectors and another that focuses purely on learning representations. The latter method outperforms more complicated approaches, especially those based on handcrafted features. We conclude our work on pedestrian detection with a forward-looking analysis that maps out potential avenues for future research. We then turn to pixel-level methods: Perceiving humans requires us to both separate them precisely from the background and identify their surroundings. To this end, we introduce Cityscapes, a large-scale dataset for street scene understanding. This has since established itself as a go-to benchmark for segmentation and detection. We additionally develop methods that relax the requirement for expensive pixel-level annotations, focusing on the task of boundary detection, i.e. identifying the outlines of relevant objects and surfaces. Next, we make the jump from pixels to 3D surfaces, from localising and labelling to fine-grained spatial understanding. We contribute a method for recovering 3D human shape and pose, which marries the advantages of learning-based and model-based approaches. We conclude the thesis with a detailed discussion of benchmarking practices in computer vision. Among other things, we argue that the design of future datasets should be driven by the general goal of combinatorial robustness besides task-specific considerations.

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines typically depend on images captured in real-world environments. This means that images might be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision communities. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
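The pipeline replaces each raw image with its delineation map before classification. The CORF push-pull operator itself is involved, so the sketch below uses a plain gradient-magnitude map purely as a stand-in to show where the transformation slots into the pipeline; it is not the paper's operator:

```python
import numpy as np

def delineation_map(img):
    """Stand-in for the CORF push-pull operator: turn a raw image into an
    edge (delineation) map that the classifier consumes instead of raw
    pixels. Gradient magnitude is used here only as a placeholder."""
    g = np.asarray(img, dtype=float)
    gy, gx = np.gradient(g)         # finite-difference gradients per axis
    return np.hypot(gx, gy)         # edge strength at each pixel

# in the augmented pipeline, the CNN would be trained and evaluated on
# delineation_map(x) rather than on x itself
```

The point of the design is that the map responds to structure (contours) rather than to absolute intensities, so additive noise in flat regions perturbs the classifier's input less than it would perturb raw pixels.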

    Deep learning of brain asymmetry digital biomarkers to support early diagnosis of cognitive decline and dementia

    Early identification of degenerative processes in the human brain is essential for proper care and treatment. This may involve different instrumental diagnostic methods, including the most popular: computed tomography (CT), magnetic resonance imaging (MRI) and positron emission tomography (PET) scans. These technologies provide detailed information about the shape, size, and function of the human brain. Structural and functional cerebral changes can be detected by computational algorithms and used to diagnose dementia and its stages (amnestic early mild cognitive impairment - EMCI, Alzheimer's Disease - AD). They can help monitor the progress of the disease. Shifts in the degree of asymmetry between the left and right hemispheres indicate the onset or development of a pathological process in the brain. In this vein, this study proposes a new digital biomarker for the diagnosis of early dementia based on the detection of image asymmetries and cross-sectional comparison of NC (cognitively normal), EMCI and AD subjects. Features of brain asymmetries extracted from MRI of the ADNI and OASIS databases are used to analyze structural brain changes and for machine learning classification of the pathology. The experimental part of the study includes results of supervised machine learning algorithms and transfer learning architectures of convolutional neural networks for distinguishing between cognitively normal subjects and patients with early or progressive dementia. The proposed pipeline offers a low-cost imaging biomarker for the classification of dementia. It can potentially be helpful for other brain-degenerative disorders accompanied by changes in brain asymmetries.
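The biomarker rests on comparing the left and right hemispheres. An illustrative per-slice asymmetry map (an assumed form for illustration, not the study's exact feature definition) can be computed as the absolute difference between an axial slice and its left-right mirror, assuming the midline has been aligned with the image centre:

```python
import numpy as np

def asymmetry_map(slice_2d):
    """Assumed sketch of a hemispheric asymmetry feature: absolute
    difference between a midline-aligned axial slice and its
    left-right mirror image. Zero everywhere for a perfectly
    symmetric brain; large values flag asymmetric structures."""
    s = np.asarray(slice_2d, dtype=float)
    return np.abs(s - s[:, ::-1])   # compare each pixel with its mirror twin
```

Summary statistics of such maps (means, region-wise sums, and so on) are the kind of scalar features a downstream classifier could consume.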