40 research outputs found

    The effectiveness of face detection algorithms in unconstrained crowd scenes

    The 2013 Boston Marathon bombing represents a case where automatic facial biometrics tools could have proven invaluable to law enforcement officials, yet the lack of robustness of current tools in unstructured environments limited their utility. In this work, we focus on complications that confound face detection algorithms. We first present a simple multi-pose generalization of the Viola-Jones algorithm. Our results on the Face Detection Data Set and Benchmark (FDDB) show that it makes a significant improvement over the state of the art for published algorithms. Conversely, our experiments demonstrate that the improvements attained by accommodating multiple poses can be negligible compared to the gains yielded by normalizing scores and using the most appropriate classifier for uncontrolled data. We conclude with a qualitative evaluation of the proposed algorithm on publicly available images of the Boston Marathon crowds. Although the results of our evaluations are encouraging, they confirm that there is still room for improvement in terms of robustness to out-of-plane rotation, blur and occlusion.
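
The score-normalization point above can be illustrated with a small sketch: raw confidences from different pose-specific detectors live on different scales, so they are rescaled to a common range before being fused. Function names and the min-max scheme are illustrative assumptions, not the paper's actual method.

```python
def minmax_normalize(scores):
    """Map raw detector scores to [0, 1] so outputs of different
    pose-specific classifiers become directly comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_detections(per_pose_scores):
    """For each candidate window, keep the best normalized score
    over all pose-specific detectors."""
    normalized = [minmax_normalize(s) for s in per_pose_scores]
    n = len(normalized[0])
    return [max(norm[i] for norm in normalized) for i in range(n)]
```

Without normalization, a detector whose raw scores happen to be numerically larger would dominate the fusion regardless of its actual reliability.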

    Discriminative Appearance Models for Face Alignment

    The proposed face alignment algorithm uses local gradient features as the appearance representation. These features are obtained by pixel value comparisons, which provide robustness against changes in illumination, as well as against partial occlusion and local deformation, owing to their locality. The adopted features are modeled with three discriminative methods, which correspond to different alignment cost functions. The discriminative appearance modeling alleviates the generalization problem to some extent.
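
As an illustrative sketch (assumed, not the paper's implementation), a pixel-comparison feature and a descriptor built from a fixed set of pixel pairs could look like this; adding a constant offset to every pixel leaves the descriptor unchanged, which is where the illumination robustness comes from:

```python
def pixel_comparison_feature(patch, p, q):
    # Binary feature: is pixel p brighter than pixel q?
    return 1 if patch[p[0]][p[1]] > patch[q[0]][q[1]] else 0

def local_gradient_descriptor(patch, pairs):
    # Concatenate comparison outcomes for a fixed set of pixel pairs.
    return [pixel_comparison_feature(patch, p, q) for p, q in pairs]
```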

    Multivariate Boosting with Look-Up Tables for Face Processing

    This thesis proposes a novel unified boosting framework. We apply this framework to several face processing tasks, namely face detection, facial feature localisation, and pose classification, using the same boosting algorithm and the same pool of features (local binary features). This is in contrast with the standard approaches, which make use of a variety of features and models, for example AdaBoost, cascades of boosted classifiers and Active Appearance Models. The unified boosting framework covers multivariate classification and regression problems; this is achieved by interpreting boosting as optimization in the functional space of the weak learners, so that a wide range of smooth loss functions can be optimized with the same algorithm. We propose two general optimization strategies that extend recent work on TaylorBoost and Variational AdaBoost. The first is an empirical expectation formulation that minimizes the average loss; the second is a variational formulation that includes an additional penalty for large variations between predictions. These two boosting formulations are used to train real-time models using local binary features. This is achieved using look-up tables as weak learners and multi-block Local Binary Patterns as features. The resulting boosting algorithms are simple, efficient and easily scalable with the available resources. Furthermore, we introduce a novel coarse-to-fine feature selection method to handle high-resolution models and a bootstrapping algorithm to sample representative training data from very large pools of data. The proposed approach is evaluated on several face processing tasks: frontal face detection (binary classification), facial feature localisation (multivariate regression) and pose estimation (multivariate classification).
Several studies are performed to assess different optimization algorithms, bootstrapping parametrizations and feature sharing methods (for the multivariate case). The results show good performance on all of these tasks. In addition, two other contributions are presented. First, we propose a context-based model for removing the false alarms generated by a given generic face detector. Second, we propose a new face detector that predicts the Jaccard distance between the current location and the ground truth, which allows us to formulate face detection as a regression task.
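
The Jaccard distance used as the regression target can be sketched with the standard definition (one minus intersection-over-union), with boxes given as corner coordinates; the helper name is illustrative:

```python
def jaccard_distance(a, b):
    """1 - IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return 1.0 - inter / union if union > 0 else 1.0
```

A perfect detection has distance 0, a completely missed one has distance 1, which makes the quantity a natural regression target.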

    Face Detection and Verification using Local Binary Patterns

    This thesis proposes a robust Automatic Face Verification (AFV) system using Local Binary Patterns (LBP). AFV is mainly composed of two modules: Face Detection (FD) and Face Verification (FV). The purpose of FD is to determine whether there are any faces in an image, while FV involves confirming or denying the identity claimed by a person. The contributions of this thesis are the following: 1) a real-time multiview FD system which is robust to illumination changes and partial occlusion, 2) an FV system based on the adaptation of LBP features, 3) an extensive study of the performance evaluation of FD algorithms and, in particular, of the effect of FD errors on FV performance. The first part of the thesis addresses the problem of frontal FD. We first describe the system of Viola and Jones, the first real-time frontal face detector. One of its limitations is its sensitivity to local lighting variations and partial occlusion of the face. To cope with these limitations, we propose to use LBP features. Special emphasis is given to the scanning process and to the merging of overlapping detections, because both have a significant impact on performance. We then extend our frontal FD module to multiview FD. In the second part, we present a novel generative approach for FV, based on an LBP description of the face. Its main advantages over previous approaches are a very fast and simple training procedure and robustness to bad lighting conditions. In the third part, we address the problem of estimating the quality of FD. We first show the influence of FD errors on the FV task and then empirically demonstrate the limitations of current detection measures when applied to this task. In order to properly evaluate the performance of a face detection module, we propose to embed the FV step into the performance measuring process. We show empirically that the proposed methodology better matches the final FV performance.
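
A minimal sketch of the basic LBP operator underlying this line of work (the canonical 8-neighbour formulation, not the thesis code): each pixel is encoded by comparing it with its neighbours, and a face region is then described by the histogram of codes. Because only relative brightness matters, the code is invariant to monotonic illumination changes.

```python
def lbp_code(img, r, c):
    """8-neighbour LBP code of pixel (r, c); neighbours >= centre set a bit."""
    center = img[r][c]
    neighbours = [img[r-1][c-1], img[r-1][c], img[r-1][c+1],
                  img[r][c+1], img[r+1][c+1], img[r+1][c],
                  img[r+1][c-1], img[r][c-1]]
    code = 0
    for i, n in enumerate(neighbours):
        if n >= center:
            code |= 1 << i
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```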

    Video-based Pedestrian Intention Recognition and Path Prediction for Advanced Driver Assistance Systems

    Advanced driver assistance systems (ADAS) play a very important role in future vehicles, increasing safety for the driver, the passengers, and vulnerable road users such as pedestrians and cyclists. Within certain limits, such systems attempt to avoid collisions in dangerous situations involving an inattentive driver and a pedestrian by triggering automatic emergency braking. Because of the high variability of pedestrian motion patterns, existing systems are designed conservatively: to drastically reduce possible false activations, they are restricted to manageable situations, for example scenarios in which a pedestrian suddenly stops and thereby defuses the situation. Reliable pedestrian intention recognition and path prediction are therefore of great value in overcoming this problem. This thesis describes the complete processing chain of a stereo-video-based system for pedestrian intention estimation and path prediction, which is subsequently used in the activation decision for automatic emergency braking. In the first of three main components, a real-time method is proposed that localises pedestrians' heads and estimates their pose in low-resolution images of complex and highly dynamic inner-city scenarios. Single-frame estimates are derived from the probability outputs of eight trained head-pose-specific detectors applied to the image region of a pedestrian candidate. Additional robustness in head localisation is achieved by incorporating stereo depth information. Furthermore, the head positions and poses are smoothed over time by means of a particle filter.
For pedestrian intention estimation, the use of a robust and powerful machine learning approach is investigated in different scenarios. Given a time series of observations, this approach is able to model the inner substructure of a particular intention class and, in addition, to capture the extrinsic dynamics between different intention classes. The method integrates meaningful features extracted from the pedestrian dynamics as well as context information in the form of the human head pose. Finally, a path prediction method is presented that steers the prediction steps of a multiple-motion-model filter over a time horizon of roughly one second by incorporating the estimated pedestrian intentions. By helping the filter choose the appropriate motion model, the resulting path prediction error can be reduced significantly. A wide range of scenarios is covered, including pedestrians crossing laterally or stopping, and persons who initially walk along the sidewalk but then suddenly turn toward the road.
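
The intention-steered prediction idea can be sketched minimally as follows, assuming just two motion models (constant velocity vs. stopping) mixed by the estimated intention probability; the names and the two-model parametrization are illustrative, not the thesis's actual filter:

```python
def predict_position(pos, vel, p_walk, dt=1.0):
    """Blend a constant-velocity prediction with a stopping prediction,
    weighted by the estimated probability that the pedestrian keeps walking."""
    # Hypothetical two-model mixture; a real system would run a full
    # multiple-model filter over many time steps.
    walk = (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)
    stop = pos
    p_stop = 1.0 - p_walk
    return (p_walk * walk[0] + p_stop * stop[0],
            p_walk * walk[1] + p_stop * stop[1])
```

When the intention estimate is confident, the blend collapses onto the matching motion model, which is how steering the filter toward the right model reduces the prediction error.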

    A comparative analysis of binary patterns with discrete cosine transform for gender classification

    This paper presents a comparative analysis of binary patterns for gender classification, with a novel method of feature transformation for improved accuracy rates. The main requirements of our application are speed and accuracy. We investigate a combination of local binary patterns (LBP), the Census Transform (CT) and the Modified Census Transform (MCT) applied over the full, top and bottom halves of the face. Gender classification is performed using support vector machines (SVM). A main focus of the investigation is to determine whether a 1D discrete cosine transform (DCT) applied directly to the grey-level histograms improves accuracy. We used a public database of faces and ran face and eye detection algorithms, allowing automatic cropping and normalisation of the images. A set of 120 tests over the entire database demonstrates that the proposed 1D discrete cosine transform improves accuracy in all test cases, with small standard deviations. Using basic versions of the algorithms, LBP is marginally superior to both CT and MCT, which agrees with results in the literature reporting higher accuracy on male subjects. However, a significant result of our investigation is that applying a 1D DCT removes this bias, achieving an equivalent error rate for both genders. Furthermore, it is demonstrated that the DCT improves overall accuracy and gives CT superior performance compared to LBP in all cases considered.
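
The 1D DCT applied to the grey-level histograms is the standard DCT-II; a minimal unnormalized sketch (illustrative, not the paper's code):

```python
import math

def dct_1d(hist):
    """Unnormalized DCT-II of a 1-D histogram: coefficient 0 carries the
    total mass, higher coefficients capture the histogram's shape."""
    N = len(hist)
    return [sum(hist[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N))
            for k in range(N)]
```

Feeding the first few coefficients to the SVM instead of the raw histogram is one way such a transform can decorrelate and compact the feature vector.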

    Use of Coherent Point Drift in computer vision applications

    This thesis presents the novel use of Coherent Point Drift (CPD) in improving the robustness of a number of computer vision applications. The CPD approach includes two methods for registering two images, rigid and non-rigid point set registration, which differ in the transformation model used. The key characteristic of a rigid transformation is that the distance between points is preserved, which means it can be used in the presence of translation, rotation, and scaling. Non-rigid transformations, such as affine transforms, provide the opportunity of registering under non-uniform scaling and skew. The idea is to move one point set coherently to align with the second point set. The CPD method finds both the transformation and the correspondence between the two point sets at the same time, without requiring an a priori declaration of the transformation model. The first part of this thesis is focused on speaker identification in video conferencing. A real-time, audio-coupled video-based approach is presented which focuses on the video analysis side rather than on audio analysis, which is known to be prone to errors. CPD is effectively utilised for lip movement detection, and a temporal face detection approach is used to minimise false positives when the face detection algorithm fails to perform. The second part of the thesis is focused on multi-exposure and multi-focus image fusion with compensation for camera shake. Scale Invariant Feature Transform (SIFT) keypoints are first detected in the images being fused. This point set is then reduced to remove outliers using RANSAC (RANdom SAmple Consensus), and finally the point sets are registered using CPD with non-rigid transformations. The registered images are then fused with a Contourlet-based image fusion algorithm that makes use of a novel alpha blending and filtering technique to minimise artefacts.
The thesis evaluates the performance of the algorithm in comparison to a number of state-of-the-art approaches, including the key commercial products available on the market at present, showing significantly improved subjective quality in the fused images. The final part of the thesis presents a novel approach to Vehicle Make & Model Recognition (VMMR) in CCTV video footage. CPD is used to effectively remove the skew of detected vehicles, as CCTV cameras are not specifically configured for the VMMR task and may capture vehicles at different approach angles. A LESH (Local Energy Shape Histogram) feature-based approach is used for vehicle make and model recognition, with the novelty that temporal processing is used to improve reliability. A number of further algorithms are used to maximise the reliability of the final outcome. Experimental results are provided to show that the proposed system achieves an accuracy in excess of 95% when tested on real CCTV footage with no prior camera calibration.
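
The soft correspondences at the heart of CPD can be sketched as Gaussian mixture responsibilities: each moving point is treated as a Gaussian centred on its current position, and each fixed point is softly assigned across those Gaussians. This is a simplified E-step without CPD's uniform outlier term; names and the 2-D restriction are illustrative, not the thesis code.

```python
import math

def soft_correspondences(X, Y, sigma2):
    """P[m][n]: probability that moving point Y[m] corresponds to fixed
    point X[n], normalized over m (Gaussian mixture responsibilities)."""
    P = [[math.exp(-((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) / (2.0 * sigma2))
          for x in X] for y in Y]
    for n in range(len(X)):            # normalize each column over Y
        col = sum(P[m][n] for m in range(len(Y)))
        for m in range(len(Y)):
            P[m][n] /= col
    return P
```

In the full algorithm these responsibilities and the transformation are re-estimated alternately, which is how CPD recovers correspondence and alignment simultaneously.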

    Ubiquitous Technologies for Emotion Recognition

    Emotions play a very important role in how we think and behave. The emotions we feel every day can compel us to act and can influence the decisions and plans we make about our lives. Being able to measure, analyze, and better comprehend how or why our emotions change is thus highly relevant to understanding human behavior and its consequences. Despite the great efforts made in the past in the study of human emotions, it is only now, with the advent of wearable, mobile, and ubiquitous technologies, that we can aim to sense and recognize emotions continuously and in real time. This book brings together the latest experiences, findings, and developments regarding ubiquitous sensing, modeling, and recognition of human emotions.