40 research outputs found

    The effectiveness of face detection algorithms in unconstrained crowd scenes

    The 2013 Boston Marathon bombing represents a case where automatic facial biometrics tools could have proven invaluable to law enforcement officials, yet the lack of robustness of current tools in unstructured environments limited their utility. In this work, we focus on complications that confound face detection algorithms. We first present a simple multi-pose generalization of the Viola-Jones algorithm. Our results on the Face Detection Data Set and Benchmark (FDDB) show that it makes a significant improvement over the state of the art for published algorithms. Conversely, our experiments demonstrate that the improvements attained by accommodating multiple poses can be negligible compared to the gains yielded by normalizing scores and using the most appropriate classifier for uncontrolled data. We conclude with a qualitative evaluation of the proposed algorithm on publicly available images of the Boston Marathon crowds. Although the results of our evaluations are encouraging, they confirm that there is still room for improvement in terms of robustness to out-of-plane rotation, blur and occlusion.
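
The score-normalization point above can be illustrated with a small sketch: raw confidences from different pose-specific detectors live on different scales, so they are rescaled to a common range before being fused. Function names and the min-max scheme are illustrative assumptions, not the paper's actual method.

```python
def minmax_normalize(scores):
    """Map raw detector scores to [0, 1] so outputs of different
    pose-specific classifiers become directly comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_detections(per_pose_scores):
    """For each candidate window, keep the best normalized score
    over all pose-specific detectors."""
    normalized = [minmax_normalize(s) for s in per_pose_scores]
    n = len(normalized[0])
    return [max(norm[i] for norm in normalized) for i in range(n)]
```

Without normalization, a detector whose raw scores happen to be numerically larger would dominate the fusion regardless of its actual reliability.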

    Discriminative Appearance Models for Face Alignment

    The proposed face alignment algorithm uses local gradient features as the appearance representation. These features are obtained by pixel value comparisons, which provide robustness against changes in illumination, as well as against partial occlusion and local deformation, owing to their locality. The adopted features are modeled with three discriminative methods, which correspond to different alignment cost functions. The discriminative appearance modeling alleviates the generalization problem to some extent.
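
As an illustrative sketch (assumed, not the paper's implementation), a pixel-comparison feature and a descriptor built from a fixed set of pixel pairs could look like this; adding a constant offset to every pixel leaves the descriptor unchanged, which is where the illumination robustness comes from:

```python
def pixel_comparison_feature(patch, p, q):
    # Binary feature: is pixel p brighter than pixel q?
    return 1 if patch[p[0]][p[1]] > patch[q[0]][q[1]] else 0

def local_gradient_descriptor(patch, pairs):
    # Concatenate comparison outcomes for a fixed set of pixel pairs.
    return [pixel_comparison_feature(patch, p, q) for p, q in pairs]
```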

    Multivariate Boosting with Look-Up Tables for Face Processing

    This thesis proposes a novel unified boosting framework. We apply this framework to several face processing tasks, namely face detection, facial feature localisation, and pose classification, using the same boosting algorithm and the same pool of features (local binary features). This is in contrast with the standard approaches, which make use of a variety of features and models, for example AdaBoost, cascades of boosted classifiers and Active Appearance Models. The unified boosting framework covers multivariate classification and regression problems; this is achieved by interpreting boosting as optimization in the functional space of the weak learners, so that a wide range of smooth loss functions can be optimized with the same algorithm. We propose two general optimization strategies that extend recent work on TaylorBoost and Variational AdaBoost. The first is an empirical expectation formulation that minimizes the average loss; the second is a variational formulation that includes an additional penalty for large variations between predictions. These two boosting formulations are used to train real-time models using local binary features. This is achieved using look-up tables as weak learners and multi-block Local Binary Patterns as features. The resulting boosting algorithms are simple, efficient and easily scalable with the available resources. Furthermore, we introduce a novel coarse-to-fine feature selection method to handle high-resolution models and a bootstrapping algorithm to sample representative training data from very large pools of data. The proposed approach is evaluated on several face processing tasks: frontal face detection (binary classification), facial feature localisation (multivariate regression) and pose estimation (multivariate classification).
Several studies are performed to assess different optimization algorithms, bootstrapping parametrizations and feature sharing methods (for the multivariate case). The results show good performance on all of these tasks. In addition, two other contributions are presented. First, we propose a context-based model for removing the false alarms generated by a given generic face detector. Second, we propose a new face detector that predicts the Jaccard distance between the current location and the ground truth, which allows us to formulate face detection as a regression task.
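
The Jaccard distance used as the regression target can be sketched with the standard definition (one minus intersection-over-union), with boxes given as corner coordinates; the helper name is illustrative:

```python
def jaccard_distance(a, b):
    """1 - IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return 1.0 - inter / union if union > 0 else 1.0
```

A perfect detection has distance 0, a completely missed one has distance 1, which makes the quantity a natural regression target.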

    Face Detection and Verification using Local Binary Patterns

    This thesis proposes a robust Automatic Face Verification (AFV) system using Local Binary Patterns (LBP). AFV is mainly composed of two modules: Face Detection (FD) and Face Verification (FV). The purpose of FD is to determine whether there are any faces in an image, while FV involves confirming or denying the identity claimed by a person. The contributions of this thesis are the following: 1) a real-time multiview FD system which is robust to illumination changes and partial occlusion, 2) an FV system based on the adaptation of LBP features, 3) an extensive study of the performance evaluation of FD algorithms and, in particular, of the effect of FD errors on FV performance. The first part of the thesis addresses the problem of frontal FD. We first describe the system of Viola and Jones, the first real-time frontal face detector. One of its limitations is its sensitivity to local lighting variations and partial occlusion of the face. To cope with these limitations, we propose to use LBP features. Special emphasis is given to the scanning process and to the merging of overlapping detections, because both have a significant impact on performance. We then extend our frontal FD module to multiview FD. In the second part, we present a novel generative approach for FV, based on an LBP description of the face. Its main advantages over previous approaches are a very fast and simple training procedure and robustness to bad lighting conditions. In the third part, we address the problem of estimating the quality of FD. We first show the influence of FD errors on the FV task and then empirically demonstrate the limitations of current detection measures when applied to this task. In order to properly evaluate the performance of a face detection module, we propose to embed the FV step into the performance measuring process. We show empirically that the proposed methodology better matches the final FV performance.
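
A minimal sketch of the basic LBP operator underlying this line of work (the canonical 8-neighbour formulation, not the thesis code): each pixel is encoded by comparing it with its neighbours, and a face region is then described by the histogram of codes. Because only relative brightness matters, the code is invariant to monotonic illumination changes.

```python
def lbp_code(img, r, c):
    """8-neighbour LBP code of pixel (r, c); neighbours >= centre set a bit."""
    center = img[r][c]
    neighbours = [img[r-1][c-1], img[r-1][c], img[r-1][c+1],
                  img[r][c+1], img[r+1][c+1], img[r+1][c],
                  img[r+1][c-1], img[r][c-1]]
    code = 0
    for i, n in enumerate(neighbours):
        if n >= center:
            code |= 1 << i
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```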

    Video-based Pedestrian Intention Recognition and Path Prediction for Advanced Driver Assistance Systems

    Advanced driver assistance systems (ADAS) play a very important role in future vehicles, increasing safety for the driver, the passengers, and vulnerable road users such as pedestrians and cyclists. Within certain limits, such systems attempt to avoid collisions in dangerous situations involving an inattentive driver and a pedestrian by triggering automatic emergency braking. Because of the high variability of pedestrian motion patterns, existing systems are designed conservatively: to drastically reduce possible false activations, they are restricted to manageable situations, for example scenarios in which a pedestrian suddenly stops and thereby defuses the situation. Reliable pedestrian intention recognition and path prediction are therefore of great value in overcoming this problem. This thesis describes the complete processing chain of a stereo-video-based system for pedestrian intention estimation and path prediction, which is subsequently used in the activation decision for automatic emergency braking. In the first of three main components, a real-time method is proposed that localises pedestrians' heads and estimates their pose in low-resolution images of complex and highly dynamic inner-city scenarios. Single-frame estimates are derived from the probability outputs of eight trained head-pose-specific detectors applied to the image region of a pedestrian candidate. Additional robustness in head localisation is achieved by incorporating stereo depth information. Furthermore, the head positions and poses are smoothed over time by means of a particle filter.
For pedestrian intention estimation, the use of a robust and powerful machine learning approach is investigated in different scenarios. Given a time series of observations, this approach is able to model the inner substructure of a particular intention class and, in addition, to capture the extrinsic dynamics between different intention classes. The method integrates meaningful features extracted from the pedestrian dynamics as well as context information in the form of the human head pose. Finally, a path prediction method is presented that steers the prediction steps of a multiple-motion-model filter over a time horizon of roughly one second by incorporating the estimated pedestrian intentions. By helping the filter choose the appropriate motion model, the resulting path prediction error can be reduced significantly. A wide range of scenarios is covered, including pedestrians crossing laterally or stopping, and persons who initially walk along the sidewalk but then suddenly turn toward the road.
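
The intention-steered prediction idea can be sketched minimally as follows, assuming just two motion models (constant velocity vs. stopping) mixed by the estimated intention probability; the names and the two-model parametrization are illustrative, not the thesis's actual filter:

```python
def predict_position(pos, vel, p_walk, dt=1.0):
    """Blend a constant-velocity prediction with a stopping prediction,
    weighted by the estimated probability that the pedestrian keeps walking."""
    # Hypothetical two-model mixture; a real system would run a full
    # multiple-model filter over many time steps.
    walk = (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)
    stop = pos
    p_stop = 1.0 - p_walk
    return (p_walk * walk[0] + p_stop * stop[0],
            p_walk * walk[1] + p_stop * stop[1])
```

When the intention estimate is confident, the blend collapses onto the matching motion model, which is how steering the filter toward the right model reduces the prediction error.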

    A comparative analysis of binary patterns with discrete cosine transform for gender classification

    This paper presents a comparative analysis of binary patterns for gender classification, with a novel method of feature transformation for improved accuracy rates. The main requirements of our application are speed and accuracy. We investigate a combination of local binary patterns (LBP), the Census Transform (CT) and the Modified Census Transform (MCT) applied over the full, top and bottom halves of the face. Gender classification is performed using support vector machines (SVM). A main focus of the investigation is to determine whether a 1D discrete cosine transform (DCT) applied directly to the grey-level histograms improves accuracy. We used a public database of faces and ran face and eye detection algorithms, allowing automatic cropping and normalisation of the images. A set of 120 tests over the entire database demonstrates that the proposed 1D discrete cosine transform improves accuracy in all test cases, with small standard deviations. Using basic versions of the algorithms, LBP is marginally superior to both CT and MCT, which agrees with results in the literature reporting higher accuracy on male subjects. However, a significant result of our investigation is that applying a 1D DCT removes this bias, achieving an equivalent error rate for both genders. Furthermore, it is demonstrated that the DCT improves overall accuracy and gives CT superior performance compared to LBP in all cases considered.
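
The 1D DCT applied to the grey-level histograms is the standard DCT-II; a minimal unnormalized sketch (illustrative, not the paper's code):

```python
import math

def dct_1d(hist):
    """Unnormalized DCT-II of a 1-D histogram: coefficient 0 carries the
    total mass, higher coefficients capture the histogram's shape."""
    N = len(hist)
    return [sum(hist[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N))
            for k in range(N)]
```

Feeding the first few coefficients to the SVM instead of the raw histogram is one way such a transform can decorrelate and compact the feature vector.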

    Use of Coherent Point Drift in computer vision applications

    This thesis presents the novel use of Coherent Point Drift (CPD) in improving the robustness of a number of computer vision applications. The CPD approach includes two methods for registering two images, rigid and non-rigid point set registration, which differ in the transformation model used. The key characteristic of a rigid transformation is that the distance between points is preserved, which means it can be used in the presence of translation, rotation, and scaling. Non-rigid transformations, such as affine transforms, provide the opportunity of registering under non-uniform scaling and skew. The idea is to move one point set coherently to align with the second point set. The CPD method finds both the transformation and the correspondence between the two point sets at the same time, without requiring an a priori declaration of the transformation model. The first part of this thesis is focused on speaker identification in video conferencing. A real-time, audio-coupled video-based approach is presented which focuses on the video analysis side rather than on audio analysis, which is known to be prone to errors. CPD is effectively utilised for lip movement detection, and a temporal face detection approach is used to minimise false positives when the face detection algorithm fails to perform. The second part of the thesis is focused on multi-exposure and multi-focus image fusion with compensation for camera shake. Scale Invariant Feature Transform (SIFT) keypoints are first detected in the images being fused. This point set is then reduced to remove outliers using RANSAC (RANdom SAmple Consensus), and finally the point sets are registered using CPD with non-rigid transformations. The registered images are then fused with a Contourlet-based image fusion algorithm that makes use of a novel alpha blending and filtering technique to minimise artefacts.
The thesis evaluates the performance of the algorithm in comparison to a number of state-of-the-art approaches, including the key commercial products available on the market at present, showing significantly improved subjective quality in the fused images. The final part of the thesis presents a novel approach to Vehicle Make & Model Recognition (VMMR) in CCTV video footage. CPD is used to effectively remove the skew of detected vehicles, as CCTV cameras are not specifically configured for the VMMR task and may capture vehicles at different approach angles. A LESH (Local Energy Shape Histogram) feature-based approach is used for vehicle make and model recognition, with the novelty that temporal processing is used to improve reliability. A number of further algorithms are used to maximise the reliability of the final outcome. Experimental results are provided to show that the proposed system achieves an accuracy in excess of 95% when tested on real CCTV footage with no prior camera calibration.
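
The soft correspondences at the heart of CPD can be sketched as Gaussian mixture responsibilities: each moving point is treated as a Gaussian centred on its current position, and each fixed point is softly assigned across those Gaussians. This is a simplified E-step without CPD's uniform outlier term; names and the 2-D restriction are illustrative, not the thesis code.

```python
import math

def soft_correspondences(X, Y, sigma2):
    """P[m][n]: probability that moving point Y[m] corresponds to fixed
    point X[n], normalized over m (Gaussian mixture responsibilities)."""
    P = [[math.exp(-((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) / (2.0 * sigma2))
          for x in X] for y in Y]
    for n in range(len(X)):            # normalize each column over Y
        col = sum(P[m][n] for m in range(len(Y)))
        for m in range(len(Y)):
            P[m][n] /= col
    return P
```

In the full algorithm these responsibilities and the transformation are re-estimated alternately, which is how CPD recovers correspondence and alignment simultaneously.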

    Ubiquitous Technologies for Emotion Recognition

    Emotions play a very important role in how we think and behave. The emotions we feel every day can compel us to act and can influence the decisions and plans we make about our lives. Being able to measure, analyze, and better comprehend how or why our emotions change is thus highly relevant to understanding human behavior and its consequences. Despite the great efforts made in the past in the study of human emotions, it is only now, with the advent of wearable, mobile, and ubiquitous technologies, that we can aim to sense and recognize emotions continuously and in real time. This book brings together the latest experiences, findings, and developments regarding ubiquitous sensing, modeling, and recognition of human emotions.