
    Pedestrian and Vehicle Detection in Autonomous Vehicle Perception Systems—A Review

    Autonomous Vehicles (AVs) have the potential to solve many traffic problems, such as accidents, congestion and pollution. However, there are still challenges to overcome; for instance, AVs need to accurately perceive their environment to navigate safely in busy urban scenarios. The aim of this paper is to review recent articles on computer vision techniques that can be used to build an AV perception system. AV perception systems need to accurately detect non-static objects and predict their behaviour, as well as detect static objects and recognise the information they provide. This paper focuses in particular on the computer vision techniques used to detect pedestrians and vehicles. There have been many papers and reviews on pedestrian and vehicle detection so far; however, most past papers reviewed pedestrian or vehicle detection separately. This review aims to present an overview of AV systems in general, and then review and investigate several computer vision detection techniques for pedestrians and vehicles. The review concludes that both traditional and Deep Learning (DL) techniques have been used for pedestrian and vehicle detection; however, DL techniques have shown the best results. Although good detection results have been achieved for pedestrians and vehicles, current algorithms still struggle to detect small, occluded, and truncated objects. In addition, there is limited research on how to improve detection performance in difficult light and weather conditions. Most algorithms have been tested on well-recognised datasets such as Caltech and KITTI; however, these datasets have their own limitations. Therefore, this paper recommends that future work be evaluated on newer, more challenging datasets, such as PIE and BDD100K. This work was supported by an EPSRC DTP PhD studentship.

    Robust object representation by boosting-like deep learning architecture

    This paper presents a new deep learning architecture for robust object representation, aiming to efficiently combine the proposed synchronized multi-stage feature (SMF) with a boosting-like algorithm. The SMF structure can capture a variety of characteristics from the input object based on the fusion of handcrafted features and deep-learned features. With the proposed boosting-like algorithm, we obtain greater convergence stability when training the multi-layer network by using boosted samples. We show the generalization of our object representation architecture by applying it to various tasks, i.e. pedestrian detection and action recognition. Our approach achieves 15.89% and 3.85% reductions in average miss rate compared with ACF and JointDeep on the largest Caltech dataset, and acquires competitive results on the MSRAction3D dataset.
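    The abstract does not give the boosting-like update itself, but the general pattern it describes (retrain stages on samples re-weighted by previous errors, over fused handcrafted-plus-deep features) can be sketched as follows. Everything here, from the synthetic feature blocks to the least-squares weak stage, is an illustrative assumption rather than the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for fused SMF features: a "handcrafted" block and a
# "deep" block concatenated per sample (both synthetic here).
X = np.concatenate([rng.normal(size=(200, 16)),        # handcrafted part
                    rng.normal(size=(200, 32))], axis=1)  # learned part
y = rng.integers(0, 2, size=200)

def train_stage(X, y):
    """Train one weak stage: a least-squares linear classifier (illustrative)."""
    w, *_ = np.linalg.lstsq(X, 2.0 * y - 1.0, rcond=None)
    return lambda X: (X @ w > 0).astype(int)

stage = train_stage(X, y)
for _ in range(3):                          # a few boosting-like rounds
    errors = (stage(X) != y).astype(float)
    weights = np.exp(errors)                # up-weight misclassified samples
    weights /= weights.sum()
    idx = rng.choice(len(X), size=len(X), replace=True, p=weights)
    stage = train_stage(X[idx], y[idx])     # retrain next stage on boosted samples
```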

    Database Optimisation for Real-Time Pedestrian Detection

    This paper tackles data selection for training-set generation in the context of near-real-time pedestrian detection through the introduction of a training methodology: FairTrain. After highlighting the impact of poorly chosen data on detector performance, we introduce a new data selection technique utilizing the expectation-maximization algorithm for data weighting. FairTrain also features a version of the cascade-of-rejectors enhanced with data selection principles. Experiments on the INRIA and PETS2009 datasets prove that, when properly trained, a simple HOG-based detector can perform on par with most of its near-real-time competitors.
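    As a hypothetical illustration of EM-based data weighting, the sketch below fits a two-component Gaussian mixture to a 1-D per-sample score and uses the posterior responsibility of the "hard" component as a training weight. The scores, the component roles, and the choice to weight by the hard component are assumptions, not details taken from the paper:

```python
import numpy as np

# Toy 1-D "difficulty scores" for candidate training windows (synthetic).
rng = np.random.default_rng(1)
s = np.concatenate([rng.normal(-1.0, 0.5, 300),   # easy samples
                    rng.normal(+1.5, 0.7, 100)])  # hard/informative samples

# Two-component Gaussian mixture fitted by EM; the posterior responsibility
# of the "hard" component then serves as a per-sample training weight.
mu, sd, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):                                # EM iterations
    # E-step: responsibilities r[i, k] = P(component k | sample i)
    pdf = np.exp(-0.5 * ((s[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    r = pi * pdf
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate mixture parameters from the responsibilities
    nk = r.sum(axis=0)
    mu = (r * s[:, None]).sum(axis=0) / nk
    sd = np.sqrt((r * (s[:, None] - mu) ** 2).sum(axis=0) / nk)
    pi = nk / len(s)

sample_weights = r[:, np.argmax(mu)]   # responsibility of the "hard" component
```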

    Cooperation of ambient camera networks and on-board vision on a mobile robot for the surveillance of public places

    This thesis deals with the detection and tracking of people in a surveilled public place. It proposes to include a mobile robot in classical surveillance systems that are based on environment-fixed sensors. The mobile robot brings two important benefits: (1) it acts as a mobile sensor with perception capabilities, and (2) it can be used as a means of action for service provision. In this context, as a first contribution, the thesis presents an optimised visual people detector based on Binary Integer Programming that explicitly takes the stipulated computational demand into consideration. A set of homogeneous and heterogeneous pools of features is investigated under this framework, thoroughly tested, and compared with state-of-the-art detectors. The experimental results clearly highlight the improvements that detectors learned with this framework bring, including the effect on the robot's reactivity during on-line missions. As a second contribution, the thesis proposes and validates a cooperative framework to fuse information from wall-mounted cameras and sensors on the mobile robot to better track people in the vicinity. The same framework is also validated by fusing data from the different sensors on the mobile robot in the absence of external perception. Finally, we demonstrate the improvements brought by the developed perceptual modalities by deploying them on our robotic platform, illustrating the robot's ability to perceive people in public places and respect their personal space during navigation.
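    As a purely illustrative counterpart to the cooperative fusion framework, the sketch below feeds person-position measurements from two sources (a wall-mounted camera and a robot-mounted sensor, each with its own noise covariance) into a single constant-velocity Kalman filter. This is a common fusion pattern, not the thesis' actual formulation:

```python
import numpy as np

dt = 0.1
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1.]])
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0.]])          # both sensors measure (x, y)
Q = 0.01 * np.eye(4)                                  # process noise
R_wall, R_robot = 0.05 * np.eye(2), 0.20 * np.eye(2)  # per-sensor measurement noise

x, P = np.zeros(4), np.eye(4)                         # state: (x, y, vx, vy)

def kf_step(x, P, z, R):
    """One predict + update cycle for measurement z with noise covariance R."""
    x, P = F @ x, F @ P @ F.T + Q                     # predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                    # Kalman gain
    x = x + K @ (z - H @ x)                           # update with the innovation
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Measurements from either source enter the same filter, weighted by their R:
x, P = kf_step(x, P, np.array([1.0, 2.0]), R_wall)    # ambient camera reading
x, P = kf_step(x, P, np.array([1.1, 2.1]), R_robot)   # robot sensor reading
```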

    From pixels to people: recovering location, shape and pose of humans in images

    Humans are at the centre of a significant amount of research in computer vision. Endowing machines with the ability to perceive people from visual data is an immense scientific challenge with a high degree of direct practical relevance. Success in automatic perception can be measured at different levels of abstraction, depending on which intelligent behaviour we are trying to replicate: the ability to localise persons in an image or in the environment, understanding how persons are moving at the skeleton and at the surface level, interpreting their interactions with the environment including with other people, and perhaps even anticipating future actions. In this thesis we tackle different sub-problems of the broad research area referred to as "looking at people", aiming to perceive humans in images at different levels of granularity. We start with bounding box-level pedestrian detection: we present a retrospective analysis of methods published in the decade preceding our work, identifying various strands of research that have advanced the state of the art. With quantitative experiments, we demonstrate the critical role of developing better feature representations and having the right training distribution. We then contribute two methods based on the insights derived from our analysis: one that combines the strongest aspects of past detectors and another that focuses purely on learning representations. The latter method outperforms more complicated approaches, especially those based on handcrafted features. We conclude our work on pedestrian detection with a forward-looking analysis that maps out potential avenues for future research. We then turn to pixel-level methods: perceiving humans requires us to both separate them precisely from the background and identify their surroundings. To this end, we introduce Cityscapes, a large-scale dataset for street scene understanding, which has since established itself as a go-to benchmark for segmentation and detection. We additionally develop methods that relax the requirement for expensive pixel-level annotations, focusing on the task of boundary detection, i.e. identifying the outlines of relevant objects and surfaces. Next, we make the jump from pixels to 3D surfaces, from localising and labelling to fine-grained spatial understanding. We contribute a method for recovering 3D human shape and pose, which marries the advantages of learning-based and model-based approaches. We conclude the thesis with a detailed discussion of benchmarking practices in computer vision. Among other things, we argue that the design of future datasets should be driven by the general goal of combinatorial robustness besides task-specific considerations.

    Pedestrian Movement Direction Recognition Using Convolutional Neural Networks

    Pedestrian movement direction recognition is an important factor in autonomous driver-assistance and security surveillance systems. Pedestrians are the most crucial and fragile moving objects in streets, roads, and events where thousands of people may gather on a regular basis. People-flow analysis on zebra crossings and in shopping centers or at events such as demonstrations is a key element for improving safety and enabling autonomous cars to drive in real-life environments. This paper focuses on deep learning techniques such as convolutional neural networks (CNN) to achieve reliable detection of pedestrians moving in a particular direction. We propose a CNN-based technique that leverages current pedestrian detection techniques (histograms of oriented gradients with linear SVM) to generate a sum of subtracted frames (a flow estimation around the detected pedestrian), which is used as input to the proposed modified versions of various state-of-the-art CNN networks, such as AlexNet, GoogLeNet, and ResNet. Moreover, we have created a new dataset for this purpose and analyzed the importance of training on a known dataset for the neural networks to achieve reliable results. This work was supported by FEDER funds and the Spanish Government through the COMBAHO project under Grant TIN2016-76515-R, and in part by the University of Alicante project under Grant GRE16-19.
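    The "sum of subtracted frames" input can be sketched directly: accumulate absolute differences of consecutive grayscale frames inside the detected pedestrian box and normalise the result before feeding it to the CNN. The frame data, box coordinates, and normalisation below are placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
frames = rng.random((5, 240, 320)).astype(np.float32)   # T grayscale frames (synthetic)
box = (100, 60, 164, 188)                               # detected pedestrian bbox (assumed)

def motion_image(frames, box):
    """Accumulate |frame[t+1] - frame[t]| inside the pedestrian box."""
    x0, y0, x1, y1 = box
    crops = frames[:, y0:y1, x0:x1]
    diffs = np.abs(np.diff(crops, axis=0))   # frame-to-frame subtraction
    acc = diffs.sum(axis=0)                  # sum of subtracted frames
    return acc / max(acc.max(), 1e-8)        # normalise to [0, 1] for the CNN

cnn_input = motion_image(frames, box)
print(cnn_input.shape)                       # (128, 64) pedestrian-shaped crop
```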

    Deeply Smile Detection Based on Discriminative Features with Modified LeNet-5 Network

    Facial expressions are caused by specific movements of the face muscles; they are regarded as a visible manifestation of a person's inner thought processes, internal emotional states, and intentions. A smile is a facial expression that often indicates happiness, satisfaction, or agreement. Many applications use smile detection, such as automatic image capture, distance learning systems, interactive systems, video conferencing, patient monitoring, and product rating. A smile detection system is divided into two stages, feature extraction and classification; as a result, the accuracy of smile detection depends on both phases. In recent years, numerous researchers have proposed various approaches to smile detection; however, their accuracy is still below the desired level. To this end, we propose an effective Convolutional Neural Network (CNN) architecture based on a modified LeNet-5 network (MLeNet-5) for detecting smiles in images. The proposed system generates low-level face identifiers and detects smiles using a strong binary classifier. In our experiments, the proposed MLeNet-5 system used the SMILEsmileD and GENKI-4K databases; the proposed method improves accuracy by 2% on the SMILEsmileD database and 5% on the GENKI-4K database relative to the LeNet-5-based CNN network. In addition, the proposed system decreases the number of parameters compared to the LeNet-5-based CNN network and most existing models while maintaining the robustness and effectiveness of the results.
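    The abstract does not spell out the exact MLeNet-5 modifications, so the sketch below is just a generic LeNet-5-style binary smile classifier in PyTorch, with assumed input size and layer widths:

```python
import torch
import torch.nn as nn

class SmileNet(nn.Module):
    """LeNet-5-style binary smile classifier (layer sizes are assumptions)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),   # 32 -> 14
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 84), nn.ReLU(),
            nn.Linear(84, 2),            # smile / no-smile
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = SmileNet()(torch.randn(1, 1, 32, 32))   # one dummy 32x32 face crop
print(logits.shape)                               # torch.Size([1, 2])
```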

    Prototype to Increase Crosswalk Safety by Integrating Computer Vision with ITS-G5 Technologies

    Human error is probably the main cause of car accidents, and the car is one of the most dangerous forms of transport for people. The danger comes from the fact that on public roads there are simultaneously different types of actors (drivers, pedestrians, or cyclists) and many objects that change their position over time, making it difficult to predict their immediate movements. The intelligent transport system (ITS-G5) standard specifies the European communication technologies and protocols to assist public road users by providing them with relevant information. The scientific community is developing ITS-G5 applications for various purposes, among which is increasing pedestrian safety. This paper describes the work carried out to implement an ITS-G5 prototype that aims to increase pedestrian and driver safety in the vicinity of a pedestrian crosswalk by sending ITS-G5 decentralized environmental notification messages (DENM) to vehicles. These messages are analyzed and, if relevant, presented to the driver through the car's onboard infotainment system. This alert allows the driver to take safety precautions to prevent accidents. The implemented prototype was tested at a pedestrian crosswalk in a controlled environment. The results showed the prototype's capacity for detecting pedestrians, suitable message sending, reception and processing on a vehicle onboard unit (OBU) module, and presentation on the car's onboard infotainment system.
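    To make the message flow concrete, here is a hypothetical sketch of the detection-to-alert path. Real DENMs are ASN.1-encoded per ETSI EN 302 637-3 and carried over ITS-G5 radio; this sketch mimics only the flow, using JSON over UDP, and the OBU address and coordinates are placeholders:

```python
import json
import socket
import time

OBU_ADDR = ("192.0.2.10", 40000)   # placeholder OBU endpoint (TEST-NET address)

def send_pedestrian_alert(lat, lon, sock):
    """Push a DENM-like 'pedestrian on crosswalk' event towards the OBU."""
    event = {
        "messageType": "DENM-like",          # illustrative, not the ASN.1 schema
        "cause": "pedestrianOnCrosswalk",
        "position": {"lat": lat, "lon": lon},
        "detectionTime": time.time(),
    }
    sock.sendto(json.dumps(event).encode(), OBU_ADDR)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
pedestrian_detected = True                    # stand-in for the vision module
if pedestrian_detected:
    send_pedestrian_alert(40.6405, -8.6538, sock)   # example crosswalk position
```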