5 research outputs found

    Scene understanding from a moving camera for object detection and free space estimation

    Modern vehicles are equipped with multiple cameras that are already used in a variety of practical applications. Advanced driver assistance systems (ADAS) are of particular interest because of the safety and comfort features they offer the driver. Camera-based scene understanding is an important scientific problem that must be addressed to provide the information such systems need. While front-facing cameras are widely used, there are applications where cameras observing the lateral space deliver better results. Fisheye cameras mounted in the side mirrors are particularly interesting because they cover a large area beside the vehicle and can serve several applications for which traditional front-facing cameras are not suitable. We present a general method for scene understanding based on 3D reconstruction of the environment around the vehicle. It performs pixel-wise image labeling with a conditional random field (CRF). Our method creates a simple 3D model of the scene and also provides semantic labels for the different objects and areas in the image, such as cars, sidewalks, and buildings. We demonstrate the method on two applications of high importance to driver assistance systems: car detection and free space estimation. We show that our system runs in real time at speeds of up to 63 km/h.
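The abstract gives no implementation detail, but pixel-wise CRF labeling has a well-known generic shape. Below is a minimal, hypothetical Python sketch of a grid CRF with Potts pairwise terms solved by iterated conditional modes (ICM); the random unary costs, the label set, and the choice of ICM inference are illustrative assumptions, not the authors' actual model or inference method.

```python
import numpy as np

# Minimal sketch of pixel-wise labeling with a grid CRF, solved by
# iterated conditional modes (ICM). The unary terms would come from a
# per-pixel classifier (e.g. over appearance/depth features); here they
# are random stand-ins. Labels 0..2 might correspond to e.g. road,
# car, building. Illustration only, not the paper's method.

def icm_labeling(unary, pairwise_weight=1.0, iters=5):
    """unary: (H, W, L) per-pixel label costs; returns (H, W) labels."""
    H, W, L = unary.shape
    labels = unary.argmin(axis=2)          # initialise from unaries
    for _ in range(iters):
        for y in range(H):
            for x in range(W):
                costs = unary[y, x].copy()
                # Potts pairwise term: penalise disagreeing with 4-neighbours
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        costs += pairwise_weight * (np.arange(L) != labels[ny, nx])
                labels[y, x] = costs.argmin()
    return labels

rng = np.random.default_rng(0)
unary = rng.random((32, 32, 3))            # stand-in classifier costs
print(icm_labeling(unary).shape)           # (32, 32)
```

In practice the unary costs would come from a trained per-pixel classifier over appearance and reconstructed 3D features, and a stronger inference method (graph cuts or mean field) would normally replace ICM.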

    Monitoring the driver's activity using 3D information

    Driver supervision is crucial in driver safety systems. It is important to monitor the driver to understand his or her needs, movement patterns, and behaviour under particular circumstances. An accurate tool for supervising the driver's behaviour enables multiple objectives, such as detecting drowsiness (by analysing head movements and the blinking pattern) and distraction (by estimating where the driver is looking from the head and eye positions). In both cases, once the misbehaviour is detected, an alarm of the appropriate type for the situation can be triggered to correct the driver's behaviour. This application distinguishes itself from other driving assistance systems in that it analyses the inside of the vehicle rather than the outside. Interior monitoring applications are as important as exterior ones: if the driver falls asleep, a pedestrian detection algorithm can take only limited actions to prevent an accident, and even that only under the best, predetermined circumstances. The application also has the potential to estimate whether the driver is looking at an area where another application has detected an obstacle (inert object, animal, or pedestrian).

Although technologies capable of automatic driver monitoring are already on the market, the cost of the required sensors is very high, as this is not a popular product (compared with home or entertainment devices) and there is no market with high demand and supply for these sensors. Many of these technologies require external, invasive devices (one or more sensors attached to the body) that may interfere with the natural driving movements the driver makes under unsupervised conditions. Current computer vision applications take advantage of the latest developments in information technology and the increase in computational power to create applications that fit the criteria of a non-invasive driver monitoring method. Technologies such as stereo and time-of-flight cameras can overcome some of the difficulties that affect computer vision applications, such as extreme lighting conditions (too dark or too bright), saturation of the colour sensors, and lack of depth information. A combination of different sensors can also overcome these problems, by performing multiple scans of different areas or by combining the information obtained from different devices, but this requires an additional calibration and positioning step and makes the application depend on every sensor involved: if one of them fails, the results may not be correct. Some recent gaming sensors on the market, such as the Kinect sensor bar from Microsoft, pack a set of previously expensive sensors into a low-cost device, providing 3D information together with additional features and without the need for the complex handcrafted setups that can fail as described above. The solution proposed in this thesis monitors the driver using the different data streams from the Kinect sensor (depth information, infrared, and colour image).
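As a rough illustration of how colour and depth data might be fused for face detection, the following Python sketch detects a face in the colour image with a stock OpenCV Haar cascade and lifts the matching depth pixels to 3D head points with a pinhole model. The intrinsics, the cascade detector, and the assumption of a depth map registered to the colour image are all illustrative choices, not the thesis' actual pipeline.

```python
import cv2
import numpy as np

# Sketch: detect the face in the colour image, then lift the matching
# depth pixels to 3D points with a pinhole model. Assumes the depth map
# is registered to the colour image; fx, fy, cx, cy are placeholder
# intrinsics, not a real Kinect calibration.

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_point_cloud(color_bgr, depth_m, fx=580.0, fy=580.0, cx=320.0, cy=240.0):
    gray = cv2.cvtColor(color_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.2, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]                     # take the first detection
    us, vs = np.meshgrid(np.arange(x, x + w), np.arange(y, y + h))
    z = depth_m[vs, us]
    valid = z > 0                             # drop missing depth readings
    pts = np.stack([(us[valid] - cx) * z[valid] / fx,
                    (vs[valid] - cy) * z[valid] / fy,
                    z[valid]], axis=1)
    return pts                                # (N, 3) head points in metres
```

The returned points could then feed a 3D head-pose estimator, for example an ICP-style alignment against the initial face capture, in line with the iterative comparison described below.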
Fusing the information from these different sources allows 2D and 3D algorithms to be combined to provide reliable face detection, accurate pose estimation, and trustworthy detection of facial features such as the eyes and nose. At an average rate above 10 Hz, the system compares the initial face capture with subsequent frames using an iterative algorithm configured beforehand for a chosen trade-off between accuracy and speed. To determine the reliability and accuracy of the proposed system, several tests of the head-pose orientation algorithm were performed with an inertial measurement unit (IMU) attached to the back of the head of the collaborating subjects. The inertial measurements provided by the IMU were used as ground truth for the three-degrees-of-freedom (3DoF) tests (yaw, pitch, and roll). Finally, the test results were compared with those available in the current literature to check the performance of the presented algorithm. Estimating the head orientation is the main function of this proposal, as it delivers the most information for estimating the driver's behaviour, whether for a first estimate of whether the driver is looking ahead or for detecting signs of fatigue such as nodding. Supporting this tool is another one in charge of analysing the colour image to study the driver's eyes. From this study it is possible to estimate where the driver is looking by estimating the gaze orientation through the position of the pupil. Together with the head orientation, the gaze orientation gives a more accurate guess of where the driver is looking; it is thus a support tool that complements the head orientation. Another way to detect a hazardous situation is to analyse the opening of the eyes: by studying the driver's blinking pattern over a given period, it can be estimated whether the driver is tired and therefore more likely to cause an accident due to drowsiness. The part of the solution that addresses this problem analyses one of the driver's eyes and estimates whether it is closed or open from the dark regions in the image. Once the state of the eye is determined, it is analysed over a period of time to establish whether the eye was mostly closed or open, and thus to estimate more accurately whether the driver is falling asleep. These two modules, the drowsiness detector and the gaze estimator, complement the head orientation estimate, with the goal of gaining more certainty about the driver's status and, when possible, preventing an accident due to misbehaviour. It is worth mentioning that the Kinect sensor is built specifically for indoor use, connected to a video console, not for the outdoors; some limitations therefore inevitably arise when monitoring under real driving conditions, and they are discussed in this proposal. However, the presented algorithm can be used with any point-cloud-based sensor (stereo cameras, time-of-flight cameras, laser scanners, etc.), which are more expensive but less sensitive to these problems. Future work is described at the end to show the scalability of this proposal.
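The eye-state and blink-pattern analysis described above maps naturally onto a PERCLOS-style measure. Here is a minimal, hypothetical Python sketch: an eye is called closed when the fraction of dark pixels in its patch exceeds a threshold, and an alarm fires when the closed-time fraction over a sliding window grows too high. All thresholds and the window length are invented for illustration, not taken from the thesis.

```python
from collections import deque
import numpy as np

# Sketch of the drowsiness logic: classify one eye as open/closed from
# the proportion of dark pixels in its image patch, then track the
# closed-time fraction (a PERCLOS-style measure) over a sliding window.
# All numeric thresholds are illustrative assumptions.

class DrowsinessMonitor:
    def __init__(self, window=300, dark_thresh=60, dark_ratio=0.45,
                 perclos_alarm=0.30):
        self.states = deque(maxlen=window)   # recent open/closed decisions
        self.dark_thresh = dark_thresh       # grey level counted as "dark"
        self.dark_ratio = dark_ratio         # dark fraction meaning "closed"
        self.perclos_alarm = perclos_alarm   # closed-time fraction that alarms

    def update(self, eye_patch_gray):
        """eye_patch_gray: 2D uint8 crop around one eye."""
        dark = (eye_patch_gray < self.dark_thresh).mean()
        closed = dark > self.dark_ratio
        self.states.append(closed)
        perclos = float(np.mean(self.states))
        return closed, perclos, perclos > self.perclos_alarm
```

At the roughly 10 Hz frame rate reported above, a 300-frame window would correspond to about 30 seconds of driving.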
Official Doctoral Programme in Electrical, Electronic and Automatic Engineering. Thesis committee: President: Andrés Iborra García; Secretary: Francisco José Rodríguez Urbano; Member: José Manuel Pastor García.

    Multimodal machine learning for intelligent mobility

    Scientific problems are solved by finding the optimal solution for a specific task. Some problems can be solved analytically, while others are solved using data-driven methods. The use of digital technologies to improve the transportation of people and goods, referred to as intelligent mobility, is one of the principal beneficiaries of data-driven solutions. Autonomous vehicles are at the heart of the developments that propel intelligent mobility. Due to the high dimensionality and complexity of real-world environments, data-driven solutions need to become commonplace in intelligent mobility, as it is nearly impossible to program decision-making logic for every eventuality manually. While recent data-driven developments such as deep learning allow machines to learn effectively from large datasets, applications of these techniques within safety-critical systems such as driverless cars remain scarce.

Autonomous vehicles need to make context-driven decisions autonomously in the different environments in which they operate. The recent literature on driverless vehicle research focuses heavily on road and highway environments and discounts pedestrianized areas and indoor environments. These unstructured environments tend to be more cluttered and change rapidly over time. Therefore, for intelligent mobility to make a significant impact on human life, it is vital to extend its application beyond structured environments. To further advance intelligent mobility, researchers need to take cues from multiple sensor streams and multiple machine learning algorithms so that decisions are robust and reliable. Only then will machines truly be able to operate safely in unstructured and dynamic environments. Towards addressing these limitations, this thesis investigates data-driven solutions for crucial building blocks of intelligent mobility: multimodal sensor data fusion, machine learning, multimodal deep representation learning, and their application to intelligent mobility. This work demonstrates that mobile robots can use multimodal machine learning to derive a driving policy and therefore make autonomous decisions.

To facilitate the autonomous decisions needed to derive safe driving algorithms, we present algorithms for free space detection and human activity recognition. Driving these decision-making algorithms are datasets collected throughout this study: the Loughborough London Autonomous Vehicle dataset and the Loughborough London Human Activity Recognition dataset. The datasets were collected using an autonomous platform designed and developed in-house as part of this research activity. The proposed framework for free space detection is based on an active learning paradigm that leverages the relative uncertainty of multimodal sensor data streams (ultrasound and camera) and uses an online learning methodology to continuously update the learnt model whenever the vehicle experiences new environments; a rough sketch is given below. The proposed free space detection algorithm enables an autonomous vehicle to self-learn, evolve, and adapt to environments never encountered before. The results illustrate that this online learning mechanism is superior to one-off training of deep neural networks, which require large datasets to generalize to unfamiliar surroundings.
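As a rough sketch of the active/online learning idea (not the thesis' implementation), the following Python snippet trains a logistic free-space classifier incrementally over fused camera/ultrasound features and queries a label only when the model is uncertain or the two modalities disagree. The feature layout, thresholds, and the labelling oracle are illustrative assumptions.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import SGDClassifier

# Sketch: an incrementally trained free-space classifier over fused
# camera/ultrasound features. New labels are requested (active learning)
# only when the model is uncertain or the two modalities disagree, and
# the model is updated online with partial_fit. Placeholders throughout.

clf = SGDClassifier(loss="log_loss")       # logistic model updated online
classes = np.array([0, 1])                 # 0 = obstacle, 1 = free space

def step(camera_feat, ultrasound_range, oracle):
    x = np.concatenate([camera_feat, [ultrasound_range]]).reshape(1, -1)
    try:
        p_free = clf.predict_proba(x)[0, 1]
    except NotFittedError:                 # model has seen no data yet
        p_free = 0.5
    us_free = ultrasound_range > 1.0       # crude range-based cue (metres)
    uncertain = abs(p_free - 0.5) < 0.2 or (p_free > 0.5) != us_free
    if uncertain:                          # query a label only when unsure
        y = oracle(camera_feat, ultrasound_range)
        clf.partial_fit(x, [y], classes=classes)
        return y
    return int(p_free > 0.5)
```

The design point this illustrates is that labels are requested sparingly and the model keeps adapting after deployment, rather than being trained once on a fixed dataset.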
The thesis takes the view that humans should be at the centre of any technological development related to artificial intelligence. This is imperative within intelligent mobility, where an autonomous vehicle should be aware of what humans are doing in its vicinity. Towards improving the robustness of human activity recognition, this thesis proposes a novel algorithm that classifies point-cloud data originating from Light Detection and Ranging (LiDAR) sensors. The proposed algorithm leverages multimodality by using the camera data to identify humans and segment the region of interest in the point-cloud data. The corresponding 3-dimensional data is converted to a Fisher Vector representation before being classified by a deep convolutional neural network. The proposed algorithm classifies the indoor activities performed by a human subject with an average precision of 90.3% and outperforms an alternative point-cloud classifier, PointNet [1], [2], on all classes. The developed autonomous testbed for data collection and algorithm validation, together with the multimodal data-driven solutions for driverless cars, are the major contributions of this thesis. It is anticipated that these results and the testbed will have significant implications for the future of intelligent mobility by amplifying the development of intelligent driverless vehicles.
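The Fisher Vector step is standard enough to sketch. The snippet below encodes a variable-size set of point descriptors into a fixed-length vector using a diagonal-covariance GMM, with the usual power and L2 normalisation; using raw 3D coordinates as descriptors and the GMM size are illustrative choices, not the thesis' configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Sketch of the Fisher Vector step: encode a variable-size set of local
# point descriptors as a fixed-length vector using a diagonal-covariance
# GMM, after which a CNN (or any classifier) can consume it. Raw 3D
# coordinates stand in for real local descriptors here.

def fisher_vector(descriptors, gmm):
    """descriptors: (T, D); gmm: fitted GaussianMixture(covariance_type='diag')."""
    T = descriptors.shape[0]
    gamma = gmm.predict_proba(descriptors)             # (T, K) posteriors
    mu, sigma = gmm.means_, np.sqrt(gmm.covariances_)  # (K, D) each
    w = gmm.weights_                                   # (K,)
    diff = (descriptors[:, None, :] - mu) / sigma      # (T, K, D)
    # Gradients w.r.t. means and standard deviations (standard FV formulas)
    g_mu = (gamma[:, :, None] * diff).sum(0) / (T * np.sqrt(w)[:, None])
    g_sig = (gamma[:, :, None] * (diff**2 - 1)).sum(0) / (T * np.sqrt(2 * w)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_sig.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))             # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)           # L2 normalisation

points = np.random.default_rng(1).random((500, 3))     # stand-in human segment
gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(points)
print(fisher_vector(points, gmm).shape)                # (2 * 8 * 3,) = (48,)
```

Each encoded segment has a fixed length of 2·K·D regardless of how many points the camera-guided segmentation returns, which is what makes a downstream CNN classifier straightforward to apply.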