8,385 research outputs found

    Object Detection in 20 Years: A Survey

    Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold weapon era. This paper extensively reviews 400+ papers on object detection in the light of its technical evolution, spanning over a quarter-century (from the 1990s to 2019). A number of topics are covered, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, and text detection, and makes an in-depth analysis of their challenges as well as technical improvements in recent years. Comment: This work has been submitted to the IEEE TPAMI for possible publication.

    Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields

    This work presents a first evaluation of using spatio-temporal receptive fields from a recently proposed time-causal spatio-temporal scale-space framework as primitives for video analysis. We propose a new family of video descriptors based on regional statistics of spatio-temporal receptive field responses and evaluate this approach on the problem of dynamic texture recognition. Our approach generalises a previously used method, based on joint histograms of receptive field responses, from the spatial to the spatio-temporal domain and from object recognition to dynamic texture recognition. The time-recursive formulation enables computationally efficient time-causal recognition. The experimental evaluation demonstrates competitive performance compared to the state of the art. In particular, it is shown that binary versions of our dynamic texture descriptors achieve improved performance compared to a large range of similar methods using different primitives, either handcrafted or learned from data. Further, our qualitative and quantitative investigation into parameter choices and the use of different sets of receptive fields highlights the robustness and flexibility of our approach. Together, these results support the descriptive power of this family of time-causal spatio-temporal receptive fields, validate our approach for dynamic texture recognition, and point towards the possibility of designing a range of video analysis methods based on these new time-causal spatio-temporal primitives. Comment: 29 pages, 16 figures.

    Detection and Recognition of Traffic Signs Inside the Attentional Visual Field of Drivers

    Traffic sign detection and recognition systems are essential components of Advanced Driver Assistance Systems and self-driving vehicles. In this contribution we present a vision-based framework which detects and recognizes traffic signs inside the attentional visual field of drivers. This technique takes advantage of the driver's 3D absolute gaze point, obtained through the combined use of a front-view stereo imaging system and a non-contact 3D gaze tracker. For detection, we used a linear Support Vector Machine as the classifier and Histograms of Oriented Gradients as features. Recognition is performed using the Scale-Invariant Feature Transform and color information. Our technique detects and recognizes signs that are in the driver's field of view and also indicates when one or more signs have been missed by the driver.

    Overview of Environment Perception for Intelligent Vehicles

    This paper presents a comprehensive literature review on environment perception for intelligent vehicles. The state-of-the-art algorithms and modeling methods for intelligent vehicles are given, with a summary of their pros and cons. Special attention is paid to methods for lane and road detection, traffic sign recognition, vehicle tracking, behavior analysis, and scene understanding. In addition, we provide information about datasets, common performance analysis, and perspectives on future research directions in this area.

    Computer Vision Algorithms for Mobile Camera Applications

    Wearable and mobile sensors have found widespread use in recent years due to their ever-decreasing cost, ease of deployment and use, and ability to provide continuous monitoring, as opposed to sensors installed at fixed locations. Since many smart phones are now equipped with a variety of sensors, including accelerometer, gyroscope, magnetometer, microphone and camera, it has become more feasible to develop algorithms for activity monitoring, guidance and navigation of unmanned vehicles, autonomous driving and driver assistance by using data from one or more of these sensors. In this thesis, we focus on multiple mobile camera applications and present lightweight algorithms suitable for embedded mobile platforms. The mobile camera scenarios presented in the thesis are: (i) activity detection and step counting from wearable cameras, (ii) door detection for indoor navigation of unmanned vehicles, and (iii) traffic sign detection from vehicle-mounted cameras. First, we present a fall detection and activity classification system developed for the CITRIC embedded smart camera platform. In our system, the camera platform is worn by the subject, as opposed to static sensors installed at fixed locations in certain rooms; monitoring is therefore not limited to confined areas and extends to wherever the subject may travel, indoors and outdoors. Next, we present a real-time smart phone-based fall detection system, in which we implement camera-based and accelerometer-based fall detection on a Samsung Galaxy S™ 4. We fuse these two sensor modalities to obtain a more robust fall detection system. Then, we introduce a fall detection algorithm with autonomous thresholding using relative entropy within the class of Ali-Silvey distance measures. As another wearable camera application, we present a footstep counting algorithm using a smart phone camera. This algorithm provides a more accurate step count than using only accelerometer data from smart phones and smart watches at various body locations. As a second mobile camera scenario, we study autonomous indoor navigation of unmanned vehicles. A novel approach is proposed to autonomously detect and verify doorway openings by using the Google Project Tango™ platform. The third mobile camera scenario involves vehicle-mounted cameras. More specifically, we focus on traffic sign detection from lower-resolution and noisy videos captured from vehicle-mounted cameras. We present a new method for accurate traffic sign detection, incorporating Aggregate Channel Features and Chain Code Histograms, with the goal of providing much faster training and testing, and comparable or better performance, with respect to deep neural network approaches, without requiring specialized processors. The proposed computer vision algorithms provide promising results for various useful applications despite the limited energy and processing capabilities of mobile devices.

    Perception and intelligent localization for autonomous driving

    Master's in Computer and Telematics Engineering. Computer vision and sensor fusion are relatively recent subjects, yet they are widely adopted in the development of autonomous robots that require adaptability to their surrounding environment. This thesis takes an approach to both in order to achieve perception in the scope of autonomous driving. The use of cameras to achieve this goal is a rather complex process. Unlike classic sensing devices, which always provide the same type of precise information obtained in a deterministic way, the successive images acquired by a camera are replete with the most varied information, all of it ambiguous and extremely difficult to extract. Using cameras as a robot's sensing device is the closest we have come to the sense that matters most in human perception, the visual system. Computer vision is a scientific discipline that encompasses areas such as signal processing, artificial intelligence, mathematics, control theory, neurobiology and physics. The platform supporting the study in this thesis is ROTA (RObô Triciclo Autónomo), together with all the elements that make up its environment. In this context, we describe the approaches introduced to address the challenges the robot faces in its environment: detection of lane markings and the resulting perception of the road, and detection of obstacles, traffic lights, crosswalks and road maintenance areas. A calibration system and an implementation of image perspective removal are also described, developed to map the perceived elements to real-world distances. Building on the perception system, the thesis also addresses self-localization integrated in a distributed architecture that allows navigation with long-term planning. All the work developed in this thesis is essentially focused on robotic perception in the context of autonomous driving.

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as those demanding low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.