
    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as those demanding low latency, high speed, or high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, through the actual sensors that are available, to the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
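    As a rough illustration of the event encoding described in this abstract, the minimal Python sketch below (field names and the accumulation scheme are illustrative assumptions, not taken from the survey) turns a stream of (time, location, polarity) events into a simple per-pixel brightness-change image.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Event:
    t: float       # timestamp in seconds (real sensors have microsecond resolution)
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # +1 for a brightness increase, -1 for a decrease

def accumulate_events(events, height, width, t_start, t_end):
    """Sum event polarities per pixel over a time window.

    This is only one of many possible event representations; the survey
    discusses several alternatives (time surfaces, voxel grids, ...).
    """
    frame = np.zeros((height, width), dtype=np.int32)
    for e in events:
        if t_start <= e.t < t_end:
            frame[e.y, e.x] += e.polarity
    return frame
```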

    Embedded Vision Systems: A Review of the Literature

    Over the past two decades, the use of low-power Field Programmable Gate Arrays (FPGAs) to accelerate vision systems, mainly on embedded devices, has become widespread. The reconfigurable and parallel nature of the FPGA opens up new opportunities to speed up computationally intensive vision and neural algorithms on embedded and portable devices. This paper presents a comprehensive review of embedded vision algorithms and applications over the past decade. The review discusses vision-based systems and approaches, and how they have been implemented on embedded devices. Topics covered include image acquisition, preprocessing, object detection and tracking, and recognition, as well as high-level classification. This is followed by an outline of the advantages and disadvantages of the various embedded implementations. Finally, an overview of the challenges in the field and future research trends is presented. This review is expected to serve as a tutorial and reference source for embedded computer vision systems.

    Recognition of sudden fall events using motion features and a biologically inspired vision-system classifier

    Research on the recognition of sudden events in video surveillance systems has been identified as a way to reduce both the development cost of wearable detection devices and the discomfort of wearing them. The world population is expected to grow as human life expectancy increases, raising the number of people aged 60 and above. A non-invasive in-home safety system that can monitor and detect unwanted accidents such as falls, fainting and the like will therefore become important and useful for the elderly, especially those living alone. Advances in sudden-event recognition systems are expected to support elderly people who live alone while keeping them safe at home, which can reduce the cost of care in nursing homes. The main objective of this study is therefore to develop a method for detecting motion and recognizing sudden events that require immediate action and attention. The development of the event-recognition method involves three key steps: preprocessing, feature extraction, and classification. Preprocessing uses background subtraction (PLB) and smoothing techniques (spatial probability filtering, SPF, and neighbourhood data support, NDS) to reduce noise in the object silhouette images. Motion has been identified as an important and relevant cue for detecting abrupt changes in the orientation, direction and appearance of objects in a video sequence. Three spatio-temporal motion feature extraction methods were implemented: templates, motion vector flow (AVG), and features inspired by the biology of the human visual system. The effectiveness of the motion features was then tested with three existing classifiers: k-nearest neighbours (k-NN), support vector machines (SVM), and a biologically inspired feed-forward neural network (BFFNN-P). The ability of the BFFNN-P classifier to distinguish fall events from other daily activities was improved through proportional (P), integral (I) and derivative (D) error control. The results show that the SPF technique performed well in reducing noise and smoothing the object silhouette images. The GaussH motion feature, inspired by the human visual system, gave better results than the template and AVG features when used with the BFFNN-PD classifier. The accuracy, sensitivity and specificity of the GaussH feature with the BFFNN-PD classifier were 98.6%, 98.2% and 99.5%, respectively. In conclusion, this research has produced a biologically inspired classification method capable of detecting sudden events.
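    A minimal sketch of the kind of preprocessing and motion-feature pipeline outlined above, assuming a generic running-average background subtraction and a crude motion descriptor (the thesis's PLB/SPF/NDS steps and the GaussH feature are not reproduced here):

```python
import numpy as np

def background_subtraction(frame, background, alpha=0.05, threshold=30):
    """Generic running-average background subtraction.

    A stand-in for the thesis's background-subtraction step; the SPF/NDS
    smoothing techniques it uses are not reproduced here.
    """
    diff = np.abs(frame.astype(np.float32) - background)
    silhouette = (diff > threshold).astype(np.uint8)
    background = (1 - alpha) * background + alpha * frame  # slow background update
    return silhouette, background

def motion_feature(prev_silhouette, silhouette):
    """Crude motion descriptor: fraction of pixels whose silhouette label changed
    between consecutive frames. A fall would produce a sudden spike in this value."""
    changed = np.logical_xor(prev_silhouette, silhouette)
    return changed.mean()
```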

    Using clustering techniques for intelligent camera-based user interfaces

    The area of Human-Machine Interfaces is growing fast due to its high importance in all technological systems. The basic idea behind designing human-machine interfaces is to make communication with the technology natural and easy. Gesture interfaces are a good example of transparent interfaces. Such interfaces must correctly identify the action the user wants to perform, so proper gesture recognition is of the highest importance. However, most systems based on gesture recognition use complex methods requiring high-resource devices. In this work, we propose to model gestures by capturing their temporal properties, which significantly reduces storage requirements, and to use clustering techniques, namely self-organizing maps and an unsupervised genetic algorithm, for their classification. We further propose to train a number of such algorithms with different parameters and combine their decisions using majority voting in order to decrease the false positive rate. The main advantage of the approach is its simplicity, which enables implementation on devices with limited resources and therefore at low cost. The testing results demonstrate its high potential.
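    The majority-voting combination mentioned above can be sketched as follows; the self-organizing-map and genetic-algorithm classifiers themselves are assumed to exist elsewhere, and only their label predictions are combined here.

```python
import numpy as np

def majority_vote(predictions):
    """Combine the label predictions of several classifiers by majority voting.

    `predictions` is a list of per-classifier label arrays of equal length.
    Voting across classifiers trained with different parameters is one way to
    lower the false positive rate, as the abstract suggests.
    """
    votes = np.stack(predictions, axis=0)  # shape (n_classifiers, n_samples)
    combined = np.empty(votes.shape[1], dtype=votes.dtype)
    for i in range(votes.shape[1]):
        labels, counts = np.unique(votes[:, i], return_counts=True)
        combined[i] = labels[np.argmax(counts)]
    return combined

# Example: three classifiers disagree on the second sample; the majority label wins.
p1 = np.array([0, 1, 2])
p2 = np.array([0, 0, 2])
p3 = np.array([0, 1, 1])
print(majority_vote([p1, p2, p3]))  # -> [0 1 2]
```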

    Object detection and localization: an application inspired by RobotAtFactory using machine learning

    Double-degree Master's programme with UTFPR - Universidade Tecnológica Federal do Paraná. The evolution of artificial intelligence and digital cameras has made the transformation of the real world into its digital-image version more accessible and widely used, so that information can be analysed with algorithms. The detection and localization of objects is a crucial task in several applications, such as surveillance, autonomous robotics, and intelligent transportation systems, among others. Based on this, this work aims to implement a system that can find objects and estimate their location (distance and angle) through the acquisition and analysis of images. The motivation is the possible problems that may be introduced in future versions of the RobotAtFactory Lite robotics competition, for example the obstruction of the path marked by the printed lines, requiring the robot to deviate, and/or the placement of the boxes in locations other than the initial warehouses, so that the robot does not know their prior location and has to find them somehow. To this end, different machine-learning-based methods were analysed for object detection using feature extraction and neural networks, as well as for object localization based on the pinhole model and triangulation. By combining these techniques through Python programming on the module, based on a Raspberry Pi Model B and a Raspi Cam Rev 1.3, the goal of the work is achieved. It was thus possible to find the objects and obtain an estimate of their relative position. In the future, in a possible implementation together with a robot, this data can be used to find objects and perform tasks.
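    A minimal sketch of the pinhole-model localization idea described above, using hypothetical camera parameters and object sizes (the dissertation's calibration and triangulation details are not reproduced):

```python
import math

def estimate_distance(real_height_m, pixel_height, focal_length_px):
    """Pinhole-model distance estimate from the apparent size of an object
    of known real height (assumes the object is roughly fronto-parallel)."""
    return real_height_m * focal_length_px / pixel_height

def estimate_bearing(pixel_x, image_cx, focal_length_px):
    """Horizontal angle of the object relative to the camera's optical axis."""
    return math.atan2(pixel_x - image_cx, focal_length_px)

# Hypothetical numbers: a 0.1 m box imaged 50 px tall with a 600 px focal length
# sits about 1.2 m away, roughly 9.5 degrees to the right of the image centre.
d = estimate_distance(0.1, 50, 600)                # 1.2 m
a = math.degrees(estimate_bearing(420, 320, 600))  # ~9.46 degrees
```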

    Event-Driven Technologies for Reactive Motion Planning: Neuromorphic Stereo Vision and Robot Path Planning and Their Application on Parallel Hardware

    Robotics is increasingly becoming a key factor in technological progress. Despite impressive advances in recent decades, mammalian brains still outperform even the most powerful machines in vision and motion planning. Industrial robots are very fast and precise, but their planning algorithms are not capable enough for the highly dynamic environments required for human-robot collaboration (HRC). Without fast and adaptive motion planning, safe HRC cannot be guaranteed. Neuromorphic technologies, including visual sensors and hardware chips, operate asynchronously and therefore process spatio-temporal information very efficiently. Event-based visual sensors in particular are already superior to conventional, synchronous cameras in many applications. Event-based methods therefore have great potential to enable faster and more energy-efficient motion-control algorithms for HRC. This thesis presents an approach to flexible, reactive motion control of a robot arm, in which exteroception is achieved through event-based stereo vision and path planning is implemented in a neural representation of the configuration space. The multi-view 3D reconstruction is evaluated through a qualitative analysis in simulation and transferred to a stereo system of event-based cameras. A demonstrator with an industrial robot is used to evaluate the reactive, collision-free online planning and for a comparative study against sampling-based planners. This is complemented by a benchmark of parallel hardware solutions, with robotic path planning chosen as the test scenario. The results show that the proposed neural solutions are an effective way to realize robot control for dynamic scenarios. This work lays a foundation for neural solutions in adaptive manufacturing processes, including in collaboration with humans, without compromising speed or safety, and thus paves the way for integrating brain-inspired hardware and algorithms into industrial robotics and HRC.
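    As a simplified, conventional stand-in for the configuration-space planning described above, the sketch below propagates a wavefront from the goal over a grid configuration space and then descends the resulting distance field; it only illustrates the general idea, not the thesis's neural implementation.

```python
from collections import deque
import numpy as np

def wavefront_plan(occupancy, start, goal):
    """Breadth-first wavefront planner on a 2D grid configuration space.

    `occupancy` is a boolean array (True = obstacle); `start` and `goal`
    are (row, col) tuples. Returns a list of cells from start to goal.
    """
    h, w = occupancy.shape
    dist = np.full((h, w), -1, dtype=int)
    dist[goal] = 0
    queue = deque([goal])
    while queue:  # propagate distance-to-goal outward from the goal cell
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not occupancy[nr, nc] and dist[nr, nc] < 0:
                dist[nr, nc] = dist[r, c] + 1
                queue.append((nr, nc))
    # Follow strictly decreasing distance values from the start to recover a path.
    path, cell = [start], start
    while cell != goal and dist[cell] > 0:
        r, c = cell
        neighbours = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= r + dr < h and 0 <= c + dc < w and dist[r + dr, c + dc] >= 0]
        cell = min(neighbours, key=lambda p: dist[p])
        path.append(cell)
    return path
```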

    Combined Learned and Classical Methods for Real-Time Visual Perception in Autonomous Driving

    Autonomy, robotics, and Artificial Intelligence (AI) are among the main defining themes of next-generation societies. Among the most important applications of these technologies is driving automation, which spans from different Advanced Driver Assistance Systems (ADAS) to fully self-driving vehicles. Driving automation promises to reduce accidents, increase safety, and increase access to mobility for more people, such as the elderly and the handicapped. However, one of the main challenges facing autonomous vehicles is robust perception, which enables safe interaction and decision making. Of the many sensors used to perceive the environment, each with its own capabilities and limitations, vision is by far one of the main sensing modalities: cameras are cheap and can provide rich information about the observed scene. Therefore, this dissertation develops a set of visual perception algorithms with a focus on autonomous driving as the target application area. The dissertation starts by addressing the problem of real-time motion estimation of an agent using only the visual input from a camera attached to it, a problem known as visual odometry. The visual odometry algorithm achieves low drift rates over long travelled distances, which is made possible by the innovative local mapping approach used. This visual odometry algorithm was then combined with my multi-object detection and tracking system. The tracking system operates in a tracking-by-detection paradigm in which an object detector based on convolutional neural networks (CNNs) is used. The combined system can therefore detect and track other traffic participants both in the image domain and in the 3D world frame while simultaneously estimating vehicle motion, a necessary requirement for obstacle avoidance and safe navigation. Finally, the operational range of traditional monocular cameras was expanded with the capability to infer depth and thus replace stereo and RGB-D cameras. This is accomplished through a single-stream convolutional neural network that outputs both depth prediction and semantic segmentation. Semantic segmentation is the process of classifying each pixel in an image and is an important step toward scene understanding. A literature survey, algorithm descriptions, and comprehensive evaluations on real-world datasets are presented. Ph.D. dissertation, College of Engineering & Computer Science, University of Michigan. https://deepblue.lib.umich.edu/bitstream/2027.42/153989/1/Mohamed Aladem Final Dissertation.pdf
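    The tracking-by-detection paradigm mentioned above can be illustrated with a greedy association step that matches new detections to existing tracks by box overlap; this is a generic sketch, not the dissertation's actual detector or tracker.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def update_tracks(tracks, detections, iou_threshold=0.3):
    """Greedy tracking-by-detection step: match each CNN detection to the
    existing track with the highest overlap, otherwise start a new track."""
    unmatched = list(range(len(detections)))
    for track in tracks:
        if not unmatched:
            break
        best = max(unmatched, key=lambda i: iou(track["box"], detections[i]))
        if iou(track["box"], detections[best]) >= iou_threshold:
            track["box"] = detections[best]
            unmatched.remove(best)
    for i in unmatched:
        tracks.append({"box": detections[i]})
    return tracks
```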

    Spiking NeRF: Making Bio-inspired Neural Networks See through the Real World

    Spiking neural networks (SNNs) have been thriving on numerous tasks thanks to their promising energy efficiency and their potential as a biologically plausible form of intelligence. Meanwhile, Neural Radiance Fields (NeRF) render high-quality 3D scenes at the cost of massive energy consumption, and few works have explored energy-saving solutions with a bio-inspired approach. In this paper, we propose spiking NeRF (SpikingNeRF), which aligns the radiance ray with the temporal dimension of the SNN so that the SNN naturally accommodates the reconstruction of radiance fields. The computation thus becomes spike-based and multiplication-free, reducing energy consumption. In SpikingNeRF, each sampled point on a ray is matched to a particular time step and represented in a hybrid manner in which voxel grids are also maintained. Based on the voxel grids, it is determined whether sampled points should be masked for better training and inference. However, this operation also incurs irregular temporal lengths. We propose a temporal condensing-and-padding (TCP) strategy to handle the masked samples and maintain a regular temporal length, i.e., regular tensors, for hardware-friendly computation. Extensive experiments on a variety of datasets demonstrate that our method reduces energy consumption by 76.74% on average and obtains synthesis quality comparable to the ANN baseline.
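    A toy sketch of a temporal condensing-and-padding (TCP)-style operation as described in the abstract, assuming per-ray sample features and boolean masks (the paper's actual implementation details are not reproduced):

```python
import numpy as np

def condense_and_pad(ray_samples, masks, pad_value=0.0):
    """Condense unmasked samples to the front of each ray's time axis and pad
    every ray to a common length so the batch forms a regular tensor.

    This only illustrates the tensor-shaping idea described in the abstract.
    """
    kept = [s[m] for s, m in zip(ray_samples, masks)]  # drop masked samples per ray
    max_len = max(len(k) for k in kept)
    n_rays, feat_dim = len(kept), ray_samples[0].shape[1]
    out = np.full((n_rays, max_len, feat_dim), pad_value, dtype=ray_samples[0].dtype)
    for i, k in enumerate(kept):
        out[i, :len(k)] = k  # condensed samples first, padding after
    return out

# Hypothetical toy batch: two rays, each with 4 sampled points of 3 features.
samples = [np.arange(12.0).reshape(4, 3), np.arange(12.0, 24.0).reshape(4, 3)]
masks = [np.array([True, False, True, True]), np.array([False, True, False, False])]
batch = condense_and_pad(samples, masks)  # regular tensor of shape (2, 3, 3)
```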