
    A micropower centroiding vision processor


    Realtime Color Stereovision Processing

    Recent developments in aviation have made micro air vehicles (MAVs) a reality. These featherweight, palm-sized, radio-controlled flying saucers embody the future of air-to-ground combat. No one has ever successfully implemented an autonomous control system for MAVs. Because MAVs are physically small with limited energy supplies, video signals offer superiority over radar for navigational applications. This research takes a step forward in real-time machine vision processing. It investigates techniques for implementing a real-time stereovision processing system using two miniature color cameras. The effects of poor-quality optics are overcome by a robust algorithm, which operates in real time and achieves frame rates up to 10 fps in ideal conditions. The vision system implements innovative work in the following five areas of vision processing: fast image registration preprocessing, object detection, feature correspondence, distortion-compensated ranging, and multi-scale nominal frequency-based object recognition. Results indicate that the system can provide adequate obstacle avoidance feedback for autonomous vehicle control. However, typical relative position errors are about 10%, too high for surveillance applications. The range of operation is also limited to between 6 and 30 m. The root of this limitation is imprecise feature correspondence: with perfect feature correspondence the range would extend to between 0.5 and 30 m. Stereo camera separation limits the near range, while optical resolution limits the far range. Image frame sizes are 160x120 pixels. Increasing this size will improve far-range characteristics but will also decrease the frame rate. Image preprocessing proved to be less appropriate than precision camera alignment in this application. A proof of concept for object recognition shows promise for applications with more precise object detection. Future recommendations are offered in all five areas of vision processing.
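    The near- and far-range limits described above follow from the standard stereo triangulation relation. The sketch below illustrates it with assumed focal length, baseline, and disparity values; the abstract does not report the actual camera parameters, so none of these numbers come from the thesis.

```python
# Minimal sketch of stereo ranging from pixel disparity (not the thesis code).
# Assumed values: focal length in pixels, baseline in metres, and a matched
# feature's disparity; the abstract does not give these parameters.

def stereo_range(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from the standard pinhole stereo relation Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite range")
    return focal_px * baseline_m / disparity_px

# Example with an assumed 200 px focal length and 0.1 m baseline on a 160x120
# sensor. A 1 px correspondence error shifts the estimate substantially at long
# range, consistent with the ~10% relative position error reported above.
print(stereo_range(focal_px=200.0, baseline_m=0.1, disparity_px=2.0))  # ~10 m
print(stereo_range(focal_px=200.0, baseline_m=0.1, disparity_px=3.0))  # ~6.7 m
```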

    3D LiDAR Point Cloud Processing Algorithms

    In the race for autonomous vehicles and advanced driver assistance systems (ADAS), the automotive industry has energetically pursued research in the area of sensor suites to achieve such technological feats. Commonly used autonomous and ADAS sensor suites include multiples of cameras, radio detection and ranging (RADAR), light detection and ranging (LiDAR), and ultrasonic sensors. Great interest has been generated in the use of LiDAR sensors and the value added in an automotive application. LiDAR sensors can be used to detect and track vehicles, pedestrians, cyclists, and surrounding objects. A LiDAR sensor operates by emitting light amplification by stimulated emission of radiation (LASER) beams and receiving the reflected LASER beam to acquire relevant distance information. LiDAR reflections are organized in a three-dimensional environment known as a point cloud. A major challenge in modern autonomous automotive research is processing this three-dimensional environmental data in real time. The LiDAR sensor used in this research is the Velodyne HDL-32E, which provides nearly 700,000 data points per second. The large amount of data produced by a LiDAR sensor must be processed in a highly efficient way to be effective. This thesis provides an algorithm to process the LiDAR data from the sensor's user datagram protocol (UDP) packet to output geometric shapes that can be further analyzed in a sensor suite or utilized for Bayesian tracking of objects. The algorithm can be divided into three stages: Stage One - UDP packet extraction; Stage Two - data clustering; and Stage Three - shape extraction. Stage One organizes the LiDAR data from a negative to a positive vertical angle during packet extraction so that subsequent stages can fully exploit the resulting programming efficiencies. Stage Two utilizes an adaptive breakpoint detector (ABD) for clustering objects based on a Euclidean distance threshold in the point cloud. Stage Three classifies each cluster into a shape that is either a point, line, L-shape, or polygon, using principal component analysis and shape-fitting algorithms that have been modified to take advantage of the pre-organized data from Stage One. The proposed algorithm was written in the C language, and its runtime was tested on two Windows-equipped machines, where the algorithm completed the processing, on average, with 30% of the time between UDP data packets sent from the HDL-32E to spare. In comparison to related research, this algorithm performed over 737 times faster.
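    As a rough illustration of the Stage Two idea, the sketch below clusters consecutive returns of an angle-ordered scan with a range-adaptive Euclidean threshold. The threshold model and parameter values are illustrative assumptions drawn from a common ABD formulation, not the thesis's C implementation against HDL-32E UDP packets.

```python
# Hedged sketch of Euclidean-threshold clustering over an angle-ordered scan,
# in the spirit of an adaptive breakpoint detector (ABD). The threshold model
# (angle step, lambda, sigma) and point layout are illustrative assumptions.
import math

def abd_clusters(points, angle_step_rad=0.003, lam_rad=math.radians(10), sigma=0.05):
    """Group consecutive (x, y, z) returns; start a new cluster at each breakpoint."""
    clusters, current = [], [points[0]]
    for prev, cur in zip(points, points[1:]):
        r_prev = math.dist((0.0, 0.0, 0.0), prev)
        # Adaptive threshold: grows with range, padded by assumed sensor noise sigma.
        threshold = r_prev * math.sin(angle_step_rad) / math.sin(lam_rad - angle_step_rad) + 3 * sigma
        if math.dist(prev, cur) > threshold:
            clusters.append(current)
            current = []
        current.append(cur)
    clusters.append(current)
    return clusters
```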

    Development of a text reading system on video images

    Since the early days of computer science, researchers have sought to devise a machine that could automatically read text to help people with visual impairments. The problem of extracting and recognising text on document images has been largely resolved, but reading text from images of natural scenes remains a challenge. Scene text can present uneven lighting, complex backgrounds or perspective and lens distortion; it usually appears as short sentences or isolated words and shows a very diverse set of typefaces. However, video sequences of natural scenes provide a temporal redundancy that can be exploited to compensate for some of these deficiencies. Here we present a complete end-to-end, real-time scene text reading system on video images based on perspective-aware text tracking. The main contribution of this work is a system that automatically detects, recognises and tracks text in videos of natural scenes in real time. The focus of our method is on large text found in outdoor environments, such as shop signs, street names and billboards. We introduce novel efficient techniques for text detection, text aggregation and text perspective estimation. Furthermore, we propose using a set of Unscented Kalman Filters (UKF) to maintain each text region's identity and to continuously track the homography transformation of the text into a fronto-parallel view, thereby being resilient to erratic camera motion and wide baseline changes in orientation. The orientation of each text line is estimated using a method that relies on the geometry of the characters themselves to estimate a rectifying homography. This is done irrespective of the view of the text over a large range of orientations. We also demonstrate a wearable head-mounted device for text reading that encases a camera for image acquisition and a pair of headphones for synthesized speech output. Our system is designed for continuous and unsupervised operation over long periods of time. It is completely automatic and features quick failure recovery and interactive text reading. It is also highly parallelised in order to maximize the usage of available processing power and to achieve real-time operation. We show comparative results that improve the current state-of-the-art when correcting perspective deformation of scene text. The end-to-end system performance is demonstrated on sequences recorded in outdoor scenarios. Finally, we also release a dataset of text tracking videos along with the annotated ground-truth of text regions.
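    As an illustration of the rectifying-homography step, the sketch below warps an assumed text quadrilateral into a fronto-parallel strip with OpenCV. The corner coordinates and output size are assumptions; the paper itself estimates the homography from character geometry and tracks it with Unscented Kalman Filters rather than from hand-picked corners.

```python
# Illustrative sketch only: rectifying a detected text quadrilateral into a
# fronto-parallel view with OpenCV. Corner coordinates and output size are
# assumed values, not taken from the paper.
import cv2
import numpy as np

def rectify_text_region(frame, corners, out_w=320, out_h=80):
    """Warp the quadrilateral given by `corners` (TL, TR, BR, BL) to a flat strip."""
    src = np.asarray(corners, dtype=np.float32)
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    H = cv2.getPerspectiveTransform(src, dst)          # 3x3 rectifying homography
    return cv2.warpPerspective(frame, H, (out_w, out_h))

# Usage with an assumed, skewed sign region in a video frame:
# flat = rectify_text_region(frame, [(120, 60), (400, 90), (395, 150), (115, 130)])
# `flat` can then be passed to an OCR stage.
```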

    Computer vision libraries for trailer truck testbed using open source computer vision libraries

    Computer Vision is a field that aims at understanding and analyzing images from the real world to produce numerical and symbolic data. It is a first step toward duplicating the capabilities of human vision by electronically understanding an image and perceiving its features. This work aims at providing some of the features of a human eye to a trailer truck. These features include building a 3D wireframe from continuous images and predicting the next position of the objects in view while the truck is moving. The thesis is divided into three sections: the first covers acquiring images in real time; the second covers preprocessing the images to extract edges and points; and the third covers converting the edges and points into a 3D wireframe and predicting the next position of the objects. The OpenCV and Point Cloud libraries are used in this process to facilitate operations on 2D and 3D images --Abstract, page iii
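    A minimal sketch of the second stage described above (edges and feature points from a frame) using OpenCV, which the abstract names as one of its libraries. The parameter values are illustrative assumptions, and the Point Cloud Library 3D wireframe stage is omitted.

```python
# Hedged sketch: Canny edges plus corner features from a single frame with
# OpenCV. Thresholds and corner counts are assumed, not taken from the thesis.
import cv2

def edges_and_points(frame):
    """Return a Canny edge map and good-features-to-track corners for one frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, threshold1=50, threshold2=150)
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=200, qualityLevel=0.01, minDistance=7)
    return edges, corners

# Usage on a live capture, mirroring the "acquiring images in real time" stage:
# cap = cv2.VideoCapture(0)
# ok, frame = cap.read()
# if ok:
#     edges, corners = edges_and_points(frame)
```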

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
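    To make the event stream concrete, here is a minimal sketch of the per-event data named above (timestamp, pixel location, polarity) and one simple processing choice, accumulating events into a 2D histogram. The field names and sensor resolution are assumptions, and the survey covers many richer event representations than this one.

```python
# Hedged sketch of an event representation and a simple accumulation step.
# Resolution (346x260) and field names are assumed for illustration only.
from dataclasses import dataclass
import numpy as np

@dataclass
class Event:
    t_us: int      # timestamp in microseconds
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # +1 for a brightness increase, -1 for a decrease

def accumulate(events, width=346, height=260):
    """Sum event polarities per pixel over a time window into a 2D histogram."""
    img = np.zeros((height, width), dtype=np.int32)
    for e in events:
        img[e.y, e.x] += e.polarity
    return img
```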

    FPGA-based module for SURF extraction

    We present a complete hardware and software solution of an FPGA-based computer vision embedded module capable of carrying out the SURF image feature extraction algorithm. Aside from image analysis, the module embeds a Linux distribution that allows running programs specifically tailored for particular applications. The module is based on a Virtex-5 FXT FPGA which features powerful configurable logic and an embedded PowerPC processor. We describe the module hardware as well as the custom FPGA image processing cores that implement the algorithm's most computationally expensive process, the interest point detection. The module's overall performance is evaluated and compared to CPU and GPU based solutions. Results show that the embedded module achieves comparable distinctiveness to the SURF software implementation running on a standard CPU while being faster and consuming significantly less power and space. Thus, it allows the SURF algorithm to be used in applications with power and spatial constraints, such as autonomous navigation of small mobile robots.
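    For orientation, a sketch of running SURF in software with OpenCV, the kind of CPU implementation the module's distinctiveness is compared against. This is not the FPGA design, and it assumes an OpenCV build with the non-free xfeatures2d contrib module enabled.

```python
# Hedged sketch: SURF interest point detection via OpenCV's contrib module.
# Requires an OpenCV build with the non-free xfeatures2d module; the Hessian
# threshold value is an assumed, illustrative default.
import cv2

def surf_keypoints(image_path, hessian_threshold=400):
    """Detect SURF interest points and descriptors on a grayscale image."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=hessian_threshold)
    keypoints, descriptors = surf.detectAndCompute(gray, None)
    return keypoints, descriptors
```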