
    Fast and Accurate Algorithm for Eye Localization for Gaze Tracking in Low Resolution Images

    Full text link
    Iris centre localization in low-resolution visible images is a challenging problem for the computer vision community due to noise, shadows, occlusions, pose variations, eye blinks, etc. This paper proposes an efficient method for determining the iris centre in low-resolution images in the visible spectrum, so that even low-cost consumer-grade webcams can be used for gaze tracking without any additional hardware. A two-stage algorithm, which exploits the geometrical characteristics of the eye, is proposed for iris centre localization. In the first stage, a fast convolution-based approach is used to obtain a coarse location of the iris centre (IC). The IC location is then refined in the second stage using boundary tracing and ellipse fitting. The algorithm has been evaluated on public databases such as BioID and Gi4E and is found to outperform state-of-the-art methods. (Comment: 12 pages, 10 figures, IET Computer Vision, 201)
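
    As a rough illustration of this two-stage idea (coarse convolution, then boundary tracing and ellipse fitting), the following Python/OpenCV sketch may help; the disc kernel, window sizes, and Canny thresholds are illustrative assumptions, not the authors' exact formulation.

        import cv2
        import numpy as np

        def coarse_iris_centre(eye_gray, radius=10):
            """Stage 1: convolve the inverted eye image with a disc kernel;
            the darkest disc-shaped blob (the iris) gives the peak response.
            The fixed radius is an illustrative assumption."""
            k = np.zeros((2 * radius + 1, 2 * radius + 1), np.float32)
            cv2.circle(k, (radius, radius), radius, 1.0, -1)
            k /= k.sum()
            response = cv2.filter2D(255.0 - eye_gray.astype(np.float32), -1, k)
            _, _, _, max_loc = cv2.minMaxLoc(response)
            return max_loc  # (x, y) coarse iris centre

        def refine_iris_centre(eye_gray, coarse_xy, win=20):
            """Stage 2: trace edge points around the coarse estimate and fit
            an ellipse; the ellipse centre is the refined IC."""
            x, y = coarse_xy
            x0, y0 = max(x - win, 0), max(y - win, 0)
            edges = cv2.Canny(eye_gray[y0:y + win, x0:x + win], 50, 150)
            pts = cv2.findNonZero(edges)
            if pts is None or len(pts) < 5:   # fitEllipse needs >= 5 points
                return coarse_xy
            (cx, cy), _, _ = cv2.fitEllipse(pts)
            return (int(cx) + x0, int(cy) + y0)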

    3D head motion, point-of-regard and encoded gaze fixations in real scenes: next-generation portable video-based monocular eye tracking

    Get PDF
    Portable eye trackers allow us to see where a subject is looking when performing a natural task with free head and body movements. These eye trackers include headgear containing a camera directed at one of the subject's eyes (the eye camera) and another camera (the scene camera) positioned above the same eye and directed along the subject's line of sight. The output video includes the scene video with a crosshair depicting where the subject is looking -- the point-of-regard (POR) -- updated for each frame. This video may be the desired final result, or it may be further analyzed to obtain more specific information about the subject's visual strategies. A list of the calculated POR positions in the scene video can also be analyzed. The goals of this project are to expand the information that we can obtain from a portable video-based monocular eye tracker and to minimize the amount of user interaction required to obtain and analyze this information. This work includes offline processing of both the eye and scene videos to obtain robust 2D PORs in scene video frames, identify gaze fixations from these PORs, obtain 3D head motion, and ray trace fixations through volumes-of-interest (VOIs) to determine what is being fixated, when, and where (the 3D POR). To avoid the redundancy of ray tracing a 2D POR in every video frame and to group these POR data meaningfully, a fixation-identification algorithm is employed to simplify the long list of 2D POR data into gaze fixations. In order to ray trace these fixations, the 3D motion -- position and orientation over time -- of the scene camera is computed. This camera motion is determined via an iterative structure-and-motion recovery algorithm that requires a calibrated camera and knowledge of the 3D location of at least four points in the scene (which can be selected from premeasured VOI vertices). The subject's 3D head motion is obtained directly from this camera motion. For the final stage of the algorithm, the 3D locations and dimensions of VOIs in the scene are required. This VOI information in world coordinates is converted to camera coordinates for ray tracing. A representative 2D POR position for each fixation is converted from image coordinates to the same camera coordinate system. Then, a ray is traced from the camera center through this position to determine which (if any) VOI is being fixated and where it is being fixated -- the 3D POR in the world. Results are presented for various real scenes. Novel visualizations of portable eye tracker data created using the results of our algorithm are also presented.
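
    The fixation-identification step lends itself to a compact sketch. Below is a minimal dispersion-threshold (I-DT-style) grouping in Python; the thresholds, and the choice of I-DT itself, are assumptions for illustration rather than necessarily the algorithm used in this work.

        def identify_fixations(por, max_dispersion=25.0, min_frames=6):
            """Group per-frame 2D POR positions [(x, y), ...] into fixations
            using a dispersion threshold. Returns (start, end, centroid)."""
            def dispersion(w):
                xs, ys = zip(*w)
                return (max(xs) - min(xs)) + (max(ys) - min(ys))

            fixations, start = [], 0
            while start + min_frames <= len(por):
                end = start + min_frames
                if dispersion(por[start:end]) <= max_dispersion:
                    # grow the window while it stays spatially compact
                    while end < len(por) and dispersion(por[start:end + 1]) <= max_dispersion:
                        end += 1
                    xs, ys = zip(*por[start:end])
                    fixations.append((start, end - 1,
                                      (sum(xs) / len(xs), sum(ys) / len(ys))))
                    start = end
                else:
                    start += 1
            return fixations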

    Event-Based Visual-Inertial Odometry Using Smart Features

    Get PDF
    Event-based cameras are a novel type of visual sensor that operate under a unique paradigm, providing asynchronous data on the log-level changes in light intensity for individual pixels. This hardware-level approach to change detection allows these cameras to achieve ultra-wide dynamic range and high temporal resolution. Furthermore, the advent of convolutional neural networks (CNNs) has led to state-of-the-art navigation solutions that now rival or even surpass human-engineered algorithms. The advantages offered by event cameras and CNNs make them excellent tools for visual odometry (VO). This document presents the implementation of a CNN trained to detect and describe features within an image, as well as the implementation of an event-based visual-inertial odometry (EVIO) pipeline that estimates a vehicle's 6-degree-of-freedom (DOF) pose using an affixed event-based camera with an integrated inertial measurement unit (IMU). The front-end of this pipeline utilizes a neural network to generate image frames from asynchronous event camera data. These frames are fed into a multi-state constraint Kalman filter (MSCKF) back-end that uses the output of the developed CNN to perform measurement updates. The EVIO pipeline was tested on a selection from the Event-Camera Dataset [1] and on a dataset collected from a fixed-wing unmanned aerial vehicle (UAV) flight test conducted by the Autonomy and Navigation Technology (ANT) Center.
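
    As a rough illustration of the front-end's first step, the sketch below accumulates asynchronous events into a fixed-duration frame; this simple signed-count accumulation is an assumption for illustration (the pipeline described here uses a neural network to generate frames).

        import numpy as np

        def events_to_frame(events, height, width, t0, dt):
            """Accumulate events (t, x, y, polarity) with t in [t0, t0 + dt)
            into a signed count image, normalized to 8-bit around mid-gray
            so a frame-based feature detector can consume it."""
            frame = np.zeros((height, width), np.float32)
            for t, x, y, p in events:
                if t0 <= t < t0 + dt:
                    frame[y, x] += 1.0 if p > 0 else -1.0
            m = np.abs(frame).max()
            if m > 0:
                frame = 127.0 * frame / m
            return (128.0 + frame).astype(np.uint8)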

    Superfast three-dimensional (3D) shape measurement with binary defocusing techniques and its applications

    Get PDF
    High-speed and high-accuracy three-dimensional (3D) shape measurement has enormous potential to benefit numerous areas, including advanced manufacturing, medical imaging, and diverse scientific research fields. For example, capturing the rapidly pulsing wings of a flying insect could enhance our understanding of flight and lead to better and safer aircraft designs. Even though there are numerous 3D shape measurement techniques in the literature, it remains extremely difficult to accurately capture rapidly changing events. Due to their potential for achieving high speed and high measurement accuracy, digital fringe projection (DFP) techniques have been extensively studied and applied to numerous disciplines. Real-time (30 Hz or better) 3D shape measurement techniques have been developed with DFP methods, yet the upper speed limit is typically 120 Hz, the refresh rate of a typical digital video projector. A 120 Hz rate can accurately measure slowly changing objects, such as human facial expressions, but it is far from sufficient for capturing high-speed motions (e.g., live, beating hearts or flying insects). To overcome this speed limitation, the binary defocusing technique was recently proposed. Instead of using 8-bit sinusoidal patterns, the binary defocusing technique generates sinusoidal patterns by properly defocusing squared 1-bit binary patterns. Using this technique, kilohertz (kHz) 3D shape measurement rates have been achieved. However, the binary defocusing technique suffers from three major limitations: 1) low phase quality due to the influence of high-frequency harmonics; 2) a smaller depth measurement range; and 3) low measurement accuracy due to the difficulty of applying existing calibration methods to a system with an out-of-focus projector. The goal of this dissertation research is to achieve superfast 3D shape measurement by overcoming the major limitations of the binary defocusing technique. Once a superfast 3D shape measurement platform is developed, numerous applications could benefit. To this end, this dissertation research looks into verifying its value by applying it to the biomedical engineering field. Specifically, this dissertation research has made major contributions by overcoming several major challenges associated with the binary defocusing technique. The first challenge this dissertation addresses is the limited depth range and low phase quality of the binary defocusing method. The binary defocusing technique essentially generates quasi-sinusoidal fringe patterns by suppressing high-frequency harmonics through lens defocusing. However, the optical engines of the majority of digital video projectors are designed and optimized for applications with a large depth of focus; for this reason, good-quality sinusoids can only be generated by this technique within a very small depth region. This problem is exacerbated if the fringe stripes are wide; in that case, the high-frequency harmonics cannot be properly suppressed through defocusing, making it almost impossible to generate reasonable-quality sinusoids. To alleviate this problem associated with high-frequency harmonics, an optimal pulse width modulation (OPWM) method, originally developed in power electronics, is proposed to improve the fringe pattern quality. Instead of projecting squared binary structures, the patterns are optimized, in the dimension perpendicular to the fringe stripes, by selectively eliminating the undesired harmonics that affect the phase quality the most.
    Both simulation and experimental data demonstrate that the OPWM method can substantially improve on the squared binary defocusing technique when the fringe periods are between 30 and 300 pixels. With this technique, a multi-frequency phase-shifting algorithm is realized that enables the development of a 556 Hz 3D shape measurement system capable of capturing multiple rapidly moving objects. The OPWM technique proves successful when the fringe stripe widths are within a certain range, yet it fails to achieve higher-quality fringe patterns when the desired fringe period goes beyond the optimal range. To further improve the binary defocusing technique, binary dithering techniques are proposed. Unlike the OPWM method, the dithering technique optimizes the patterns in both the x and y dimensions, and thus can achieve higher-quality fringe patterns. This research demonstrates the superiority of this technique over all aforementioned binary defocusing techniques for high-quality 3D shape measurement, even when the projector is nearly focused and the fringe stripes are wide. The second challenge this dissertation addresses is accurately calibrating a DFP system with an out-of-focus projector. The binary defocusing technique generates quasi-sinusoidal patterns through defocusing, and thus the projector cannot be perfectly in focus. Meanwhile, state-of-the-art DFP system calibration assumes that the projector is always in focus. To address this problem, a novel calibration method is proposed that directly relates depth z to phase, pixel by pixel, without requiring projector calibration. By this means, very high accuracy depth measurement is achieved: for a depth measurement range of 100 mm, the root-mean-squared (rms) error is approximately 70 µm. The third challenge this dissertation addresses is a hardware limitation of the superfast 3D shape measurement technique. The high refresh rate of the digital micro-mirror device (DMD) has enabled superfast 3D shape measurement, yet a hardware limitation appears once the speed goes beyond a certain range: the DMD cannot completely turn on/off between frames, leading to coupling problems associated with the transient response of the DMD chip. This coupling effect causes substantial measurement error during high-speed measurement. Fortunately, since this type of error is systematic, this research finds that it can be reduced to a negligible level by properly controlling the timing of the projector and the camera. The superfast 3D shape measurement platform developed in this research could benefit numerous applications. This research applies the developed platform to the measurement of the cardiac motion of live, beating rabbit hearts. The 3D geometric motion of live, beating rabbit hearts can be successfully captured if the measurement speed is sufficiently fast (i.e., 200 Hz or higher for normally beating rabbit hearts). This research also finds that, due to the optical properties of live tissue, care should be taken in selecting the spectrum of light in order to properly measure the heart surface. In summary, the improved binary defocusing techniques are overwhelmingly advantageous compared to the conventional sinusoidal projection method or the squared binary defocusing technique.
We believe that the superfast 3D shape measurement platform we have developed has the potential to broadly impact many more academic studies and industrial practices, especially those where understanding high-speed 3D phenomena is critical.
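
    For context, the phase-recovery step that all of these fringe-projection variants feed into can be sketched with the standard N-step phase-shifting formula; this is textbook DFP machinery, not the dissertation's specific multi-frequency implementation.

        import numpy as np

        def wrapped_phase(images):
            """Standard N-step phase shifting: images[k] captures a fringe
            pattern shifted by 2*pi*k/N. Returns the wrapped phase in
            (-pi, pi]; unwrapping (e.g., multi-frequency) is a separate step."""
            n = len(images)
            k = np.arange(n).reshape(-1, 1, 1)
            stack = np.stack([im.astype(np.float64) for im in images])
            num = np.sum(stack * np.sin(2 * np.pi * k / n), axis=0)
            den = np.sum(stack * np.cos(2 * np.pi * k / n), axis=0)
            return np.arctan2(-num, den)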

    Learning and Searching Methods for Robust, Real-Time Visual Odometry.

    Full text link
    Accurate position estimation provides a critical foundation for mobile robot perception and control. While well studied, it remains difficult to provide timely, precise, and robust position estimates for applications that operate in uncontrolled environments, such as robotic exploration and autonomous driving. Continuous, high-rate egomotion estimation is possible using cameras and Visual Odometry (VO), which tracks the movement of sparse scene content known as image keypoints or features. However, high update rates, often 30 Hz or greater, leave little computation time per frame, while variability in scene content stresses robustness. Due to these challenges, implementing an accurate and robust visual odometry system remains difficult. This thesis investigates fundamental improvements throughout all stages of a visual odometry system and makes three primary contributions. The first contribution is a machine learning method for feature detector design. This method considers end-to-end motion estimation accuracy during learning. Consequently, accuracy and robustness are improved across multiple challenging datasets in comparison to state-of-the-art alternatives. The second contribution is a proposed feature descriptor, TailoredBRIEF, that builds upon recent advances in the field in fast, low-memory descriptor extraction and matching. TailoredBRIEF is an in-situ descriptor learning method that improves feature matching accuracy by efficiently customizing descriptor structures on a per-feature basis. Further, a common asymmetry in vision system design between reference and query images is described and exploited, enabling approaches that would otherwise exceed runtime constraints. The final contribution is a new algorithm for visual motion estimation: Perspective Alignment Search (PAS). Many vision systems depend on the unique appearance of features during matching, despite a large quantity of non-unique features in otherwise barren environments. A search-based method, PAS, is proposed to employ features that lack unique appearance through descriptorless matching. This method simplifies visual odometry pipelines, defining one method that subsumes feature matching, outlier rejection, and motion estimation. Throughout this work, evaluations of the proposed methods and systems are carried out on ground-truth datasets, often generated with custom experimental platforms in challenging environments. Particular focus is placed on preserving runtimes compatible with real-time operation, as is necessary for deployment in the field. (PhD dissertation, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113365/1/chardson_1.pd)
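
    To make the descriptor-extraction and matching stage concrete, here is a minimal BRIEF-style binary test and Hamming-distance matcher in Python; the random test pattern and sizes are illustrative, not TailoredBRIEF's learned, per-feature layout.

        import numpy as np

        rng = np.random.default_rng(0)
        # 256 random intensity-comparison offset pairs within a 31x31 patch
        TESTS = rng.integers(-15, 16, size=(256, 4))  # (dy1, dx1, dy2, dx2)

        def brief_descriptor(image, kp):
            """Bit i is 1 if the patch is darker at the first test point than
            at the second. kp = (y, x) must lie >= 15 px from the border."""
            y, x = kp
            bits = [image[y + a, x + b] < image[y + c, x + d]
                    for a, b, c, d in TESTS]
            return np.packbits(np.array(bits, np.uint8))

        def match(query_desc, reference_descs):
            """Nearest neighbour by Hamming distance (popcount of XOR)."""
            dists = [np.unpackbits(query_desc ^ d).sum() for d in reference_descs]
            return int(np.argmin(dists))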

    Modeling Reaction and Transport Effects in Stereolithographic 3D Printing

    Full text link
    Continuous stereolithography has recently emerged as a leading technology in additive manufacturing (3D printing). Though several methods for continuous printing have been reported, they all share the benefit of reducing forces on the growing part and eliminating adhesion to the resin bath through the introduction of the dead zone, a region where polymerization does not occur. The recently developed dual-wavelength approach, in which photoinitiation and photoinhibition of polymerization are controlled via different wavelengths of light, has achieved unprecedented vertical print speeds via expansion of the dead zone. We address several limitations in dual-wavelength continuous printing (and some within continuous stereolithography more broadly) via theoretical and computational modeling and the use of spatially varying exposure patterns. First, we address the problem of cure-through, undesired curing along the axis of exposure, which is more significant in continuous stereolithography than in traditional layer-by-layer stereolithography. Recognizing that the use of highly absorbing resins to improve layer resolution inherently limits achievable print speeds, we developed a method to improve part fidelity in low- to moderate-absorbance resins through modification of the images projected during printing. We derive a mathematical model to describe dose accumulation during continuous printing, describe the resulting grayscale-based correction method, and experimentally verify correction performance. Using optimized parameters with a high-absorbance-height resin (2000 µm), feature height errors are reduced by over 85% in a test model while maintaining a high print speed (750 mm/h). Recognizing the limitations of this model, we developed a kinetics-based curing model for dual-wavelength photoinitiation/photoinhibition under variable intensities. The model is verified via experimental characterization of two custom resins using cured-height and dead-zone-height experiments. For the two custom resins characterized, the model achieves R² values of 0.985 and 0.958 for fitting uninhibited cure height data, and values of 0.902 and 0.980 for fitting photoinhibited dead zone height data. The model is also applicable to resins in standard layer-by-layer stereolithography, and for commercial resin cure height data our model performs similarly to the standard Jacobs model, with all R² values above 0.98. Finally, we introduce the complexities of resin flow during continuous printing. The kinetic curing model is used in a computational fluid dynamics model to analyze dead zone uniformity, which we find is greatly affected by the exposure intensity ratio, while print speed and part radius have minor effects. We find that relatively small variations in the intensity ratio (25%) can have large effects, going from good printing conditions to print failure (curing to the window) or to significant nonuniformity (maximum dead zone height over three times the minimum). We optimize exposure conditions to maximize dead zone uniformity, finding that the ability to pattern light sources is critical in generating uniform dead zones: for a 10 mm radius cylinder, over 90% of the dead zone is near the optimized value when using patterned intensity functions, compared with only 18% when using constant intensity values. In printing experiments, we find that an optimized intensity function can, without modification, successfully produce difficult-to-print parts.
Taken as a whole, the work advances our understanding of the dual-wavelength approach in continuous stereolithography, improves printing performance, and motivates future research into the wide range of physical phenomena affecting the system. (PhD dissertation, Chemical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163239/1/zdpritch_1.pd)
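
    For reference, the standard Jacobs working curve that the kinetic model is benchmarked against relates cured depth to exposure dose; below is a minimal sketch with assumed, illustrative resin parameters (not values from this work).

        import numpy as np

        def jacobs_cure_depth(E, Dp, Ec):
            """Jacobs working curve: cured thickness grows with the log of
            exposure dose E once E exceeds the critical dose Ec. Dp is the
            resin penetration depth (same length unit as the result)."""
            E = np.asarray(E, dtype=float)
            return np.where(E > Ec, Dp * np.log(E / Ec), 0.0)

        # Assumed parameters for illustration only: Dp = 100 um,
        # Ec = 10 mJ/cm^2; doubling the dose adds Dp*ln(2) ~ 69 um.
        print(jacobs_cure_depth([10.0, 20.0, 40.0], Dp=100.0, Ec=10.0))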

    Investigation of Computer Vision Concepts and Methods for Structural Health Monitoring and Identification Applications

    Get PDF
    This study presents a comprehensive investigation of methods and technologies for developing a computer vision-based framework for Structural Health Monitoring (SHM) and Structural Identification (St-Id) for civil infrastructure systems, with particular emphasis on various types of bridges. SHM has been implemented on various structures over the last two decades, yet there are issues such as considerable cost, field implementation time, and excessive labor needs for the instrumentation of sensors, cable wiring work, and possible interruptions during implementation. These issues make SHM viable only when major investments are warranted for decision making. For other cases, there needs to be a practical and effective solution, and a computer vision-based framework can be a viable alternative. Computer vision-based SHM has been explored over the last decade. Unlike most vision-based structural identification studies and practices, which focus either on structural input (vehicle location) estimation or on structural output (structural displacement and strain responses) estimation, the proposed framework combines the vision-based structural input and the structural output from non-contact sensors to overcome the limitations given above. First, this study develops a series of computer vision-based displacement measurement methods for structural response (structural output) monitoring, which can be applied to different infrastructures such as grandstands, stadiums, towers, footbridges, small/medium-span concrete bridges, railway bridges, and long-span bridges, and under different loading cases such as human crowds, pedestrians, wind, and vehicles. Structural behavior, modal properties, load carrying capacities, structural serviceability, and performance are investigated using vision-based methods and validated by comparison with conventional SHM approaches. In this study, some of the most famous landmark structures, such as long-span bridges, are utilized as case studies. This study also investigates the serviceability status of structures by using computer vision-based methods. Subsequently, issues and considerations for computer vision-based measurement in field applications are discussed, and recommendations are provided for better results. This study also proposes a robust vision-based method for displacement measurement using spatio-temporal context learning and Taylor approximation to overcome the difficulties of vision-based monitoring under adverse environmental factors such as fog and illumination change. In addition, it is shown that the external load distribution on structures (structural input) can be estimated by using visual tracking, after which the load rating of a bridge can be determined by using the load distribution factors extracted from the computer vision-based methods. By combining the structural input and output results, the unit influence line (UIL) of a structure is extracted during daily traffic using just cameras, from which external loads can then be estimated using the extracted UIL. Finally, condition assessment at the global structural level can be achieved using the structural input and output, both obtained from computer vision approaches, which gives a normalized response irrespective of the type and/or load configuration of the vehicles or human loads.
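
    As a minimal illustration of the vision-based displacement measurement idea (not the spatio-temporal context learning method proposed in this study), here is a normalized cross-correlation template tracker in Python/OpenCV.

        import cv2

        def track_displacement(frames, roi):
            """Track a structural target through grayscale video frames by
            normalized cross-correlation; returns per-frame pixel offsets
            relative to frame 0. roi = (x, y, w, h) selects the template.
            A separately measured scale factor (mm/pixel) converts pixel
            motion to engineering units."""
            x, y, w, h = roi
            template = frames[0][y:y + h, x:x + w]
            displacements = []
            for f in frames:
                res = cv2.matchTemplate(f, template, cv2.TM_CCOEFF_NORMED)
                _, _, _, (mx, my) = cv2.minMaxLoc(res)
                displacements.append((mx - x, my - y))
            return displacements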