    Human-Centric Machine Vision

    Recently, algorithms for processing visual information have evolved greatly, providing efficient and effective solutions that cope with the variability and complexity of real-world environments. These achievements have led to the development of Machine Vision systems that go beyond typical industrial applications, where the environments are controlled and the tasks are very specific, towards innovative solutions that address people's everyday needs. Human-Centric Machine Vision can help to solve problems raised by the needs of our society, e.g. security and safety, health care, medical imaging, and human-machine interfaces. In such applications it is necessary to handle changing, unpredictable and complex situations, and to take the presence of humans into account.

    Proceedings of the 2011 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory

    This book is a collection of 15 reviewed technical reports summarizing the presentations at the 2011 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory. The topics covered include image processing, optical signal processing, visual inspection, pattern recognition and classification, human-machine interaction, world and situation modeling, autonomous system localization and mapping, information fusion, and trust propagation in sensor networks.

    Neural Radiance Fields: Past, Present, and Future

    Modeling and interpreting 3D environments and surroundings has long motivated research in 3D Computer Vision, Computer Graphics, and Machine Learning. The paper by Mildenhall et al. on NeRFs (Neural Radiance Fields) led to a boom in Computer Graphics, Robotics, and Computer Vision, and the prospect of high-resolution, low-storage 3D models for Augmented Reality and Virtual Reality has gained traction among researchers, with more than 1000 preprints related to NeRFs published. This paper serves as a bridge for people starting to study these fields, building from the basics of Mathematics, Geometry, Computer Vision, and Computer Graphics up to the difficulties encountered in Implicit Representations at the intersection of all these disciplines. This survey provides the history of rendering, Implicit Learning, and NeRFs, the progression of research on NeRFs, and the potential applications and implications of NeRFs in today's world. In doing so, it categorizes all the NeRF-related research in terms of the datasets used, objective functions, applications solved, and evaluation criteria for these applications.
    Comment: 413 pages, 9 figures, 277 citations

    Comparative Study of Model-Based and Learning-Based Disparity Map Fusion Methods

    Creating an accurate depth map has several valuable applications, including augmented/virtual reality, autonomous navigation, indoor/outdoor mapping, object segmentation, and aerial topography. Current hardware solutions for precise 3D scanning are relatively expensive. To reduce hardware costs, software alternatives based on stereoscopic images have previously been proposed. However, software solutions are less accurate than hardware solutions, such as laser scanning, and are subject to a variety of irregularities. Notably, disparity maps generated from stereo images typically fall short in cases of occlusion, near object boundaries, and in repetitive-texture or texture-less regions. Several post-processing methods are examined in an effort to combine the results of strong algorithms and alleviate erroneous disparity regions. These methods include basic statistical combinations, histogram-based voting, edge detection guidance, support vector machines (SVMs), and bagged trees. Individual errors and average errors are compared between the newly introduced fusion methods and the existing disparity algorithms. Several acceptable solutions are identified to bridge the gap between 3D scanning and stereo imaging. It is shown that fusing disparity maps can result in lower error rates than individual algorithms across the dataset while maintaining a high level of robustness.
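As a rough illustration of what a "basic statistical combination" of disparity maps could look like, the sketch below fuses several maps by taking the per-pixel median. It is not taken from the paper: the function name `fuse_disparities`, the nested-list map representation, and the use of `None` to mark invalid pixels are all assumptions made for this example.

```python
from statistics import median


def fuse_disparities(maps):
    """Fuse same-sized disparity maps by per-pixel median.

    `maps` is a list of 2D lists (rows of disparity values).
    A value of None marks an invalid pixel (hypothetical convention)
    and is ignored; a pixel invalid in every map stays None.
    """
    if not maps:
        raise ValueError("at least one disparity map is required")
    rows, cols = len(maps[0]), len(maps[0][0])
    fused = []
    for r in range(rows):
        fused_row = []
        for c in range(cols):
            # Collect the valid estimates of this pixel across all maps.
            values = [m[r][c] for m in maps if m[r][c] is not None]
            fused_row.append(median(values) if values else None)
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    # Three 1x2 maps from three hypothetical stereo algorithms.
    maps = [[[10, 20]], [[12, None]], [[50, 22]]]
    print(fuse_disparities(maps))
```

The median is used here because a single badly wrong estimate (such as the 50 in the example) does not pull the fused value away from the agreeing estimates, which is one plausible reason a simple combination can beat an individual algorithm.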

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have great potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as those requiring low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
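The event stream described above (time, location and sign of each brightness change) can be illustrated with a small simulation. This is only a sketch under stated assumptions: it approximates an event camera from a sequence of ordinary frames, fires at most one event per pixel per frame, and uses a fixed threshold on the change in log brightness. The names `Event`, `events_from_frames` and the `threshold` parameter are hypothetical, and a real sensor operates asynchronously per pixel rather than on frames.

```python
import math
from typing import NamedTuple


class Event(NamedTuple):
    t: float        # timestamp of the brightness change
    x: int          # pixel column
    y: int          # pixel row
    polarity: int   # +1 for brighter, -1 for darker


def events_from_frames(frames, timestamps, threshold=0.2):
    """Approximate an event stream from intensity frames.

    A pixel emits an event when its log brightness differs from the
    value stored at its last event by at least `threshold` (assumed
    model); the stored reference is then updated for that pixel.
    """
    eps = 1e-6  # avoids log(0) for black pixels
    reference = [[math.log(v + eps) for v in row] for row in frames[0]]
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                log_value = math.log(value + eps)
                diff = log_value - reference[y][x]
                if abs(diff) >= threshold:
                    events.append(Event(t, x, y, 1 if diff > 0 else -1))
                    reference[y][x] = log_value
    return events


if __name__ == "__main__":
    frames = [[[1.0, 1.0]], [[2.0, 1.0]], [[2.0, 0.5]]]
    for event in events_from_frames(frames, [0.0, 0.001, 0.002]):
        print(event)
```

Pixels whose brightness does not change produce no output at all, which is the property that distinguishes this representation from fixed-rate frames.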

    Robust 3D Object Pose Estimation and Tracking from Monocular Images in Industrial Environments

    Recent advances in Computer Vision are changing our way of living and enabling new applications for both leisure and professional use. Regrettably, in many industrial domains the spread of state-of-the-art technologies is hindered by the abundance of nuisances that degrade existing techniques below the required dependability. This is especially true for object localization and tracking, that is, the problem of detecting the presence of objects in images and videos and estimating their pose. This is a critical task for applications such as Augmented Reality (AR), robotic autonomous navigation, robotic object grasping, or production quality control; unfortunately, the reliability of existing techniques is harmed by visual conditions such as the abundance of specular and poorly textured objects, cluttered scenes, or artificial and inhomogeneous lighting. In this thesis, we propose two methods for robustly estimating the pose of a rigid object under the challenging conditions typical of industrial environments. Both methods rely on monocular images to handle metallic environments, on which depth cameras would fail; both are conceived with a limited computational and memory footprint, so that they are suitable for real-time applications such as AR. We test our methods on datasets drawn from real use-case scenarios exhibiting challenging conditions. The first method is based on a global image alignment framework and a robust dense descriptor. Its global approach makes it robust in the presence of local artifacts such as specularities appearing on metallic objects, ambiguous patterns like screws or wires, and poorly textured objects. Employing a global approach avoids the need to reliably detect and match local features across images, which become ill-conditioned tasks in the considered environments; on the other hand, current methods based on dense image alignment usually rely on luminous intensities for comparing pixels, which is not robust in the presence of challenging illumination artifacts. We show how the use of a dense descriptor computed as a non-linear function of luminous intensities, which we refer to as "Descriptor Fields", greatly enhances performance at a minimal computational overhead. Their low computational complexity and ease of implementation make Descriptor Fields suitable for replacing intensities in a wide range of state-of-the-art techniques based on dense image alignment. Relying on a global approach is appropriate for overcoming local artifacts, but it can be ineffective when the target object undergoes extreme occlusions in cluttered environments. For this reason, we propose a second approach based on the detection of discriminative object parts. At the core of our approach is a novel representation for the 3D pose of the parts, which allows us to predict the 3D pose of the object even when only a single part is visible; when several parts are visible, we can easily combine them to compute a better pose of the object. The 3D pose we obtain is usually very accurate, even when only a few parts are visible. We show how to use this representation in a robust 3D tracking framework. In addition to extensive comparisons with the state of the art, we demonstrate our method on a practical Augmented Reality application for maintenance assistance in the ATLAS particle detector at CERN.

    An end-to-end review of gaze estimation and its interactive applications on handheld mobile devices

    In recent years we have witnessed an increasing number of interactive systems on handheld mobile devices which utilise gaze as a single or complementary interaction modality. This trend is driven by the enhanced computational power of these devices, the higher resolution and capacity of their cameras, and improved gaze estimation accuracy obtained from advanced machine learning techniques, especially deep learning. As the literature is progressing fast, there is a pressing need to review the state of the art, delineate the boundary, and identify the key research challenges and opportunities in gaze estimation and interaction. This paper aims to serve this purpose by presenting an end-to-end holistic view of this area, from gaze capturing sensors, to gaze estimation workflows, to deep learning techniques, and to gaze interactive applications.