3D Sensor Placement and Embedded Processing for People Detection in an Industrial Environment
Papers I, II and III are extracted from the dissertation and uploaded as separate documents to meet post-publication requirements for self-archiving of IEEE conference papers. At a time when autonomy is being introduced in more and more areas, computer vision plays a very important role. In an industrial environment, the ability to create a real-time virtual version of a volume of interest provides a broad range of possibilities, including safety-related systems such as vision-based anti-collision and personnel tracking. In an offshore environment, where such systems are not common, the task is challenging due to rough weather and environmental conditions, but introducing such safety systems could potentially be lifesaving, as personnel work close to heavy, huge, and often poorly instrumented moving machinery and equipment. This thesis presents research on important topics related to enabling computer vision systems in industrial and offshore environments, including a review of the most important technologies and methods. A prototype 3D sensor package is developed, consisting of different sensors and a powerful embedded computer. Together with a novel, highly scalable point cloud compression and sensor fusion scheme, this makes it possible to create a real-time 3D map of an industrial area. The question of where to place the sensor packages in an environment containing occlusions is also investigated. The result is a set of algorithms for automatic sensor placement optimisation, which place sensors so as to maximise the covered volume of interest with as few occluded zones as possible. The method also supports redundancy constraints, whereby important sub-volumes can be required to be viewed by more than one sensor. Lastly, a people detection scheme is developed that takes as input a merged point cloud from six different sensor packages. Using a combination of point cloud clustering, flattening and convolutional neural networks, the system successfully detects multiple people in an outdoor industrial environment, providing real-time 3D positions. The sensor packages and methods are tested and verified at the Industrial Robotics Lab at the University of Agder, and the people detection method is also tested at a relevant outdoor industrial testing facility. The experiments and results are presented in the papers attached to this thesis.
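As an illustration of the placement problem described above, here is a minimal sketch of a greedy coverage-maximising placement over a voxelised volume of interest, with an optional per-voxel redundancy requirement. The visibility test, names and parameters are illustrative assumptions, not the thesis's algorithm.

```python
# Hypothetical sketch: greedy sensor placement over a voxelised volume.
# `visible_voxels` stands in for a real occlusion-aware visibility test
# (e.g. ray casting against the scene geometry).
import numpy as np

def visible_voxels(sensor_pos, voxels, max_range=10.0):
    """Indices of voxels within range of a sensor; a real version would
    also ray-cast against obstacles to account for occlusions."""
    dists = np.linalg.norm(voxels - sensor_pos, axis=1)
    return set(np.flatnonzero(dists <= max_range))

def greedy_placement(candidates, voxels, n_sensors, redundancy=None):
    """Pick sensor poses one at a time, each maximising newly covered
    volume. `redundancy` maps a voxel index to its required view count."""
    need = {i: (redundancy or {}).get(i, 1) for i in range(len(voxels))}
    vis = {c: visible_voxels(candidates[c], voxels)
           for c in range(len(candidates))}
    chosen = []
    for _ in range(n_sensors):
        best = max((c for c in vis if c not in chosen),
                   key=lambda c: sum(need[v] > 0 for v in vis[c]))
        chosen.append(best)
        for v in vis[best]:
            need[v] = max(0, need[v] - 1)
    return [candidates[c] for c in chosen]
```

Greedy selection is a standard heuristic for such coverage objectives; the optimisation described in the thesis is of course richer than this toy.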
Motion capture based on RGBD data from multiple sensors for avatar animation
With recent advances in technology and the emergence of affordable RGB-D sensors for a wider range of users, markerless motion capture has become an active field of research in both computer vision and computer graphics. In this thesis, we designed a proof of concept (POC) for a new tool that performs motion capture using a variable number of commodity RGB-D sensors of different brands and technical specifications, with no constraints on the layout of the environment. The main goal of this work is to provide a tool with motion capture capabilities using a handful of RGB-D sensors, without imposing strong requirements in terms of lighting, background or the extent of the motion capture area. Naturally, the number of RGB-D sensors needed is inversely proportional to their resolution and directly proportional to the size of the area to track. Built on top of the OpenNI 2 library, the POC is compatible with most of the non-high-end RGB-D sensors currently on the market. Because a single computer lacks the resources to support more than a couple of sensors working simultaneously, a setup composed of multiple computers is needed. To keep data coherent and synchronized across sensors and computers, our tool makes use of a semi-automatic calibration method and a message-oriented network protocol. From the color and depth data given by a sensor, we can also obtain a 3D point cloud representation of the environment. By combining point clouds from multiple sensors, we can collect a complete, animated 3D point cloud that can be visualized from any viewpoint. Given a 3D avatar model and its corresponding attached skeleton, we can use an iterative optimization method (e.g. Simplex) to find a fit between each point cloud frame and a skeleton configuration, resulting in a 3D avatar animation when these skeleton configurations are used as key frames.
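The per-frame fit mentioned above, i.e. a derivative-free iterative optimization such as Simplex, can be sketched as follows. The forward-kinematics function `fk` and the nearest-neighbour cost are illustrative assumptions, not the actual implementation.

```python
# Minimal sketch: fit one skeleton configuration to one point cloud
# frame with the Nelder-Mead (Simplex) method.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial import cKDTree

def fit_frame(pointcloud, fk, theta0):
    """`fk` maps a joint-angle vector to (J, 3) joint positions; the
    cost is the squared distance of each joint to its nearest point."""
    tree = cKDTree(pointcloud)

    def cost(theta):
        d, _ = tree.query(fk(theta))
        return float(np.sum(d ** 2))

    res = minimize(cost, theta0, method="Nelder-Mead")
    return res.x  # one skeleton configuration, usable as a key frame
```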
Parameter-unaware autocalibration for occupancy mapping
People localization and occupancy mapping are common and important tasks for multi-camera systems. In this paper, we present a novel approach that overcomes the hurdle of manual extrinsic calibration of the multi-camera system. Our approach is completely parameter-unaware, meaning that the user does not need to know the focal length, position or viewing angle in advance, nor will these values be calibrated as such. The only requirements on the multi-camera setup are that the views overlap substantially and that the cameras are mounted at approximately the same height, requirements that are satisfied in most typical multi-camera configurations. The proposed method uses the observed height of an object or person moving through the space to estimate the distance to that object or person. Using this distance to backproject the lowest point of each detected object, we obtain a rotated and anisotropically scaled view of the ground plane for each camera. An algorithm is presented to estimate the anisotropic scaling parameters and rotation for each camera, after which ground plane positions can be computed up to an isotropic scale factor. Lens distortion is not taken into account. The method is tested in simulation, yielding average accuracies within 5 cm, and in a real multi-camera environment, with an accuracy within 15 cm.
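The geometric cue at the heart of the method can be sketched under a pinhole model: the apparent height of a person of constant (but unknown) physical height scales inversely with distance, so the reciprocal of the observed height gives depth up to an unknown scale, and backprojecting the feet pixel then yields the rotated, anisotropically scaled ground-plane view mentioned above. The function and its parameters below are illustrative.

```python
# Hypothetical sketch of the height-based distance cue.
import numpy as np

def ground_position(x_feet, y_top, y_bottom, cx=320.0):
    """Map one detection (bounding-box extremes in pixels) to a ground
    plane point, valid up to rotation and anisotropic scale."""
    h_image = y_bottom - y_top       # apparent height in pixels
    depth = 1.0 / h_image            # distance, up to an unknown scale
    lateral = (x_feet - cx) * depth  # pinhole backprojection of the feet
    return np.array([lateral, depth])
```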
Person Re-Identification without Identification via Event Anonymization
Wide-scale use of visual surveillance in public spaces puts individual privacy at stake while increasing resource consumption (energy, bandwidth, and computation). Neuromorphic vision sensors (event cameras) have recently been considered a valid solution to the privacy issue because they do not capture detailed RGB visual information of the subjects in the scene. However, recent deep learning architectures have been able to reconstruct images from event cameras with high fidelity, reintroducing a potential threat to privacy for event-based vision applications. In this paper, we aim to anonymize event streams to protect the identity of human subjects against such image reconstruction attacks. To achieve this, we propose an end-to-end network architecture jointly optimized for the twofold objective of preserving privacy and performing a downstream task such as person ReId. Our network learns to scramble events, enforcing the degradation of images recovered by the privacy attacker. In this work, we also bring to the community the first ever event-based person ReId dataset, gathered to evaluate the performance of our approach. We validate our approach with extensive experiments and report results on synthetic event data simulated from the publicly available SoftBio dataset and on our proposed Event-ReId dataset. Comment: Accepted at the International Conference on Computer Vision (ICCV), 2023.
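A highly schematic sketch of the twofold objective: the anonymization network is trained so that a ReId head still succeeds on the scrambled events while an image-reconstruction attacker is degraded. The three modules, the loss terms and the weighting below are placeholder assumptions, not the paper's architecture.

```python
# Placeholder sketch of a joint privacy/ReId training objective.
import torch
import torch.nn.functional as F

def joint_loss(anonymizer, reid_head, attacker, events, identity, image,
               lam=1.0):
    scrambled = anonymizer(events)
    # Keep the downstream task working on anonymized events.
    loss_reid = F.cross_entropy(reid_head(scrambled), identity)
    # Reward *poor* reconstruction by the attacker (degraded recovery).
    loss_privacy = -F.mse_loss(attacker(scrambled), image)
    return loss_reid + lam * loss_privacy
```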
Occlusion-Aware Multi-View Reconstruction of Articulated Objects for Manipulation
The goal of this research is to develop algorithms that use multiple views to automatically recover complete 3D models of articulated objects in unstructured environments, thereby enabling a robotic system to manipulate those objects. First, an algorithm called Procrustes-Lo-RANSAC (PLR) is presented. Structure-from-motion techniques are used to capture 3D point cloud models of an articulated object in two different configurations. Procrustes analysis, combined with a locally optimized RANSAC sampling strategy, facilitates a straightforward geometric approach to recovering the joint axes, as well as classifying them automatically as either revolute or prismatic. The algorithm requires no prior knowledge of the object, nor does it make any assumptions about the planarity of the object or scene. Second, given the resulting articulated model, a robotic system is able to manipulate the object along its joint axes at a specified grasp point in order to exercise its degrees of freedom, or to move its end effector to a particular position even if that point is not visible in the current view. This is one of the main advantages of the occlusion-aware approach: because the models capture all sides of the object, the robot has knowledge of parts of the object that are not visible in the current view. Experiments with a PUMA 500 robotic arm demonstrate the effectiveness of the approach on a variety of real-world objects containing both revolute and prismatic joints. Third, we improve the proposed approach by using an RGBD sensor (Microsoft Kinect), which yields a depth value for each pixel directly rather than requiring correspondences to establish depth. The KinectFusion algorithm is applied to produce a single high-quality, geometrically accurate 3D model, from which rigid links of the object are segmented and aligned, allowing the joint axes to be estimated using the geometric approach. The improved algorithm does not require artificial markers attached to the objects, yields much denser 3D models, and reduces the computation time.
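The geometric core of the Procrustes step can be sketched in a few lines: align corresponding points of one rigid link across the two configurations, then read the joint type off the recovered motion (the axis of a revolute joint is the eigenvector of the rotation with eigenvalue 1). This is a sketch of the idea, not the PLR pipeline itself.

```python
# Sketch: rigid alignment (Kabsch/Procrustes) and joint classification.
import numpy as np

def procrustes_rigid(P, Q):
    """Least-squares R, t such that Q is approximately P @ R.T + t."""
    Pc, Qc = P - P.mean(0), Q - Q.mean(0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, Q.mean(0) - R @ P.mean(0)

def classify_joint(R, t, angle_eps=1e-2):
    """Revolute joints rotate; prismatic joints translate with R ~ I."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if angle < angle_eps:
        return "prismatic", t / np.linalg.norm(t)
    w, V = np.linalg.eig(R)
    axis = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return "revolute", axis / np.linalg.norm(axis)
```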
Audiovisual head orientation estimation with particle filtering in multisensor scenarios
This article presents a multimodal approach to head pose estimation of individuals in environments equipped with multiple cameras and microphones, such as SmartRooms or automatic video conferencing. Determining an individual's head orientation is the basis for many forms of more sophisticated interaction between humans and technical devices, and can also be used for automatic sensor selection (camera, microphone) in communications or video surveillance systems. The use of particle filters as a unified framework for estimating head orientation in both monomodal and multimodal cases is proposed. In video, we estimate head orientation from color information by exploiting spatial redundancy among cameras. Audio information is processed to estimate the direction of the voice produced by a speaker, making use of the directivity characteristics of the head radiation pattern. Furthermore, two particle filter multimodal information fusion schemes for combining the audio and video streams are analyzed in terms of accuracy and robustness. In the first, fusion is performed at the decision level by combining the monomodal head pose estimates, while the second uses a joint estimation system that combines information at the data level. Experimental results on the CLEAR 2006 evaluation database are reported, and a comparison of the proposed multimodal head pose estimation algorithms with the reference monomodal approaches proves the effectiveness of the proposed approach.
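As a toy illustration of the data-level (joint estimation) variant, the sketch below weights particles over the head pan angle by the product of the per-modality likelihoods; both likelihood functions are stand-ins for the article's color-based and head-radiation-pattern models.

```python
# Toy sketch: one data-level fusion step of a particle filter over
# head pan angle (radians).
import numpy as np

def pf_step(particles, weights, video_like, audio_like,
            sigma=np.radians(5.0)):
    # Predict: random-walk motion model on the pan angle.
    particles = particles + np.random.normal(0.0, sigma, particles.shape)
    # Update: multiply the video and audio likelihoods (data-level fusion).
    weights = weights * video_like(particles) * audio_like(particles)
    weights = weights / weights.sum()
    # Resample to avoid weight degeneracy.
    idx = np.random.choice(len(particles), len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```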