
    Deep Projective 3D Semantic Segmentation

    Semantic segmentation of 3D point clouds is a challenging problem with numerous real-world applications. While deep learning has revolutionized semantic segmentation of images, its impact on point cloud data has so far been limited. Recent attempts based on 3D deep learning approaches (3D-CNNs) have achieved results below expectations. Such methods require voxelization of the underlying point cloud, which decreases spatial resolution and increases memory consumption. Additionally, 3D-CNNs suffer greatly from the limited availability of annotated datasets. In this paper, we propose an alternative framework that avoids the limitations of 3D-CNNs. Instead of solving the problem directly in 3D, we first project the point cloud onto a set of synthetic 2D images. These images are then used as input to a 2D-CNN designed for semantic segmentation. Finally, the obtained prediction scores are re-projected onto the point cloud to obtain the segmentation results. We further investigate the impact of multiple modalities, such as color, depth and surface normals, in a multi-stream network architecture. Experiments are performed on the recent Semantic3D dataset. Our approach sets a new state of the art, achieving a relative gain of 7.9% over the previous best approach.
    Comment: Submitted to CAIP 201
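The project, predict and re-project pipeline described above can be sketched for a single synthetic view. This is a minimal illustration under stated assumptions, not the authors' implementation: points are assumed to be already expressed in camera coordinates, the camera is assumed to be an ideal pinhole with focal length `f` and principal point `(cx, cy)`, and `pixel_scores` is a hypothetical stand-in for the per-pixel class scores a 2D-CNN would produce. All function names are hypothetical. A z-buffer keeps only the nearest point per pixel, and scores from several views can be accumulated by passing the same `acc` list to `reproject_scores` once per view.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in camera coordinates; z > 0 is in front
Pixel = Tuple[int, int]             # (u, v) image coordinates


def project(points: Sequence[Point], f: float, cx: float, cy: float,
            width: int, height: int) -> Dict[Pixel, int]:
    """Pinhole projection with a z-buffer: map each pixel to the nearest point on it."""
    zbuf: Dict[Pixel, Tuple[float, int]] = {}
    for i, (x, y, z) in enumerate(points):
        if z <= 0:
            continue  # behind the camera
        u = int(round(f * x / z + cx))
        v = int(round(f * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if (u, v) not in zbuf or z < zbuf[(u, v)][0]:
                zbuf[(u, v)] = (z, i)  # keep only the closest point
    return {pix: idx for pix, (_, idx) in zbuf.items()}


def reproject_scores(num_points: int, num_classes: int,
                     pixel_to_point: Dict[Pixel, int],
                     pixel_scores: Dict[Pixel, Sequence[float]],
                     acc: Optional[List[List[float]]] = None) -> List[List[float]]:
    """Add each pixel's class scores to the point visible at that pixel.

    Pass the returned list back in as `acc` to accumulate over several views.
    """
    if acc is None:
        acc = [[0.0] * num_classes for _ in range(num_points)]
    for pix, idx in pixel_to_point.items():
        for c, score in enumerate(pixel_scores[pix]):
            acc[idx][c] += score
    return acc


def label_points(acc: List[List[float]]) -> List[int]:
    """Arg-max label per point; -1 for points that received no scores."""
    return [max(range(len(s)), key=s.__getitem__) if any(s) else -1 for s in acc]
```

A point hidden behind another point in every rendered view keeps the label -1, which is one reason to render the cloud from several viewpoints.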

    Feature-Guided Black-Box Safety Testing of Deep Neural Networks

    Despite the improved accuracy of deep neural networks, the discovery of adversarial examples has raised serious safety concerns. Most existing approaches for crafting adversarial examples require some knowledge (architecture, parameters, etc.) of the network at hand. In this paper, we focus on image classifiers and propose a feature-guided black-box approach to test the safety of deep neural networks that requires no such knowledge. Our algorithm employs object detection techniques such as SIFT (Scale Invariant Feature Transform) to extract features from an image. These features are converted into a mutable saliency distribution, where high probability is assigned to pixels that affect the composition of the image with respect to the human visual system. We formulate the crafting of adversarial examples as a two-player turn-based stochastic game, where the first player's objective is to minimise the distance to an adversarial example by manipulating the features, and the second player can be cooperative, adversarial, or random. We show that, theoretically, the two-player game can converge to the optimal strategy, and that the optimal strategy represents a globally minimal adversarial image. For Lipschitz networks, we also identify conditions that provide safety guarantees that no adversarial examples exist. Using Monte Carlo tree search, we gradually explore the game state space to search for adversarial examples. Our experiments show that, despite the black-box setting, manipulations guided by a perception-based saliency distribution are competitive with state-of-the-art methods that rely on white-box saliency matrices or sophisticated optimization procedures. Finally, we show how our method can be used to evaluate the robustness of neural networks in safety-critical applications such as traffic sign recognition in self-driving cars.
    Comment: 35 pages, 5 tables, 23 figures
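One way to turn feature keypoints into a pixel-level sampling distribution, in the spirit of the saliency distribution described above, is to place a Gaussian around each keypoint with a width equal to its scale and normalize the result. This is a hedged sketch: the Gaussian weighting, the `(x, y, scale)` keypoint format and the function names are assumptions for illustration, not the paper's exact construction, and the keypoints are taken as input rather than computed by an actual SIFT detector.

```python
import math
import random
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, scale); assumed format, e.g. from a SIFT detector


def saliency_distribution(keypoints: Sequence[Keypoint],
                          width: int, height: int) -> List[List[float]]:
    """Sum a Gaussian bump per keypoint (sigma = scale) and normalise to sum to 1."""
    weights = [[0.0] * width for _ in range(height)]
    for kx, ky, scale in keypoints:
        sigma = max(scale, 1e-6)  # guard against zero-scale keypoints
        for y in range(height):
            for x in range(width):
                d2 = (x - kx) ** 2 + (y - ky) ** 2
                weights[y][x] += math.exp(-d2 / (2.0 * sigma * sigma))
    total = sum(sum(row) for row in weights)
    if total == 0.0:  # no keypoints (or all weights underflowed): fall back to uniform
        return [[1.0 / (width * height)] * width for _ in range(height)]
    return [[w / total for w in row] for row in weights]


def sample_pixels(dist: List[List[float]], k: int,
                  rng: random.Random) -> List[Tuple[int, int]]:
    """Draw k (x, y) pixels to manipulate, with probability given by `dist`."""
    pixels = [(x, y) for y, row in enumerate(dist) for x in range(len(row))]
    probs = [p for row in dist for p in row]
    return rng.choices(pixels, weights=probs, k=k)
```

Pixels near large or numerous keypoints are then proposed for manipulation more often than flat background regions.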

    Layered depth images

    In this paper we present a set of efficient image-based rendering methods capable of rendering multiple frames per second on a PC. The first method warps Sprites with Depth representing smooth surfaces without the gaps found in other techniques. A second method, for more general scenes, performs warping from an intermediate representation called a Layered Depth Image (LDI). An LDI is a view of the scene from a single input camera, but with multiple pixels along each line of sight. The size of the representation grows only linearly with the observed depth complexity of the scene. Moreover, because the LDI data are represented in a single image coordinate system, McMillan's warp ordering algorithm can be successfully adapted. As a result, pixels are drawn in the output image in back-to-front order. No z-buffer is required, so alpha-compositing can be done efficiently without depth sorting. This makes splatting an efficient solution to the resampling problem.
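A Layered Depth Image can be sketched as a grid in which each pixel holds a list of depth samples along its line of sight. In this minimal sketch (class and function names are hypothetical), samples are kept in front-to-back order as they are inserted, so at render time a pixel's layers can be alpha-composited back to front with the standard "over" operator simply by iterating in reverse, with no z-buffer and no per-frame sort. The 3D warp itself and McMillan's ordering of pixels across the output image are not shown.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


@dataclass
class DepthPixel:
    color: RGB
    alpha: float  # opacity in [0, 1]
    depth: float  # distance along the line of sight


@dataclass
class LayeredDepthImage:
    width: int
    height: int
    # layers[y][x] holds every sample along that pixel's line of sight
    layers: List[List[List[DepthPixel]]] = field(init=False)

    def __post_init__(self) -> None:
        self.layers = [[[] for _ in range(self.width)] for _ in range(self.height)]

    def insert(self, x: int, y: int, sample: DepthPixel) -> None:
        """Insert a sample, keeping the pixel's list ordered front to back."""
        column = self.layers[y][x]
        i = 0
        while i < len(column) and column[i].depth < sample.depth:
            i += 1
        column.insert(i, sample)

    def sample_count(self) -> int:
        """Storage grows with the number of samples, i.e. with depth complexity."""
        return sum(len(col) for row in self.layers for col in row)


def composite_back_to_front(samples: Sequence[DepthPixel],
                            background: RGB = (0.0, 0.0, 0.0)) -> RGB:
    """Apply the 'over' operator from the farthest sample to the nearest."""
    r, g, b = background
    for s in reversed(samples):  # stored front to back, so reverse it
        a = s.alpha
        r = s.color[0] * a + r * (1.0 - a)
        g = s.color[1] * a + g * (1.0 - a)
        b = s.color[2] * a + b * (1.0 - a)
    return (r, g, b)
```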

    Facilitating visual surveillance with motion detections

    Visual surveillance plays an ever-increasing role in crime detection because of the rapid deployment of surveillance cameras. Motion detection, the process of detecting a change in the position of an object relative to the background, or a change in the background relative to the object, has become one of the enabling techniques for visual surveillance. This paper parallelizes a motion detection algorithm on a cluster of inexpensive computing devices. A custom region of interest is implemented to improve the performance and accuracy of the motion detection algorithm. The parallelized algorithm is evaluated in terms of both computational scalability and motion detection accuracy. The results show that the enhanced algorithm achieves higher motion detection accuracy with reduced execution time.
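The abstract does not name the underlying motion detection algorithm, so the sketch below swaps in plain frame differencing restricted to a custom region of interest (ROI). The ROI's rows are split into contiguous bands that are processed concurrently; on a cluster each band would go to a separate device, whereas here a thread pool merely illustrates the partitioning (CPython threads will not speed up this pure-Python loop). The thresholds, the ROI format and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Frame = List[List[int]]          # grayscale intensities indexed as frame[y][x]
ROI = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open bounds


def count_changed(prev: Frame, curr: Frame, x0: int, x1: int,
                  rows: Sequence[int], threshold: int) -> int:
    """Count pixels in the given rows whose intensity changed by more than threshold."""
    changed = 0
    for y in rows:
        for x in range(x0, x1):
            if abs(curr[y][x] - prev[y][x]) > threshold:
                changed += 1
    return changed


def motion_fraction(prev: Frame, curr: Frame, roi: ROI,
                    threshold: int = 25, workers: int = 4) -> float:
    """Fraction of ROI pixels that changed, with the ROI split into row bands."""
    x0, y0, x1, y1 = roi
    if x1 <= x0 or y1 <= y0:
        return 0.0  # empty ROI
    rows = list(range(y0, y1))
    step = max(1, -(-len(rows) // workers))  # ceiling division
    bands = [rows[i:i + step] for i in range(0, len(rows), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(
            lambda band: count_changed(prev, curr, x0, x1, band, threshold), bands))
    return sum(counts) / ((x1 - x0) * (y1 - y0))


def detect_motion(prev: Frame, curr: Frame, roi: ROI,
                  threshold: int = 25, min_fraction: float = 0.01) -> bool:
    """Report motion when enough of the ROI changed between two frames."""
    return motion_fraction(prev, curr, roi, threshold) >= min_fraction
```

Restricting the work to the ROI shrinks the number of pixels examined, which is one plausible way a custom ROI can improve both speed and accuracy (changes outside the ROI are ignored).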

    Using strong shape priors for stereo

    This paper addresses the problem of obtaining an accurate 3D reconstruction from multiple views. Taking inspiration from the recent successes of using strong prior knowledge for image segmentation, we propose a framework for 3D reconstruction which uses such priors to overcome the ambiguity inherent in this problem. Our framework is based on an object-specific Markov Random Field (MRF) [10]. It uses a volumetric scene representation and integrates conventional reconstruction measures such as photo-consistency, surface smoothness and visual hull membership with a strong object-specific prior. Simple parametric models of objects will be used as strong priors in our framework. We will show how parameters of these models can be efficiently estimated by performing inference on the MRF using dynamic graph cuts [7]. This procedure not only gives an accurate object reconstruction, but also provides us with information regarding the pose or state of the object being reconstructed. We will show the results of our method in reconstructing deformable and articulated objects.

    Generalized Multi-Camera Scene Reconstruction Using Graph Cuts

    Reconstructing a 3-D scene from more than one camera is a classical problem in computer vision. One of the major sources of difficulty is the fact that not all scene elements are visible from all cameras. In the last few years, two promising approaches have been developed [. . .] that formulate the scene reconstruction problem in terms of energy minimization, and minimize the energy using graph cuts. These energy minimization approaches treat the input images symmetrically, handle visibility constraints correctly, and allow spatial smoothness to be enforced. However, these algorithms propose different problem formulations and handle a limited class of smoothness terms. One algorithm [. . .] uses a problem formulation that is restricted to two-camera stereo, and imposes smoothness between a pair of cameras. The other algorithm [. . .] can handle an arbitrary number of cameras, but imposes smoothness only with respect to a single camera. In this paper we give a more general energy minimization formulation for the problem, which allows a larger class of spatial smoothness constraints. We show that our formulation includes both of the previous approaches as special cases, as well as permitting new energy functions. Experimental results on real data with ground truth are also included.
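The core tool named above, minimizing an energy with graph cuts, can be illustrated in its simplest form: a two-label energy with per-pixel data costs and a Potts smoothness term, which an s-t minimum cut minimizes exactly. This is a hedged sketch, not the paper's multi-camera formulation: visibility terms and the multi-label case are omitted (multi-label energies are commonly handled by repeated binary moves such as alpha-expansion, each of which is a cut of this kind). The function names, the Edmonds-Karp max-flow routine and the example costs are assumptions for illustration; data costs are assumed non-negative.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Set, Tuple

Graph = Dict[int, Dict[int, float]]  # graph[u][v] = capacity of edge u -> v


def min_cut_source_side(cap: Graph, s: int, t: int) -> Set[int]:
    """Edmonds-Karp max flow; return the nodes on the source side of a minimum cut."""
    res: Graph = {}
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            res.setdefault(u, {}).setdefault(v, 0.0)
            res[u][v] += c
            res.setdefault(v, {}).setdefault(u, 0.0)  # reverse residual edge
    while True:
        parent: Dict[int, Optional[int]] = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in res.get(u, {}).items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)  # nodes still reachable from s in the residual graph
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:  # find the path's smallest residual capacity
            u = parent[v]
            bottleneck = min(bottleneck, res[u][v])
            v = u
        v = t
        while parent[v] is not None:  # push flow along the path
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u


def binary_graph_cut(data: Sequence[Tuple[float, float]],
                     edges: Sequence[Tuple[int, int]], lam: float) -> List[int]:
    """Exactly minimise sum_p D_p(l_p) + lam * [l_p != l_q] over labels in {0, 1}.

    data[p] = (D_p(0), D_p(1)); all costs are assumed non-negative.
    """
    n = len(data)
    s, t = n, n + 1
    cap: Graph = {p: {} for p in range(n)}
    cap[s] = {}
    for p, (d0, d1) in enumerate(data):
        cap[s][p] = d1  # cut when p ends up on the sink side, i.e. label 1
        cap[p][t] = d0  # cut when p stays on the source side, i.e. label 0
    for p, q in edges:  # Potts term: exactly one of these edges is cut if labels differ
        cap[p][q] = cap[p].get(q, 0.0) + lam
        cap[q][p] = cap[q].get(p, 0.0) + lam
    source_side = min_cut_source_side(cap, s, t)
    return [0 if p in source_side else 1 for p in range(n)]


def energy(labels: Sequence[int], data: Sequence[Tuple[float, float]],
           edges: Sequence[Tuple[int, int]], lam: float) -> float:
    """Data term plus Potts smoothness term for a given labeling."""
    return (sum(data[p][l] for p, l in enumerate(labels))
            + sum(lam for p, q in edges if labels[p] != labels[q]))
```

Because every cut separating source and sink corresponds to one labeling and its cost equals that labeling's energy, the minimum cut yields a global minimum of this binary energy.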

    This is not an apple! Benefits and challenges of applying computer vision to museum collections

    The application of computer vision to museum collection data is at an experimental stage, with predictions that it will grow in significance and use in the coming years. This research, based on the analysis of five case studies and semi-structured interviews with museum professionals, examined the opportunities and challenges of these technologies, the resources and funding required, and the ethical implications that arise during these initiatives. The case studies examined in this paper are drawn from The Metropolitan Museum of Art (USA), Princeton University Art Museum (USA), Museum of Modern Art (USA), Harvard Art Museums (USA) and Science Museum Group (UK). The findings highlight the potential of computer vision to offer new ways to analyze, describe and present museum collections. However, its actual implementation in digital products is currently very limited due to a lack of resources and the inaccuracies produced by algorithms. This research adds to the rapidly evolving field of computer vision in the museum sector and provides recommendations to operationalize the use of these technologies, increase transparency about their application, create ethics playbooks to manage potential bias, and collaborate across the museum sector.