89,384 research outputs found

    Boundary and object detection in real world images

    Get PDF
    A solution to the problem of automatic location of objects in digital pictures by computer is presented. A self-scaling local edge detector which can be applied in parallel on a picture is described. Clustering algorithms and boundary following algorithms which are sequential in nature process the edge data to locate images of objects

    Silhouette coverage analysis for multi-modal video surveillance

    Get PDF
    In order to improve the accuracy in video-based object detection, the proposed multi-modal video surveillance system takes advantage of the different kinds of information represented by visual, thermal and/or depth imaging sensors. The multi-modal object detector of the system can be split up in two consecutive parts: the registration and the coverage analysis. The multi-modal image registration is performed using a three step silhouette-mapping algorithm which detects the rotation, scale and translation between moving objects in the visual, (thermal) infrared and/or depth images. First, moving object silhouettes are extracted to separate the calibration objects, i.e., the foreground, from the static background. Key components are dynamic background subtraction, foreground enhancement and automatic thresholding. Then, 1D contour vectors are generated from the resulting multi-modal silhouettes using silhouette boundary extraction, cartesian to polar transform and radial vector analysis. Next, to retrieve the rotation angle and the scale factor between the multi-sensor image, these contours are mapped on each other using circular cross correlation and contour scaling. Finally, the translation between the images is calculated using maximization of binary correlation. The silhouette coverage analysis also starts with moving object silhouette extraction. Then, it uses the registration information, i.e., rotation angle, scale factor and translation vector, to map the thermal, depth and visual silhouette images on each other. Finally, the coverage of the resulting multi-modal silhouette map is computed and is analyzed over time to reduce false alarms and to improve object detection. Prior experiments on real-world multi-sensor video sequences indicate that automated multi-modal video surveillance is promising. This paper shows that merging information from multi-modal video further increases the detection results

    Predicting Illusory Contours Without Extracting Special Image Features

    Get PDF
    Boundary completion is one of the desired properties of a robust object boundary detection model, since in real-word images the object boundaries are commonly not fully and clearly seen. An extreme example of boundary completion occurs in images with illusory contours, where the visual system completes boundaries in locations without intensity gradient. Most illusory contour models extract special image features, such as L and T junctions, while the task is known to be a difficult issue in real-world images. The proposed model uses a functional optimization approach, in which a cost value is assigned to any boundary arrangement to find the arrangement with minimal cost. The functional accounts for basic object properties, such as alignment with the image, object boundary continuity, and boundary simplicity. The encoding of these properties in the functional does not require special features extraction, since the alignment with the image only requires extraction of the image edges. The boundary arrangement is represented by a border ownership map, holding object boundary segments in discrete locations and directions. The model finds multiple possible image interpretations, which are ranked according to the probability that they are supposed to be perceived. This is achieved by using a novel approach to represent the different image interpretations by multiple functional local minima. The model is successfully applied to objects with real and illusory contours. In the case of Kanizsa illusion the model predicts both illusory and real (pacman) image interpretations. The model is a proof of concept and is currently restricted to synthetic gray-scale images with solid regions

    Object segmentation in depth maps with one user click and a synthetically trained fully convolutional network

    Get PDF
    With more and more household objects built on planned obsolescence and consumed by a fast-growing population, hazardous waste recycling has become a critical challenge. Given the large variability of household waste, current recycling platforms mostly rely on human operators to analyze the scene, typically composed of many object instances piled up in bulk. Helping them by robotizing the unitary extraction is a key challenge to speed up this tedious process. Whereas supervised deep learning has proven very efficient for such object-level scene understanding, e.g., generic object detection and segmentation in everyday scenes, it however requires large sets of per-pixel labeled images, that are hardly available for numerous application contexts, including industrial robotics. We thus propose a step towards a practical interactive application for generating an object-oriented robotic grasp, requiring as inputs only one depth map of the scene and one user click on the next object to extract. More precisely, we address in this paper the middle issue of object seg-mentation in top views of piles of bulk objects given a pixel location, namely seed, provided interactively by a human operator. We propose a twofold framework for generating edge-driven instance segments. First, we repurpose a state-of-the-art fully convolutional object contour detector for seed-based instance segmentation by introducing the notion of edge-mask duality with a novel patch-free and contour-oriented loss function. Second, we train one model using only synthetic scenes, instead of manually labeled training data. Our experimental results show that considering edge-mask duality for training an encoder-decoder network, as we suggest, outperforms a state-of-the-art patch-based network in the present application context.Comment: This is a pre-print of an article published in Human Friendly Robotics, 10th International Workshop, Springer Proceedings in Advanced Robotics, vol 7. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-89327-3\_16, Springer Proceedings in Advanced Robotics, Siciliano Bruno, Khatib Oussama, In press, Human Friendly Robotics, 10th International Workshop,

    Generalized Boundaries from Multiple Image Interpretations

    Full text link
    Boundary detection is essential for a variety of computer vision tasks such as segmentation and recognition. In this paper we propose a unified formulation and a novel algorithm that are applicable to the detection of different types of boundaries, such as intensity edges, occlusion boundaries or object category specific boundaries. Our formulation leads to a simple method with state-of-the-art performance and significantly lower computational cost than existing methods. We evaluate our algorithm on different types of boundaries, from low-level boundaries extracted in natural images, to occlusion boundaries obtained using motion cues and RGB-D cameras, to boundaries from soft-segmentation. We also propose a novel method for figure/ground soft-segmentation that can be used in conjunction with our boundary detection method and improve its accuracy at almost no extra computational cost

    Straight to Shapes: Real-time Detection of Encoded Shapes

    Full text link
    Current object detection approaches predict bounding boxes, but these provide little instance-specific information beyond location, scale and aspect ratio. In this work, we propose to directly regress to objects' shapes in addition to their bounding boxes and categories. It is crucial to find an appropriate shape representation that is compact and decodable, and in which objects can be compared for higher-order concepts such as view similarity, pose variation and occlusion. To achieve this, we use a denoising convolutional auto-encoder to establish an embedding space, and place the decoder after a fast end-to-end network trained to regress directly to the encoded shape vectors. This yields what to the best of our knowledge is the first real-time shape prediction network, running at ~35 FPS on a high-end desktop. With higher-order shape reasoning well-integrated into the network pipeline, the network shows the useful practical quality of generalising to unseen categories similar to the ones in the training set, something that most existing approaches fail to handle.Comment: 16 pages including appendix; Published at CVPR 201

    Extracting 3D parametric curves from 2D images of Helical objects

    Get PDF
    Helical objects occur in medicine, biology, cosmetics, nanotechnology, and engineering. Extracting a 3D parametric curve from a 2D image of a helical object has many practical applications, in particular being able to extract metrics such as tortuosity, frequency, and pitch. We present a method that is able to straighten the image object and derive a robust 3D helical curve from peaks in the object boundary. The algorithm has a small number of stable parameters that require little tuning, and the curve is validated against both synthetic and real-world data. The results show that the extracted 3D curve comes within close Hausdorff distance to the ground truth, and has near identical tortuosity for helical objects with a circular profile. Parameter insensitivity and robustness against high levels of image noise are demonstrated thoroughly and quantitatively
    • …
    corecore