73 research outputs found

    Statistical Analysis of Dynamic Actions

    Get PDF
    Real-world action recognition applications require the development of systems which are fast, can handle a large variety of actions without a priori knowledge of the type of actions, need a minimal number of parameters, and necessitate as short as possible learning stage. In this paper, we suggest such an approach. We regard dynamic activities as long-term temporal objects, which are characterized by spatio-temporal features at multiple temporal scales. Based on this, we design a simple statistical distance measure between video sequences which captures the similarities in their behavioral content. This measure is nonparametric and can thus handle a wide range of complex dynamic actions. Having a behavior-based distance measure between sequences, we use it for a variety of tasks, including: video indexing, temporal segmentation, and action-based video clustering. These tasks are performed without prior knowledge of the types of actions, their models, or their temporal extents

    Multi-View Image Compositions

    Get PDF
    The geometry of single-viewpoint panoramas is well understood: multiple pictures taken from the same viewpoint may be stitched together into a consistent panorama mosaic. By contrast, when the point of view changes or when the scene changes (e.g., due to objects moving) no consistent mosaic may be obtained, unless the structure of the scene is very special. Artists have explored this problem and demonstrated that geometrical consistency is not the only criterion for success: incorporating multiple view points in space and time into the same panorama may produce compelling and informative pictures. We explore this avenue and suggest an approach to automating the construction of mosaics from images taken from multiple view points into a single panorama. Rather than looking at 3D scene consistency we look at image consistency. Our approach is based on optimizing a cost function that keeps into account image-to-image consistency which is measured on point-features and along picture boundaries. The optimization explicitly considers occlusion between pictures. We illustrate our ideas with a number of experiments on collections of images of objects and outdoor scenes

    Automating joiners

    Get PDF
    Pictures taken from different view points cannot be stitched into a geometrically consistent mosaic, unless the structure of the scene is very special. However, geometrical consistency is not the only criterion for success: incorporating multiple view points into the same picture may produce compelling and informative representations. A multi viewpoint form of visual expression that has recently become highly popular is that of joiners (a term coined by artist David Hockney). Joiners are compositions where photographs are layered on a 2D canvas, with some photographs occluding others and boundaries fully visible. Composing joiners is currently a tedious manual process, especially when a great number of photographs is involved. We are thus interested in automating their construction. Our approach is based on optimizing a cost function encouraging image-to-image consistency which is measured on point-features and along picture boundaries. The optimization looks for consistency in the 2D composition rather than 3D geometrical scene consistency and explicitly considers occlusion between pictures. We illustrate our ideas with a number of experiments on collections of images of objects, people, and outdoor scenes

    Approximate Nearest Neighbor Fields in Video

    Full text link
    We introduce RIANN (Ring Intersection Approximate Nearest Neighbor search), an algorithm for matching patches of a video to a set of reference patches in real-time. For each query, RIANN finds potential matches by intersecting rings around key points in appearance space. Its search complexity is reversely correlated to the amount of temporal change, making it a good fit for videos, where typically most patches change slowly with time. Experiments show that RIANN is up to two orders of magnitude faster than previous ANN methods, and is the only solution that operates in real-time. We further demonstrate how RIANN can be used for real-time video processing and provide examples for a range of real-time video applications, including colorization, denoising, and several artistic effects.Comment: A CVPR 2015 oral pape

    Non-Parametric Probabilistic Image Segmentation

    Get PDF
    We propose a simple probabilistic generative model for image segmentation. Like other probabilistic algorithms (such as EM on a Mixture of Gaussians) the proposed model is principled, provides both hard and probabilistic cluster assignments, as well as the ability to naturally incorporate prior knowledge. While previous probabilistic approaches are restricted to parametric models of clusters (e.g., Gaussians) we eliminate this limitation. The suggested approach does not make heavy assumptions on the shape of the clusters and can thus handle complex structures. Our experiments show that the suggested approach outperforms previous work on a variety of image segmentation tasks

    Photorealistic Style Transfer with Screened Poisson Equation

    Full text link
    Recent work has shown impressive success in transferring painterly style to images. These approaches, however, fall short of photorealistic style transfer. Even when both the input and reference images are photographs, the output still exhibits distortions reminiscent of a painting. In this paper we propose an approach that takes as input a stylized image and makes it more photorealistic. It relies on the Screened Poisson Equation, maintaining the fidelity of the stylized image while constraining the gradients to those of the original input image. Our method is fast, simple, fully automatic and shows positive progress in making a stylized image photorealistic. Our results exhibit finer details and are less prone to artifacts than the state-of-the-art.Comment: presented in BMVC 201

    A walk through the web’s video clips

    Get PDF
    Approximately 10^5 video clips are posted every day on the Web. The popularity of Web-based video databases poses a number of challenges to machine vision scientists: how do we organize, index and search such large wealth of data? Content-based video search and classification have been proposed in the literature and applied successfully to analyzing movies, TV broadcasts and lab-made videos. We explore the performance of some of these algorithms on a large data-set of approximately 3000 videos. We collected our data-set directly from the Web minimizing bias for content or quality, way so as to have a faithful representation of the statistics of this medium. We find that the algorithms that we have come to trust do not work well on video clips, because their quality is lower and their subject is more varied. We will make the data publicly available to encourage further research
    corecore