221 research outputs found
Statistical Analysis of Dynamic Actions
Real-world action recognition applications require the development of systems which are fast, can handle a large variety of actions without a priori knowledge of the type of actions, need a minimal number of parameters, and necessitate as short as possible learning stage. In this paper, we suggest such an approach. We regard dynamic activities as long-term temporal objects, which are characterized by spatio-temporal features at multiple temporal scales. Based on this, we design a simple statistical distance measure between video sequences which captures the similarities in their behavioral content. This measure is nonparametric and can thus handle a wide range of complex dynamic actions. Having a behavior-based distance measure between sequences, we use it for a variety of tasks, including: video indexing, temporal segmentation, and action-based video clustering. These tasks are performed without prior knowledge of the types of actions, their models, or their temporal extents
Multi-View Image Compositions
The geometry of single-viewpoint panoramas is well understood: multiple pictures taken from the same viewpoint may be stitched together into a consistent panorama mosaic. By contrast, when the point of view changes or when the scene changes (e.g., due to objects moving) no consistent mosaic may be obtained, unless the structure of the scene is very special.
Artists have explored this problem and demonstrated that geometrical consistency is not the only criterion for success: incorporating multiple view points in space and time into the same panorama may produce compelling and
informative pictures. We explore this avenue and suggest an approach to automating the construction of mosaics from images taken from multiple view points into a single panorama. Rather than looking at 3D scene consistency we look at image consistency. Our approach is based on optimizing a cost function that keeps into account image-to-image consistency which is measured on point-features and along picture boundaries. The optimization explicitly considers occlusion between pictures.
We illustrate our ideas with a number of experiments on collections of images of objects and outdoor scenes
Automating joiners
Pictures taken from different view points cannot be stitched into a geometrically consistent mosaic, unless the structure of the scene is very special. However, geometrical consistency is not the only criterion for success: incorporating multiple view points into the same picture may produce compelling and informative representations. A multi viewpoint form of visual expression that has recently become highly popular is that of joiners (a term coined by artist David Hockney). Joiners are compositions where photographs are layered on a 2D canvas, with some photographs occluding others and boundaries fully visible.
Composing joiners is currently a tedious manual process, especially when a great number of photographs is involved. We are thus interested in automating their construction. Our approach is based on optimizing a cost function encouraging image-to-image consistency which is measured on point-features and along picture boundaries. The optimization looks for consistency in the 2D composition rather than 3D geometrical scene consistency and explicitly considers occlusion between pictures. We illustrate our ideas with a number of experiments on collections of images of objects, people, and outdoor scenes
Approximate Nearest Neighbor Fields in Video
We introduce RIANN (Ring Intersection Approximate Nearest Neighbor search),
an algorithm for matching patches of a video to a set of reference patches in
real-time. For each query, RIANN finds potential matches by intersecting rings
around key points in appearance space. Its search complexity is reversely
correlated to the amount of temporal change, making it a good fit for videos,
where typically most patches change slowly with time. Experiments show that
RIANN is up to two orders of magnitude faster than previous ANN methods, and is
the only solution that operates in real-time. We further demonstrate how RIANN
can be used for real-time video processing and provide examples for a range of
real-time video applications, including colorization, denoising, and several
artistic effects.Comment: A CVPR 2015 oral pape
Non-Parametric Probabilistic Image Segmentation
We propose a simple probabilistic generative model for
image segmentation. Like other probabilistic algorithms
(such as EM on a Mixture of Gaussians) the proposed model
is principled, provides both hard and probabilistic cluster
assignments, as well as the ability to naturally incorporate
prior knowledge. While previous probabilistic approaches
are restricted to parametric models of clusters (e.g., Gaussians)
we eliminate this limitation. The suggested approach
does not make heavy assumptions on the shape of the clusters
and can thus handle complex structures. Our experiments
show that the suggested approach outperforms previous
work on a variety of image segmentation tasks
Photorealistic Style Transfer with Screened Poisson Equation
Recent work has shown impressive success in transferring painterly style to
images. These approaches, however, fall short of photorealistic style transfer.
Even when both the input and reference images are photographs, the output still
exhibits distortions reminiscent of a painting. In this paper we propose an
approach that takes as input a stylized image and makes it more photorealistic.
It relies on the Screened Poisson Equation, maintaining the fidelity of the
stylized image while constraining the gradients to those of the original input
image. Our method is fast, simple, fully automatic and shows positive progress
in making a stylized image photorealistic. Our results exhibit finer details
and are less prone to artifacts than the state-of-the-art.Comment: presented in BMVC 201
A walk through the web’s video clips
Approximately 10^5 video clips are posted every day on the Web. The popularity of Web-based video databases poses a number of challenges to machine vision scientists: how do we organize, index and search such large wealth of data? Content-based video search and classification have been proposed in the literature and applied successfully to analyzing movies, TV broadcasts and lab-made videos. We explore the performance of some of these algorithms on a large data-set of approximately 3000 videos. We collected our data-set directly from the Web minimizing bias for content or quality, way so as to have a faithful representation of the statistics of this medium. We find that the algorithms that we have come to trust do not work well on video clips, because their quality is lower and their subject is more varied. We will make the data publicly available to encourage further research
- …