Co-Regularized Deep Representations for Video Summarization
Compact keyframe-based video summaries are a popular way of generating
viewership on video sharing platforms. Yet, creating relevant and compelling
summaries for arbitrarily long videos with a small number of keyframes is a
challenging task. We propose a comprehensive keyframe-based summarization
framework combining deep convolutional neural networks and restricted Boltzmann
machines. An original co-regularization scheme is used to discover meaningful
subject-scene associations. The resulting multimodal representations are then
used to select highly-relevant keyframes. A comprehensive user study is
conducted comparing our proposed method to a variety of schemes, including the
summarization currently in use by one of the most popular video sharing
websites. The results show that our method consistently outperforms the
baseline schemes for any given number of keyframes, in terms of both
attractiveness and informativeness. The lead is even more significant for
smaller summaries.
Comment: Video summarization, deep convolutional neural networks,
co-regularized restricted Boltzmann machine
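The final selection step described in the abstract, picking a small number of highly-relevant keyframes, can be sketched as follows. This is a minimal illustration, not the paper's method: the multimodal CNN/RBM representation is replaced by precomputed per-frame relevance scores, and the greedy minimum-gap rule is an assumption added here to spread keyframes over the video.

```python
# Minimal sketch: select k keyframes from per-frame relevance scores,
# enforcing a minimum temporal gap so picks are spread over the video.
# In the paper the scores would come from the multimodal representation;
# here they are simply given as a list (hypothetical stand-in).

def select_keyframes(scores, k, min_gap):
    """Greedy top-k selection with a minimum index gap between picks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    picked = []
    for i in order:
        if all(abs(i - j) >= min_gap for j in picked):
            picked.append(i)
        if len(picked) == k:
            break
    return sorted(picked)

scores = [0.1, 0.9, 0.85, 0.2, 0.7, 0.3, 0.95, 0.05]
print(select_keyframes(scores, k=3, min_gap=2))  # → [1, 4, 6]
```

Frame 2 scores highly but sits next to frame 1, so the gap constraint skips it in favor of frame 4.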
Seeing Behind the Camera: Identifying the Authorship of a Photograph
We introduce the novel problem of identifying the photographer behind a
photograph. To explore the feasibility of current computer vision techniques to
address this problem, we created a new dataset of over 180,000 images taken by
41 well-known photographers. Using this dataset, we examined the effectiveness
of a variety of features (low and high-level, including CNN features) at
identifying the photographer. We also trained a new deep convolutional neural
network for this task. Our results show that high-level features greatly
outperform low-level features. We provide qualitative results using these
learned models that give insight into our method's ability to distinguish
between photographers, and allow us to draw interesting conclusions about what
specific photographers shoot. We also demonstrate two applications of our
method.
Comment: Dataset downloadable at http://www.cs.pitt.edu/~chris/photographer To
appear in CVPR 201
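As a toy illustration of the classification setup (not the paper's trained network), photographer identification over high-level features can be framed as nearest-centroid classification. The random vectors below are hypothetical stand-ins for CNN features.

```python
import numpy as np

# Minimal sketch: photographer identification as nearest-centroid
# classification over high-level image features. Real features would come
# from a CNN; here they are random stand-ins (assumed, not from the paper).

rng = np.random.default_rng(0)

def fit_centroids(features, labels):
    """One mean feature vector per photographer."""
    return {p: features[labels == p].mean(axis=0) for p in np.unique(labels)}

def predict(centroids, x):
    """Assign a photo to the photographer with the closest centroid."""
    return min(centroids, key=lambda p: np.linalg.norm(x - centroids[p]))

# Toy data: two "photographers" with offset feature distributions.
feats = np.vstack([rng.normal(0, 1, (20, 8)), rng.normal(3, 1, (20, 8))])
labels = np.array([0] * 20 + [1] * 20)
centroids = fit_centroids(feats, labels)
print(predict(centroids, np.full(8, 3.0)))  # → 1
```

A feature vector near the second distribution's mean is assigned to photographer 1; a learned CNN feature space would play the same role with far subtler class boundaries.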
FAME: Face Association through Model Evolution
We attack the problem of learning face models for public faces from
weakly-labelled images collected from the web by querying a name. The data is
very noisy even after face detection, with several irrelevant faces
corresponding to other people. We propose a novel method, Face Association
through Model Evolution (FAME), that prunes the data iteratively, allowing the
face models associated with a name to evolve. The idea is based on
capturing discriminativeness and representativeness of each instance and
eliminating the outliers. The final models are used to classify faces on novel
datasets with possibly different characteristics. On benchmark datasets, our
results are comparable to or better than state-of-the-art studies for the task
of face identification.
Comment: Draft version of the study
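The iterative pruning idea can be sketched with a toy version in which each "face model" is just a feature centroid and the instances farthest from it are treated as outliers. The real method scores both discriminativeness and representativeness; that scoring is not reproduced here.

```python
# Minimal sketch of iterative pruning as in FAME: repeatedly fit a simple
# model (here, a feature centroid) and drop the instance least consistent
# with it, so the model associated with a name evolves as noise is removed.
# Distance to the centroid is a toy stand-in for the paper's scoring.

def prune(features, rounds, drop_per_round):
    kept = list(range(len(features)))
    dim = len(features[0])
    for _ in range(rounds):
        centroid = [sum(features[i][d] for i in kept) / len(kept)
                    for d in range(dim)]
        # Rank kept instances by squared distance to the current centroid.
        def dist(i):
            return sum((features[i][d] - centroid[d]) ** 2 for d in range(dim))
        kept = sorted(kept, key=dist)[: len(kept) - drop_per_round]
    return sorted(kept)

# Toy data: four relevant faces near (0, 0), two irrelevant ones far away.
feats = [(0.1, 0.0), (0.0, 0.2), (-0.1, 0.1), (0.2, -0.1), (5.0, 5.0), (6.0, 4.0)]
print(prune(feats, rounds=2, drop_per_round=1))  # → [0, 1, 2, 3]
```

Each round the centroid drifts back toward the true cluster as an outlier is removed, which is the "evolution" the method's name refers to.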
Bayesian Nonparametric Feature and Policy Learning for Decision-Making
Learning from demonstrations has gained increasing interest in the recent
past, enabling an agent to learn how to make decisions by observing an
experienced teacher. While many approaches have been proposed to solve this
problem, there is little work that focuses on reasoning about the observed
behavior. We assume that, in many practical problems, an agent makes its
decisions based on latent features that indicate a certain action. Therefore, we
propose a generative model for the states and actions. Inference reveals the
number of features, the features, and the policies, allowing us to learn and to
analyze the underlying structure of the observed behavior. Further, our
approach enables prediction of actions for new states. Simulations are used to
assess the performance of the algorithm based upon this model. Moreover, the
problem of learning a driver's behavior is investigated, demonstrating the
performance of the proposed model in a real-world scenario.
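The nonparametric ingredient, letting inference reveal the number of latent features rather than fixing it, can be illustrated with a draw from a Chinese restaurant process, a standard Bayesian nonparametric prior. The paper's full generative model over states, actions, and per-feature policies is not reproduced here.

```python
import random

# Minimal sketch of the Bayesian nonparametric idea: a Chinese restaurant
# process draw, where the number of latent features is not fixed in advance
# but grows with the data. Tying a policy to each feature, as the paper's
# model does, is omitted in this toy version.

def crp_assignments(n, alpha, rng):
    """Sample latent feature assignments for n states under CRP(alpha)."""
    counts = []        # counts[k] = number of states assigned to feature k
    assignments = []
    for i in range(n):
        # A new feature is created with probability alpha / (i + alpha);
        # otherwise an existing one is chosen proportional to its count.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)
        counts[k] += 1
        assignments.append(k)
    return assignments, counts

rng = random.Random(1)
assignments, counts = crp_assignments(100, alpha=2.0, rng=rng)
print(len(counts), sum(counts))  # number of discovered features, then 100
```

The concentration parameter `alpha` controls how readily new features appear; inference over such a prior is what lets the model learn the feature count from demonstrations.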
Recent Progress in Image Deblurring
This paper comprehensively reviews the recent development of image
deblurring, including non-blind/blind, spatially invariant/variant deblurring
techniques. Indeed, these techniques share the same objective of inferring a
latent sharp image from one or several corresponding blurry images, while the
blind deblurring techniques are also required to derive an accurate blur
kernel. Considering the critical role of image restoration in modern imaging
systems to provide high-quality images under complex environments such as
motion, undesirable lighting conditions, and imperfect system components, image
deblurring has attracted growing attention in recent years. From the viewpoint
of how to handle the ill-posedness, a crucial issue in deblurring tasks,
existing methods can be grouped into five categories: Bayesian inference
frameworks, variational methods, sparse representation-based methods,
homography-based modeling, and region-based methods. Despite a certain level
of progress, image deblurring, especially the blind case, is limited in its
success by complex application conditions that make the blur kernel hard to
obtain and spatially variant. We provide a holistic
understanding and deep insight into image deblurring in this review. An
analysis of the empirical evidence for representative methods, practical
issues, as well as a discussion of promising future directions are also
presented.
Comment: 53 pages, 17 figures
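As one classical, concrete answer to the ill-posedness in the non-blind case, Wiener deconvolution inverts a known blur kernel in the frequency domain. This is a minimal sketch, not any specific method from the review; the `balance` constant is a hand-picked stand-in for the noise-to-signal ratio.

```python
import numpy as np

# Minimal sketch of non-blind deblurring: Wiener deconvolution in the
# frequency domain. The regularizer `balance` prevents division by
# near-zero kernel frequencies, the textbook fix for ill-posedness.

def wiener_deblur(blurry, kernel, balance=1e-4):
    """Recover a latent image from a blurry one, given the blur kernel."""
    K = np.fft.fft2(kernel, s=blurry.shape)
    B = np.fft.fft2(blurry)
    # Wiener filter: conj(K) / (|K|^2 + balance), applied in Fourier space.
    X = np.conj(K) * B / (np.abs(K) ** 2 + balance)
    return np.real(np.fft.ifft2(X))

# Toy example: blur a one-hot image with a 3x3 box kernel, then invert.
sharp = np.zeros((16, 16))
sharp[8, 8] = 1.0
kernel = np.ones((3, 3)) / 9.0
blurry = np.real(np.fft.ifft2(np.fft.fft2(sharp)
                              * np.fft.fft2(kernel, s=sharp.shape)))
restored = wiener_deblur(blurry, kernel)
print(np.unravel_index(np.argmax(restored), restored.shape))  # (8, 8)
```

With `balance = 0` this reduces to naive inverse filtering, which amplifies noise at frequencies the kernel suppresses; blind methods face the harder problem of estimating `kernel` itself.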
4D Unsupervised Object Discovery
Object discovery is a core task in computer vision. While rapid progress has
been made in supervised object detection, its unsupervised counterpart remains
largely unexplored. With the growth of data volume, the high cost of
annotation is the major limitation hindering further study. Therefore,
discovering objects without annotations is of great significance. However, this
task seems impractical on still images or point clouds alone due to the lack of
discriminative information. Previous studies overlook the crucial temporal
information and constraints naturally present in multi-modal inputs. In this paper,
we propose 4D unsupervised object discovery, jointly discovering objects from
4D data -- 3D point clouds and 2D RGB images with temporal information. We
present the first practical approach to this task: a ClusterNet on 3D point
clouds that is jointly and iteratively optimized with a 2D localization
network. Extensive experiments on the large-scale Waymo Open Dataset suggest
that the localization network and ClusterNet achieve competitive performance on
both class-agnostic 2D object detection and 3D instance segmentation, bridging
the gap between unsupervised and fully supervised methods. Code and models
will be made available at https://github.com/Robertwyq/LSMOL.
Comment: Accepted by NeurIPS 2022. 17 pages, 6 figures
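A toy stand-in for the clustering side of such a pipeline (not the learned ClusterNet): grouping 3D points into object candidates by Euclidean region growing, the kind of proximity grouping a learned clustering network refines.

```python
# Minimal sketch: group 3D points into object candidates by Euclidean
# proximity (greedy region growing). A hypothetical stand-in for the
# clustering objective; the actual ClusterNet is learned and jointly
# optimized with a 2D localization network.

def euclidean_clusters(points, radius):
    """Group points reachable through neighbors within `radius`."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if sum((points[i][d] - points[j][d]) ** 2
                           for d in range(3)) <= radius ** 2]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return clusters

# Two well-separated toy "objects" in a tiny point cloud.
pts = [(0, 0, 0), (0.3, 0, 0), (0.1, 0.2, 0), (5, 5, 0), (5.2, 5.1, 0)]
print(len(euclidean_clusters(pts, radius=0.5)))  # → 2
```

Each cluster would then be handed to the 2D localization network for cross-modal verification, which is where the joint optimization in the paper comes in.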