Video Shot Boundary Detection Using Generalized Eigenvalue Decomposition and Gaussian Transition Detection
Shot boundary detection is the first step of video analysis, summarization, and retrieval. In this paper, we propose a novel shot boundary detection algorithm that uses Generalized Eigenvalue Decomposition (GED) and models gradual transitions with Gaussian functions. In particular, we focus on the challenges of detecting gradual transitions and extracting appropriate spatio-temporal features, both of which affect the algorithm's ability to detect shot boundaries efficiently. We derive a theorem that establishes some new properties of GED that can be used in video processing algorithms. Our approach uses this theorem to define a new distance metric in eigenspace for comparing video frames. The distance function changes abruptly at hard-cut transitions and shows semi-Gaussian behavior during gradual transitions. The algorithm detects transitions by analyzing this distance function. Finally, we report experimental results on the large-scale test sets provided by TRECVID 2006, which include evaluations for hard-cut and gradual shot boundary detection.
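As a rough illustration of the last step described in this abstract (detecting transitions by analyzing a frame-to-frame distance signal), the sketch below labels isolated spikes as hard cuts and wider bumps as gradual transitions. It is not the paper's GED-based method: the distance values are assumed to be precomputed, and the function name `classify_transitions` and both thresholds are hypothetical choices made for this example only.

```python
def classify_transitions(dist, hard_thresh, soft_thresh):
    """Label runs of elevated frame distance as cuts or gradual transitions.

    dist[i] is an ASSUMED precomputed distance between frame i and frame i+1
    (the paper derives its own metric from GED; that is not reproduced here).
    A run of consecutive values above soft_thresh is a candidate transition:
      - a one-sample run whose value exceeds hard_thresh is labeled "cut"
      - a run of two or more samples is labeled "gradual"
    Returns a list of (label, first_index, last_index) tuples.
    """
    values = list(dist)
    transitions = []
    start = None
    # A trailing -inf sentinel closes any run that reaches the end of the signal.
    for i, d in enumerate(values + [float("-inf")]):
        if d > soft_thresh and start is None:
            start = i  # a new run of elevated distance begins
        elif d <= soft_thresh and start is not None:
            end = i - 1
            length = end - start + 1
            peak = max(values[start:end + 1])
            if length == 1 and peak > hard_thresh:
                transitions.append(("cut", start, end))
            elif length >= 2:
                transitions.append(("gradual", start, end))
            start = None
    return transitions


# Hypothetical signal: one sharp spike, then a wider bell-shaped bump.
example = [0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1,
           0.3, 0.44, 0.5, 0.44, 0.3, 0.1, 0.1]
result = classify_transitions(example, hard_thresh=0.7, soft_thresh=0.2)
```

A fuller treatment would fit a Gaussian to each wide run instead of relying on run length alone. The two-threshold rule is used here only to keep the sketch short.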
Learning from Very Few Samples: A Survey
Few sample learning (FSL) is an important and challenging problem in machine
learning. The ability to learn and generalize from very few samples is a
notable line separating artificial intelligence from human intelligence:
humans can readily grasp a novel concept from a single example or a handful of
examples, whereas machine learning algorithms typically require hundreds or
thousands of supervised samples to guarantee generalization. Despite a long
history dating back to the early 2000s and widespread attention in recent
years driven by booming deep learning technologies, few surveys or reviews of
FSL are available so far. In this context, we extensively review 300+ papers
of FSL spanning from the 2000s to 2019 and provide a timely and comprehensive
survey for FSL. In this survey, we review the history and the
current progress of FSL, categorize FSL approaches in principle into generative
model based and discriminative model based kinds, and place particular
emphasis on meta learning based FSL approaches. We also summarize
several recently emerging extensional topics of FSL and review the latest
advances on these topics. Furthermore, we highlight the important FSL
applications covering many research hotspots in computer vision, natural
language processing, audio and speech, reinforcement learning and robotics, data
analysis, etc. Finally, we conclude the survey with a discussion of promising
trends, in the hope of providing guidance and insights for follow-up research.
Comment: 30 page
Few-Shot Object Detection in Real Life: Case Study on Auto-Harvest
Confinement during COVID-19 has seriously affected agriculture all
over the world. As one efficient solution, mechanical
harvest/auto-harvest based on object detection and a robotic harvester
has become an urgent need. Within the auto-harvest system, a robust few-shot object
detection model is one of the bottlenecks, since the system is required to deal
with new vegetable/fruit categories and the collection of large-scale annotated
datasets for all the novel categories is expensive. The community has developed
many few-shot object detection models. Yet whether they
could be employed directly for real life agricultural applications is still
questionable, as there is a context-gap between the commonly used training
datasets and the images collected in real life agricultural scenarios. To this
end, in this study, we present a novel cucumber dataset and propose two data
augmentation strategies that help to bridge the context-gap. Experimental
results show that 1) the state-of-the-art few-shot object detection model
performs poorly on the novel 'cucumber' category; and 2) the proposed
augmentation strategies outperform the commonly used ones.
Comment: 6 page
DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models
Current deep networks are very data-hungry and benefit from training on
large-scale datasets, which are often time-consuming to collect and annotate. By
contrast, synthetic data can be generated infinitely using generative models
such as DALL-E and diffusion models, with minimal effort and cost. In this
paper, we present DatasetDM, a generic dataset generation model that can
produce diverse synthetic images and the corresponding high-quality perception
annotations (e.g., segmentation masks, and depth). Our method builds upon the
pre-trained diffusion model and extends text-guided image synthesis to
perception data generation. We show that the rich latent code of the diffusion
model can be effectively decoded as accurate perception annotations using a
decoder module. Training the decoder requires less than 1% (around 100
images) of manually labeled images, enabling the generation of an infinitely large
annotated dataset. These synthetic data can then be used for training various
perception models for downstream tasks. To showcase the power of the proposed
approach, we generate datasets with rich dense pixel-wise labels for a wide
range of downstream tasks, including semantic segmentation, instance
segmentation, and depth estimation. Notably, it achieves 1) state-of-the-art
results on semantic segmentation and instance segmentation; 2) significantly
more robust domain generalization than using the real data alone, and
state-of-the-art results in the zero-shot segmentation setting; and 3) flexibility
for efficient application and novel task composition (e.g., image editing). The
project website and code can be found at
https://weijiawu.github.io/DatasetDM_page/ and
https://github.com/showlab/DatasetDM, respectively.