cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers presented
at CVPR2015, the premier annual computer vision event held in June 2015, in
order to grasp the trends in the field. Further, we propose "DeepSurvey" as a
mechanism embodying the entire process, from reading all the papers, through
the generation of ideas, to the writing of the paper.
Comment: Survey Paper
Learning Image-Specific Attributes by Hyperbolic Neighborhood Graph Propagation
As a kind of semantic representation of visual object descriptions,
attributes are widely used in various computer vision tasks. In most existing
attribute-based research, class-specific attributes (CSA), which are
class-level annotations, are usually adopted because annotating each class is
far cheaper than annotating each individual image. However, class-specific attributes
are usually noisy because of annotation errors and diversity of individual
images. Therefore, it is desirable to obtain image-specific attributes (ISA),
which are image-level annotations, from the original class-specific attributes.
In this paper, we propose to learn image-specific attributes by graph-based
attribute propagation. Considering the intrinsic property of hyperbolic
geometry that its distance expands exponentially, hyperbolic neighborhood graph
(HNG) is constructed to characterize the relationship between samples. Based on
HNG, we define neighborhood consistency for each sample to identify
inconsistent samples. Subsequently, inconsistent samples are refined based on
their neighbors in HNG. Extensive experiments on five benchmark datasets
demonstrate the significant superiority of the learned image-specific
attributes over the original class-specific attributes in the zero-shot object
classification task.
Comment: Accepted for IJCAI 2019
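A minimal sketch of the refinement idea described above, assuming a Poincaré-ball embedding, a k-nearest-neighbor graph, a consistency threshold tau, and majority-vote refinement; these choices are illustrative, not necessarily the paper's exact formulation:

# Neighborhood-consistency refinement on a hyperbolic k-NN graph (sketch).
import numpy as np

def poincare_dist(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball model of hyperbolic space."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2)) + eps
    return np.arccosh(1 + 2 * sq / denom)

def refine_attributes(X, A, k=5, tau=0.5):
    """X: (n, d) embeddings inside the unit ball; A: (n,) binary attribute.
    Flips samples whose neighborhood consistency falls below tau to the
    majority attribute value of their hyperbolic neighbors."""
    n = X.shape[0]
    D = np.array([[poincare_dist(X[i], X[j]) for j in range(n)]
                  for i in range(n)])
    refined = A.copy()
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]        # k nearest, excluding self
        consistency = np.mean(A[nbrs] == A[i])  # neighborhood consistency
        if consistency < tau:                   # inconsistent sample
            refined[i] = int(np.mean(A[nbrs]) >= 0.5)
    return refined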
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
The paper presents futuristic challenges discussed in the cvpaper.challenge.
In 2015 and 2016, we thoroughly studied 1,600+ papers from several
conferences and journals, including CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey
Large-scale labeled data are generally required to train deep neural networks
in order to obtain better performance in visual feature learning from images or
videos for computer vision applications. To avoid extensive cost of collecting
and annotating large-scale datasets, as a subset of unsupervised learning
methods, self-supervised learning methods are proposed to learn general image
and video features from large-scale unlabeled data without using any
human-annotated labels. This paper provides an extensive review of deep
learning-based self-supervised general visual feature learning methods from
images or videos. First, the motivation, general pipeline, and terminologies of
this field are described. Then the common deep neural network architectures
used for self-supervised learning are summarized. Next, the main
components and evaluation metrics of self-supervised learning methods are
reviewed followed by the commonly used image and video datasets and the
existing self-supervised visual feature learning methods. Finally, quantitative
performance comparisons of the reviewed methods on benchmark datasets are
summarized and discussed for both image and video feature learning. The paper
concludes with a set of promising future directions for self-supervised
visual feature learning.
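To make the general pipeline concrete, here is a minimal sketch of one widely used pretext task, rotation prediction, in which pseudo-labels are generated from the unlabeled data itself; the toy backbone and training step are illustrative assumptions, not drawn from any specific reviewed method:

# Self-supervised rotation-prediction pretext task (sketch).
import torch
import torch.nn as nn

def rotate_batch(x):
    """Return 4 rotated copies of each image and pseudo-labels 0..3."""
    rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(x.size(0))
    return torch.cat(rots, dim=0), labels

backbone = nn.Sequential(                 # toy ConvNet feature extractor
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(16, 4)                   # 4-way rotation classifier
opt = torch.optim.Adam(list(backbone.parameters()) + list(head.parameters()))

images = torch.randn(8, 3, 32, 32)        # stand-in for unlabeled images
x, y = rotate_batch(images)
loss = nn.functional.cross_entropy(head(backbone(x)), y)
opt.zero_grad(); loss.backward(); opt.step()
# After pretext training, the backbone's features transfer to downstream tasks.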
Hand Pose Estimation through Semi-Supervised and Weakly-Supervised Learning
We propose a method for hand pose estimation based on a deep regressor
trained on two different kinds of input. Raw depth data is fused with an
intermediate representation in the form of a segmentation of the hand into
parts. This intermediate representation contains important topological
information and provides useful cues for reasoning about joint locations. The
mapping from raw depth to segmentation maps is learned in a
semi/weakly-supervised way from two different datasets: (i) a synthetic dataset
created through a rendering pipeline including densely labeled ground truth
(pixelwise segmentations); and (ii) a dataset with real images for which ground
truth joint positions are available, but not dense segmentations. Loss for
training on real images is generated from a patch-wise restoration process,
which aligns tentative segmentation maps with a large dictionary of synthetic
poses. The underlying premise is that the domain shift between synthetic and
real data is smaller in the intermediate representation, where labels carry
geometric and topological meaning, than in the raw input domain. Experiments on
the NYU dataset show that the proposed training method reduces joint position
error by 15.7% compared to direct regression of joints from depth data.
Comment: 13 pages, 10 figures, 4 tables
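A minimal sketch of the two-stage idea, in which a segmentation network maps raw depth to hand-part maps that are fused with the depth input for joint regression; the channel counts and fusion-by-concatenation are illustrative assumptions, not the paper's exact design:

# Depth -> part segmentation -> joint regression (sketch).
import torch
import torch.nn as nn

N_PARTS, N_JOINTS = 20, 14

seg_net = nn.Sequential(                      # depth -> per-pixel part scores
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, N_PARTS, 1))

regressor = nn.Sequential(                    # (depth + parts) -> 3D joints
    nn.Conv2d(1 + N_PARTS, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, N_JOINTS * 3))

depth = torch.randn(2, 1, 128, 128)           # stand-in for raw depth maps
parts = seg_net(depth).softmax(dim=1)         # intermediate representation
joints = regressor(torch.cat([depth, parts], dim=1)).view(2, N_JOINTS, 3)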
Deep Object Co-Segmentation
This work presents a deep object co-segmentation (DOCS) approach for
segmenting common objects of the same class within a pair of images. This means
that the method learns to ignore common, or uncommon, background stuff and
focuses on objects. If multiple object classes are present in the image pair,
they are jointly extracted as foreground. To address this task, we propose a
CNN-based Siamese encoder-decoder architecture. The encoder extracts high-level
semantic features of the foreground objects, a mutual correlation layer detects
the common objects, and finally, the decoder generates the output foreground
masks for each image. To train our model, we compile a large object
co-segmentation dataset consisting of image pairs from the PASCAL VOC dataset
with common object masks. We evaluate our approach on commonly used
co-segmentation datasets and observe that it consistently outperforms
competing methods, for both seen and unseen object classes.
Comment: Accepted at ACCV 2018
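A minimal sketch of a Siamese encoder with a mutual correlation layer, in which each spatial feature of one image is correlated with every feature of the other to highlight regions common to the pair; the dimensions and one-layer decoder are illustrative, not the DOCS architecture:

# Siamese encoder + mutual correlation layer (sketch).
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU())

def mutual_correlation(fa, fb):
    """fa, fb: (B, C, H, W) -> (B, H*W, H, W) correlation volume."""
    B, C, H, W = fa.shape
    fa = fa.flatten(2)                         # (B, C, HW)
    fb = fb.flatten(2)                         # (B, C, HW)
    corr = torch.bmm(fa.transpose(1, 2), fb)   # (B, HW, HW)
    return corr.view(B, H * W, H, W)

# Mask head; input channels are tied to the 16x16 feature grid below.
decoder = nn.Sequential(nn.Conv2d(16 * 16, 1, 1), nn.Sigmoid())

xa, xb = torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32)
fa, fb = encoder(xa), encoder(xb)              # shared weights (Siamese)
mask_a = decoder(mutual_correlation(fa, fb))   # foreground mask for image A
mask_b = decoder(mutual_correlation(fb, fa))   # foreground mask for image B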
Show, Match and Segment: Joint Weakly Supervised Learning of Semantic Matching and Object Co-segmentation
We present an approach for jointly matching and segmenting object instances
of the same category within a collection of images. In contrast to existing
algorithms that tackle the tasks of semantic matching and object
co-segmentation in isolation, our method exploits the complementary nature of
the two tasks. The key insights of our method are two-fold. First, the
estimated dense correspondence fields from semantic matching provide
supervision for object co-segmentation by enforcing consistency between the
predicted masks from a pair of images. Second, the predicted object masks from
object co-segmentation in turn allow us to reduce the adverse effects of
background clutter, improving semantic matching. Our model is end-to-end
trainable and does not require supervision from manually annotated
correspondences and object masks. We validate the efficacy of our approach on
five benchmark datasets: TSS, Internet, PF-PASCAL, PF-WILLOW, and SPair-71k,
and show that our algorithm performs favorably against the state-of-the-art
methods on both semantic matching and object co-segmentation tasks.
Comment: PAMI 2020. Project: https://yunchunchen.github.io/MaCoSNet-web/ Code:
https://github.com/YunChunChen/MaCoSNet-pytorch
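A minimal sketch of the first insight, where a dense correspondence field supervises co-segmentation by warping one predicted mask onto the other and penalizing disagreement; the normalized-grid convention and the L1 consistency loss are illustrative assumptions:

# Mask consistency via correspondence warping (sketch).
import torch
import torch.nn.functional as F

def mask_consistency_loss(mask_a, mask_b, flow_ab):
    """mask_a, mask_b: (B, 1, H, W) predicted masks.
    flow_ab: (B, H, W, 2) sampling grid in [-1, 1] mapping A's pixels to
    their matches in B (the output of a semantic matching network)."""
    warped_b = F.grid_sample(mask_b, flow_ab, align_corners=False)
    return (mask_a - warped_b).abs().mean()    # enforce mask agreement

mask_a = torch.rand(1, 1, 16, 16, requires_grad=True)
mask_b = torch.rand(1, 1, 16, 16, requires_grad=True)
# Identity grid as a stand-in for a predicted correspondence field.
theta = torch.eye(2, 3).unsqueeze(0)
grid = F.affine_grid(theta, size=(1, 1, 16, 16), align_corners=False)
loss = mask_consistency_loss(mask_a, mask_b, grid)
loss.backward()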
Transfer Adaptation Learning: A Decade Survey
The world we see is ever-changing; it changes with people, things, and the
environment. A domain refers to the state of the world at a certain moment. A
research problem is characterized as transfer adaptation
learning (TAL) when it needs knowledge correspondence between different
moments/domains. Conventional machine learning aims to find a model with the
minimum expected risk on test data by minimizing the regularized empirical risk
on the training data, which, however, supposes that the training and test data
share a similar joint probability distribution. TAL aims to build models that
can perform tasks in a target domain by learning knowledge from a
semantically related source domain with a different distribution. It is an
active research field of increasing influence and importance, exhibiting a
rapidly growing publication trend. This paper surveys the advances in TAL
methodologies over the past decade and discusses the technical challenges and
essential problems of TAL with deep insights and new perspectives. The
broader families of solutions created by researchers are identified, i.e.,
instance re-weighting adaptation, feature adaptation, classifier adaptation,
deep network adaptation, and adversarial adaptation, which go beyond the
early semi-supervised and unsupervised split. The survey helps researchers rapidly
but comprehensively understand and identify the research foundation, research
status, theoretical limitations, future challenges and under-studied issues
(universality, interpretability, and credibility) to be broken in the field
toward universal representation and safe applications in open-world scenarios.
Comment: 26 pages, 4 figures
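As a concrete instance of the feature adaptation family named above, the following sketch minimizes the maximum mean discrepancy (MMD) between source and target features, a standard estimator; the Gaussian kernel bandwidth and the loss weighting are illustrative choices:

# Gaussian-kernel MMD for feature adaptation (sketch).
import torch

def gaussian_mmd(xs, xt, sigma=1.0):
    """Biased MMD^2 estimate between source features xs and target xt."""
    def k(a, b):
        d = torch.cdist(a, b) ** 2
        return torch.exp(-d / (2 * sigma ** 2))
    return k(xs, xs).mean() + k(xt, xt).mean() - 2 * k(xs, xt).mean()

xs = torch.randn(32, 64)   # features of a labeled source-domain batch
xt = torch.randn(32, 64)   # features of an unlabeled target-domain batch
loss_adapt = gaussian_mmd(xs, xt)
# total_loss = classification_loss_on_source + lambda_ * loss_adapt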
Deep Functional Dictionaries: Learning Consistent Semantic Structures on 3D Models from Functions
Various 3D semantic attributes such as segmentation masks, geometric
features, keypoints, and materials can be encoded as per-point probe functions
on 3D geometries. Given a collection of related 3D shapes, we consider how to
jointly analyze such probe functions over different shapes, and how to discover
common latent structures using a neural network, even in the absence of any
correspondence information. Our network is trained on point cloud
representations of shape geometry and associated semantic functions on that
point cloud. These functions express a shared semantic understanding of the
shapes but are not coordinated in any way. For example, in a segmentation task,
the functions can be indicator functions of arbitrary sets of shape parts, with
the particular combination involved not known to the network. Our network is
able to produce a small dictionary of basis functions for each shape, a
dictionary whose span includes the semantic functions provided for that shape.
Even though our shapes have independent discretizations and no functional
correspondences are provided, the network is able to generate latent bases, in
a consistent order, that reflect the shared semantic structure among the
shapes. We demonstrate the effectiveness of our technique in various
segmentation and keypoint selection applications.
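A minimal sketch of the core objective as described above: the network predicts a per-point dictionary A of k basis functions for a shape, and training penalizes the residual of the least-squares projection of each semantic probe function onto the span of A; the regularized normal-equations solve is an illustrative rendering, not necessarily the authors' exact loss:

# Span-containment loss for a learned functional dictionary (sketch).
import torch

def span_loss(A, f, eps=1e-6):
    """A: (N, k) predicted basis evaluated at N points; f: (N,) probe fn."""
    k = A.shape[1]
    # Least-squares coefficients via regularized normal equations:
    # c = (A^T A + eps I)^{-1} A^T f
    c = torch.linalg.solve(A.T @ A + eps * torch.eye(k), A.T @ f)
    return (A @ c - f).pow(2).mean()           # residual outside span(A)

A = torch.randn(1024, 10, requires_grad=True)  # stand-in for network output
f = torch.randn(1024)                          # e.g. a part indicator function
loss = span_loss(A, f)
loss.backward()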
Domain Adaptations for Computer Vision Applications
A basic assumption of statistical learning theory is that train and test data
are drawn from the same underlying distribution. Unfortunately, this
assumption does not hold in many applications. Instead, ample labeled data
might exist in a particular `source' domain while inference is needed in another, `target'
domain. Domain adaptation methods leverage labeled data from both domains to
improve classification on unseen data in the target domain. In this work, we
survey domain transfer learning methods for various application domains, with
a focus on recent work in Computer Vision.