Learning Discriminative Stein Kernel for SPD Matrices and Its Applications
Stein kernel has recently shown promising performance on classifying images
represented by symmetric positive definite (SPD) matrices. It evaluates the
similarity between two SPD matrices through their eigenvalues. In this paper,
we argue that directly using the original eigenvalues may be problematic
because: i) Eigenvalue estimation becomes biased when the number of samples is
inadequate, which may lead to unreliable kernel evaluation; ii) More
importantly, eigenvalues only reflect the property of an individual SPD matrix.
They are not necessarily optimal for computing Stein kernel when the goal is to
discriminate different sets of SPD matrices. To address the two issues in one
shot, we propose a discriminative Stein kernel, in which an extra parameter
vector is defined to adjust the eigenvalues of the input SPD matrices. The
optimal parameter values are sought by optimizing a proxy of classification
performance. To show the generality of the proposed method, three different
kernel learning criteria that are commonly used in the literature are employed
respectively as a proxy. A comprehensive experimental study is conducted on a
variety of image classification tasks to compare our proposed discriminative
Stein kernel with the original Stein kernel and other commonly used methods for
evaluating the similarity between SPD matrices. The experimental results
demonstrate that the discriminative Stein kernel can attain greater
discrimination and better align with classification tasks by altering the
eigenvalues, which leads to higher classification performance than the
original Stein kernel and other commonly used methods.
Comment: 13 pages
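As a rough sketch of the idea, the Stein divergence between SPD matrices and an eigenvalue-adjustment vector can be written as below. The per-eigenvalue power form and all numeric values are illustrative assumptions; the paper learns the adjustment by optimizing a kernel learning criterion, and alpha = 1 recovers the original Stein kernel.

```python
import numpy as np

def adjust_spd(X, alpha):
    """Rescale the eigenvalues of an SPD matrix by component-wise powers
    (illustrative form of the abstract's eigenvalue adjustment)."""
    w, U = np.linalg.eigh(X)
    return (U * w ** alpha) @ U.T

def stein_divergence(X, Y):
    """S(X, Y) = log det((X + Y) / 2) - 0.5 * log det(X Y)."""
    _, ld_mid = np.linalg.slogdet(0.5 * (X + Y))
    _, ld_x = np.linalg.slogdet(X)
    _, ld_y = np.linalg.slogdet(Y)
    return ld_mid - 0.5 * (ld_x + ld_y)

def stein_kernel(X, Y, alpha, theta=1.0):
    """Adjust eigenvalues of both inputs, then exponentiate the divergence."""
    Xa, Ya = adjust_spd(X, alpha), adjust_spd(Y, alpha)
    return np.exp(-theta * stein_divergence(Xa, Ya))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); X = A @ A.T + 4 * np.eye(4)
B = rng.standard_normal((4, 4)); Y = B @ B.T + 4 * np.eye(4)
alpha = np.ones(4)                 # alpha = 1: original Stein kernel
k = stein_kernel(X, Y, alpha)
```

Since the Stein divergence is nonnegative and zero only for identical inputs, the kernel value lies in (0, 1] and equals 1 on the diagonal of the Gram matrix.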
Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data
Due to their causal semantics, Bayesian networks (BNs) have been widely
employed to discover underlying data relationships in exploratory studies,
such as brain research. Despite their success in modeling the probability
distribution of variables, a BN is naturally a generative model, which is not
necessarily discriminative. This may cause subtle but critical network
changes of investigative value across populations to be overlooked. In this
paper, we
propose to improve the discriminative power of BN models for continuous
variables from two different perspectives. This brings two general
discriminative learning frameworks for Gaussian Bayesian networks (GBN). In the
first framework, we employ Fisher kernel to bridge the generative models of GBN
and the discriminative classifiers of SVMs, and convert the GBN parameter
learning to Fisher kernel learning via minimizing a generalization error bound
of SVMs. In the second framework, we employ the max-margin criterion and build
it directly upon GBN models to explicitly optimize the classification
performance of the GBNs. The advantages and disadvantages of the two frameworks
are discussed and experimentally compared. Both of them demonstrate strong
power in learning discriminative parameters of GBNs for neuroimaging based
brain network analysis, as well as maintaining reasonable representation
capacity. The contributions of this paper also include a new Directed Acyclic
Graph (DAG) constraint with theoretical guarantee to ensure the graph validity
of GBN.
Comment: 16 pages and 5 figures for the article (excluding appendix)
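The first framework's Fisher-kernel bridge can be illustrated on a plain diagonal Gaussian rather than a full GBN: the Fisher score (gradient of the log-likelihood w.r.t. the generative parameters) becomes the feature vector an SVM consumes. All data and parameter values below are synthetic stand-ins.

```python
import numpy as np

def fisher_score(x, mu, sigma2):
    """Fisher score of a diagonal Gaussian: gradient of log N(x | mu, sigma2)
    w.r.t. (mu, sigma2), stacked into one feature vector."""
    d_mu = (x - mu) / sigma2
    d_s2 = -0.5 / sigma2 + 0.5 * (x - mu) ** 2 / sigma2 ** 2
    return np.concatenate([d_mu, d_s2])

def fisher_kernel(x, y, mu, sigma2):
    """Linear kernel between Fisher scores; feeding this Gram matrix to an
    SVM is the generative-to-discriminative bridge the abstract describes."""
    return fisher_score(x, mu, sigma2) @ fisher_score(y, mu, sigma2)

# Synthetic stand-in data; in the paper the generative model is a GBN.
rng = np.random.default_rng(0)
data = rng.standard_normal((200, 3)) * np.array([1.0, 2.0, 0.5]) \
       + np.array([0.0, 1.0, -1.0])
mu, sigma2 = data.mean(axis=0), data.var(axis=0)
gram = np.array([[fisher_kernel(x, y, mu, sigma2)
                  for y in data[:5]] for x in data[:5]])
```

The score vanishes at the fitted mean, so samples near the model's mode contribute little discriminative signal, which is exactly why learning the generative parameters against a classification proxy can help.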
OPML: A One-Pass Closed-Form Solution for Online Metric Learning
To achieve a low computational cost when performing online metric learning
for large-scale data, we present a one-pass closed-form solution namely OPML in
this paper. Specifically, OPML first adopts a one-pass triplet
construction strategy, which aims to use only a very small number of triplets
to approximate the representation ability of whole original triplets obtained
by batch-manner methods. Then, OPML employs a closed-form solution to update
the metric for newly arriving samples, which leads to low space and time
complexity depending only on the feature dimensionality d.
In addition, an extension of OPML (namely COPML) is further proposed to
enhance robustness when, as often happens in practice, the first several
samples come from the same class (i.e., the cold start problem). In the
experiments, we have systematically
evaluated our methods (OPML and COPML) on three typical tasks, including UCI
data classification, face verification, and abnormal event detection in videos,
which aims to fully evaluate the proposed methods across different sample
sizes, feature dimensionalities, and feature extraction approaches (i.e.,
hand-crafted and deeply learned). The results show that OPML and COPML
achieve promising performance at a very low computational cost. Also, the
effectiveness of COPML under the cold-start setting is experimentally
verified.
Comment: 12 pages
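The flavor of a one-pass, per-triplet metric update can be sketched as below. This is not the paper's exact closed-form solution; it is an illustrative online update with the same interface: one triplet in, one Mahalanobis metric update out, followed by a PSD projection.

```python
import numpy as np

def triplet_update(M, anchor, pos, neg, lr=0.05, margin=1.0):
    """One online update of a Mahalanobis metric M from a single triplet.
    Illustrative gradient step (not the paper's closed form): pull the
    positive pair together and push the negative pair apart when the
    margin is violated, then project back onto the PSD cone."""
    dp, dn = anchor - pos, anchor - neg
    loss = dp @ M @ dp - dn @ M @ dn + margin
    if loss > 0:                          # margin violated: take a step
        M = M - lr * (np.outer(dp, dp) - np.outer(dn, dn))
        w, U = np.linalg.eigh(M)          # PSD projection
        M = (U * np.clip(w, 0.0, None)) @ U.T
    return M

rng = np.random.default_rng(1)
d = 5
M = np.eye(d)
for _ in range(50):                       # one pass over streamed triplets
    a = rng.standard_normal(d)
    M = triplet_update(M,
                       a + 0.1 * rng.standard_normal(d),   # anchor
                       a + 0.1 * rng.standard_normal(d),   # positive
                       rng.standard_normal(d))             # negative
```

Because each sample is touched once and only d-by-d state is kept, the cost per update is governed by the feature dimensionality, matching the abstract's motivation.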
A Novel Unsupervised Camera-aware Domain Adaptation Framework for Person Re-identification
Unsupervised cross-domain person re-identification (Re-ID) faces two key
issues. One is the data distribution discrepancy between source and target
domains, and the other is the lack of labelling information in target domain.
They are addressed in this paper from the perspective of representation
learning. For the first issue, we highlight the presence of camera-level
sub-domains as a unique characteristic of person Re-ID, and develop
camera-aware domain adaptation to reduce the discrepancy not only between
source and target domains but also across these sub-domains. For the second
issue, we exploit the temporal continuity in each camera of target domain to
create discriminative information. This is implemented by dynamically
generating online triplets within each batch, in order to maximally take
advantage of the steadily improved feature representation in training process.
Together, the above two methods give rise to a novel unsupervised deep domain
adaptation framework for person Re-ID. Experiments and ablation studies on
benchmark datasets demonstrate its superiority and interesting properties.
Comment: Accepted by ICCV 2019
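The dynamic in-batch triplet generation can be illustrated with standard batch-hard mining. The labels below are plain integers standing in for the pseudo-labels the paper derives from temporal continuity within each camera.

```python
import numpy as np

def batch_hard_triplet_loss(feats, labels, margin=0.3):
    """Batch-hard online triplet mining: for each anchor, the hardest
    positive is the farthest same-label sample and the hardest negative
    is the closest different-label sample within the batch."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    dist = np.sqrt(np.maximum(d2, 0.0))
    same = labels[:, None] == labels[None, :]
    pos = np.where(same, dist, -np.inf).max(axis=1)   # hardest positive
    neg = np.where(~same, dist, np.inf).min(axis=1)   # hardest negative
    return np.maximum(pos - neg + margin, 0.0).mean()

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
loss = batch_hard_triplet_loss(feats, labels)
```

Mining within each batch means the triplets track the current state of the encoder, which is what lets the steadily improving representation generate progressively harder training signal.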
ZeroMesh: Zero-shot Single-view 3D Mesh Reconstruction
Single-view 3D object reconstruction is a fundamental and challenging
computer vision task that aims at recovering 3D shapes from single-view RGB
images. Most existing deep learning based reconstruction methods are trained
and evaluated on the same categories, and they cannot work well when handling
objects from novel categories that are not seen during training. Focusing on
this issue, this paper tackles Zero-shot Single-view 3D Mesh Reconstruction, to
study the model generalization on unseen categories and encourage models to
reconstruct objects literally. Specifically, we propose an end-to-end two-stage
network, ZeroMesh, to break the category boundaries in reconstruction. Firstly,
we factorize the complicated image-to-mesh mapping into two simpler mappings,
i.e., image-to-point mapping and point-to-mesh mapping, while the latter is
mainly a geometric problem and less dependent on object categories. Secondly,
we devise a local feature sampling strategy in 2D and 3D feature spaces to
capture the local geometry shared across objects to enhance model
generalization. Thirdly, apart from the traditional point-to-point supervision,
we introduce a multi-view silhouette loss to supervise the surface generation
process, which provides additional regularization and further relieves the
overfitting problem. The experimental results show that our method
significantly outperforms existing works on ShapeNet and Pix3D under
different scenarios and various metrics, especially for novel objects.
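The local feature sampling in 2D can be sketched as projecting surface points into the image and bilinearly interpolating a feature map. The camera intrinsics, shapes, and function names below are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def bilinear_sample(feat_map, uv):
    """Sample a feature map (H, W, C) at continuous (u, v) pixel
    locations via bilinear interpolation."""
    H, W, _ = feat_map.shape
    u = np.clip(uv[:, 0], 0.0, W - 1.001)
    v = np.clip(uv[:, 1], 0.0, H - 1.001)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    du, dv = (u - u0)[:, None], (v - v0)[:, None]
    f00 = feat_map[v0, u0];     f01 = feat_map[v0, u0 + 1]
    f10 = feat_map[v0 + 1, u0]; f11 = feat_map[v0 + 1, u0 + 1]
    return (f00 * (1 - du) * (1 - dv) + f01 * du * (1 - dv)
            + f10 * (1 - du) * dv + f11 * du * dv)

def sample_point_features(feat_map, points, K):
    """Project 3D points with an assumed pinhole intrinsic matrix K and
    gather a local 2D feature per point."""
    proj = points @ K.T                     # homogeneous image coordinates
    uv = proj[:, :2] / proj[:, 2:3]
    return bilinear_sample(feat_map, uv)

# Example with a toy 8x8 feature map and a hypothetical camera.
K = np.array([[4.0, 0.0, 4.0], [0.0, 4.0, 4.0], [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 1.0], [0.1, -0.1, 1.0]])
local_feats = sample_point_features(np.ones((8, 8, 3)), points, K)
```

Because the gathered feature describes local geometry around the projection rather than a whole-object code, it transfers more readily across categories, which is the generalization argument the abstract makes.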
Subject-adaptive Integration of Multiple SICE Brain Networks with Different Sparsity
As a principled method for partial correlation estimation, sparse inverse covariance estimation (SICE) has been employed to model brain connectivity networks, which holds great promise for brain disease diagnosis. For each subject, the SICE method naturally leads to a set of connectivity networks with various sparsity. However, existing methods usually select a single network from them for classification and the discriminative power of this set of networks has not been fully exploited. This paper argues that the connectivity networks at different sparsity levels present complementary connectivity patterns and therefore they should be jointly considered to achieve high classification performance. In this paper, we propose a subject-adaptive method to integrate multiple SICE networks as a unified representation for classification. The integration weight is learned adaptively for each subject in order to endow the method with the flexibility in dealing with subject variations. Furthermore, to respect the manifold geometry of SICE networks, Stein kernel is employed to embed the manifold structure into a kernel-induced feature space, which allows a linear integration of SICE networks to be designed. The optimization of the integration weight and the classification of the integrated networks are performed via a sparse representation framework. Through our method, we provide a unified and effective network representation that is transparent to the sparsity level of SICE networks, and can be readily utilized for further medical analysis. Experimental study on ADHD and ADNI data sets demonstrates that the proposed integration method achieves notable improvement of classification performance in comparison with methods using a single sparsity level of SICE networks and other commonly used integration methods, such as Multiple Kernel Learning.
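A minimal numpy sketch of the multi-sparsity idea: here the per-level networks are produced by thresholding partial correlations derived from one precision matrix, as a simple stand-in for re-solving the sparse inverse covariance problem at several regularization strengths, and the integration weights are fixed stand-ins for the per-subject weights the paper learns via sparse representation.

```python
import numpy as np

def partial_corr(prec):
    """Partial correlations from a precision matrix:
    rho_ij = -p_ij / sqrt(p_ii * p_jj), with unit diagonal."""
    d = np.sqrt(np.diag(prec))
    rho = -prec / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

def multi_sparsity_networks(prec, thresholds=(0.0, 0.1, 0.2)):
    """One network per sparsity level: keep only partial correlations
    whose magnitude clears the level's threshold."""
    rho = partial_corr(prec)
    return [np.where(np.abs(rho) >= t, rho, 0.0) for t in thresholds]

def integrate(networks, weights):
    """Subject-adaptive linear integration: a convex combination of the
    per-level networks (weights are learned per subject in the paper)."""
    w = np.asarray(weights, float)
    w = w / w.sum()
    return sum(wi * net for wi, net in zip(w, networks))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))                 # 200 samples, 6 regions
prec = np.linalg.inv(np.cov(X.T) + 0.1 * np.eye(6))
nets = multi_sparsity_networks(prec)
fused = integrate(nets, [0.5, 0.3, 0.2])
```

The fused matrix stays symmetric and blends dense and sparse views of the same subject, which is the complementary-pattern argument the abstract makes.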
Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder
Medical Visual Question Answering (VQA) systems play a supporting role in
understanding clinically relevant information carried by medical images. The
questions about a medical image fall into two categories: close-end (such as
Yes/No questions) and open-end. To obtain answers, the majority of existing
medical VQA methods rely on classification approaches, while a few works
attempt to use
generation approaches or a mixture of the two. The classification approaches
are relatively simple but perform poorly on long open-end questions. To bridge
this gap, in this paper, we propose a new Transformer based framework for
medical VQA (named as Q2ATransformer), which integrates the advantages of both
the classification and the generation approaches and provides a unified
treatment for the close-end and open-end questions. Specifically, we introduce
an additional Transformer decoder with a set of learnable candidate answer
embeddings to query the existence of each answer class to a given
image-question pair. Through the Transformer attention, the candidate answer
embeddings interact with the fused features of the image-question pair to make
the decision. In this way, despite being a classification-based approach, our
method provides a mechanism to interact with the answer information for
prediction like the generation-based approaches. On the other hand, by
classification, we mitigate the task difficulty by reducing the search space of
answers. Our method achieves new state-of-the-art performance on two medical
VQA benchmarks. In particular, for open-end questions, we achieve 79.19% on
VQA-RAD and 54.85% on PathVQA, with absolute improvements of 16.09% and
41.45%, respectively.
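The answer-querying mechanism can be sketched as a single cross-attention step: candidate answer embeddings act as queries over the fused image-question features, and a shared head scores the presence of each answer. All weight matrices below are random illustrative placeholders for learned parameters, not the paper's actual Transformer decoder.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def answer_query_decoder(answer_emb, fused_feats, Wq, Wk, Wv, w_out):
    """One cross-attention step standing in for the Transformer decoder:
    each candidate answer embedding queries the fused image-question
    tokens; a shared linear head then scores each answer's presence."""
    Q = answer_emb @ Wq                       # (A, d) answer queries
    K = fused_feats @ Wk                      # (T, d) keys
    V = fused_feats @ Wv                      # (T, d) values
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    ctx = attn @ V                            # (A, d) answer-conditioned context
    logits = ctx @ w_out                      # one existence score per answer
    return 1.0 / (1.0 + np.exp(-logits))      # sigmoid: multi-label decision

rng = np.random.default_rng(0)
A, T, d = 10, 6, 32                           # answers, tokens, feature dim
probs = answer_query_decoder(
    rng.standard_normal((A, d)), rng.standard_normal((T, d)),
    *(rng.standard_normal((d, d)) for _ in range(3)),
    rng.standard_normal(d))
```

Even though the output is a per-class score (a classification), each score is computed after the answer representation has interacted with the image-question features, which is the generation-like ingredient the abstract highlights.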
Neural Vector Fields: Generalizing Distance Vector Fields by Codebooks and Zero-Curl Regularization
Recent neural network based surface reconstruction methods can be roughly
divided into two categories: one warps templates explicitly, and the other
represents 3D surfaces implicitly. To enjoy the advantages of both, we
propose a novel 3D representation, Neural Vector Fields (NVF), which adopts the
explicit learning process to manipulate meshes and implicit unsigned distance
function (UDF) representation to break the barriers in resolution and topology.
This is achieved by directly predicting the displacements from surface queries
and modeling shapes as Vector Fields, rather than relying on network
differentiation to obtain direction fields as most existing UDF-based methods
do. In this way, our approach is capable of encoding both the distance and the
direction fields so that the calculation of direction fields is
differentiation-free, circumventing the non-trivial surface extraction step.
Furthermore, building upon NVFs, we propose to incorporate two types of shape
codebooks, i.e., NVFs (Lite or Ultra), to promote cross-category reconstruction
through encoding cross-object priors. Moreover, we propose a new regularization
based on analyzing the zero-curl property of NVFs, and implement this through
the fully differentiable framework of our NVF (Ultra). We evaluate both NVFs on
four surface reconstruction scenarios, including watertight vs non-watertight
shapes, category-agnostic reconstruction vs category-unseen reconstruction,
category-specific, and cross-domain reconstruction.