Coded Residual Transform for Generalizable Deep Metric Learning
A fundamental challenge in deep metric learning is the generalization
capability of the feature embedding network, since the embedding network
learned on training classes needs to be evaluated on new test classes. To
address this challenge, in this paper, we introduce a new method called coded
residual transform (CRT) for deep metric learning to significantly improve its
generalization capability. Specifically, we learn a set of diversified
prototype features, project the feature map onto each prototype, and then
encode its features using their projection residuals weighted by their
correlation coefficients with each prototype. The proposed CRT method has the
following two unique characteristics. First, it represents and encodes the
feature map from a set of complementary perspectives based on projections onto
diversified prototypes. Second, unlike existing transformer-based feature
representation approaches which encode the original values of features based on
global correlation analysis, the proposed coded residual transform encodes the
relative differences between the original features and their projected
prototypes. Embedding space density and spectral decay analysis show that this
multi-perspective projection onto diversified prototypes and coded residual
representation are able to achieve significantly improved generalization
capability in metric learning. Finally, to further enhance the generalization
performance, we propose to enforce the consistency on their feature similarity
matrices between coded residual transforms with different sizes of projection
prototypes and embedding dimensions. Our extensive experimental results and
ablation studies demonstrate that the proposed CRT method outperforms
state-of-the-art deep metric learning methods by large margins, improving
upon the current best method by up to 4.28% on the CUB dataset.
Comment: Accepted by NeurIPS 2022
Tree-Based Backtracking Orthogonal Matching Pursuit for Sparse Signal Reconstruction
Compressed sensing (CS) is a theory that exploits the sparsity of the original signal in signal sampling and coding. By solving an optimization problem, the original sparse signal can be reconstructed accurately. In this paper, a new Tree-based Backtracking Orthogonal Matching Pursuit (TBOMP) algorithm is presented, built on the tree model in the wavelet domain. The algorithm converts the wavelet tree structure into corresponding relations among candidate atoms without any prior information about the signal sparsity. Thus, the atom selection process becomes more structured and the search space can be narrowed. Moreover, through the backtracking process, the reliability of previously chosen atoms can be assessed and unreliable atoms can be deleted at each iteration, which ultimately leads to an accurate reconstruction of the signal. Simulation results show the proposed algorithm's superior performance over several other OMP-type algorithms.
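Leaving out the wavelet tree structure, a generic orthogonal matching pursuit with a backtracking (atom-pruning) step can be sketched as below. The function name, the pruning rule (drop the smallest coefficients), and the stopping criteria are assumptions, not the TBOMP algorithm itself:

```python
import numpy as np

def backtracking_omp(A, y, k, n_iter=20):
    """Recover a k-sparse x with y = A @ x, pruning unreliable atoms each step."""
    m, n = A.shape
    support, r = [], y.copy()
    x_s = np.zeros(0)
    for _ in range(n_iter):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ r)))
        if j not in support:
            support.append(j)
        # Least-squares fit on the current support.
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        # Backtracking: if the support grew too large, delete the atoms
        # with the smallest coefficients (deemed unreliable).
        if len(support) > k:
            keep = sorted(np.argsort(np.abs(x_s))[-k:])
            support = [support[i] for i in keep]
            x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ x_s
        if np.linalg.norm(r) < 1e-10:   # residual vanished: exact reconstruction
            break
    x = np.zeros(n)
    x[support] = x_s
    return x
```

TBOMP additionally restricts candidate atoms to those consistent with the wavelet parent-child tree, which narrows the search space relative to this unstructured sketch.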
Contrastive Bayesian Analysis for Deep Metric Learning
Recent methods for deep metric learning have been focusing on designing
different contrastive loss functions between positive and negative pairs of
samples so that the learned feature embedding is able to pull positive samples
of the same class closer and push negative samples from different classes away
from each other. In this work, we recognize that there is a significant
semantic gap between features at the intermediate feature layer and class
labels at the final output layer. To bridge this gap, we develop a contrastive
Bayesian analysis to characterize and model the posterior probabilities of
image labels conditioned on their feature similarity in a contrastive learning
setting. This contrastive Bayesian analysis leads to a new loss function for
deep metric learning. To improve the generalization capability of the proposed
method onto new classes, we further extend the contrastive Bayesian loss with a
metric variance constraint. Our experimental results and ablation studies
demonstrate that the proposed contrastive Bayesian metric learning method
significantly improves the performance of deep metric learning in both
supervised and pseudo-supervised scenarios, outperforming existing methods by a
large margin.
Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence
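One simple way to turn a posterior over pair labels into a loss, in the spirit of (though not identical to) the paper's derivation, is to model p(same class | similarity) with a sigmoid and minimize the negative log-likelihood. The function name and the temperature parameter are assumptions:

```python
import numpy as np

def contrastive_bayesian_loss(sim, same_class, tau=0.1):
    """sim: (N,) pairwise similarities; same_class: (N,) bool pair labels.
    Models the posterior p(same class | similarity) with a sigmoid (an
    assumption, not the paper's exact form) and returns the mean NLL."""
    p = 1.0 / (1.0 + np.exp(-sim / tau))      # posterior prob. of "same class"
    eps = 1e-12                               # numerical floor for the logs
    return -np.mean(np.where(same_class, np.log(p + eps), np.log(1 - p + eps)))
```

Minimizing this pulls similarities of positive pairs up and pushes those of negative pairs down, which is the behavior the abstract describes.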
Robustness meets low-rankness: unified entropy and tensor learning for multi-view subspace clustering
In this paper, we develop the weighted error entropy-regularized tensor learning method for multi-view subspace clustering (WETMSC), which integrates noise removal and subspace structure discovery into one unified framework. Unlike most existing methods, which focus only on affinity matrix learning for subspace discovery through different optimization models and simply assume that the noise is independent and identically distributed (i.i.d.), our WETMSC method adopts the weighted error entropy to characterize the underlying noise by assuming that it is independent and piecewise identically distributed (i.p.i.d.). Meanwhile, WETMSC constructs the self-representation tensor by stacking all self-representation matrices along the view dimension, preserving the high-order correlation of views based on the tensor nuclear norm. To solve the proposed nonconvex optimization problem, we design a half-quadratic (HQ) additive optimization technique and iteratively solve all subproblems under the alternating direction method of multipliers framework. Extensive comparisons with state-of-the-art clustering methods on real-world and synthetic noisy datasets demonstrate the superiority of the proposed WETMSC method.
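For context, the tensor nuclear norm mentioned above is commonly defined through the t-SVD: take an FFT along the third (view) mode and average the matrix nuclear norms of the frontal slices. A minimal sketch of that standard definition (not the paper's full model):

```python
import numpy as np

def tensor_nuclear_norm(T):
    """Tensor nuclear norm of a 3rd-order tensor (t-SVD based definition):
    FFT along mode 3, then average the nuclear norms of the frontal slices."""
    Tf = np.fft.fft(T, axis=2)
    n3 = T.shape[2]
    return sum(np.linalg.norm(Tf[:, :, k], ord='nuc') for k in range(n3)) / n3
```

With a single frontal slice this reduces to the ordinary matrix nuclear norm, which is a convenient sanity check.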
Bi-nuclear tensor Schatten-p norm minimization for multi-view subspace clustering
Multi-view subspace clustering aims to integrate the complementary information contained in different views to facilitate data representation. Currently, low-rank representation (LRR) serves as a benchmark method. However, we observe that these LRR-based methods suffer from two issues, limited clustering performance and high computational cost, since (1) they usually adopt the nuclear norm, a biased estimator, to explore the low-rank structures, and (2) the singular value decomposition of large-scale matrices is inevitably involved. Moreover, LRR may not achieve low-rank properties in both intra-views and inter-views simultaneously. To address these issues, this paper proposes Bi-nuclear tensor Schatten-p norm minimization for multi-view subspace clustering (BTMSC). Specifically, BTMSC constructs a third-order tensor from the view dimension to explore the high-order correlation and the subspace structures of multi-view features. The Bi-Nuclear Quasi-Norm (BiN) factorization form of the Schatten-p norm is utilized to factorize the third-order tensor as the product of two small-scale third-order tensors, which not only captures the low-rank property of the third-order tensor but also improves computational efficiency. Finally, an efficient alternating optimization algorithm is designed to solve the BTMSC model. Extensive experiments on ten text and image datasets illustrate the performance superiority of the proposed BTMSC method over state-of-the-art methods.
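For reference, the Schatten-p quasi-norm underlying BiN is simply the l_p norm of the singular values. A tiny sketch in the matrix case, for intuition only (the paper applies it to tensors via factorization):

```python
import numpy as np

def schatten_p(X, p):
    """Schatten-p (quasi-)norm: the l_p norm of the singular values of X.
    For p < 1 this is a nonconvex surrogate that penalizes rank more
    aggressively (with less bias) than the nuclear norm (p = 1)."""
    s = np.linalg.svd(X, compute_uv=False)
    return float((s ** p).sum() ** (1.0 / p))
```

The computational gain in BTMSC comes from never forming the SVD of the full tensor: the factorization into two small-scale tensors bounds the cost by the factor sizes rather than the original dimensions.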
Interactive Virtual Reality Game for Online Learning of Science Subject in Primary Schools
Education plays an important role in nurturing children. The COVID-19 pandemic brought challenges and disruptions to school education due to school closures in some countries. The science subject in primary schools is unique in that hands-on experiments are important learning components, so its learning process may be affected under the new norm of online or home-based learning. This research project creates a serious game on the science subject for primary school students aged 10 to 11 using virtual reality (VR) technology. It consists of three virtual learning phases. Phase 1 explains the theory of science topics on electricity and electric circuits. Phase 2 provides interactive hands-on experiment exercises where students can practice the theoretical knowledge learned in the previous phase. An interactive quiz session is offered to reinforce the learning in Phase 3. Interactive VR features enable primary school students to learn abstract science concepts in a more engaging way than conventional classroom settings. Meticulous attention has been paid to details such as visual instructions, voice instructions, speech tempo, animations, and colorful graphics to create a sense of realism and keep students actively engaged. A preliminary case study was conducted with 10 students at primary schools in Singapore to evaluate the learning effectiveness of this research.
Energy-Efficient Nonuniform Content Edge Pre-Caching to Improve Quality of Service in Fog Radio Access Networks
The fog radio access network (F-RAN) is equipped with enhanced remote radio heads (eRRHs), which can pre-store some requested files in the edge cache and support mobile edge computing (MEC). To guarantee the quality of service (QoS) and energy efficiency of F-RANs, a proper content caching strategy is necessary to avoid coarsely storing content in the local cache or frequently fetching content from a centralized baseband signal processing unit (BBU) pool via backhauls. In this paper we investigate the relationships among eRRH/terminal activities and content requests in F-RANs, and propose an edge content caching strategy for eRRHs that mines mobile network behavior information. In particular, to support inference for appropriate content caching, we establish a pre-mapping containing content preference information and geographical influence via an efficient non-uniform accelerated matrix completion algorithm. An energy consumption analysis is given to discuss the energy-saving properties of the proposed edge content caching strategy. Simulation results validate our theoretical analysis of the inference validity of the pre-mapping construction method in static and dynamic cases, and show the energy efficiency achieved by the proposed edge content pre-caching strategy.
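The paper's accelerated completion algorithm is not specified here. As a generic stand-in, low-rank completion of a sparsely observed content-preference matrix can be sketched with singular value thresholding (SVT); the function name and all parameter values are assumptions:

```python
import numpy as np

def svt_complete(M_obs, mask, tau=5.0, step=1.2, n_iter=300):
    """Low-rank matrix completion by singular value thresholding (SVT).
    M_obs: observed entries (zeros elsewhere); mask: 1 where observed."""
    Y = np.zeros_like(M_obs)
    X = Y
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt      # shrink singular values
        Y = Y + step * mask * (M_obs - X)            # correct observed entries
    return X
```

The completed matrix supplies preference scores for unobserved (user, content) pairs, which is the kind of inference the caching strategy relies on.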
Solution of the Problem of Smoothing of the Signals at the Preprocessing of Thermal Images
Smoothing two-dimensional digital signals is important for a number of applications. The paper presents a mathematical method and an algorithm for smoothing two-dimensional digital signals. The method is based on minimizing an objective function that uses first-order finite differences between the rows and columns of the image as the distance measure. To estimate the parameters of the developed method, a non-iterative algorithm is used. The present study shows how the smoothing filter kernel changes depending on variations in the method parameters.
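A penalized least-squares smoother of this kind (a fidelity term plus a first-order finite-difference penalty, solved in closed form rather than iteratively) can be sketched as follows. Applying the 1-D solver along rows and then columns is an assumed separable approximation, not necessarily the paper's exact formulation:

```python
import numpy as np

def whittaker_1d(y, lam):
    """Non-iterative smoothing: minimize ||x - y||^2 + lam * ||D x||^2,
    where D is the first-order finite-difference operator."""
    n = len(y)
    D = np.diff(np.eye(n), axis=0)                    # (n-1, n) first differences
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

def smooth_image(img, lam):
    """Apply the 1-D smoother along rows, then along columns."""
    rows = np.apply_along_axis(whittaker_1d, 1, img, lam)
    return np.apply_along_axis(whittaker_1d, 0, rows, lam)
```

Since the penalty vanishes on constant signals, a flat image passes through unchanged, while larger `lam` suppresses more of the row/column variation.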
A GAN-Based Input-Size Flexibility Model for Single Image Dehazing
Image-to-image translation based on generative adversarial network (GAN) has
achieved state-of-the-art performance in various image restoration
applications. Single image dehazing is a typical example, which aims to obtain
the haze-free image of a haze one. This paper concentrates on the challenging
task of single image dehazing. Based on the atmospheric scattering model, we
design a novel model to directly generate the haze-free image. The main
challenge of image dehazing is that the atmospheric scattering model has two
parameters, i.e., transmission map and atmospheric light. When we estimate them
respectively, the errors will be accumulated to compromise dehazing quality.
Considering this and the variety of image sizes, we propose a novel input-size
flexible conditional generative adversarial network (cGAN) for single image
dehazing, which is input-size flexible at both the training and test stages of
image-to-image translation within the cGAN framework. We propose a simple and
effective U-type residual network (UR-Net) to construct the generator and adopt
spatial pyramid pooling (SPP) to design the discriminator. Moreover, the
model is trained with a multi-term loss function, in which the consistency loss
is newly designed in this paper. We finally build a multi-scale cGAN fusion
model to realize state-of-the-art single image dehazing performance. The
proposed models receive a hazy image as input and directly output a haze-free
one. Experimental results demonstrate the effectiveness and efficiency of the
proposed models.
Comment: Computer Vision
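For context, the atmospheric scattering model referenced above is I = J·t + A·(1 − t); given estimates of the transmission map t and atmospheric light A, the haze-free image J is recovered by inverting it. The sketch below shows that inversion, which is exactly the two-parameter estimation step the paper's cGAN deliberately bypasses by generating J directly; the clamp value is an assumption:

```python
import numpy as np

def invert_scattering_model(I, t, A, t_min=0.1):
    """Recover scene radiance J from hazy image I via I = J*t + A*(1 - t).
    t: per-pixel transmission map; A: (scalar) atmospheric light.
    t is clamped away from zero to avoid amplifying estimation noise."""
    t = np.clip(t, t_min, 1.0)
    return (I - A * (1.0 - t)) / t
```

Errors in the separate estimates of t and A propagate through this division, which is the error-accumulation problem the abstract cites as motivation for end-to-end generation.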