
    Coded Residual Transform for Generalizable Deep Metric Learning

    A fundamental challenge in deep metric learning is the generalization capability of the feature embedding network, since the embedding network learned on training classes needs to be evaluated on new test classes. To address this challenge, we introduce a new method called coded residual transform (CRT) for deep metric learning to significantly improve its generalization capability. Specifically, we learn a set of diversified prototype features, project the feature map onto each prototype, and then encode its features using their projection residuals weighted by their correlation coefficients with each prototype. The proposed CRT method has two unique characteristics. First, it represents and encodes the feature map from a set of complementary perspectives based on projections onto diversified prototypes. Second, unlike existing transformer-based feature representation approaches, which encode the original values of features based on global correlation analysis, the proposed coded residual transform encodes the relative differences between the original features and their projected prototypes. Embedding space density and spectral decay analysis show that this multi-perspective projection onto diversified prototypes and coded residual representation achieve significantly improved generalization capability in metric learning. Finally, to further enhance generalization performance, we propose to enforce consistency between the feature similarity matrices of coded residual transforms with different prototype set sizes and embedding dimensions. Our extensive experimental results and ablation studies demonstrate that the proposed CRT method outperforms state-of-the-art deep metric learning methods by large margins, improving upon the current best method by up to 4.28% on the CUB dataset. Comment: Accepted by NeurIPS 2022
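
    As a concrete illustration of the encoding step, here is a minimal NumPy sketch of residual coding against diversified prototypes as the abstract describes it. The cosine-similarity softmax weighting, the per-prototype aggregation, and the final normalization are assumptions made for the sketch, not details taken from the paper.

        import numpy as np

        def coded_residual_encode(X, P):
            """Encode local features X (N, d) against prototypes P (K, d).

            Each feature contributes its residual against every prototype,
            weighted by its correlation with that prototype (softmax over
            cosine similarities) -- a sketch of the CRT idea above.
            """
            Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
            Pn = P / np.linalg.norm(P, axis=1, keepdims=True)
            sim = Xn @ Pn.T                              # (N, K) correlations
            W = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
            R = X[:, None, :] - P[None, :, :]            # (N, K, d) residuals
            E = (W[:, :, None] * R).sum(axis=0)          # (K, d) coded residuals
            return E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-12)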

    Tree-Based Backtracking Orthogonal Matching Pursuit for Sparse Signal Reconstruction

    Compressed sensing (CS) is a theory that exploits the sparsity of the original signal in signal sampling and coding. By solving an optimization problem, the original sparse signal can be reconstructed accurately. In this paper, a new tree-based backtracking orthogonal matching pursuit (TBOMP) algorithm is presented, built on the tree model in the wavelet domain. The algorithm converts the wavelet tree structure into the corresponding relations among candidate atoms without any prior information about the signal sparsity. Thus, the atom selection process becomes more structured and the search space is narrowed. Moreover, through the backtracking process, the reliability of previously chosen atoms is checked and unreliable atoms are deleted at each iteration, which ultimately leads to an accurate reconstruction of the signal. Simulation results show that the proposed algorithm outperforms several other OMP-type algorithms.
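
    The wavelet-tree restriction of the candidate set is the paper's contribution and is not reproduced here, but the select / least-squares / prune cycle of a backtracking OMP can be sketched as follows; the pruning threshold prune_frac is an assumed illustrative parameter.

        import numpy as np

        def backtracking_omp(A, y, n_iter=20, prune_frac=0.2):
            """Backtracking OMP core loop: A is the (m, n) sensing matrix,
            y the (m,) measurement vector. Returns a sparse estimate x."""
            support = []
            residual = y.copy()
            x_s = np.zeros(0)
            for _ in range(n_iter):
                k = int(np.argmax(np.abs(A.T @ residual)))  # most correlated atom
                if k not in support:
                    support.append(k)
                # Fit coefficients on the current support by least squares.
                x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
                # Backtracking: drop atoms whose coefficients look unreliable.
                keep = np.abs(x_s) > prune_frac * np.abs(x_s).max()
                support = [s for s, ok in zip(support, keep) if ok]
                x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
                residual = y - A[:, support] @ x_s
                if np.linalg.norm(residual) < 1e-8:
                    break
            x = np.zeros(A.shape[1])
            x[support] = x_s
            return x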

    Contrastive Bayesian Analysis for Deep Metric Learning

    Recent methods for deep metric learning have focused on designing different contrastive loss functions between positive and negative pairs of samples, so that the learned feature embedding pulls positive samples of the same class closer and pushes negative samples from different classes away from each other. In this work, we recognize that there is a significant semantic gap between features at the intermediate feature layer and class labels at the final output layer. To bridge this gap, we develop a contrastive Bayesian analysis to characterize and model the posterior probabilities of image labels conditioned on their feature similarity in a contrastive learning setting. This contrastive Bayesian analysis leads to a new loss function for deep metric learning. To improve the generalization capability of the proposed method to new classes, we further extend the contrastive Bayesian loss with a metric variance constraint. Our experimental results and ablation studies demonstrate that the proposed contrastive Bayesian metric learning method significantly improves the performance of deep metric learning in both supervised and pseudo-supervised scenarios, outperforming existing methods by a large margin. Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence
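
    The paper's Bayesian derivation is not reproduced here; as a generic stand-in for the idea of modeling label posteriors conditioned on feature similarity, one can model P(same class | similarity) with a calibrated sigmoid and fit it with cross-entropy over all pairs. The sigmoid form and the alpha/beta scalars below are assumptions for illustration, not the paper's loss.

        import torch
        import torch.nn.functional as F

        def pairwise_posterior_loss(emb, labels, alpha=10.0, beta=0.5):
            """Illustrative stand-in: treat sigmoid(alpha * (sim - beta))
            as P(same class | feature similarity) and train it with
            cross-entropy over all off-diagonal pairs."""
            emb = F.normalize(emb, dim=1)
            sim = emb @ emb.T                                 # cosine similarities
            same = (labels[:, None] == labels[None, :]).float()
            logits = alpha * (sim - beta)
            mask = ~torch.eye(len(labels), dtype=torch.bool)  # drop self-pairs
            return F.binary_cross_entropy_with_logits(logits[mask], same[mask])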

    Robustness meets low-rankness: unified entropy and tensor learning for multi-view subspace clustering

    In this paper, we develop the weighted error entropy-regularized tensor learning method for multi-view subspace clustering (WETMSC), which integrates noise disturbance removal and subspace structure discovery into one unified framework. Unlike most existing methods, which focus only on affinity matrix learning for subspace discovery through different optimization models and simply assume that the noise is independent and identically distributed (i.i.d.), our WETMSC method adopts the weighted error entropy to characterize the underlying noise by assuming that the noise is independent and piecewise identically distributed (i.p.i.d.). Meanwhile, WETMSC constructs the self-representation tensor by stacking all self-representation matrices along the view dimension, preserving the high-order correlation of views based on the tensor nuclear norm. To solve the proposed nonconvex optimization problem, we design a half-quadratic (HQ) additive optimization technique and iteratively solve all subproblems under the alternating direction method of multipliers framework. Extensive comparison studies with state-of-the-art clustering methods on real-world and synthetic noisy datasets demonstrate the superiority of the proposed WETMSC method.
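
    For reference, the tensor nuclear norm used here to preserve high-order correlation across views can be computed via the t-SVD definition: FFT along the view dimension, then the sum of singular values of each frontal slice in the frequency domain. A short sketch follows; scaling conventions vary across papers.

        import numpy as np

        def tensor_nuclear_norm(T):
            """t-SVD tensor nuclear norm of T (n1, n2, n_views): FFT along
            the view dimension, then average the nuclear norms of the
            frequency-domain frontal slices (one common convention)."""
            Tf = np.fft.fft(T, axis=2)
            return sum(np.linalg.svd(Tf[:, :, k], compute_uv=False).sum()
                       for k in range(T.shape[2])) / T.shape[2]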

    Bi-nuclear tensor Schatten-p norm minimization for multi-view subspace clustering

    Multi-view subspace clustering aims to integrate the complementary information contained in different views to facilitate data representation. Currently, low-rank representation (LRR) serves as a benchmark method. However, we observe that LRR-based methods suffer from two issues, limited clustering performance and high computational cost, since (1) they usually adopt the nuclear norm, a biased estimator, to explore low-rank structures, and (2) the singular value decomposition of large-scale matrices is inevitably involved. Moreover, LRR may not achieve low-rank properties both intra-view and inter-view simultaneously. To address these issues, this paper proposes bi-nuclear tensor Schatten-p norm minimization for multi-view subspace clustering (BTMSC). Specifically, BTMSC constructs a third-order tensor from the view dimension to explore the high-order correlation and the subspace structures of multi-view features. The bi-nuclear quasi-norm (BiN) factorization form of the Schatten-p norm is used to factorize the third-order tensor as the product of two small-scale third-order tensors, which not only captures the low-rank property of the third-order tensor but also improves computational efficiency. Finally, an efficient alternating optimization algorithm is designed to solve the BTMSC model. Extensive experiments on ten text and image datasets illustrate the performance superiority of the proposed BTMSC method over state-of-the-art methods.
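
    The factorization trick behind the bi-nuclear quasi-norm can be checked numerically in the matrix case: for X = U V^T with balanced factors built from the SVD, the average of the two factor nuclear norms equals the sum of the square roots of the singular values, i.e. the Schatten-1/2 penalty. The paper applies this slice-wise to third-order tensors, which this sketch does not reproduce.

        import numpy as np

        X = np.random.randn(6, 5)
        A, s, Bt = np.linalg.svd(X, full_matrices=False)
        U = A * np.sqrt(s)                    # balanced factors: X = U @ V.T
        V = Bt.T * np.sqrt(s)
        bin_value = 0.5 * (np.linalg.norm(U, 'nuc') + np.linalg.norm(V, 'nuc'))
        schatten_half = (s ** 0.5).sum()      # sum_i sigma_i^(1/2)
        print(np.isclose(bin_value, schatten_half))   # True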

    Interactive Virtual Reality Game for Online Learning of Science Subject in Primary Schools

    Education plays an important role in nurturing children. The COVID-19 pandemic brought challenges and disruptions to school education due to school closures in some countries. The science subject in primary schools is unique in that hands-on experiments are important learning components, so its learning process may be affected under the new norm of online or home-based learning. This research project creates a serious game on the science subject for primary school students aged 10 to 11 using virtual reality (VR) technology. It consists of three virtual learning phases. Phase 1 explains theories of science topics on electricity and electric circuits. Phase 2 provides interactive hands-on experiment exercises where students can practice the theoretical knowledge learned in the previous phase. An interactive quiz session is offered to reinforce the learning in Phase 3. Interactive VR features enable primary school students to learn abstract science concepts in a more engaging way than in conventional classroom settings. Meticulous design attention has been paid to details such as visual instructions, voice instructions, speech tempo, animations, and colorful graphics to create a sense of realism and keep students actively engaged. A preliminary case study has been conducted with 10 students at primary schools in Singapore to evaluate the learning effectiveness of this research.

    Energy-Efficient Nonuniform Content Edge Pre-Caching to Improve Quality of Service in Fog Radio Access Networks

    The fog radio access network (F-RAN) is equipped with enhanced remote radio heads (eRRHs), which can pre-store some requested files in the edge cache and support mobile edge computing (MEC). To guarantee the quality of service (QoS) and energy efficiency of an F-RAN, a proper content caching strategy is necessary to avoid coarse local content storage in the cache or frequent fetching from a centralized baseband signal processing unit (BBU) pool via backhauls. In this paper, we investigate the relationships among eRRH/terminal activities and content requests in F-RANs, and propose an edge content caching strategy for eRRHs based on mining mobile network behavior information. In particular, to support inference for appropriate content caching, we establish a pre-mapping containing content preference information and geographical influence using an efficient non-uniform accelerated matrix completion algorithm. An energy consumption analysis is given to discuss the energy-saving properties of the proposed edge content caching strategy. Simulation results validate our theoretical analysis of the inference validity of the pre-mapping construction method in static and dynamic cases, and show the energy efficiency achieved by the proposed edge content pre-caching strategy.
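
    The paper's accelerated non-uniform matrix completion algorithm is not reproduced here; as a toy illustration of building a preference pre-mapping by completing a partially observed eRRH-by-content request matrix, a plain rank-constrained gradient descent looks as follows. The rank, step size, and regularization weight are assumed illustrative values.

        import numpy as np

        def complete_preference(M, mask, rank=8, lr=0.01, iters=2000, lam=0.1):
            """Toy completion of a preference matrix M observed where
            mask == 1, via gradient descent on a rank-r factorization."""
            rng = np.random.default_rng(0)
            U = rng.normal(scale=0.1, size=(M.shape[0], rank))
            V = rng.normal(scale=0.1, size=(M.shape[1], rank))
            for _ in range(iters):
                E = mask * (U @ V.T - M)        # error on observed entries only
                U, V = (U - lr * (E @ V + lam * U),
                        V - lr * (E.T @ U + lam * V))
            return U @ V.T                      # inferred full preference map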

    Solution of the Problem of Smoothing of the Signals at the Preprocessing of Thermal Images

    Smoothing two-dimensional digital signals is important for a number of applications. The paper presents a mathematical method and an algorithm for smoothing two-dimensional digital signals. The method is based on minimizing an objective function that uses first-order finite differences between the rows and columns of the image as a measure of distance. A non-iterative algorithm is used to estimate the parameters of the developed method. The present study shows how the smoothing filter kernel changes depending on variations in the method parameters.
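
    To make the non-iterative character concrete: with first-order finite-difference penalties along rows and columns, setting the gradient of the quadratic objective to zero yields a Sylvester equation that can be solved directly. The objective form and the parameter names below are one reading of the abstract, not the paper's exact formulation.

        import numpy as np
        from scipy.linalg import solve_sylvester

        def smooth2d(Y, lam_r=5.0, lam_c=5.0):
            """Minimize ||S - Y||^2 + lam_r ||D_r S||^2 + lam_c ||S D_c^T||^2
            with first-order finite differences D_r, D_c along rows and
            columns. The zero-gradient condition is the Sylvester equation
            (I + lam_r D_r^T D_r) S + S (lam_c D_c^T D_c) = Y."""
            m, n = Y.shape
            Dr = np.diff(np.eye(m), axis=0)     # (m-1, m) row differences
            Dc = np.diff(np.eye(n), axis=0)     # (n-1, n) column differences
            return solve_sylvester(np.eye(m) + lam_r * Dr.T @ Dr,
                                   lam_c * Dc.T @ Dc, Y)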

    A GAN-Based Input-Size Flexibility Model for Single Image Dehazing

    Image-to-image translation based on generative adversarial networks (GANs) has achieved state-of-the-art performance in various image restoration applications. Single image dehazing is a typical example, which aims to obtain the haze-free version of a hazy image. This paper concentrates on the challenging task of single image dehazing. Based on the atmospheric scattering model, we design a novel model to directly generate the haze-free image. The main challenge of image dehazing is that the atmospheric scattering model has two parameters, i.e., the transmission map and the atmospheric light. When they are estimated separately, the errors accumulate and compromise dehazing quality. Considering this, along with the variety of image sizes, we propose a novel input-size-flexible conditional generative adversarial network (cGAN) for single image dehazing, which is input-size flexible at both the training and test stages of image-to-image translation within the cGAN framework. We propose a simple and effective U-type residual network (UR-Net) to build the generator and adopt spatial pyramid pooling (SPP) to design the discriminator. Moreover, the model is trained with a multi-loss function, in which the consistency loss is a newly designed loss in this paper. We finally build a multi-scale cGAN fusion model to realize state-of-the-art single image dehazing performance. The proposed models receive a hazy image as input and directly output a haze-free one. Experimental results demonstrate the effectiveness and efficiency of the proposed models. Comment: Computer Vision
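
    The error-accumulation argument refers to the atmospheric scattering model I = J*t + A*(1 - t), where t is the transmission map and A the atmospheric light. Inverting it, as sketched below, shows how errors in separately estimated t and A propagate directly into the recovered haze-free image J, which is what motivates generating J directly with the cGAN; the clipping bounds are assumed safeguards, not the paper's.

        import numpy as np

        def invert_scattering(I, t, A, t_min=0.1):
            """Atmospheric scattering model: I = J * t + A * (1 - t).
            Inverting gives J = (I - A * (1 - t)) / t; any error in the
            separately estimated t and A propagates directly into J."""
            t = np.clip(t, t_min, 1.0)          # guard against division blow-up
            return np.clip((I - A * (1.0 - t)) / t, 0.0, 1.0)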