
    Coded Residual Transform for Generalizable Deep Metric Learning

    A fundamental challenge in deep metric learning is the generalization capability of the feature embedding network, since an embedding network learned on training classes needs to be evaluated on new test classes. To address this challenge, in this paper we introduce a new method called coded residual transform (CRT) for deep metric learning to significantly improve its generalization capability. Specifically, we learn a set of diversified prototype features, project the feature map onto each prototype, and then encode its features using their projection residuals weighted by their correlation coefficients with each prototype. The proposed CRT method has two unique characteristics. First, it represents and encodes the feature map from a set of complementary perspectives based on projections onto diversified prototypes. Second, unlike existing transformer-based feature representation approaches, which encode the original feature values based on global correlation analysis, the proposed coded residual transform encodes the relative differences between the original features and their projected prototypes. Embedding space density and spectral decay analysis show that this multi-perspective projection onto diversified prototypes and the coded residual representation achieve significantly improved generalization capability in metric learning. Finally, to further enhance generalization performance, we propose to enforce consistency between the feature similarity matrices of coded residual transforms with different numbers of projection prototypes and embedding dimensions. Our extensive experimental results and ablation studies demonstrate that the proposed CRT method outperforms state-of-the-art deep metric learning methods by large margins, improving upon the current best method by up to 4.28% on the CUB dataset.
    Comment: Accepted by NeurIPS 2022
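
    As a rough illustration of the encoding step described above, the following minimal PyTorch sketch projects local features onto learned prototypes and aggregates correlation-weighted residuals. The layer sizes, the softmax weighting, and the final linear projection are assumptions for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodedResidualTransform(nn.Module):
    """Sketch of a CRT-style encoder: residuals against learned prototypes,
    weighted by feature-prototype correlation (illustrative, not the paper's
    exact layer; prototype diversity would need an extra regularization loss)."""

    def __init__(self, feat_dim=512, num_prototypes=16, out_dim=512):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, feat_dim))
        self.proj = nn.Linear(feat_dim, out_dim)

    def forward(self, feats):
        # feats: (B, N, D) local features from a backbone feature map.
        f = F.normalize(feats, dim=-1)
        p = F.normalize(self.prototypes, dim=-1)             # (K, D)
        corr = torch.einsum('bnd,kd->bnk', f, p)             # correlation coefficients
        w = corr.softmax(dim=-1)                             # assumed weighting scheme
        residuals = f.unsqueeze(2) - p.view(1, 1, *p.shape)  # (B, N, K, D)
        enc = (w.unsqueeze(-1) * residuals).sum(dim=(1, 2))  # aggregate to (B, D)
        return F.normalize(self.proj(enc), dim=-1)
```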

    Tree-Based Backtracking Orthogonal Matching Pursuit for Sparse Signal Reconstruction

    Compressed sensing (CS) is a theory that exploits the sparsity of the original signal during signal sampling and coding. By solving an optimization problem, the original sparse signal can be reconstructed accurately. In this paper, a new tree-based backtracking orthogonal matching pursuit (TBOMP) algorithm is presented, built on the tree model in the wavelet domain. The algorithm converts the wavelet tree structure into correspondence relations among candidate atoms without any prior information about the signal sparsity. Thus, the atom selection process becomes more structured and the search space can be narrowed. Moreover, through the backtracking process, the reliability of previously chosen atoms is checked at each iteration and unreliable atoms are deleted, which ultimately leads to an accurate reconstruction of the signal. Simulation results show that the proposed algorithm outperforms several other OMP-type algorithms.
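
    The wavelet-tree candidate expansion is the paper's contribution and is not reproduced here; the NumPy sketch below shows only the generic select/fit/prune loop that backtracking OMP variants share. The pruning threshold and stopping tolerances are illustrative assumptions.

```python
import numpy as np

def backtracking_omp(A, y, sparsity, n_iter=50):
    """Generic OMP with a backtracking (pruning) step, as a sketch of the loop
    TBOMP builds on; TBOMP additionally restricts candidate atoms via the
    wavelet-tree structure, which is omitted here."""
    m, n = A.shape
    support, residual = [], y.copy()
    for _ in range(n_iter):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(A.T @ residual)))
        if k not in support:
            support.append(k)
        x_s = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
        # Backtracking: drop previously chosen atoms that look unreliable.
        keep = np.abs(x_s) > 1e-6 * np.abs(x_s).max()
        support = [s for s, ok in zip(support, keep) if ok][:sparsity]
        x_s = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
        residual = y - A[:, support] @ x_s
        if np.linalg.norm(residual) < 1e-9:
            break
    x = np.zeros(n)
    x[support] = x_s
    return x
```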

    Contrastive Bayesian Analysis for Deep Metric Learning

    Recent methods for deep metric learning have focused on designing different contrastive loss functions between positive and negative pairs of samples so that the learned feature embedding is able to pull positive samples of the same class closer and push negative samples from different classes away from each other. In this work, we recognize that there is a significant semantic gap between features at the intermediate feature layer and class labels at the final output layer. To bridge this gap, we develop a contrastive Bayesian analysis to characterize and model the posterior probabilities of image labels conditioned on their feature similarity in a contrastive learning setting. This contrastive Bayesian analysis leads to a new loss function for deep metric learning. To improve the generalization capability of the proposed method to new classes, we further extend the contrastive Bayesian loss with a metric variance constraint. Our experimental results and ablation studies demonstrate that the proposed contrastive Bayesian metric learning method significantly improves the performance of deep metric learning in both supervised and pseudo-supervised scenarios, outperforming existing methods by a large margin.
    Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence
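
    The abstract does not give the loss in closed form; as a heavily simplified sketch of the general idea, one can model the posterior probability that two samples share a label as a logistic function of their feature similarity and train on its negative log-likelihood. The logistic form and the scale/bias parameters below are assumptions, not the paper's derivation.

```python
import torch
import torch.nn.functional as F

def posterior_contrastive_loss(emb, labels, scale=10.0, bias=0.5):
    """Illustrative posterior-style pairwise loss: p(same class | similarity)
    is assumed logistic in cosine similarity; the actual contrastive Bayesian
    loss (and its metric variance constraint) is derived differently."""
    emb = F.normalize(emb, dim=-1)
    sim = emb @ emb.t()                                   # pairwise cosine similarity
    same = (labels[:, None] == labels[None, :]).float()   # label-agreement targets
    p_same = torch.sigmoid(scale * (sim - bias))          # assumed posterior model
    off_diag = ~torch.eye(len(labels), dtype=torch.bool, device=emb.device)
    return F.binary_cross_entropy(p_same[off_diag], same[off_diag])
```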

    DCPoint: global-local dual contrast for self-supervised representation learning of 3D point clouds

    In recent years, 3D vision has gained increasing prominence in practical applications such as autonomous driving and robotics. However, the scarcity of large labeled point cloud datasets continues to be a bottleneck for deep networks. Self-supervised representation learning (SRL) has emerged as an effective approach to alleviate this issue by pre-training general feature encoders without requiring human annotations. Existing contrastive SRL methods for 3D point clouds have predominantly concentrated on object representations from a global or point perspective. They overlook essential local geometry information, thereby constraining the generalizability of pre-trained models. To address these challenges, we propose a local contrast module as an intermediate level between the scene and point levels. It is then integrated with a global contrast module to form a dual contrast method called DCPoint. The local contrast module operates on point-wise representations of objects and constructs contrastive pairs based on the spatial information of point clouds. It effectively addresses the challenges posed by the sparsity and irregularity of point clouds as well as imperfect partitioning. The point-wise local contrast module aims to enhance the internal connections between the components within the point cloud, while the global contrast module introduces semantic information about individual instances. Experimental results demonstrate the effectiveness of DCPoint across various downstream tasks on synthetic and real-world datasets. It consistently outperforms previously reported SRL methods and randomly initialized counterparts. Additionally, the proposed local contrast module can enhance the performance of other SRL methods.
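
    DCPoint's spatial pairing of local regions is specific to the paper; the standard building block that global/local contrast modules of this kind rely on is an InfoNCE-style loss over matched feature pairs, sketched below with an assumed temperature value.

```python
import torch
import torch.nn.functional as F

def info_nce(q, k, temperature=0.07):
    """Generic InfoNCE loss over matched pairs from two augmented views
    (the usual building block of global/local contrast; DCPoint's region
    pairing strategy is not shown). q, k: (N, D) with q[i] matching k[i]."""
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    logits = q @ k.t() / temperature                  # (N, N) similarities
    targets = torch.arange(q.shape[0], device=q.device)
    return F.cross_entropy(logits, targets)           # positives on the diagonal
```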

    Robustness meets low-rankness: unified entropy and tensor learning for multi-view subspace clustering

    In this paper, we develop the weighted error entropy-regularized tensor learning method for multi-view subspace clustering (WETMSC), which integrates noise disturbance removal and subspace structure discovery into one unified framework. Unlike most existing methods, which focus only on affinity matrix learning for subspace discovery through different optimization models and simply assume that the noise is independent and identically distributed (i.i.d.), our WETMSC method adopts the weighted error entropy to characterize the underlying noise by assuming that the noise is independent and piecewise identically distributed (i.p.i.d.). Meanwhile, WETMSC constructs the self-representation tensor by stacking all self-representation matrices along the view dimension, preserving the high-order correlation of views based on the tensor nuclear norm. To solve the proposed nonconvex optimization problem, we design a half-quadratic (HQ) additive optimization technique and iteratively solve all subproblems under the alternating direction method of multipliers framework. Extensive comparison studies with state-of-the-art clustering methods on real-world datasets and synthetic noisy datasets demonstrate the superiority of the proposed WETMSC method.
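
    The weighted error entropy model itself is the paper's contribution; as a loose illustration of how half-quadratic optimization handles robust error weighting in general, the sketch below alternates closed-form weights of a Welsch/correntropy-style loss with a weighted least-squares update. The loss form and sigma are assumptions, not WETMSC's i.p.i.d. noise model.

```python
import numpy as np

def hq_robust_lsq(A, b, sigma=1.0, n_iter=20):
    """Half-quadratic alternation for a Welsch/correntropy-style robust fit
    (illustrative of the HQ strategy only; WETMSC's weighted error entropy
    and ADMM subproblems are not reproduced)."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(n_iter):
        # Fix x: the auxiliary weights have a closed form under HQ.
        w = np.exp(-((b - A @ x) ** 2) / (2 * sigma ** 2))
        # Fix the weights: solve the weighted least-squares normal equations.
        Aw = A * w[:, None]
        x = np.linalg.solve(Aw.T @ A, Aw.T @ b)
    return x
```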

    Bi-nuclear tensor Schatten-p norm minimization for multi-view subspace clustering

    Multi-view subspace clustering aims to integrate the complementary information contained in different views to facilitate data representation. Currently, low-rank representation (LRR) serves as a benchmark method. However, we observe that LRR-based methods suffer from two issues: limited clustering performance and high computational cost, since (1) they usually adopt the nuclear norm, a biased estimator, to explore the low-rank structures; and (2) the singular value decomposition of large-scale matrices is inevitably involved. Moreover, LRR may not achieve low-rank properties both intra-view and inter-view simultaneously. To address these issues, this paper proposes bi-nuclear tensor Schatten-p norm minimization for multi-view subspace clustering (BTMSC). Specifically, BTMSC constructs a third-order tensor from the view dimension to explore the high-order correlation and the subspace structures of multi-view features. The bi-nuclear quasi-norm (BiN) factorization form of the Schatten-p norm is utilized to factorize the third-order tensor as the product of two small-scale third-order tensors, which not only captures the low-rank property of the third-order tensor but also improves the computational efficiency. Finally, an efficient alternating optimization algorithm is designed to solve the BTMSC model. Extensive experiments with ten text and image datasets illustrate the performance superiority of the proposed BTMSC method over state-of-the-art methods.
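
    For the matrix case, the bi-nuclear quasi-norm replaces a Schatten-p penalty on X with nuclear norms on small factors U and V with X = UV, so only small-scale SVDs are needed. A minimal proximal-alternating sketch of this factored formulation is given below; BTMSC applies the idea to third-order tensors inside an alternating optimization loop, which is not reproduced, and mu, the rank, and the iteration count are illustrative.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def binuclear_matrix_sketch(X, rank, mu=1.0, n_iter=200, seed=0):
    """PALM-style sketch of min (||U||_* + ||V||_*)/2 + (mu/2)||X - UV||_F^2,
    the matrix analogue of the bi-nuclear factorization used in BTMSC
    (illustrative; the tensor version and the clustering model are omitted)."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((X.shape[0], rank)) / np.sqrt(rank)
    V = rng.standard_normal((rank, X.shape[1])) / np.sqrt(rank)
    for _ in range(n_iter):
        Lu = mu * max(np.linalg.norm(V @ V.T, 2), 1e-8)   # Lipschitz bound in U
        U = svt(U - mu * ((U @ V - X) @ V.T) / Lu, 0.5 / Lu)
        Lv = mu * max(np.linalg.norm(U.T @ U, 2), 1e-8)   # Lipschitz bound in V
        V = svt(V - mu * (U.T @ (U @ V - X)) / Lv, 0.5 / Lv)
    return U, V
```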

    Interactive Virtual Reality Game for Online Learning of Science Subject in Primary Schools

    Education plays an important role in nurturing children. The COVID-19 pandemic has brought challenges and disruptions to school education due to school closures in some countries. The science subject in primary schools is unique in that hands-on experiments are important learning components, so its learning process may be affected under the new norm of online or home-based learning. This research project creates a serious game on science for primary school students aged 10 to 11 using virtual reality (VR) technology. It consists of three virtual learning phases. Phase 1 explains the theory of science topics on electricity and electric circuits. Phase 2 provides interactive hands-on experiment exercises where students can practice the theoretical knowledge learned in the previous phase. An interactive quiz session is offered in Phase 3 to reinforce the learning. Interactive VR features enable primary school students to learn abstract science concepts in a more engaging way than conventional classroom settings. Meticulous design attention has been paid to details such as visual instructions, voice instructions, speech tempo, animations, and colorful graphics to create a sense of realism and keep students actively engaged. A preliminary case study has been conducted with 10 students at primary schools in Singapore to evaluate the learning effectiveness of this research.

    Energy-Efficient Nonuniform Content Edge Pre-Caching to Improve Quality of Service in Fog Radio Access Networks

    The fog radio access network (F-RAN) is equipped with enhanced remote radio heads (eRRHs), which can pre-store some requested files in the edge cache and support mobile edge computing (MEC). To guarantee the quality of service (QoS) and energy efficiency of an F-RAN, a proper content caching strategy is necessary to avoid coarsely storing content in the local cache or frequently fetching content from a centralized baseband signal processing unit (BBU) pool via backhauls. In this paper, we investigate the relationships among eRRH/terminal activities and content requests in F-RANs, and propose an edge content caching strategy for eRRHs that mines mobile network behavior information. In particular, to support inference for appropriate content caching, we establish a pre-mapping that captures content preference information and geographical influence using an efficient non-uniform accelerated matrix completion algorithm. An energy consumption analysis is given to discuss the energy-saving properties of the proposed edge content caching strategy. Simulation results validate our theoretical analysis of the inference validity of the pre-mapping construction method in static and dynamic cases, and show the energy efficiency achieved by the proposed edge content pre-caching strategy.
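
    The paper's non-uniform accelerated matrix completion algorithm is its own contribution; as a baseline sketch of the pre-mapping construction step, standard singular value thresholding (SVT) can complete a partially observed user-content preference matrix. The threshold and step size below are illustrative.

```python
import numpy as np

def svt_complete(M_obs, mask, tau=5.0, step=1.2, n_iter=300):
    """Baseline SVT matrix completion as a stand-in for the paper's
    accelerated non-uniform algorithm. M_obs: partially observed
    request/preference matrix; mask: 1 where an entry is observed."""
    Y = np.zeros_like(M_obs)
    X = Y
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt   # shrink singular values
        Y = Y + step * mask * (M_obs - X)                # correct observed entries
        if np.linalg.norm(mask * (M_obs - X)) <= 1e-6 * np.linalg.norm(mask * M_obs):
            break
    return X
```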

    Solution of the Problem of Smoothing of the Signals at the Preprocessing of Thermal Images

    Smoothing two-dimensional digital signals is important for a number of applications. The paper presents a mathematical method and an algorithm for smoothing two-dimensional digital signals. The method is based on minimizing an objective function that uses first-order finite differences between the rows and columns of the image as a distance measure. A non-iterative algorithm is used to estimate the parameters of the developed method. The study also shows how the smoothing filter kernel changes as the method parameters are varied.
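
    A minimal sketch of this kind of non-iterative smoother: with first-order finite-difference penalties on rows and columns, the quadratic objective has a closed-form minimizer obtained from a single sparse linear solve. The regularization weight lam and the exact objective are assumptions; the paper's parameterization may differ.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def smooth2d(Y, lam=10.0):
    """Quadratic 2D smoothing sketch: minimize
    ||X - Y||_F^2 + lam * (||row diffs||^2 + ||column diffs||^2)
    via one sparse solve (non-iterative, illustrative of the approach)."""
    m, n = Y.shape
    Dm = sp.diags([-1, 1], [0, 1], shape=(m - 1, m))  # differences down columns
    Dn = sp.diags([-1, 1], [0, 1], shape=(n - 1, n))  # differences along rows
    Im, In = sp.identity(m), sp.identity(n)
    # Normal equations for vec(X) with column-major (Fortran) stacking.
    A = sp.kron(In, Im) + lam * (sp.kron(In, Dm.T @ Dm) + sp.kron(Dn.T @ Dn, Im))
    x = spsolve(A.tocsc(), Y.flatten(order='F'))
    return x.reshape((m, n), order='F')
```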