
    A Validation Approach to Over-parameterized Matrix and Image Recovery

    In this paper, we study the problem of recovering a low-rank matrix from a number of noisy random linear measurements. We consider the setting where the rank of the ground-truth matrix is unknown a priori, and use an overspecified factored representation of the matrix variable, so that the globally optimal solutions overfit and do not correspond to the underlying ground truth. We then solve the associated nonconvex problem using gradient descent with small random initialization. We show that as long as the measurement operators satisfy the restricted isometry property (RIP) with a rank parameter scaling with the rank of the ground-truth matrix, rather than with that of the overspecified matrix variable, the gradient descent iterations follow a particular trajectory towards the ground-truth matrix and achieve nearly information-theoretically optimal recovery when stopped appropriately. We then propose an efficient early stopping strategy based on the common hold-out method and show that it provably detects a nearly optimal estimator. Moreover, experiments show that the proposed validation approach can also be used efficiently for image restoration with the deep image prior, which over-parameterizes an image with a deep network. Comment: 29 pages and 9 figures
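    As a rough illustration of the approach described above (not the paper's exact algorithm or constants), the sketch below runs gradient descent on an overspecified factorization X = U Uᵀ from a small random initialization and uses a hold-out split of the measurements for early stopping, returning the iterate with the best validation loss rather than the final, overfitted one. All dimensions, step sizes, and noise levels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r_true, r_over, m = 30, 2, 10, 2400   # size, true rank, overspecified rank, measurements

# Ground truth X* = U* U*^T and noisy Gaussian measurements y_k = <A_k, X*> + noise.
U_star = rng.normal(size=(n, r_true)) / np.sqrt(n)
X_star = U_star @ U_star.T
A = rng.normal(size=(m, n, n))
y = np.einsum('kij,ij->k', A, X_star) + 0.005 * rng.normal(size=m)

# Hold out part of the measurements for validation-based early stopping.
m_tr = int(0.8 * m)
A_tr, y_tr = A[:m_tr], y[:m_tr]
A_val, y_val = A[m_tr:], y[m_tr:]

# Gradient descent on the overspecified factorization from small random init.
U = 1e-3 * rng.normal(size=(n, r_over))
lr, best_val, best_U = 0.05, np.inf, U.copy()
for t in range(3000):
    resid = np.einsum('kij,ij->k', A_tr, U @ U.T) - y_tr
    G = np.einsum('k,kij->ij', resid, A_tr) / m_tr   # gradient w.r.t. X
    U = U - lr * (G + G.T) @ U                       # chain rule through X = U U^T
    # Track the hold-out loss and keep the best iterate seen so far.
    val = np.mean((np.einsum('kij,ij->k', A_val, U @ U.T) - y_val) ** 2)
    if val < best_val:
        best_val, best_U = val, U.copy()

X_hat = best_U @ best_U.T
print('relative error:', np.linalg.norm(X_hat - X_star) / np.linalg.norm(X_star))
```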

    Nanoporous Structure of Sintered Metal Powder Heat Exchanger in Dilution Refrigeration: A Numerical Study

    We use LAMMPS to randomly pack hard spheres as a simulation of the heat exchanger, with the hard spheres representing the sintered metal particles. We simulate the heat exchanger for different sphere radii and different packing fractions of the metal particles, and study the resulting pore space. To improve the performance of the heat exchanger, we use this simulation method to show that, at a packing fraction of 65%, the optimal sintered particle radius in the heat exchanger is 30~35 nm. Comment: 5 pages, 3 figures, one table
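    The authors' LAMMPS setup is not reproduced here; as a hedged stand-in, the sketch below packs equal hard spheres by random sequential addition in a periodic box and estimates the remaining pore fraction by Monte Carlo sampling. Note that plain random sequential addition saturates well below the 65% packing fraction studied in the paper, so this illustrates only the pore-space geometry, not the sintered structure itself.

```python
import numpy as np

rng = np.random.default_rng(1)
L, radius = 1.0, 0.08        # periodic box edge and sphere radius (arbitrary units)
centers = []

# Random sequential addition: propose a center, reject it if it overlaps any sphere.
for _ in range(20000):
    c = rng.uniform(0.0, L, size=3)
    if centers:
        d = np.asarray(centers) - c
        d -= L * np.round(d / L)                      # minimum-image convention
        if (d * d).sum(axis=1).min() < (2.0 * radius) ** 2:
            continue
    centers.append(c)
centers = np.asarray(centers)

packing = len(centers) * (4.0 / 3.0) * np.pi * radius**3 / L**3
print(f'{len(centers)} spheres, packing fraction ~ {packing:.2f}')

# Monte Carlo estimate of the pore (void) fraction of the packed structure.
pts = rng.uniform(0.0, L, size=(20000, 3))
inside = np.zeros(len(pts), dtype=bool)
for c in centers:
    d = pts - c
    d -= L * np.round(d / L)
    inside |= (d * d).sum(axis=1) < radius**2
print(f'pore fraction ~ {1.0 - inside.mean():.2f}')
```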

    Are All Losses Created Equal: A Neural Collapse Perspective

    While cross entropy (CE) is the most commonly used loss for training deep neural networks on classification tasks, many alternative losses have been developed to obtain better empirical performance. Which one is the best to use remains a mystery, because multiple factors appear to affect the answer, such as the properties of the dataset, the choice of network architecture, and so on. This paper studies the choice of loss function by examining the last-layer features of deep networks, drawing inspiration from a recent line of work showing that the global optimal solutions of the CE and mean-squared-error (MSE) losses exhibit a Neural Collapse phenomenon. That is, for sufficiently large networks trained until convergence, (i) all features of the same class collapse to the corresponding class mean and (ii) the means associated with different classes are in a configuration where their pairwise distances are all equal and maximized. We extend these results and show through global solution and landscape analyses that a broad family of loss functions, including the commonly used label smoothing (LS) and focal loss (FL), exhibits Neural Collapse. Hence, all relevant losses (i.e., CE, LS, FL, MSE) produce equivalent features on training data. Based on the unconstrained feature model assumption, we provide a global landscape analysis for the LS loss and a local landscape analysis for the FL loss, showing that the only global minimizers are Neural Collapse solutions, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions, globally for the LS loss and locally near the optimal solution for the FL loss. Experiments further show that the Neural Collapse features obtained from all relevant losses lead to largely identical performance on test data as well, provided that the network is sufficiently large and trained until convergence. Comment: 32 pages, 10 figures, NeurIPS 2022
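    For concreteness, here is a minimal sketch (under the usual softmax-logit conventions, not code from the paper) of the LS and FL objectives named above, together with a simple check of collapse property (ii): whether the pairwise distances between class means are all equal.

```python
import numpy as np

def _log_softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def label_smoothing_loss(logits, y, eps=0.1):
    """CE against a smoothed target: 1 - eps on the true class, eps spread uniformly."""
    n, K = logits.shape
    logp = _log_softmax(logits)
    target = np.full((n, K), eps / K)
    target[np.arange(n), y] += 1.0 - eps
    return -(target * logp).sum(axis=1).mean()

def focal_loss(logits, y, gamma=2.0):
    """CE down-weighted by (1 - p_y)^gamma so easy examples contribute less."""
    logp = _log_softmax(logits)
    logpy = logp[np.arange(len(y)), y]
    py = np.exp(logpy)
    return np.mean(-((1.0 - py) ** gamma) * logpy)

def class_mean_distances(features, y):
    """Pairwise distances between class means; under Neural Collapse (ii), all equal."""
    means = np.stack([features[y == k].mean(axis=0) for k in np.unique(y)])
    d = np.linalg.norm(means[:, None] - means[None, :], axis=-1)
    return d[np.triu_indices(len(means), k=1)]

# Quick demo on random data.
rng = np.random.default_rng(0)
logits, y = rng.normal(size=(8, 4)), rng.integers(0, 4, size=8)
print(label_smoothing_loss(logits, y), focal_loss(logits, y))
```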

    Generalized Neural Collapse for a Large Number of Classes

    Neural collapse provides an elegant mathematical characterization of the learned last-layer representations (a.k.a. features) and classifier weights in deep classification models. Such results not only provide insight but also motivate new techniques for improving practical deep models. However, most existing empirical and theoretical studies of neural collapse focus on the case where the number of classes is small relative to the dimension of the feature space. This paper extends neural collapse to the case where the number of classes is much larger than the dimension of the feature space, which occurs broadly in language models, retrieval systems, and face recognition applications. We show that the features and classifier exhibit a generalized neural collapse phenomenon in which the minimum one-vs-rest margin is maximized. We provide an empirical study to verify the occurrence of generalized neural collapse in practical deep neural networks. Moreover, we provide a theoretical study showing that generalized neural collapse provably occurs under the unconstrained feature model with a spherical constraint, under certain technical conditions on the feature dimension and the number of classes. Comment: 32 pages, 12 figures
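    As a small illustration, the following sketch computes the minimum one-vs-rest margin for a linear classifier on top of fixed features; the logit form scores = H Wᵀ is an assumption made for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def min_one_vs_rest_margin(H, W, y):
    """H: (n, d) features, W: (K, d) classifier weights, y: (n,) labels.

    Each sample's one-vs-rest margin is its true-class score minus the best
    rival score; generalized neural collapse maximizes the minimum margin.
    """
    scores = H @ W.T                                 # (n, K) logits
    true = scores[np.arange(len(y)), y]
    scores[np.arange(len(y)), y] = -np.inf           # mask out the true class
    rival = scores.max(axis=1)                       # best competing class
    return (true - rival).min()

# Toy demo in the paper's regime of many classes relative to feature dimension.
rng = np.random.default_rng(0)
H = rng.normal(size=(100, 16))                       # n=100 features, d=16
W = rng.normal(size=(1000, 16))                      # K=1000 classes >> d
y = rng.integers(0, 1000, size=100)
print('min one-vs-rest margin:', min_one_vs_rest_margin(H, W, y))
```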

    Multi-scale distribution of coal fractures based on CT digital core deep learning

    In order to achieve high-precision and high-efficiency identification of the multi-scale distribution characteristics of coal fractures, we study identification methods based on CT digital core deep learning. An industrial CT scanning system is used to collect a large number of original coal CT digital core information arrays. Each array is converted into a two-dimensional gray-scale image, which is then divided into square images of different scales whose brightness is enhanced to different levels to serve as training samples. The AlexNet, ResNet-18, GoogLeNet, and Inception-V3 models for identifying fracture-containing CT images are then constructed and their parameters optimized on the Matlab platform. We study the training accuracy and validation accuracy of the different models under different numbers of training samples, as well as the accuracy, computational efficiency, and training time of the different models on images of different scales and brightness levels under the same training samples, and we obtain the optimal model for computing the fractal dimension of two-dimensional CT images containing fractures. The fractal distribution characteristics of each fracture image are then calculated using the box-counting dimension statistical method and compared against the traditional binarization method and human-eye recognition, verifying the applicability of the proposed multi-scale identification method for coal fractures based on CT digital core deep learning. The results show: ① the ResNet-18 model is the optimal model for computing the fractal dimension of two-dimensional CT images containing fractures when the image samples have brightness level 4 and scales of 3.5 mm to 21 mm, achieving high accuracy and short training time on this task; ② compared with the traditional binarization method, the multi-scale identification method of coal fractures based on CT digital core deep learning is fast, accurate, and not easily affected by impurities in the coal
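    As a hedged sketch of the box-counting dimension statistic mentioned above (illustrative only, not the paper's Matlab implementation), the following computes N(s), the number of s×s boxes containing at least one fracture pixel, across dyadic box sizes and estimates the fractal dimension as the slope of log N(s) versus log(1/s).

```python
import numpy as np

def box_counting_dimension(img):
    """img: 2-D boolean fracture mask (fracture = True) with power-of-two side,
    containing at least one fracture pixel."""
    sizes, counts = [], []
    s = img.shape[0]
    while s >= 2:
        # Tile the image into s x s boxes; count boxes holding any fracture pixel.
        n = img.shape[0] // s
        tiles = img[:n * s, :n * s].reshape(n, s, n, s)
        counts.append(tiles.any(axis=(1, 3)).sum())
        sizes.append(s)
        s //= 2
    # Slope of log N(s) against log(1/s) estimates the fractal dimension.
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

# Demo on a random mask; a space-filling random mask gives a dimension near 2.
rng = np.random.default_rng(0)
demo = rng.random((256, 256)) < 0.1      # placeholder "fracture" mask
print('estimated dimension:', box_counting_dimension(demo))
```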