
    How deep is deep enough? -- Quantifying class separability in the hidden layers of deep neural networks

    Deep neural networks typically outperform more traditional machine learning models in their ability to classify complex data, and yet it is not clear how the individual hidden layers of a deep network contribute to the overall classification performance. We thus introduce a Generalized Discrimination Value (GDV) that measures, in a non-invasive manner, how well different data classes separate in each given network layer. The GDV can be used for the automatic tuning of hyper-parameters, such as the width profile and the total depth of a network. Moreover, the layer-dependent GDV(L) provides new insights into the data transformations that self-organize during training: in the case of multi-layer perceptrons trained with error backpropagation, we find that classification of highly complex data sets requires a temporary reduction of class separability, marked by a characteristic 'energy barrier' in the initial part of the GDV(L) curve. Even more surprisingly, for a given data set, the GDV(L) follows a fixed 'master curve', independently of the total number of network layers. Furthermore, applying the GDV to Deep Belief Networks reveals that unsupervised training with the Contrastive Divergence method can also systematically increase class separability over tens of layers, even though the system does not 'know' the desired class labels. These results indicate that the GDV may become a useful tool to open the black box of deep learning.

    Dispelling Classes Gradually to Improve Quality of Feature Reduction Approaches

    Feature reduction is an important concept used to reduce the number of dimensions and thereby decrease the computational complexity and time of classification. Many approaches have been proposed for this problem so far, but almost all of them produce a single fixed output for each input dataset, and that output is not always satisfactory for classification. In this paper we propose an approach that preprocesses the input dataset to increase the accuracy of feature extraction methods. First, a new concept called dispelling classes gradually (DCG) is proposed to increase the separability of classes based on their labels. Next, this method is used to preprocess the input dataset of the feature reduction approaches, so that the misclassification error rate of their outputs is lower than when the output is obtained without any preprocessing. In addition, our method copes well with noise by adapting the dataset to the feature reduction approach. In the results section, the two conditions (with and without preprocessing) are compared on several UCI datasets to support our idea. Comment: 11 Pages, 5 Figure, 7 Tables; Advanced Computing: An International Journal (ACIJ), Vol.3, No.3, May 201

    Provable Benefit of Mixup for Finding Optimal Decision Boundaries

    We investigate how pair-wise data augmentation techniques like Mixup affect the sample complexity of finding optimal decision boundaries in a binary linear classification problem. For a family of data distributions with a separability constant κ, we analyze how well the optimal classifier in terms of training loss aligns with the optimal one in test accuracy (i.e., Bayes optimal classifier). For vanilla training without augmentation, we uncover an interesting phenomenon named the curse of separability. As we increase κ to make the data distribution more separable, the sample complexity of vanilla training increases exponentially in κ; perhaps surprisingly, the task of finding optimal decision boundaries becomes harder for more separable distributions. For Mixup training, we show that Mixup mitigates this problem by significantly reducing the sample complexity. To this end, we develop new concentration results applicable to n^2 pair-wise augmented data points constructed from n independent data, by carefully dealing with dependencies between overlapping pairs. Lastly, we study other masking-based Mixup-style techniques and show that they can distort the training loss and make its minimizer converge to a suboptimal classifier in terms of test accuracy. Comment: ICML 2023 camera-ready version; 48 page
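To make the n^2 pair-wise construction mentioned in this abstract concrete, here is a minimal sketch of the commonly used Mixup rule, which forms a convex combination of two samples and of their labels. It is not taken from the paper: the function names `mixup_pair` and `mixup_all_pairs`, the Beta(alpha, alpha) choice for the mixing weight, and the toy data are all assumptions made for illustration.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, lam):
    """Mix two labelled points with weight lam in [0, 1] (hypothetical helper)."""
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = lam * y_i + (1.0 - lam) * y_j
    return x_mix, y_mix


def mixup_all_pairs(X, y, alpha=1.0, seed=0):
    """Build all n^2 mixed points from n samples.

    Assumption: each pair draws its own weight from Beta(alpha, alpha),
    a common convention for Mixup; the abstract does not specify this.
    """
    rng = random.Random(seed)
    mixed = []
    for x_i, y_i in zip(X, y):
        for x_j, y_j in zip(X, y):
            lam = rng.betavariate(alpha, alpha)
            mixed.append(mixup_pair(x_i, y_i, x_j, y_j, lam))
    return mixed


# Toy binary dataset with two 2-D points (illustrative values only).
X = [[0.0, 0.0], [2.0, 4.0]]
y = [0.0, 1.0]
pairs = mixup_all_pairs(X, y)
```

Because the same sample appears in many pairs, the n^2 mixed points are not independent of each other, which is the dependency between overlapping pairs that the abstract says the concentration results have to handle.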

    Problems with Jumping Coefficients

    We study separability properties of solutions of elliptic equations with piecewise constant coefficients in R^d, d ≥ 2. In addition, we develop an efficient tensor-structured preconditioner for the diffusion equation with variable coefficients. It is based only on a rank-structured decomposition of the tensor of reciprocal coefficients and on a decomposition of the inverse of the Laplacian operator. It can be applied to a full vector with linear-logarithmic complexity in the number of unknowns N. It also allows a low-rank tensor representation, which has linear complexity in the dimension d; hence it gets rid of the "curse of dimensionality" and can be used for large values of d. Extensive numerical tests are presented. AMS Subject Classification: 65F30, 65F50, 65N35, 65F10. Key words: structured matrices, elliptic operators, Poisson equation, matrix approximations

    Multi-learner based recursive supervised training

    In this paper, we propose the Multi-Learner Based Recursive Supervised Training (MLRT) algorithm, which uses the existing framework of recursive task decomposition: training on the entire dataset, picking out the best learnt patterns, and then repeating the process with the remaining patterns. Instead of using a single learner to classify all data during each recursion, an appropriate learner is chosen from a set of three learners, based on the subset of data being trained, thereby avoiding the time overhead associated with the genetic algorithm learner used in previous approaches. In this way MLRT seeks to identify the inherent characteristics of the dataset and to use them to train on the data accurately and efficiently. Empirically, we observed that MLRT performs considerably well compared with RPHP and other systems on benchmark data, with an 11% improvement in accuracy on the SPAM dataset and comparable performance on the VOWEL and TWO-SPIRAL problems. In addition, for most datasets, the time taken by MLRT is considerably lower than that of the other systems with comparable accuracy. Two heuristic versions, MLRT-2 and MLRT-3, are also introduced to improve the efficiency of the system and to make it more scalable for future updates. The performance of these versions is similar to that of the original MLRT system.