83 research outputs found

    Exploiting Universum data in AdaBoost using gradient descent

    Full text link
    Recently, Universum data that does not belong to any class of the training data, has been applied for training better classifiers. In this paper, we address a novel boosting algorithm called UAdaBoost that can improve the classification performance of AdaBoost with Universum data. UAdaBoost chooses a function by minimizing the loss for labeled data and Universum data. The cost function is minimized by a greedy, stagewise, functional gradient procedure. Each training stage of UAdaBoost is fast and efficient. The standard AdaBoost weights labeled samples during training iterations while UAdaBoost gives an explicit weighting scheme for Universum samples as well. In addition, this paper describes the practical conditions for the effectiveness of Universum learning. These conditions are based on the analysis of the distribution of ensemble predictions over training samples. Experiments on handwritten digits classification and gender classification problems are presented. As exhibited by our experimental results, the proposed method can obtain superior performances over the standard AdaBoost by selecting proper Universum data. © 2014 Elsevier B.V

    Drawing, Handwriting Processing Analysis: New Advances and Challenges

    No full text
    International audienceDrawing and handwriting are communicational skills that are fundamental in geopolitical, ideological and technological evolutions of all time. drawingand handwriting are still useful in defining innovative applications in numerous fields. In this regard, researchers have to solve new problems like those related to the manner in which drawing and handwriting become an efficient way to command various connected objects; or to validate graphomotor skills as evident and objective sources of data useful in the study of human beings, their capabilities and their limits from birth to decline

    Breast Cancer Classification by Gene Expression Analysis using Hybrid Feature Selection and Hyper-heuristic Adaptive Universum Support Vector Machine

    Get PDF
    Comprehensive assessments of the molecular characteristics of breast cancer from gene expression patterns can aid in the early identification and treatment of tumor patients. The enormous scale of gene expression data obtained through microarray sequencing increases the difficulty of training the classifier due to large-scale features. Selecting pivotal gene features can minimize high dimensionality and the classifier complexity with improved breast cancer detection accuracy. However, traditional filter and wrapper-based selection methods have scalability and adaptability issues in handling complex gene features. This paper presents a hybrid feature selection method of Mutual Information Maximization - Improved Moth Flame Optimization (MIM-IMFO) for gene selection along with an advanced Hyper-heuristic Adaptive Universum Support classification model Vector Machine (HH-AUSVM) to improve cancer detection rates. The hybrid gene selection method is developed by performing filter-based selection using MIM in the first stage followed by the wrapper method in the second stage, to obtain the pivotal features and remove the inappropriate ones. This method improves standard MFO by a hybrid exploration/exploitation phase to accomplish a better trade-off between exploration and exploitation phases. The classifier HH-AUSVM is formulated by integrating the Adaptive Universum learning approach to the hyper- heuristics-based parameter optimized SVM to tackle the class samples imbalance problem. Evaluated on breast cancer gene expression datasets from Mendeley Data Repository, this proposed MIM-IMFO gene selection-based HH-AUSVM classification approach provided better breast cancer detection with high accuracies of 95.67%, 96.52%, 97.97% and 95.5% and less processing time of 4.28, 3.17, 9.45 and 6.31 seconds, respectively

    Analysis and extensions of Universum learning

    Get PDF
    University of Minnesota Ph.D. dissertation. January 2014. Major: Electrical Engineering. Advisor: Vladimir Cherkassky. 1 computer file (PDF); xiii, 140 pages.Many applications of machine learning involve sparse high-dimensional data, where the number of input features is larger than (or comparable to) the number of data samples. Predictive modeling of such data sets is very ill-posed and prone to overfitting. Standard inductive learning methods may not be sufficient for sparse high-dimensional data, and this provides motivation for non-standard learning settings. This thesis investigates such a new learning methodology called Learning through Contradictions or Universum Learning proposed by Vapnik (1998, 2006) for binary classification. This method incorporates a priori knowledge about application data, in the form of additional Universum samples, into the learning process. However, such a new methodology is still not well-understood and represents a challenge to end users. An overall goal of this thesis is to improve understanding of this new Universum learning methodology and to improve its usability for general users. Specific objectives of this thesis include:Development of practical conditions for the effectiveness of Universum Learning for binary classification.Extension of Universum Learning to real life classification settings with different misclassification costs and unbalanced data.Extension of Universum Learning to single-class learning problems.Extension of Universum Learning to regression problems.The outcome of this research will result in better understanding and adoption of the Universum Learning methods for classification, single class learning and regression problems, common in many real life applications

    Intuitionistic Fuzzy Generalized Eigenvalue Proximal Support Vector Machine

    Full text link
    Generalized eigenvalue proximal support vector machine (GEPSVM) has attracted widespread attention due to its simple architecture, rapid execution, and commendable performance. GEPSVM gives equal significance to all samples, thereby diminishing its robustness and efficacy when confronted with real-world datasets containing noise and outliers. In order to reduce the impact of noises and outliers, we propose a novel intuitionistic fuzzy generalized eigenvalue proximal support vector machine (IF-GEPSVM). The proposed IF-GEPSVM assigns the intuitionistic fuzzy score to each training sample based on its location and surroundings in the high-dimensional feature space by using a kernel function. The solution of the IF-GEPSVM optimization problem is obtained by solving a generalized eigenvalue problem. Further, we propose an intuitionistic fuzzy improved GEPSVM (IF-IGEPSVM) by solving the standard eigenvalue decomposition resulting in simpler optimization problems with less computation cost which leads to an efficient intuitionistic fuzzy-based model. We conduct a comprehensive evaluation of the proposed IF-GEPSVM and IF-IGEPSVM models on UCI and KEEL datasets. Moreover, to evaluate the robustness of the proposed IF-GEPSVM and IF-IGEPSVM models, label noise is introduced into some UCI and KEEL datasets. The experimental findings showcase the superior generalization performance of the proposed models when compared to the existing baseline models, both with and without label noise. Our experimental results, supported by rigorous statistical analyses, confirm the superior generalization abilities of the proposed IF-GEPSVM and IF-IGEPSVM models over the baseline models. Furthermore, we implement the proposed IF-GEPSVM and IF-IGEPSVM models on the USPS recognition dataset, yielding promising results that underscore the models' effectiveness in practical and real-world applications

    Evolving GANs: When Contradictions Turn into Compliance

    Full text link
    Limited availability of labeled-data makes any supervised learning problem challenging. Alternative learning settings like semi-supervised and universum learning alleviate the dependency on labeled data, but still require a large amount of unlabeled data, which may be unavailable or expensive to acquire. GAN-based synthetic data generation methods have recently shown promise by generating synthetic samples to improve task at hand. However, these samples cannot be used for other purposes. In this paper, we propose a GAN game which provides improved discriminator accuracy under limited data settings, while generating realistic synthetic data. This provides the added advantage that now the generated data can be used for other similar tasks. We provide the theoretical guarantees and empirical results in support of our approach.Comment: Generative Adversarial Networks, Universum Learning, Semi-Supervised Learnin

    An efficiency curve for evaluating imbalanced classifiers considering intrinsic data characteristics: Experimental analysis

    Get PDF
    Balancing the accuracy rates of the majority and minority classes is challenging in imbalanced classification. Furthermore, data characteristics have a significant impact on the performance of imbalanced classifiers, which are generally neglected by existing evaluation methods. The objective of this study is to introduce a new criterion to comprehensively evaluate imbalanced classifiers. Specifically, we introduce an efficiency curve that is established using data envelopment analysis without explicit inputs (DEA-WEI), to determine the trade-off between the benefits of improved minority class accuracy and the cost of reduced majority class accuracy. In sequence, we analyze the impact of the imbalanced ratio and typical imbalanced data characteristics on the efficiency of the classifiers. Empirical analyses using 68 imbalanced data reveal that traditional classifiers such as C4.5 and the k-nearest neighbor are more effective on disjunct data, whereas ensemble and undersampling techniques are more effective for overlapping and noisy data. The efficiency of cost-sensitive classifiers decreases dramatically when the imbalanced ratio increases. Finally, we investigate the reasons for the different efficiencies of classifiers on imbalanced data and recommend steps to select appropriate classifiers for imbalanced data based on data characteristics.National Natural Science Foundation of China (NSFC) 71874023 71725001 71771037 7197104

    Adaptive robust AdaBoost-based kernel-free quadratic surface support vector machine with Universum data

    Get PDF
    In this paper, we proposed a novel binary classification framework named adaptive robust AdaBoost-based kernel-free quadratic surface support vector machine with Universum data (A-R-U-SQSSVM). First, we developed R-U-SQSSVM by integrating the capped L2,p L_{2, p} -norm distance metric and the generalized Welsch adaptive loss function to improve the model's robustness and adaptability. Furthermore, we introduced Universum data points into R-U-SQSSVM to enhance the model's generalization performance by incorporating valuable prior knowledge for the classifier. Additionally, we utilized R-U-SQSSVM as a weak classifier and embedded the AdaBoost algorithm within it to obtain a strong classifier, A-R-U-SQSSVM. To effectively solve our model, we transformed it into a quadratic programming problem using the half-quadratic (HQ) optimization algorithm and concave duality. This transformed problem can be solved using convex optimization methods, such as the sequential minimal optimization (SMO) algorithm. Experimental results on University of California, Irvine (UCI) datasets demonstrated the superior classification performance of our method. In large datasets, A-R-U-SQSSVM was hundreds or even a thousand times faster than traditional capped twin support vector machine (CTSVM), SQSSVM

    Support matrix machine: A review

    Full text link
    Support vector machine (SVM) is one of the most studied paradigms in the realm of machine learning for classification and regression problems. It relies on vectorized input data. However, a significant portion of the real-world data exists in matrix format, which is given as input to SVM by reshaping the matrices into vectors. The process of reshaping disrupts the spatial correlations inherent in the matrix data. Also, converting matrices into vectors results in input data with a high dimensionality, which introduces significant computational complexity. To overcome these issues in classifying matrix input data, support matrix machine (SMM) is proposed. It represents one of the emerging methodologies tailored for handling matrix input data. The SMM method preserves the structural information of the matrix data by using the spectral elastic net property which is a combination of the nuclear norm and Frobenius norm. This article provides the first in-depth analysis of the development of the SMM model, which can be used as a thorough summary by both novices and experts. We discuss numerous SMM variants, such as robust, sparse, class imbalance, and multi-class classification models. We also analyze the applications of the SMM model and conclude the article by outlining potential future research avenues and possibilities that may motivate academics to advance the SMM algorithm
    corecore