26,311 research outputs found

    A Novel Model of Working Set Selection for SMO Decomposition Methods

    Full text link
    In the process of training Support Vector Machines (SVMs) by decomposition methods, working set selection is an important technique, and some exciting schemes were employed into this field. To improve working set selection, we propose a new model for working set selection in sequential minimal optimization (SMO) decomposition methods. In this model, it selects B as working set without reselection. Some properties are given by simple proof, and experiments demonstrate that the proposed method is in general faster than existing methods.Comment: 8 pages, 12 figures, it was submitted to IEEE International conference of Tools on Artificial Intelligenc

    Identification of quasi-optimal regions in the design space using surrogate modeling

    Get PDF
    The use of Surrogate Based Optimization (SBO) is widely spread in engineering design to find optimal performance characteristics of expensive simulations (forward analysis: from input to optimal output). However, often the practitioner knows a priori the desired performance and is interested in finding the associated input parameters (reverse analysis: from desired output to input). A popular method to solve such reverse (inverse) problems is to minimize the error between the simulated performance and the desired goal. However, there might be multiple quasi-optimal solutions to the problem. In this paper, the authors propose a novel method to efficiently solve inverse problems and to sample Quasi-Optimal Regions (QORs) in the input (design) space more densely. The development of this technique, based on the probability of improvement criterion and kriging models, is driven by a real-life problem from bio-mechanics, i.e., determining the elasticity of the (rabbit) tympanic membrane, a membrane that converts acoustic sound wave into vibrations of the middle ear ossicular bones

    Predicting regression test failures using genetic algorithm-selected dynamic performance analysis metrics

    Get PDF
    A novel framework for predicting regression test failures is proposed. The basic principle embodied in the framework is to use performance analysis tools to capture the runtime behaviour of a program as it executes each test in a regression suite. The performance information is then used to build a dynamically predictive model of test outcomes. Our framework is evaluated using a genetic algorithm for dynamic metric selection in combination with state-of-the-art machine learning classifiers. We show that if a program is modified and some tests subsequently fail, then it is possible to predict with considerable accuracy which of the remaining tests will also fail which can be used to help prioritise tests in time constrained testing environments

    Learning Surrogate Models of Document Image Quality Metrics for Automated Document Image Processing

    Full text link
    Computation of document image quality metrics often depends upon the availability of a ground truth image corresponding to the document. This limits the applicability of quality metrics in applications such as hyperparameter optimization of image processing algorithms that operate on-the-fly on unseen documents. This work proposes the use of surrogate models to learn the behavior of a given document quality metric on existing datasets where ground truth images are available. The trained surrogate model can later be used to predict the metric value on previously unseen document images without requiring access to ground truth images. The surrogate model is empirically evaluated on the Document Image Binarization Competition (DIBCO) and the Handwritten Document Image Binarization Competition (H-DIBCO) datasets

    A Convex Feature Learning Formulation for Latent Task Structure Discovery

    Full text link
    This paper considers the multi-task learning problem and in the setting where some relevant features could be shared across few related tasks. Most of the existing methods assume the extent to which the given tasks are related or share a common feature space to be known apriori. In real-world applications however, it is desirable to automatically discover the groups of related tasks that share a feature space. In this paper we aim at searching the exponentially large space of all possible groups of tasks that may share a feature space. The main contribution is a convex formulation that employs a graph-based regularizer and simultaneously discovers few groups of related tasks, having close-by task parameters, as well as the feature space shared within each group. The regularizer encodes an important structure among the groups of tasks leading to an efficient algorithm for solving it: if there is no feature space under which a group of tasks has close-by task parameters, then there does not exist such a feature space for any of its supersets. An efficient active set algorithm that exploits this simplification and performs a clever search in the exponentially large space is presented. The algorithm is guaranteed to solve the proposed formulation (within some precision) in a time polynomial in the number of groups of related tasks discovered. Empirical results on benchmark datasets show that the proposed formulation achieves good generalization and outperforms state-of-the-art multi-task learning algorithms in some cases.Comment: ICML201

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
    corecore