
    Screening Rules for Convex Problems

    We propose a new framework for deriving screening rules for convex optimization problems. Our approach covers a large class of constrained and penalized optimization formulations, and works in two steps. First, given any approximate point, the structure of the objective function and the duality gap is used to gather information on the optimal solution. In the second step, this information is used to produce screening rules, i.e. safely identifying unimportant weight variables of the optimal solution. Our general framework leads to a large variety of useful existing as well as new screening rules for many applications. For example, we provide new screening rules for general simplex and $L_1$-constrained problems, Elastic Net, squared-loss Support Vector Machines, minimum enclosing ball, as well as structured norm regularized problems, such as group lasso.

    Scaling Up Large-scale Sparse Learning and Its Application to Medical Imaging

    Large-scale $\ell_1$-regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, including classification and regression problems. In many applications, it remains challenging to apply the sparse learning model to large-scale problems that have massive data samples with high-dimensional features. One popular and promising strategy is to scale up the optimization problem in parallel. Parallel solvers run on multiple cores in a shared-memory system or a distributed environment to speed up the computation, but their practical use is limited by the huge dimension of the feature space and by synchronization problems. In this dissertation, I pursue this direction, focusing on scaling up the optimization of sparse learning for supervised and unsupervised learning problems. For supervised learning, I first propose an asynchronous parallel solver to optimize the large-scale sparse learning model in a multithreading environment. Moreover, I propose a distributed framework to conduct the learning process when the dataset is stored in a distributed manner across different machines. The proposed model is then extended to the study of genetic risk factors for Alzheimer's Disease (AD) across different research institutions, integrating a group feature selection framework to rank the top risk SNPs for AD. For the unsupervised learning problem, I propose a highly efficient solver, termed Stochastic Coordinate Coding (SCC), which scales up the optimization of dictionary learning and sparse coding problems. A common issue in medical imaging research is that the longitudinal features of patients at different time points are beneficial to study together. To further improve the dictionary learning model, I propose a multi-task dictionary learning method that learns the different tasks simultaneously and uses shared and individual dictionaries to encode both consistent and changing imaging features. Dissertation/Thesis. Doctoral Dissertation, Computer Science 201

    High-Dimensional Screening Using Multiple Grouping of Variables

    Screening is the problem of finding a superset of the set of non-zero entries in an unknown $p$-dimensional vector $\beta^*$ given $n$ noisy observations. Naturally, we want this superset to be as small as possible. We propose a novel framework for screening, which we refer to as Multiple Grouping (MuG), that groups variables, performs variable selection over the groups, and repeats this process multiple times to estimate a sequence of sets, each of which contains the non-zero entries in $\beta^*$. Screening is done by taking the intersection of all these estimated sets. The MuG framework can be used in conjunction with any group-based variable selection algorithm. In the high-dimensional setting, where $p \gg n$, we show that when MuG is used with the group Lasso estimator, screening can be consistently performed without using any tuning parameter. Our numerical simulations clearly show the merits of using the MuG framework in practice. Comment: This paper will appear in the IEEE Transactions on Signal Processing. See http://www.ima.umn.edu/~dvats/MuGScreening.html for more detail.
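The group, select, repeat, and intersect loop described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the random grouping, the function names `top_k_group_selector` and `mug_screen`, and all parameter values are hypothetical, and the stand-in selector ranks groups by a marginal score rather than using the group Lasso (so, unlike the result claimed in the abstract, it is not tuning-free).

```python
import numpy as np


def top_k_group_selector(X, y, groups, k):
    """Stand-in group selector (NOT the group Lasso): keep the k groups
    with the largest marginal score ||X_g^T y||_2."""
    scores = [np.linalg.norm(X[:, g].T @ y) for g in groups]
    keep = np.argsort(scores)[::-1][:k]
    return {int(j) for i in keep for j in groups[i]}


def mug_screen(X, y, group_size=2, k=3, n_rounds=20, seed=0):
    """Intersect the variable sets selected under several random groupings."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    estimate = set(range(p))  # start from all variables
    for _ in range(n_rounds):
        # Assumption: groups are formed by randomly permuting the variables.
        perm = rng.permutation(p)
        groups = [perm[i:i + group_size] for i in range(0, p, group_size)]
        # Each round yields a candidate superset; keep only what survives all rounds.
        estimate &= top_k_group_selector(X, y, groups, k)
    return estimate


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 12))
    beta = np.zeros(12)
    beta[[0, 1, 2]] = 10.0
    y = X @ beta + 0.1 * rng.standard_normal(200)
    print(sorted(mug_screen(X, y)))
```

Each round can only return a superset of the variables in its selected groups, so the intersection shrinks the estimate while keeping any variable that is selected in every round.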

    Chemical structure matching using correlation matrix memories

    This paper describes the application of the Relaxation By Elimination (RBE) method to matching the 3D structure of molecules in chemical databases within the framework of binary correlation matrix memories. The paper illustrates that, when combined with distributed representations, the method maps well onto these networks, allowing high-performance implementation in parallel systems. It outlines the motivation, the neural architecture, and the RBE method, and presents some results of matching small molecules against a database of 100,000 models.

    Supersparse Linear Integer Models for Optimized Medical Scoring Systems

    Scoring systems are linear classification models that only require users to add, subtract and multiply a few small numbers in order to make a prediction. These models are in widespread use by the medical community, but are difficult to learn from data because they need to be accurate and sparse, have coprime integer coefficients, and satisfy multiple operational constraints. We present a new method for creating data-driven scoring systems called a Supersparse Linear Integer Model (SLIM). SLIM scoring systems are built by solving an integer program that directly encodes measures of accuracy (the 0-1 loss) and sparsity (the $\ell_0$-seminorm) while restricting coefficients to coprime integers. SLIM can seamlessly incorporate a wide range of operational constraints related to accuracy and sparsity, and can produce highly tailored models without parameter tuning. We provide bounds on the testing and training accuracy of SLIM scoring systems, and present a new data reduction technique that can improve scalability by eliminating a portion of the training data beforehand. Our paper includes results from a collaboration with the Massachusetts General Hospital Sleep Laboratory, where SLIM was used to create a highly tailored scoring system for sleep apnea screening. Comment: This version reflects our findings on SLIM as of January 2016 (arXiv:1306.5860 and arXiv:1405.4047 are out-of-date). The final published version of this article is available at http://www.springerlink.co
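To make concrete what using a scoring system involves, here is a minimal sketch of applying one: add up small integer points and compare the total to a threshold. The feature names, point values, and threshold are made up for illustration and do not come from the paper, and the code only evaluates a given scoring system; it does not solve the integer program that SLIM uses to learn one.

```python
from functools import reduce
from math import gcd


def are_coprime(coefficients):
    """True if the greatest common divisor of all coefficients is 1."""
    return reduce(gcd, coefficients) == 1


def total_score(points, features):
    """Sum of points times feature values (a plain linear model)."""
    return sum(points[name] * value for name, value in features.items())


def predict(points, features, threshold=0):
    """Predict the positive class when the total score exceeds the threshold."""
    return total_score(points, features) > threshold


# Hypothetical scoring system with small integer points per feature.
points = {"feature_a": 2, "feature_b": 1, "feature_c": -1}
# Hypothetical binary feature values for one example.
example = {"feature_a": 1, "feature_b": 0, "feature_c": 1}

print(are_coprime(points.values()), total_score(points, example), predict(points, example))
```

The coprimality check reflects the constraint mentioned in the abstract: if all points shared a common factor, the same classifier could be written with smaller numbers.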

    Thermogram Breast Cancer Detection: A Comparative Study of Two Machine Learning Techniques

    Breast cancer is considered one of the major threats to women’s health all over the world. The World Health Organization (WHO) has reported that 1 in every 12 women could be subject to a breast abnormality during her lifetime. Early detection of breast cancer has been found to be very effective in increasing survival rates. Mammography-based breast cancer screening is the leading technology to achieve this aim. However, it still cannot deal with patients with dense breasts or with tumors smaller than 2 mm. A thermography-based approach can address these problems. In this paper, a thermogram-based breast cancer detection approach is proposed. This approach consists of four phases: (1) image pre-processing using homomorphic filtering, top-hat transform and adaptive histogram equalization, (2) ROI segmentation using binary masking and K-means clustering, (3) feature extraction using signature boundary, and (4) classification, in which two classifiers, Extreme Learning Machine (ELM) and Multilayer Perceptron (MLP), were used and compared. The proposed approach is evaluated using the public dataset DMR-IR. Various experiment scenarios (e.g., integration of geometrical feature extraction and textural feature extraction) were designed and evaluated using different measurements (i.e., accuracy, sensitivity, and specificity). The results showed that ELM-based results were better than MLP-based ones with more than 19%