    Supervised classification and mathematical optimization

    Data Mining techniques often require solving optimization problems. Supervised Classification, and in particular Support Vector Machines, can be seen as a paradigmatic instance. In this paper, some links between Mathematical Optimization methods and Supervised Classification are emphasized. It is shown that many different areas of Mathematical Optimization play a central role in off-the-shelf Supervised Classification methods. Moreover, Mathematical Optimization turns out to be extremely useful for addressing important issues in Classification, such as identifying relevant variables, improving the interpretability of classifiers, or dealing with vagueness/noise in the data. Ministerio de Ciencia e Innovación; Junta de Andalucí
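
    For orientation, the link between Support Vector Machines and optimization can be made concrete with the standard soft-margin formulation, a convex quadratic program. This is the textbook form, not taken from the paper itself; the notation (training points $x_i$, labels $y_i \in \{-1,+1\}$, slack variables $\xi_i$, penalty $C > 0$) is assumed here for illustration.

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\,(w^{\top}x_i + b)\ \ge\ 1-\xi_i,\qquad \xi_i \ge 0,\qquad i=1,\dots,n.
```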

    Technical support for creating an artificial intelligence system for feature extraction and experimental design

    Techniques for classifying objects into groups or classes go under many different names including, most commonly, cluster analysis. Mathematically, the general problem is to find a best mapping of objects into an index set consisting of class identifiers. When an a priori grouping of objects exists, the process of deriving the classification rules from samples of classified objects is known as discrimination. When such rules are applied to objects of unknown class, the process is denoted classification. The specific problem addressed involves the group classification of a set of objects that are each associated with a series of measurements (ratio, interval, ordinal, or nominal levels of measurement). Each measurement produces one variable in a multidimensional variable space. Cluster analysis techniques are reviewed, and methods for including geographic location, distance measures, and spatial pattern (distribution) as parameters in clustering are examined. For the case of patterning, measures of spatial autocorrelation are discussed in terms of the kind of data (nominal, ordinal, or interval scaled) to which they may be applied.
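
    As an illustration of a spatial autocorrelation measure for interval-scaled data, the sketch below computes Moran's I. The abstract does not name a specific measure, so the choice of Moran's I is an assumption, as are the function name `morans_i` and the neighbour-weight matrix layout.

```python
def morans_i(values, weights):
    """Moran's I for interval-scaled values.

    values  -- list of n measurements, one per location
    weights -- n x n nested list; weights[i][j] > 0 if locations i and j
               are neighbours, 0 otherwise (hypothetical layout)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]          # deviations from the mean
    total_w = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    if total_w == 0 or denom == 0:
        raise ValueError("need non-zero weights and non-constant values")
    # Cross-products of deviations, weighted by spatial proximity
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    return (n / total_w) * (num / denom)


# Four locations along a line; each is a neighbour of the adjacent ones.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
trend = morans_i([1, 2, 3, 4], chain)          # similar neighbours, positive I
alternating = morans_i([1, -1, 1, -1], chain)  # dissimilar neighbours, negative I
```

    Positive values indicate that neighbouring locations carry similar values; negative values indicate that neighbours tend to differ.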

    Physico-chemical foundations underpinning microarray and next-generation sequencing experiments

    Hybridization of nucleic acids on solid surfaces is a key process involved in high-throughput technologies such as microarrays and, in some cases, next-generation sequencing (NGS). A physical understanding of the hybridization process helps to determine the accuracy of these technologies. The goal of a widespread research program is to develop reliable transformations between the raw signals reported by the technologies and individual molecular concentrations from an ensemble of nucleic acids. This research has inputs from many areas, from bioinformatics and biostatistics, to theoretical and experimental biochemistry and biophysics, to computer simulations. A group of leading researchers met in Ploen, Germany, in 2011 to discuss present knowledge and limitations of our physico-chemical understanding of high-throughput nucleic acid technologies. This meeting inspired us to write this summary, which provides an overview of state-of-the-art approaches, based on physico-chemical foundations, to modeling the nucleic acid hybridization process on solid surfaces. In addition, practical application of current knowledge is emphasized.

    Automatic Algorithm Selection for Pseudo-Boolean Optimization with Given Computational Time Limits

    Machine learning (ML) techniques have been proposed to automatically select the best solver from a portfolio of solvers, based on predicted performance. These techniques have been applied to various problems, such as Boolean Satisfiability, Traveling Salesperson, Graph Coloring, and others. These methods, known as meta-solvers, take an instance of a problem and a portfolio of solvers as input. They then predict the best-performing solver and execute it to deliver a solution. Typically, the quality of the solution improves with a longer computational time. This has led to the development of anytime selectors, which consider both the instance and a user-prescribed computational time limit. Anytime meta-solvers predict the best-performing solver within the specified time limit. Constructing an anytime meta-solver is considerably more challenging than building a meta-solver without the "anytime" feature. In this study, we focus on the task of designing anytime meta-solvers for the NP-hard optimization problem of Pseudo-Boolean Optimization (PBO), which generalizes Satisfiability and Maximum Satisfiability problems. The effectiveness of our approach is demonstrated via an extensive empirical study in which our anytime meta-solver improves dramatically on the performance of the Mixed Integer Programming solver Gurobi, which is the best-performing single solver in the portfolio. For example, out of all instances and time limits for which Gurobi failed to find feasible solutions, our meta-solver identified feasible solutions for 47% of these.
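
    The idea of an anytime selector, which takes both instance features and a time limit as input, can be sketched as follows. This is a minimal illustration using a 1-nearest-neighbour rule, which is an assumption and not the method of the study; the function name, feature vectors, and solver labels are all hypothetical.

```python
import math


def select_solver(training_records, features, time_limit):
    """Pick a solver label for an instance under a given time limit.

    training_records -- list of (features, time_limit, best_solver) tuples
                        observed on past instances (hypothetical data layout)
    features         -- numeric feature vector of the new instance
    time_limit       -- user-prescribed limit in seconds (must be > 0)
    """
    def distance(record):
        rec_features, rec_limit, _ = record
        d = sum((a - b) ** 2 for a, b in zip(rec_features, features))
        # Compare time limits on a log scale so 1s vs 10s matters
        # as much as 100s vs 1000s.
        d += (math.log(rec_limit) - math.log(time_limit)) ** 2
        return d

    # 1-nearest-neighbour: reuse the best solver of the closest past record.
    return min(training_records, key=distance)[2]


# Hypothetical records: the same instance type favours different solvers
# depending on how much time is available.
records = [
    ((10.0, 0.2), 1.0, "local_search"),
    ((10.0, 0.2), 1000.0, "mip"),
    ((500.0, 0.9), 1.0, "sat_based"),
]
short_run = select_solver(records, (10.0, 0.2), 2.0)
long_run = select_solver(records, (10.0, 0.2), 500.0)
```

    The point of the sketch is only that the time limit is part of the selector's input, so the same instance can be routed to different solvers for short and long runs.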

    Optimization and Machine Learning Methods for Diagnostic Testing of Prostate Cancer

    Technological advances in biomarkers and imaging tests are creating new avenues to advance precision health for early detection of cancer. These advances have resulted in multiple layers of information that can be used to make clinical decisions, but how to best use these multiple sources of information is a challenging engineering problem due to the high cost and imperfect sensitivity and specificity of these tests. Questions that need to be addressed include which diagnostic tests to choose and how to best integrate them, in order to optimally balance the competing goals of early disease detection and minimal cost and harm from unnecessary testing. To study these research questions, we present new optimization-based models and data-driven analytic methods in three parts to improve early detection of prostate cancer (PCa). In the first part, we develop and validate predictive models to assess individual PCa risk using known clinical risk factors. Because not all men with newly-diagnosed PCa received imaging at diagnosis, we use an established method to correct for verification bias to evaluate the accuracy of published imaging guidelines. In addition to the published guidelines, we implement advanced classification modeling techniques to develop accurate classification rules identifying which patients should receive imaging. We propose a new algorithm for a classification model that considers information of patients with unverified disease and the high cost of misclassifying a metastatic patient. We summarize our development and implementation of state-wide, evidence-based imaging criteria that weigh the benefits and harms of radiological imaging for detection of metastatic PCa. In the second part of this thesis, we combine optimization and machine learning approaches into a robust optimization framework to design imaging guidelines that can account for imperfect calibration of predictions. 
    We investigate efficient and effective ways to combine multiple medical diagnostic tests where the result of one test may be used to predict the outcome of another. We analyze the properties of the proposed optimization models from the perspectives of multiple stakeholders, and we present the results of fast approximation methods that we show can be used to solve large-scale models. In the third and final part of this thesis, we investigate the optimal design of composite multi-biomarker tests to achieve early detection of prostate cancer. Biomarker tests vary significantly in cost, and cause false positive and false negative results, leading to serious health implications for patients. Since no single biomarker on its own is considered satisfactory, we utilize simulation and statistical methods to develop the optimal diagnosis procedure for early detection of PCa consisting of a sequence of biomarker tests, balancing the benefits of early detection, such as increased survival, with the harms of testing, such as unnecessary prostate biopsies. In this dissertation, we identify new principles and methods to guide the design of early detection protocols for PCa using new diagnostic technologies. We provide important clinical evidence that can be used to improve health outcomes of patients while reducing wasteful application of diagnostic tests to patients for whom they are not effective. Moreover, some of the findings of this dissertation have been implemented directly into clinical practice in the state of Michigan. The models and methodologies we present in this thesis are not limited to PCa, and can be applied to a broad range of chronic diseases for which diagnostic tests are available.
    PhD, Industrial & Operations Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/143976/1/smerdan_1.pd
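
    To see why sequencing imperfect tests involves a trade-off, the sketch below computes the combined sensitivity and specificity of two tests run in series, where the overall result is positive only if both are positive. It assumes the tests are conditionally independent given disease status, which is a simplifying assumption and not a claim from the thesis; the function name and the numeric values are illustrative only.

```python
def serial_both_positive(sens1, spec1, sens2, spec2):
    """Combine two tests in series: test 2 is run only after a positive
    test 1, and the patient is called positive only if both are positive.

    Assumes conditional independence of the two tests given disease status.
    """
    # A diseased patient must be detected by both tests.
    sensitivity = sens1 * sens2
    # A healthy patient is a false positive only if both tests err.
    specificity = 1.0 - (1.0 - spec1) * (1.0 - spec2)
    return sensitivity, specificity


# Illustrative values, not taken from any study.
sens, spec = serial_both_positive(0.9, 0.7, 0.8, 0.8)
```

    Under this rule specificity rises (fewer unnecessary follow-up procedures) while sensitivity falls (more missed cases), which is the kind of balance a sequential protocol has to strike.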