
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., the genome) have been analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those already associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability.
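    As a hedged illustration of how some of these challenges are commonly handled in practice (not a method from the review itself), the sketch below imputes missing values, rescales heterogeneous feature blocks, uses a sparsity-inducing L1 penalty as one simple response to high dimensionality, and reweights classes to counter imbalance, all in a single scikit-learn pipeline. The feature block sizes, missingness rate, and model settings are invented for demonstration.

```python
# Minimal sketch: missing data, heterogeneous scales, high dimensionality,
# and class imbalance handled in one pipeline on synthetic data.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
# Concatenate two hypothetical omics blocks (e.g., expression + methylation).
X = np.hstack([rng.normal(size=(n, 50)), rng.normal(size=(n, 30))])
X[rng.random(X.shape) < 0.1] = np.nan          # simulate missing entries
y = (rng.random(n) < 0.2).astype(int)          # imbalanced labels (~20% positives)

model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # missing data
    ("scale", StandardScaler()),                    # heterogeneous scales
    ("clf", LogisticRegression(                     # sparsity + class weighting
        penalty="l1", solver="liblinear", C=0.5, class_weight="balanced")),
])
print(cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
```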

    Multi-objective optimization in machine learning (Otimização multi-objetivo em aprendizado de máquina)

    Advisor: Fernando José Von Zuben. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: Regularized multinomial logistic regression, multi-label classification, and multi-task learning are examples of machine learning problems in which conflicting objectives, such as loss functions and regularization penalties, must be simultaneously minimized. Therefore, the narrow perspective of looking for the single learning model with the best performance should be replaced by the proposition and further exploration of multiple efficient learning models, each characterized by a distinct trade-off among the conflicting objectives. Committee machines and a posteriori preferences of the decision-maker may then be used to properly explore this diverse set of efficient learning models toward performance improvement. The multi-objective framework for machine learning is supported by three stages: (1) multi-objective modelling of each learning problem, explicitly highlighting the conflicting objectives involved; (2) given the multi-objective formulation of the learning problem, for instance treating loss functions and penalty terms as conflicting objectives, efficient solutions well distributed along the Pareto front are obtained by a deterministic and exact solver named NISE (Non-Inferior Set Estimation); (3) those efficient learning models are then subject to a posteriori model selection, or to ensemble filtering and aggregation. Since NISE is restricted to two objective functions, an extension to many objectives, named MONISE (Many-Objective NISE), is also proposed here, an additional contribution that expands the applicability of the proposed framework. To properly assess the merit of our multi-objective approach, more specific investigations were conducted, restricted to regularized linear learning models: (1) What is the relative merit of the a posteriori selection of a single learning model, among the ones produced by our proposal, when compared with other single-model approaches in the literature? (2) Is the diversity level of the learning models produced by our proposal higher than that achieved by alternative approaches devoted to generating multiple learning models? (3) How good is the prediction quality of ensemble filtering and aggregation of the learning models produced by our proposal on (i) multi-class classification, (ii) imbalanced classification, (iii) multi-label classification, (iv) multi-task learning, and (v) multi-view learning? The deterministic nature of NISE and MONISE, their ability to properly deal with the shape of the Pareto front in each learning problem, and the guarantee of always obtaining efficient learning models are advocated here as responsible for the promising results achieved in all three specific investigations. Doctorate in Computer Engineering, Doctor of Electrical Engineering. FAPESP grant 2014/13533-0.
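    The loss-versus-penalty trade-off at the core of this framework can be pictured with a minimal sketch that is not the NISE or MONISE algorithm itself: it sweeps a fixed grid of regularization strengths for an L2-regularized logistic regression and records the resulting (loss, penalty) pairs, whereas NISE selects its scalarization weights deterministically and adaptively to cover the Pareto front. The dataset and grid below are assumptions for demonstration only.

```python
# Rough sketch of the loss-vs-penalty trade-off for a regularized linear model.
# A grid of candidate models approximates the Pareto front; a posteriori model
# selection or ensemble aggregation would then operate on this set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

front = []
for C in np.logspace(-3, 3, 25):                 # C trades off loss vs. L2 penalty
    clf = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    loss = log_loss(y, clf.predict_proba(X))     # data-fit objective
    penalty = float(np.sum(clf.coef_ ** 2))      # regularization objective
    front.append((loss, penalty))

for loss, penalty in front:
    print(f"loss={loss:.3f}  ||w||^2={penalty:.3f}")
```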

    Machine Learning and System Identification for Estimation in Physical Systems

    In this thesis, we draw inspiration from both classical system identification and modern machine learning in order to solve estimation problems for real-world, physical systems. The main approach adopted to estimation and learning is optimization based. Concepts such as regularization are utilized to encode prior knowledge, and basis-function expansions are used to add nonlinear modeling power while keeping data requirements practical. The thesis covers a wide range of applications, many inspired by applications within robotics, but also extending outside this already wide field. Usage of the proposed methods and algorithms is in many cases illustrated in the real-world applications that motivated the research. Topics covered include dynamics modeling and estimation, model-based reinforcement learning, spectral estimation, friction modeling, and state estimation and calibration in robotic machining. In the work on modeling and identification of dynamics, we develop regularization strategies that allow us to incorporate prior domain knowledge into flexible, overparameterized models. We make use of classical control theory to gain insight into training and regularization while using tools from modern deep learning. A particular focus of the work is to allow use of modern methods in scenarios where gathering data is associated with a high cost. In the robotics-inspired parts of the thesis, we develop methods that are practically motivated and make sure that they are implementable also outside the research setting. We demonstrate this by performing experiments in realistic settings and providing open-source implementations of all proposed methods and algorithms.
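    A minimal sketch of the general recipe described above, not the thesis' own open-source implementations: a radial-basis-function expansion adds nonlinear modeling power, and an L2 (ridge) penalty encodes a prior preference for small weights, keeping data requirements modest. The signal, basis centers, width, and regularization strength are invented for illustration.

```python
# Fit a nonlinear signal with an RBF basis expansion plus ridge regularization.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 2 * np.pi, 60)
y = np.sin(t) + 0.1 * rng.normal(size=t.size)     # noisy measurements

centers = np.linspace(0, 2 * np.pi, 15)           # basis-function centers
width = 0.5
Phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

lam = 1e-2                                        # regularization strength
# Ridge solution: (Phi^T Phi + lam I) w = Phi^T y
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(centers.size), Phi.T @ y)

y_hat = Phi @ w
print("RMS error:", np.sqrt(np.mean((y_hat - y) ** 2)))
```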

    Robust Optimization for Sequential Field Development Planning

    To achieve high profitability from an oil field, optimizing the field development strategy (e.g., well type, well placement, drilling schedule) before committing to a decision is critically important. The profitability at a given control setting is predicted by running a reservoir simulation model, while determining a robust optimal strategy generally requires many expensive simulations. In this work, we focus on developing practical and efficient methodologies for solving reservoir optimization problems in which the actions that can be controlled are discrete and sequential (e.g., the drilling sequence of wells). The type of optimization problems we address must take into account both geological uncertainty and the reduction in uncertainty resulting from observations. As the actions are discrete and sequential, the process can be characterized as sequential decision-making under uncertainty, where past decisions may affect both the available future actions and the possibility of future uncertainty reduction. This thesis tackles the challenges in sequential optimization by considering three main issues: 1) optimizing discrete-control variables, 2) dealing with geological uncertainty in robust optimization, and 3) accounting for future learning when making optimal decisions. As the first contribution of this work, we develop a practical online-learning methodology derived from A* search for solving reservoir optimization problems with discrete sets of actions. Sequential decision making can be formulated as finding the path with the maximum reward in a decision tree. To efficiently compute an optimal or near-optimal path, heuristics from relaxed problems are first used to estimate the maximum value constrained to past decisions, and then online-learning techniques are applied to improve the estimation accuracy by learning the errors of the initial approximations obtained from previous decision steps. In this way, an accurate estimate of the maximized value can be inexpensively obtained, thereby guiding the search toward the optimal solution efficiently. This approach allows for optimization of either a complete strategy, with all available actions taken sequentially, or only the first few actions at a reduced cost by limiting the search depth. The second contribution is related to robust optimization when an ensemble of reservoir models is used to characterize geological uncertainty. Instead of computing the expectation of an objective function as the ensemble-based average value, we develop various bias-correction methods applied to the reservoir mean model to estimate the expected value efficiently without sacrificing accuracy. The key point of this approach is that the bias between the objective-function value obtained from the mean model and the average objective-function value over an ensemble can be corrected using information from only a few distinct controls and model realizations. During the optimization process, we only require simulations of the mean model to estimate the expected value using the bias-corrected mean model. This methodology can significantly improve the efficiency of robust optimization and allows for fairly general optimization methods. In the last contribution of this thesis, we address the problem of making optimal decisions while considering the possibility of learning through future actions, i.e., opportunities to improve the optimal strategy resulting from future uncertainty reduction. To efficiently account for the impact of future information on optimal decisions, we simplify the value-of-information analysis by focusing on the key information that would help make better future decisions and the key actions that would result in obtaining that information. In other words, we focus on the use of key observations to reduce the uncertainty in key reservoir features for optimization problems, rather than using all observations to reduce all uncertainties. Moreover, by using supervised-learning algorithms, we can automatically identify the optimal observation subset for key uncertainty reduction and simultaneously evaluate the information's reliability. This allows direct computation of the posterior probability distribution of the key uncertainty based on Bayes' rule, avoiding the need for expensive data assimilation algorithms to update the entire reservoir model. Doctoral dissertation.
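    A hedged sketch of the underlying formulation of the first contribution only, not the reservoir-specific method: a best-first (A*-style) search over a small decision tree finds the maximum-reward action sequence, guided by an optimistic heuristic standing in for the "relaxed problem" estimate. The action names, rewards, and heuristic below are invented, and the online learning of heuristic errors described in the thesis is omitted.

```python
# Best-first search for the maximum-reward action sequence in a toy decision tree.
import heapq
import itertools

ACTIONS = ["well_A", "well_B", "well_C"]          # hypothetical drilling choices
DEPTH = 3                                         # length of the drilling sequence

def reward(sequence):
    # Toy immediate reward of the last action, with diminishing returns.
    base = {"well_A": 3.0, "well_B": 2.0, "well_C": 1.0}
    return base[sequence[-1]] / len(sequence)

def heuristic(sequence):
    # Optimistic estimate of the remaining reward (a "relaxed" problem):
    # assume the best possible single-step reward at every remaining step.
    return (DEPTH - len(sequence)) * 3.0

def best_first_search():
    counter = itertools.count()                   # tie-breaker for the heap
    # Max-reward search implemented with a min-heap on the negated priority.
    heap = [(-heuristic(()), next(counter), (), 0.0)]
    while heap:
        _, _, seq, value = heapq.heappop(heap)
        if len(seq) == DEPTH:
            return seq, value                     # optimistic heuristic => optimal
        for a in ACTIONS:
            if a in seq:                          # each well drilled at most once
                continue
            new_seq = seq + (a,)
            new_value = value + reward(new_seq)
            priority = new_value + heuristic(new_seq)
            heapq.heappush(heap, (-priority, next(counter), new_seq, new_value))

print(best_first_search())
```

    Limiting DEPTH corresponds to optimizing only the first few actions at reduced cost, as described above.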