34 research outputs found

    Support vector machines to detect physiological patterns for EEG and EMG-based human-computer interaction: a review

    Support vector machines (SVMs) are widely used classifiers for detecting physiological patterns in human-computer interaction (HCI). Their success is due to their versatility, their robustness and the wide availability of free dedicated toolboxes. Frequently in the literature, insufficient details about the SVM implementation and/or parameter selection are reported, making it impossible to reproduce a study's analysis and results. In order to perform an optimized classification and report a proper description of the results, a comprehensive critical overview of the applications of SVM is necessary. The aim of this paper is to review the usage of SVM in the determination of brain and muscle patterns for HCI, focusing on electroencephalography (EEG) and electromyography (EMG) techniques. In particular, the basic principles of SVM theory are outlined, together with a description of several relevant implementations from the literature. Furthermore, details of the reviewed papers are listed in tables, and statistics on the use of SVM in the literature are presented. The suitability of SVM for HCI is discussed, and critical comparisons with other classifiers are reported.
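As an illustration of the classifier family this review surveys, the following is a minimal sketch of a linear SVM trained by sub-gradient descent on the hinge loss. It is not one of the toolboxes discussed in the paper, and the two-dimensional "EEG-like" features are synthetic stand-ins.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """y must be in {-1, +1}. Minimizes lam/2 * ||w||^2 + mean hinge loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1  # examples violating the margin
        grad_w = lam * w - (y[mask, None] * X[mask]).sum(axis=0) / n
        grad_b = -y[mask].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
# Two synthetic classes of 2-D "band-power" features
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(+1, 0.5, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)
w, b = train_linear_svm(X, y)
acc = np.mean(np.sign(X @ w + b) == y)
```

In practice one would use a dedicated toolbox (with kernels and proper hyperparameter selection) as the review recommends; this sketch only shows the margin-violation update that drives such solvers.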

    A Unified Framework for Gradient-based Hyperparameter Optimization and Meta-learning

    Machine learning algorithms and systems are progressively becoming part of our societies, leading to a growing need to build a vast multitude of accurate, reliable and interpretable models that should, where possible, exploit similarities among tasks. Automating segments of machine learning itself seems a natural step towards delivering increasingly capable systems that perform well in both the big-data and few-shot learning regimes. Hyperparameter optimization (HPO) and meta-learning (MTL) constitute two building blocks of this growing effort. We explore these two topics from a unifying perspective, presenting a mathematical framework linked to bilevel programming that captures existing similarities and translates into procedures of practical interest rooted in algorithmic differentiation. We discuss the derivation, applicability and computational complexity of these methods, and establish several approximation properties for a class of objective functions of the underlying bilevel programs. In HPO, these algorithms generalize and extend previous work on gradient-based methods. In MTL, the resulting framework subsumes classic and emerging strategies and provides a starting basis from which to build and analyze novel techniques. A series of examples and numerical simulations offer insight and highlight some limitations of these approaches. Experiments on larger-scale problems show the potential gains of the proposed methods in real-world applications. Finally, we develop two extensions of the basic algorithms, apt to optimize a class of discrete hyperparameters (graph edges) in an application to relational learning, and to tune online learning rate schedules for training neural network models, an old but crucially important issue in machine learning.
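The gradient-based HPO idea can be illustrated on a toy bilevel problem: differentiate a validation loss through one inner SGD step with respect to the learning rate, then check the hypergradient against a finite difference. All constants below are illustrative, not taken from the paper.

```python
import numpy as np

# Inner problem: one SGD step on the training loss 0.5 * a * w^2.
# Outer problem: validation loss 0.5 * (w - c)^2.
a, c = 2.0, 1.0
w0, lr = 3.0, 0.05

def val_loss_after_step(lr):
    w1 = w0 - lr * a * w0          # one inner SGD step on the train loss
    return 0.5 * (w1 - c) ** 2     # outer (validation) objective

# Analytic hypergradient: chain rule through the inner step
w1 = w0 - lr * a * w0
hypergrad = (w1 - c) * (-a * w0)

# Check against a central finite difference
eps = 1e-6
fd = (val_loss_after_step(lr + eps) - val_loss_after_step(lr - eps)) / (2 * eps)
```

Repeating the chain rule through many inner steps (reverse- or forward-mode) is exactly what the algorithmic-differentiation procedures in the thesis automate.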

    Agnostic Bayes

    Honour roll (Tableau d'honneur) of the Faculté des études supérieures et postdoctorales, 2014-2015.
    Machine learning is the science of learning from examples. Algorithms based on this approach are now ubiquitous. While there has been significant progress, this field presents important challenges. Namely, simply selecting the function that best fits the observed data was shown to offer no statistical guarantee on the examples that have not yet been observed. A few learning theories suggest how to address this problem. Among these, we present the Bayesian modeling of machine learning and the PAC-Bayesian approach to machine learning in a unified view to highlight important similarities. The outcome of this analysis suggests that model averaging is one of the key elements for obtaining good generalization performance: one should base predictions on the outcome of every model instead of only the one that best fits the observed data. Unfortunately, this approach comes with a high computational cost, and finding good approximations is the subject of active research. In this thesis, we present a novel approach that can be applied with a low computational cost on a wide range of machine learning setups. To achieve this, we apply Bayes' theory in a different way than is conventionally done for machine learning: instead of searching for the true model at the origin of the observed data, we search for the best model according to a given metric. While the difference seems subtle, in this approach we do not assume that the true model belongs to the set of explored models; hence, we say that we are agnostic. An extensive experimental setup shows a significant generalization performance gain when using this model-averaging approach during the cross-validation phase. Moreover, this simple algorithm does not add a significant computational cost to the conventional search for hyperparameters. Finally, this probabilistic tool can also be used as a statistical significance test to evaluate the quality of learning algorithms on multiple datasets.
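The model-averaging idea can be sketched as follows: rather than selecting the single model with the lowest validation error, estimate the probability that each model is the best one (here via a bootstrap over per-example validation losses, one common way to realize such an agnostic posterior) and weight models accordingly. The data and models below are synthetic stand-ins, not the thesis's experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
n_examples, n_models = 200, 4

# losses[i, k] = validation loss of model k on example i (synthetic)
losses = rng.uniform(0, 1, (n_examples, n_models))
losses[:, 0] -= 0.15  # make model 0 clearly better on average

def prob_best(losses, n_boot=2000, rng=rng):
    """Bootstrap estimate of P(model k has the lowest expected loss)."""
    n, k = losses.shape
    wins = np.zeros(k)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample validation examples
        wins[np.argmin(losses[idx].mean(axis=0))] += 1
    return wins / n_boot

p = prob_best(losses)   # weights for averaging the models' predictions
```

Predictions would then be a `p`-weighted average (or vote) over the models' outputs, at negligible extra cost on top of an ordinary hyperparameter search.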

    Data-driven quantitative photoacoustic tomography

    Spatial information about the 3D distribution of blood oxygen saturation (sO2) in vivo is of clinical interest, as it encodes important physiological information about tissue health and pathology. Photoacoustic tomography (PAT) is a biomedical imaging modality that, in principle, can be used to acquire this information. Images are formed by illuminating the sample with a laser pulse; after multiple scattering events, the optical energy is absorbed. A subsequent rise in temperature induces an increase in pressure (the photoacoustic initial pressure p0) that propagates to the sample surface as an acoustic wave. These acoustic waves are detected as pressure time series by sensor arrays and used to reconstruct images of the sample's p0 distribution. This distribution encodes information about the sample's absorption, and can be used to estimate sO2. However, an ill-posed nonlinear inverse problem stands in the way of acquiring such estimates in vivo. Current approaches to solving this problem fall short of being widely and successfully applied to in vivo tissues because they rely on simplifying assumptions about the tissue, prior knowledge of its optical properties, or the formulation of a forward model accurately describing image acquisition with a specific imaging system. Here, we investigate the use of data-driven approaches (deep convolutional networks) to solve this problem. Networks only require a dataset of examples to learn a mapping from PAT data to images of the sO2 distribution. We show the results of training a 3D convolutional network to estimate the 3D sO2 distribution within model tissues from 3D multiwavelength simulated images. However, acquiring a realistic training set to enable successful in vivo application is non-trivial, given the challenges of estimating ground-truth sO2 distributions and the current limitations of simulating training data.
    We propose and test several methods to (1) acquire more realistic training data or (2) improve network performance in the absence of adequate quantities of realistic training data. For (1), we describe how training data may be acquired from an organ perfusion system and outline a possible design. Separately, we describe how training data may be generated synthetically using a variant of generative adversarial networks called ambientGANs. For (2), we show how the accuracy of networks trained with limited training data can be improved with self-training. We also demonstrate how the domain gap between training and test sets can be minimised with unsupervised domain adaptation to improve quantification accuracy. Overall, this thesis clarifies the advantages of data-driven approaches and suggests concrete steps towards overcoming the challenges of in vivo application.
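The self-training idea mentioned under (2) can be sketched with a deliberately simple classifier: fit on a small labelled set, pseudo-label the most confident unlabelled examples, and refit on the enlarged set. A nearest-centroid model stands in for the convolutional networks of the thesis, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)

def centroids(X, y):
    return np.stack([X[y == k].mean(axis=0) for k in (0, 1)])

def predict(C, X):
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return d.argmin(axis=1), np.abs(d[:, 0] - d[:, 1])  # label, confidence

# Two well-separated classes; only 5 labelled points per class
X0, X1 = rng.normal(-2, 1, (100, 2)), rng.normal(+2, 1, (100, 2))
X_lab = np.vstack([X0[:5], X1[:5]])
y_lab = np.array([0] * 5 + [1] * 5)
X_unl = np.vstack([X0[5:], X1[5:]])

for _ in range(3):                     # a few self-training rounds
    C = centroids(X_lab, y_lab)
    yhat, conf = predict(C, X_unl)
    keep = conf > np.median(conf)      # pseudo-label the confident half
    X_lab = np.vstack([X_lab, X_unl[keep]])
    y_lab = np.concatenate([y_lab, yhat[keep]])
    X_unl = X_unl[~keep]

# Evaluate the final model on fresh held-out data
C = centroids(X_lab, y_lab)
X_test = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(+2, 1, (50, 2))])
y_test = np.array([0] * 50 + [1] * 50)
acc = np.mean(predict(C, X_test)[0] == y_test)
```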

    Discriminative, generative, and imitative learning

    Thesis (Ph.D.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2002. Includes bibliographical references (leaves 201-212). By Tony Jebara.
    I propose a common framework that combines three different paradigms in machine learning: generative, discriminative and imitative learning. A generative probabilistic distribution is a principled way to model many machine learning and machine perception problems. Therein, one provides domain-specific knowledge in terms of structure and parameter priors over the joint space of variables. Bayesian networks and Bayesian statistics provide a rich and flexible language for specifying this knowledge and subsequently refining it with data and observations. The final result is a distribution that is a good generator of novel exemplars. Conversely, discriminative algorithms adjust a possibly non-distributional model to data, optimizing for a specific task such as classification or prediction. This typically leads to superior performance, yet compromises the flexibility of generative modeling. I present Maximum Entropy Discrimination (MED) as a framework that combines discriminative estimation with generative probability densities. Calculations involve distributions over parameters, margins, and priors, and are provably and uniquely solvable for the exponential family. Extensions include regression, feature selection, and transduction. SVMs are also naturally subsumed and can be augmented with, for example, feature selection to obtain substantial improvements. To extend to mixtures of exponential families, I derive a discriminative variant of the Expectation-Maximization (EM) algorithm for latent discriminative learning (or latent MED). While EM and the Jensen inequality lower-bound the log-likelihood, a dual upper bound is made possible via a novel reverse-Jensen inequality. The variational upper bound on the latent log-likelihood has the same form as the EM bounds, is efficiently computable and holds globally. It permits powerful discriminative learning with a wide range of contemporary probabilistic mixture models (mixtures of Gaussians, mixtures of multinomials and hidden Markov models). We provide empirical results on standardized data sets that demonstrate the viability of the hybrid discriminative-generative approaches of MED and reverse-Jensen bounds over state-of-the-art discriminative or generative techniques. Subsequently, imitative learning is presented as another variation on generative modeling which also learns from exemplars drawn from an observed data source. The distinction is that here the generative model is an agent interacting with a much more complex surrounding external world, and it is not efficient to model the aggregate space in a generative setting. I demonstrate that imitative learning can, under appropriate conditions, be adequately addressed as a discriminative prediction task which outperforms the usual generative approach. This discriminative-imitative learning approach is applied with a generative perceptual system to synthesize a real-time agent that learns to engage in social interactive behavior.
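The MED objective described above is commonly stated as a KL-regularized optimization over distributions subject to expected-margin constraints. A sketch of that standard formulation, with notation assumed rather than copied from the thesis:

```latex
\min_{P(\Theta,\gamma)} \;
  \mathrm{KL}\!\left(P(\Theta,\gamma)\,\middle\|\,P_0(\Theta,\gamma)\right)
\quad\text{s.t.}\quad
\int P(\Theta,\gamma)\,\bigl[\, y_t\,\mathcal{L}(X_t;\Theta) - \gamma_t \,\bigr]\,
  d\Theta\,d\gamma \;\ge\; 0,
\qquad t = 1,\dots,T,
```

whose solution takes the exponential-family form $P(\Theta,\gamma) \propto P_0(\Theta,\gamma)\,\exp\!\bigl(\sum_t \lambda_t\,[\,y_t\,\mathcal{L}(X_t;\Theta)-\gamma_t\,]\bigr)$ with non-negative Lagrange multipliers $\lambda_t$; this is the sense in which SVM-like margin constraints and generative densities $\mathcal{L}$ are combined.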

    Kernel-Based Ranking. Methods for Learning and Performance Estimation

    Machine learning provides tools for the automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages these methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have received the majority of attention in the field. In this thesis we focus on another type of learning problem: learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, or how these techniques can be implemented efficiently. The contributions of this thesis are as follows.
    First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, one of the most well-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions for cross-validation when using this approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternatives. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts: Part I provides the background for the research work and summarizes the most central results, while Part II consists of the five original research articles that are the main contribution of this thesis.
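A regularized pairwise least-squares loss of the kind underlying RankRLS admits a closed-form linear solution, since the sum of squared pairwise differences can be written as a quadratic form with the matrix L = nI - 11^T. The sketch below is an illustration on synthetic data, with scaling constants folded into the regularizer; it is not the thesis's implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 100, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)   # noisy relevance scores

# Sum over all pairs ((f_i - f_j) - (y_i - y_j))^2 equals a quadratic
# form in (f - y) with L = n*I - 1*1^T, giving a closed-form solution.
L = n * np.eye(n) - np.ones((n, n))
lam = 1.0
w = np.linalg.solve(X.T @ L @ X + lam * np.eye(d), X.T @ L @ y)

# Fraction of concordant pairs (higher means better ranking)
s = X @ w
pairs = [(i, j) for i in range(n) for j in range(n) if y[i] > y[j]]
concord = np.mean([s[i] > s[j] for i, j in pairs])
```

The kernelized version and the matrix-algebra shortcuts for cross-validation are what make the full method efficient at scale.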

    Towards Scalable Characterization of Noisy, Intermediate-Scale Quantum Information Processors

    In recent years, quantum information processors (QIPs) have grown from one or two qubits to tens of qubits. As a result, characterizing QIPs – measuring how well they work, and how they fail – has become much more challenging. The obstacles to characterizing today’s QIPs will grow even more difficult as QIPs grow from tens of qubits to hundreds, and enter what has been called the “noisy, intermediate-scale quantum” (NISQ) era. This thesis develops methods based on advanced statistics and machine learning algorithms to address the difficulties of “quantum characterization, validation, and verification” (QCVV) of NISQ processors. In the first part of this thesis, I use statistical model selection to develop techniques for choosing between several models of a QIP’s behavior. In the second part, I deploy machine learning algorithms to develop a new QCVV technique and to do experiment design. These investigations help lay a foundation for extending QCVV to characterize the next generation of NISQ processors.
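As a toy illustration of statistical model selection in this spirit, one can compare two candidate models of a simulated device's measurement outcomes using AIC; the "device" below is just a biased coin standing in for a QIP, not one of the thesis's models.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
outcomes = rng.random(n) < 0.7   # simulated device yielding "1" 70% of the time
k = int(outcomes.sum())

def bernoulli_loglik(p):
    return k * np.log(p) + (n - k) * np.log(1 - p)

# Model A: no free parameters (outcomes uniformly random, p fixed at 0.5).
# Model B: one fitted parameter (maximum-likelihood bias k/n).
aic_a = -2 * bernoulli_loglik(0.5) + 2 * 0
aic_b = -2 * bernoulli_loglik(k / n) + 2 * 1
best = "biased" if aic_b < aic_a else "uniform"
```

Real QCVV model selection compares far richer models (e.g. of gate error processes), but the logic of trading fit quality against parameter count is the same.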

    Smart Feature Selection to enable Advanced Virtual Metrology

    The present dissertation advances research in computer science, especially state-of-the-art Machine Learning (ML), in the field of process development in Semiconductor Manufacturing (SM) by introducing a new Feature Selection (FS) algorithm that discovers the most important equipment and context parameters for the highest performance in predicting process results within a newly developed advanced Virtual Metrology (VM) system. In complex high-mixture-low-volume SM, chips, or rather silicon wafers, for numerous products and technologies are manufactured on the same equipment. Process stability and control are key factors for the production of highest-quality semiconductors. Advanced Process Control (APC) monitors manufacturing equipment and intervenes in the equipment control if critical states occur. Besides Run-To-Run (R2R) control and Fault Detection and Classification (FDC), new process control development activities focus on VM, which predicts metrology results based on productive equipment and context data. More precisely, physical equipment parameters combined with logistical information about the manufactured product are used to predict the process result. A reliable and highly accurate VM system is imperative to reduce time- and cost-intensive physical metrology and to increase the yield and stability of the manufacturing processes, while concurrently minimizing economic expenditure and the associated data flow. The four challenges of (1) efficiency of development and deployment of a corporate-wide VM system, (2) scalability of enterprise data storage, data traffic and computational effort, (3) knowledge discovery from the available data for future enhancements and process developments, and (4) highest accuracy, including reliability and reproducibility of the prediction results, have so far not been mastered simultaneously by any other approach.
    Many ML techniques have already been investigated for building prediction models from historical data. The outcomes are only partially satisfactory with respect to the ambitious objective of highest accuracy, which results in tight control limits that tolerate almost no deviation from the intended process result. For optimization of prediction performance, state-of-the-art process engineering requirements lead to three criteria for assessing an ML algorithm for VM: outlier detection; model robustness with respect to equipment degradation over time and to ever-changing manufacturing processes adapted for further development of products and technologies; and, finally, highest prediction accuracy. It has been shown that simple regression methods fail in terms of prediction accuracy, outlier detection and model robustness, while more sophisticated regression methods are almost able to achieve these goals consistently. Owing to their quite similar but still suboptimal prediction performance, as well as limited computational feasibility in the case of numerous input parameters, the choice of a superior ML regression method does not by itself resolve the problem. Considering the entire cycle of Knowledge Discovery in Databases, including Data Mining (DM), another task turns out to be crucial: FS. An optimal selection of the decisive parameters, and hence a reduction of the input space dimension, boosts model performance by omitting redundant and spurious information. Various FS algorithms exist to deal with correlated and noisy features, but none of them on its own can ensure that the ambitious targets for VM are achieved in prevalent high-mixture-low-volume SM. The objective of the present doctoral thesis is the development of a smart FS algorithm that enables the newly developed, advanced VM system to comply with all imperative requirements for improved process stability and control.
    First, a new Evolutionary Repetitive Backward Elimination (ERBE) FS algorithm is implemented, combining the advantages of a Genetic Algorithm (GA) with Leave-One-Out (LOO) Backward Elimination as a wrapper for Support Vector Regression (SVR). Second, a new high-performance VM system is realized in the productive environment of High Density Plasma (HDP) Chemical Vapor Deposition (CVD) at the Infineon frontend manufacturing site in Regensburg. The advanced VM system performs predictions based on three state-of-the-art ML methods (Neural Network (NN), Decision Tree M5' (M5') and SVR) and can be deployed in many other process areas thanks to its generic approach and the adaptive design of the ERBE FS algorithm. The developed ERBE algorithm for smart FS enhances the new advanced VM system by revealing the crucial features for multivariate nonlinear regression. Enabling highly capable VM turns statistical sampling metrology, with typically 10% coverage of process results, into 100% metrological process monitoring and control. Hence, misprocessed wafers can be detected instantly. Subsequent rework or early scrapping of those wafers results in significantly increased stability of subsequent process steps and thus higher yield. An additional remarkable benefit is the reduction of production cycle time due to the possible saving of time-consuming physical metrology, resulting in an increase of production volume output of up to 10% in the case of a fab-wide implementation of the new VM system.

    Natural Language Processing: Emerging Neural Approaches and Applications

    This Special Issue highlights the most recent research in the NLP field and discusses related open issues, with a particular focus on emerging approaches for language learning, understanding, production, and grounding, interactively or autonomously from data, in cognitive and neural systems, as well as on their potential or real applications in different domains.