17 research outputs found

    Newton-based maximum likelihood estimation in nonlinear state space models

    Full text link
    Maximum likelihood (ML) estimation using Newton's method in nonlinear state space models (SSMs) is a challenging problem due to the analytical intractability of the log-likelihood and of its gradient and Hessian. We estimate the gradient and Hessian using Fisher's identity in combination with a smoothing algorithm. We explore two approximations of the log-likelihood and of the solution of the smoothing problem. The first is a linearization approximation, which is computationally cheap but whose accuracy typically varies between models. The second is a sampling approximation, which is asymptotically valid for any SSM but more computationally costly. We demonstrate our approach for ML parameter estimation on simulated data from two different SSMs with encouraging results. Comment: 17 pages, 2 figures. Accepted for the 17th IFAC Symposium on System Identification (SYSID), Beijing, China, October 2015.
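    The role of Fisher's identity here can be made explicit. With $\theta$ the model parameters, $x_{1:T}$ the latent state trajectory, and $y_{1:T}$ the observations (notation assumed for illustration, not taken from the paper), the intractable score equals a smoothing expectation:

```latex
\nabla_\theta \log p_\theta(y_{1:T})
  = \mathbb{E}_{p_\theta(x_{1:T} \mid y_{1:T})}
    \left[ \nabla_\theta \log p_\theta(x_{1:T}, y_{1:T}) \right]
```

    Any smoother that approximates $p_\theta(x_{1:T} \mid y_{1:T})$, whether by linearization or by sampling, therefore yields a corresponding approximation of the gradient; a similar identity provides the Hessian.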

    Does k-anonymous microaggregation affect machine-learned macrotrends?

    Get PDF
    In the era of big data, the availability of massive amounts of information makes privacy protection more necessary than ever. Among a variety of anonymization mechanisms, microaggregation is a common approach to satisfying the popular requirement of k-anonymity in statistical databases. In essence, k-anonymous microaggregation aggregates quasi-identifiers to hide the identity of each data subject within a group of k - 1 other subjects. As with any perturbative mechanism, however, anonymization comes at the cost of some information loss that may hinder the ulterior purpose of the released data, which very often is building machine-learning models for macrotrend analysis. To assess the impact of microaggregation on the utility of the anonymized data, it is necessary to evaluate the resulting accuracy of said models. In this paper, we address the problem of measuring the effect of k-anonymous microaggregation on the empirical utility of microdata. We quantify utility as the accuracy of classification models learned from microaggregated data and evaluated over original test data. Our experiments indicate, with some consistency, that the impact of the de facto microaggregation standard (maximum distance to average vector) on the performance of machine-learning algorithms is often minor to negligible for a wide range of k, for a variety of classification algorithms and data sets. Furthermore, experimental evidence suggests that the traditional measure of distortion in the microdata anonymization community may be inappropriate for evaluating the utility of microaggregated data.
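    The "maximum distance to average vector" (MDAV) standard the abstract evaluates can be sketched roughly as follows. This is a deliberately simplified single-group-per-iteration variant for illustration (real MDAV forms two groups per iteration and treats the final leftover records specially); function and variable names are ours, not the authors':

```python
import numpy as np

def mdav_sketch(X, k):
    """Simplified MDAV-style k-anonymous microaggregation.

    Repeatedly builds a group of k records around the record farthest
    from the centroid of the remaining data, then replaces every
    record by its group mean, so each record is indistinguishable
    within a group of at least k subjects.
    """
    X = np.asarray(X, dtype=float)
    remaining = list(range(len(X)))
    groups = []
    while len(remaining) >= 2 * k:
        centroid = X[remaining].mean(axis=0)
        # record farthest from the centroid of the remaining data
        far = max(remaining, key=lambda i: np.linalg.norm(X[i] - centroid))
        # that record plus its k-1 nearest remaining neighbours
        group = sorted(remaining, key=lambda i: np.linalg.norm(X[i] - X[far]))[:k]
        groups.append(group)
        remaining = [i for i in remaining if i not in group]
    if remaining:                      # leftover records form one last group
        groups.append(remaining)
    X_anon = X.copy()
    for g in groups:                   # replace each record by its group mean
        X_anon[g] = X[g].mean(axis=0)
    return X_anon, groups
```

    Measuring utility then amounts to training a classifier on `X_anon` and evaluating it on the untouched test records, as the paper describes.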

    A New Model-Free Predictive Control Method Using Input and Output Data

    Get PDF
    The purpose of this paper is to present a new predictive control method utilizing online data and stored input/output data of the controlled system. Conventional predictive control methods utilize a mathematical model of the control system to predict an optimal future input to control the system. The model is usually obtained from the measured input/output data by a standard system identification method. The proposed method does not require the mathematical model to predict the optimal future control input that achieves the desired output. This control strategy, called just-in-time, was originally proposed by Inoue and Yamamoto in 2004. In this paper, we propose a simplified version of the original just-in-time predictive control method. © (2014) Trans Tech Publications, Switzerland.
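    The just-in-time idea of predicting directly from stored input/output data, without an identified model, is commonly realized with local (nearest-neighbour) estimation. The sketch below illustrates that general idea only; the details of Inoue and Yamamoto's method and of the paper's simplified version differ, and all names here are ours:

```python
import numpy as np

def jit_predict(database, query, n_neighbors=3):
    """Just-in-time style output prediction (illustrative sketch).

    `database` is a list of (regressor, next_output) pairs collected
    from past input/output data of the plant; `query` is the current
    regressor (recent inputs and outputs).  Instead of identifying a
    global model, we locally average the stored outputs whose
    regressors are closest to the query.
    """
    regs = np.array([r for r, _ in database])
    outs = np.array([y for _, y in database])
    dists = np.linalg.norm(regs - query, axis=1)
    nearest = np.argsort(dists)[:n_neighbors]
    # distance-weighted average of the neighbours' recorded outputs
    w = 1.0 / (dists[nearest] + 1e-9)
    return float(np.dot(w, outs[nearest]) / w.sum())
```

    A controller can then search candidate inputs, predict each one's outcome this way, and apply the input whose predicted output is closest to the reference.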

    Towards a Constrained Clustering Algorithm Selection

    Get PDF
    The success of machine learning approaches in solving real-world problems has motivated a plethora of new algorithms. This, however, raises the issue of algorithm selection, as no single algorithm performs better than all others on every problem. Approaches for predicting which algorithms provide the best results for a given problem thus become useful, especially in the context of building workflows with several algorithms. Domain knowledge (in the form of constraints or preferences) should also be considered and used to guide the process and improve results. In this work, we propose a meta-learning approach that characterizes sets of constraints to decide which constrained clustering algorithm should be employed. We present an empirical study over real datasets using three clustering algorithms (one unsupervised and two semi-supervised), which shows improvements in cluster quality compared to existing semi-supervised methodologies.
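    The kind of background knowledge involved is easy to make concrete: constrained clustering typically uses must-link and cannot-link pairs, and a candidate partition can be scored by how many of them it satisfies. A minimal sketch (our own illustration, not the authors' code):

```python
def constraint_satisfaction(labels, must_link, cannot_link):
    """Fraction of pairwise constraints satisfied by a clustering.

    `labels` maps each point index to a cluster id; `must_link` and
    `cannot_link` are lists of index pairs.  This is a standard way to
    score a semi-supervised clustering against background knowledge.
    """
    ok = sum(labels[a] == labels[b] for a, b in must_link)
    ok += sum(labels[a] != labels[b] for a, b in cannot_link)
    total = len(must_link) + len(cannot_link)
    return ok / total if total else 1.0
```

    A meta-learner of the sort proposed would compute descriptive features of such constraint sets and learn which algorithm tends to do best for constraint sets with similar characteristics.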

    Statistical Classification of Fault Types in DOCSIS 3.0 Signals in Wireless Router Tests

    Get PDF
    The increasing competitiveness of industrial systems makes fault detection ever more important. In this work, we carry out an experiment applying pattern-recognition classification models to data from wireless routers, model TC7337, with the aim of finding the classifier that best fits these data. As a result of this work, an interactive tool for the R software was developed using the Shiny package, to make this classification easier for the user. Keywords: Classification, Pattern recognition, rminer package.
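    The experiment described, comparing several classifiers on one data set and keeping the best, follows a standard pattern. The authors work in R with the rminer package; the sketch below shows the same pattern with scikit-learn and synthetic placeholder data, not the paper's TC7337 signals:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# synthetic stand-in for the router fault-signal data
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}
# cross-validated accuracy, the usual basis for "best fitting" claims
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in candidates.items()}
best = max(scores, key=scores.get)
```

    An interactive front end (Shiny in the paper's case) then simply exposes the winning model's predictions to the user.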

    Learning predictive models from massive, semantically disparate data

    Get PDF
    Machine learning approaches offer some of the most successful techniques for constructing predictive models from data. However, applying such techniques in practice requires overcoming several challenges: infeasibility of centralized access to the data, because the massive size of some data sets often exceeds the memory available to the learner; the distributed nature of data; access restrictions; data fragmentation; semantic disparities between the data sources; and data sources that evolve spatially or temporally (e.g., data streams and genomic data sources to which new data is being submitted continuously). Learning using statistical queries and semantic correspondences that present a unified view of disparate data sources to the learner offers a powerful general framework for addressing some of these challenges. Against this background, this thesis describes (1) approaches to deal with missing values in the statistical query based algorithms for building predictors (Naive Bayes and decision trees), together with techniques to minimize the number of required queries in such a setting; (2) sufficient statistics based algorithms for constructing and updating sequence classifiers; (3) reduction of several aspects of learning from semantically disparate data sources (such as (a) how errors in mappings affect the accuracy of the learned model and (b) how to choose an optimal mapping from among a set of alternative expert-supplied or automatically generated mappings) to the well-studied problems of domain adaptation and learning in the presence of noise; and (4) software for learning predictive models from semantically disparate data.
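    The statistical-query setting mentioned in (1) is easy to make concrete for Naive Bayes: the learner never touches raw records, only counts answered by the (possibly distributed) data sources. A minimal sketch, with class and method names of our own and a simple add-one smoothing choice the thesis does not necessarily use:

```python
from math import log

class SQNaiveBayes:
    """Naive Bayes trained purely from count (statistical) queries.

    `count_y` and `count_xy` stand in for the query interface a
    distributed data source would answer: class counts and
    (feature, value, class) co-occurrence counts.  The raw records
    never reach the learner.
    """
    def fit(self, count_y, count_xy, classes, feature_values):
        self.count_y = count_y            # {class: count}
        self.count_xy = count_xy          # {(feat, value, class): count}
        self.classes = classes
        self.feature_values = feature_values
        return self

    def predict(self, x):
        n = sum(self.count_y.values())
        best, best_lp = None, float("-inf")
        for c in self.classes:
            lp = log(self.count_y[c] / n)           # log prior
            for j, v in enumerate(x):
                num = self.count_xy.get((j, v, c), 0) + 1   # add-one smoothing
                den = self.count_y[c] + len(self.feature_values[j])
                lp += log(num / den)                # log likelihood per feature
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

    Because the model depends on the data only through these counts, the same code works whether the counts come from one table, several distributed fragments, or an incrementally updated stream.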