30,057 research outputs found

    Improving Missing Data Imputation with Deep Generative Models

    Full text link
    Datasets with missing values are very common in industry applications, and they can have a negative impact on machine learning models. Recent studies introduced solutions to the problem of imputing missing values based on deep generative models. Previous experiments with Generative Adversarial Networks and Variational Autoencoders showed promising results in this domain, but it is not clear which method is preferable for which use cases. The goal of this work is twofold: we present a comparison between missing data imputation solutions based on deep generative models, and we propose improvements over those methodologies. We run our experiments on well-known real-life datasets with different characteristics, removing values at random and reconstructing them with several imputation techniques. Our results show that the presence or absence of categorical variables can alter the selection of the best model, and that some models are more stable than others across repeated runs with different random number generator seeds.
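
    As a concrete illustration of the protocol the abstract describes (values removed at random, then reconstructed by a generative model), here is a minimal sketch of autoencoder-style iterative imputation. The TinyAutoencoder stand-in, the zero initial fill, and all hyperparameters are illustrative assumptions, not the paper's actual models:

```python
# Minimal sketch: iterative imputation with an autoencoder-style generative
# model, in the spirit of VAE-based imputation. All names and hyperparameters
# here are illustrative assumptions.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """Stand-in for a deep generative model (e.g., the decoder of a VAE)."""
    def __init__(self, n_features: int, n_hidden: int = 16):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        self.decode = nn.Linear(n_hidden, n_features)

    def forward(self, x):
        return self.decode(self.encode(x))

def impute(model, x, mask, n_iters: int = 10):
    """Fill missing entries (mask == 0) by repeatedly reconstructing them.

    Observed entries are clamped back to their true values after each pass,
    so only the missing cells move toward the model's reconstruction.
    """
    x_hat = torch.where(mask.bool(), x, torch.zeros_like(x))  # initial fill: 0
    for _ in range(n_iters):
        with torch.no_grad():
            recon = model(x_hat)
        x_hat = torch.where(mask.bool(), x, recon)  # keep observed, update missing
    return x_hat

# Toy usage: remove values completely at random, as in the paper's protocol.
torch.manual_seed(0)  # the seed matters: the paper notes stability varies across seeds
x = torch.randn(32, 8)
mask = (torch.rand_like(x) > 0.2).float()  # 1 = observed, 0 = missing
model = TinyAutoencoder(n_features=8)
print(impute(model, x, mask).shape)
```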

    Optimization of fuzzy analogy in software cost estimation using linguistic variables

    Get PDF
    One of the most important objectives of the software engineering community has been to develop useful models that explain the development life cycle and accurately estimate software cost. Although there are innumerable methods to estimate cost, analogy-based approaches are deficient in handling datasets containing categorical variables. Owing to the nature of the software engineering domain, project attributes are often measured in terms of linguistic values such as very low, low, high and very high. The imprecise nature of such values introduces uncertainty and vagueness into their interpretation. However, there is no efficient method that can directly deal with categorical variables and tolerate such imprecision and uncertainty without resorting to classical intervals and numeric-value approaches. In this paper, a new optimization approach based on fuzzy logic, linguistic quantifiers and analogy-based reasoning is proposed to improve the performance of software project effort estimation when projects are described by either numerical or categorical data. The proposed method is validated empirically on the historical NASA dataset. The results were analyzed using the prediction criterion and indicate that the proposed method can produce more explainable results than other machine learning methods. Comment: 14 pages, 8 figures; Journal of Systems and Software, 2011. arXiv admin note: text overlap with arXiv:1112.3877 by another author
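
    A minimal sketch of the analogy step with fuzzified linguistic attributes may help fix ideas. The triangular membership functions, the max-min similarity, and the attribute names below are illustrative assumptions rather than the paper's exact formulation:

```python
# Sketch of analogy-based effort estimation over linguistic attributes.
# Membership functions, similarity measure and data are illustrative only.
import numpy as np

# Triangular membership functions for the linguistic scale on a normalized
# [0, 1] axis ("very low" ... "very high").
LEVELS = {"very low": 0.0, "low": 0.25, "nominal": 0.5, "high": 0.75, "very high": 1.0}
WIDTH = 0.25  # half-width of each triangle

def membership(value: float, level: str) -> float:
    """Degree to which a crisp value belongs to a linguistic level."""
    return max(0.0, 1.0 - abs(value - LEVELS[level]) / WIDTH)

def fuzzy_similarity(a: str, b: str) -> float:
    """Overlap of two linguistic terms: max over the axis of min membership."""
    xs = np.linspace(0.0, 1.0, 101)
    return max(min(membership(x, a), membership(x, b)) for x in xs)

def estimate_effort(target: dict, history: list) -> float:
    """Analogy step: similarity-weighted average of historical efforts."""
    weights, efforts = [], []
    for past in history:
        sims = [fuzzy_similarity(target[k], past["attrs"][k]) for k in target]
        weights.append(np.mean(sims))
        efforts.append(past["effort"])
    weights = np.array(weights)
    return float(np.dot(weights, efforts) / weights.sum())

# Toy historical base (attribute names and effort values are made up).
history = [
    {"attrs": {"complexity": "high", "experience": "low"}, "effort": 120.0},
    {"attrs": {"complexity": "low", "experience": "high"}, "effort": 40.0},
]
print(estimate_effort({"complexity": "high", "experience": "nominal"}, history))
```

    Because similarity is computed on the fuzzy terms themselves, categorical (linguistic) attributes participate in the analogy directly, without first being forced into crisp intervals or numeric codes.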

    Monoidal computer III: A coalgebraic view of computability and complexity

    Full text link
    Monoidal computer is a categorical model of intensional computation, where many different programs correspond to the same input-output behavior. The upshot of yet another model of computation is that a categorical formalism should provide a much-needed high-level language for the theory of computation, flexible enough to allow abstracting away the low-level implementation details when they are irrelevant, or taking them into account when they are genuinely needed. A salient feature of the approach through monoidal categories is the formal graphical language of string diagrams, which supports visual reasoning about programs and computations. In the present paper, we provide a coalgebraic characterization of monoidal computer. It turns out that the availability of interpreters and specializers, which make a monoidal category into a monoidal computer, is equivalent to the existence of a *universal state space*, which carries a weakly final state machine for any pair of input and output types. Being able to program state machines in monoidal computers allows us to represent Turing machines, to capture their execution, to count their steps, and to track, e.g., the memory cells that they use. The coalgebraic view of monoidal computer thus provides a convenient diagrammatic language for studying computability and complexity. Comment: 34 pages, 24 figures; in this version: added the Appendix
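
    To make the coalgebraic reading concrete: a state machine with inputs I and outputs O is a coalgebra c : S -> (O x S)^I, and unfolding it along an input stream is exactly the kind of execution whose steps can be counted. A minimal sketch follows; the parity machine is an arbitrary illustrative example, not taken from the paper:

```python
# A Mealy-style state machine as a coalgebra: state -> input -> (output, state),
# plus a runner that records the step count, echoing the paper's point that
# programming state machines lets one capture execution and count steps.
from typing import Callable, Iterable, Tuple, TypeVar

S = TypeVar("S")  # state space
I = TypeVar("I")  # input type
O = TypeVar("O")  # output type

Coalgebra = Callable[[S, I], Tuple[O, S]]

def run(step: Coalgebra, state: S, inputs: Iterable[I]):
    """Unfold the coalgebra along an input stream, counting steps."""
    outputs, steps = [], 0
    for i in inputs:
        out, state = step(state, i)
        outputs.append(out)
        steps += 1
    return outputs, state, steps

# A tiny machine: the state is the running parity of the bits seen so far.
def parity(state: bool, bit: int) -> Tuple[bool, bool]:
    new_state = state ^ bool(bit)
    return new_state, new_state  # output the current parity, keep it as state

outs, final, steps = run(parity, False, [1, 0, 1, 1])
print(outs, final, steps)  # [True, True, False, True] True 4
```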

    Conditional Random Field Autoencoders for Unsupervised Structured Prediction

    Full text link
    We introduce a framework for unsupervised learning of structured predictors with overlapping, global features. Each input's latent representation is predicted conditional on the observed data using a feature-rich conditional random field (CRF). Then a reconstruction of the input is (re)generated, conditional on the latent structure, using models for which maximum likelihood estimation has a closed form. Our autoencoder formulation enables efficient learning without making unrealistic independence assumptions or restricting the kinds of features that can be used. We illustrate insightful connections to traditional autoencoders, posterior regularization and multi-view learning. We show competitive results with instantiations of the model on two canonical NLP tasks, part-of-speech induction and bitext word alignment, and show that training our model can be substantially more efficient than comparable feature-rich baselines.
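
    A toy sketch of the two-stage idea: a softmax (zeroth-order "CRF") encoder scores latent labels with features, and a categorical reconstruction model gets the closed-form normalized-count M-step the abstract alludes to. Sizes, feature functions and the simplified encoder update are illustrative assumptions, not the paper's full structured model:

```python
# CRF-autoencoder sketch for a single token: feature-rich encoder p(y | x),
# categorical reconstruction p(x_hat | y) with a closed-form M-step.
import numpy as np

rng = np.random.default_rng(0)
V, K, D = 6, 3, 5            # vocab size, latent labels, feature dimension
X = rng.integers(0, V, 200)  # observed tokens (toy data)
F = rng.normal(size=(V, D))  # feature vector for each token type (assumed)

W = np.zeros((K, D))          # encoder weights: p(y | x) propto exp(W f(x))
R = np.full((K, V), 1.0 / V)  # reconstruction table: p(x_hat = v | y)

def encoder_posterior(x):
    scores = W @ F[x]         # feature-rich scoring of each latent label
    p = np.exp(scores - scores.max())
    return p / p.sum()

for _ in range(20):           # EM-flavored alternation
    # E-step: posterior over latent labels, reweighted by reconstruction.
    gamma = np.zeros((len(X), K))
    for n, x in enumerate(X):
        q = encoder_posterior(x) * R[:, x]  # encode, then reconstruct x itself
        gamma[n] = q / q.sum()
    # M-step for the reconstruction model is closed-form: normalized counts.
    R = np.full((K, V), 1e-6)
    for n, x in enumerate(X):
        R[:, x] += gamma[n]
    R /= R.sum(axis=1, keepdims=True)
    # Gradient step for the encoder (simplified; the real model uses the
    # structured CRF gradient here).
    grad = np.zeros_like(W)
    for n, x in enumerate(X):
        p = encoder_posterior(x)
        grad += np.outer(gamma[n] - p, F[x])
    W += 0.1 * grad / len(X)

print(R.round(2))  # learned reconstruction distribution per latent label
```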