30,057 research outputs found
Improving Missing Data Imputation with Deep Generative Models
Datasets with missing values are very common on industry applications, and
they can have a negative impact on machine learning models. Recent studies
introduced solutions to the problem of imputing missing values based on deep
generative models. Previous experiments with Generative Adversarial Networks
and Variational Autoencoders showed interesting results in this domain, but it
is not clear which method is preferable for different use cases. The goal of
this work is twofold: we present a comparison between missing data imputation
solutions based on deep generative models, and we propose improvements over
those methodologies. We run our experiments using known real life datasets with
different characteristics, removing values at random and reconstructing them
with several imputation techniques. Our results show that the presence or
absence of categorical variables can alter the selection of the best model, and
that some models are more stable than others after similar runs with different
random number generator seeds
Optimization of fuzzy analogy in software cost estimation using linguistic variables
One of the most important objectives of software engineering community has
been the increase of useful models that beneficially explain the development of
life cycle and precisely calculate the effort of software cost estimation. In
analogy concept, there is deficiency in handling the datasets containing
categorical variables though there are innumerable methods to estimate the
cost. Due to the nature of software engineering domain, generally project
attributes are often measured in terms of linguistic values such as very low,
low, high and very high. The imprecise nature of such value represents the
uncertainty and vagueness in their elucidation. However, there is no efficient
method that can directly deal with the categorical variables and tolerate such
imprecision and uncertainty without taking the classical intervals and numeric
value approaches. In this paper, a new approach for optimization based on fuzzy
logic, linguistic quantifiers and analogy based reasoning is proposed to
improve the performance of the effort in software project when they are
described in either numerical or categorical data. The performance of this
proposed method exemplifies a pragmatic validation based on the historical NASA
dataset. The results were analyzed using the prediction criterion and indicates
that the proposed method can produce more explainable results than other
machine learning methods.Comment: 14 pages, 8 figures; Journal of Systems and Software, 2011. arXiv
admin note: text overlap with arXiv:1112.3877 by other author
Monoidal computer III: A coalgebraic view of computability and complexity
Monoidal computer is a categorical model of intensional computation, where
many different programs correspond to the same input-output behavior. The
upshot of yet another model of computation is that a categorical formalism
should provide a much needed high level language for theory of computation,
flexible enough to allow abstracting away the low level implementation details
when they are irrelevant, or taking them into account when they are genuinely
needed. A salient feature of the approach through monoidal categories is the
formal graphical language of string diagrams, which supports visual reasoning
about programs and computations.
In the present paper, we provide a coalgebraic characterization of monoidal
computer. It turns out that the availability of interpreters and specializers,
that make a monoidal category into a monoidal computer, is equivalent with the
existence of a *universal state space*, that carries a weakly final state
machine for any pair of input and output types. Being able to program state
machines in monoidal computers allows us to represent Turing machines, to
capture their execution, count their steps, as well as, e.g., the memory cells
that they use. The coalgebraic view of monoidal computer thus provides a
convenient diagrammatic language for studying computability and complexity.Comment: 34 pages, 24 figures; in this version: added the Appendi
Conditional Random Field Autoencoders for Unsupervised Structured Prediction
We introduce a framework for unsupervised learning of structured predictors
with overlapping, global features. Each input's latent representation is
predicted conditional on the observable data using a feature-rich conditional
random field. Then a reconstruction of the input is (re)generated, conditional
on the latent structure, using models for which maximum likelihood estimation
has a closed-form. Our autoencoder formulation enables efficient learning
without making unrealistic independence assumptions or restricting the kinds of
features that can be used. We illustrate insightful connections to traditional
autoencoders, posterior regularization and multi-view learning. We show
competitive results with instantiations of the model for two canonical NLP
tasks: part-of-speech induction and bitext word alignment, and show that
training our model can be substantially more efficient than comparable
feature-rich baselines
- …