A Structured Approach to Predicting Image Enhancement Parameters
Social networking on mobile devices has become commonplace in everyday life, and advances in mobile imaging have made capturing photos trivial. People therefore take many photos every day and want them to be visually attractive. This has given rise to automated, one-touch enhancement tools. However, the inability of those tools to provide personalized and content-adaptive enhancement has paved the way for machine-learned methods. Typical existing machine-learned methods predict the enhancement parameters for a new image heuristically (e.g. by kNN search), relating the image to a set of similar training images. These heuristic methods require constant interaction with the training images, which makes the parameter prediction sub-optimal and computationally expensive at test time. This paper presents a novel approach that predicts the enhancement parameters for a new image using only its features, without consulting any training images. We propose to model the interaction between an image's features and its corresponding enhancement parameters using matrix factorization (MF) principles, and we propose a way to integrate the image features into the MF formulation. We show that our approach outperforms heuristic approaches, as well as recent approaches in MF and structured prediction, on both synthetic and real-world image enhancement data.
Comment: WACV 201
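The abstract does not spell out the MF formulation, but the core idea, predicting parameters from features alone with no test-time search over training images, can be sketched as a learned feature-to-parameter map. The snippet below is a minimal, hypothetical stand-in: a single enhancement parameter (brightness gain) modelled as a linear function of a single feature (mean luminance), fit on invented data.

```python
# Hypothetical sketch: predict an enhancement parameter directly from an
# image feature, with no test-time search over training images. A single
# parameter (brightness gain) is modelled as a linear function of a single
# feature (mean luminance); the paper's MF formulation is richer.
def fit_linear(features, params):
    n = len(features)
    mx, my = sum(features) / n, sum(params) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(features, params))
    var = sum((x - mx) ** 2 for x in features)
    a = cov / var
    return a, my - a * mx         # slope, intercept

def predict(model, feature):
    a, b = model
    return a * feature + b

# invented data: darker images were given larger brightness gains
features = [0.2, 0.4, 0.6, 0.8]   # mean luminance of training images
params   = [1.6, 1.2, 0.8, 0.4]   # gains chosen for them
model = fit_linear(features, params)
print(round(predict(model, 0.5), 2))  # 1.0
```

At test time this is a single function evaluation, in contrast to a kNN search over the whole training set.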
Restricted Boltzmann Machines for Robust and Fast Latent Truth Discovery
We address the problem of latent truth discovery, LTD for short, where the
goal is to discover the underlying true values of entity attributes in the
presence of noisy, conflicting or incomplete information. Despite the multitude of algorithms addressing the LTD problem in the literature, little is known about their overall performance with respect to effectiveness
(in terms of truth discovery capabilities), efficiency and robustness. A
practical LTD approach should satisfy all these characteristics so that it can
be applied to heterogeneous datasets of varying quality and degrees of
cleanliness.
We propose a novel algorithm for LTD that satisfies the above requirements.
The proposed model is based on Restricted Boltzmann Machines, thus coined
LTD-RBM. In extensive experiments on various heterogeneous and publicly
available datasets, LTD-RBM is superior to state-of-the-art LTD techniques in
terms of an overall consideration of effectiveness, efficiency and robustness.
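LTD-RBM itself is not reproduced here; as a sketch of the LTD setting, the snippet below implements a classic iterative weighted-vote baseline that alternates between estimating truths and source reliabilities. All sources, entities and claimed values are invented.

```python
# Sketch of the LTD task via a classic iterative baseline (not LTD-RBM):
# alternate between a reliability-weighted vote for each entity's truth
# and re-estimating each source's reliability. All data invented.
def truth_discovery(claims, iters=10):
    """claims: {entity: {source: value}} -> {entity: value}."""
    sources = {s for votes in claims.values() for s in votes}
    weight = {s: 1.0 for s in sources}       # initial reliabilities
    truths = {}
    for _ in range(iters):
        for entity, votes in claims.items():
            score = {}
            for s, v in votes.items():
                score[v] = score.get(v, 0.0) + weight[s]
            truths[entity] = max(score, key=score.get)
        for s in sources:                    # reliability = agreement rate
            hits = total = 0
            for entity, votes in claims.items():
                if s in votes:
                    total += 1
                    hits += votes[s] == truths[entity]
            weight[s] = hits / total if total else 0.5
    return truths

claims = {
    "city_of:Acme": {"A": "Boston", "B": "Boston", "C": "Austin"},
    "ceo_of:Acme":  {"A": "Kim",    "B": "Kim",    "C": "Lee"},
    "hq_of:Beta":   {"A": "Paris",  "C": "Paris"},
}
print(truth_discovery(claims))
```

Sources A and B agree with the majority and end up with high reliability, so their claims dominate the weighted vote.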
Bias in OLAP Queries: Detection, Explanation, and Removal
Online analytical processing (OLAP) is an essential element of decision-support systems. OLAP tools provide the insights and understanding needed for improved decision making. However, the answers to OLAP queries can be biased and lead to perplexing and incorrect insights. In this paper, we propose HypDB, a system to detect, explain, and resolve bias in decision-support queries. We give a simple definition of a \emph{biased query} and detect bias by performing a set of independence tests on the data. We propose a novel technique that generates explanations for the bias, assisting the analyst in understanding what is going on. Additionally, we develop an automated method for rewriting a biased query into an unbiased one that reflects what the analyst intended to examine. In a thorough evaluation on several real datasets, we demonstrate both the quality and the performance of our techniques, including the completely automatic discovery of the revolutionary insights from a famous 1973 discrimination case.
Comment: This paper is an extended version of a paper presented at SIGMOD 201
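The kind of bias HypDB targets can be illustrated with a Simpson's-paradox style example (all counts below are invented): an overall group-by comparison suggests one conclusion, while stratifying on a confounding attribute, as a rewritten query would, reverses it.

```python
# Simpson's-paradox sketch (invented counts): (admitted, applied) per
# (group, dept). The overall rates and the per-dept rates disagree.
counts = {
    ("M", "eng"): (80, 100), ("M", "art"): (10, 50),
    ("F", "eng"): (45, 50),  ("F", "art"): (30, 100),
}

def rate(pairs):
    admitted = sum(p[0] for p in pairs)
    applied = sum(p[1] for p in pairs)
    return admitted / applied

# "biased query": overall admission rate per group
overall = {g: rate([v for (gg, _), v in counts.items() if gg == g])
           for g in "MF"}
# "rewritten query": rate per (group, dept), conditioning on the confounder
per_dept = {k: v[0] / v[1] for k, v in counts.items()}
print(overall)   # {'M': 0.6, 'F': 0.5} -- F looks worse overall ...
print(per_dept)  # ... yet F's rate is higher within every department
```

The confounder here is which department each group tends to apply to, exactly the kind of dependency an independence test would flag.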
FairMod - Making Predictive Models Discrimination Aware
Predictive models such as decision trees and neural networks may produce discriminatory predictions. This paper proposes a method to post-process the predictions of a predictive model so that the processed predictions are non-discriminatory. The method considers multiple protected variables together, which makes the problem more challenging than handling a single protected variable. It uses a well-cited discrimination metric and adapts it to allow the specification of explanatory variables, such as position, profession, and education, that describe the context of an application. It models the post-processing of predictions as a nonlinear optimization problem, finding the best adjustments to the predictions such that the discrimination constraints on all protected variables are met simultaneously. The proposed method is independent of the classification method and can handle cases that existing methods cannot: satisfying multiple protected attributes at the same time, allowing multiple explanatory attributes, and being independent of the classification model type. An evaluation on four real-world datasets shows that the proposed method is as effective as existing methods, in addition to this extra power.
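The paper's nonlinear program is not reproduced here; a far simpler post-processing sketch in the same spirit chooses a per-group decision threshold so that positive-prediction rates match across protected groups. Scores and group labels below are invented.

```python
# Sketch (not the paper's nonlinear program): per-group thresholds chosen
# so each protected group gets the same positive-prediction rate.
def equalize_rates(scores_by_group, target_rate):
    thresholds = {}
    for g, scores in scores_by_group.items():
        ranked = sorted(scores, reverse=True)
        k = round(target_rate * len(ranked))   # positives to allow
        thresholds[g] = ranked[k - 1] if k else float("inf")
    return thresholds

scores = {"A": [0.9, 0.8, 0.4, 0.3], "B": [0.6, 0.5, 0.2, 0.1]}
th = equalize_rates(scores, 0.5)
print(th)  # {'A': 0.8, 'B': 0.5}: both groups now pass 2 of 4
```

Like the paper's method, this operates purely on predictions and is therefore independent of the underlying classifier; unlike the paper, it handles only one protected variable and no explanatory attributes.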
Bridging observational studies and randomized experiments by embedding the former in the latter
The health effects of environmental exposures have been studied for decades, typically using standard regression models to assess exposure-outcome associations found in observational, non-experimental data. We propose and illustrate a different approach to examining causal effects of environmental exposures on health outcomes from observational data. Our strategy attempts to structure the observational data to approximate data from a hypothetical, but realistic, randomized experiment. This approach, based on insights from classical experimental design, involves four stages and relies on modern computing to implement the effort in two of them. More specifically, our strategy involves: 1) a conceptual stage that precisely formulates the causal question in terms of a hypothetical randomized experiment in which the exposure is assigned to units; 2) a design stage that attempts to reconstruct (or approximate) a randomized experiment before any outcome data are observed; 3) a statistical analysis comparing the outcomes of interest in the exposed and non-exposed units of the hypothetical randomized experiment; and 4) a summary stage providing conclusions about the statistical evidence for the sizes of possible causal effects of the exposure on the outcomes. We illustrate our approach with an example examining the effect of parental smoking on children's lung function, using data collected from families living in East Boston in the 1970s. To complement traditional, purely model-based approaches, our strategy, which includes outcome-free matched sampling, provides workable tools to quantify possible detrimental exposure effects on human health outcomes, especially because it also includes transparent diagnostics to assess the assumptions of the four-stage statistical approach being applied.
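The design stage's outcome-free matched sampling can be sketched as greedy 1:1 nearest-neighbour matching of exposed units to controls on a covariate, performed before any outcome data are examined. Units and covariate values below are invented.

```python
# Design-stage sketch: greedy 1:1 nearest matching on a covariate,
# done without looking at any outcomes (all values invented).
def match(exposed, controls):
    pairs, free = [], dict(controls)
    for unit, x in sorted(exposed.items()):
        best = min(free, key=lambda c: abs(free[c] - x))
        pairs.append((unit, best))
        del free[best]            # each control is used at most once
    return pairs

exposed  = {"e1": 30, "e2": 45}   # covariate, e.g. parent's age
controls = {"c1": 29, "c2": 31, "c3": 44}
print(match(exposed, controls))   # [('e1', 'c1'), ('e2', 'c3')]
```

Because only covariates enter the matching, the analyst cannot (even inadvertently) tune the design toward a desired outcome comparison.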
Multiplicative Coevolution Regression Models for Longitudinal Networks and Nodal Attributes
We introduce a simple and extendable coevolution model for the analysis of
longitudinal network and nodal attribute data. The model features parameters
that describe three phenomena: homophily, contagion and autocorrelation of the
network and nodal attribute process. Homophily here describes how changes to
the network may be associated with between-node similarities in terms of their
nodal attributes. Contagion refers to how node-level attributes may change
depending on the network. The model we present is based upon a pair of
intertwined autoregressive processes. We obtain least-squares parameter
estimates for continuous-valued fully-observed network and attribute data. We
also provide methods for Bayesian inference in several other cases, including
ordinal network and attribute data, and models involving latent nodal
attributes. These model extensions are applied to an analysis of international
relations data and to data from a study of teen delinquency and friendship
networks.
Comment: 20 page
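The least-squares estimation for the attribute half of the model can be sketched as a two-regressor autoregression: the next attribute value depends on the node's own value (autocorrelation) and on its neighbours' mean (contagion). The data below are invented and noiseless, so the fit recovers the coefficients exactly.

```python
# Sketch of the attribute half of the coevolution model: the next value
# of a node's attribute depends on its own value (autocorrelation, a) and
# on its neighbours' mean (contagion, c). Fit by 2x2 normal equations.
def fit_ar(xs, nbars, ys):
    sxx = sxn = snn = sxy = sny = 0.0
    for x, n, y in zip(xs, nbars, ys):
        sxx += x * x; sxn += x * n; snn += n * n
        sxy += x * y; sny += n * y
    det = sxx * snn - sxn * sxn
    a = (snn * sxy - sxn * sny) / det
    c = (sxx * sny - sxn * sxy) / det
    return a, c

xs  = [1.0, 2.0, 0.5, 1.5]        # node's attribute at time t (invented)
nbs = [0.0, 1.0, 2.0, 0.5]        # neighbours' mean attribute at time t
ys  = [0.5 * x + 0.3 * n for x, n in zip(xs, nbs)]   # attribute at t+1
a, c = fit_ar(xs, nbs, ys)
print(round(a, 3), round(c, 3))   # 0.5 0.3 (noiseless, so exact)
```

The paper's full model intertwines this with a matching autoregression for the network; this sketch fits only the nodal-attribute equation.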
Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models
Deep generative neural networks have proven effective at both conditional and
unconditional modeling of complex data distributions. Conditional generation
enables interactive control, but creating new controls often requires expensive
retraining. In this paper, we develop a method to condition generation without
retraining the model. By post-hoc learning latent constraints, value functions
that identify regions in latent space that generate outputs with desired
attributes, we can conditionally sample from these regions with gradient-based
optimization or amortized actor functions. Combining attribute constraints with
a universal "realism" constraint, which enforces similarity to the data
distribution, we generate realistic conditional images from an unconditional
variational autoencoder. Further, using gradient-based optimization, we
demonstrate identity-preserving transformations that make the minimal
adjustment in latent space to modify the attributes of an image. Finally, with
discrete sequences of musical notes, we demonstrate zero-shot conditional
generation, learning latent constraints in the absence of labeled data or a
differentiable reward function. Code with dedicated cloud instance has been
made publicly available (https://goo.gl/STGMGx).
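The identity-preserving edit can be sketched as gradient descent in latent space on an attribute loss plus a proximity penalty to the starting point. The attribute function below is a hand-written toy; the paper instead learns such value functions post hoc.

```python
# Sketch of identity-preserving latent editing: gradient descent on an
# attribute loss plus a penalty for moving away from the start z0.
# attr() is a hand-written toy; the paper learns such value functions.
def attr(z):
    return z[0] + 0.5 * z[1]

GRAD_ATTR = [1.0, 0.5]            # gradient of attr (constant here)

def edit(z0, target, lam=0.1, lr=0.2, steps=200):
    z = list(z0)
    for _ in range(steps):
        err = attr(z) - target
        for i in range(len(z)):
            g = 2 * err * GRAD_ATTR[i] + 2 * lam * (z[i] - z0[i])
            z[i] -= lr * g
    return z

z = edit([0.0, 0.0], target=1.0)
print(round(attr(z), 2))  # ~0.93: near the target, shrunk by the penalty
```

The proximity term lam plays the role of the "minimal adjustment": it trades off hitting the attribute target against staying close to the original latent code.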
Semi-supervised Embedding Learning for High-dimensional Bayesian Optimization
Bayesian optimization is a broadly applied methodology for optimizing expensive black-box functions. Despite its success, it still faces the challenge of high-dimensional search spaces. To alleviate this problem, we propose a novel Bayesian optimization framework (termed SILBO), which iteratively finds a low-dimensional space in which to perform Bayesian optimization through semi-supervised dimension reduction. SILBO incorporates both labeled points and unlabeled points acquired from the acquisition function to guide the learning of the embedding space. To accelerate the learning procedure, we present a randomized method for generating the projection matrix. Furthermore, to map from the low-dimensional space back to the high-dimensional original space, we propose two mapping strategies, chosen according to the evaluation overhead of the objective function. Experimental results on both synthetic functions and hyperparameter optimization tasks demonstrate that SILBO outperforms existing state-of-the-art high-dimensional Bayesian optimization methods.
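The randomized-projection idea can be sketched as a fixed random embedding: search a low-dimensional space and lift candidates to the original space before evaluating the black box. Here plain random search stands in for the Bayesian optimization loop, SILBO's semi-supervised learning is omitted, and the dimensions and objective are invented.

```python
# Sketch of the random-embedding idea (the semi-supervised learning of
# SILBO is omitted): search a d-dim space, lift points to D dims with a
# fixed random matrix, and evaluate the black box there. Random search
# stands in for the Bayesian optimization loop; everything is invented.
import random

random.seed(0)
D, d = 20, 2
A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(D)]

def lift(z):                      # low-dim candidate -> high-dim point
    return [sum(A[i][j] * z[j] for j in range(d)) for i in range(D)]

def objective(x):                 # toy black box with low effective dim
    return (x[0] - 1) ** 2 + (x[3] + 0.5) ** 2

candidates = [[0.0, 0.0]] + [[random.uniform(-1, 1) for _ in range(d)]
                             for _ in range(500)]
best = min(candidates, key=lambda z: objective(lift(z)))
print(objective(lift(best)) <= objective(lift([0.0, 0.0])))  # True
```

The search happens entirely in the 2-dimensional space, yet every evaluation is made in the 20-dimensional original space, which is the point of embedding-based high-dimensional BO.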
Packet Score based network security and Traffic Optimization
One of the critical threats to internet security is Distributed Denial of Service (DDoS). By introducing automated online attack classification and attack-packet discarding, this paper helps to resolve this network security issue to a certain level. Incoming packets are assigned scores, on a per-packet basis, based on the priority associated with their attributes and on a comparison with the probability distribution of arriving packets.
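A minimal sketch of per-packet scoring, assuming a profile of attribute-value probabilities under legitimate traffic (all attributes and probabilities below are invented): packets whose attribute values are unlikely under the profile receive low scores and are discarded.

```python
# Sketch of per-packet scoring (all probabilities invented): score each
# packet by how likely its attribute values are under a legitimate-traffic
# profile; packets scoring below a threshold are discarded.
profile = {
    "proto": {"tcp": 0.8, "udp": 0.15, "icmp": 0.05},
    "ttl":   {"high": 0.7, "low": 0.3},
}

def score(packet):
    s = 1.0
    for attr, value in packet.items():
        s *= profile[attr].get(value, 0.01)  # unseen values look suspicious
    return s

def filter_packets(packets, threshold=0.05):
    return [p for p in packets if score(p) >= threshold]

packets = [{"proto": "tcp", "ttl": "high"},   # legitimate-looking
           {"proto": "icmp", "ttl": "low"}]   # attack-looking
print(len(filter_packets(packets)))  # 1: the second packet is dropped
```

In a real deployment the profile would be estimated online from the observed distribution of arriving packets, so that a sudden flood of unusual packets scores poorly.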
Next Stop "NoOps": Enabling Cross-System Diagnostics Through Graph-based Composition of Logs and Metrics
Performing diagnostics in IT systems is an increasingly complicated task that even the most skillful operators cannot complete in satisfactory time. Systems and their architectures change very rapidly in response to business and user demand. Many organizations see value in the NoOps ("No Operations") maintenance and management model. One of the implementations of
this model is a system that is maintained automatically without any human
intervention. The path to NoOps involves not only precise and fast diagnostics
but also reusing as much knowledge as possible after the system is reconfigured
or changed. The biggest challenge is to leverage knowledge about one IT system and reuse it to diagnose another, different system. We propose a
framework of weighted graphs which can transfer knowledge, and perform
high-quality diagnostics of IT systems. We encode all possible data in a graph
representation of a system state and automatically calculate weights of these
graphs. Then, thanks to the evaluation of similarity between graphs, we
transfer knowledge about failures from one system to another and use it for
diagnostics. We successfully evaluate the proposed approach on Spark, Hadoop,
Kafka and Cassandra systems.
Comment: Peer-reviewed, accepted as a regular paper to IEEE Cluster 2018. To be published through proceedings in September 201
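The graph-similarity transfer can be sketched with weighted edge sets: a new system state is diagnosed with the label of the most similar known failure graph under a weighted-Jaccard similarity. Components, weights and failure labels below are invented.

```python
# Sketch of graph-based transfer (components, weights, labels invented):
# system states are weighted edge sets; diagnose a new state by the label
# of the most similar known failure graph (weighted-Jaccard similarity).
def similarity(g1, g2):
    edges = set(g1) | set(g2)
    num = sum(min(g1.get(e, 0.0), g2.get(e, 0.0)) for e in edges)
    den = sum(max(g1.get(e, 0.0), g2.get(e, 0.0)) for e in edges)
    return num / den if den else 0.0

known = {  # failure graphs learned on one system
    "disk-failure":    {("app", "disk"): 0.9, ("app", "net"): 0.1},
    "network-failure": {("app", "net"): 0.8, ("app", "disk"): 0.1},
}

def diagnose(state):
    return max(known, key=lambda label: similarity(known[label], state))

# a state observed on a *different* system, sharing only edge structure
print(diagnose({("app", "net"): 0.7}))  # network-failure
```

Because the similarity depends only on edge identities and weights, knowledge encoded for one system can be applied to another system whose graph shares structure, which is the transfer idea in the abstract.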