A General Framework for Fair Regression
Fairness, through its many forms and definitions, has become an important
issue facing the machine learning community. In this work, we consider how to
incorporate group fairness constraints in kernel regression methods, applicable
to Gaussian processes, support vector machines, neural network regression and
decision tree regression. Further, we focus on examining the effect of
incorporating these constraints in decision tree regression, with direct
applications to random forests and boosted trees amongst other widespread
popular inference techniques. We show that the order of complexity of memory
and computation is preserved for such models and tightly bound the expected
perturbations to the model in terms of the number of leaves of the trees.
Importantly, the approach works on trained models and hence can be easily
applied to models in current use, and group labels are only required on training
data.
Comment: 8 pages, 4 figures, 2 pages reference
On the Decreasing Power of Kernel and Distance based Nonparametric Hypothesis Tests in High Dimensions
This paper is about two related decision theoretic problems, nonparametric
two-sample testing and independence testing. There is a belief that two
recently proposed solutions, based on kernels and distances between pairs of
points, behave well in high-dimensional settings. We identify different sources
of misconception that give rise to the above belief. Specifically, we
differentiate the hardness of estimation of test statistics from the hardness
of testing whether these statistics are zero or not, and explicitly discuss a
notion of "fair" alternative hypotheses for these problems as dimension
increases. We then demonstrate that the power of these tests actually drops
polynomially with increasing dimension against fair alternatives. We end with
some theoretical insights and shed light on the \textit{median heuristic} for
kernel bandwidth selection. Our work advances the current understanding of the
power of modern nonparametric hypothesis tests in high dimensions.
Comment: 19 pages, 9 figures, published in AAAI-15: The 29th AAAI Conference
on Artificial Intelligence (with author order reversed from ArXiv
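The median heuristic mentioned in the abstract above is commonly described as setting the Gaussian kernel bandwidth to the median pairwise distance between sample points. The following is a minimal sketch of that common formulation, not the paper's code; the function names `median_heuristic_bandwidth` and `gaussian_kernel` are hypothetical, and some variants use the median of squared distances instead.

```python
import itertools
import math
import statistics


def median_heuristic_bandwidth(points):
    """Return the median Euclidean distance over all unordered pairs of points.

    Assumption: this is the plain (unsquared) variant of the heuristic.
    """
    dists = [math.dist(p, q) for p, q in itertools.combinations(points, 2)]
    return statistics.median(dists)


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * bandwidth ** 2))


if __name__ == "__main__":
    sample = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    h = median_heuristic_bandwidth(sample)
    print("bandwidth:", h)
    print("k(x0, x1):", gaussian_kernel(sample[0], sample[1], h))
```

The pairwise computation is quadratic in the number of points, so in practice the median is often estimated on a subsample.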
Owl: Congestion Control with Partially Invisible Networks via Reinforcement Learning
Years of research on transport protocols have not solved the tussle between in-network and end-to-end congestion control. This debate is due to the variance of conditions and assumptions in different network scenarios, e.g., cellular versus data center networks. Recently, the community has proposed a few transport protocols driven by machine learning, but these are limited to end-to-end approaches.
In this paper, we present Owl, a transport protocol based on reinforcement learning, whose goal is to select the proper congestion window by learning from end-to-end features and network signals, when available.
We show that our solution converges to a fair resource allocation after the learning overhead.
Our kernel implementation, deployed over emulated and large scale virtual network testbeds, outperforms all benchmark solutions based on end-to-end or in-network congestion control.
Comparison of System Call Representations for Intrusion Detection
Over the years, artificial neural networks have been applied successfully in
many areas including IT security. Yet, neural networks can only process
continuous input data. This is particularly challenging for security-related
non-continuous data like system calls. This work focuses on four different
options to preprocess sequences of system calls so that they can be processed
by neural networks. These input options are based on one-hot encoding and
learning word2vec or GloVe representations of system calls. As an additional
option, we analyze if the mapping of system calls to their respective kernel
modules is an adequate generalization step for (a) replacing system calls or
(b) enhancing system call data with additional information regarding their
context. However, when performing such preprocessing steps it is important to
ensure that no relevant information is lost during the process. The overall
objective of system call based intrusion detection is to categorize sequences
of system calls as benign or malicious behavior. Therefore, this scenario is
used to evaluate the different input options as a classification task. The
results show that each of the four different methods is a valid option when
preprocessing input data, but the use of kernel modules only is not recommended
because too much information is being lost during the mapping process.
Comment: 12 pages, 1 figure, submitted to CISIS 201
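As an illustration of the simplest of the input options described above, one-hot encoding turns each system call in a sequence into a vector with a single 1 at that call's vocabulary index. This is a minimal sketch under assumed inputs, not the authors' pipeline; the helper names and the example traces are hypothetical.

```python
def build_vocabulary(traces):
    """Map every distinct system call name to an integer index (sorted for determinism)."""
    names = sorted({call for trace in traces for call in trace})
    return {name: index for index, name in enumerate(names)}


def one_hot_encode(trace, vocab):
    """Encode a sequence of system call names as a list of one-hot vectors."""
    vectors = []
    for call in trace:
        vec = [0.0] * len(vocab)
        vec[vocab[call]] = 1.0  # raises KeyError for calls not in the vocabulary
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    # Hypothetical traces; real data would come from system call logs.
    traces = [["open", "read", "close"], ["open", "write"]]
    vocab = build_vocabulary(traces)
    for row in one_hot_encode(["open", "read"], vocab):
        print(row)
```

The vector length grows with the number of distinct system calls, which is one reason learned dense representations are considered as alternatives.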
A Confidence-Based Approach for Balancing Fairness and Accuracy
We study three classical machine learning algorithms in the context of
algorithmic fairness: adaptive boosting, support vector machines, and logistic
regression. Our goal is to maintain the high accuracy of these learning
algorithms while reducing the degree to which they discriminate against
individuals because of their membership in a protected group.
Our first contribution is a method for achieving fairness by shifting the
decision boundary for the protected group. The method is based on the theory of
margins for boosting. Our method performs comparably to or outperforms previous
algorithms in the fairness literature in terms of accuracy and low
discrimination, while simultaneously allowing for a fast and transparent
quantification of the trade-off between bias and error.
Our second contribution addresses the shortcomings of the bias-error
trade-off studied in most of the algorithmic fairness literature. We
demonstrate that even hopelessly naive modifications of a biased algorithm,
which cannot be reasonably said to be fair, can still achieve low bias and high
accuracy. To help distinguish between these naive algorithms and more
sensible algorithms, we propose a new measure of fairness, called resilience to
random bias (RRB). We demonstrate that RRB distinguishes well between our naive
and sensible fairness algorithms. RRB together with bias and accuracy provides
a more complete picture of the fairness of an algorithm.
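The idea of shifting the decision boundary for the protected group can be illustrated with a small sketch. It assumes that each individual has a signed confidence score (positive means a positive label) and that bias is measured as the gap in positive-label rates between groups; both are assumptions made for illustration, and the function names are hypothetical rather than taken from the paper.

```python
def shifted_predictions(scores, is_protected, shift):
    """Label +1 when the score exceeds the group's threshold, else -1.

    The threshold is 0 for the unprotected group and -shift for the
    protected group, so a larger shift admits more protected individuals.
    """
    labels = []
    for score, protected in zip(scores, is_protected):
        threshold = -shift if protected else 0.0
        labels.append(1 if score > threshold else -1)
    return labels


def positive_rate_gap(labels, is_protected):
    """Positive-label rate of the unprotected group minus that of the protected group."""
    prot = [l for l, p in zip(labels, is_protected) if p]
    rest = [l for l, p in zip(labels, is_protected) if not p]
    rate = lambda group: sum(1 for l in group if l == 1) / len(group)
    return rate(rest) - rate(prot)


if __name__ == "__main__":
    scores = [0.5, -0.2, 0.3, -0.4]          # hypothetical confidence scores
    is_protected = [False, True, False, True]
    for shift in (0.0, 0.3):
        labels = shifted_predictions(scores, is_protected, shift)
        print(shift, labels, positive_rate_gap(labels, is_protected))
```

Sweeping `shift` and recording the gap alongside accuracy gives one simple way to view the trade-off between bias and error.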