
    A General Framework for Fair Regression

    Fairness, through its many forms and definitions, has become an important issue facing the machine learning community. In this work, we consider how to incorporate group fairness constraints into kernel regression methods, applicable to Gaussian processes, support vector machines, neural network regression and decision tree regression. We then focus on the effect of incorporating these constraints in decision tree regression, with direct applications to random forests and boosted trees, among other widely used inference techniques. We show that the order of complexity of memory and computation is preserved for such models, and we tightly bound the expected perturbation to the model in terms of the number of leaves of the trees. Importantly, the approach works on trained models, so it can easily be applied to models already in use, and group labels are required only on training data. Comment: 8 pages, 4 figures, 2 pages of references.
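    The abstract describes post-processing the leaves of an already-trained regression tree. As a minimal sketch of that idea (not the paper's exact procedure), the Python snippet below projects the leaf values of a fitted scikit-learn tree onto the linear constraint that both groups receive the same mean prediction; the synthetic data, the parity constraint, and the in-place edit of `tree_.value` are illustrative assumptions.

```python
# Minimal sketch: perturb the leaf values of a trained regression tree so that
# the mean prediction is equal across two groups. Group labels are used only on
# training data, mirroring the setting described in the abstract.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
g = rng.integers(0, 2, size=500)                      # group label (training only)
y = X[:, 0] + 0.5 * g + rng.normal(scale=0.1, size=500)

tree = DecisionTreeRegressor(max_depth=4).fit(X, y)
leaf = tree.apply(X)                                  # leaf id of each sample
leaves = np.unique(leaf)

# The constraint "equal mean prediction across groups" is linear in the vector
# of leaf values v:  a @ v = 0  with  a[l] = P(leaf l | g=1) - P(leaf l | g=0).
a = np.array([(leaf[g == 1] == l).mean() - (leaf[g == 0] == l).mean() for l in leaves])
v = tree.tree_.value[leaves, 0, 0].copy()
v_fair = v - (a @ v) / (a @ a) * a                    # Euclidean projection onto a @ v = 0

tree.tree_.value[leaves, 0, 0] = v_fair               # edit the trained model in place
pred = tree.predict(X)
print(pred[g == 1].mean() - pred[g == 0].mean())      # ~0 up to numerical error
```

    Because the change is a projection of the leaf-value vector, the size of the perturbation depends on how the groups distribute over the leaves, which is in the spirit of the leaf-count bound the abstract mentions.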

    On the Decreasing Power of Kernel and Distance based Nonparametric Hypothesis Tests in High Dimensions

    This paper concerns two related decision-theoretic problems: nonparametric two-sample testing and independence testing. There is a belief that two recently proposed solutions, based on kernels and on distances between pairs of points, behave well in high-dimensional settings. We identify different sources of misconception that give rise to this belief. Specifically, we differentiate the hardness of estimating the test statistics from the hardness of testing whether these statistics are zero or not, and we explicitly discuss a notion of "fair" alternative hypotheses for these problems as dimension increases. We then demonstrate that the power of these tests actually drops polynomially with increasing dimension against fair alternatives. We end with some theoretical insights and shed light on the median heuristic for kernel bandwidth selection. Our work advances the current understanding of the power of modern nonparametric hypothesis tests in high dimensions. Comment: 19 pages, 9 figures; published in AAAI-15: The 29th AAAI Conference on Artificial Intelligence (with author order reversed from arXiv).
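    For concreteness, here is a small, self-contained numpy sketch of the kind of statistic the paper analyzes: the biased Gaussian-kernel MMD two-sample statistic with the bandwidth chosen by the median heuristic. The simulation setup, a fixed one-coordinate mean shift regardless of dimension, is an illustrative stand-in for a "fair" alternative, not the authors' experiment.

```python
# Sketch: biased Gaussian-kernel MMD^2 with the median-heuristic bandwidth.
import numpy as np
from scipy.spatial.distance import cdist

def mmd2_biased(X, Y):
    Z = np.vstack([X, Y])
    D = cdist(Z, Z)                           # pairwise Euclidean distances
    sigma = np.median(D[D > 0])               # the median heuristic
    K = np.exp(-D**2 / (2 * sigma**2))
    n = len(X)
    Kxx, Kyy, Kxy = K[:n, :n], K[n:, n:], K[:n, n:]
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

rng = np.random.default_rng(0)
for d in (10, 100, 1000):                     # total signal stays fixed as d grows
    X = rng.normal(size=(200, d))
    Y = rng.normal(size=(200, d))
    Y[:, 0] += 1.0                            # "fair" alternative: constant shift
    print(d, mmd2_biased(X, Y))
```

    Running this, one should see the statistic shrink toward its null value as d grows while the estimation error does not shrink correspondingly, which is the estimation-versus-testing distinction the abstract draws.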

    Owl: Congestion Control with Partially Invisible Networks via Reinforcement Learning

    Years of research on transport protocols have not settled the tussle between in-network and end-to-end congestion control. The debate persists because conditions and assumptions vary widely across network scenarios, e.g., cellular versus data center networks. Recently, the community has proposed a few transport protocols driven by machine learning, though these have been limited to end-to-end approaches. In this paper, we present Owl, a transport protocol based on reinforcement learning whose goal is to select the proper congestion window by learning from end-to-end features and, when available, in-network signals. We show that our solution converges to a fair resource allocation after the learning overhead. Our kernel implementation, deployed over emulated and large-scale virtual network testbeds, outperforms all benchmark solutions based on end-to-end or in-network congestion control.
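    Owl's code is not given here; purely as a toy illustration of the reinforcement-learning framing (an agent picking a congestion window from observed signals), the sketch below runs tabular Q-learning against a crude single-bottleneck simulator. The state, actions, reward, and constants are all invented for the example and are not Owl's design.

```python
# Toy sketch: tabular Q-learning chooses congestion-window adjustments on a
# simulated bottleneck link. Illustrates the RL framing only, not Owl itself.
import numpy as np

rng = np.random.default_rng(0)
CAPACITY = 50                                 # packets/RTT the bottleneck carries
ACTIONS = np.array([-10, 0, 10])              # shrink / hold / grow cwnd
Q = np.zeros((21, len(ACTIONS)))              # state: coarse cwnd bucket

cwnd = 10
for step in range(20000):
    s = min(cwnd // 10, 20)
    a = rng.integers(len(ACTIONS)) if rng.random() < 0.1 else int(Q[s].argmax())
    cwnd = int(np.clip(cwnd + ACTIONS[a], 1, 200))
    loss = cwnd > CAPACITY                    # end-to-end loss signal
    reward = min(cwnd, CAPACITY) - 5 * CAPACITY * loss
    s2 = min(cwnd // 10, 20)
    Q[s, a] += 0.1 * (reward + 0.9 * Q[s2].max() - Q[s, a])

print("cwnd after training:", cwnd)           # tends to settle near CAPACITY
```

    In the protocol the abstract describes, the state would additionally include in-network signals whenever the network exposes them, which is the "partially invisible" aspect in the title.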

    Comparison of System Call Representations for Intrusion Detection

    Over the years, artificial neural networks have been applied successfully in many areas, including IT security. Yet neural networks can only process continuous input data, which is particularly challenging for security-related non-continuous data such as system calls. This work focuses on four different options for preprocessing sequences of system calls so that they can be processed by neural networks. These input options are based on one-hot encoding and on learned word2vec or GloVe representations of system calls. As an additional option, we analyze whether mapping system calls to their respective kernel modules is an adequate generalization step for (a) replacing system calls or (b) enriching system call data with additional information about their context. When performing such preprocessing steps, however, it is important to ensure that no relevant information is lost in the process. The overall objective of system-call-based intrusion detection is to categorize sequences of system calls as benign or malicious behavior, so this scenario is used to evaluate the different input options as a classification task. The results show that each of the four methods is a valid option for preprocessing input data, but using kernel modules alone is not recommended because too much information is lost in the mapping process. Comment: 12 pages, 1 figure, submitted to CISIS 201
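    As a minimal sketch of two of the four input options (one-hot encoding and word2vec), plus the kernel-module generalization step, the snippet below encodes two invented syscall traces. The traces, the module mapping, and the gensim hyperparameters are illustrative assumptions, not the paper's data or settings.

```python
# Sketch: ways to turn syscall sequences into neural-network input.
import numpy as np
from gensim.models import Word2Vec

traces = [["open", "read", "read", "close"],
          ["socket", "connect", "send", "close"]]

# Option 1: one-hot vectors over the syscall vocabulary.
vocab = sorted({c for t in traces for c in t})
onehot = {c: np.eye(len(vocab))[i] for i, c in enumerate(vocab)}
encoded = [np.stack([onehot[c] for c in t]) for t in traces]   # (seq_len, |vocab|)

# Option 2: word2vec, treating each trace as a sentence of syscall "words".
w2v = Word2Vec(sentences=traces, vector_size=8, window=2, min_count=1, seed=0)
print(w2v.wv["read"].shape)                                    # (8,)

# Generalization step: replace each call by its (assumed) kernel module. This
# is the coarse mapping the paper finds loses too much information on its own.
module = {"open": "fs", "read": "fs", "close": "fs",
          "socket": "net", "connect": "net", "send": "net"}
print([[module[c] for c in t] for t in traces])
```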

    A Confidence-Based Approach for Balancing Fairness and Accuracy

    We study three classical machine learning algorithms in the context of algorithmic fairness: adaptive boosting, support vector machines, and logistic regression. Our goal is to maintain the high accuracy of these learning algorithms while reducing the degree to which they discriminate against individuals because of their membership in a protected group. Our first contribution is a method for achieving fairness by shifting the decision boundary for the protected group. The method is based on the theory of margins for boosting. It performs comparably to or outperforms previous algorithms in the fairness literature in terms of accuracy and low discrimination, while simultaneously allowing a fast and transparent quantification of the trade-off between bias and error. Our second contribution addresses the shortcomings of the bias-error trade-off studied in most of the algorithmic fairness literature. We demonstrate that even hopelessly naive modifications of a biased algorithm, which cannot reasonably be called fair, can still achieve low bias and high accuracy. To help distinguish between such naive algorithms and more sensible ones, we propose a new measure of fairness called resilience to random bias (RRB). We demonstrate that RRB distinguishes well between our naive and sensible fairness algorithms. RRB, together with bias and accuracy, provides a more complete picture of the fairness of an algorithm.
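    The paper's own procedure is margin-based and specific to boosting; purely as a generic illustration of the boundary-shifting idea, the sketch below fits a logistic regression and then shifts the decision threshold for the protected group until the two groups' positive-prediction rates match. The synthetic data and the parity target are assumptions, and RRB is not reproduced here.

```python
# Sketch: shift the decision boundary for the protected group so that
# positive-prediction rates match across groups. Illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
g = rng.integers(0, 2, size=2000)                 # 1 = protected group
y = (X[:, 0] + 0.8 * (1 - g) + rng.normal(size=2000) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
score = clf.decision_function(X)                  # signed distance to boundary

# Choose the group-1 threshold whose positive rate matches group 0's.
target = (score[g == 0] > 0).mean()
shift = np.quantile(score[g == 1], 1 - target)
pred = np.where(g == 1, score > shift, score > 0)

print(pred[g == 0].mean(), pred[g == 1].mean())   # approximately equal
```

    Note the abstract's caution: naive modifications can also score well on bias and accuracy alone, which is what motivates RRB as a complementary measure.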