41 research outputs found
Machine learning in the real world with multiple objectives
Machine learning (ML) is ubiquitous in many real-world applications. Existing ML systems are based on optimizing a single quality metric such as prediction accuracy. These metrics typically do not fully align with real-world design constraints such as computation, latency, fairness, and acquisition costs that we encounter in real-world applications. In this thesis, we develop ML methods for optimizing prediction accuracy while accounting for such real-world constraints. In particular, we introduce multi-objective learning in two different setups: resource-efficient prediction and algorithmic fairness in language models.
First, we focus on decreasing the test-time computational costs of prediction systems. Budget constraints arise in many machine learning problems. Computational costs limit the use of many models on small devices such as IoT devices or mobile phones, and they increase energy consumption in cloud computing. We design systems that allow on-the-fly modification of the prediction model for each input sample. These sample-adaptive systems leverage the wide variability in sample complexity: we learn policies that select cheap models for low-complexity instances and use descriptive models only for complex ones. We use a multi-objective approach in which one minimizes the system cost while preserving predictive accuracy. We demonstrate significant speed-ups in computer vision, structured prediction, natural language processing, and deep learning.
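The sample-adaptive idea described above can be illustrated with a small sketch. It is not the thesis's method: the thesis learns a selection policy, whereas this example assumes a fixed confidence threshold, and the names `adaptive_predict`, `cheap_model`, `costly_model`, and the cost values are all hypothetical.

```python
from typing import Callable, Tuple

# A model maps an input to (label, confidence in [0, 1]).
Model = Callable[[float], Tuple[str, float]]


def adaptive_predict(
    x: float,
    cheap: Model,
    costly: Model,
    threshold: float = 0.8,   # assumed, hand-picked; a learned policy would replace this
    cheap_cost: float = 1.0,  # assumed cost units
    costly_cost: float = 10.0,
) -> Tuple[str, float]:
    """Return (label, cost spent) for one sample."""
    label, confidence = cheap(x)
    if confidence >= threshold:
        # Easy sample: stop early and pay only for the cheap model.
        return label, cheap_cost
    # Hard sample: fall back to the costly model and pay for both.
    label, _ = costly(x)
    return label, cheap_cost + costly_cost


def cheap_model(x: float) -> Tuple[str, float]:
    # Toy classifier: sign of x, with confidence growing with |x|.
    return ("pos" if x >= 0 else "neg"), min(1.0, abs(x))


def costly_model(x: float) -> Tuple[str, float]:
    # Toy stand-in for an expensive, always-confident model.
    return ("pos" if x >= 0 else "neg"), 1.0


samples = [2.0, -0.1, 0.5, -3.0]
results = [adaptive_predict(x, cheap_model, costly_model) for x in samples]
total_cost = sum(cost for _, cost in results)
always_costly = 10.0 * len(samples)
print(total_cost, always_costly)
```

In this toy run only the two low-confidence samples reach the costly model, so the total cost falls below the cost of always running it. The trade-off between cost and accuracy is then governed by the threshold.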
In the context of fairness, we first demonstrate that a naive application of ML methods runs the risk of amplifying social biases present in data. This danger is particularly acute for methods based on word embeddings, which are gaining importance in many natural language processing applications of ML. We show that word embeddings trained on Google News articles exhibit female/male gender stereotypes. We demonstrate that, geometrically, gender bias is captured by unique directions in the word embedding vector space. To remove bias, we formulate an empirical risk objective with fairness constraints that removes stereotypes from embeddings while maintaining desired associations. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts.
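The geometric claim above, that gender bias lies along a direction in the embedding space, can be made concrete with a minimal sketch. It swaps in a plain orthogonal projection rather than the constrained empirical risk objective the thesis formulates, and the three-dimensional vectors and helper names are invented for illustration.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def gender_direction(he: Vector, she: Vector) -> Vector:
    # Assumption: a single definitional pair is enough to estimate the direction.
    return normalize([a - b for a, b in zip(he, she)])


def remove_component(word: Vector, direction: Vector) -> Vector:
    # Subtract the projection of `word` onto the unit-length `direction`.
    scale = dot(word, direction)
    return [a - scale * b for a, b in zip(word, direction)]


# Made-up toy embeddings, not taken from any trained model.
he = [1.0, 0.0, 0.2]
she = [-1.0, 0.0, 0.2]
nurse = [-0.4, 0.7, 0.1]

g = gender_direction(he, she)
nurse_debiased = remove_component(nurse, g)
print(dot(nurse, g), dot(nurse_debiased, g))
```

After the projection, the word vector has no component along the estimated direction, while its remaining coordinates are unchanged.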
Resource Constrained Structured Prediction
We study the problem of structured prediction under test-time budget
constraints. We propose a novel approach applicable to a wide range of
structured prediction problems in computer vision and natural language
processing. Our approach seeks to adaptively generate computationally costly
features during test-time in order to reduce the computational cost of
prediction while maintaining prediction performance. We show that training the
adaptive feature generation system can be reduced to a series of structured
learning problems, resulting in efficient training using existing structured
learning algorithms. This framework provides theoretical justification for
several existing heuristic approaches found in the literature. We evaluate our
proposed adaptive system on two structured prediction tasks, optical character
recognition (OCR) and dependency parsing, and show strong performance in
reducing feature costs without degrading accuracy.
Quantifying and Reducing Stereotypes in Word Embeddings
Machine learning algorithms are optimized to model statistical properties of
the training data. If the input data reflects stereotypes and biases of the
broader society, then the output of the learning algorithm also captures these
stereotypes. In this paper, we initiate the study of gender stereotypes in
word embedding, a popular framework to represent text data. As their use
becomes increasingly common, applications can inadvertently amplify unwanted
stereotypes. We show across multiple datasets that the embeddings contain
significant gender stereotypes, especially with regard to professions. We
created a novel gender analogy task and combined it with crowdsourcing to
systematically quantify the gender bias in a given embedding. We developed an
efficient algorithm that reduces gender stereotype using just a handful of
training examples while preserving the useful geometric properties of the
embedding. We evaluated our algorithm on several metrics. While we focus on
male/female stereotypes, our framework may be applicable to other types of
embedding biases.
Comment: presented at 2016 ICML Workshop on #Data4Good: Machine Learning in
Social Good Applications, New York, N
Do Neural Ranking Models Intensify Gender Bias?
Concerns regarding the footprint of societal biases in information retrieval
(IR) systems have been raised in several previous studies. In this work, we
examine various recent IR models from the perspective of the degree of gender
bias in their retrieval results. To this end, we first provide a bias
measurement framework which includes two metrics to quantify the degree of the
unbalanced presence of gender-related concepts in a given IR model's ranking
list. To examine IR models by means of the framework, we create a dataset of
non-gendered queries, selected by human annotators. Applying these queries to
the MS MARCO Passage retrieval collection, we then measure the gender bias of a
BM25 model and several recent neural ranking models. The results show that
while all models are strongly biased toward male, the neural models, and in
particular the ones based on contextualized embedding models, significantly
intensify gender bias. Our experiments also show an overall increase in the
gender bias of neural models when they exploit transfer learning, namely when
they use (already biased) pre-trained embeddings.
Comment: In Proceedings of ACM SIGIR 202
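To give a sense of what measuring the unbalanced presence of gender-related concepts in a ranking list might look like, here is a rough sketch. It does not reproduce the two metrics of the paper: the word lists, the whitespace tokenization, and the function names are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical, deliberately tiny term lists.
MALE_TERMS = {"he", "him", "his", "man", "men"}
FEMALE_TERMS = {"she", "her", "hers", "woman", "women"}


def gender_counts(text: str) -> Tuple[int, int]:
    """Count male and female term occurrences in one document."""
    tokens = text.lower().split()
    male = sum(token in MALE_TERMS for token in tokens)
    female = sum(token in FEMALE_TERMS for token in tokens)
    return male, female


def ranking_bias(ranked_docs: List[str], k: int = 10) -> float:
    """Mean (male - female) term count over the top-k documents.

    Positive values indicate a male skew, negative a female skew.
    """
    top = ranked_docs[:k]
    if not top:
        return 0.0
    diffs = [male - female for male, female in map(gender_counts, top)]
    return sum(diffs) / len(top)


ranking = [
    "he said his plan works",
    "she agreed",
    "the weather is nice",
]
print(ranking_bias(ranking, k=3))
```

Averaging such a score over a set of non-gendered queries would yield one number per retrieval model, which is the kind of comparison the abstract describes between BM25 and the neural rankers.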
The impact of the EU on Turkey: Toward streamlining Europeanisation as a research programme
This article provides a reassessment of the literature on the transformative impact of the EU on Turkey through the lens of the Europeanisation research programme. It relies on systematic examination of a sample of the literature based on substantive findings, research design and methods. It suggests that this sample displays limitations characteristic of the Europeanisation research programme and proposes to remedy these limitations by applying the research design and methods used therein for generating empirically based comparative research on Turkey. © 2010 European Consortium for Political Research