
    Relating Web pages to enable information-gathering tasks

    We argue that relationships between Web pages are functions of the user's intent. We identify a class of Web tasks - information-gathering - that can be facilitated by a search engine that provides links to pages related to the page the user is currently viewing. We define three kinds of intentional relationships that correspond to whether the user is (a) seeking sources of information, (b) reading pages which provide information, or (c) surfing through pages as part of an extended information-gathering process. We show that these three relationships can be productively mined using a combination of textual and link information, and we provide three scoring mechanisms that correspond to them: SeekRel, FactRel and SurfRel. These scoring mechanisms incorporate both textual and link information. We build a set of capacitated subnetworks - each corresponding to a particular keyword - that mirror the interconnection structure of the World Wide Web. The scores are computed as flows on these subnetworks. The capacities of the links are derived from the hub and authority values of the nodes they connect, following the work of Kleinberg (1998) on assigning authority to pages in hyperlinked environments. We evaluated our scoring mechanisms by running experiments on four data sets taken from the Web. We present user evaluations of the relevance of the top results returned by our scoring mechanisms and compare them to the top results returned by Google's Similar Pages feature and the Companion algorithm proposed by Dean and Henzinger (1999). Comment: In Proceedings of ACM Hypertext 200
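
    To make the flow-based scoring concrete, the following is a minimal sketch, assuming the networkx library, of how link capacities could be derived from HITS hub/authority values and a max-flow between two pages used as a relatedness score. The function name, the capacity rule hub(u) x authority(v), and the toy graph are illustrative assumptions; the paper's actual SeekRel, FactRel and SurfRel mechanisms also incorporate textual information and differ in detail.

```python
import networkx as nx

def relatedness_score(web_graph: nx.DiGraph, source: str, target: str) -> float:
    """Illustrative flow-based relatedness between two pages (hypothetical sketch)."""
    # HITS hub/authority values for every node in the (keyword-specific) subnetwork.
    hubs, authorities = nx.hits(web_graph, max_iter=200, normalized=True)

    # Build a capacitated copy: capacity of u->v combines hub(u) and authority(v).
    capacitated = nx.DiGraph()
    for u, v in web_graph.edges():
        capacitated.add_edge(u, v, capacity=hubs[u] * authorities[v])

    # The max-flow value from source to target serves as the relatedness score.
    flow_value, _ = nx.maximum_flow(capacitated, source, target)
    return flow_value

# Toy usage on a tiny hand-built subnetwork.
if __name__ == "__main__":
    g = nx.DiGraph([("hub_page", "page_a"), ("hub_page", "page_b"), ("page_a", "page_b")])
    print(relatedness_score(g, "hub_page", "page_b"))
```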

    Operationalizing Individual Fairness with Pairwise Fair Representations

    We revisit the notion of individual fairness proposed by Dwork et al. A central challenge in operationalizing their approach is the difficulty of eliciting a human specification of a similarity metric. In this paper, we propose an operationalization of individual fairness that does not rely on a human specification of a distance metric. Instead, we propose novel approaches to elicit and leverage side-information on equally deserving individuals to counter subordination between social groups. We model this knowledge as a fairness graph, and learn a unified Pairwise Fair Representation (PFR) of the data that captures both data-driven similarity between individuals and the pairwise side-information in the fairness graph. We elicit fairness judgments from a variety of sources, including human judgments for two real-world datasets on recidivism prediction (COMPAS) and violent neighborhood prediction (Crime & Communities). Our experiments show that the PFR model for operationalizing individual fairness is practically viable. Comment: To be published in the proceedings of the VLDB Endowment, Vol. 13, Issue.
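
    As a rough illustration of the idea, the sketch below combines a data-driven reconstruction term with a pairwise term that pulls together the representations of individuals linked in a fairness graph. The loss form, variable names and toy data are assumptions for illustration; they are not the exact PFR objective from the paper.

```python
import numpy as np

def pfr_style_loss(X, Z, W, fairness_edges, lam=1.0):
    """Sketch of a pairwise-fair representation objective (not the paper's exact PFR).

    X: (n, d) input records; Z: (n, k) learned representations; W: (k, d) decoder.
    fairness_edges: list of (i, j) pairs judged equally deserving.
    """
    # Data-driven term: reconstruct the inputs from the low-dimensional representation.
    reconstruction = np.mean(np.sum((X - Z @ W) ** 2, axis=1))

    # Pairwise term: individuals linked in the fairness graph should map close together.
    pairwise = sum(np.sum((Z[i] - Z[j]) ** 2) for i, j in fairness_edges)
    pairwise /= max(len(fairness_edges), 1)

    return reconstruction + lam * pairwise

# Toy usage with random data and a single fairness edge.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
Z = rng.normal(size=(5, 2))
W = rng.normal(size=(2, 4))
print(pfr_style_loss(X, Z, W, fairness_edges=[(0, 1)]))
```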

    iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making

    People are rated and ranked for algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: giving adequate success rates to specifically protected groups. In contrast, the alternative paradigm of individual fairness has received relatively little attention; this paper advances this less explored direction. The paper introduces a method for probabilistically mapping user records into a low-rank representation that reconciles individual fairness and the utility of classifiers and rankings in downstream applications. Our notion of individual fairness requires that users who are similar in all task-relevant attributes, such as job qualification, and disregarding all potentially discriminating attributes, such as gender, should have similar outcomes. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on a variety of real-world datasets. Our experiments show substantial improvements over the best prior work for this setting. Comment: Accepted at ICDE 2019. Please cite the ICDE 2019 proceedings version.
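
    The following is a minimal sketch, in the spirit of prototype-based fair representation learning, of how records could be probabilistically mapped onto prototypes using only task-relevant attributes, so that individuals who agree on those attributes receive similar representations. The function, the softmax assignment rule and the toy data are assumptions; this is not the exact iFair objective.

```python
import numpy as np

def ifair_style_objective(X_nonprotected, prototypes, alpha=1.0):
    """Simplified prototype-based fair-representation objective (illustrative only)."""
    # Soft assignment of every record to every prototype (softmax over negative distances),
    # computed on task-relevant (non-protected) attributes only.
    dists = np.linalg.norm(X_nonprotected[:, None, :] - prototypes[None, :, :], axis=2)
    logits = -alpha * dists
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)

    # Representation: convex combination of prototypes; utility term: reconstruction error.
    X_hat = probs @ prototypes
    reconstruction = np.mean(np.sum((X_nonprotected - X_hat) ** 2, axis=1))
    return probs, reconstruction

# Toy usage: 6 records with 3 task-relevant attributes, 2 prototypes.
rng = np.random.default_rng(1)
records = rng.normal(size=(6, 3))
protos = rng.normal(size=(2, 3))
assignments, loss = ifair_style_objective(records, protos)
print(assignments.shape, round(loss, 3))
```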

    iFair: Learning Individually Fair Data Representations for Algorithmic Decision Making

    People are rated and ranked for algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: ensuring that each ethnic or social group receives its fair share in the outcome of classifiers and rankings. In contrast, the alternative paradigm of individual fairness has received relatively little attention. This paper introduces a method for probabilistically clustering user records into a low-rank representation that captures individual fairness yet also achieves high accuracy in classification and regression models. Our notion of individual fairness requires that users who are similar in all task-relevant attributes, such as job qualification, and disregarding all potentially discriminating attributes, such as gender, should have similar outcomes. Since the case for fairness is ubiquitous across many tasks, we aim to learn general representations that can be applied to arbitrary downstream use-cases. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on two real-world datasets. Our experiments show substantial improvements over the best prior work for this setting.

    Operationalizing fairness for responsible machine learning

    As machine learning (ML) is increasingly used for decision making in scenarios that impact humans, there is a growing awareness of its potential for unfairness. A large body of recent work has focused on proposing formal notions of fairness in ML, as well as approaches to mitigate unfairness. However, there is a growing disconnect between the ML fairness literature and the need to operationalize fairness in practice. This thesis addresses the need for responsible ML by developing new models and methods to address challenges in operationalizing fairness in practice. Specifically, it makes the following contributions. First, we tackle a key assumption in the group fairness literature that sensitive demographic attributes such as race and gender are known upfront and can be readily used in model training to mitigate unfairness. In practice, factors like privacy and regulation often prohibit ML models from collecting or using protected attributes in decision making. To address this challenge we introduce the novel notion of computationally-identifiable errors and propose Adversarially Reweighted Learning (ARL), an optimization method that seeks to improve the worst-case performance over unobserved groups, without requiring access to the protected attributes in the dataset. Second, we argue that while group fairness notions are a desirable fairness criterion, they are fundamentally limited as they reduce fairness to an average statistic over pre-identified protected groups. In practice, automated decisions are made at an individual level, and can adversely impact individual people irrespective of the group statistic. We advance the paradigm of individual fairness by proposing iFair (individually fair representations), an optimization approach for learning a low-dimensional latent representation of the data with two goals: to encode the data as well as possible, while removing any information about protected attributes in the transformed representation. Third, we advance the individual fairness paradigm, which requires that similar individuals receive similar outcomes. However, similarity metrics computed over the observed feature space can be brittle, and inherently limited in their ability to accurately capture similarity between individuals. To address this, we introduce a novel notion of fairness graphs, wherein pairs of individuals can be identified as similar with respect to the ML objective. We cast the problem of individual fairness as a graph embedding problem, and propose PFR (pairwise fair representations), a method to learn a unified pairwise fair representation of the data. Fourth, we tackle the challenge that production data after model deployment is constantly evolving. As a consequence, in spite of the best efforts in training a fair model, ML systems can be prone to failure risks due to a variety of unforeseen reasons. To ensure responsible model deployment, potential failure risks need to be predicted, and mitigation actions need to be devised, for example, deferring to a human expert when uncertain or collecting additional data to address the model's blind spots. We propose Risk Advisor, a model-agnostic meta-learner to predict potential failure risks and to give guidance on the sources of uncertainty inducing the risks, by leveraging information-theoretic notions of aleatoric and epistemic uncertainty. This dissertation brings ML fairness closer to real-world applications by developing methods that address key practical challenges.
Extensive experiments on a variety of real-world and synthetic datasets show that our proposed methods are viable in practice.

With the increasing use of machine learning (ML) in situations that affect humans, awareness of its potential for unfairness is growing. A large body of recent research has focused on the formal understanding of fairness in the context of ML, as well as on approaches to overcome unfairness. However, the ML fairness literature and the requirements for implementing fairness in practice are increasingly drifting apart. This thesis addresses the need for responsible ML by developing new models and methods to overcome the practical challenges in the area of fairness. Its scientific contributions are laid out in the following. In Chapter 3 we address the key premise in the group fairness literature that sensitive demographic attributes such as ethnicity or gender are known in advance and can be used during model training to reduce unfairness. In practice, privacy protections or legal regulations often prevent ML models from collecting or using protected attributes for decision making. To overcome this challenge, we introduce the concept of computationally-identifiable errors and present Adversarially Reweighted Learning (ARL), an optimization method that improves the worst-case performance over unknown group memberships without knowledge of the protected attributes. In Chapter 4 we show that group fairness notions, despite their suitability as a fairness criterion, are fundamentally limited, since fairness is reduced to an averaged statistic over previously identified protected groups. In practice, automated decisions are made at an individual level and can disadvantage individuals irrespective of the group statistic. We extend the notion of individual fairness with our method iFair (individually fair representations), an optimization approach for learning a low-dimensional representation of the data with two goals: to encode the data as accurately as possible while removing any information about the protected attributes in the transformed representation. In Chapter 5 we further develop the paradigm of individual fairness, which requires similar outcomes for similar individuals. Similarity metrics in the observed feature space, however, can be unreliable and inherently limited in correctly capturing similarity between individuals. To tackle this challenge, we introduce the new concept of fairness graphs, in which pairs (or sets) of individuals are identified as similar with respect to the ML task. We cast the problem of individual fairness as a graph embedding problem and present PFR (pairwise fair representations), a method for learning a unified pairwise fair representation of the data. In Chapter 6 we address the challenge that production data keeps evolving after the model has been deployed. As a consequence, ML systems can fail for a variety of unforeseen reasons despite the best efforts to train a fair model. To ensure responsible deployment, potential failure risks have to be anticipated and countermeasures have to be developed, e.g. deferring the decision to a human expert in case of uncertainty or collecting additional data to cover the model's blind spots. With Risk Advisor we present a model-agnostic meta-learner that predicts risks of potential failure and provides guidance on the cause of the underlying uncertainty based on the information-theoretic concepts of aleatoric and epistemic uncertainty. This dissertation brings fairness for responsible ML closer to real-world applications by developing approaches that solve key practical problems. Extensive experiments with a variety of synthetic and real-world datasets show that our approaches are viable in practice. The International Max Planck Research School for Computer Science (IMPRS-CS
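
    As a small illustration of the information-theoretic uncertainty notions mentioned for Risk Advisor, the sketch below shows one common entropy-based decomposition of an ensemble's predictive uncertainty into aleatoric and epistemic parts. The ensemble-based setup and function name are assumptions for illustration; the thesis's Risk Advisor is model-agnostic and may compute these quantities differently.

```python
import numpy as np

def uncertainty_decomposition(member_probs):
    """Entropy-based split into aleatoric and epistemic uncertainty (illustrative).

    member_probs: array of shape (n_members, n_classes) with one predictive
    distribution per ensemble member for a single input.
    """
    eps = 1e-12
    p = np.asarray(member_probs)

    # Total uncertainty: entropy of the averaged predictive distribution.
    mean_p = p.mean(axis=0)
    total = -np.sum(mean_p * np.log(mean_p + eps))

    # Aleatoric part: average entropy of the individual members (irreducible noise).
    aleatoric = -np.mean(np.sum(p * np.log(p + eps), axis=1))

    # Epistemic part: disagreement between members (mutual information), reducible with more data.
    epistemic = total - aleatoric
    return total, aleatoric, epistemic

# Toy usage: three ensemble members that disagree on a binary prediction.
print(uncertainty_decomposition([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]))
```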

    Learning Ideological Latent Space in Twitter

    People are shifting from traditional news sources to online news at an incredibly fast rate. However, the technology behind online news consumption confines users to content that confirms their own point of view. This has led to social phenomena like polarization of points of view and intolerance towards opposing views. In this thesis we study information filter bubbles from a mathematical standpoint. We use data mining techniques to learn a liberal-conservative ideology space in Twitter and present a case study on how such a latent space can be used to tackle the filter bubble problem on social networks. We model the problem of learning liberal-conservative ideology as a constrained optimization problem. Using matrix factorization we uncover an ideological latent space for the content consumption and social interaction habits of users in Twitter. We validate our model on a real-world Twitter dataset on three controversial topics - "Obamacare", "gun control" and "abortion". Using the proposed technique we are able to separate users by their ideology with 95% purity. Our analysis shows that there is a very high correlation (0.8 - 0.9) between the ideology estimated using machine learning and the true ideology collected from various sources. Finally, we re-examine the learnt latent space and present a case study showcasing how this ideological latent space can be used to develop exploratory and interactive interfaces that can help in diffusing the information filter bubble. Our matrix-factorization-based model for learning an ideology latent space, along with the case studies, provides a theoretically solid as well as practical and interesting perspective on online polarization. Further, it provides a strong foundation and suggests several avenues for future work in multiple emerging interdisciplinary research areas, for instance, humanly interpretable and explanatory machine learning, transparent recommendations and a new field that we coin as Next Generation Social Networks.
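
    For illustration, the sketch below shows a plain (unconstrained) matrix factorization of a user-content interaction matrix with a single latent dimension that can be read as a liberal-conservative axis. The gradient-descent formulation, signed toy interactions and hyperparameters are assumptions; the thesis formulates the problem as a constrained optimization and validates it on real Twitter data.

```python
import numpy as np

def factorize_interactions(A, k=1, lr=0.01, reg=0.1, steps=500, seed=0):
    """Plain matrix-factorization sketch for an ideology-style latent space (illustrative).

    A: (n_users, n_items) interaction matrix (e.g., +1 endorse, -1 oppose political content).
    Returns U (user factors) and V (item factors); with k=1 the single latent
    dimension can be read as a liberal-conservative axis.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = A.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))

    for _ in range(steps):
        # Gradient descent on the regularized squared reconstruction error.
        err = A - U @ V.T
        U += lr * (err @ V - reg * U)
        V += lr * (err.T @ U - reg * V)
    return U, V

# Toy usage: two polarized user groups with opposite signed interactions on the same items.
A = np.array([[1, 1, -1, -1],
              [1, 1, -1, -1],
              [-1, -1, 1, 1],
              [-1, -1, 1, 1]], dtype=float)
U, V = factorize_interactions(A)
print(np.round(U.ravel(), 2))  # the two groups separate along the single latent dimension
```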

    Computer-aided analysis and design of the shape rolling process for producing turbine engine airfoils

    Mild steel (AISI 1018) was selected as the model cold-rolling material, and Ti-6Al-4V and INCONEL 718 were selected as typical hot-rolling and cold-rolling alloys, respectively. The flow stress and workability of these alloys were characterized, and the friction factor at the roll/workpiece interface was determined at their respective working conditions by conducting ring tests. Computer-aided mathematical models for predicting metal flow and stresses, and for simulating the shape-rolling process, were developed. These models utilize the upper-bound and the slab methods of analysis, and are capable of predicting the lateral spread, roll-separating force, roll torque and local stresses, strains and strain rates. This computer-aided design (CAD) system is also capable of simulating the actual rolling process and thereby designing the roll-pass schedule for rolling an airfoil or similar shape. The predictions from the CAD system were verified with respect to cold rolling of mild steel plates. The system is being applied to cold and hot isothermal rolling of an airfoil shape, and will be verified with respect to laboratory experiments under controlled conditions.
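
    As a rough point of reference for the quantities such models predict, the sketch below gives a textbook-style first-order flat-rolling estimate of contact length, roll-separating force and roll torque. The correction factor, the lever-arm assumption and the input numbers are illustrative assumptions; the CAD system described above uses upper-bound and slab analyses for shape rolling and is not reproduced here.

```python
import math

def flat_rolling_estimate(h0_mm, h1_mm, width_mm, roll_radius_mm,
                          mean_flow_stress_mpa, q_factor=1.2):
    """First-order flat-rolling estimate (textbook slab-method style, not the paper's CAD system).

    Returns projected contact length [mm], roll-separating force [kN] and roll torque [kN*m].
    q_factor is an assumed correction for friction and plane-strain conditions.
    """
    draft = h0_mm - h1_mm                               # thickness reduction
    contact_length = math.sqrt(roll_radius_mm * draft)  # projected arc of contact, L = sqrt(R * dh)
    # Force: mean flow stress (MPa = N/mm^2) times projected contact area, converted to kN.
    force_kn = q_factor * mean_flow_stress_mpa * width_mm * contact_length / 1000.0
    # Torque per roll: force times an assumed lever arm of about half the contact length.
    torque_knm = force_kn * (0.5 * contact_length) / 1000.0
    return contact_length, force_kn, torque_knm

# Toy usage: cold rolling a 10 mm AISI 1018 strip down to 8 mm (illustrative numbers only).
print(flat_rolling_estimate(h0_mm=10, h1_mm=8, width_mm=100,
                            roll_radius_mm=150, mean_flow_stress_mpa=450))
```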