Controlling for Unobserved Confounds in Classification Using Correlational Constraints
As statistical classifiers become integrated into real-world applications, it
is important to consider not only their accuracy but also their robustness to
changes in the data distribution. In this paper, we consider the case where
there is an unobserved confounding variable Z that influences both the
features X and the class variable Y. When the influence of Z changes from
training to testing data, we find that classifier accuracy can degrade
rapidly. In our approach, we assume that we can predict the value of Z at
training time with some error. The prediction for Z is then fed into Pearl's
back-door adjustment to build our model. Because of the attenuation bias
caused by measurement error in Z, standard approaches to controlling for Z are
ineffective. In response, we propose a method to properly control for the
influence of Z by first estimating its relationship with the class variable Y,
then updating the predictions for Z to match that estimated relationship. By
adjusting the influence of Z, we show that we can build a model that exceeds
competing baselines on accuracy as well as on robustness over a range of
confounding relationships.
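To make the correlational-constraint step concrete, here is a minimal sketch, assuming binary class labels, a noisy probabilistic predictor for the confounder, and a target quantity P(z=1 | y) estimated beforehand; the function name and the simple re-thresholding rule are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: adjust noisy confounder predictions so their relationship with
# the class y matches an estimated target, before back-door adjustment.
import numpy as np

def adjust_confounder_predictions(z_scores, y, target_p_z1_given_y):
    """z_scores: predicted P(z=1) per training example (np.array of floats);
    y: class labels (np.array of ints);
    target_p_z1_given_y: dict mapping class label -> estimated P(z=1 | y)."""
    z_adj = np.zeros(len(y), dtype=int)
    for label, p in target_p_z1_given_y.items():
        idx = np.where(y == label)[0]
        k = int(round(p * len(idx)))                  # how many examples should get z = 1
        top = idx[np.argsort(-z_scores[idx])[:k]]     # keep the most confident predictions
        z_adj[top] = 1                                # relabel to match the target rate
    return z_adj   # fed to the back-door adjustment in place of the unobserved Z
```

The intuition is that measurement error in the raw predictions attenuates the apparent Y-Z association, so the predictions are rescaled to agree with the separately estimated relationship before being used as a control variable.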
Causally Regularized Learning with Agnostic Data Selection Bias
Most previous machine learning algorithms are based on the i.i.d. hypothesis.
However, this ideal assumption is often violated in real applications, where
selection bias may arise between the training and testing processes. Moreover,
in many scenarios, the testing data is not even available during the training
process, which makes traditional methods like transfer learning infeasible
because they require prior knowledge of the test distribution. Therefore, how
to address agnostic selection bias for robust model learning is of paramount
importance for both academic research and real applications. In this paper,
under the assumption that causal relationships among variables are robust
across domains, we incorporate causal techniques into predictive modeling and
propose a novel Causally Regularized Logistic Regression (CRLR) algorithm that
jointly optimizes global confounder balancing and weighted logistic
regression. Global confounder balancing helps to identify causal features,
whose causal effects on the outcome are stable across domains; performing
logistic regression on those causal features then yields a predictive model
that is robust to the agnostic bias. To validate the effectiveness of our CRLR
algorithm, we conduct comprehensive experiments on both synthetic and
real-world datasets. Experimental results clearly demonstrate that our CRLR
algorithm outperforms state-of-the-art methods, and the interpretability of
our method is fully demonstrated by feature visualization. (Oral paper at the
2018 ACM Multimedia Conference, MM'18.)
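As a rough illustration of the joint objective described above, the following PyTorch-style sketch combines a weighted logistic loss with a global confounder-balancing penalty; the variable names, the softmax parameterization of the sample weights, and the squared-difference balancing term are assumptions made for illustration, not the published CRLR code.

```python
# Hedged sketch of a CRLR-style objective (illustrative, not the authors' code).
import torch

def crlr_loss(X, y, beta, logit_w, lam_balance=1.0, lam_l2=1e-3):
    """X: (n, p) binary feature tensor; y: (n,) labels in {0, 1};
    beta: (p,) regression coefficients; logit_w: (n,) sample-weight logits."""
    w = torch.softmax(logit_w, dim=0) * len(y)       # positive, normalized sample weights
    logits = X @ beta
    log_loss = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, y.float(), weight=w, reduction="mean")

    # Global confounder balancing: treating each feature j in turn as the "treatment",
    # the weighted means of the remaining features should match between X[:, j]==1 and ==0.
    balance = 0.0
    for j in range(X.shape[1]):
        t = X[:, j]
        mask = torch.ones(X.shape[1], dtype=torch.bool)
        mask[j] = False
        Z = X[:, mask]
        w_t, w_c = w * t, w * (1 - t)
        mean_t = (Z * w_t[:, None]).sum(0) / (w_t.sum() + 1e-8)
        mean_c = (Z * w_c[:, None]).sum(0) / (w_c.sum() + 1e-8)
        balance = balance + ((mean_t - mean_c) ** 2).sum()

    return log_loss + lam_balance * balance + lam_l2 * (beta ** 2).sum()
```

In practice, beta and logit_w would be optimized jointly (for instance with a standard gradient-based optimizer), so that the sample weights and the classifier adapt to each other.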
Robust Text Classification in the Presence of Confounding Bias
As text classifiers become increasingly used in real-time applications, it is critical to consider not only their accuracy but also their robustness to changes in the data distribution. In this paper, we consider the case where there is a confounding variable Z that influences both the text features X and the class variable Y. For example, a classifier trained to predict the health status of a user based on their online communications may be confounded by socioeconomic variables. When the influence of Z changes from training to testing data, we find that classifier accuracy can degrade rapidly. Our approach, based on Pearl's back-door adjustment, estimates the underlying effect of a text variable on the class variable while controlling for the confounding variable. Although our goal is prediction, not causal inference, we find that such adjustments are essential to building text classifiers that are robust to confounding variables. On three diverse text classification tasks, we find that covariate adjustment results in higher accuracy than competing baselines over a range of confounding relationships (e.g., in one setting, accuracy improves from 60% to 81%).
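The following sketch shows one way such a back-door adjustment can be wired into an off-the-shelf text classifier, assuming a binary confounder Z observed at training time; the class name and the choice of logistic regression are illustrative, not the paper's exact implementation.

```python
# Minimal sketch of a back-door-adjusted text classifier (illustrative assumptions).
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.linear_model import LogisticRegression

class BackdoorAdjustedClassifier:
    def fit(self, X, y, z):
        # Append a one-hot encoding of the confounder so the model learns P(y | x, z).
        Z = csr_matrix(np.column_stack([z == 0, z == 1]).astype(float))
        self.clf = LogisticRegression(max_iter=1000).fit(hstack([X, Z]), y)
        self.p_z = np.array([np.mean(z == 0), np.mean(z == 1)])   # training P(z)
        return self

    def predict_proba(self, X):
        # Back-door adjustment: P(y | do(x)) = sum_z P(y | x, z) * P(z).
        probs = 0.0
        for zval, pz in enumerate(self.p_z):
            Z = np.zeros((X.shape[0], 2))
            Z[:, zval] = 1.0
            probs = probs + pz * self.clf.predict_proba(hstack([X, csr_matrix(Z)]))
        return probs
```

At prediction time the model averages P(y | x, z) over the training distribution of z instead of relying on whatever Y-Z association held during training, which is what provides robustness when that association shifts.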
Replication Data for: Controlling for Unobserved Confounds in Classification Using Correlational Constraints
The replication data is stored as a tar archive file. It contains two folders, one for each main experiment described in the paper.
Removing Confounds in Text Classification for Computational Social Science
Nowadays, one can use social media and other online platforms to communicate with friends and family, write a review for a product, ask questions about a topic of interest, or even share details of private life with the rest of the world. The ever-increasing amount of user-generated content has provided researchers with data that can offer insights into human behavior. Because of that, the field of computational social science - at the intersection of machine learning and the social sciences - has soared in recent years, especially within public health research. However, working with large amounts of user-generated data creates new issues. In this thesis, we propose solutions for two problems encountered in computational social science, both related to confounding bias.

First, because of the anonymity provided by online forums, social networks, and other blogging platforms through the common usage of usernames, it is hard to get accurate information about users such as gender, age, or ethnicity. Therefore, although collecting data on a specific topic is made easier, conducting an observational study with this type of data is not simple. Indeed, when one wishes to measure the effect of one variable on another, one needs to control for potential confounding variables. In the case of user-generated data, these potential confounders are at best noisily observed or inferred, and at worst not observed at all. In this work, we provide a way to use these inferred latent attributes to conduct an observational study while reducing the effect of confounding bias as much as possible. We first present a simple matching method in a large-scale observational study. Then, we propose a method to retrieve relevant and representative documents through adaptive query building in order to build the treatment and control groups of an observational study.

Second, we focus on the problem of controlling for confounding variables when the influence of these variables on the target variable of a classification problem changes over time. Although identifying and controlling for confounding variables has been studied assiduously in empirical social science, it is often neglected in text classification. This can be explained by the fact that, if we assume the impact of confounding variables does not change between the training and testing data, then prediction accuracy should only be slightly affected. Yet this assumption often does not hold when working with user-generated text. Because of this, computational social science studies risk reaching false conclusions when they are based on text classifiers that do not control for confounding variables. In this document, we propose to build a classifier that is robust to confounding bias shift, and we show that we can build such a classifier in different situations: when there are one or more observed confounding variables, when there is one noisily predicted confounding variable, or when the confounding variable is unknown but can be detected through topic modeling.
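For the last setting mentioned above, where the confounder is unknown, a hedged sketch of how a topic model could supply a proxy confounder is shown below; the use of scikit-learn's LDA and the dominant-topic heuristic are assumptions for illustration, not the thesis's exact procedure.

```python
# Hedged sketch: using topics as a proxy for an unknown confounder.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def infer_proxy_confounder(train_texts, n_topics=10):
    """Fit a topic model and treat each document's dominant topic as a stand-in
    confounder z, which can then be fed to a back-door-adjusted classifier."""
    vec = CountVectorizer(min_df=2, stop_words="english")
    X = vec.fit_transform(train_texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(X)
    z_hat = lda.transform(X).argmax(axis=1)   # dominant topic per document
    return vec, lda, z_hat
```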
Using Matched Samples to Estimate the Effects of Exercise on Mental Health via Twitter
Recent work has demonstrated the value of social media monitoring for health surveillance (e.g., tracking influenza or depression rates). It is an open question whether such data can be used to make causal inferences (e.g., determining which activities lead to increased depression rates). Even in traditional, restricted domains, estimating causal effects from observational data is highly susceptible to confounding bias. In this work, we estimate the effect of exercise on mental health from Twitter, relying on statistical matching methods to reduce confounding bias. We train a text classifier to estimate the volume of a user's tweets expressing anxiety, depression, or anger, then compare two groups: those who exercise regularly (identified by their use of physical activity trackers like Nike+), and a matched control group. We find that those who exercise regularly have significantly fewer tweets expressing depression or anxiety; there is no significant difference in rates of tweets expressing anger. We additionally perform a sensitivity analysis to investigate how the many experimental design choices in such a study, including the quality of the classifier and the construction of the control group, impact the final conclusions.
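A minimal sketch of the matched-sample comparison follows, assuming each user is summarized by a covariate vector and by the fraction of their tweets the classifier labels as expressing depression (or anxiety/anger); the 1-nearest-neighbor matching and the paired t-test are illustrative choices, not necessarily the study's exact design.

```python
# Hedged sketch of a matched-sample comparison (illustrative, not the authors' pipeline).
import numpy as np
from scipy import stats
from sklearn.neighbors import NearestNeighbors

def matched_effect(cov_treated, cov_pool, out_treated, out_pool):
    """cov_*: (n, d) covariate arrays; out_*: (n,) per-user outcome rates,
    e.g., the fraction of tweets classified as expressing depression."""
    nn = NearestNeighbors(n_neighbors=1).fit(cov_pool)
    _, idx = nn.kneighbors(cov_treated)            # match each exerciser to a control user
    matched_out = out_pool[idx.ravel()]
    diff = out_treated.mean() - matched_out.mean() # estimated difference in outcome rates
    t_stat, p_value = stats.ttest_rel(out_treated, matched_out)
    return diff, p_value
```

Varying the matching covariates, the classifier threshold, and the control-group construction and re-running this comparison is one simple way to carry out the kind of sensitivity analysis the abstract describes.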