Can you Trust the Trend: Discovering Simpson's Paradoxes in Social Data
We investigate how Simpson's paradox affects analysis of trends in social
data. According to the paradox, the trends observed in data that has been
aggregated over an entire population may be different from, and even opposite
to, those of the underlying subgroups. Failure to take this effect into account
can lead an analysis to the wrong conclusions. We present a statistical method to
automatically identify Simpson's paradox in data by comparing statistical
trends in the aggregate data to those in the disaggregated subgroups. We apply
the approach to data from Stack Exchange, a popular question-answering
platform, to analyze factors affecting answerer performance, specifically, the
likelihood that an answer written by a user will be accepted by the asker as
the best answer to his or her question. Our analysis confirms a known Simpson's
paradox and identifies several new instances. These paradoxes provide novel
insights into user behavior on Stack Exchange.
Comment: to appear in the Proceedings of WSDM-201
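
The comparison of aggregate and subgroup trends described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it substitutes a plain least-squares slope for whatever trend model the paper fits, and the names `slope`, `sign`, and `is_simpsons_paradox`, as well as the record layout `(group, x, y)`, are hypothetical choices made for illustration only. It flags a paradox when the aggregate trend has one sign and every subgroup trend has the opposite sign.

```python
from collections import defaultdict


def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        # No variation in x: treat the trend as flat.
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def is_simpsons_paradox(records):
    """Check (group, x, y) records for a trend reversal.

    Assumption for this sketch: a paradox means the aggregate slope is
    nonzero and every subgroup slope has the opposite sign.
    """
    records = list(records)
    aggregate = sign(slope([(x, y) for _, x, y in records]))

    groups = defaultdict(list)
    for group, x, y in records:
        groups[group].append((x, y))

    subgroup_signs = [sign(slope(pts)) for pts in groups.values()]
    return aggregate != 0 and all(s == -aggregate for s in subgroup_signs)


# Toy data (invented for illustration): y falls with x inside each group,
# but group "B" sits higher on both axes, so the pooled trend rises.
toy = [
    ("A", 1, 5), ("A", 2, 4), ("A", 3, 3),
    ("B", 6, 10), ("B", 7, 9), ("B", 8, 8),
]
print(is_simpsons_paradox(toy))
```

Requiring every subgroup to reverse is a deliberately strict criterion; a practical check would also need a significance test on each fitted trend, which this sketch omits.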