Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls
Large-scale databases of human activity in social media have captured
scientific and policy attention, producing a flood of research and discussion.
This paper considers methodological and conceptual challenges for this emergent
field, with special attention to the validity and representativeness of social
media big data analyses. Persistent issues include the over-emphasis on a single platform, Twitter; sampling biases arising from selection by hashtags; and vague and unrepresentative sampling frames. The socio-cultural complexity of user behavior aimed at algorithmic invisibility (such as subtweeting, mock-retweeting, and the use of "screen captures" for text) further complicates the interpretation of social media big data. Other challenges include accounting for field effects, i.e., broadly consequential events that do not diffuse only through the network under study but affect society as a whole. The application
of network methods from other fields to the study of human social activity may
not always be appropriate. The paper concludes with a call to action on
practical steps to improve our analytic capacity in this promising,
rapidly-growing field.
Comment: Tufekci, Zeynep. (2014). Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls. In ICWSM '14: Proceedings of the 8th International AAAI Conference on Weblogs and Social Media, 2014. [forthcoming]
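The hashtag-sampling pitfall the paper names is easy to make concrete. The sketch below is a hypothetical simulation, not taken from the paper; the distributions, parameters, and the assumed link between activity and hashtag use are all illustrative. It shows that when hashtag use correlates with posting activity, a hashtag-selected sample misestimates any trait that also correlates with activity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: 100,000 users with heavy-tailed activity
# (a few users tweet constantly; most tweet rarely).
n_users = 100_000
activity = rng.pareto(a=2.0, size=n_users) + 1.0   # tweets per period

# Suppose the trait we want to measure is mildly correlated with activity.
opinion = 0.3 * np.log(activity) + rng.normal(size=n_users)

# Sampling by hashtag selects on activity: active users are far more
# likely to emit at least one tweet carrying the tracked hashtag.
p_uses_hashtag = 1.0 - np.exp(-0.05 * activity)
in_sample = rng.random(n_users) < p_uses_hashtag

print(f"population mean opinion:     {opinion.mean():+.3f}")
print(f"hashtag-sample mean opinion: {opinion[in_sample].mean():+.3f}")
# The hashtag sample systematically overstates the trait of the most
# active users, which is the sampling-frame bias the paper warns about.
```

No reweighting of the sample can fix this unless the selection mechanism itself is modeled, which is rarely possible with hashtag-harvested data.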
Recommendations and User Agency: The Reachability of Collaboratively-Filtered Information
Recommender systems often rely on models which are trained to maximize
accuracy in predicting user preferences. When the systems are deployed, these
models determine the availability of content and information to different
users. The gap between these objectives gives rise to a potential for
unintended consequences, contributing to phenomena such as filter bubbles and
polarization. In this work, we consider directly the information availability
problem through the lens of user recourse. Using ideas of reachability, we
propose a computationally efficient audit for top-N linear recommender
models. Furthermore, we describe the relationship between model complexity and
the effort necessary for users to exert control over their recommendations. We
use this insight to provide a novel perspective on the user cold-start problem.
Finally, we demonstrate these concepts with an empirical investigation of a
state-of-the-art model trained on a widely used movie ratings dataset.
Comment: appeared at FAccT '20
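The paper derives a computationally efficient audit from the model's linear structure; the sketch below is not that algorithm, but a brute-force illustration of the underlying reachability question for a hypothetical item-based linear model. The weight matrix, the discrete rating levels, and all names are assumptions for illustration only:

```python
import itertools
import numpy as np

def reachable_items(W, ratings, mutable, n_top=5, levels=(0.0, 2.5, 5.0)):
    """Brute-force reachability check for an item-based linear recommender.

    W       : (n_items, n_items) item-item weights, scores = ratings @ W
    ratings : (n_items,) current rating vector (0 = unrated)
    mutable : indices of items whose ratings the user may change
    Returns the set of items that can appear in the user's top-n_top for
    at least one assignment of rating levels to the mutable items.
    """
    reachable = set()
    # Enumerate every assignment of discrete rating levels to mutable items.
    for combo in itertools.product(levels, repeat=len(mutable)):
        r = ratings.copy()
        r[list(mutable)] = combo
        scores = r @ W
        scores[r > 0] = -np.inf  # never re-recommend already-rated items
        top = np.argpartition(scores, -n_top)[-n_top:]
        reachable.update(top.tolist())
    return reachable

# Tiny synthetic example (all values illustrative).
rng = np.random.default_rng(1)
n_items = 20
W = rng.normal(size=(n_items, n_items))
ratings = np.zeros(n_items)
ratings[[0, 3]] = [4.0, 5.0]  # items the user has already rated
print(reachable_items(W, ratings, mutable=[1, 2], n_top=3))
```

Any item outside the returned set cannot be surfaced no matter how the user edits those ratings; that gap between the full catalog and the reachable set is the sense in which reachability measures user recourse.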
POTs: Protective Optimization Technologies
Algorithmic fairness aims to address the economic, moral, social, and
political impact that digital systems have on populations through solutions
that can be applied by service providers. Fairness frameworks do so, in part,
by mapping these problems to a narrow definition and assuming the service
providers can be trusted to deploy countermeasures. Not surprisingly, these
decisions limit fairness frameworks' ability to capture a variety of harms
caused by systems.
We characterize fairness limitations using concepts from requirements
engineering and from social sciences. We show that the focus on algorithms'
inputs and outputs misses harms that arise from systems interacting with the
world; that the focus on bias and discrimination omits broader harms on
populations and their environments; and that relying on service providers
excludes scenarios where they are not cooperative or intentionally adversarial.
We propose Protective Optimization Technologies (POTs). POTs provide means
for affected parties to address the negative impacts of systems in the
environment, expanding avenues for political contestation. POTs intervene from
outside the system, do not require service providers to cooperate, and can
serve to correct, shift, or expose harms that systems impose on populations and
their environments. We illustrate the potential and limitations of POTs in two
case studies: countering road congestion caused by traffic-beating
applications, and recalibrating credit scoring for loan applicants.
Comment: Appears in Conference on Fairness, Accountability, and Transparency (FAT* 2020). Bogdan Kulynych and Rebekah Overdorf contributed equally to this work. Version v1/v2 by Seda Gürses, Rebekah Overdorf, and Ero Balsa was presented at HotPETS 2018 and at PiMLAI 2018.
Let's Talk about Race: Identity, Chatbots, and AI
Why is it so hard for chatbots to talk about race? This work explores how the biased contents of databases, the syntactic focus of natural language processing, and the opaque nature of deep learning algorithms make race-talk difficult for chatbots to handle. In each of these areas, the tensions between race and chatbots create new opportunities for people and machines. By making the abstract and disparate qualities of this problem space tangible, we can develop chatbots that are more capable of handling race-talk in its many forms. Our goal is to provide the HCI community with ways to begin addressing the question: how can chatbots handle race-talk in new and improved ways?
What were the historical reasons for the resistance to recognizing airborne transmission during the COVID-19 pandemic?
The question of whether SARS-CoV-2 is mainly transmitted by droplets or aerosols has been highly controversial. We sought to explain this controversy through a historical analysis of transmission research in other diseases. For most of human history, the dominant paradigm was that many diseases were carried by the air, often over long distances and in a phantasmagorical way. This miasmatic paradigm was challenged in the mid to late 19th century with the rise of germ theory, and as diseases such as cholera, puerperal fever, and malaria were found to actually transmit in other ways. Motivated by his views on the importance of contact/droplet infection, and the resistance he encountered from the remaining influence of miasma theory, prominent public health official Charles Chapin in 1910 helped initiate a successful paradigm shift, deeming airborne transmission most unlikely. This new paradigm became dominant. However, the lack of understanding of aerosols led to systematic errors in the interpretation of research evidence on transmission pathways. For the next five decades, airborne transmission was considered of negligible or minor importance for all major respiratory diseases, until a demonstration of airborne transmission of tuberculosis (which had been mistakenly thought to be transmitted by droplets) in 1962. The contact/droplet paradigm remained dominant, and only a few diseases were widely accepted as airborne before COVID-19: those that were clearly transmitted to people not in the same room. The acceleration of interdisciplinary research inspired by the COVID-19 pandemic has shown that airborne transmission is a major mode of transmission for this disease, and is likely to be significant for many respiratory infectious diseases
Facebook, Youth and Privacy in Networked Publics
Media accounts would have us believe that today's youth are a particularly narcissistic generation. Young adults are often portrayed as exhibitionists who share personal information excessively and only react if "burned" by experience. This paper reports results from 450 surveys of young adults on social network site usage and privacy and surveillance experiences, as well as from a historical archive dating back to 2006. The findings show a complex picture of a generation actively negotiating visibility and social boundaries online through privacy and visibility practices. A striking increase in privacy-protective activities is documented. I examine whether these changes are responses to negative personal experiences of online disclosure or derive from general awareness. I find that students act proactively, adjusting their privacy settings above and beyond the impact of negative personal experiences. Contrary to media reports, young adults do not appear uncaring about privacy and are not waiting until they get burned. Significant racial and gender differences remain in privacy behaviors. Strikingly, about 20% report having deactivated their profile at least once
Engineering the public: Big data, surveillance and computational politics
Digital technologies have given rise to a new combination of big data and computational practices that allow for massive, latent data collection and sophisticated computational modeling. These practices increase the capacity of those with resources and access to carry out highly effective, opaque and unaccountable campaigns of persuasion and social engineering in political, civic and commercial spheres. I examine six intertwined dynamics that pertain to the rise of computational politics: the rise of big data; the shift away from demographics to individualized targeting; the opacity and power of computational modeling; the use of persuasive behavioral science; digital media enabling dynamic, real-time experimentation; and the growth of new power brokers who own the data or social media environments. I then examine the consequences of these new mechanisms for the public sphere and political campaigns