Beyond opening up the black box: Investigating the role of algorithmic systems in Wikipedian organizational culture
Scholars and practitioners across domains are increasingly concerned with
algorithmic transparency and opacity, interrogating the values and assumptions
embedded in automated, black-boxed systems, particularly in user-generated
content platforms. I report from an ethnography of infrastructure in Wikipedia
to discuss an often understudied aspect of this topic: the local, contextual,
learned expertise involved in participating in a highly automated
social-technical environment. Today, the organizational culture of Wikipedia is
deeply intertwined with various data-driven algorithmic systems, which
Wikipedians rely on to help manage and govern the "anyone can edit"
encyclopedia at a massive scale. These bots, scripts, tools, plugins, and
dashboards make Wikipedia more efficient for those who know how to work with
them, but like all organizational culture, newcomers must learn them if they
want to fully participate. I illustrate how cultural and organizational
expertise is enacted around algorithmic agents by discussing two
autoethnographic vignettes, which relate my personal experience as a veteran in
Wikipedia. I present thick descriptions of how governance and gatekeeping
practices are articulated through and in alignment with these automated
infrastructures. Over the past 15 years, Wikipedian veterans and administrators
have made specific decisions to support administrative and editorial workflows
with automation in particular ways and not others. I use these cases of
Wikipedia's bot-supported bureaucracy to discuss several issues in the fields
of critical algorithms studies, critical data studies, and fairness,
accountability, and transparency in machine learning -- most principally
arguing that scholarship and practice must go beyond trying to "open up the
black box" of such systems and also examine sociocultural processes like
newcomer socialization. Comment: 14 pages, typo fixed in v
A Comparative analysis: QA evaluation questions versus real-world queries
This paper presents a comparative analysis of user queries to a web search engine, questions to a Q&A service (answers.com), and questions employed in question answering (QA) evaluations at TREC and CLEF. The analysis shows that user queries to search engines contain mostly content words (i.e. keywords) but lack structure words (i.e. stopwords) and capitalization. Thus, they resemble natural language input after case folding and stopword removal. In contrast, topics for QA evaluation and questions to answers.com mainly
consist of fully capitalized and syntactically well-formed questions. Classification experiments using a naïve Bayes classifier show that stopwords play an important role in determining the expected answer type. A classification based on stopwords is considerably more accurate (47.5% accuracy) than a classification based on all query words (40.1% accuracy) or on content words (33.9% accuracy). To
simulate user input, questions are preprocessed by case folding and stopword removal. Additional classification experiments aim at reconstructing the syntactic wh-word frame of a question, i.e. the embedding of the interrogative word. Results indicate that this part of
questions can be reconstructed with moderate accuracy (25.7%), although this is a classification problem with a much larger number of classes than classifying queries by expected answer type (2096 classes vs. 130 classes). Furthermore, eliminating stopwords can lead to multiple reconstructed questions with a different or even the opposite meaning (e.g. if negations or temporal restrictions are included). In conclusion, question reconstruction from short user queries can be seen as a new realistic evaluation challenge for QA systems.
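The preprocessing step described in this abstract (case folding followed by stopword removal) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `STOPWORDS` set, the tokenization rule, and the function name `simulate_query` are all assumptions made for the example, since the abstract does not specify them.

```python
import re

# Hypothetical, deliberately small stopword list. The list actually used
# in the paper is not given in the abstract.
STOPWORDS = {
    "a", "an", "the", "is", "was", "are", "of", "in", "on", "to",
    "what", "who", "when", "where", "which", "how",
    "did", "does", "do", "not",
}


def simulate_query(question: str) -> str:
    """Turn a well-formed question into a keyword-style query.

    Case folding lowercases the text; stopword removal then drops every
    token found in STOPWORDS. Punctuation is discarded by the tokenizer.
    """
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)


if __name__ == "__main__":
    print(simulate_query("Who was the first president of the United States?"))
    # A negated and a non-negated question collapse to the same query,
    # which is the ambiguity the abstract points out.
    print(simulate_query("Which countries did not sign the treaty?"))
    print(simulate_query("Which countries did sign the treaty?"))
```

With this assumed stopword list, the negated and non-negated treaty questions reduce to the same keyword string, which shows why reconstructing the original question from such a query can yield a question with the opposite meaning.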
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the worldwide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions. Comment: Accepted to TKDE. 30 pages, 1 figure
Contextual question answering for the health domain
Studies have shown that natural language interfaces such as question answering and conversational systems allow information to be accessed and understood more easily by users who are unfamiliar with the nuances of the delivery mechanisms (e.g., keyword-based search engines) or have limited literacy in certain domains (e.g., unable to comprehend health-related content due to terminology barriers). In particular, the increasing use of the web for health information prompts us to reexamine our existing delivery mechanisms. We present enquireMe, a contextual question answering system that lets lay users obtain responses about a wide range of health topics by expressing their information needs vaguely at first and gradually refining them in natural language over the course of an interaction session. enquireMe allows users to engage in 'conversations' about their health concerns, a process that can be therapeutic in itself. The system uses community-driven question-answer pairs from the web together with a decay model to deliver the top-scoring answers as responses to the users' unrestricted inputs. We evaluated enquireMe using benchmark data from WebMD and TREC to assess the accuracy of system-generated answers. Despite the absence of complex knowledge acquisition and deep language processing, enquireMe is comparable to state-of-the-art question answering systems such as START as well as the interactive systems from TREC.