The free encyclopaedia that anyone can edit: the shifting values of Wikipedia editors
Wikipedia is often held up as an example of the potential of the internet to foster open, free, and non-commercial collaboration. However, such discourses often conflate these values without recognising how they play out in reality in a peer-production community. As Wikipedia evolves, it is an ideal time to examine these discourses and the tensions between its initial ideals and the reality of commercial activity in the encyclopaedia. Through an analysis of three failed proposals to ban paid advocacy editing in the English-language Wikipedia, this paper highlights the shift in values from the early editorial community, which forked encyclopaedic content over the threat of commercialisation, to one that today values the freedom that allows anyone to edit the encyclopaedia.
Wikipedia and the politics of mass collaboration
Working together to produce socio-technological objects, based on emergent platforms
of economic production, is of great importance in the task of political transformation and the
creation of new subjectivities. Increasingly, “collaboration” has become a veritable buzzword
used to describe the human associations that create such new media objects. In the language
of “Web 2.0”, “participatory culture”, “user-generated content”, “peer production” and
the “produser”, first and foremost we are all collaborators. In this paper I investigate recent
literature that stresses the collaborative nature of Web 2.0, and in particular, works that
address the nascent processes of peer production. I contend that this material positions such
projects as what Chantal Mouffe has described as the “post-political”; a fictitious space far
divorced from the clamour of the everyday. I analyse one Wikipedia entry to demonstrate the
distance between this post-political discourse of collaboration and the realities it describes,
and finish by arguing for a more politicised notion of collaboration.
What to do about non-standard (or non-canonical) language in NLP
Real world data differs radically from the benchmark corpora we use in
natural language processing (NLP). As soon as we apply our technologies to the
real world, performance drops. The reason for this problem is obvious: NLP
models are trained on samples from a limited set of canonical varieties that
are considered standard, most prominently English newswire. However, there are
many dimensions (e.g., socio-demographics, language, genre, sentence type)
on which texts can differ from the standard. The solution is not obvious: we
cannot control for all factors, and it is not clear how to best go beyond the
current practice of training on homogeneous data from a single domain and
language.
In this paper, I review the notion of canonicity, and how it shapes our
community's approach to language. I argue for leveraging what I call fortuitous
data, i.e., non-obvious data that is hitherto neglected, hidden in plain sight,
or raw data that needs to be refined. If we embrace the variety of this
heterogeneous data by combining it with proper algorithms, we will not only
produce more robust models, but will also enable adaptive language technology
capable of addressing natural language variation.
Comment: KONVENS 201
Multilingual search for cultural heritage archives via combining multiple translation resources
Material in Cultural Heritage (CH) archives may be in a variety of languages, requiring a facility for effective multilingual search. The specialised language often associated with CH content introduces problems for the automatic translation needed to support search applications. The MultiMatch project focuses on enabling users to interact with CH content across different media types and languages. We present results from a MultiMatch study exploring various translation techniques for the CH domain. Our experiments examine translation techniques for the English-language CLEF 2006 Cross-Language Speech Retrieval (CL-SR) task using Spanish, French, and German queries. Results compare the effectiveness of our query translation against a monolingual baseline and show improvement when a domain-specific translation lexicon is combined with a standard machine translation system.
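The lexicon-plus-MT combination described above can be sketched as a simple term-level fallback strategy. This is a hypothetical illustration, not the MultiMatch implementation: the lexicon entries and the generic translator below are invented stand-ins.

```python
# Hypothetical sketch of combining a domain-specific translation lexicon
# with a generic MT fallback for query translation. All dictionary entries
# here are illustrative, not MultiMatch resources.

def translate_query(query, domain_lexicon, fallback_translate):
    """Translate each query term, preferring the domain lexicon.

    Terms found in the curated cultural-heritage lexicon use its precise
    translation; all remaining terms fall back to a generic MT system.
    """
    out = []
    for term in query.lower().split():
        if term in domain_lexicon:
            out.append(domain_lexicon[term])      # precise CH-domain term
        else:
            out.append(fallback_translate(term))  # generic MT fallback
    return " ".join(out)

# Toy Spanish->English example (translations are illustrative).
lexicon = {"retablo": "altarpiece", "codice": "codex"}
generic = {"antiguo": "old", "del": "of-the", "museo": "museum"}

print(translate_query("retablo antiguo del museo", lexicon,
                      lambda t: generic.get(t, t)))
# -> altarpiece old of-the museum
```

The design point is that the domain lexicon wins ties: specialised CH vocabulary is exactly where a general-purpose MT system is most likely to mistranslate.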
Global disease monitoring and forecasting with Wikipedia
Infectious disease is a leading threat to public health, economic stability,
and other key social structures. Efforts to mitigate these impacts depend on
accurate and timely monitoring to measure the risk and progress of disease.
Traditional, biologically-focused monitoring techniques are accurate but costly
and slow; in response, new techniques based on social internet data such as
social media and search queries are emerging. These efforts are promising, but
important challenges in the areas of scientific peer review, breadth of
diseases and countries, and forecasting hamper their operational usefulness.
We examine a freely available, open data source for this use: access logs
from the online encyclopedia Wikipedia. Using linear models, language as a
proxy for location, and a systematic yet simple article selection procedure, we
tested 14 location-disease combinations and demonstrate that these data
feasibly support an approach that overcomes these challenges. Specifically, our
proof-of-concept yields models with r^2 up to 0.92, forecasting value up to
the 28 days tested, and several pairs of models similar enough to suggest that
transferring models from one location to another without re-training is
feasible.
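The kind of linear model described above can be sketched as an ordinary least-squares fit from Wikipedia article view counts to reported case counts, which is then used to estimate incidence from views alone. The numbers below are synthetic illustrations, not the paper's data.

```python
# Minimal sketch of regressing disease case counts on Wikipedia access-log
# view counts, in the spirit of the linear models described above.
# All data points are synthetic, for illustration only.

def fit_line(x, y):
    """Ordinary least squares for y = a*x + b with a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return a, my - a * mx

# Weekly article views (proxy signal) and officially reported cases.
views = [1200, 1500, 2100, 3000, 4200]
cases = [30, 40, 55, 80, 110]

a, b = fit_line(views, cases)
estimate = a * 2500 + b   # nowcast cases for a week with 2500 views
print(round(estimate, 1))
```

A real deployment would add lagged view counts to obtain forecasting rather than nowcasting, and would select articles systematically per language, as the paper's procedure does.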
Based on these preliminary results, we close with a research agenda designed
to overcome these challenges and produce a disease monitoring and forecasting
system that is significantly more effective, robust, and globally comprehensive
than the current state of the art.
Comment: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein
and adjust novelty claims accordingly; revise title; various revisions for
clarity