249 research outputs found

Wikipedia vandalism detection: combining natural language, metadata, and reputation features

Author: B. Adler
G. Druck
K. Smets
L. Breiman
M. Potthast
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Wikipedia is an online encyclopedia which anyone can edit. While most edits are constructive, about 7% are acts of vandalism. Such behavior is characterized by modifications made in bad faith, such as introducing spam and other inappropriate content. In this work, we present the results of an effort to integrate three of the leading approaches to Wikipedia vandalism detection: a spatio-temporal analysis of metadata (STiki), a reputation-based system (WikiTrust), and natural language processing features. The performance of the resulting joint system improves on the state of the art set by all previous methods and establishes a new baseline for Wikipedia vandalism detection. We examine in detail the contribution of the three approaches, both for the task of discovering fresh vandalism and for the task of locating vandalism in the complete set of Wikipedia revisions.

The authors from Universitat Politècnica de València also thank the MICINN research project TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 (Plan I+D+i). UPenn contributions were supported in part by ONR MURI N00014-07-1-0907. This research was partially supported by award 1R01GM089820-01A1 from the National Institute Of General Medical Sciences, and by ISSDM, a UCSC-LANL educational collaboration.

Adler, BT.; Alfaro, LD.; Mola Velasco, SM.; Rosso, P.; West, AG. (2011). Wikipedia vandalism detection: combining natural language, metadata, and reputation features. In Computational Linguistics and Intelligent Text Processing. Springer Verlag (Germany). 6609:277-288. https://doi.org/10.1007/978-3-642-19437-5_23

CiteSeerX

Crossref

RiuNet

ScholarlyCommons@Penn

Development of a low cost robot system for autonomous measuring of spatial field distributions

Author: B. Schetelig
S. Dickmann
S. Parr
S. Potthast
Publication venue: Copernicus Publications
Publication date: 01/09/2012

A new kind of modular multi-purpose robot system is developed to measure the spatial field distributions of very large areas as well as of small and crowded ones. The probe is automatically placed at a number of pre-defined positions where measurements are carried out. The advantages of this system are its very low influence on the measured field and its wide range of possible applications. In addition, the initial costs are quite low. In this paper, the theory underlying the measurement principle is explained, the accuracy is analyzed, and sample measurements are presented.

Crossref

Directory of Open Access Journals

A Decade of Shared Tasks in Digital Text Forensics at PAN

Author: B Stein
E Amigó
E Stamatatos
H Asghari
J Holmes
JW Pennebaker
M Koppel
M Potthast
O Halvani
P Rosso
S Argamon
T Gollub
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Digital text forensics aims at examining the originality and credibility of information in electronic documents and, in this regard, at extracting and analyzing information about the authors of these documents. The research field has developed substantially during the last decade. PAN is a series of shared tasks that started in 2009 and has contributed significantly to drawing the attention of the research community to well-defined digital text forensics tasks. Several benchmark datasets have been developed to assess the state-of-the-art performance in a wide range of tasks. In this paper, we present the evolution of both the examined tasks and the developed datasets during the last decade. We also briefly introduce the upcoming PAN 2019 shared tasks.

We are indebted to many colleagues and friends who contributed greatly to PAN's tasks: Maik Anderka, Shlomo Argamon, Alberto Barrón-Cedeño, Fabio Celli, Fabio Crestani, Walter Daelemans, Andreas Eiselt, Tim Gollub, Parth Gupta, Matthias Hagen, Teresa Holfeld, Patrick Juola, Giacomo Inches, Mike Kestemont, Moshe Koppel, Manuel Montes-y-Gómez, Aurelio Lopez-Lopez, Francisco Rangel, Miguel Angel Sánchez-Pérez, Günther Specht, Michael Tschuggnall, and Ben Verhoeven. Our special thanks go to PAN's sponsors throughout the years and not least to the hundreds of participants.

Potthast, M.; Rosso, P.; Stamatatos, E.; Stein, B. (2019). A Decade of Shared Tasks in Digital Text Forensics at PAN. Lecture Notes in Computer Science. 11438:291-300. https://doi.org/10.1007/978-3-030-15719-7_39

Crossref

RiuNet

Overview of the 2nd international competition on plagiarism detection

Author: Barron-Cedeno A.
Eiselt A.
Potthast M.
Rosso P.
Stein B.
Publication venue: CEUR-WS
Publication date: 01/01/2010

This paper overviews 18 plagiarism detectors that have been developed and evaluated within PAN'10. We start with a unified retrieval process that summarizes the best practices employed this year. Then, the detectors' performances are evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length. Finally, all results are compared to those of last year's competition.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Overview of the 3rd international competition on plagiarism detection

Author: Barron-Cedeno A.
Eiselt A.
Potthast M.
Rosso P.
Stein B.
Publication venue: CEUR-WS
Publication date: 01/01/2011

This paper overviews eleven plagiarism detectors that have been developed and evaluated within PAN'11. We survey the detection approaches developed for the two sub-tasks "external plagiarism detection" and "intrinsic plagiarism detection," and we report on their detailed evaluation based on the third revised edition of the PAN plagiarism corpus, PAN-PC-11.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Overview of the 1st international competition on plagiarism detection

Author: Barron-Cedeno A.
Eiselt A.
Potthast M.
Rosso P.
Stein B.
Publication venue
Publication date: 01/01/2009

The 1st International Competition on Plagiarism Detection, held in conjunction with the 3rd PAN workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse, brought together researchers from many disciplines around the exciting retrieval task of automatic plagiarism detection. The competition was divided into the subtasks external plagiarism detection and intrinsic plagiarism detection, which were tackled by 13 participating groups. An important by-product of the competition is an evaluation framework for plagiarism detection, which consists of a large-scale plagiarism corpus and detection quality measures. The framework may serve as a unified test environment to compare future plagiarism detection research. In this paper, we describe the corpus design and the quality measures, survey the detection approaches developed by the participants, and compile the achieved performance results of the competitors.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

THE EFFECT OF DIFFERENT FOOTWEAR ON THE MYOELECTRIC ACTIVITY OF M. TIBIALIS POSTERIOR DURING TREADMILL RUNNING

Author: Brüggemann G.-P.
Goldmann J.-P.
Lersch C.
Potthast W.
Segesser B.
Publication venue: International Society of Biomechanics in Sports (ISBS)
Publication date: 02/11/2007

Overload running injuries of the lower extremity, particularly the knee, are associated with excessive pronation of the foot resulting in tibial rotation (Nigg et al., 1995). M. tibialis posterior (TP) has been shown to have an active influence on pronation and the medial longitudinal arch (Kaye & Jahss, 1991). Its functional role during running and its interaction with footwear are still not clearly understood (Reber et al., 1993; O’Connor & Hamill, 2004). Therefore, the purpose of this study is to investigate the influence of different footwear on the muscle’s EMG pattern.

ISBS (International Society of Biomechanics in Sports): Conference Proceedings Archive

WIRE EMG OF FLEXOR HALLUCIS LONGUS DURING BAREFOOT AND SHOD RUNNING ON A TREADMILL: A PILOT STUDY

Author: Brüggemann G.-P.
Goldmann J.-P.
Lersch C.
Potthast W.
Segesser B.
Publication venue: International Society of Biomechanics in Sports (ISBS)
Publication date: 31/10/2007

Excessive pronation is associated with overload injuries of the lower extremity (Nigg, 1995). The flexor hallucis longus (FHL) acts against the pronation of the calcaneus (Klein, 1996). The influence of different footwear on the activity of the FHL has been measured in neither walking nor running. The purpose of this study was to investigate the activity of the FHL during different phases of stance in walking and running under different footwear conditions.

ISBS (International Society of Biomechanics in Sports): Conference Proceedings Archive

Simplified modeling of EM field coupling to complex cable bundles

Author: B. Schetelig
J. Keghie
L.-O. Fichte
R. Kanyou Nana
S. Dickmann
S. Potthast
Publication venue: Copernicus Publications
Publication date: 01/10/2010

In this contribution, the "Equivalent Cable Bundle Method" is used to simplify large cable bundles, and it is extended to differential signal lines. The main focus is on the reduction of twisted-pair cables. Furthermore, the process presented here makes it possible to take into account cables whose wires are situated quite close to each other. The procedure is based on a new approach to calculating the geometry of the simplified cable and uses the fact that the line parameters do not uniquely correspond to a certain geometry. For this reason, an optimization algorithm is applied.

Crossref

Directory of Open Access Journals