259 research outputs found
Wikipedia vandalism detection: combining natural language, metadata, and reputation features
Wikipedia is an online encyclopedia which anyone can edit.
While most edits are constructive, about 7% are acts of vandalism. Such
behavior is characterized by modifications made in bad faith, such as
introducing spam and other inappropriate content.
In this work, we present the results of an effort to integrate three of the
leading approaches to Wikipedia vandalism detection: a spatio-temporal
analysis of metadata (STiki), a reputation-based system (WikiTrust),
and natural language processing features. The performance of the resulting
joint system improves on the state of the art set by all previous methods
and establishes a new baseline for Wikipedia vandalism detection. We
examine in detail the contribution of the three approaches, both for the
task of discovering fresh vandalism, and for the task of locating vandalism
in the complete set of Wikipedia revisions.
The authors from Universitat Politècnica de València also thank the MICINN research project TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 (Plan I+D+i). UPenn contributions were supported in part by ONR MURI N00014-07-1-0907. This research was partially supported by award 1R01GM089820-01A1 from the National Institute of General Medical Sciences, and by ISSDM, a UCSC-LANL educational collaboration.
Adler, BT.; Alfaro, LD.; Mola Velasco, SM.; Rosso, P.; West, AG. (2011). Wikipedia vandalism detection: combining natural language, metadata, and reputation features. In Computational Linguistics and Intelligent Text Processing. Springer Verlag (Germany). 6609:277-288. https://doi.org/10.1007/978-3-642-19437-5_23
Observation error statistics for Doppler radar radial wind superobservations assimilated into the DWD COSMO-KENDA system
Currently in operational numerical weather prediction (NWP) the density of high-resolution observations, such as Doppler radar radial winds (DRWs), is severely reduced, in part to avoid violating the assumption of uncorrelated observation errors. Improving both the quantity of observations used and the impact that they have on the forecast requires an accurate specification of the observation uncertainties. Observation uncertainties can be estimated using a simple diagnostic that utilises the statistical averages of observation-minus-background and observation-minus-analysis residuals. We are the first to use a modified form of the diagnostic to estimate spatial correlations for observations used in an operational ensemble data assimilation system. The uncertainties for DRW superobservations assimilated into the Deutscher Wetterdienst convection-permitting NWP model are estimated and compared to previous uncertainty estimates for DRWs. The new results show that most diagnosed standard deviations are smaller than those used in the assimilation, hence it may be feasible to assimilate DRWs using reduced error standard deviations. However, some of the estimated standard deviations are considerably larger than those used in the assimilation; these large errors highlight areas where the observation processing system may be improved. The error correlation length scales are larger than the observation separation distance and are influenced by both the superobbing procedure and the observation operator. This is supported by comparing these results to our previous study using Met Office data. Our results suggest that DRW error correlations may be reduced by improving the superobbing procedure and observation operator; however, any remaining correlations should be accounted for in the assimilation.
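The diagnostic described in this abstract averages products of observation-minus-background and observation-minus-analysis residuals. A minimal pure-Python sketch of that averaging follows. The function names, the list-of-lists data layout, the assumption that the residuals are already bias-corrected, and the final symmetrization step are illustrative assumptions, not details taken from the paper, and the modified form used by the authors is not reproduced here.

```python
import math


def estimate_obs_error_covariance(omb_samples, oma_samples):
    """Average the outer product (O-A)(O-B)^T over all samples.

    omb_samples, oma_samples: lists of equal length; each entry is a list
    holding one residual per observation (hypothetical layout).
    """
    if len(omb_samples) != len(oma_samples) or not omb_samples:
        raise ValueError("need matching, non-empty residual samples")
    n_samples = len(omb_samples)
    n_obs = len(omb_samples[0])
    cov = [[0.0] * n_obs for _ in range(n_obs)]
    for d_b, d_a in zip(omb_samples, oma_samples):
        for i in range(n_obs):
            for j in range(n_obs):
                cov[i][j] += d_a[i] * d_b[j] / n_samples
    # Assumption: symmetrize, since the raw estimate need not be symmetric.
    return [[0.5 * (cov[i][j] + cov[j][i]) for j in range(n_obs)]
            for i in range(n_obs)]


def standard_deviations(cov):
    """Square roots of the diagonal; negative estimates are clipped to zero."""
    return [math.sqrt(max(cov[i][i], 0.0)) for i in range(len(cov))]
```

Dividing each off-diagonal entry by the product of the two corresponding standard deviations would give error correlations, which can then be binned by observation separation distance to read off a correlation length scale.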
Monte Carlo Procedure for Protein Design
A new method for sequence optimization in protein models is presented. The
approach, which has inherited its basic philosophy from recent work by Deutsch
and Kurosky [Phys. Rev. Lett. 76, 323 (1996)] by maximizing conditional
probabilities rather than minimizing energy functions, is based upon a novel
and very efficient multisequence Monte Carlo scheme. By construction, the
method ensures that the designed sequences represent good folders
thermodynamically. A bootstrap procedure for the sequence space search is
devised making very large chains feasible. The algorithm is successfully
explored on the two-dimensional HP model with chain lengths N=16, 18 and 32.
Comment: 7 pages LaTeX, 4 Postscript figures; minor change
Predicting the Next Best View for 3D Mesh Refinement
3D reconstruction is a core task in many applications such as robot
navigation or site inspection. Finding the best poses from which to capture part of the
scene is one of the most challenging problems in the field, known as Next
Best View. Recently, many volumetric methods have been proposed; they choose
the Next Best View by reasoning over a 3D voxelized space and by finding which
pose minimizes the uncertainty encoded in the voxels. Such methods are
effective, but they do not scale well since the underlying representation
requires a huge amount of memory. In this paper we propose a novel mesh-based
approach which focuses on the worst reconstructed region of the environment
mesh. We define a photo-consistent index to evaluate the 3D mesh accuracy, and
an energy function over the worst regions of the mesh which takes into account
the mutual parallax with respect to the previous cameras, the angle of
incidence of the viewing ray to the surface and the visibility of the region.
We test our approach on a well-known dataset and achieve state-of-the-art
results.
Comment: 13 pages, 5 figures, to be published in IAS-1
Assimilation of 3D radar reflectivities with an ensemble Kalman filter on the convective scale
An ensemble data assimilation system for 3D radar reflectivity data is introduced for the convection-permitting numerical weather prediction model of the COnsortium for Small-scale MOdelling (COSMO) based on the Kilometre-scale ENsemble Data Assimilation system (KENDA), developed by Deutscher Wetterdienst and its partners. KENDA provides a state-of-the-art ensemble data assimilation system on the convective scale for operational data assimilation and forecasting based on the Local Ensemble Transform Kalman Filter (LETKF). In this study, the Efficient Modular VOlume RADar Operator is applied for the assimilation of radar reflectivity data to improve short-term predictions of precipitation. Both deterministic and ensemble forecasts have been carried out. A case study shows that the assimilation of 3D radar reflectivity data clearly improves precipitation location in the analysis and significantly improves forecasts for lead times up to 4 h, as quantified by the Brier Score and the Continuous Ranked Probability Score. The influence of different update rates on the noise in terms of surface pressure tendencies and on the forecast quality in general is investigated. The results suggest that, while high update rates produce better analyses, forecasts with lead times of above 1 h benefit from less frequent updates. For a period of seven consecutive days, assimilation of radar reflectivity based on the LETKF is compared to that of DWD's current operational radar assimilation scheme based on latent heat nudging (LHN). It is found that the LETKF competes with LHN, although it is still in an experimental phase.
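The Brier Score named in this abstract is the mean squared difference between forecast probabilities and binary outcomes. A small sketch follows, assuming the event is precipitation exceeding a threshold and that the forecast probability is taken as the fraction of ensemble members exceeding it. The function names, the threshold, and the sample values are hypothetical and are not taken from the study.

```python
def exceedance_probability(member_values, threshold):
    """Fraction of ensemble members whose value exceeds the threshold."""
    if not member_values:
        raise ValueError("ensemble must contain at least one member")
    hits = sum(1 for value in member_values if value > threshold)
    return hits / len(member_values)


def brier_score(forecast_probabilities, observed_outcomes):
    """Mean squared error of probabilities against 0/1 outcomes (0 is perfect)."""
    if len(forecast_probabilities) != len(observed_outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecast_probabilities:
        raise ValueError("need at least one forecast")
    total = sum((p - o) ** 2
                for p, o in zip(forecast_probabilities, observed_outcomes))
    return total / len(forecast_probabilities)


# Hypothetical example: two grid points, four ensemble members each,
# event defined as precipitation above 1.0 (units assumed to be mm/h).
ensemble = [[0.0, 2.0, 3.0, 0.5], [0.0, 0.1, 0.2, 0.0]]
probabilities = [exceedance_probability(m, 1.0) for m in ensemble]
score = brier_score(probabilities, [1, 0])
```

Lower scores are better, so comparing the score of two assimilation schemes over the same cases gives a like-for-like measure of probabilistic skill.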
Retention of improvement in gait stability over 14 weeks due to trip-perturbation training is dependent on perturbation dose
© 2018 Elsevier Ltd.
Perturbation training is an emerging approach to reduce fall risk in the elderly. This study examined potential differences in retention of improvements in reactive gait stability over 14 weeks resulting from unexpected trip-like gait perturbations. Twenty-four healthy middle-aged adults (41–62 years) were assigned randomly to either a single perturbation group (SINGLE, n = 9) or a group subjected to eight trip-like gait perturbations (MULTIPLE, n = 15). While participants walked on a treadmill, a custom-built brake-and-release system was used to unexpectedly apply resistance during swing phase to the lower right limb via an ankle strap. The anteroposterior margin of stability (MoS) was calculated as the difference between the anterior boundary of the base of support and the extrapolated centre of mass at foot touchdown for the perturbed step and the first recovery step during the first and second (MULTIPLE group only) perturbation trials for the initial walking session and retention-test walking 14 weeks later. Group MULTIPLE retained the improvements in reactive gait stability to the perturbations (increased MoS at touchdown for perturbed and first recovery steps; p < 0.01). However, in group SINGLE no differences in MoS were detected after 14 weeks compared to the initial walking session. These findings provide evidence for the requirement of a threshold trip-perturbation dose if adaptive changes in the human neuromotor system over several months, aimed at the improvement in fall-resisting skills, are to occur.
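The margin of stability defined in this abstract can be written as a short calculation. The sketch below assumes the commonly used inverted-pendulum form of the extrapolated centre of mass (position plus velocity divided by the pendulum's natural frequency). The abstract does not state which form the study used, so that formula, the function names, and the parameter names are assumptions made for illustration.

```python
import math


def extrapolated_centre_of_mass(com_position, com_velocity,
                                pendulum_length, gravity=9.81):
    """Assumed form: position + velocity / sqrt(g / l), in metres."""
    if pendulum_length <= 0:
        raise ValueError("pendulum_length must be positive")
    natural_frequency = math.sqrt(gravity / pendulum_length)
    return com_position + com_velocity / natural_frequency


def margin_of_stability(bos_anterior_boundary, com_position, com_velocity,
                        pendulum_length, gravity=9.81):
    """Anterior base-of-support boundary minus extrapolated centre of mass."""
    xcom = extrapolated_centre_of_mass(com_position, com_velocity,
                                       pendulum_length, gravity)
    return bos_anterior_boundary - xcom
```

Under this convention a positive value means the extrapolated centre of mass lies behind the anterior boundary at touchdown, and a larger value corresponds to the increased MoS reported for the MULTIPLE group.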
Passenger car data – a new source of real-time weather information for nowcasting, forecasting, and road safety
Presentation given at the 3rd European Nowcasting Conference, held at the AEMET headquarters in Madrid, 24 to 26 April 2019.