1,696 research outputs found

    Impact Of Content Features For Automatic Online Abuse Detection

    Online communities have gained considerable importance in recent years due to the increasing number of people connected to the Internet. Moderating user content in online communities is mainly performed manually, and reducing the workload through automatic methods is of great financial interest for community maintainers. Often, the industry uses basic approaches such as bad-word filtering and regular expression matching to assist the moderators. In this article, we consider the task of automatically determining whether a message is abusive. This task is complex because messages are written in a non-standardized way, including spelling errors, abbreviations, and community-specific codes. First, we evaluate the system that we propose using standard features of online messages. Then, we evaluate the impact of adding pre-processing strategies, as well as original specific features developed for the community of an online in-browser strategy game. Finally, we analyze the usefulness of this wide range of features using feature selection. This work can lead to two possible applications: 1) automatically flagging potentially abusive messages to draw the moderator's attention to a narrow subset of messages; and 2) fully automating the moderation process by deciding whether a message is abusive without any human intervention.

    Abusive Language Detection in Online Conversations by Combining Content-and Graph-based Features

    In recent years, online social networks have allowed users worldwide to meet and discuss. As guarantors of these communities, the administrators of these platforms must prevent users from adopting inappropriate behaviors. This verification task, mainly done by humans, is increasingly difficult due to the ever-growing number of messages to check. Methods have been proposed to automate this moderation process, mainly through approaches based on the textual content of the exchanged messages. Recent work has also shown that characteristics derived from the structure of conversations, in the form of conversational graphs, can help detect these abusive messages. In this paper, we propose to take advantage of both sources of information through fusion methods integrating content- and graph-based features. Our experiments on raw chat logs show that the content of the messages and their dynamics within a conversation contain partially complementary information, allowing performance improvements on an abusive message classification task with a final F-measure of 93.26%.

    La politique de répartition géographique des effectifs médicaux au Québec

    One of the objectives of the Ministry of Social Affairs is to ensure, as much as possible, that everyone has equal access to medical care. A quantitative assessment of the distribution of doctors shows that there are significant regional disparities. Doctors are heavily concentrated in the university regions of Montréal, Québec City and Sherbrooke, which does not give the population of certain peripheral regions the intended access to medical care. Several solutions have been considered, following an analysis of the causes of this regional imbalance in medical staff. Finally, in order to correct this situation, the Government has developed and put in place a distribution policy centered mainly on differential remuneration of doctors according to their place of practice. Some preliminary results of this policy are presented in the last section.

    Automatic Text Summarization Approaches to Speed up Topic Model Learning Process

    The number of documents available on the Internet grows every day. For this reason, processing this amount of information effectively and expressibly becomes a major concern for companies and scientists. Methods that represent a textual document by a topic representation are widely used in Information Retrieval (IR) to process big data such as Wikipedia articles. One of the main difficulties in using topic models on huge data collections is related to the material resources (CPU time and memory) required for model estimation. To deal with this issue, we propose to build topic spaces from summarized documents. In this paper, we present a study of topic space representation in the context of big data. The behavior of the topic space representation is analyzed on different languages. Experiments show that topic spaces estimated from text summaries are as relevant as those estimated from the complete documents. The real advantage of such an approach is the gain in processing time: we show that the processing time can be drastically reduced using summarized documents (by more than 60% in general). Finally, this study points out the differences between thematic representations of documents depending on the targeted languages, such as English or Latin languages. Comment: 16 pages, 4 tables, 8 figures

    Identification-robust moment-based tests for Markov switching in autoregressive models

    This paper develops tests of the null hypothesis of linearity in the context of autoregressive models with Markov-switching means and variances. These tests are robust to the identification failures that plague conventional likelihood-based inference methods. The approach exploits the moments of normal mixtures implied by the regime-switching process and uses Monte Carlo test techniques to deal with the presence of an autoregressive component in the model specification. The proposed tests have very respectable power in comparison with the optimal tests for Markov-switching parameters of Carrasco et al. (2014), and they are also quite attractive owing to their computational simplicity. The new tests are illustrated with an empirical application to an autoregressive model of USA output growth.