113 research outputs found

    Detecting and Monitoring Hate Speech in Twitter

    Social Media are sensors in the real world that can be used to measure the pulse of societies. However, the massive and unfiltered feed of messages posted in social media is a phenomenon that nowadays raises social alarms, especially when these messages contain hate speech targeted at a specific individual or group. In this context, governments and non-governmental organizations (NGOs) are concerned about the possible negative impact that these messages can have on individuals or on society. In this paper, we present HaterNet, an intelligent system currently being used by the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security that identifies and monitors the evolution of hate speech in Twitter. The contributions of this research are manifold: (1) It introduces the first intelligent system that monitors and visualizes, using social network analysis techniques, hate speech in Social Media. (2) It introduces a novel public dataset on hate speech in Spanish consisting of 6000 expert-labeled tweets. (3) It compares several classification approaches based on different document representation strategies and text classification models. (4) The best approach combines an LSTM+MLP neural network that takes as input the tweet’s word, emoji, and expression tokens’ embeddings enriched by the tf-idf, and obtains an area under the curve (AUC) of 0.828 on our dataset, outperforming previous methods presented in the literature. The work by Quijano-Sanchez was supported by the Spanish Ministry of Science and Innovation grant FJCI-2016-28855. The research of Liberatore was supported by the Government of Spain, grant MTM2015-65803-R, and by the European Union’s Horizon 2020 Research and Innovation Programme, under the Marie Sklodowska-Curie grant agreement No. 691161 (GEOSAFE). All the financial support is gratefully acknowledged.

    A mathematical pre-disaster model with uncertainty and multiple criteria for facility location and network fortification

    Disasters have catastrophic effects on the affected population, especially in developing and underdeveloped countries. Humanitarian Logistics models can help decision-makers to efficiently and effectively warehouse and distribute emergency goods to the affected population, to reduce casualties and suffering. However, poor planning and structural damage to the transportation infrastructure could hamper these efforts and, eventually, make it impossible to reach all the affected demand centers. In this paper, a pre-disaster Humanitarian Logistics model is presented that jointly optimizes the prepositioning of aid distribution centers and the strengthening of road sections to ensure that as much of the affected population as possible can efficiently get help. The model is stochastic in nature and treats both the demand in the centers affected by the disaster and the state of the transportation network as random. Uncertainty is represented through scenarios, each describing a possible disaster. The methodology is applied to a real-world case study based on the 2018 storm system that hit the Nampula Province in Mozambique.

    Applications of data science in policing: VeriPol as an investigation support tool

    Data Science is an interdisciplinary field involving the development of processes and systems to extract knowledge and understanding from data in different formats and from different sources. Considering the large amount of data generated and managed by public safety agencies, Data Science applications in the police sector are numerous. More important are the advantages that the different applications of Data Science could provide the police on issues such as the optimization of resources, the increase of efficiency and effectiveness, and the modernization of the police and its exemplariness when compared with other institutions. In this paper we present different potential application fields of Data Science for the police. In addition, we focus on the case of VeriPol, a tool for automatic detection of false violent robbery reports, currently under development by the Spanish National Police. In particular, we present a detailed analysis of the results of a recent pilot study aimed at assessing the effectiveness of the tool.

    Back to the basics: a quantitative analysis of statistical and graph-based term weighting schemes for keyword extraction

    Term weighting schemes are widely used in Natural Language Processing and Information Retrieval. In particular, term weighting is the basis for keyword extraction. However, there are relatively few evaluation studies that shed light on the strengths and shortcomings of each weighting scheme. In fact, in most cases researchers and practitioners resort to the well-known tf-idf as default, despite the existence of other suitable alternatives, including graph-based models. In this paper, we perform an exhaustive and large-scale empirical comparison of both statistical and graph-based term weighting methods in the context of keyword extraction. Our analysis reveals some interesting findings such as the advantages of the less-known lexical specificity with respect to tf-idf, or the qualitative differences between statistical and graph-based methods. Finally, based on our findings we discuss and devise some suggestions for practitioners.
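
    As an illustration of the tf-idf baseline that the abstract above refers to, the following is a minimal sketch of tf-idf keyword ranking. It is not the paper's implementation: the function name `tfidf_keywords`, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the terms of docs[doc_index] by tf-idf against the whole corpus.

    Hypothetical helper for illustration only: whitespace tokenization and
    the plain log(N / df) idf are simplifying assumptions.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Term frequency within the target document, normalized by its length.
    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())
    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Toy corpus (invented for this example).
corpus = [
    "the police patrol the district",
    "the model ranks the keyword candidates",
    "the storm damaged the road network",
]
print(tfidf_keywords(corpus, 1))
```

    In this toy corpus the word "the" occurs in every document, so its idf is zero and it never ranks as a keyword, which is the behaviour that makes tf-idf a common default.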

    Evolution and study of a copycat effect in intimate partner homicides: A lesson from Spanish femicides

    Objectives: This paper focuses on the issue of intimate partner violence and, specifically, on the distribution of femicides over time and the existence of copycat effects. This is the subject of an ongoing debate often triggered by the social alarm following multiple intimate partner homicides (IPHs) occurring in a short span of time. The aim of this research is to study the evolution of IPHs and provide a far-reaching answer by rigorously analyzing and searching for patterns in data on femicides. Methods: The study analyzes an official dataset, provided by the system VioGén of the Secretaría de Estado de Seguridad (Spanish State Secretariat for Security), including all the femicides that occurred in Spain in 2007-2017. A statistical methodology to identify temporal interdependencies in count time series is proposed and applied to the dataset. The same methodology can be applied to other contexts. Results: There has been a decreasing trend in the number of femicides per year. No interdependencies among the temporal distribution of femicides are observed. Therefore, according to the data, the existence of a copycat effect in femicides cannot be claimed. Conclusions: Around 2011 there was a clear change in the average number of femicides, which has not picked up since. The results allow for an informed answer to the debate on the copycat effect in Spanish femicides. The planning of femicide prevention activities should not be a reaction to a perceived increase in their occurrence. As a copycat effect is not detected in the studied time period, there is no evidence supporting the need to censor media reports on femicides. The work by Torrecilla has been partially supported by Spanish Grant MTM2016-78751-P. The research of Liberatore has been supported by the Government of Spain, grant MTM2015-65803-R, and by the Government of Madrid, grant S2013/ICE-284

    Equity in the Police Districting Problem: balancing territorial and racial fairness in patrolling operations

    Objectives: The Police Districting Problem concerns the definition of patrol districts that distribute police resources in a territory in such a way that high-risk areas receive more patrolling time than low-risk areas, according to a principle of territorial fairness. This results in patrolling configurations that are efficient and effective at controlling crime but that, at the same time, might exacerbate racial disparity in police stops and arrests. In this paper, an Equitable Police Districting Problem that combines crime-reduction effectiveness with racial fairness is proposed. The capability of this model in designing patrolling configurations that find a balance between territorial and racial fairness is assessed. Also, the trade-off between these two criteria is analyzed. Methods: The Equitable Police Districting Problem is defined as a mixed-integer program. The objective function is formulated using Compromise Programming and Goal Programming. The model is validated on a real-world case study on the Central District of Madrid, Spain, and its solutions are compared to standard patrolling configurations currently used by the police. Results: A trade-off between racial fairness and crime control is detected. However, the experiments show that including the proposed racial criterion in the optimization of patrol districts greatly improves racial fairness with limited detriment to the policing effectiveness. Also, the model produces solutions that dominate the patrolling configurations currently in use by the police. Conclusions: The results show that the model successfully provides a quantitative evaluation of the trade-off between the criteria and is capable of defining patrolling configurations that are efficient in terms of both racial and territorial fairness.
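
    The Compromise Programming idea mentioned in the abstract above can be sketched generically as picking the alternative closest to an ideal point under a weighted distance. The sketch below is not the paper's mixed-integer formulation: it enumerates a handful of candidate configurations instead of optimizing over districts, and the function name `compromise_choice`, the two criteria, the weights, and all numbers are invented for illustration.

```python
def compromise_choice(candidates, weights, p=1.0):
    """Pick the candidate closest to the ideal point (all criteria minimized).

    candidates: dict mapping a name to a tuple of criterion values.
    weights:    one non-negative weight per criterion.
    p:          exponent of the weighted L_p distance; float("inf") gives
                the min-max (Chebyshev) compromise.
    Hypothetical helper, written only to illustrate the technique.
    """
    names = list(candidates)
    n_criteria = len(weights)
    # Ideal (best) and worst observed value for each criterion.
    ideal = [min(candidates[n][i] for n in names) for i in range(n_criteria)]
    worst = [max(candidates[n][i] for n in names) for i in range(n_criteria)]

    def distance(values):
        terms = []
        for i, value in enumerate(values):
            span = worst[i] - ideal[i]
            # Normalize each criterion to [0, 1] so units do not matter.
            normalized = (value - ideal[i]) / span if span > 0 else 0.0
            terms.append(weights[i] * normalized)
        if p == float("inf"):
            return max(terms)
        return sum(t ** p for t in terms) ** (1.0 / p)

    return min(names, key=lambda n: distance(candidates[n]))


# Invented values: (loss of crime-control effectiveness, racial disparity).
configurations = {
    "A": (0.10, 0.90),
    "B": (0.20, 0.30),
    "C": (0.60, 0.10),
}
print(compromise_choice(configurations, weights=(0.5, 0.5)))
```

    With equal weights the balanced configuration "B" is selected, while shifting almost all weight to the first criterion selects "A"; varying the weights in this way is one simple means of exploring a trade-off between two criteria.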

    Enhancing information retrieval in fact extraction and verification

    Modern fact verification systems have distanced themselves from the black box paradigm by providing the evidence used to infer their veracity judgments. Hence, evidence-backed fact verification systems’ performance heavily depends on the capabilities of their retrieval component to identify these facts. A popular evaluation benchmark for these systems is the FEVER task, which consists of determining the veracity of short claims using sentences extracted from Wikipedia. In this paper, we present a novel approach to the retrieval steps of the FEVER task leveraging the graph structure of Wikipedia. The retrieval models surpass state-of-the-art results at both sentence and document level. Additionally, we show that by feeding our retrieved evidence to the best-performing textual entailment model, we set a new state of the art in the FEVER competition.

    Towards social fairness in smart policing: leveraging territorial, racial, and workload fairness in the police districting problem

    Recent events (e.g., George Floyd protests) have shown the impact that inequality in policing can have on society. Thus, police operations should be planned and designed taking into account the interests of three main groups of directly affected stakeholders (i.e., general population, minorities, and police agents) to pursue fairness. Most models presented so far in the literature have failed at this, optimizing cost efficiency or operational effectiveness while disregarding other social goals. In this paper, a Smart Policing model that produces operational patrolling districts and includes territorial, racial, and workload fairness criteria is proposed. The patrolling configurations are designed according to the territorial distribution of crime risk and population subgroups, while equalizing the total risk exposure across the districts, according to the preferences of a decision-maker. The model is formulated as a multi-objective mixed-integer program. Computational experiments on randomly generated data are used to empirically draw insights into the relationship between the fairness criteria considered. Finally, the model is tested and validated on a real-world dataset about the Central District of Madrid (Spain). Experiments show that the model identifies solutions that dominate the current patrolling configuration used.