
    The use of big data and data mining in the investigation of criminal offences

    The aim of this study was to determine the features and prospects of using Big Data and Data Mining in criminal proceedings. The research drew on a systematic approach, descriptive analysis, systematic sampling, a formal legal approach and forecasting. Big Data and Data Mining are applied to various crimes whose common features are their seriousness and the complexity of their investigation. The common tools of Big Data and Data Mining in crime investigation and crime forecasting, as interrelated tasks, were identified. Databases are created by processing data sources with Data Mining methods, each method distinguished by its specifics of use. The main risks of implementing Big Data and Data Mining are violations of human rights and freedoms. Improving their use requires standardization of procedures with strict adherence to fundamental ethical, organizational and procedural rules. The use of Big Data and Data Mining is a forensic innovation in the investigation of serious crimes and in building an evidence base for criminal justice. The prospects for the widespread use of these methods involve the standardization of procedures based on ethical, organizational and procedural principles. It is appropriate to set out these procedures in framework practical recommendations, emphasizing the responsibility of officials in case of violation of these principles. Further research should address the improvement of innovative technologies and the legal regulation of their application.

    Determining Potential Criminals through Text Analysis: A Case Study

    This research is aimed at classifying texts using Artificial Neural Networks (ANN), specifically the Multilayer Perceptron (MLP) with basic word-embedding techniques. The classification consists of determining, through pattern recognition, whether or not a text has a criminal context. The MLP was trained with supervised learning on a limited vocabulary and a small set of training records, each with a maximum length of 300 words, to perform the classification. Analyzing these kinds of texts could help government security forces, the military and others to easily detect people who could harm the population, and to predict and prevent possible attacks. The developed software needs more word-embedding techniques, a larger vocabulary and more training records to become more efficient. The dataset consists of two main classes, organized as criminal and regular texts.
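The classification pipeline this abstract describes can be sketched as follows. This is an illustrative toy example, not the authors' code: the texts, labels and hyperparameters are invented, and a bag-of-words representation stands in for the word-embedding techniques mentioned above.

```python
# Toy sketch: a multilayer perceptron classifying short texts as
# "criminal" vs. "regular" context via supervised learning.
# All data and parameters below are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

texts = [
    "we will attack the building tonight",
    "bring the weapons to the meeting point",
    "they plan to rob the bank at noon",
    "the weather is lovely this afternoon",
    "let us have lunch at the new cafe",
    "the concert starts at eight tonight",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = criminal context, 0 = regular

# Bag-of-words features (a minimal stand-in for word embeddings).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Small MLP trained with supervised learning.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X, labels)

# Classify an unseen sentence by pattern recognition over its words.
prediction = clf.predict(vectorizer.transform(["they will attack at noon"]))[0]
```

As the abstract notes, a real system would need richer embeddings, a far larger vocabulary and many more training records than this sketch uses.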

    Crime prediction and monitoring in Porto, Portugal, using machine learning, spatial and text analytics

    Crime is a common societal concern affecting quality of life and economic growth. Despite the global decrease in crime statistics, specific types of crime, and feelings of insecurity, have often increased, leaving safety and security agencies with the need to apply novel approaches and advanced systems to better predict and prevent occurrences. The use of geospatial technologies, combined with data mining and machine learning techniques, allows significant advances in the criminology of place. In this study, official police data from Porto, Portugal, between 2016 and 2018 were georeferenced and processed using spatial analysis methods, which allowed the identification of spatial patterns and relevant hotspots. Machine learning methods were then applied for space-time pattern mining. Using lasso regression analysis, significant crime variables were identified, with random forest and decision tree models supporting the selection of important variables. Lastly, tweets related to insecurity were collected, and topic modeling and sentiment analysis were performed. Together, these methods support the interpretation of patterns, prediction and, ultimately, the performance of both police and planning professionals.
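The variable-selection step described above — lasso shrinking irrelevant coefficients while a random forest corroborates which variables matter — can be illustrated on synthetic data. This is a sketch only: the variable names are invented stand-ins for the georeferenced Porto predictors.

```python
# Sketch: lasso identifies significant crime variables; a random
# forest supports the selection via feature importances.
# Synthetic data with one deliberately irrelevant predictor.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
population = rng.normal(size=n)   # hypothetical per-cell predictor
bars = rng.normal(size=n)         # hypothetical per-cell predictor
noise_var = rng.normal(size=n)    # irrelevant by construction
X = np.column_stack([population, bars, noise_var])
# Crime count driven only by the first two variables.
y = 3.0 * population + 2.0 * bars + rng.normal(scale=0.1, size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# lasso.coef_ shrinks the irrelevant coefficient toward zero, and
# forest.feature_importances_ assigns that variable little weight.
```

The same two-model agreement check is what lets an analyst trust that a variable flagged by the lasso is genuinely informative rather than an artifact of one method.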

    Crime event prediction with dynamic features

    Nowadays, Location-Based Social Networks (LBSNs) collect a vast range of information which can help us understand regional dynamics (i.e. human mobility) across an entire city, providing unprecedented opportunities to tackle various social problems. In this work, we explore dynamic features derived from Foursquare check-in data for short-term crime event prediction at fine spatio-temporal granularity. While crime event prediction has been investigated widely due to its social importance, its success rate is far from satisfactory. Existing studies rely on relatively static features such as regional characteristics, demographic information and topics obtained from tweets, but very few focus on exploring human mobility through social media. In this study, we identify a number of dynamic features based on research findings in Criminology, and report their correlations with different types of crime events. In particular, we observe that some types of crime events (e.g. Theft, Drug Offence, Fraud, Unlawful Entry and Assault) are more highly correlated with the dynamic features than others (e.g. Traffic Related Offence). A key challenge of the research is that the dynamic information is very sparse compared to the relatively static information. To address this issue, we develop a matrix factorization based approach to estimate the missing dynamic features across the city. Interestingly, the estimated dynamic features still maintain their correlation with crime occurrence across different types. We evaluate the proposed methods over different time intervals. The results verify that crime prediction performance can be significantly improved by including dynamic features across different types of crime events.
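The sparsity-handling idea above — factorize the observed region-by-feature matrix and use the low-rank product to fill in missing entries — can be sketched with a minimal SGD loop. This is an assumption-laden toy (random low-rank data, invented dimensions and learning rate), not the paper's actual model.

```python
# Sketch: matrix factorization to estimate missing dynamic features
# across city regions. Only observed entries drive the updates; the
# product U @ V.T then provides estimates for the unobserved ones.
import numpy as np

rng = np.random.default_rng(0)
n_regions, n_features, rank = 30, 8, 3
# Ground-truth low-rank matrix: regions x dynamic features.
true = rng.normal(size=(n_regions, rank)) @ rng.normal(size=(rank, n_features))
observed = rng.random((n_regions, n_features)) > 0.5  # sparse observation mask

U = 0.1 * rng.normal(size=(n_regions, rank))
V = 0.1 * rng.normal(size=(n_features, rank))
lr, reg = 0.05, 0.01
for _ in range(300):  # SGD sweeps over observed entries only
    for i, j in zip(*np.nonzero(observed)):
        err = true[i, j] - U[i] @ V[j]
        U[i] += lr * (err * V[j] - reg * U[i])
        V[j] += lr * (err * U[i] - reg * V[j])

estimate = U @ V.T  # dense matrix: fills in the missing entries
```

The key property, mirrored in the abstract's finding, is that the reconstructed entries inherit the structure of the observed ones, so correlations with crime occurrence can survive the imputation.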

    Predicting repair time following an automobile accident, and optimization using contextual information

    The purpose of this thesis is to explore the use of context data, particularly spatial context, to predict how long a garage will take to complete repairs following an automobile accident. The context refers to the environment in which the garage operates. The goal is thus to develop an approach for predicting a specific characteristic using, in particular, historical information. The historical information includes spatial components, such as addresses, which are exploited to generate new information about the location of car garages. The accumulated data on automobile claims is used to establish an initial level of prediction achievable with supervised learning. By then gradually adding information about the spatial context in which the garage responsible for the repairs operates, new levels of prediction are reached. The relevance of considering spatial context in a prediction problem such as the repair time of damaged vehicles can then be evaluated by comparing these prediction levels. The use of historical data to predict new data has been practised for several years through a branch of artificial intelligence, namely machine learning. Coupled with this method of data analysis and production, spatial analyses are presented and introduced to model the spatial context. To quantify the contribution of spatial analyses and localized data in a machine learning problem, the approach that does not use spatial analysis to produce new data is compared with a similar approach that does consider the spatial context in which the garage operates. The objective is to assess the impact that spatial contextualization can have on the prediction of a quantitative variable.
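The comparison the thesis describes — train the same supervised model with and without spatial-context features and compare error — can be sketched as follows. The data and feature names are hypothetical; in the thesis the context is derived from garage addresses and claim history.

```python
# Sketch: does adding spatial-context features improve prediction of
# repair duration? Synthetic data where the target genuinely depends
# on two invented spatial variables.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
damage_severity = rng.uniform(1, 10, n)
vehicle_age = rng.uniform(0, 20, n)
nearby_garages = rng.integers(1, 15, n)  # spatial-context feature (invented)
urban_density = rng.uniform(0, 1, n)     # spatial-context feature (invented)
# Repair time in days, partly driven by the spatial context.
days = (2.0 * damage_severity + 0.3 * vehicle_age
        + 10.0 * urban_density - 0.5 * nearby_garages
        + rng.normal(scale=1.0, size=n))

base = np.column_stack([damage_severity, vehicle_age])
full = np.column_stack([damage_severity, vehicle_age,
                        nearby_garages, urban_density])

err = {}
for name, X in {"base": base, "with_context": full}.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, days, random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_tr, y_tr)
    err[name] = mean_absolute_error(y_te, model.predict(X_te))
# err["with_context"] should be lower, since the target depends
# on the spatial variables the base model never sees.
```

The size of the gap between the two errors is exactly the quantity the thesis uses to judge whether spatial contextualization is worth the extra data work.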

    Real Time Crime Prediction Using Social Media

    There is no doubt that crime is on the increase and has a detrimental influence on a nation's economy, despite many studies on crime prediction aimed at minimising crime rates. Historically, data mining techniques for crime prediction models often rely on historical information and are mostly country-specific. In fact, only a few earlier studies on crime prediction follow standard data mining procedure. Hence, considering the current worldwide crime trend, in which criminals routinely publish their criminal intent on social media and invite others to watch and/or engage in different crimes, an alternative and more dynamic strategy is needed. The goal of this research is to improve the performance of crime prediction models. This thesis therefore explores the potential of using information from social media (Twitter) for crime prediction, in combination with historical crime data. It also identifies, using data mining techniques, the most relevant feature engineering needed for the United Kingdom dataset to improve crime prediction model performance. Additionally, this study presents a function that could be used by every region in the United Kingdom for data cleansing, pre-processing and feature engineering. A Shiny app was also used to display tweet sentiment trends to help prevent crime in near-real time.
    Exploratory analysis is essential for revealing the data pre-processing and feature engineering needed before feeding the data into a machine learning model for efficient results. Based on the earlier documented studies available, this is the first research to conduct a full exploratory analysis of historical British crime statistics using the stop-and-search historical dataset. Based on the findings from the exploratory study, an algorithm was created to clean the data and prepare it for further analysis and model creation. This is a substantial contribution because it provides a ready-to-use dataset for future research, particularly for non-experts constructing models to forecast crime or conducting investigations across around 32 police districts of the United Kingdom.
    Moreover, this is the first study to present a complete collection of geo-spatial parameters for training a crime prediction model by combining demographic data from the same source in the United Kingdom with hourly sentiment polarity that was not restricted to a Twitter keyword search. Six base models frequently mentioned in the previous literature were selected, trained on the stop-and-search historical crime dataset, evaluated on test data and finally validated on the London and Kent crime datasets. Two different datasets were created from Twitter and historical data: historical crime data with Twitter sentiment scores, and historical data without them. Six of the most prevalent machine learning classifiers (random forest, decision tree, k-nearest neighbours, support vector machine, neural network and naïve Bayes) were trained and tested on these datasets. Additionally, the hyperparameters of each of the six models were tuned using random grid search. Voting classifiers and a logistic-regression stacked ensemble of different models were also trained and tested on the same datasets to enhance the individual model performance. In addition, two combinations of stacked ensembles of multiple models were constructed, and based on their performance the most suitable prediction model for the UK dataset would be selected.
    In terms of interpretation, this research differs from most earlier studies that employed Twitter data in that several methodologies were used to show how each attribute contributed to the construction of the model, and the findings were discussed and interpreted in the context of the study. Further, a Shiny app visualisation tool was designed to display each tweet's sentiment score, text, user screen name and vicinity, allowing the investigation of criminal actions in near-real time. The evaluation of the models revealed that random forest, decision tree and k-nearest neighbour outperformed the other models; however, decision tree and random forest performed better consistently when evaluated on test data.
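The ensemble step described above — combining several of the named base classifiers so the group outvotes any single model's mistakes — can be sketched with a soft-voting ensemble. This is a toy with synthetic data and a reduced model set (three of the six classifiers), not the thesis's tuned pipeline.

```python
# Sketch: soft-voting ensemble of three of the base classifiers named
# in the abstract (random forest, decision tree, k-nearest neighbours)
# on a synthetic binary classification task.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="soft",  # average predicted probabilities across models
)
ensemble.fit(X_tr, y_tr)
accuracy = ensemble.score(X_te, y_te)
```

A stacked ensemble, as also used in the thesis, would instead feed the base models' predictions into a logistic-regression meta-learner (scikit-learn's `StackingClassifier`); the voting form shown here is the simpler of the two combination strategies.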