14 research outputs found

    Point set signature and algorithm of classifications on its basis

    Many problems in the automated processing of multidimensional data remain open, for example classification, clustering, forecasting, and the control of complex objects, which creates a need for further mathematical and algorithmic tools. This study aims to develop algorithms that classify point sets by their spatial distribution. The data are treated as points in a multidimensional metric space. The paper reviews approaches to describing the characteristics of point sets in high-dimensional spaces and proposes describing a point set by its signature, a characterization of how the set fills space, built by extending the idea of spatial hashing. In general, a signature is computed by splitting the space occupied by the set into a regular grid with spatial hashing, evaluating geometric characteristics of the set within the resulting cells, and identifying the most populated cells along each spatial dimension. The proposed classification approach computes signatures for points whose class membership is known; for each new point, it computes the distance from the point's hash to the signature of every known set and assigns the class with the smallest distance as the most probable one. Euclidean and Manhattan (city-block) distances are used as metrics, and their effect on classification accuracy is compared. The main advantages of the approach are computational simplicity and high classification accuracy for evenly distributed points.
The algorithm was implemented as a prototype in Python with the NumPy library in order to test it on practical classification tasks. The paper also considers applying the approach to non-numerical data such as strings and Boolean values; for these, the Hamming distance is proposed, and experiments showed that the algorithm remains workable for such data types.
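The signature idea in this abstract can be illustrated with a small sketch. It is only one reading of the abstract, not the authors' implementation: it assumes a signature is the most populated cell index along each axis, and it uses plain Python rather than NumPy so that it is self-contained. The names `spatial_hash`, `signature`, `classify`, `ORIGIN`, and `CELL_SIZE`, as well as the sample points, are hypothetical.

```python
import math
from collections import Counter

ORIGIN = (0.0, 0.0)   # assumed grid origin (hypothetical)
CELL_SIZE = 1.0       # assumed edge length of a grid cell (hypothetical)

def spatial_hash(point, origin, cell_size):
    """Map a point to integer grid-cell indices, one index per dimension."""
    return tuple(math.floor((x - o) / cell_size) for x, o in zip(point, origin))

def signature(points, origin, cell_size):
    """For each axis, return the index of the cell that holds the most points."""
    hashes = [spatial_hash(p, origin, cell_size) for p in points]
    dims = len(origin)
    return tuple(
        Counter(h[d] for h in hashes).most_common(1)[0][0] for d in range(dims)
    )

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(point, signatures, origin, cell_size, metric=manhattan):
    """Assign the class whose signature is closest to the hash of the point."""
    h = spatial_hash(point, origin, cell_size)
    return min(signatures, key=lambda label: metric(h, signatures[label]))

# Made-up training points for two classes.
training = {
    "a": [(0.2, 0.3), (0.4, 0.1), (0.6, 0.7), (1.5, 0.5)],
    "b": [(5.1, 5.2), (5.5, 5.9), (5.7, 4.5), (6.2, 5.3)],
}
signatures = {
    label: signature(points, ORIGIN, CELL_SIZE)
    for label, points in training.items()
}
```

Calling `classify((4.6, 5.4), signatures, ORIGIN, CELL_SIZE)` hashes the new point to a grid cell and compares that cell with each class signature; passing `metric=euclidean` switches the distance, which is the comparison the abstract describes.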

    Crime Prediction Using Machine Learning

    In practice, crime prediction can significantly improve the strategic positioning of police patrols in a city, which helps prevent crime. Machine learning is one of the most widely used methods for crime prediction, but different types of algorithms and approaches still need to be compared to obtain the best results. This thesis compares several types of algorithms. The models were trained on data provided by the Police of the Czech Republic (PČR) for 2020 and 2021 covering the city of Ostrava. Only selected categories of offences enter the models: theft, burglary, other property crime, and misdemeanours against property under §50. Several methods for resampling the imbalanced dataset were compared, and SMOTETomek was chosen as the best. More complex algorithms, such as boosted decision trees or a neural network, were found to yield more accurate predictions.
    548 - Katedra geoinformatiky (Department of Geoinformatics); grade: very good

    Detecting and Monitoring Hate Speech in Twitter

    Social media are sensors in the real world that can be used to measure the pulse of societies. However, the massive and unfiltered feed of messages posted on social media now raises social alarm, especially when these messages contain hate speech targeted at a specific individual or group. In this context, governments and non-governmental organizations (NGOs) are concerned about the possible negative impact of these messages on individuals and on society. In this paper, we present HaterNet, an intelligent system currently used by the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security that identifies and monitors the evolution of hate speech on Twitter. The contributions of this research are manifold: (1) It introduces the first intelligent system that monitors and visualizes hate speech in social media using social network analysis techniques. (2) It introduces a novel public dataset on hate speech in Spanish consisting of 6000 expert-labeled tweets. (3) It compares several classification approaches based on different document representation strategies and text classification models. (4) The best approach combines an LSTM+MLP neural network that takes as input the embeddings of the tweet's word, emoji, and expression tokens, enriched with tf-idf, and obtains an area under the curve (AUC) of 0.828 on our dataset, outperforming previous methods presented in the literature.
    The work by Quijano-Sanchez was supported by the Spanish Ministry of Science and Innovation grant FJCI-2016-28855. The research of Liberatore was supported by the Government of Spain, grant MTM2015-65803-R, and by the European Union's Horizon 2020 Research and Innovation Programme, under the Marie Sklodowska-Curie grant agreement No. 691161 (GEOSAFE). All the financial support is gratefully acknowledged.

    Data Mining and Predictive Policing

    This paper focuses on the operation and use of predictive policing software that generates spatial and temporal hotspots. A literature review evaluates previous work on topics related to predictive policing. The paper then dissects two crime datasets, one for San Francisco, California and one for Chicago, Illinois, and provides an in-depth comparison between them using both statistical analysis and graphing tools. It then applies the Apriori algorithm to reinforce the formation of possible hotspots pointed out by actual predictive policing software. To further the analysis, the targeted demographics of the study were evaluated to create a snapshot of the factors that have contributed to the safety of the neighborhoods. The results of this study can be used to create solutions for long-term crime reduction by adding green spaces and community planning in areas with high crime rates and heavy environmental neglect.
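For readers unfamiliar with the Apriori algorithm mentioned above, the following is a minimal sketch of frequent-itemset mining on made-up records. It does not reproduce the paper's data or settings: the attribute values (`north`, `night`, `theft`, and so on), the `min_support` threshold, and the function name `apriori` are all assumptions for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    current = {
        frozenset([i]) for i in items if support(frozenset([i])) >= min_support
    }
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent

# Hypothetical incident records: (area, time of day, offence type).
records = [
    {"north", "night", "theft"},
    {"north", "night", "theft"},
    {"north", "day", "assault"},
    {"south", "night", "theft"},
    {"south", "day", "theft"},
]
frequent = apriori(records, min_support=0.4)
```

An itemset such as `{"north", "night", "theft"}` surviving the threshold is the kind of co-occurrence pattern that could be read as support for a spatial and temporal hotspot.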

    Graph deep learning model for network-based predictive hotspot mapping of sparse spatio-temporal events

    The predictive hotspot mapping of sparse spatio-temporal events (e.g., crime and traffic accidents) aims to forecast areas or locations with a higher average risk of event occurrence, which offers important insight for preventative strategies. Although a network-based structure can better capture the micro-level variation of spatio-temporal events, existing deep learning methods for forecasting sparse events are based on either area or grid units because of the data sparsity in both space and time and the complex network topology. To overcome these challenges, this paper develops the first deep learning (DL) model for network-based predictive mapping of sparse spatio-temporal events. Leveraging a graph-based representation of the network-structured data, a gated localised diffusion network (GLDNet) is introduced, which integrates a gated network to model the temporal propagation and a novel localised diffusion network to model the spatial propagation confined by the network topology. To deal with the sparsity issue, we reformulate the research problem as an imbalanced regression task and employ a weighted loss function to train the DL model. The framework is validated on a crime forecasting case in South Chicago, USA, where it outperforms the state-of-the-art benchmark by 12% and 25% in terms of the mean hit rate at the 10% and 20% coverage levels, respectively.
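The hit-rate metric quoted above can be sketched as follows. This assumes the common definition, the share of observed events that fall inside the top-ranked fraction of spatial units; the paper may measure coverage differently (for example by street length), and the function name `hit_rate` and all numbers below are made up.

```python
def hit_rate(predicted_risk, actual_events, coverage):
    """Share of observed events captured by the highest-risk units.

    predicted_risk: one predicted risk score per spatial unit
    actual_events:  observed event count per unit in the test period
    coverage:       fraction of units flagged as hotspots, e.g. 0.1
    """
    n_units = len(predicted_risk)
    n_selected = max(1, int(n_units * coverage))
    # Rank units from highest to lowest predicted risk.
    ranked = sorted(range(n_units), key=lambda i: predicted_risk[i], reverse=True)
    hits = sum(actual_events[i] for i in ranked[:n_selected])
    total = sum(actual_events)
    return hits / total if total else 0.0

# Ten hypothetical street segments with made-up scores and event counts.
risk = [0.9, 0.1, 0.8, 0.05, 0.3, 0.2, 0.7, 0.0, 0.4, 0.6]
events = [3, 0, 1, 0, 0, 1, 2, 0, 0, 1]

rate_at_10 = hit_rate(risk, events, 0.1)
rate_at_20 = hit_rate(risk, events, 0.2)
```

Averaging this value over all forecast periods would give a mean hit rate of the kind reported in the abstract.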

    Suraksha: Spatio-Temporal Crime Forecasting and Micro-Location Analysis

    Suraksha is a spatiotemporal crime prediction system designed to support crime prevention by giving law enforcement precise insights. Using large datasets, machine learning, and GIS, it forecasts crime hotspots from Chicago's extensive crime statistics. Addressing both precision and ethical considerations, Suraksha achieves RMSE values of 0.0874 (latitude) and 0.0602 (longitude). The approach aims to improve public safety by combating crime proactively through data-driven strategies.

    Geospatial-based data and knowledge driven approaches for burglary crime susceptibility mapping in urban areas

    The Damansara-Penchala region in Malaysia is well known for its high frequency of burglary and the resulting monetary loss, based on the 2011-2016 geospatial burglary data provided by the Polis Diraja Malaysia (PDRM). To better understand the components that influence burglary incidence in this area, this research aims to develop geospatial-based burglary susceptibility mapping for this urban area. The spatial indicator maps were developed from the burglary data, census data, and building footprint data. The initial phase of the research focused on developing the spatial indicators that influence the susceptibility of buildings to burglary. The indicators that formed the susceptibility variables were first listed from the literature review. They were later narrowed down to the 18 indicators marked as important in interview sessions with police officers and burglars. The burglary susceptibility mapping was based on data-driven and knowledge-driven approaches. The data-driven burglary susceptibility maps were developed using the bivariate statistical approach of Information Value Modelling (IVM) and the machine learning approaches of Support Vector Machine (SVM) and Artificial Neural Network (ANN). The knowledge-driven burglary susceptibility maps were developed using the Relative Vulnerability Index (RVI), based on input from experts. To obtain the best results, different parameter settings and indicator manipulations were used in the susceptibility modelling process. Both susceptibility modelling approaches were compared and validated against the same independent validation dataset using several accuracy assessment approaches: the Area Under Curve - Receiver Operator Characteristic (AUC-ROC curve) and a correlation matrix of True Positive and True Negative. The matrix is used to calculate the sensitivity, specificity, and accuracy of the models.
The performance of ANN and SVM was found to be close, with sensitivities of 91.74% and 88.46%, respectively. In terms of specificity, however, SVM scored higher than ANN, at 57.59% and 40.46% respectively. The error in classifying buildings with a high frequency of burglary was also included in the measurements in order to decide on the best method. Comparing both classification results with the validation data showed that the ANN method classified buildings with a high frequency of burglary cases into the high susceptibility class with no error at all, which makes it the best method. The output from IVM had only moderate sensitivity and specificity, at 54.56% and 46.42% respectively. By contrast, the knowledge-driven susceptibility map had a high sensitivity (86.51%) but a very low specificity (16.4%), which makes it the least accurate model, as it could not classify the highly susceptible areas as correctly as the other modelling approaches. In conclusion, the results indicate that the 18 indicators used in this research can be employed to map burglary susceptibility in the study area successfully. Furthermore, residential areas in the vicinity of Brickfields, Bangsar Baru, Hartamas, and Bukit Pantai were consistently classified as highly susceptible areas, while the areas of Jalan Duta and Taman Tunku were identified as the least susceptible areas across the modelling methods.
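As a reminder of how sensitivity, specificity, and accuracy follow from the counts of correct and incorrect classifications, here is a minimal sketch. The counts are hypothetical and are not taken from the study; the function name `classification_rates` is likewise an assumption.

```python
def classification_rates(tp, fn, tn, fp):
    """Compute sensitivity, specificity and accuracy from outcome counts.

    tp: susceptible buildings correctly flagged (true positives)
    fn: susceptible buildings missed (false negatives)
    tn: non-susceptible buildings correctly cleared (true negatives)
    fp: non-susceptible buildings wrongly flagged (false positives)
    """
    sensitivity = tp / (tp + fn)   # share of actual positives found
    specificity = tn / (tn + fp)   # share of actual negatives found
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Made-up validation counts for illustration only.
sensitivity, specificity, accuracy = classification_rates(tp=90, fn=10, tn=40, fp=60)
```

With these made-up counts the model finds most of the positives but misclassifies many negatives, the same pattern of high sensitivity and low specificity that the abstract reports for some of the models.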