2,418 research outputs found

    Ubiquitous intelligence for smart cities: a public safety approach

    Citizen-centered safety enhancement is an integral component of public safety and a top priority for decision makers in smart city development. However, public safety agencies constantly face the challenge of deterring crime. While most smart city initiatives have emphasized the use of modern technology for fighting crime, this may not be sufficient to achieve a sustainable safe and smart city in a resource-constrained environment, such as in Africa. In particular, crime series, sets of crimes considered to have been committed by the same offender, are currently less explored in developing nations and have great potential to help fight crime and promote safety in smart cities. This research focuses on detecting crime situations through data mining approaches that can promote citizens' safety and assist security agencies with knowledge-driven decision support, such as crime series identification. While much research has been conducted on crime hotspots, not enough has been done on identifying crime series. This thesis presents a novel crime clustering model, CriClust, for crime series pattern (CSP) detection and mapping to derive useful knowledge from a crime dataset, drawing on sound scientific and mathematical principles, as well as assumptions from theories of environmental criminology. The analysis is augmented using a dual-threshold model, and pattern prevalence information is encoded in similarity graphs. Clusters are identified by finding highly connected subgraphs using adaptive graph size and Monte Carlo heuristics in the Karger-Stein min-cut algorithm. We introduce two new interest measures: (i) Proportion Difference Evaluation (PDE), which reveals the propagation effect of a series and dominant series; and (ii) Pattern Space Enumeration (PSE), which reveals underlying strong correlations and defining features for a series.
Our findings on an experimental quasi-real dataset, generated from expert knowledge recommendations, reveal that identifying CSPs and statistically interpretable patterns could contribute significantly to strengthening public safety service delivery in smart city development. The evaluation investigated: (i) the reliability of the model in identifying all inherent series in a crime dataset; (ii) the scalability of the model with varying volumes of crime records; and (iii) unique features of the model compared to competing baseline algorithms and related research. The Monte Carlo technique and the adaptive graph size mechanism for crime similarity clustering were found to yield substantial improvement. The study also found that PDE and PSE of series clusters can provide valuable insight into crime deterrence strategies. Furthermore, visual enhancement of clusters, using graphical approaches to organise information and present a unified view, promotes prompt identification of important areas demanding attention. Our model particularly attempts to preserve desirable and robust statistical properties. This research presents considerable empirical evidence that the proposed crime cluster (CriClust) model is promising and can assist in deriving useful crime pattern knowledge, contributing knowledge services for public safety authorities and intelligence gathering organisations in developing nations, thereby promoting a sustainable "safe and smart" city.
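    The abstract mentions Monte Carlo heuristics in the Karger-Stein min-cut algorithm. As a rough illustration only, the sketch below implements the simpler building block, Karger's random edge contraction repeated over independent trials. It is not the CriClust model and not the recursive Karger-Stein variant; the function name `karger_min_cut`, the `trials` and `seed` parameters, and the toy graph are assumptions made for this example.

```python
import random


def karger_min_cut(edges, trials=200, seed=0):
    """Estimate the minimum cut size of a connected, undirected multigraph.

    edges: list of (u, v) pairs. Each trial contracts edges in a random
    order until two super-nodes remain; the smallest cut seen is returned.
    This is a Monte Carlo method, so more trials raise the success odds.
    """
    rng = random.Random(seed)
    nodes = {n for edge in edges for n in edge}
    best = len(edges)
    for _ in range(trials):
        parent = {n: n for n in nodes}  # union-find forest of super-nodes

        def find(x):
            # Follow parent links to the representative, compressing paths.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        remaining = len(nodes)
        order = edges[:]
        rng.shuffle(order)  # a random edge order equals random contraction
        for u, v in order:
            if remaining == 2:
                break
            ru, rv = find(u), find(v)
            if ru != rv:  # skip self-loops created by earlier contractions
                parent[ru] = rv
                remaining -= 1
        # Edges whose endpoints lie in different super-nodes cross the cut.
        cut = sum(1 for u, v in edges if find(u) != find(v))
        best = min(best, cut)
    return best


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
bridge_cut = karger_min_cut(toy_edges)
```

    In a similarity graph, a small cut of this kind separates weakly linked groups of records, which is the general idea behind splitting a graph into highly connected subgraphs.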

    2023 SDSU Data Science Symposium Presentation Abstracts

    This document contains abstracts for presentations and posters at the 2023 SDSU Data Science Symposium.

    A systematic survey of online data mining technology intended for law enforcement

    As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment generating huge volumes of data. With manual inspections becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no existing survey of the online data-mining literature examines their techniques, applications and rigour. This article fills that gap through a systematic mapping study describing online data-mining literature which visibly targets law enforcement applications, using evidence-based practices in survey making to produce a replicable analysis which can be methodologically examined for deficiencies.

    Three Essays on Crime and Delinquency: Immigration, Alcohol, and Probation

    This dissertation is a collection of three distinct chapters. The first chapter explores the associations between various assimilation measures and the development of delinquency risk factors among immigrant youth from Los Angeles County, CA, using both logit regression and doubly robust estimation methods. Youth with a low level of assimilation were more likely to have a high level of attachment to family and school than those with a high level of assimilation. However, the associations were less evident for the other delinquency risk factors. Among the assimilation measures used, the generation status measure was not predictive of the outcomes. The two language-related assimilation measures were less strongly associated with the outcomes than the immigrant status measure. The second chapter investigates the crime impacts of increased alcohol availability following the repeal of a Sunday off-premise liquor sales ban. Since 2003, Pennsylvania has permitted some of its state-run liquor stores to open on Sunday by repealing the ban, which enables a quasi-experimental triple difference design. Some evidence of local crime pattern changes after the repeal was found. The repeal was associated with an increase in total crime incidents in the immediate vicinity of the Sunday-open liquor stores in Philadelphia. At the same time, total crime incidents farther away from the stores decreased. These pattern changes were present in low socioeconomic status (SES) neighborhoods, but not evident in high SES ones. The third chapter evaluates, from a Bayesian perspective, whether the effect of a cognitive behavioral therapy (CBT) program on recidivism differs depending on probationer characteristics. Using data from a Philadelphia CBT randomized controlled experiment with a Bayesian hierarchical Gamma-Poisson model, the study compares average recidivism rates between the CBT and non-CBT groups, conditional on probationer characteristics.
The Bayesian analysis showed that the effects of the CBT program were more evident, to a statistically meaningful degree, for the high-risk probationers who were between 10-19 and 30-39 years old, who had more extensive prior experience on probation, and who had a higher ratio of high-risk predictions. These results contradict the frequentist evaluation results, which found that the CBT program had no statistically significant effect on recidivism.
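    The Gamma-Poisson model named in the abstract rests on a standard conjugacy result: if an event rate has a Gamma(shape, rate) prior and counts are Poisson with mean rate times exposure, the posterior is Gamma(shape + total count, rate + total exposure). The sketch below shows only this single-level update, not the hierarchical model used in the dissertation. The function name, the prior values, and the counts are invented for illustration and are not data from the study.

```python
def gamma_poisson_posterior(counts, exposures, shape=1.0, rate=1.0):
    """Conjugate update for a Poisson event rate with a Gamma prior.

    counts: observed event counts per unit (e.g. re-offences per person).
    exposures: observation time per unit, in the same order as counts.
    Returns the posterior (shape, rate) of the Gamma distribution.
    """
    post_shape = shape + sum(counts)
    post_rate = rate + sum(exposures)
    return post_shape, post_rate


# Hypothetical numbers, not taken from the Philadelphia experiment.
treated = gamma_poisson_posterior(counts=[2, 0, 1], exposures=[1.0, 1.0, 1.0])
control = gamma_poisson_posterior(counts=[3, 2, 2], exposures=[1.0, 1.0, 1.0])

# The posterior mean of a Gamma(shape, rate) distribution is shape / rate.
treated_mean = treated[0] / treated[1]
control_mean = control[0] / control[1]
```

    Comparing the two posterior distributions, rather than a single p-value, is the kind of group comparison a Bayesian analysis of recidivism rates builds on.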

    Agents for educational games and simulations

    This book consists mainly of revised papers that were presented at the Agents for Educational Games and Simulation (AEGS) workshop held on May 2, 2011, as part of the Autonomous Agents and MultiAgent Systems (AAMAS) conference in Taipei, Taiwan. The 12 full papers presented were carefully reviewed and selected from various submissions. The papers are organized in topical sections on middleware applications, dialogues and learning, adaption and convergence, and agent applications.

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
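    For context, the traditional autoregressive form of a Granger causality test compares two least-squares models of a series y: one using only lagged y, and one that also uses lagged x. The minimal sketch below computes the resulting F statistic with NumPy. It illustrates the baseline approach the abstract refers to, not the classification-based methods the article studies; the function name `granger_f_stat` and the synthetic series are assumptions for this example.

```python
import numpy as np


def granger_f_stat(x, y, lags=1):
    """F statistic for the hypothesis that x Granger-causes y.

    Fits a restricted model (intercept + lagged y) and a full model
    (intercept + lagged y + lagged x) by least squares and compares
    their residual sums of squares.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y) - lags
    target = y[lags:]
    ones = np.ones((n, 1))
    # Column k holds the series shifted back by k steps.
    y_lags = np.column_stack([y[lags - k:len(y) - k] for k in range(1, lags + 1)])
    x_lags = np.column_stack([x[lags - k:len(x) - k] for k in range(1, lags + 1)])
    restricted = np.hstack([ones, y_lags])
    full = np.hstack([ones, y_lags, x_lags])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_restricted, rss_full = rss(restricted), rss(full)
    dof = n - full.shape[1]
    return ((rss_restricted - rss_full) / lags) / (rss_full / dof)


# Synthetic example: y depends on the previous value of x, not vice versa.
rng = np.random.default_rng(0)
x_series = rng.normal(size=500)
y_series = np.zeros(500)
y_series[1:] = 0.8 * x_series[:-1] + 0.1 * rng.normal(size=499)

f_x_to_y = granger_f_stat(x_series, y_series)
f_y_to_x = granger_f_stat(y_series, x_series)
```

    A large F statistic in one direction and a small one in the other is the pattern usually read as evidence of Granger causality, subject to the usual caveat that it measures predictive, not physical, causation.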

    Democratizing machine learning

    Machine learning artifacts are increasingly embedded in society, often in the form of automated decision-making processes. One major reason for this, along with methodological improvements, is the increasing accessibility of data and of machine learning toolkits that give non-experts access to machine learning methodology. The core focus of this thesis is exactly this: democratizing access to machine learning in order to enable a wider audience to benefit from its potential. Contributions in this manuscript stem from several different areas within this broader field. A major section is dedicated to automated machine learning (AutoML), with the goal of abstracting away the tedious task of obtaining an optimal predictive model for a given dataset. This process mostly consists of finding said optimal model, often through hyperparameter optimization, while the user in turn only selects the appropriate performance metric(s) and validates the resulting models.
This process can be improved or sped up by learning from previous experiments. Three such methods are presented in this thesis: one aims to obtain a fixed set of possible hyperparameter configurations that likely contains good solutions for any new dataset, and two use dataset characteristics to propose new configurations. The thesis furthermore presents a collection of the required experiment metadata and shows how such metadata can be used for the development of new hyperparameter optimization methods and as a test bed for them. The pervasion of models derived from ML in many aspects of society simultaneously calls for increased scrutiny of how such models shape society and of the biases they may exhibit. Therefore, this thesis presents an AutoML tool that allows incorporating fairness considerations into the search for an optimal model. This requirement for fairness simultaneously poses the question of whether we can reliably estimate a model's fairness, which is studied in a further contribution in this thesis. Since access to machine learning methods also heavily depends on access to software and toolboxes, several contributions in the form of software are part of this thesis. The mlr3pipelines R package allows for embedding models in so-called machine learning pipelines that include pre- and postprocessing steps often required in machine learning and AutoML. The mlr3fairness R package, on the other hand, enables users to audit models for potential biases as well as reduce those biases through different debiasing techniques. One such technique, multi-calibration, is published as a separate software package, mcboost.