    Classification of Credit Card Frauds Detection using machine learning techniques

    Credit card fraud refers to the illegal use of credit cards or card details by criminals. In this research paper, we examine four approaches to fraud analysis: decision trees, logistic regression, support vector machines, and Random Forests. Our proposed technique comprises four stages: inputting the dataset, balancing the data through sampling, training classifier models, and detecting fraud. To analyze the data, we used forward stepwise logistic regression analysis (LR) and decision tree analysis (DT), together with Random Forest and support vector machine classifiers. The decision tree algorithm produced the highest AUC and accuracy, achieving a perfect score of 1. Logistic regression yielded the lowest values: 0.33 for AUC and 0.2933 for accuracy. The Random Forest algorithm reached an accuracy of 99.5%, a significant advance in automating the detection of credit card fraud.
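    The balancing stage mentioned above can be illustrated with a small, dependency-free sketch. The abstract only says the data were balanced "through sampling"; random undersampling of the majority class is an assumption made here for illustration, and the function name `random_undersample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Balance a labelled dataset by randomly dropping majority-class rows.

    Assumption: random undersampling stands in for the unspecified
    "balancing through sampling" stage; every class is reduced to the
    size of the smallest class.
    """
    counts = Counter(labels)
    minority_label = min(counts, key=counts.get)
    n_minority = counts[minority_label]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Group row indices by class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    keep = []
    for y, idx in by_class.items():
        # Keep every minority row; sample other classes down to the same size.
        keep.extend(idx if y == minority_label else rng.sample(idx, n_minority))
    keep.sort()  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


if __name__ == "__main__":
    # Hypothetical toy data: 100 transactions, every 20th one labelled as fraud (1).
    X = [[float(i)] for i in range(100)]
    y = [1 if i % 20 == 0 else 0 for i in range(100)]
    X_bal, y_bal = random_undersample(X, y)
    print(sorted(Counter(y_bal).items()))  # [(0, 5), (1, 5)]
```

    The balanced rows would then be passed to the classifier-training stage. Undersampling discards legitimate transactions, so oversampling the fraud class is an equally plausible reading of the original description.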

    A systematic survey of online data mining technology intended for law enforcement

    As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment that generates huge volumes of data. With manual inspection becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no survey of the online data-mining literature exists that examines their techniques, applications, and rigour. This article fills this gap with a systematic mapping study of online data-mining literature that visibly targets law enforcement applications, using evidence-based practices in survey making to produce a replicable analysis that can be methodologically examined for deficiencies.

    Artificial Intelligence for Sustainability—A Systematic Review of Information Systems Literature

    The booming adoption of Artificial Intelligence (AI) brings both benefits and challenges. In this paper, we focus on the bright side of AI and its potential to address our society's grand challenges. Given this potential, several studies have already done valuable work conceptualizing specific facets of AI and sustainability, including reviews of AI and Information Systems (IS) research or of AI and business value. Nonetheless, there is still little holistic knowledge at the intersection of IS, AI, and sustainability. This is problematic because the IS discipline, with its socio-technical nature, can integrate perspectives beyond the currently dominant technological one and can advance both theory and the development of purposeful artifacts. To bridge this gap, we show how IS research currently uses AI to foster sustainable development. Based on a systematically collected corpus of 95 articles, we examine the sustainability goals, data inputs, technologies and algorithms, and evaluation approaches that shape the current state of the art within the IS discipline. This comprehensive overview enables more informed investments (e.g., in policy and practice) and lets us discuss blind spots and possible directions for future research.

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of these studies adopt the concept of Granger causality to infer statistical cause-effect relationships, using traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite comprising a unique collection of datasets on climate-vegetation dynamics. The results indicate that specialized time series classification methods can improve existing inference procedures, and that the tested methods differ substantially from one another.
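    For reference, the conventional autoregressive formulation of Granger causality that the article takes as its baseline can be written as below, with $y$ the response series (for example a vegetation index) and $x$ a candidate climatic driver. The lag order $p$, the error terms, and the F-test are the standard textbook choices, assumed here rather than taken from the article.

```latex
% Restricted model: y is explained by its own past only
y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i \, y_{t-i} + \varepsilon_t

% Unrestricted model: the past of x is added
y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i \, y_{t-i} + \sum_{j=1}^{p} \beta_j \, x_{t-j} + \eta_t

% x Granger-causes y if H_0 : \beta_1 = \dots = \beta_p = 0 is rejected,
% e.g. with an F-test on the residual sums of squares over T usable observations
F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u) / p}{\mathrm{RSS}_u / (T - 2p - 1)}
```

    The unrestricted model has $2p + 1$ coefficients, which gives the $T - 2p - 1$ denominator degrees of freedom. Time series classification methods replace these linear models with more flexible learners while keeping the same "does the past of $x$ help predict $y$?" question.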

    Data science for tax administration

    This PhD thesis describes several new and existing data science applications, with a particular focus on tax administrations. One chapter covers the managerial side of analytics, giving a balanced overview of the pros and cons of applying analytics in taxpayer supervision. Another topic is (tax) fraud detection with unsupervised anomaly detection techniques; here a new type of outlier (singular outliers) is described, together with an algorithm for finding them. Attention is also paid to improving risk selection models. Most current algorithms handle interactions between categorical variables with many levels poorly, so an extension of logistic regression using Factorization Machines is provided, which yielded a ten percent improvement in precision. A fourth topic is statistical testing of whether similar cases are treated similarly; the thesis contributes an algorithm that performs this test based on process logs. The thesis further contains a benchmark study of different anomaly detection algorithms. Finally, HR analytics, reinforcement learning, and applications of fuzzy sets are briefly described.
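    The abstract does not say exactly how Factorization Machines are combined with logistic regression. The sketch below assumes the standard second-order Factorization Machine (Rendle, 2010) with a logistic link: each one-hot encoded category level gets a low-rank factor vector, and pairwise interactions are scored through dot products of these vectors, so interactions between many-level categorical variables do not each need their own free parameter. The names `fm_score` and `fm_probability`, and all weights, are hypothetical.

```python
import math
from typing import Sequence


def fm_score(x: Sequence[float], w0: float, w: Sequence[float],
             V: Sequence[Sequence[float]]) -> float:
    """Second-order FM score: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.

    The pairwise term uses the O(k*n) identity
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])  # number of latent factors per feature
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_probability(x, w0, w, V) -> float:
    """Logistic link that maps the FM score to a probability, as in logistic regression."""
    return 1.0 / (1.0 + math.exp(-fm_score(x, w0, w, V)))


if __name__ == "__main__":
    # Hypothetical one-hot encoding of two categorical variables:
    # sector (3 levels) followed by region (2 levels).
    x = [1.0, 0.0, 0.0, 1.0, 0.0]
    w0, w = -1.0, [0.2, -0.1, 0.3, 0.5, -0.4]
    V = [[0.1, 0.2], [0.0, -0.3], [0.4, 0.1], [0.5, -0.2], [-0.1, 0.3]]
    print(round(fm_score(x, w0, w, V), 4))  # -0.29
```

    With all factor vectors set to zero the model reduces to plain logistic regression, which is why an FM can be read as an extension of it. Training (for example by gradient descent on log-loss) is omitted here.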

    Predictive Modelling of Retail Banking Transactions for Credit Scoring, Cross-Selling and Payment Pattern Discovery

    Evaluating transactional payment behaviour offers a competitive advantage in the modern payment ecosystem, not only for confirming good credit applicants and unlocking cross-selling potential across the product and service portfolios of financial institutions, but also for precisely ruling out bad credit applicants within transactional payment streams. To diagnose payment behaviour, I used a hybrid approach combining supervised and unsupervised learning algorithms to discover behavioural patterns. The supervised learning algorithms can compute a range of credit scores and identify cross-sell candidates, although the applied methods discover only limited behavioural patterns across the payment streams. Moreover, the performance of the applied supervised learning algorithms varies across the different data models, and their optimisation is inversely related to the pre-processed dataset. The experiments suggest that the Two-Class Decision Forest is an effective algorithm for determining both cross-sell candidates and customer creditworthiness. In addition, a deep-learning model based on a neural network was considered, giving a meaningful interpretation of future payment behaviour through categorised payment transactions and providing further insight through graph-based visualisations. However, the research shows that unsupervised learning algorithms play a central role in evaluating customers' transactional payment behaviour: market basket analysis of previous payment transactions discovers associations, finds the frequent transaction categories, and derives interesting rules when each transaction category occurs on the same payment stream.
    The research also reveals that transactional payment behaviour analysis is multifaceted in the financial industry, serving to assess the diagnostic ability of promotion candidates and to classify bad credit applicants across the entire customer base. The developed predictive models can also be used to estimate the credit risk of any credit applicant based on his or her transactional payment behaviour profile, combined with insights from the categorised payment transaction analysis. The study provides a full review of the performance characteristics of the different data models developed. Thus, the demonstrated data science approach shows how machine learning models can be turned into cost-sensitive data models.
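    The market basket analysis described above rests on two quantities: the support of a set of transaction categories (the share of baskets containing it) and the confidence of a rule A -> B (support of A and B together divided by the support of A). The sketch below counts single categories and category pairs exhaustively rather than running Apriori, and it treats each basket as the set of categories seen on one payment stream; the function name `pair_rules`, the thresholds, and the category names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Mine rules A -> B between single categories from transaction baskets.

    support(A, B)      = share of baskets containing both A and B
    confidence(A -> B) = count(A and B) / count(A)
    Exhaustive pair counting is used instead of Apriori for brevity.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # sorted so each pair has one canonical order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # infrequent pair: no rule in either direction
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


if __name__ == "__main__":
    # Hypothetical baskets: spending categories seen on one payment stream per month.
    baskets = [
        {"groceries", "fuel"},
        {"groceries", "fuel", "utilities"},
        {"groceries", "utilities"},
        {"fuel", "travel"},
    ]
    for lhs, rhs, support, confidence in pair_rules(baskets):
        print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

    With these four toy baskets, the rule utilities -> groceries comes first with confidence 1.00, because every basket containing utilities also contains groceries.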

    Cyber Places, Crime Patterns, and Cybercrime Prevention: An Environmental Criminology and Crime Analysis approach through Data Science

    Get PDF
    For years, academics have examined the potential usefulness of traditional criminological theories to explain and prevent cybercrime. Some analytical frameworks from Environmental Criminology and Crime Analysis (ECCA), such as the Routine Activities Approach and Situational Crime Prevention, are frequently used for this purpose in theoretical and empirical research. These efforts have led to a better understanding of how crime opportunities are generated in cyberspace, thus helping to advance the discipline. However, with a few exceptions, other ECCA analytical frameworks, especially those based on the idea of geographical place, have been largely ignored. Because ECCA as a whole has received limited attention, its true potential to prevent cybercrime has remained unknown to date. In this thesis we aim to close this geographical gap by showing the potential of some of the essential concepts that underpin the ECCA approach, such as places and crime patterns, to analyse and prevent four crimes committed in cyberspace. To this end, the dissertation is structured in two phases: first, a proposal for transposing ECCA's fundamental propositions to cyberspace; and second, four empirical studies that use Data Science to test hypotheses derived from this approach. The first study tests a number of premises of repeat victimization in a sample of more than nine million self-reported website defacements. The second examines the crime precipitators at cyber places where allegedly fixed match results are advertised, and the hyperlinked network these places form. The third explores the situational contexts in which repeated online harassment occurs among a sample of non-university students. The fourth builds two metadata-driven machine learning models to detect online hate speech in a sample of Twitter messages collected after a terrorist attack.
    General results show (1) that cybercrimes are not randomly distributed in space, time, or among people; and (2) that the environmental features of the cyber places where they occur determine the emergence of crime opportunities. Overall, we conclude that the ECCA approach, and in particular its place-based analytical frameworks, can also be valid for analysing and preventing crime in cyberspace. We anticipate that this work can guide future research in this area, including the design of secure online environments, the allocation of preventive resources to high-risk cyber places, and the implementation of new evidence-based situational prevention measures.