5 research outputs found

    Neural network algorithms for fraud detection: a comparison of the complementary techniques in the last five years

    Purpose: This research analyses the complementary updates and techniques used to optimize the results of neural network algorithms (NNA) for detecting financial fraud, and compares the trend, field addressed and efficiency of the models developed in current research. Design/Methodology/Approach: The author performed a qualitative study in which literature was compiled and selected by defining the conceptual analysis, database and search strategy, resulting in 32 selected documents. A comparative analysis was then carried out to determine the most used and most efficient complementary technique of the last five years. Findings: The comparative analysis showed that research based on NNA had its greatest impact in 2019, with 11 studies. Twenty-seven complementary updates and techniques related to NNA were identified, including deep neural network algorithms (DNN), convolutional neural networks (CNN) and SMOTE neural networks. Finally, the evaluation of the effectiveness of the collected techniques showed average accuracy ranging between 79% and 98.74%, with an overall accuracy value of 91.32%. Originality/Value: As a technique that is applied and compared in diverse studies, ANNs use a wide range of mechanisms for training and classifying data. According to the findings of this research, the complementary techniques contribute to the progress and optimization of algorithms for financial fraud detection, with a high degree of effectiveness for online and credit card fraud.

    Adaptive credit card fraud prediction using Artificial Neural Network

    Capstone Project submitted to the Department of Engineering, Ashesi University in partial fulfillment of the requirements for the award of Bachelor of Science degree in Computer Engineering, May 2020. Online transactions are growing, which has led to an immense growth in the number of credit card fraud cases. More people are choosing to shop online for convenience, paying online for purchases to be delivered to them or, in some cases, for services rendered to them. With this opportunity, fraudsters are also increasing their online fraud activities. This study therefore seeks to detect credit card fraud using an adaptive tool and also attempts to reduce the number of valid transactions wrongly predicted by the model. Researchers have used tools such as K-nearest neighbour, logistic regression, random forest and decision trees, among others; this study instead uses an autoencoder neural network to detect credit card fraud. The study then evaluates the model using an appropriate evaluation metric. Keywords: fraud detection, adaptable, autoencoder neural network, credit card, online transactions.
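The trade-off hinted at in this abstract, catching fraud while keeping wrongly flagged valid transactions low, can be illustrated with a minimal sketch. It assumes each transaction already has an anomaly score (for example, an autoencoder reconstruction error) and that "wrongly predicted valid transactions" means false positives. The function name, scores, labels and thresholds are hypothetical, and the abstract does not say which evaluation metric the study used.

```python
def false_positive_rate_and_recall(scores, labels, threshold):
    """Flag a transaction as fraud when its anomaly score exceeds the threshold.

    labels: 1 = fraud, 0 = valid (hypothetical ground truth).
    Returns (false positive rate, recall).
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score > threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1  # valid transaction wrongly flagged as fraud
        elif not flagged and label == 0:
            tn += 1
        else:
            fn += 1  # fraud that slipped through
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, recall


# Toy anomaly scores and labels, invented for illustration only.
scores = [0.1, 0.2, 0.35, 0.4, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1]

for threshold in (0.3, 0.5, 0.85):
    fpr, recall = false_positive_rate_and_recall(scores, labels, threshold)
    print(f"threshold={threshold}: false positive rate={fpr:.2f}, recall={recall:.2f}")
```

Raising the threshold in this toy data lowers the false positive rate but also lets more fraud through, which is why both numbers are reported together.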

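    Methodologies for explanatory variable selection and detection of prediction nonconformities applied to fluorescence spectroscopy (Metodologias para seleção de variáveis explicativas e detecção de inconformidades de predição aplicadas à espectroscopia por fluorescência)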

    The ability to predict future events from historical observations is the basis of predictive modeling. Building models that quantify variables of interest, classify occurrences or predict behavior has followed the evolution of modern machine learning algorithms. In the manufacturing industry, much of the information most relevant to process control is still acquired only through laboratory techniques, which are costly, destructive and time-consuming (for example, molecular concentration of species of interest, purity of drugs, lubricity of oils and protein content in food). One possible path to automating these systems is the study of new sensors that capture easily obtained auxiliary information which can be mathematically transformed into the outputs of interest. This motivates studies that combine the choice of suitable sensors with methodologies capable of efficiently extracting the useful information contained in the data. This work presents methodologies based on different strategies for selecting explanatory variables and optimizing empirical models, and also proposes a methodology that uses neural networks to qualify nonconformities in new readings. The AnTSbe methodology is presented: a hybrid algorithm based on the Ant Colony Optimization (ACO) and Tabu Search (TS) metaheuristics, developed to optimize the selection of input variables in complex combinatorial problems. The hybridization aims to avoid early stagnation and cycling of subgroups, which are common in these metaheuristics. The algorithm also introduces polynomial and combinatorial expansion of the input variables in an effort to increase the predictive power of the models. As a case study, fluorescence spectroscopy is used to predict sulfur concentration in diesel fuel. The fitted predictive models outperformed other techniques described in the literature, with mean absolute percentage errors of prediction below 4%, and the proposed adaptations proved efficient when compared with previous research on the same database. An adaptation of the AnTSbe algorithm focused on fluorescence data is then proposed, based on the concept of Delta Pair. A new optimization layer is introduced to select an excitation/emission pair that serves as a regulator of the medium, with its fluorescence intensity subtracted from all other points of the spectrum. In this study, three distinct cachaça aging processes are followed in order to predict the concentration of phenolics in the spirit over time from fluorescence data. The Delta Pair adaptation proved especially effective when combined with basis expansion and for predicting commercial aged cachaças that did not take part in the calibration of the models. Next, fluorescence excitation-emission matrices collected in situ in fermentations with S. cerevisiae were used to calibrate a residual convolutional neural network to predict glucose, ethanol and biomass in the biological medium. In parallel, a methodology based on autoencoder (AE) neural networks, capable of correctly reconstructing the original spectra, was developed. It uses the reconstruction error of the trained AE for unsupervised screening of new spectra, identifying nonconforming spectra and qualifying the confidence that can be placed in a new data point based on the magnitude of this error. Despite the focus on fluorescence spectroscopy data, most of the methodologies were designed for general use, whatever the data source, with little or no modification. Finally, the AnTSbe methodology is used to predict impurities in the streams of a propane/propylene separation unit, extending the methodology to the petrochemical industry using simulated process data (rather than fluorescence data). The methodology correctly predicted the concentration profiles of the three separation columns of the process with mean absolute percentage errors below 5%, with special attention to quantifying the contaminants in each stream, which must be kept under control to ensure the profitability of the operation. The articles developed demonstrate, in the order presented, the success of the proposed methodologies in deepening the selection of significant variables and the optimization of predictive empirical models. The sequence of case studies starts with the development of the base stochastic algorithm, moves on to strengthening the generalization ability of the optimized models based on fluorescence spectroscopy, presents a technique for qualifying new samples and concludes with the use of the developed algorithms in an industrial case.
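The reconstruction-error screening described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the `reconstruct` function below is a simple linear stand-in (the best scaled copy of the mean calibration spectrum), not a trained autoencoder, and the threshold rule (a multiple of the largest calibration error) is a hypothetical choice that the abstract does not specify. All function names and data are invented for the example.

```python
import math


def fit_mean_spectrum(calibration):
    """Average the calibration spectra point by point to get a template."""
    n = len(calibration)
    return [sum(column) / n for column in zip(*calibration)]


def reconstruct(spectrum, template):
    """Stand-in for a trained autoencoder (assumption): return the
    least-squares best scalar multiple of the template."""
    scale = sum(s * t for s, t in zip(spectrum, template)) / sum(t * t for t in template)
    return [scale * t for t in template]


def reconstruction_error(spectrum, template):
    """Root-mean-square difference between a spectrum and its reconstruction."""
    rebuilt = reconstruct(spectrum, template)
    squared = sum((s - r) ** 2 for s, r in zip(spectrum, rebuilt))
    return math.sqrt(squared / len(spectrum))


def screen(new_spectra, calibration, margin=3.0):
    """Flag new spectra whose error exceeds margin * the largest calibration error.

    The margin rule is a hypothetical threshold, chosen only for illustration.
    """
    template = fit_mean_spectrum(calibration)
    threshold = margin * max(reconstruction_error(s, template) for s in calibration)
    return [reconstruction_error(s, template) > threshold for s in new_spectra]


# Toy "spectra": three calibration readings with a similar peak shape.
calibration = [
    [1.0, 2.0, 3.0, 2.0, 1.0],
    [2.0, 4.0, 6.0, 4.0, 2.0],
    [1.1, 2.0, 2.9, 2.0, 1.0],
]
new_spectra = [
    [1.5, 3.0, 4.5, 3.0, 1.5],  # same shape as the calibration data
    [3.0, 1.0, 0.5, 1.0, 3.0],  # inverted shape, should be flagged
]
flags = screen(new_spectra, calibration)
print(flags)
```

With a real autoencoder, only `reconstruct` would change; the magnitude of the error could also be reported directly as a confidence indicator rather than reduced to a flag.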

    A new feature engineering framework for financial cyber fraud detection using machine learning and deep learning

    As online payment systems advance, total losses via online banking in the United Kingdom have increased, because fraudulent techniques have also progressed and now use advanced technology. Traditional fraud detection models that rely only on raw transaction data cannot cope with new and innovative schemes for deceiving financial institutions. Many studies published by both academic and commercial organisations introduce new fraud detection models using various machine learning algorithms; however, financial fraud losses via online banking have continued to increase. This thesis takes a holistic view of feature engineering for classification and of machine learning (ML) and deep learning (DL) algorithms for fraud detection, in order to understand their capabilities and how each algorithm deals with input data. It then proposes a new feature engineering framework that can produce the most effective feature set for any ML or DL algorithm by combining feature engineering and feature selection methods. The framework consists of two main components: feature creation and feature selection. The purpose of the feature creation component is to create many effective feature candidates through feature aggregation and transformation based on customer behaviour. The purpose of feature selection is to evaluate all features and to drop irrelevant features and very highly correlated features from the dataset. In the experiment, I demonstrated the effect of the new feature engineering framework by using real-life banking transaction data provided by a private European bank and by evaluating the performance of the resulting fraud detection models in an appropriate way. Machine learning and deep learning models perform at their best when the feature set created by the new framework is applied, with higher scores in all evaluation metrics than the models built with the original dataset.
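The two components named in this abstract, feature creation and feature selection, can be sketched as follows. This is a minimal illustration and not the thesis's framework: the per-customer ratio feature, the greedy keep-first rule and the 0.95 correlation threshold are all assumptions, and the transaction data is invented. The sketch covers only the correlated-feature filter, not the removal of irrelevant features.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.95):
    """Greedy filter (assumption): keep a feature only if its absolute
    correlation with every already-kept feature is below the threshold."""
    kept = {}
    for name, values in features.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return list(kept)


# Invented transactions: (customer, amount).
transactions = [
    ("alice", 10.0), ("alice", 12.0), ("alice", 40.0),
    ("bob", 500.0), ("bob", 520.0), ("bob", 480.0),
]

# Feature creation: aggregate per customer, then derive a behavioural feature.
totals, counts = {}, {}
for customer, amount in transactions:
    totals[customer] = totals.get(customer, 0.0) + amount
    counts[customer] = counts.get(customer, 0) + 1
customer_mean = {c: totals[c] / counts[c] for c in totals}

features = {
    "amount": [amount for _, amount in transactions],
    # Redundant rescaling of "amount", perfectly correlated with it.
    "amount_in_cents": [amount * 100 for _, amount in transactions],
    # How unusual the amount is for this particular customer.
    "ratio_to_customer_mean": [amount / customer_mean[c] for c, amount in transactions],
}

# Feature selection: drop features that duplicate information already kept.
selected = drop_correlated(features)
print(selected)
```

In this toy data the rescaled amount is discarded as redundant, while the customer-relative ratio survives because it carries information the raw amount does not.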