33 research outputs found

    Augmenting Data with Generative Adversarial Networks to Improve Machine Learning-Based Fraud Detection

    While modern machine learning methods can detect financial fraud effectively, they suffer from a common problem: dataset imbalance, i.e. there are substantially more non-fraud than fraud cases. In this paper, we propose the application of generative adversarial networks (GANs) to generate synthetic fraud cases on a dataset of public firms convicted by the United States Securities and Exchange Commission (SEC) for accounting malpractice. This approach aims to increase the prediction accuracy of downstream logit, support vector machine (SVM), and eXtreme Gradient Boosting (XGBoost) classifiers by training them on a better-balanced dataset. While the results indicate that a state-of-the-art machine learning model like XGBoost can outperform previous fraud detection models on the same data, generating synthetic fraud cases before applying a machine learning model does not improve performance.
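    The augmentation pipeline the abstract describes can be sketched roughly as follows: train a small GAN on the minority (fraud) rows only, then append generated rows to the training set before fitting a downstream classifier. This is a minimal, hypothetical PyTorch sketch, not the authors' implementation; the stand-in data, network sizes, and training length are all illustrative.

    ```python
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Stand-in for the real minority class: 200 "fraud" rows with 10 features.
    X_fraud = torch.randn(200, 10)
    d, z_dim = X_fraud.shape[1], 16

    G = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, d))
    D = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(2000):
        # Discriminator step: real fraud rows vs. generated ones.
        fake = G(torch.randn(len(X_fraud), z_dim)).detach()
        loss_d = (bce(D(X_fraud), torch.ones(len(X_fraud), 1)) +
                  bce(D(fake), torch.zeros(len(fake), 1)))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Generator step: try to fool the discriminator.
        loss_g = bce(D(G(torch.randn(len(X_fraud), z_dim))),
                     torch.ones(len(X_fraud), 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # Synthetic fraud cases to append to the training set before fitting the
    # downstream logit, SVM, or XGBoost classifier.
    X_synth = G(torch.randn(1000, z_dim)).detach().numpy()
    ```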

    Earnings Prediction with Deep Learning

    In the financial sector, a reliable forecast of a company's future financial performance is of great importance for investors' investment decisions. In this paper we compare long short-term memory (LSTM) networks to temporal convolutional networks (TCNs) in the prediction of future earnings per share (EPS). The experimental analysis is based on quarterly financial reporting data and daily stock market returns. For a broad sample of US firms, we find that LSTMs outperform the naive persistence model with up to 30.0% more accurate predictions, while TCNs achieve an improvement of 30.8%. Both types of networks are at least as accurate as analysts and exceed them by up to 12.2% (LSTM) and 13.2% (TCN).
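    As a concrete illustration of the sequence-model half of this comparison, the sketch below shows a minimal LSTM that maps a firm's sequence of quarterly features to a next-quarter EPS estimate. The feature count, window length, and architecture are assumptions for illustration, not the paper's configuration.

    ```python
    import torch
    import torch.nn as nn

    class EPSForecaster(nn.Module):
        """Maps a sequence of quarterly features to a one-step-ahead EPS estimate."""
        def __init__(self, n_features, hidden=32):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):              # x: (batch, quarters, n_features)
            out, _ = self.lstm(x)
            return self.head(out[:, -1])   # use the last quarter's hidden state

    # Stand-in batch: 8 firms, 20 quarters, 12 fundamentals/returns features each.
    model = EPSForecaster(n_features=12)
    x = torch.randn(8, 20, 12)
    eps_next = model(x)                    # (8, 1) predicted next-quarter EPS

    # The naive persistence baseline the paper compares against simply carries
    # the last observed EPS forward as the prediction.
    ```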

    Evaluation Of Machine Learning Tools For Distinguishing Fraud From Error

    Fraud and error are two underlying sources of misstated financial statements. Modern machine learning techniques provide a potential direction for distinguishing the two factors in such statements. In this paper, a thorough evaluation is conducted of how off-the-shelf machine learning tools perform on fraud/error classification. In particular, the task is treated as a standard binary classification problem; i.e., mapping from an input vector of financial indices to a class label that is either error or fraud. With a real dataset of financial restatements, this study empirically evaluates and analyzes five state-of-the-art classifiers: logistic regression, artificial neural networks, support vector machines, decision trees, and bagging. There are several important observations from the experimental results. First, bagging performs the best among these commonly used general-purpose machine learning tools. Second, the results show that the underlying relationship from the statement indices to the fraud/error decision is likely to be non-linear. Third, it is very challenging to distinguish error from fraud, and general machine learning approaches, though performing better than pure chance, leave much room for improvement. The results suggest that more advanced or task-specific solutions are needed for fraud/error classification.
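    A benchmark of this shape can be sketched with scikit-learn as below, using synthetic stand-in data in place of the restatement dataset (in the paper, the features are financial indices and the labels are fraud vs. error):

    ```python
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    # Stand-in for the restatement data: vectors of financial indices,
    # label 1 = fraud, 0 = error.
    X, y = make_classification(n_samples=600, n_features=20,
                               weights=[0.6], random_state=0)

    models = {
        "logit":   LogisticRegression(max_iter=1000),
        "ann":     MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                                 random_state=0),
        "svm":     SVC(kernel="rbf"),
        "tree":    DecisionTreeClassifier(random_state=0),
        "bagging": BaggingClassifier(DecisionTreeClassifier(),
                                     n_estimators=100, random_state=0),
    }

    for name, clf in models.items():
        score = cross_val_score(make_pipeline(StandardScaler(), clf),
                                X, y, cv=5).mean()
        print(f"{name:8s} 5-fold accuracy: {score:.3f}")
    ```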

    INTERNAL CONTROL AND FRAUD PREVENTION: PRIOR RESEARCH ANALYSIS

    The focus of this study is to analyze prior research on fraud detection and prevention. Most researchers agree that strong internal controls are an influencing factor in fair financial reporting and fraud prevention and detection. Financial statement and employee fraud can be very expensive to businesses and the economy as a whole. The establishment and evaluation of internal control methods and procedures can decrease fraudulent events and losses. Accounting professionals, CPAs, and tax preparers are the first to detect “red flags” in business activities and must work together with boards of directors, CFOs, and small business owners. Simple methods, such as ratio analyses, can help to signal early signs of fraudulent events and prevent future damages. Implementation of fraud prevention measures is the most efficient deterrent. Some of the most effective controls, such as job rotation, mandatory vacations, training, fraud hotlines, and surprise audits, need not be expensive and should be employed by all businesses. Unfortunately, the most important and effective fraud prevention techniques are seldom applied by businesses. Surprisingly, the least effective and most expensive measures, such as external audits, are employed more frequently. As reported in this review of the literature, most businesses focus on fraud detection, while fraud prevention and the implementation of proper internal controls would better prevent financial losses. DOI: https://doi.org/10.15544/ssaf.2014.1

    Expressing uncertainty in security analytics research: a demonstration of Bayesian analysis applied to binary classification problems

    A common application of security analytics is binary classification problems, which are typically assessed using measures derived from signal detection theory, such as accuracy, sensitivity, and specificity. However, these measures fail to incorporate into the results the uncertainty inherent to many contexts. We propose that the types of binary classification problems studied by security researchers can be described based on the level of uncertainty present in the data. We demonstrate the use of Bayesian data analysis in security contexts with varying levels of uncertainty and conclude that Bayesian analysis is particularly relevant in applications characterized by high uncertainty. We discuss how to apply similar analyses to other information security research.
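    In the simplest case, sensitivity and specificity can be given full posterior distributions with a conjugate Beta-Binomial model, so the uncertainty comes out as an interval rather than a point estimate. A minimal sketch; the confusion-matrix counts below are hypothetical:

    ```python
    from scipy import stats

    # Hypothetical confusion-matrix counts from a detector evaluation.
    tp, fn = 45, 15   # fraud cases caught / missed
    tn, fp = 900, 40  # legitimate cases passed / flagged

    # With a flat Beta(1, 1) prior, the posterior for sensitivity is
    # Beta(tp + 1, fn + 1) and for specificity Beta(tn + 1, fp + 1).
    sens_post = stats.beta(tp + 1, fn + 1)
    spec_post = stats.beta(tn + 1, fp + 1)

    for name, post in [("sensitivity", sens_post), ("specificity", spec_post)]:
        lo, hi = post.ppf([0.025, 0.975])
        print(f"{name}: mean {post.mean():.3f}, "
              f"95% credible interval [{lo:.3f}, {hi:.3f}]")
    ```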

    Detecting financial statement frauds in Malaysia: comparing the abilities of Beneish and Dechow models

    Financial statement fraud (FSF) is a rampant phenomenon in the current economic and financial landscape. One way to curb FSF is to detect it early so that preventive measures can be applied. This study aims to empirically investigate the abilities of two financial-statement-based models, namely the Beneish M-score and the Dechow F-score, to detect and predict FSF for Malaysian companies. In addition, this study compares the accuracy, including the error rates, of the two models. Financial data of Malaysian listed companies from 2001 to 2014 are used in a matched-pair design. The findings reveal that both the Beneish and Dechow models are effective in predicting both fraudulent and non-fraudulent companies, with average accuracy of 73.17% and 76.22%, respectively. The results also indicate that the Dechow F-score model outperforms the Beneish M-score model in the sensitivity of predicting fraud cases, with 73.17% compared to 69.51%. On the efficiency aspect, the Dechow F-score model is found to have a lower type II error (26.83%) than the Beneish M-score model (30.49%). This finding suggests that the Dechow F-score model is the better model for regulators to use to detect FSF among companies in Malaysia.
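    For reference, the Beneish M-score combines eight year-over-year index variables with fixed coefficients, and a score above roughly -2.22 is commonly read as a manipulation red flag. A minimal sketch; the firm's index values below are made up:

    ```python
    # Beneish (1999) eight-variable M-score; each index is a ratio of the
    # current to the prior fiscal year (DSRI = days sales in receivables
    # index, GMI = gross margin index, and so on).
    COEFFS = {
        "DSRI": 0.920, "GMI": 0.528, "AQI": 0.404, "SGI": 0.892,
        "DEPI": 0.115, "SGAI": -0.172, "TATA": 4.679, "LVGI": -0.327,
    }

    def m_score(indices: dict) -> float:
        return -4.84 + sum(COEFFS[k] * indices[k] for k in COEFFS)

    # Hypothetical firm-year: values near 1.0 mean "no change vs. prior year".
    firm = {"DSRI": 1.3, "GMI": 1.1, "AQI": 1.0, "SGI": 1.4,
            "DEPI": 1.0, "SGAI": 0.9, "TATA": 0.05, "LVGI": 1.1}

    score = m_score(firm)
    print(f"M-score = {score:.2f} -> "
          f"{'red flag' if score > -2.22 else 'no flag'}")
    ```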

    A Model for Detecting Accounting Frauds by using Machine Learning

    This paper aims to develop a machine learning model that can predict signs of financial statement fraud by combining domain knowledge from machine learning and accounting. The model's input is a published dataset of financial statements, and its output is a conclusion as to whether the financial statements show signs of fraud or not. Currently, XGBoost is recognized as one of the most popular classification methods, with fast performance, flexibility, and scalability. However, its default settings are not suitable for fraud detection on imbalanced datasets. To overcome this drawback, this research introduces a new machine learning model based on the XGBoost technique, called f(raud)-XGBoost. The proposed model not only inherits XGBoost's advantages but is also able to detect financial statement fraud. We apply the Area Under the Receiver Operating Characteristic curve (AUC) and NDCG@k in the evaluation process. The experimental results show that the new model performs slightly better than three existing models: a logistic regression model based on financial ratios, a support vector machine model, and the RUSBoost model.
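    The f-XGBoost modification itself is not described in this abstract, but one standard way to adapt plain XGBoost to such imbalance, and to score it with AUC and NDCG@k as the paper does, looks roughly like this (synthetic stand-in data, illustrative hyperparameters):

    ```python
    from sklearn.datasets import make_classification
    from sklearn.metrics import ndcg_score, roc_auc_score
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    # Stand-in for the financial-statement data: ~2% positive (fraud) rate.
    X, y = make_classification(n_samples=5000, n_features=30,
                               weights=[0.98], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    # Reweight the positive class by the class ratio; this is a generic
    # imbalance fix, not necessarily the paper's f-XGBoost change.
    ratio = (y_tr == 0).sum() / (y_tr == 1).sum()
    clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1,
                        scale_pos_weight=ratio, eval_metric="auc")
    clf.fit(X_tr, y_tr)

    scores = clf.predict_proba(X_te)[:, 1]
    print("AUC:    ", roc_auc_score(y_te, scores))
    print("NDCG@100:", ndcg_score([y_te], [scores], k=100))
    ```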

    THE DETECTION OF FRAUDULENT FINANCIAL STATEMENTS: AN INTEGRATED LANGUAGE MODEL

    Among the growing number of Chinese companies that went public overseas, many have been detected and alleged to be conducting financial fraud by market research firms or the U.S. Securities and Exchange Commission (SEC). Investors then lost money, and even confidence in all overseas-listed Chinese companies; likewise, these companies suffered serious stock price declines or were even delisted from the stock exchange. Conventional auditing practices failed in these cases when misleading financial reports were presented. This is partly because existing auditing practices and academic research focus primarily on statistical analysis of structured financial ratios and market activity data in the auditing process, while ignoring the large amount of textual information about those companies in financial statements. In this paper, we build an integrated language model, which combines a statistical language model (SLM) and latent semantic analysis (LSA), to detect the strategic use of deceptive language in financial statements. By integrating the SLM with the LSA framework, the integrated model not only overcomes the SLM's inability to capture long-span information, but also extracts the semantic patterns that distinguish fraudulent financial statements from non-fraudulent ones. Four different modes of the integrated model are also studied and compared. Applied to assess fraud risk in overseas-listed Chinese companies, the integrated model shows high accuracy in flagging fraudulent financial statements.
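    The LSA half of this approach corresponds to the standard construction of TF-IDF features followed by a truncated SVD, which projects documents into a low-rank latent semantic space. The sketch below shows only that half on an invented toy corpus; the SLM component and the paper's actual integration are omitted:

    ```python
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical mini-corpus of filing passages; label 1 = fraudulent filing.
    docs = [
        "revenue growth driven by strong customer demand and new contracts",
        "management believes receivables are fully collectible going forward",
        "certain related party transactions were restated in the current period",
        "operating cash flow declined while reported earnings grew substantially",
    ]
    labels = [0, 0, 1, 1]

    # TF-IDF followed by truncated SVD is the standard LSA construction.
    lsa_clf = make_pipeline(
        TfidfVectorizer(stop_words="english"),
        TruncatedSVD(n_components=2, random_state=0),
        LogisticRegression(),
    )
    lsa_clf.fit(docs, labels)
    print(lsa_clf.predict_proba(docs)[:, 1])  # fraud-risk score per document
    ```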

    Economic Aspects of the Missing Data Problem - the Case of the Patient Registry

    Registries are indispensable in medical studies and provide the basis for reliable study results. Depending on the purpose of use, high data quality is a prerequisite. However, with increasing registry quality, costs also increase accordingly. Considering these time and cost factors, this work attempts to estimate the cost advantages of applying statistical tools to existing registry data, including quality evaluation. The quality analysis showed that there are unquestionable savings of millions in study costs from reducing the time horizon, with average savings of EUR 523,126 for every year saved. By additionally replacing the over 25% of data missing in some variables, data quality was immensely improved. To conclude, our findings clearly showed the importance of data quality and statistical input in avoiding biased conclusions due to incomplete data.
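    As an illustration of the kind of statistical tool the abstract has in mind, model-based imputation can replace missing registry values using the remaining variables instead of discarding incomplete records. A scikit-learn sketch on synthetic data with roughly 25% missingness in one variable:

    ```python
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    # Hypothetical registry extract: one variable with ~25% missing entries.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    X[:, 3] = 0.5 * X[:, 0] - X[:, 1]         # variable correlated with others
    X[rng.random(200) < 0.25, 3] = np.nan     # ~25% missingness

    # Model-based imputation fills the gaps from the remaining variables,
    # avoiding the biased complete-case analysis the abstract warns about.
    X_imputed = IterativeImputer(random_state=0).fit_transform(X)
    print(np.isnan(X_imputed).sum(), "missing values remain")
    ```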