1,689 research outputs found

    Financial fraud detection by using grammar-based multiobjective genetic programming with ensemble learning

    Financial fraud is a criminal act that violates laws, rules or policies to gain unauthorized financial benefit. As an increasingly serious problem, it has attracted considerable concern, with major consequences including the loss of billions of dollars each year as well as damage to investor confidence and corporate reputation. A research area called Financial Fraud Detection (FFD) has therefore become essential in order to prevent the destructive results caused by financial fraud. Traditional modeling approaches to FFD are generally based on pre-defined hypothesis testing of causes and effects, and the evaluation criteria are often based only on variable significance levels or goodness-of-fit. FFD shares many features with other data mining problems: vast amounts of data records of different forms (e.g. financial statements or annual reports) have accumulated over time, and it is very difficult to uncover the interesting information by relying on traditional statistical methods alone. Data mining techniques, however, can be used to extract implicit, previously unknown and potentially useful patterns, rules or relations from massive data repositories. Such discovered patterns are valuable to executive leadership, stakeholders and the relevant regulatory agencies for reducing or avoiding losses. As with other real-life problems, it is not sufficient for FFD to consider only a single criterion (e.g. goodness-of-fit or accuracy); FFD can instead pursue multiple objectives (e.g. accuracy versus interestingness). Data mining techniques built around a single evaluation criterion cannot easily handle multiple objectives at once unless combination methods (e.g. a linear combination) are applied, with different weights representing the importance of each criterion. For example, accuracy may be weighted as more important than interestingness at 0.9:0.1, but it remains difficult to decide the appropriate or exact weight values. Therefore, multi-objective data mining techniques are required to tackle FFD problems. In this study, FFD is targeted and comprehensively evaluated by a number of methods. The proposed method is based on Grammar-Based Genetic Programming (GBGP), which has been shown to be a powerful data mining technique for generating compact and straightforward results. The major contributions are three improvements of GBGP for FFD problems. First, multiple criteria are considered by integrating the concept of multiple objectives into GBGP. Second, minority prediction is applied to handle class prediction for rows unmatched by the rules. Lastly, a new meta-heuristic approach to ensemble learning is introduced to help users select patterns from a pool of models and facilitate final decision-making. The experimental results show the effectiveness of the new approach on four FFD problems, including two real-life problems. The major implications of the study are threefold: it suggests a new ensemble learning technique with GBGP, it demonstrates the usability of the classification rules generated by the proposed method, and it provides an efficient multi-objective method for solving FFD problems.
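    To make the trade-off above concrete, the following is a minimal Python sketch (not the paper's GBGP implementation) contrasting the two ways of handling the accuracy-versus-interestingness trade-off: a weighted-sum fitness, which requires fixing weights such as 0.9:0.1 in advance, and a Pareto-dominance comparison, which keeps every non-dominated candidate without choosing weights. The rule-set scores are hypothetical.

```python
# Minimal sketch (not the paper's implementation): contrasting a weighted-sum
# fitness, which requires fixing weights up front, with Pareto dominance,
# which lets a multi-objective search keep all non-dominated rule sets.

def weighted_fitness(accuracy, interestingness, w_acc=0.9, w_int=0.1):
    """Scalar fitness: the 0.9:0.1 split must be chosen in advance."""
    return w_acc * accuracy + w_int * interestingness

def dominates(a, b):
    """True if candidate a is at least as good as b on every objective
    and strictly better on at least one (both objectives maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]

# Hypothetical (accuracy, interestingness) scores for three evolved rule sets.
rules = [(0.92, 0.30), (0.88, 0.55), (0.80, 0.20)]

print(sorted(rules, key=lambda r: weighted_fitness(*r), reverse=True)[0])  # (0.92, 0.30)
print(pareto_front(rules))  # [(0.92, 0.30), (0.88, 0.55)] -- both survive, no weights needed
```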

    An academic review: applications of data mining techniques in finance industry

    With the development of Internet technologies, data volumes are doubling every two years, faster than predicted by Moore’s Law, and Big Data analytics has become particularly important for enterprise business. Modern computational technologies provide effective tools to help understand the hugely accumulated data and leverage this information to gain insights into the finance industry. Data has become the most valuable asset of financial organisations for generating actionable business insights, as there are no physical products in the finance industry to manufacture. This is where data mining techniques help, by allowing access to the right information at the right time. These techniques are used by the finance industry in various areas such as fraud detection, intelligent forecasting, credit rating, loan management, customer profiling, money laundering detection, marketing and prediction of price movements, to name a few. This work aims to survey the research on data mining techniques applied to the finance industry from 2010 to 2015. The review finds that stock prediction and credit rating have received the most attention from researchers, compared with loan prediction, money laundering and time series prediction. Owing to the dynamics, uncertainty and variety of the data, nonlinear mapping techniques have been studied more deeply than linear techniques. Hybrid methods have also been shown to be the most accurate in prediction, closely followed by neural network techniques. This survey provides an overview of the applications of data mining techniques in the finance industry and a summary of methodologies for researchers in this area; in particular, it offers a good starting point for beginners who want to work in computational finance.

    Algorithms in future capital markets: A survey on AI, ML and associated algorithms in capital markets

    This paper reviews Artificial Intelligence (AI), Machine Learning (ML) and associated algorithms in future Capital Markets. New AI algorithms are constantly emerging, with each 'strain' mimicking a new form of human learning, reasoning, knowledge, and decision-making. The main disruptive forms of learning currently include Deep Learning, Adversarial Learning, Transfer Learning and Meta Learning. Although these modes of learning have been present in the AI/ML field for more than a decade, they are now far more applicable owing to the availability of data, computing power and infrastructure. These forms of learning have produced new models (e.g., Long Short-Term Memory, Generative Adversarial Networks) and underpin important applications (e.g., Natural Language Processing, Adversarial Examples, Deep Fakes). These new models and applications will drive changes in future Capital Markets, so it is important to understand their computational strengths and weaknesses. Since ML algorithms effectively self-program and evolve dynamically, financial institutions and regulators are becoming increasingly concerned with ensuring there remains a modicum of human control, focusing on Algorithmic Interpretability/Explainability, Robustness and Legality. For example, the concern is that, in the future, an ecology of trading algorithms across different institutions may 'conspire' and become unintentionally fraudulent (cf. LIBOR), or be subject to subversion through compromised datasets (e.g. Microsoft Tay). New and unique forms of systemic risk can emerge, potentially arising from excessive algorithmic complexity. The contribution of this paper is to review AI, ML and associated algorithms, their computational strengths and weaknesses, and to discuss their future impact on the Capital Markets.

    Data mining in computational finance

    Computational finance is a relatively new discipline whose birth can be traced back to the early 1950s. Its major objective is to develop and study practical models focusing on techniques that apply directly to financial analyses. The large number of decisions and computationally intensive problems involved in this discipline make data mining and machine learning models integral to improving, automating, and expanding the current processes. One objective of this research is to present the state of the art of the data mining and machine learning techniques applied in the core areas of computational finance. Next, a detailed analysis of public and private finance datasets is performed in an attempt to find interesting facts in the data and draw conclusions regarding the usefulness of features within the datasets. Credit risk evaluation is one of the crucial modern concerns in this field. Credit scoring is essentially a classification problem where models are built using information about past applicants to categorise new applicants as ‘creditworthy’ or ‘non-creditworthy’. We appraise the performance of a few classical machine learning algorithms for the problem of credit scoring. Typically, credit scoring databases are large and characterised by redundant and irrelevant features, making the classification task more computationally demanding. Feature selection is the process of selecting an optimal subset of relevant features. We propose an improved information-gain directed wrapper feature selection method using genetic algorithms and successfully evaluate its effectiveness against baseline and generic wrapper methods using three benchmark datasets. One of the tasks of financial analysts is to estimate a company’s worth. In the final piece of work, this study predicts the growth rate of companies’ earnings using three machine learning techniques. We employed the technique of lagged features, which allowed varying amounts of recent history to be brought into the prediction task and transformed the time series forecasting problem into a supervised learning problem. This work was applied to a private time series dataset.
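    As an illustration of the lagged-feature idea described above, the following is a minimal Python sketch (not the thesis code) that turns a univariate series into supervised-learning pairs; the growth figures are hypothetical.

```python
# Minimal sketch of the lagged-feature transformation (not the thesis code):
# turn a univariate series into (X, y) pairs so any supervised learner can be used.

def make_lagged_dataset(series, n_lags=3):
    """Each row of X holds the previous n_lags observations; y is the next value."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y

# Hypothetical annual earnings-growth rates for one company.
growth = [0.05, 0.08, 0.02, -0.01, 0.04, 0.07, 0.03]

X, y = make_lagged_dataset(growth, n_lags=3)
for features, target in zip(X, y):
    print(features, "->", target)
# [0.05, 0.08, 0.02] -> -0.01
# [0.08, 0.02, -0.01] -> 0.04  ... and so on
```

    Increasing n_lags brings more recent history into each prediction, which is the "varying amounts of recent history" trade-off mentioned in the abstract.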

    Automated Feature Engineering for Deep Neural Networks with Genetic Programming

    Feature engineering is a process that augments the feature vector of a machine learning model with calculated values that are designed to enhance the accuracy of a model’s predictions. Research has shown that the accuracy of models such as deep neural networks, support vector machines, and tree/forest-based algorithms sometimes benefits from feature engineering. Expressions that combine one or more of the original features usually create these engineered features. The choice of the exact structure of an engineered feature depends on the type of machine learning model in use. Previous research demonstrated that various model families benefit from different types of engineered features. Random forests, gradient-boosting machines, or other tree-based models might not see the same accuracy gain that an engineered feature allowed neural networks, generalized linear models, or other dot-product-based models to achieve on the same data set. This dissertation presents a genetic programming-based algorithm that automatically engineers features that increase the accuracy of deep neural networks for some data sets. For a genetic programming algorithm to be effective, it must prioritize the search space and efficiently evaluate what it finds. The algorithm faced a potential search space composed of all possible mathematical combinations of the original feature vector. Five experiments were designed to guide the search process to efficiently evolve good engineered features. The result of this dissertation is an automated feature engineering (AFE) algorithm that is computationally efficient, even though a neural network is used to evaluate each candidate feature. This approach gave the algorithm a greater opportunity to specifically target deep neural networks in its search for engineered features that improve accuracy. Finally, a sixth experiment empirically demonstrated the degree to which this algorithm improved the accuracy of neural networks on data sets augmented by the algorithm’s engineered features.
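    The evaluation step described above, in which a neural network scores each candidate engineered feature, might look roughly like the following Python sketch. It is illustrative only: it uses scikit-learn rather than the dissertation's implementation, and the ratio feature and toy data set are hypothetical stand-ins for what a genetic programming search would propose.

```python
# Illustrative sketch, not the dissertation's AFE algorithm: score one candidate
# engineered feature by checking whether appending it improves a small neural
# network's cross-validated accuracy on a toy data set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def evaluate_candidate(X, y, candidate):
    """Fitness of an engineered feature = CV accuracy of the augmented data."""
    X_aug = np.column_stack([X, candidate(X)])
    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    return cross_val_score(net, X_aug, y, cv=3).mean()

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

baseline = cross_val_score(
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0), X, y, cv=3
).mean()

# A GP search would propose many such expressions; this one is hand-written.
ratio_feature = lambda X: X[:, 0] / (np.abs(X[:, 1]) + 1e-6)

print(f"baseline accuracy:  {baseline:.3f}")
print(f"with ratio feature: {evaluate_candidate(X, y, ratio_feature):.3f}")
```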

    A Corpus Driven Computational Intelligence Framework for Deception Detection in Financial Text

    Financial fraud rampages onwards, seemingly uncontained. The cost of fraud in the UK is estimated to be as high as £193bn a year [1]. From a data science perspective, and in an area hitherto less explored, this thesis demonstrates how the use of linguistic features to drive data mining algorithms can aid in unravelling fraud. To this end, the spotlight is turned on Financial Statement Fraud (FSF), known to be the costliest type of fraud [2]. A new corpus of 6.3 million words is composed of 102 annual reports/10-Ks (narrative sections) from firms formally indicted for FSF, juxtaposed with 306 non-fraud firms of similar size and industrial grouping. Unlike other similar studies, this thesis takes a wide-angled view and extracts a range of features of different categories from the corpus. These linguistic correlates of deception are uncovered using a variety of techniques and tools. Corpus linguistics methodology is applied to extract keywords and to examine linguistic structure. N-grams are extracted to draw out collocations. Readability measurement in financial text is advanced through the extraction of new indices that probe the text at a deeper level. Cognitive and perceptual processes are also picked out. Tone, intention and liquidity are gauged using customised word lists. Linguistic ratios are derived from grammatical constructs and word categories. An attempt is also made to determine ‘what’ was said as opposed to ‘how’. Further, a new module is developed to condense synonyms into concepts. Lastly, frequency counts of keywords unearthed in a previous content analysis study of financial narrative are also used. These features are then used to drive machine learning-based classification and clustering algorithms to determine whether they aid in discriminating a fraud firm from a non-fraud firm. The models built typically exceed a classification accuracy of 70%. The above process is amalgamated into a framework. Driven by empirical data, the process outlined demonstrates in a practical way how linguistic analysis can aid fraud detection, and it constitutes a unique contribution to deception detection studies.
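    The word-list and ratio features described above can be illustrated with a minimal Python sketch; the word lists below are invented placeholders, not the customised lists used in the thesis, and the feature names are for illustration only.

```python
# Minimal sketch of word-list and linguistic-ratio features of the kind
# described above (not the thesis framework); the word lists are illustrative.
import re

NEGATIVE_WORDS = {"loss", "decline", "impairment", "litigation", "restated"}
UNCERTAINTY_WORDS = {"may", "might", "approximately", "believe", "could"}

def linguistic_features(text):
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = max(len(words), 1)
    return {
        "negative_ratio": sum(w in NEGATIVE_WORDS for w in words) / n,
        "uncertainty_ratio": sum(w in UNCERTAINTY_WORDS for w in words) / n,
        "type_token_ratio": len(set(words)) / n,             # lexical diversity
        "avg_sentence_length": n / max(len(sentences), 1),   # crude readability proxy
    }

sample = ("We believe the impairment may approximately offset the decline. "
          "Litigation costs could rise.")
print(linguistic_features(sample))
```

    A feature vector of this kind, computed per annual report, is what would then be passed to the classification and clustering algorithms.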

    A literature review on the application of evolutionary computing to credit scoring

    Recent years have seen the development of many credit scoring models for assessing the creditworthiness of loan applicants. Traditional credit scoring methodology has involved the use of statistical and mathematical programming techniques such as discriminant analysis, linear and logistic regression, linear and quadratic programming, or decision trees. However, the importance of credit grant decisions for financial institutions has led to growing interest in using a variety of computational intelligence techniques. This paper concentrates on evolutionary computing, which is viewed as one of the most promising paradigms of computational intelligence. Taking into account the synergistic relationship between the Economics and Computer Science communities, the aim of this paper is to summarize the most recent developments in the application of evolutionary algorithms to credit scoring by means of a thorough review of scientific articles published during the period 2000–2012. This work has been partially supported by the Spanish Ministry of Education and Science under grant TIN2009-14205 and by the Generalitat Valenciana under grant PROMETEO/2010/028.

    UTB/TSC Legacy Degree Programs and Courses 2013 – 2014
