274 research outputs found
Cost-Sensitive Metaheuristic Optimization-Based Neural Network with Ensemble Learning for Financial Distress Prediction
Financial distress prediction is crucial in the financial domain because of its implications
for banks, businesses, and corporations. Serious financial losses may occur because of poor financial
distress prediction. As a result, significant efforts have been made to develop prediction models
that can assist decision-makers to anticipate events before they occur and avoid bankruptcy, thereby
helping to improve the quality of such tasks. Because of the usual highly imbalanced distribution
of data, financial distress prediction is a challenging task. Hence, a wide range of methods and
algorithms have been developed over recent decades to address the classification of imbalanced
datasets. Metaheuristic optimization-based artificial neural networks have shown promising results in a
variety of applications, including classification problems. However, little attention has been paid to
using a cost-sensitive fitness function in metaheuristic optimization-based artificial neural networks
to solve the financial distress prediction problem. In this work, we propose ENS_PSONNcost and
ENS_CSONNcost: metaheuristic optimization-based artificial neural networks that use a particle
swarm optimizer and a competitive swarm optimizer, respectively, with five cost-sensitive fitness functions, as
the base learners in a majority voting ensemble learning paradigm. Three extremely imbalanced
datasets from Spanish, Taiwanese, and Polish companies were considered to avoid dataset bias.
The results showed significant improvements in the g-mean (the geometric mean of sensitivity and
specificity) metric and the F1 score (the harmonic mean of precision and sensitivity) while maintaining
adequately high accuracy.
Spanish Government PID2020-115570GB-C2
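The g-mean and F1 definitions given in the abstract above can be sketched as follows. This is a minimal illustration rather than the authors' evaluation code; the function name `imbalance_metrics` and the example counts are hypothetical.

```python
import math

def imbalance_metrics(tp, fp, tn, fn):
    """Return (g_mean, f1, accuracy) from binary confusion-matrix counts."""
    # Sensitivity: share of truly distressed firms that were flagged.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of healthy firms that were correctly left unflagged.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # g-mean: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)
    # F1: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return g_mean, f1, accuracy

# Hypothetical data: 1000 firms, 20 distressed, classifier predicts "healthy" for all.
g_mean, f1, accuracy = imbalance_metrics(tp=0, fp=0, tn=980, fn=20)
print(g_mean, f1, accuracy)
```

With these made-up counts, accuracy stays high while g-mean and F1 collapse to zero, which is why imbalanced problems are usually scored on the latter two.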
Novel Computationally Intelligent Machine Learning Algorithms for Data Mining and Knowledge Discovery
This thesis addresses three major issues in data mining: feature subset selection in high-dimensional domains, plausible reconstruction of incomplete data in cross-sectional applications, and forecasting of univariate time series. For the automated selection of an optimal subset of features in real time, we present an improved hybrid algorithm: SAGA. SAGA combines the ability of Simulated Annealing to avoid being trapped in local minima with the very high convergence rate of the crossover operator of Genetic Algorithms, the strong local search ability of greedy algorithms, and the high computational efficiency of generalized regression neural networks (GRNNs). For imputing missing values and forecasting univariate time series, we propose a homogeneous neural network ensemble. The proposed ensemble consists of a committee of GRNNs trained on different subsets of features generated by SAGA, and the predictions of the base classifiers are combined by a fusion rule. This approach makes it possible to discover all important interrelations between the values of the target variable and the input features. The proposed ensemble scheme has two innovative features that make it stand out amongst ensemble learning algorithms: (1) the ensemble makeup is optimized automatically by SAGA; and (2) GRNN is used for both the base classifiers and the top-level combiner classifier. Because of GRNN, the proposed ensemble is a dynamic weighting scheme. This is in contrast to existing ensemble approaches, which rely on simple voting or static weighting strategies. The basic idea of the dynamic weighting procedure is to give a higher reliability weight to those scenarios that are similar to the new ones. The simulation results demonstrate the validity of the proposed ensemble model.
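The idea of giving more weight to stored scenarios that resemble the new one can be illustrated with the standard GRNN prediction rule, a Gaussian-kernel weighted average of training targets. This is a minimal sketch, not the thesis's ensemble; the function name `grnn_predict` and the default smoothing parameter `sigma` are assumptions.

```python
import math

def grnn_predict(x_new, train_x, train_y, sigma=0.5):
    """Kernel-weighted average of training targets (GRNN prediction rule)."""
    weights = []
    for x in train_x:
        # Squared Euclidean distance between a stored case and the new case.
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_new))
        # Closer cases get weights near 1, distant cases near 0.
        weights.append(math.exp(-sq_dist / (2 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed; fall back to the plain mean of the targets.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total

# Hypothetical one-feature data.
train_x = [(0.0,), (1.0,), (2.0,)]
train_y = [0.0, 1.0, 4.0]
print(grnn_predict((0.9,), train_x, train_y))
```

Because the weights are recomputed for every query, the same stored cases contribute differently to different predictions, which is the dynamic weighting behaviour the abstract contrasts with static weighting.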
Evolutionary multivariate time series prediction
Multivariate time series (MTS) prediction plays a significant role in many practical data mining applications, such as finance, energy supply, and medical care. Over the years, various prediction models have been developed to obtain robust and accurate predictions. However, this is not an easy task, given a variety of key challenges. First, not all channels (each channel represents one time series) are informative (channel selection), and given the complexity of each selected time series, it is difficult to predefine a time window used for inputs. Second, since the selected time series may come from different domains and be collected with different devices, they may require different feature extraction techniques with suitable parameters to extract meaningful features (feature extraction), which influences the selection and configuration of the predictor, i.e., prediction (configuration). The challenge arising from channel selection, feature extraction, and prediction (configuration) is to perform them jointly to improve prediction performance. Third, we resort to ensemble learning to solve the MTS prediction problem composed of the previously mentioned operations, where the challenge is to obtain a set of models that are both accurate and diverse. Each of these challenges leads to an NP-hard combinatorial optimization problem, which cannot be solved using traditional methods since it is non-differentiable. The evolutionary algorithm (EA), an efficient metaheuristic stochastic search technique that is highly competent at solving complex combinatorial optimization problems with mixed types of decision variables, may provide an effective way to address the challenges arising from MTS prediction. The main contributions are supported by the following investigations.
First, we propose a discrete evolutionary model, which mainly focuses on seeking the influential subset of channels of MTS and the optimal time windows for each of the selected channels for the MTS prediction task. A comprehensive experimental study on real-world electricity consumption data with auxiliary environmental factors demonstrates the efficiency and effectiveness of the proposed method in searching for the informative time series, their respective time windows, and the predictor parameters, in comparison to the result obtained through enumeration. Subsequently, we define the basic MTS prediction pipeline containing channel selection, feature extraction, and prediction (configuration). To perform these key operations, we propose an evolutionary model construction (EMC) framework that simultaneously seeks the optimal subset of channels of MTS, suitable feature extraction methods and respective time windows applied to the selected channels, and parameter settings in the predictor for the best prediction performance. To implement EMC, a two-step EA is proposed, where the first-step EA mainly focuses on channel selection, while in the second step a specially designed EA works on feature extraction and prediction (configuration). A real-world electricity dataset with exogenous environmental information is used, and the whole dataset is further split into two datasets according to holiday and non-holiday events. The performance of EMC is demonstrated on all three datasets in comparison to hybrid models and some existing methods. Then, based on the prediction pipeline defined previously, we propose an evolutionary multi-objective ensemble learning model (EMOEL) by employing a multi-objective evolutionary algorithm (MOEA) subject to two conflicting objectives, i.e., accuracy and model diversity.
The MOEA leads to a Pareto front (PF) composed of non-dominated optimal solutions, each of which represents an optimal subset of the selected channels, the selected feature extraction methods, the selected time windows, and the selected parameters in the predictor. To boost the ultimate prediction accuracy, the models corresponding to these optimal solutions are linearly combined, with the combination coefficients optimized via a single-objective task-oriented EA. The superiority of EMOEL is demonstrated on electricity consumption data with climate information in comparison to several state-of-the-art models. We also propose a multi-resolution selective ensemble learning model, where multiple resolutions are constructed from the minimal granularity using statistics. At the current time stamp, the preceding time series data is sampled at different time intervals (i.e., resolutions) to constitute the time windows. For each resolution, multiple base learners with different parameters are first trained. A feature selection technique is applied to search for the optimal set of trained base learners, and least squares regression is used to combine them. The performance of the proposed ensemble model is verified on the electricity consumption data for next-step and next-day prediction. Finally, based on EMOEL and multi-resolution, instead of only combining the models generated from each PF, we propose an evolutionary ensemble learning (EEL) framework, where multiple PFs are aggregated to produce a composite PF (CPF) after duplicate solutions across the PFs are removed and the remainder are sorted into different levels of non-dominated fronts (NDFs). Feature selection techniques are applied to find the optimal subset of models in the level-accumulated NDF, and least squares is used to combine the selected models. The performance of EEL, which uses three different predictors as base learners, is evaluated through a comprehensive analysis of parameter sensitivity.
The superiority of EEL is demonstrated in comparison to the best result from single-objective EA, the best individual from the PF, and several state-of-the-art models across electricity consumption and air quality datasets, both of which use environmental factors from other domains as auxiliary factors. In summary, this thesis provides studies on how to build efficient and effective models for MTS prediction. The built frameworks investigate the influential factors, consider the pipeline composed of channel selection, feature extraction, and prediction (configuration) simultaneously, and keep good generalization and accuracy across different applications. The proposed algorithms to implement the frameworks use techniques from evolutionary computation (single-objective EA and MOEA), machine learning, and data mining. We believe that this research provides a significant step towards constructing robust and accurate models for solving MTS prediction problems. In addition, with the case study on electricity consumption prediction, it will help decision-makers determine the trend of future energy consumption for scheduling and planning the operations of the energy supply system.
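The notion of a Pareto front of non-dominated solutions under two conflicting objectives, as described in the abstract above, can be sketched as follows. This is only an illustration of non-dominated filtering, not the thesis's MOEA; the function names and the candidate values are hypothetical, and both objectives are expressed as quantities to minimize (prediction error and negated diversity).

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical candidate models as (prediction error, -diversity) pairs.
candidates = [(0.10, -0.2), (0.12, -0.5), (0.15, -0.4), (0.20, -0.9)]
front = pareto_front(candidates)
print(front)
```

In this made-up example, the candidate (0.15, -0.4) is dropped because (0.12, -0.5) has both lower error and higher diversity; the remaining three represent different trade-offs and none can be preferred without further criteria.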
COMPARISON OF DOUBLE RANDOM FOREST AND LONG SHORT-TERM MEMORY METHODS FOR ANALYZING ECONOMIC INDICATOR DATA
The performance of machine learning in analyzing time series data is being widely discussed. A new supervised ensemble method, Double Random Forest (DRF), has recently been developed. This method has been claimed to improve the performance of Random Forest (RF) when the data are under-fitting. Another machine learning method, Long Short-Term Memory networks (LSTMs), is capable of analyzing nonlinear data. Since no study comparing the two methods exists in the literature, it is interesting to compare their performance using Indonesian data, especially economic indicator data that have been found to be under-fitting, non-underfitting, and nonlinear. The indicators used in this study are Export, Import, Official Reserves Asset, and Exchange Rate data. The results showed that, overall, the LSTMs method outperforms the DRF method in analyzing the data.
A Survey on Reservoir Computing and its Interdisciplinary Applications Beyond Traditional Machine Learning
Reservoir computing (RC), first applied to temporal signal processing, is a
recurrent neural network in which neurons are randomly connected. Once
initialized, the connection strengths remain unchanged. Such a simple structure
turns RC into a non-linear dynamical system that maps low-dimensional inputs
into a high-dimensional space. The model's rich dynamics, linear separability,
and memory capacity then enable a simple linear readout to generate adequate
responses for various applications. RC spans areas far beyond machine learning,
since it has been shown that the complex dynamics can be realized in various
physical hardware implementations and biological devices. This yields greater
flexibility and shorter computation time. Moreover, the neuronal responses
triggered by the model's dynamics shed light on understanding brain mechanisms
that also exploit similar dynamical processes. While the literature on RC is
vast and fragmented, here we conduct a unified review of RC's recent
developments from machine learning to physics, biology, and neuroscience. We
first review the early RC models, and then survey the state-of-the-art models
and their applications. We further introduce studies on modeling the brain's
mechanisms by RC. Finally, we offer new perspectives on RC development,
including reservoir design, coding frameworks unification, physical RC
implementations, and interaction between RC, cognitive neuroscience and
evolution.
Comment: 51 pages, 19 figures, IEEE Access
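The core mechanism described in the abstract above, a randomly connected recurrent layer whose weights stay fixed after initialization and which maps a low-dimensional input into a high-dimensional state, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the uniform weight range, the division of recurrent weights by the number of units, and the tanh nonlinearity are illustrative choices, and training of the linear readout is not shown.

```python
import math
import random

def make_reservoir(n_inputs, n_units, scale=0.5, seed=0):
    """Draw fixed random input and recurrent weights (never trained afterwards)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
            for _ in range(n_units)]
    # Recurrent weights are kept small (assumption) so states do not saturate.
    w_res = [[rng.uniform(-scale, scale) / n_units for _ in range(n_units)]
             for _ in range(n_units)]
    return w_in, w_res

def run_reservoir(inputs, w_in, w_res):
    """Drive the reservoir with an input sequence and collect its states."""
    state = [0.0] * len(w_res)
    states = []
    for u in inputs:
        # New state: nonlinear mix of the current input and the previous state.
        state = [
            math.tanh(sum(w * x for w, x in zip(w_in[i], u))
                      + sum(w * s for w, s in zip(w_res[i], state)))
            for i in range(len(w_res))
        ]
        states.append(state)
    return states

# A one-dimensional input sequence mapped into a 20-dimensional state space.
w_in, w_res = make_reservoir(n_inputs=1, n_units=20)
states = run_reservoir([[0.1], [0.2], [0.3]], w_in, w_res)
print(len(states), len(states[0]))
```

Only a linear readout fitted on the collected states would be trained in a full reservoir computing setup, which is what keeps training cheap compared with a fully trained recurrent network.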
A Comprehensive Survey on Enterprise Financial Risk Analysis: Problems, Methods, Spotlights and Applications
Enterprise financial risk analysis aims at predicting the enterprises' future
financial risk. Due to its wide application, enterprise financial risk analysis
has always been a core research issue in finance. Although there are already
some valuable and impressive surveys on risk management, these surveys
introduce approaches in a relatively isolated way and lack the recent advances
in enterprise financial risk analysis. Due to the rapid expansion of the
enterprise financial risk analysis, especially from the computer science and
big data perspective, it is both necessary and challenging to comprehensively
review the relevant studies. This survey attempts to connect and systematize
the existing enterprise financial risk research, as well as to summarize and
interpret the mechanisms and the strategies of enterprise financial risk
analysis in a comprehensive way, which may help readers have a better
understanding of the current research status and ideas. This paper provides a
systematic literature review of over 300 articles published on enterprise risk
analysis modelling over a 50-year period, 1968 to 2022. We first introduce the
formal definition of enterprise risk as well as the related concepts. Then, we
categorize the representative works in terms of risk type and summarize the
three aspects of risk analysis. Finally, we compare the analysis methods used
to model the enterprise financial risk. Our goal is to clarify current
cutting-edge research and its possible future directions to model enterprise
risk, aiming to fully understand the mechanisms of enterprise risk
communication and influence and their application to corporate governance,
financial institutions and government regulation.
Gene expression programming for Efficient Time-series Financial Forecasting
Stock market prediction is of immense interest to trading companies and buyers due to
high profit margins. The majority of successful buying or selling activities occur close
to stock price turning trends. This makes the prediction and analysis of stock indices a
crucial factor in determining whether stocks will rise or fall the
next day. Additionally, precise prediction of the size of the increase or decrease in
stock prices also plays an important role in buying/selling activities. This research
presents two core aspects of stock-market prediction. Firstly, it presents a Network-based
Fuzzy Inference System (ANFIS) methodology to integrate the capabilities of
neural networks with those of fuzzy logic. Secondly, a specialised extension to this technique,
based on genetic programming (GP) and gene expression programming (GEP), is used to
explore and investigate the outcome of the GEP criteria on stock market price
prediction.
The research presented in this thesis aims at the modelling and prediction of short-to-medium-term
stock value fluctuations in the market via genetically tuned stock market
parameters. The technique uses hierarchically defined GP and gene-expression-programming
(GEP) techniques to tune algebraic functions representing the fittest
equation for stock market activities. The technique achieves novelty by proposing a
fractional adaptive mutation rate Elitism (GEP-FAMR) technique to strike a balance
between the mutation rates of chromosomes with different fitness levels, thereby improving
prediction accuracy and fitness improvement rate. The methodology is evaluated
on five stock market companies, each having its own trading circumstances
during the past 20+ years. The proposed GEP/GP methodologies were evaluated based
on variable window/population sizes and on the Elitism, Rank and Roulette
selection methods. The Elitism-based approach showed promising results with a low
error-rate in the resultant pattern matching with an overall accuracy of 95.96% for
short-term 5-day and 95.35% for medium-term 56-day trading periods. The
contribution of this research to theory is that it presented a novel evolutionary
methodology with modified selection operators for the prediction of stock exchange
data via Gene expression programming. The methodology dynamically adapts the
mutation rate of different fitness groups in each generation to ensure a diversification
balance between high and low fitness solutions. The GEP-FAMR approach was
preferred to Neural and Fuzzy approaches because it can address well-reported
problems of over-fitting, algorithmic black-boxing, and data-snooping issues via GP
and GEP algorithms.
Saudi Cultural Bureau