Machine Learning and Finance: A Review using Latent Dirichlet Allocation Technique (LDA)
The aim of this paper is to provide a first comprehensive structuring of the literature applying machine learning to finance. We use a probabilistic topic modelling approach to make sense of this diverse body of research spanning the disciplines of finance, economics, computer sciences, and decision sciences. Through the topic modelling approach, Latent Dirichlet Allocation (LDA), we extract the 14 coherent research topics that are the focus of the 6,148 academic articles analysed, published during the years 1990-2019. We first describe and structure these topics, and then show how the topic focus has evolved over the last two decades. Our study thus provides a structured topography for finance researchers seeking to integrate machine learning research approaches in their exploration of finance phenomena. We also showcase the benefits to finance researchers of probabilistic topic modelling for deep comprehension of a body of literature, especially when that literature has diverse multi-disciplinary actors.
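To make the topic modelling step concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling. The abstract does not say which inference algorithm or settings the authors used, so the sampler, the hyperparameters `alpha` and `beta`, the function name `lda_gibbs`, and the four-document toy corpus are all assumptions made for illustration.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic shares."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens of word w in topic k
    topic_total = [0] * n_topics                           # all tokens in topic k
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    # Smoothed topic shares per document; each row sums to 1.
    return [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]

# Toy corpus, made up for illustration (not the 6,148 articles).
docs = [
    "stock price forecast neural network".split(),
    "neural network stock return forecast".split(),
    "credit risk default bank loan".split(),
    "bank loan credit default scoring".split(),
]
theta = lda_gibbs(docs, n_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each printed row is one document's estimated mixture over the two topics. A review like the one described would run the same idea over article abstracts with a larger topic count, then label each topic from its most probable words.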
A review of asset management literature on multi-asset systems
This article gives an overview of the literature on asset management for multi-unit systems with an emphasis on two multi-asset categories: fleet (a system of homogeneous assets) and portfolio (a system of heterogeneous assets). As asset systems become more complicated, researchers have employed different terms to refer to their specific problems. To help readers find studies relevant to their interests, this paper establishes a novel classification scheme for multi-unit systems in accordance with essential features such as diversity of assets and intervention options. Moreover, discerning differences in characteristics between cross-component and cross-asset interactions, we select three types of potential multi-component dependencies (performance, stochastic, and resource) and extend their notions to be applicable to multi-asset systems. The investigation into these dependencies enables the identification of problems that could exist in real industrial settings but have yet to be identified in academia. Ultimately, we delve into modelling approaches adopted by previous researchers. This comprehensive information allows us to offer insights into the current trends in multi-asset maintenance. We expect that the output of this review paper will not only highlight research gaps on multi-asset systems but, more importantly, help systematise future studies on this aspect.
Technical and Fundamental Features Analysis for Stock Market Prediction with Data Mining Methods
Predicting stock prices is an essential objective in the financial world. Forecasting stock returns and their risk represents one of the most critical concerns of market decision makers. This thesis investigates the stock price forecasting with three approaches from the data mining concept and shows how different elements in the stock price can help to enhance the accuracy of our prediction. For this reason, the first and second approaches capture many fundamental indicators from the stocks and implement them as explanatory variables to do stock price classification and forecasting. In the third approach, technical features from the candlestick representation of the share prices are extracted and used to enhance the accuracy of the forecasting. In each approach, different tools and techniques from data mining and machine learning are employed to justify why the forecasting is working.
Furthermore, because the aim is to evaluate the potential of features in stock trend forecasting, we diversify our experiments using both technical and fundamental features. In the first approach, a three-stage methodology is developed. In the first stage, a comprehensive set of features that can affect stock risk and return is identified. In the next stage, risk and return are predicted by applying data mining techniques to the given features. Finally, we develop a hybrid algorithm based on filters and function-based clustering, and re-predict the risk and return of stocks.
In the second approach, instead of using single classifiers, a fusion model is proposed based on the use of multiple diverse base classifiers that operate on a common input and a meta-classifier that learns from base classifiers’ outputs to obtain a more precise stock return and risk predictions. A set of diversity methods, including Bagging, Boosting, and AdaBoost, is applied to create diversity in classifier combinations. Moreover, the number and procedure for selecting base classifiers for fusion schemes are determined using a methodology based on dataset clustering and candidate classifiers’ accuracy.
Finally, in the third approach, a novel forecasting model for stock markets based on the wrapper ANFIS (Adaptive Neural Fuzzy Inference System) – ICA (Imperialist Competitive Algorithm) and technical analysis of Japanese Candlestick is presented. Two approaches of Raw-based and Signal-based are devised to extract the model’s input variables and buy and sell signals are considered as output variables.
To illustrate the methodologies, for the first and second approaches, Tehran Stock Exchange (TSE) data for the period from 2002 to 2012 are applied, while for the third approach, we used General Motors and Dow Jones indexes.
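The fusion model in the second approach (base classifiers feeding a meta-classifier) can be sketched as a two-level stack. This is only an illustration under stated assumptions: the threshold-rule base classifiers, the perceptron meta-classifier, the helper names, and the data are all hypothetical and are not the thesis's classifiers or the TSE data. Each base rule reads one feature of the same input sample.

```python
def fit_stump(xs, ys):
    """Fit a one-feature threshold rule at the midpoint of the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    threshold = (mean0 + mean1) / 2
    sign = 1 if mean1 >= mean0 else -1
    return lambda x: 1 if sign * (x - threshold) > 0 else 0

def fit_perceptron(rows, ys, epochs=50, lr=0.1):
    """Meta-classifier: a perceptron trained on the base classifiers' outputs."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for row, y in zip(rows, ys):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, row)) + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, row)]
            b += lr * err
    return lambda row: 1 if sum(wi * xi for wi, xi in zip(w, row)) + b > 0 else 0

# Made-up samples: (feature_1, feature_2) -> 1 for "high return", else 0.
base_train = [((0.1, 5.0), 0), ((0.2, 4.0), 0), ((0.8, 1.0), 1), ((0.9, 2.0), 1)]
meta_train = [((0.3, 4.5), 0), ((0.7, 1.5), 1), ((0.4, 3.5), 0), ((0.6, 2.5), 1)]

# Level 0: one base classifier per feature, all fitted on the same samples.
base_models = []
for j in range(2):
    xs = [x[j] for x, _ in base_train]
    ys = [y for _, y in base_train]
    base_models.append((j, fit_stump(xs, ys)))

def base_outputs(x):
    """Collect every base classifier's prediction for one sample."""
    return [model(x[j]) for j, model in base_models]

# Level 1: the meta-classifier learns from base outputs on held-out samples.
meta = fit_perceptron(
    [base_outputs(x) for x, _ in meta_train],
    [y for _, y in meta_train],
)

def predict(x):
    """Stacked prediction: base classifiers first, then the meta-classifier."""
    return meta(base_outputs(x))

print(predict((0.75, 1.2)), predict((0.25, 4.8)))
```

Training the meta-classifier on samples the base classifiers did not see is a common design choice in stacking, because it keeps the meta level from simply trusting base models that have memorised their training data.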
Reinforcement Learning Applied to Trading Systems: A Survey
Financial domain tasks, such as trading in market exchanges, are challenging
and have long attracted researchers. The recent achievements and the consequent
notoriety of Reinforcement Learning (RL) have also increased its adoption in
trading tasks. RL uses a framework with well-established formal concepts, which
raises its attractiveness in learning profitable trading strategies. However,
RL use without due attention in the financial area can prevent new researchers
from following standards or lead them to overlook relevant conceptual guidelines. In
this work, we embrace the seminal RL technical fundamentals, concepts, and
recommendations to perform a unified, theoretically-grounded examination and
comparison of previous research that could serve as a structuring guide for the
field of study. A selection of twenty-nine articles was reviewed under our
classification that considers RL's most common formulations and design patterns
from a large volume of available studies. This classification allowed for
precise inspection of the most relevant aspects regarding data input,
preprocessing, state and action composition, adopted RL techniques, evaluation
setups, and overall results. Our analysis approach organized around fundamental
RL concepts allowed for a clear identification of current system design best
practices, gaps that require further investigation, and promising research
opportunities. Finally, this review attempts to promote the development of this
field of study by facilitating researchers' commitment to standards adherence
and helping them to avoid straying away from the RL constructs' firm ground.
Comment: 38 page
The Stock Exchange Prediction using Machine Learning Techniques: A Comprehensive and Systematic Literature Review
This literature review identifies and analyzes research topic trends, types of data sets, learning algorithms, method improvements, and frameworks used in stock exchange prediction. A total of 81 studies on stock prediction, published between January 2015 and June 2020 and meeting the inclusion and exclusion criteria, were investigated. The literature review methodology is carried out in three major phases (review planning, implementation, and report preparation) and in nine steps, from defining systematic review requirements to presentation of results. Estimation or regression, clustering, association, classification, and preprocessing analysis of data sets are the five main focuses revealed in the main study of stock prediction research. The classification method accounts for 35.80% of the related studies, the estimation method for 56.79%, and data analytics for 4.94%, with clustering and association making up the rest at 1.23%. Furthermore, the technical indicator data set is used in 74.07% of studies; the rest use combinations of data sets. To develop a stock prediction model, 48 different methods have been applied, of which the 9 most widely applied were identified. The best methods in terms of accuracy and small error rate include SVM, DNN, CNN, RNN, LSTM, bagging ensembles such as RF, boosting ensembles such as XGBoost, ensemble majority vote, and the meta-learner approach of ensemble Stacking. Several techniques are proposed to improve prediction accuracy by combining several methods, using boosting algorithms, adding feature selection, and using parameter and hyper-parameter optimization.
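The reported shares can be cross-checked against the 81 studies. The sketch below assumes each percentage is a whole number of studies divided by 81 and rounded to two decimals, and that clustering and association account for 1.23% each; the abstract's wording on that last point is ambiguous, so treat it as an assumption.

```python
TOTAL_STUDIES = 81  # stated in the abstract

# Shares reported in the abstract, in percent.
reported = {
    "estimation": 56.79,
    "classification": 35.80,
    "data analytics": 4.94,
    "clustering": 1.23,   # assumption: 1.23% each for clustering
    "association": 1.23,  # and association
}

def implied_count(share_percent, total=TOTAL_STUDIES):
    """Return the whole number of studies that a percentage share implies."""
    return round(share_percent / 100 * total)

counts = {name: implied_count(share) for name, share in reported.items()}
for name, n in counts.items():
    print(f"{name}: {n} studies ({n / TOTAL_STUDIES:.2%})")
print("total:", sum(counts.values()))
print("technical-indicator data sets:", implied_count(74.07), "studies")
```

Under these assumptions the implied counts add up to exactly 81, which supports reading the 1.23% figure as applying to clustering and to association separately.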
From metaheuristics to learnheuristics: Applications to logistics, finance, and computing
A large number of decision-making processes in strategic sectors such as transport and production involve NP-hard problems, which are frequently characterized by high levels of uncertainty and dynamism. Metaheuristics have become the predominant method for solving challenging optimization problems in reasonable computing times. However, they frequently assume that inputs, objective functions and constraints are deterministic and known in advance. These strong assumptions lead to work on oversimplified problems, and the solutions may demonstrate poor performance when implemented. Simheuristics, in turn, integrate simulation into metaheuristics as a way to naturally solve stochastic problems, and, in a similar fashion, learnheuristics combine statistical learning and metaheuristics to tackle problems in dynamic environments, where inputs may depend on the structure of the solution. The main contributions of this thesis include (i) a design for learnheuristics; (ii) a classification of works that hybridize statistical and machine learning and metaheuristics; and (iii) several applications for the fields of transport, production, finance and computing.
Fuzzy time series analysis and prediction using swarm optimized hybrid model.
Time series forecasting has a long track record in the fields of business, economics, energy, population dynamics, tourism, etc., where factor models, neural network models, and Bayesian models are widely applied for effective prediction. Numerous forecasting surveys have shown that no individual forecasting model achieves the best performance in all potential situations. Moreover, modern research has focused on a deeper understanding of the reasons for this. Rather than aiming to design a single superior model, it has focused on forecasting methods that are effective under certain situations. For instance, due to the qualitative nature of forecasting, a business can come up with diverse scenarios depending on the interpretation of data. Therefore, organizations never rely solely on any individual forecasting model, but rather on sets of individual models to attain the best possible knowledge of the future. The time series forecasting model has a great impact in terms of prediction. Many forecasting models related to fuzzy time series have been proposed in the past decades. These models have been widely applied to various problem domains, especially forecasting problems where historical data are linguistic values. A hybrid forecasting method can improve forecast accuracy by merging sets of individual forecasting models. Numerous hybrid forecasting models that combine fuzzy time series with evolutionary algorithms have been proposed in the last couple of years, but their performance is not quite satisfactory. In this research, a novel hybrid fuzzy time series forecasting model is proposed that uses the historical data as the universe of discourse and an automatic clustering algorithm to cluster the universe of discourse by adjusting the clusters into intervals. Furthermore, the particle swarm optimization algorithm is also examined to improve forecast accuracy.
The proposed method is applied to forecast student enrolment at the University of Alabama. The model achieves a significant improvement in forecast accuracy compared to state-of-the-art hybrid fuzzy time series forecasting models. It is clear from the literature that no forecasting technique is appropriate for all situations. There is substantial evidence that combining individual forecasts produces gains in forecasting accuracy. The addition of quantitative forecasts to qualitative forecasts may reduce forecast accuracy. Individual forecasts are combined based on either the simple arithmetic average method or an artificial neural network. Research has not yet revealed the conditions for the optimal forecast combinations. This thesis provides a few contributions to enhance the existing combination model. A set of individual forecasting models is used to form a novel combination forecasting model based on the characteristics of the resulting forecasts. All methods derived in this thesis are thoroughly tested on several standard datasets. The related characteristics of the resulting forecasts are observed to have different error decompositions for both the hybrid and the combination forecasting models. Advanced combination structures are investigated to take advantage of the knowledge of the forecast generation processes.
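As a small illustration of the simple arithmetic average mentioned above, the sketch below combines two individual forecasts and compares mean squared errors. The series and the two "models" are made-up numbers, not results or data from the thesis, and the function names are hypothetical.

```python
def mse(forecast, actual):
    """Mean squared error between a forecast series and the observed series."""
    return sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual)

def combine_mean(forecasts):
    """Combine forecasts by the simple arithmetic average at each time step."""
    return [sum(step) / len(step) for step in zip(*forecasts)]

# Made-up illustrative numbers, not data from the thesis.
actual = [100.0, 104.0, 109.0, 115.0]
model_a = [98.0, 107.0, 108.0, 118.0]   # hypothetical individual forecast
model_b = [103.0, 102.0, 111.0, 113.0]  # hypothetical individual forecast

combined = combine_mean([model_a, model_b])
individual_errors = [mse(model_a, actual), mse(model_b, actual)]
print("individual MSEs:", individual_errors)
print("combined MSE:", mse(combined, actual))
```

Because squared error is convex, the averaged forecast can never have a higher MSE than the mean of the individual MSEs. The gain is largest when the individual errors point in opposite directions, as they do in this toy series.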
Computational intelligence techniques in asset risk analysis
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The problem of asset risk analysis is positioned within the computational intelligence paradigm. We suggest an algorithm for reformulating asset pricing, which involves incorporating imprecise information into the pricing factors through fuzzy variables as well as a calibration procedure for their possibility distributions. Then fuzzy mathematics is used to process the imprecise factors and obtain an asset evaluation. This evaluation is further automated using neural networks with sign restrictions on their weights. While networks of this type have previously been used only with up to two network inputs and hypothetical data, here we apply thirty-six inputs and empirical data. To achieve successful training, we modify the Levenberg-Marquardt backpropagation algorithm. The intermediate result achieved is that the fuzzy asset evaluation inherits features of the factor imprecision and provides the basis for risk analysis. Next, we formulate a risk measure and a risk robustness measure based on the fuzzy asset evaluation under different characteristics of the pricing factors as well as different calibrations. Our database, extracted from DataStream, includes thirty-five companies traded on the London Stock Exchange. For each company, the risk and robustness measures are evaluated and an asset risk analysis is carried out through these values, indicating the implications they have on company performance. A comparative company risk analysis is also provided. Then, we employ both risk measures to formulate a two-step asset ranking method. The assets are initially rated according to the investors' risk preference. In addition, an algorithm is suggested to incorporate the asset robustness information and further refine the ranking, benefiting market analysts. The rationale provided by the ranking technique serves as a point of departure in designing an asset risk classifier.
We identify the fuzzy neural network structure of the classifier and develop an evolutionary training algorithm. The algorithm starts with suggesting preliminary heuristics in constructing a sufficient training set of assets with various characteristics revealed by the values of the pricing factors and the asset risk values. Then, the training algorithm works at two levels, the inner level targets weight optimization, while the outer level efficiently guides the exploration of the search space. The latter is achieved by automatically decomposing the training set into subsets of decreasing complexity and then incrementing backward the corresponding subpopulations of partially trained networks. The empirical results prove that the developed algorithm is capable of training the identified fuzzy network structure. This is a problem of such complexity that prevents single-level evolution from attaining meaningful results. The final outcome is an automatic asset classifier, based on the investors’ perceptions of acceptable risk. All the steps described above constitute our approach to reformulating asset risk analysis within the approximate reasoning framework through the fusion of various computational intelligence techniques