54 research outputs found

    Statistical models for time sequences data mining

    In this paper, we present an adaptive modelling technique for studying the past behavior of objects and predicting near-future events. Our approach is to define a sliding window (of varying size) over a time sequence and build autoregression (AR) models from the subsequences in different windows. The models represent the past behavior of the sequence objects. The AR coefficients can serve as features for indexing subsequences, which facilitates queries for subsequences with similar behavior. A clustering algorithm can then group time sequences by their similarity in the feature space. The AR models can also be used for prediction within different windows. Our experiments show that the adaptive model gives better predictions than non-adaptive models.
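    The sliding-window idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the AR coefficients are estimated with the Yule-Walker equations (solved by the Levinson-Durbin recursion), and the names `fit_ar`, `window_features`, and `predict_next` are hypothetical.

```python
from typing import List, Sequence


def autocov(x: Sequence[float], lag: int) -> float:
    """Biased sample autocovariance of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n


def fit_ar(x: Sequence[float], order: int) -> List[float]:
    """Estimate AR(order) coefficients via Yule-Walker / Levinson-Durbin.

    Assumption: the abstract does not say how the AR models are fitted;
    Yule-Walker is used here only because it needs no external library.
    """
    r = [autocov(x, k) for k in range(order + 1)]
    if r[0] == 0.0:  # constant window: no dynamics to model
        return [0.0] * order
    phi: List[float] = []
    err = r[0]  # prediction error variance of the current model
    for k in range(1, order + 1):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        kappa = acc / err  # reflection coefficient for order k
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1.0 - kappa * kappa
    return phi


def window_features(x: Sequence[float], window: int, order: int,
                    step: int = 1) -> List[List[float]]:
    """AR coefficient vector for each sliding window; usable as index features."""
    return [fit_ar(x[s:s + window], order)
            for s in range(0, len(x) - window + 1, step)]


def predict_next(window_values: Sequence[float], phi: Sequence[float]) -> float:
    """One-step-ahead forecast from the most recent values in a window."""
    mean = sum(window_values) / len(window_values)
    recent = window_values[-len(phi):]  # ordered oldest to newest
    return mean + sum(phi[j] * (recent[-1 - j] - mean) for j in range(len(phi)))
```

    Each coefficient vector returned by `window_features` is one point in the feature space, so any standard clustering or nearest-neighbour index could be applied to those vectors to find subsequences with similar behavior.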

    Using Text Mining to Analyze Quality Aspects of Unstructured Data: A Case Study for “stock-touting” Spam Emails

    The growth in the use of text mining tools and techniques over the last decade has been driven primarily by the sheer increase in the volume of unstructured text and the need to extract useful and, more importantly, high-quality information from it. The impetus to analyse unstructured data efficiently and effectively as part of an organization's decision-making processes has further motivated the need to better understand how to use text mining tools and techniques. This paper describes a case study of a stock spam e-mail architecture that demonstrates the process of refining linguistic resources to extract relevant, high-quality information from stock spam e-mails, including stock profiles, financial key words, stock and company news (positive/negative), and compound phrases. The aim of the study is to identify high-quality information patterns that can support the relevant authorities in detecting and analyzing fraudulent activities.

    Explorations in Evolutionary Design of Online Auction Market Mechanisms

    This paper describes the use of a genetic algorithm (GA) to find optimal parameter values for trading agents that operate in virtual online auction “e-marketplaces”, where the rules of those marketplaces are simultaneously under the control of the GA. The aim is to use the GA to automatically design new mechanisms for agent-based e-marketplaces that are more efficient than online markets designed by (or populated by) humans. The space of possible auction types explored by the GA includes the Continuous Double Auction (CDA) mechanism (as used in most of the world’s financial exchanges) and two purely one-sided mechanisms. Surprisingly, the GA did not always settle on the CDA as an optimum. Instead, novel hybrid auction mechanisms were evolved, which are unlike any existing market mechanisms. In this paper we show that, when the market supply and demand schedules undergo sudden “shock” changes partway through the evaluation process, two-sided hybrid market mechanisms can evolve that may be unlike any human-designed auction and yet may also be significantly more efficient than any human-designed market mechanism.

    Short-term Overreaction in American Depository Receipts

    In this paper we examine, for the first time, the short-term predictability of American Depository Receipts (ADRs) in reaction to extreme price movements. Based on an analysis of 2,911 extreme price movements that took place either within normal trading hours or after hours in the period 2001-2019, we conclude that those extreme returns were on average followed by significant reversals. This response represents an overreaction in prices, which challenges the weak form of the efficient market hypothesis. Price reversals are especially pronounced following extreme returns observed after hours, which lends support to the assertion that ADR markets are particularly inefficient during this trading period. These findings carry important implications for both market practitioners and regulators.
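    The notion of a reversal after an extreme return can be illustrated with a small sketch. The abstract does not state how "extreme" is defined, so the fixed absolute threshold below is an assumption, and the function name `mean_reversal` is hypothetical.

```python
from typing import List, Sequence


def mean_reversal(returns: Sequence[float], threshold: float) -> float:
    """Average next-period return after extreme moves, signed so that a
    positive value means prices moved back against the extreme move.

    Assumption: an 'extreme' return is any return whose absolute value is
    at least `threshold`; the paper may use a different definition.
    """
    signed: List[float] = []
    for t in range(len(returns) - 1):
        r = returns[t]
        if abs(r) >= threshold:
            direction = 1.0 if r > 0 else -1.0
            # A drop after a jump (or a rise after a fall) counts as positive.
            signed.append(-direction * returns[t + 1])
    if not signed:
        raise ValueError("no return reaches the threshold")
    return sum(signed) / len(signed)
```

    A clearly positive average is what an overreaction would look like under this definition; a value near zero would be consistent with no predictable response.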

    A contribution to exchange rate forecasting based on machine learning techniques

    The purpose of this thesis is to examine the contribution of machine learning techniques to exchange rate forecasting. That contribution is facilitated and enhanced by using fundamental economic variables, technical indicators, and business and consumer survey variables as inputs to the selected forecasting models. The research is organized as a compendium of four articles, each of which aims to advance our knowledge of the effects and means by which these inputs, together with the selection of a model’s free parameters, can improve exchange rate predictions. Using a non-linear forecasting technique, one paper examines the effect of fundamental economic variables and parameter selection on exchange rate forecasts, whereas the other three concentrate on technical indicators, parameter selection, and business and consumer survey variables. The first paper examines fundamental economic variables and a forecasting model’s parameters for the exchange rates of two countries, in an effort to understand the advantages or disadvantages these variables may bring in terms of forecasting performance and accuracy; its last experiment uses the previous period's exchange rate and economic indicators as model inputs. The second paper analyses how the combination of moving averages, business and consumer survey variables, and parameter selection improves exchange rate predictions for two countries. Compared to the first paper, it adds moving averages and business and consumer survey variables as inputs to the forecasting model and drops the fundamental economic variables. One of its goals is to determine the possible effects of business and consumer surveys on exchange rates. The third paper has the same objectives as the second, but extends the analysis to the exchange rates of seven countries. The fourth paper takes a similar approach to the second and third papers, but uses a single technical indicator. Overall, this thesis examines different ways of improving exchange rate predictions through the use of support vector machines, relying on a combination of input variables and the selection of the models’ parameters to achieve this purpose.

    Comparative Analysis of Different Distributions Dataset by Using Data Mining Techniques on Credit Card Fraud Detection

    Banks suffer multimillion-dollar losses each year for several reasons, the most important of which is credit card fraud. The issue is how to cope with the challenges this kind of fraud poses, and class imbalance (a skewed class distribution) is one of the most important. In this study, we therefore explore four data mining techniques, namely naïve Bayesian (NB), Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Random Forest (RF), on actual credit card transactions from European cardholders. This paper offers four major contributions. First, we used under-sampling to balance the dataset because of its high class imbalance. Second, we applied NB, SVM, KNN, and RF to the under-sampled data to classify transactions as fraudulent or genuine, then computed performance measures from a confusion matrix and compared them. Third, we adopted 10-fold cross-validation (CV) to test the accuracy of the four models, with its standard deviation, and compared the results across all models. Fourth, we evaluated these models on the entire (skewed) dataset using the confusion matrix and the AUC (Area Under the ROC Curve) ranking measure to determine which model would be best for a particular type of fraud. The best accuracies for the NB, SVM, KNN and RF classifiers are 97.80%, 97.46%, 98.16% and 98.23%, respectively. Comparative results over four dataset splits, (75:25), (90:10), (66:34) and (80:20), show that RF performs better than NB, SVM, and KNN, and that applying the proposed models to the entire (skewed) dataset achieved better outcomes than applying them to the under-sampled dataset.
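    The data-preparation and evaluation steps named in this abstract (under-sampling, 10-fold cross-validation, confusion matrix) can be sketched with the standard library alone. This is an illustration under assumptions, not the authors' pipeline: labels are taken to be 1 for fraudulent and 0 for genuine, random under-sampling is assumed since the abstract does not name a variant, and all function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def undersample(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Indices of a balanced subset: every minority-class row plus an
    equally sized random sample of majority-class rows."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return kept


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Counts of true/false positives and negatives (1 = fraudulent)."""
    cm = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1:
            cm["tp" if p == 1 else "fn"] += 1
        else:
            cm["fp" if p == 1 else "tn"] += 1
    return cm


def accuracy(cm: Dict[str, int]) -> float:
    """Share of correctly classified transactions."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())
```

    On a heavily skewed dataset, accuracy alone can look high even for a classifier that misses most fraud, which is one reason to read it alongside the full confusion matrix and a ranking measure such as AUC.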

    Sovereign Debt and Currency Crises Prediction Models Using Machine Learning Techniques.

    This research was funded by Cátedra de Economía y Finanzas Sostenibles, Universidad de Málaga, Spain. Partial funding for open access charge: Universidad de Málaga.
    Sovereign debt and currencies play an increasingly influential role in the development of any country, given the need to obtain financing and establish international relations. The prediction of sovereign debt and currency crises has been a recurring theme in the literature on financial crises because of their extreme importance to international economic activity. Nevertheless, existing models are limited in accuracy, and the literature calls for more investigation of the subject and lacks geographic diversity in the samples used. This article presents new models for predicting sovereign debt and currency crises that use various computational techniques to increase precision. The models are also tested on a wide global sample covering the main geographical zones of the world (Africa and the Middle East, Latin America, Asia, and Europe) as well as on the global sample as a whole. Our models show that computational techniques achieve a higher level of precision than statistical techniques. The best methods for predicting sovereign debt crises are fuzzy decision trees, AdaBoost, extreme gradient boosting, and deep learning neural decision trees; the best for forecasting currency crises are deep learning neural decision trees, extreme gradient boosting, random forests, and deep belief networks. Our research has a potentially significant impact on the adequacy of countries' macroeconomic policy against the risks arising from financial crises, and it provides instruments that can help improve the balance of countries' finances.

    Do artificial neural networks provide improved volatility forecasts: Evidence from Asian markets

    This paper enters the ongoing volatility forecasting debate by examining the ability of a wide range of Machine Learning (ML) methods, and specifically Artificial Neural Network (ANN) models. The ANN models are compared against traditional econometric models for ten Asian markets using daily data from 12 September 1994 to 05 March 2018. The empirical results indicate that ML algorithms, across the range of countries, can better approximate dependencies than traditional benchmark models. Notably, the predictive performance of such deep learning models is superior, perhaps owing to their ability to capture long-range dependencies. For example, the Neuro Fuzzy models of ANFIS and CANFIS, which outperform the EGARCH model, are more flexible in modelling both asymmetry and long memory properties. This offers new insights for Asian markets. In addition to standard statistical forecast metrics, we also consider risk management measures including the value-at-risk (VaR) average failure rate, the Kupiec LR test, the Christoffersen independence test, the expected shortfall (ES) and the dynamic quantile test. The study concludes that ML algorithms provide improved volatility forecasts in the stock markets of Asia and suggests that this may be a fruitful approach for risk management.
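    Two of the risk management measures listed above, the VaR failure rate and the Kupiec LR test, are simple enough to sketch. The block below is an illustration, not the paper's code. It assumes VaR forecasts are given as positive loss thresholds, so a failure is a day on which the return falls below the negative of the forecast, and the function name `kupiec_pof` is hypothetical. The statistic is the standard proportion-of-failures likelihood ratio, compared against a chi-square distribution with one degree of freedom.

```python
import math
from typing import Sequence, Tuple


def _xlogy(x: float, y: float) -> float:
    """x * log(y), with the convention that 0 * log(0) equals 0."""
    return 0.0 if x == 0 else x * math.log(y)


def kupiec_pof(returns: Sequence[float], var_forecasts: Sequence[float],
               coverage: float) -> Tuple[float, float, float]:
    """Return (failure_rate, LR statistic, p-value) for a VaR backtest.

    coverage: expected failure probability, e.g. 0.01 for a 99% VaR.
    Assumption: var_forecasts holds positive numbers (loss thresholds).
    """
    total = len(returns)
    failures = sum(1 for r, v in zip(returns, var_forecasts) if r < -v)
    rate = failures / total
    # Log-likelihood under the null (true failure probability = coverage).
    log_null = _xlogy(total - failures, 1.0 - coverage) + _xlogy(failures, coverage)
    # Log-likelihood at the observed failure rate.
    log_alt = _xlogy(total - failures, 1.0 - rate) + _xlogy(failures, rate)
    lr = -2.0 * (log_null - log_alt)
    # Survival function of chi-square(1): P(Z^2 > lr) = erfc(sqrt(lr / 2)).
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return rate, lr, p_value
```

    A small p-value suggests that the observed failure rate differs from the nominal coverage, that is, that the VaR forecasts are miscalibrated. The test looks only at the number of failures, not at whether they cluster in time, which is what the Christoffersen independence test addresses.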
