2,295 research outputs found
Regime identification and sequence clustering for financial market forecasting
Abstract: Regime switching analysis is widely advocated for capturing the complex behaviors
underlying financial time series for market prediction. The literature raises two main
shortcomings of current regime identification approaches: 1) the lack of
a mechanism for identifying regimes dynamically, which restricts them to switching among
a fixed set of regimes with a static transition probability matrix; 2) the failure to utilize
cross-sectional regime dependencies among time series, since not all time series are
synchronized to the same regime. Because numerical time series can be symbolized into
categorical sequences, a third issue arises: 3) the lack of a meaningful and effective
measure of similarity between chronologically dependent categorical values, needed
to identify sequence clusters that could serve as regimes for market forecasting. In this
thesis, we propose a dynamic regime identification model that can identify regimes
dynamically with a time-varying transition probability, to address the first issue. For
the second issue, we propose a cluster-based regime identification model to account
for the cross-sectional regime dependencies underlying financial time series for market
forecasting. For the last issue, we develop a dynamic order Markov model, making
use of information underlying frequent consecutive patterns and sparse patterns, to
identify the clusters that could serve as regimes identified on categorized financial time
series. Experiments on synthetic and real-world datasets show that our two regime
models perform well on both regime identification and forecasting, and that
our dynamic order Markov clustering model also performs well at
identifying clusters from categorical sequences.
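The abstract mentions symbolizing numerical time series into categorical sequences and working with transition probabilities. As a minimal sketch of that idea only, the following Python turns a price series into up/down/flat symbols and estimates a fixed first-order transition matrix. This is not the thesis's dynamic order Markov model or its time-varying regime model; the function names, the three-symbol alphabet, the threshold parameter and the sample prices are all assumptions made for illustration.

```python
from collections import Counter


def symbolize(values, threshold=0.0):
    """Map each consecutive change to 'U' (up), 'D' (down) or 'F' (flat).

    The alphabet and the threshold are illustrative assumptions.
    """
    symbols = []
    for prev, curr in zip(values, values[1:]):
        diff = curr - prev
        if diff > threshold:
            symbols.append("U")
        elif diff < -threshold:
            symbols.append("D")
        else:
            symbols.append("F")
    return symbols


def transition_matrix(symbols):
    """Estimate first-order transition probabilities P(next | current)."""
    pair_counts = Counter(zip(symbols, symbols[1:]))
    from_counts = Counter(symbols[:-1])
    return {
        (a, b): count / from_counts[a]
        for (a, b), count in pair_counts.items()
    }


# Hypothetical prices, not data from the thesis.
prices = [10.0, 10.5, 11.0, 10.8, 10.8, 11.2, 11.5]
symbols = symbolize(prices)
probs = transition_matrix(symbols)
```

A static matrix like this one is exactly what the first issue in the abstract criticizes, so it is best read as the baseline that the proposed models move beyond.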
NEW METHODS FOR MINING SEQUENTIAL AND TIME SERIES DATA
Data mining is the process of extracting knowledge from large amounts of data. It covers a variety of techniques aimed at discovering diverse types of patterns according to the requirements of the domain. These techniques include association rule mining, classification, cluster analysis and outlier detection. The availability of applications that produce massive amounts of spatial, spatio-temporal (ST) and time series data (TSD) is the rationale for developing specialized techniques to mine such data. In spatial data mining, the spatial co-location rule problem differs from the association rule problem, since there is no natural notion of transactions in spatial datasets that are embedded in continuous geographic space. We therefore propose an efficient algorithm (GridClique) to mine interesting spatial co-location patterns (maximal cliques). These patterns are used as the raw transactions for an association rule mining technique to discover complex co-location rules. Our proposal includes certain types of complex relationships, especially negative relationships, in the patterns. These relationships can be obtained from the maximal clique patterns alone, which have not been used for this purpose before. Our approach is applied to a well-known astronomy dataset obtained from the Sloan Digital Sky Survey (SDSS). ST data is continuously collected and made accessible in the public domain. We present an approach to mine and query large ST data with the aim of finding interesting patterns and understanding the underlying process of data generation. An important class of queries is based on the flock pattern. A flock is a large subset of objects moving along paths close to each other for a predefined time.
One approach to processing a “flock query” is to map ST data into high-dimensional space and to reduce the query to a sequence of standard range queries that can be answered using a spatial indexing structure; however, the performance of spatial indexing structures deteriorates rapidly in high-dimensional space. This thesis sets out a preprocessing strategy that uses a random projection to reduce the dimensionality of the transformed space. We use probabilistic arguments to prove the accuracy of the projection, and we present experimental results showing that the curse of dimensionality can be managed in an ST setting by combining random projections with traditional data structures. In time series data mining, we devised a new space-efficient algorithm (SparseDTW) to compute the dynamic time warping (DTW) distance between two time series, which always yields the optimal result. This is in contrast to other approaches, which typically sacrifice optimality to attain space efficiency. The main idea behind our approach is to dynamically exploit the existence of similarity and/or correlation between the time series: the more similar the time series, the less space is required to compute the DTW between them. Other techniques for speeding up DTW impose a priori constraints and do not exploit similarity characteristics that may be present in the data. Our experiments demonstrate that SparseDTW outperforms these approaches. By applying the SparseDTW algorithm, we discover an interesting pattern, “pairs trading”, in a large stock-market dataset of daily index prices from the Australian stock exchange (ASX) from 1980 to 2002.
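For readers unfamiliar with the DTW distance that SparseDTW computes, here is a minimal sketch of the classic dynamic programming formulation in Python. It is the standard full-grid algorithm, not SparseDTW: it visits every cell and does not exploit similarity between the series. The function name and the squared-difference local cost are assumptions; other local costs (such as the absolute difference) are equally common.

```python
def dtw_distance(a, b):
    """Classic DTW between two numeric sequences.

    Uses a squared-difference local cost (an assumption) and keeps only
    two rows of the cost grid, so memory is O(len(b)) while time stays
    O(len(a) * len(b)).
    """
    inf = float("inf")
    # prev[j] holds the best cost of aligning the processed prefix of a
    # with the first j elements of b; only the empty/empty cell is 0.
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = (x - y) ** 2
            # Extend the cheapest of the three admissible predecessors.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero because warping lets the repeated 2 align with a single 2, which is the kind of similarity a sparse method can take advantage of.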
Predicting the Future
Due to the increased capabilities of microprocessors and the advent of graphics processing units (GPUs) in recent decades, the use of machine learning methodologies has become popular in many fields of science and technology. This fact, together with the availability of large amounts of information, has meant that machine learning and Big Data have an important presence in the field of Energy. This Special Issue entitled “Predicting the Future—Big Data and Machine Learning” is focused on applications of machine learning methodologies in the field of energy. Topics include but are not limited to the following: big data architectures of power supply systems, energy-saving and efficiency models, environmental effects of energy consumption, prediction of occupational health and safety outcomes in the energy industry, price forecast prediction of raw materials, and energy management of smart buildings
Forecasting Flood Alert Status Based on Rainfall (ARR) and Water Level (AWLR) Data Using the Fuzzy Time Series Method (Case Study: Perum Jasa Tirta I)
A flood is a condition in which the water flow rises above the normal water
level and inundates the surrounding area. The flood wave travels from upstream
to downstream and interacts with the rising water capacity of the estuary. Floods
can be caused by heavy rainfall, river overflow, or the breakdown of watershed
(Daerah Aliran Sungai, DAS) retention. A system that can produce forecasts is
therefore needed to make it easier to analyze future flood alert status. The
regression method used in this study is Fuzzy Time Series (FTS), a model
commonly used to forecast data ordered in time.
This study aims to forecast flood alert status at the Kambing Station in the
Brantas watershed. Test results show that forecasting on the water level (AWLR)
data for December 2016 gave an error (RMSE) of 2.89, and forecasting on the
rainfall (ARR) data for February 2015 gave an error (RMSE) of 16.04. Both
datasets produced a flood alert forecast of Normal Alert (Siaga Normal).
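The error figures above are root mean squared errors. As a small sketch of how such a value is computed, assuming the usual definition (the square root of the mean squared difference between observed and forecast values), the Python below may help. The function name is an assumption, and the study's own data and FTS forecasts are not reproduced here.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equal-length numeric sequences."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

RMSE is expressed in the same unit as the data, so the 2.89 for water level and the 16.04 for rainfall are not directly comparable with each other.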
Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain
This paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, all located in Portugal, is established using Stochastic Frontier Analysis. This methodology distinguishes measurement error from systematic inefficiency in the estimation process, which makes it possible to investigate the main causes of inefficiency. Several suggestions for improving efficiency are offered for each hotel studied.
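The separation of measurement error from inefficiency is usually expressed through a composed error term. As a hedged sketch, the textbook stochastic frontier model can be written as below; this is not necessarily the specification estimated in the paper, and the symbols $y_i$ (output of unit $i$), $x_i$ (inputs), $\beta$, $v_i$ (symmetric noise) and $u_i$ (one-sided inefficiency) are notation assumed here rather than taken from the source.

```latex
\ln y_i = x_i^{\top}\beta + v_i - u_i,
\qquad v_i \sim \mathcal{N}(0, \sigma_v^2),
\qquad u_i \ge 0,
\qquad \mathrm{TE}_i = \exp(-u_i)
```

Under this form, a unit's technical efficiency $\mathrm{TE}_i$ lies between 0 and 1, and ranking units by it yields the kind of efficiency ranking the abstract describes.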
Identification and prediction of abnormal behaviour activities of daily living in intelligent environments
The aim of this research is to investigate efficient mining of useful information from a sensor network forming an Ambient Intelligence (AmI) environment. In this thesis, we investigate methods for supporting independent living of the elderly (and specifically patients who are suffering from dementia) by equipping their home with a simple sensor network to monitor their behaviour and identify their Activities of Daily Living (ADL). Dementia is considered to be one of the most important causes of disability in the elderly. Most patients would prefer to use non-intrusive technology to help them maintain their independence. Such monitoring and prediction would allow the caregiver to see any trend in the behaviour of the elderly person and to be informed of any abnormal behaviour.