Incremental Market Behavior Classification in Presence of Recurring Concepts
In recent years, the problem of concept drift has gained importance in the financial domain. The succession of manias, panics and crashes has stressed the non-stationary nature of markets and the likelihood of drastic structural or concept changes in them. Traditional systems are unable, or
slow, to adapt to these changes. Ensemble-based systems are widely known for their good results
in predicting both cyclic and non-stationary data such as stock prices. In this work, we propose RCARF
(Recurring Concepts Adaptive Random Forests), an ensemble tree-based online classifier that handles
recurring concepts explicitly. The algorithm extends the capabilities of a version of Random Forest
for evolving data streams, adding on top a mechanism to store and handle a shared collection of
inactive trees, called concept history, which holds memories of the way market operators reacted
in similar circumstances. This works in conjunction with a decision strategy that reacts to drift by
replacing active trees with the best available alternative: either a previously stored tree from the
concept history or a newly trained background tree. Both mechanisms are designed to provide fast
reaction times and are thus applicable to high-frequency data. The experimental validation of the
algorithm is based on the prediction of price movement directions one second ahead in the SPDR
(Standard & Poor's Depositary Receipts) S&P 500 Exchange-Traded Fund. RCARF is benchmarked
against other popular methods from the incremental online machine learning literature and is able to
achieve competitive results.
This research was funded by the Spanish Ministry of Economy and Competitiveness under grant
number ENE2014-56126-C2-2-R.
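The drift-reaction strategy described in the RCARF abstract (replace a drifting tree with the best of a background tree or a stored concept-history tree) can be illustrated with a minimal Python sketch. The abstract does not state how "best" is measured, so this sketch assumes candidates are scored by accuracy on a window of recent labeled instances; `Model`, `EnsembleSlot`, `window_accuracy`, `react_to_drift` and `new_background` are hypothetical names for illustration only, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for this sketch: an instance is (features, label) and a
# "tree" is any object exposing predict(x). Real RCARF members are adaptive
# decision trees; only the replacement decision is modelled here.
Instance = Tuple[Sequence[float], int]


@dataclass(eq=False)  # identity-based equality, so list.remove drops this exact object
class Model:
    name: str
    predict: Callable[[Sequence[float]], int]


@dataclass
class EnsembleSlot:
    active: Model      # tree currently voting in the ensemble
    background: Model  # tree trained in the background since the drift warning


def window_accuracy(model: Model, window: List[Instance]) -> float:
    """Accuracy of a model on a window of recent labeled instances (assumed criterion)."""
    if not window:
        return 0.0
    hits = sum(1 for x, y in window if model.predict(x) == y)
    return hits / len(window)


def react_to_drift(slot: EnsembleSlot, concept_history: List[Model],
                   window: List[Instance],
                   new_background: Callable[[], Model]) -> Model:
    """Swap the drifting active tree for the best available alternative.

    Candidates are the slot's background tree and every tree in the shared
    concept history; ties favour the background tree because max() keeps
    the first maximum. The outdated active tree is archived in the history
    so it can be reused if its concept recurs.
    """
    candidates = [slot.background] + concept_history
    best = max(candidates, key=lambda m: window_accuracy(m, window))
    if best is not slot.background:
        concept_history.remove(best)  # a reused tree is active again, not stored
    concept_history.append(slot.active)
    slot.active = best
    slot.background = new_background()  # start a fresh background learner
    return best
```

For example, if a stored tree classifies the recent window better than the background tree, it is brought back into the ensemble and the outdated tree takes its place in the concept history. Sharing one history across all ensemble members, as the abstract describes, is what lets a tree learned under one market regime be reused when a similar regime recurs.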
Efficient adaptive query processing on large database systems available in the cloud environment
Doctoral thesis in Informatics.
Nowadays, many companies are migrating their applications and data to cloud service
providers, mainly because of the cloud's ability to respond quickly to business requirements.
Performance is therefore an important requirement for most customers who
wish to migrate their applications to the cloud.
In cloud environments, resources should be acquired and released
automatically and quickly at runtime. Moreover, users and service providers expect
responses in time to meet the service SLA (Service Level Agreement).
Consequently, ensuring QoS (Quality of Service) is a major challenge, and one that
grows when large amounts of data must be manipulated in this environment.
To address these problems, several studies have focused on reducing
execution time using adaptive query processing and/or resource prediction based
on the current system status. However, they present important limitations. For example,
most of these works do not use monitoring during query execution and/or present
intrusive solutions, i.e., solutions tied to a particular context.
The aim of this thesis is to develop new solutions/strategies for efficient
adaptive query processing on large databases available in a cloud environment. The solution
will integrate adaptive re-optimization at query runtime, with costs based on the
SRT (Service Response Time, an SLA QoS performance parameter). The proposed
solution will be evaluated at large scale, with large volumes of data, machines and
queries, in a real cloud computing infrastructure.
Finally, this work also proposes a new model to estimate the SRT for different request
types (database access requests). This model will allow the cloud service provider and
its customers to establish an appropriate SLA relative to the expected performance of
the services available in the cloud.
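The abstract names two mechanisms without specifying them: runtime monitoring that triggers re-optimization based on the SRT, and a model that estimates the SRT per request type to support SLA negotiation. The following minimal Python sketch shows one way such pieces could fit together. It assumes an exponentially weighted moving average (EWMA) as the estimator and a linear projection of the final SRT from query progress; `SRTEstimator`, `suggest_sla`, `should_reoptimize`, `alpha` and `margin` are hypothetical and are not the thesis's model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SRTEstimator:
    """Hypothetical per-request-type SRT estimator using an exponentially
    weighted moving average (EWMA) of observed response times."""
    alpha: float = 0.2  # smoothing factor (assumed value)
    estimates: Dict[str, float] = field(default_factory=dict)

    def observe(self, request_type: str, srt_seconds: float) -> float:
        """Fold one measured SRT into the estimate for its request type."""
        prev = self.estimates.get(request_type)
        if prev is None:
            new = srt_seconds  # first observation seeds the estimate
        else:
            new = self.alpha * srt_seconds + (1.0 - self.alpha) * prev
        self.estimates[request_type] = new
        return new

    def suggest_sla(self, request_type: str, margin: float = 1.5) -> float:
        """Propose an SLA response-time target: estimate times a safety margin."""
        return self.estimates[request_type] * margin


def should_reoptimize(elapsed_s: float, fraction_done: float, sla_srt_s: float) -> bool:
    """Runtime monitoring check: linearly project the final SRT from the
    progress made so far and flag re-optimization if it would breach the SLA."""
    if fraction_done <= 0.0:
        return False  # no progress information yet
    projected = elapsed_s / fraction_done
    return projected > sla_srt_s
```

For instance, a query that is halfway done after 3 s projects to about 6 s, so with a 5 s SLA target `should_reoptimize(3.0, 0.5, 5.0)` returns `True`. A linear projection is the simplest possible choice; a real cost model would account for operators whose cost is not spread evenly over the query's lifetime.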
Adaptive Algorithms For Classification On High-Frequency Data Streams: Application To Finance
International Doctorate Mention (Mención Internacional en el título de doctor).
In recent years, the problem of concept drift has gained importance in the financial
domain. The succession of manias, panics and crashes has stressed the non-stationary
nature and the likelihood of drastic structural changes in financial markets.
The most recent literature suggests using conventional machine learning and statistical
approaches for this problem. However, these techniques are unable, or slow, to adapt
to non-stationarities and may require retraining over time, which is computationally
expensive and brings financial risks.
This thesis proposes a set of adaptive algorithms to deal with high-frequency data
streams and applies these to the financial domain. We present approaches to handle
different types of concept drifts and perform predictions using up-to-date models.
These mechanisms are designed to provide fast reaction times and are thus applicable
to high-frequency data. The core experiments of this thesis are based on the prediction
of the price movement direction at different intraday resolutions in the SPDR S&P 500
exchange-traded fund. The proposed algorithms are benchmarked against other popular
methods from the data stream mining literature and achieve competitive results.
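To make "prediction of the price movement direction at different intraday resolutions" concrete, the following minimal Python sketch labels a price series by direction over a horizon of `horizon` steps and scores an incremental learner with prequential (test-then-train) evaluation, a standard protocol in data stream mining. Several things here are assumptions, not the thesis's setup. Unchanged prices are mapped to the "down/flat" class 0. The lag-1 direction feature and the `MajorityClass` baseline are illustrative. The names `direction_labels`, `lagged_stream`, `predict_one`, `learn_one` and `prequential_accuracy` are hypothetical.

```python
from typing import List, Sequence, Tuple


def direction_labels(prices: Sequence[float], horizon: int = 1) -> List[int]:
    """Label step t as 1 if the price rises `horizon` steps ahead, else 0.

    Assumption: an unchanged price counts as 0 (down/flat).
    """
    return [1 if prices[t + horizon] > prices[t] else 0
            for t in range(len(prices) - horizon)]


def lagged_stream(labels: List[int]) -> List[Tuple[Tuple[int], int]]:
    """Illustrative features: the previous direction predicts the next one."""
    return [((labels[t - 1],), labels[t]) for t in range(1, len(labels))]


class MajorityClass:
    """Trivial incremental baseline: predicts the most frequent label so far."""

    def __init__(self) -> None:
        self.counts = {0: 0, 1: 0}

    def predict_one(self, x) -> int:
        return 1 if self.counts[1] > self.counts[0] else 0

    def learn_one(self, x, y: int) -> None:
        self.counts[y] += 1


def prequential_accuracy(model, stream) -> float:
    """Test-then-train: each instance is predicted before the model sees its label."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict_one(x) == y)  # test first
        model.learn_one(x, y)                      # then train
        total += 1
    return correct / total if total else 0.0
```

With one-second bars, `horizon=1` corresponds to the one-second-ahead task of the RCARF paper. Resampling the prices to coarser bars before labeling gives the other intraday resolutions. Because every prediction is made before its label is revealed, prequential accuracy never scores a model on data it has already trained on.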
We believe that this thesis opens promising research prospects for financial forecasting
during market instability and structural breaks. Results show that the proposed
methods can improve prediction accuracy in many of these scenarios. Indeed, the
results obtained are consistent with arguments against the efficient market hypothesis.
However, we cannot claim to consistently beat a buy-and-hold strategy; therefore, we
cannot reject the hypothesis.
Doctoral Programme in Computer Science and Technology (Programa de Doctorado en Ciencia y Tecnología Informática), Universidad Carlos III de Madrid. Thesis committee: Chair: Gustavo Recio Isasi; Secretary: Pedro Isasi Viñuela; Member: Sandra García Rodrígue