4 research outputs found

    Incremental Market Behavior Classification in Presence of Recurring Concepts

    In recent years, the problem of concept drift has gained importance in the financial domain. The succession of manias, panics and crashes has stressed the non-stationary nature of markets and the likelihood of drastic structural or concept changes in them. Traditional systems are unable or slow to adapt to these changes. Ensemble-based systems are widely known for their good results in predicting both cyclic and non-stationary data such as stock prices. In this work, we propose RCARF (Recurring Concepts Adaptive Random Forests), an ensemble tree-based online classifier that handles recurring concepts explicitly. The algorithm extends a version of Random Forest for evolving data streams by adding a mechanism to store and handle a shared collection of inactive trees, called the concept history, which holds memories of the way market operators reacted in similar circumstances. This works in conjunction with a decision strategy that reacts to drift by replacing active trees with the best available alternative: either a previously stored tree from the concept history or a newly trained background tree. Both mechanisms are designed to provide fast reaction times and are thus applicable to high-frequency data. The experimental validation of the algorithm is based on the prediction of price movement directions one second ahead in the SPDR (Standard & Poor's Depositary Receipts) S&P 500 Exchange-Traded Fund. RCARF is benchmarked against other popular methods from the incremental online machine learning literature and achieves competitive results. This research was funded by the Spanish Ministry of Economy and Competitiveness under grant number ENE2014-56126-C2-2-R.
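
The drift reaction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` and `ConceptHistory` classes, the `recent_error` score, the tie-break in favour of stored trees, and the archiving of the replaced tree are all assumptions made for illustration. The abstract states only that an active tree is replaced by the best available alternative, taken either from the concept history or from a background tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Candidate:
    """A tree plus a recent error estimate (hypothetical structure)."""
    name: str
    recent_error: float  # assumed: error rate on a recent window, in [0, 1]


@dataclass
class ConceptHistory:
    """Shared collection of inactive trees kept for possible reuse."""
    trees: List[Candidate] = field(default_factory=list)

    def best(self) -> Optional[Candidate]:
        # Lowest recent error wins; returns None when the history is empty.
        return min(self.trees, key=lambda t: t.recent_error, default=None)


def react_to_drift(active: Candidate, background: Candidate,
                   history: ConceptHistory) -> Candidate:
    """Return the replacement for `active` after a drift signal.

    Assumptions: ties favour the stored tree, and the replaced tree is
    archived in the concept history so its concept can recur later.
    """
    stored = history.best()  # looked up before archiving `active`
    if stored is not None and stored.recent_error <= background.recent_error:
        history.trees.remove(stored)  # the stored tree becomes active again
        replacement = stored
    else:
        replacement = background
    history.trees.append(active)
    return replacement


history = ConceptHistory([Candidate("stored_a", 0.30), Candidate("stored_b", 0.45)])
chosen = react_to_drift(Candidate("active", 0.60), Candidate("background", 0.35), history)
print(chosen.name)  # prints "stored_a"
```

In this toy run the stored tree wins because its assumed recent error (0.30) is lower than that of the background tree (0.35). With an empty history, the background tree would be chosen instead.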

    Efficient adaptive query processing on large database systems available in the cloud environment

    Doctoral thesis in Informatics (Tese de Doutoramento em Informática). Nowadays, many companies are migrating their applications and data to cloud service providers, mainly because of the providers' ability to respond quickly to business requirements. Performance is therefore an important requirement for most customers who wish to migrate their applications to the cloud. In cloud environments, resources should be acquired and released automatically and quickly at runtime. Moreover, users and service providers expect answers in time to meet the service SLA (Service Level Agreement). Consequently, ensuring QoS (Quality of Service) is a great challenge, and it grows when large amounts of data must be manipulated in this environment. To address this kind of problem, several research efforts have focused on shortening execution time using adaptive query processing and/or predicting resources based on the current system status. However, they present important limitations. For example, most of these works do not use monitoring during query execution and/or present intrusive solutions, i.e. solutions applied to a particular context. The aim of this thesis is to develop new solutions/strategies for efficient adaptive query processing on large databases available in a cloud environment. The solution must integrate adaptive re-optimization at query runtime, with costs based on the SRT (Service Response Time, an SLA QoS performance parameter). The proposed solution will be evaluated at large scale, with a large volume of data, machines and queries, in a cloud computing infrastructure. Finally, this work also proposes a new model to estimate the SRT for different request types (database access requests). This model will allow the cloud service provider and its customers to establish an appropriate SLA relative to the expected performance of the services available in the cloud.

    Adaptive Algorithms For Classification On High-Frequency Data Streams: Application To Finance

    International Mention in the doctoral degree (Mención Internacional en el título de doctor). In recent years, the problem of concept drift has gained importance in the financial domain. The succession of manias, panics and crashes has stressed the non-stationary nature of financial markets and the likelihood of drastic structural changes in them. The most recent literature suggests the use of conventional machine learning and statistical approaches to deal with this problem. However, these techniques are unable or slow to adapt to non-stationarities and may require re-training over time, which is computationally expensive and brings financial risks. This thesis proposes a set of adaptive algorithms to deal with high-frequency data streams and applies these to the financial domain. We present approaches to handle different types of concept drift and perform predictions using up-to-date models. These mechanisms are designed to provide fast reaction times and are thus applicable to high-frequency data. The core experiments of this thesis are based on the prediction of the price movement direction at different intraday resolutions in the SPDR S&P 500 exchange-traded fund. The proposed algorithms are benchmarked against other popular methods from the data stream mining literature and achieve competitive results. We believe that this thesis opens good research prospects for financial forecasting during market instability and structural breaks. Results have shown that our proposed methods can improve prediction accuracy in many of these scenarios. Indeed, the results obtained are compatible with ideas against the efficient market hypothesis. However, we cannot claim that we can consistently beat buy and hold; therefore, we cannot reject the hypothesis. Doctoral Programme in Computer Science and Technology (Programa de Doctorado en Ciencia y Tecnología Informática), Universidad Carlos III de Madrid. Chair: Gustavo Recio Isasi. Secretary: Pedro Isasi Viñuela. Member: Sandra García Rodrígue