3 research outputs found

    Recomendação de algoritmos em fluxos de dados com mudança de conceito

    Get PDF
    Trabalho de conclusão de curso (graduação)—Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2020.Muitas companhias vêm tirando proveito de mineração de dados para identificar infor- mações valiosas em conjuntos de dados massivos gerados em alta frequência, o chamado Big Data. Técnicas de Aprendizado de Máquina podem ser aplicadas para descoberta de informação, visto que podem extrair padrões dos dados para induzir modelos que preverão eventos futuros. Entretanto, ambientes dinâmicos e progressivos comumente geram fluxos de dados não estacionários. Logo, modelos treinados nesse cenário costumam perecer com o tempo pela sazonalidade ou mudança de conceito. O retreinamento periódico pode aju- dar, mas um espaço de hipóteses fixo pode não ser o mais apropriado ao fenômeno. Uma solução alternativa é usar meta-aprendizado para uma contínua seleção de algoritmos em ambientes que mudam com o tempo, escolhendo o viés que melhor condiz com os dados atuais. Nesse trabalho, apresentamos um framework aprimorado para seleção de algorit- mos em fluxos de dados baseado no MetaStream. Nossa abordagem usa meta-aprendizado e aprendizado incremental para ativamente selecionar o melhor algoritmo para o presente conceito em ambientes que mudam com o tempo. Ao contrário de trabalhos prévios, nós usamos uma coleção diversificada de meta-atributos estado-da-arte e uma abordagem de aprendizado incremental para o nível meta baseada no algoritmo LightGBM. Os resul- tados mostram que essa nova estratégia pode aprimorar a acurácia de recomendação do melhor algoritmo em dados que mudam com o tempo.In the last decades, many companies have had a growing interest in the “digital oil”, also called Big Data. Data mining has been applied in these massive volumes of data to obtain valuable information for clients and industries worldwide. Machine Learning, a prominent technique for data mining, can be used to extract patterns from data and induce models to predict future events. Still, complex environments that are constantly evolving usually generate non-stationary data streams. Thus, these models may perish in this scenario due to concept drift. Retraining periodically can help, but the algorithm bias may no longer be appropriate. A response to this is to use meta-learning for regular algorithm selection in time-changing environments, choosing the hypothesis space that best suits the current data. In this work, we enhanced MetaStream, a framework for data stream algorithm selection, though a rich set of state-of-the-art meta-features, and an incremental learning approach in the meta-level based on LightGBM, combining this to actively select the best algorithm for the current concept in a time-changing environment. The results show that this new strategy can improve the recommendation accuracy of the best algorithm in time-changing data

    A Review of Meta-level Learning in the Context of Multi-component, Multi-level Evolving Prediction Systems.

    Get PDF
    The exponential growth of volume, variety and velocity of data is raising the need for investigations of automated or semi-automated ways to extract useful patterns from the data. It requires deep expert knowledge and extensive computational resources to find the most appropriate mapping of learning methods for a given problem. It becomes a challenge in the presence of numerous configurations of learning algorithms on massive amounts of data. So there is a need for an intelligent recommendation engine that can advise what is the best learning algorithm for a dataset. The techniques that are commonly used by experts are based on a trial and error approach evaluating and comparing a number of possible solutions against each other, using their prior experience on a specific domain, etc. The trial and error approach combined with the expert’s prior knowledge, though computationally and time expensive, have been often shown to work for stationary problems where the processing is usually performed off-line. However, this approach would not normally be feasible to apply on non-stationary problems where streams of data are continuously arriving. Furthermore, in a non-stationary environment the manual analysis of data and testing of various methods every time when there is a change in the underlying data distribution would be very difficult or simply infeasible. In that scenario and within an on-line predictive system, there are several tasks where Meta-learning can be used to effectively facilitate best recommendations including: 1) pre processing steps, 2) learning algorithms or their combination, 3) adaptivity mechanisms and their parameters, 4) recurring concept extraction, and 5) concept drift detection. However, while conceptually very attractive and promising, the Meta-learning leads to several challenges with the appropriate representation of the problem at a meta-level being one of the key ones. The goal of this review and our research is, therefore, to investigate Meta learning in general and the associated challenges in the context of automating the building, deployment and adaptation of multi-level and multi-component predictive system that evolve over time
    corecore