    Generating What-If Scenarios for Time Series Data

    Time series data has become a ubiquitous and important data source in many application domains. Most companies and organizations rely heavily on this data for critical tasks such as decision-making, planning, prediction, and analytics in general. While these tasks generally focus on actual data representing organizational and business processes, it is also desirable to apply them to alternative scenarios in order to prepare for developments that diverge from expectations or to assess the robustness of current strategies. When it comes to constructing such what-if scenarios, existing tools either focus on scalar data or address highly specific scenarios. In this work, we propose a generally applicable and easy-to-use method for generating what-if scenarios on time series data. Our approach extracts descriptive features of a data set and allows the construction of an alternate version by filtering and modifying these features.
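
The extract-modify-reconstruct idea described above can be illustrated with a minimal sketch. It is not the authors' method: the chosen features (a least-squares linear trend plus residuals) and the names `extract_features`, `build_scenario`, and `slope_factor` are assumptions made purely for illustration, and the observed values are made up.

```python
from statistics import mean


def extract_features(series):
    """Describe a series by a least-squares linear trend plus residuals.

    This feature set is an illustrative assumption, not the paper's.
    """
    xs = range(len(series))
    x_bar, y_bar = mean(xs), mean(series)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, series)]
    return {"intercept": intercept, "slope": slope, "residuals": residuals}


def build_scenario(features, slope_factor=1.0):
    """Rebuild a series from its features, optionally scaling the trend slope."""
    slope = features["slope"] * slope_factor
    return [
        features["intercept"] + slope * x + r
        for x, r in enumerate(features["residuals"])
    ]


# Hypothetical observations, e.g. monthly demand.
observed = [10.0, 12.5, 13.0, 16.5, 18.0, 19.5]
features = extract_features(observed)

baseline = build_scenario(features)                   # reproduces the observed data
what_if = build_scenario(features, slope_factor=2.0)  # "growth is twice as fast"
```

With `slope_factor=1.0` the reconstruction returns the original series, so any difference in `what_if` comes only from the modified feature.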

    Forecasting large collections of time series: feature-based methods

    In economics and many other forecasting domains, real-world problems are too complex for a single model that assumes a specific data-generating process. The forecasting performance of different methods changes depending on the nature of the time series. For forecasting large collections of time series, two lines of approaches based on time series features have been developed, namely feature-based model selection and feature-based model combination. This chapter discusses the state-of-the-art feature-based methods, with reference to open-source software implementations.

    Anomaly Detection in Cloud-Native systems

    In recent years, microservices have gained popularity due to benefits such as increased maintainability and scalability of the system. The microservice architectural pattern was adopted for the development of a large-scale system that is commonly deployed on public and private clouds; the aim is therefore to ensure that it always maintains an optimal level of performance. Consequently, the system is monitored by collecting different metrics, including performance-related metrics. The first part of this thesis focuses on the creation of a dataset of realistic time series with anomalies at deterministic locations. This dataset addresses the lack of labeled data for training supervised models and the absence of publicly available data, since such data are not usually shared due to privacy concerns. The second part consists of an empirical study on the detection of anomalies occurring in the different services that compose the system. Specifically, the aim is to understand whether it is possible to predict anomalies in order to act before system failures or performance degradation occur. To this end, eight different classification-based Machine Learning algorithms were compared in terms of accuracy, training time, and testing time, to determine which technique might be most suitable for reducing system overload. The results showed that there are strong correlations between metrics and that it is possible to predict the anomalies in the system with approximately 90% accuracy. The most important outcome is that performance-related anomalies can be detected by monitoring a limited number of metrics collected at runtime, with a short training time. Future work includes the adoption of prediction-based approaches and the development of tools for the prediction of anomalies in cloud-native environments.
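
The comparison protocol described above (accuracy, training time, and testing time per classifier) can be sketched as a small harness. This is an illustration only: the two toy classifiers, the `evaluate` helper, and the metric values (CPU utilization, latency in ms) are assumptions and do not correspond to the eight algorithms or the dataset used in the thesis.

```python
import time
from statistics import mean


class MajorityClass:
    """Baseline that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Assigns each sample to the class whose mean vector is closest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids, key=lambda lab: sq_dist(x, self.centroids[lab]))
            for x in X
        ]


def evaluate(model, X_train, y_train, X_test, y_test):
    """Return accuracy plus wall-clock training and testing time in seconds."""
    t0 = time.perf_counter()
    model.fit(X_train, y_train)
    t1 = time.perf_counter()
    predictions = model.predict(X_test)
    t2 = time.perf_counter()
    correct = sum(p == t for p, t in zip(predictions, y_test))
    return {"accuracy": correct / len(y_test), "train_s": t1 - t0, "test_s": t2 - t1}


# Hypothetical samples: [CPU utilization, latency in ms]; label 1 marks an anomaly.
X_train = [[0.20, 110], [0.25, 120], [0.30, 115], [0.22, 105], [0.90, 480], [0.95, 510]]
y_train = [0, 0, 0, 0, 1, 1]
X_test = [[0.28, 118], [0.92, 495], [0.21, 108], [0.88, 470]]
y_test = [0, 1, 0, 1]

results = {
    type(model).__name__: evaluate(model, X_train, y_train, X_test, y_test)
    for model in (MajorityClass(), NearestCentroid())
}
```

Timing fit and predict separately mirrors the three quantities reported in the study; on real data the timings would be averaged over repeated runs.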

    Feature-based Time Series Analytics

    Time series analytics is a fundamental prerequisite for decision-making as well as automation and occurs in several applications such as energy load control, weather research, and consumer behavior analysis. It encompasses time series engineering, i.e., the representation of time series exhibiting important characteristics, and data mining, i.e., the application of the representation to a specific task. Due to the exhaustive data gathering that results from the "Industry 4.0" vision and its shift towards automation and digitalization, time series analytics is undergoing a revolution. Big datasets with very long time series are gathered, which is challenging for engineering techniques. Traditionally, one focus has been on raw-data-based or shape-based engineering. These techniques assess the time series' similarity in shape, which is only suitable for short time series. Another focus has been on model-based engineering. It assesses the time series' similarity in structure, which is suitable for long time series but requires larger models or time-consuming modeling. Feature-based engineering tackles these challenges by efficiently representing time series and comparing their similarity in structure. However, current feature-based techniques are unsatisfactory as they are designed for specific data-mining tasks. In this work, we introduce a novel feature-based engineering technique. It efficiently provides a short representation of time series, focusing on their structural similarity. Based on a design rationale, we derive important time series characteristics such as the long-term and cyclically repeated characteristics as well as distribution and correlation characteristics. Moreover, we define a feature-based distance measure for their comparison. Both the representation technique and the distance measure provide desirable properties regarding storage and runtime.
    Subsequently, we introduce techniques based on our feature-based engineering and apply them to important data-mining tasks such as time series generation, time series matching, time series classification, and time series clustering. First, our feature-based generation technique outperforms state-of-the-art techniques regarding the accuracy of evolved datasets. Second, with our features, a matching method retrieves a match for a time series query much faster than with current representations. Third, our features provide discriminative characteristics to classify datasets as accurately as state-of-the-art techniques, but orders of magnitude faster. Finally, our features recommend an appropriate clustering of time series, which is crucial for subsequent data-mining tasks. All these techniques are assessed on datasets from the energy, weather, and economic domains and thus demonstrate the applicability to real-world use cases. The findings demonstrate the versatility of our feature-based engineering and suggest several courses of action for designing and improving analytical systems for the paradigm shift of Industry 4.0.
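
To make the idea of a short feature-based representation with an associated distance concrete, here is a minimal sketch. The four features (mean, standard deviation, linear trend slope, lag-1 autocorrelation) and the plain Euclidean distance are assumptions chosen for illustration; they are not the feature set or the distance measure defined in the thesis.

```python
from math import dist
from statistics import mean, pstdev


def linear_slope(series):
    """Least-squares slope over the index, a stand-in for a long-term feature."""
    xs = range(len(series))
    x_bar, y_bar = mean(xs), mean(series)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, series)) / sxx


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation, a stand-in for a correlation feature."""
    m = mean(series)
    denom = sum((y - m) ** 2 for y in series)
    if denom == 0:
        return 0.0
    return sum((a - m) * (b - m) for a, b in zip(series, series[1:])) / denom


def features(series):
    """Fixed-length representation, independent of the series length."""
    return (
        mean(series),
        pstdev(series),
        linear_slope(series),
        lag1_autocorrelation(series),
    )


def feature_distance(a, b):
    """Euclidean distance between feature vectors (unscaled, for simplicity)."""
    return dist(features(a), features(b))


rising = [1, 2, 3, 4, 5, 6, 7, 8]
falling = [8, 7, 6, 5, 4, 3, 2, 1]
long_rising = list(range(1, 21))  # different length, same kind of structure
```

Because every series maps to four numbers, comparing two series costs the same regardless of their length, and series of different lengths such as `rising` and `long_rising` remain comparable. In practice the features would be normalized before computing the distance so that no single feature dominates.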

    Locating delivery lockers for urban logistics in a medium-sized Brazilian city: the case of Divinópolis, Minas Gerais

    With the advancement of technology and the spread of the internet around the world, transactions carried out through electronic commerce have expanded strongly in all areas. Consequently, the flow of logistical operations responsible for delivering products purchased in this way has grown, giving rise to problems related to the transport of these goods. Among the transport problems arising in the last stage of the distribution chain, known as the last mile, we can highlight failed deliveries, excessive trips, high operating costs, and poorly sized transport resources. Research in the area therefore evaluates ways to alleviate these difficulties in cargo distribution so that logistical operations become more efficient and offer a good level of service to their users. One alternative that has been used in different regions of the world for this purpose is collection and delivery points, which are stations where customers pick up products purchased over the internet on their own. Collection and delivery points (PCEs) may be automated, in which case they are known as Delivery Lockers (DLs), or not. The literature indicates that one of the difficulties in installing DLs is determining which location will best serve consumers in each specific region. This study therefore aimed to propose suitable locations for the installation of DLs in a medium-sized city in Minas Gerais, and to analyze which factors most influence the use of these devices according to local consumers. Opening hours, distance from central regions, and the safety of these operations were found to be the main factors mentioned by consumers. A multicriteria mathematical model based on the AHP method was developed to help choose the establishments that would best meet the evaluated criteria.
    It was concluded that several scenarios can satisfy the problem; however, the model preferred those whose alternatives are located in central regions or in small commercial centers. Establishments that operate outside conventional hours, such as supermarkets and hypermarkets, stood out positively in the analyzed scenarios, unlike bank branches, which did not perform well due to their limited space and hours. Finally, the use of DLs is also valid for medium-sized cities; for this to work effectively, however, e-commerce users should be aware of the benefits that this practice can bring, in both financial and operational terms.
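
As an illustration of how an AHP-based model turns pairwise judgments into criterion weights, the following sketch uses the three factors named in the abstract. The pairwise comparison values are hypothetical and are not taken from the study; the weights are computed with the row geometric mean approximation rather than the principal eigenvector, and the random index values are Saaty's standard ones.

```python
from math import prod

CRITERIA = ["opening hours", "distance from center", "safety"]

# Hypothetical judgments on Saaty's 1-9 scale: entry [i][j] states how much
# more important criterion i is than criterion j (the matrix is reciprocal).
PAIRWISE = [
    [1.0, 2.0, 3.0],
    [1 / 2, 1.0, 2.0],
    [1 / 3, 1 / 2, 1.0],
]

# Saaty's random consistency index for matrices of size n.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def priority_weights(matrix):
    """Approximate AHP priorities via normalized row geometric means."""
    n = len(matrix)
    geo_means = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Consistency ratio CR = CI / RI; values below 0.1 are usually accepted."""
    n = len(matrix)
    weighted = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(v / w for v, w in zip(weighted, weights)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


weights = priority_weights(PAIRWISE)
cr = consistency_ratio(PAIRWISE, weights)

for name, weight in zip(CRITERIA, weights):
    print(f"{name}: {weight:.3f}")
print(f"consistency ratio: {cr:.3f}")
```

Candidate establishments would then be scored by multiplying their per-criterion ratings by these weights and summing, which is how a preference for, say, supermarkets over bank branches could emerge from the model.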