Generating What-If Scenarios for Time Series Data
Time series data has become a ubiquitous and important data source in many application domains. Most companies and organizations rely heavily on this data for critical tasks such as decision-making, planning, prediction, and analytics in general. While these tasks generally focus on actual data representing organizational and business processes, it is also desirable to apply them to alternative scenarios in order to prepare for developments that diverge from expectations or to assess the robustness of current strategies. When it comes to constructing such what-if scenarios, existing tools either focus on scalar data or address only highly specific scenarios. In this work, we propose a generally applicable and easy-to-use method for generating what-if scenarios on time series data. Our approach extracts descriptive features of a data set and allows the construction of an alternate version by filtering and modifying these features.
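The abstract does not say which features are extracted or how they are modified. As a rough illustration of the general idea only, the sketch below assumes a simple additive decomposition (least-squares linear trend, a per-phase seasonal profile with an assumed period, and a residual), scales the trend slope and seasonal amplitude, and recombines the parts. The functions `decompose` and `what_if` and their parameters are hypothetical, not the authors' method.

```python
from statistics import mean


def decompose(series, period):
    """Hypothetical additive split into linear trend, seasonal profile, residual.

    Assumes len(series) >= max(2, period).
    """
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = mean(series)
    # Ordinary least-squares slope and intercept of y against t = 0..n-1
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = y_mean - slope * t_mean
    detrended = [y - (intercept + slope * t) for t, y in enumerate(series)]
    # Seasonal profile: mean detrended value at each phase of the assumed cycle
    profile = [mean(detrended[phase::period]) for phase in range(period)]
    residual = [d - profile[t % period] for t, d in enumerate(detrended)]
    return {"intercept": intercept, "slope": slope,
            "profile": profile, "residual": residual}


def what_if(series, period, slope_factor=1.0, season_factor=1.0):
    """Rebuild the series after scaling its trend slope and seasonal amplitude."""
    f = decompose(series, period)
    return [
        f["intercept"]
        + slope_factor * f["slope"] * t
        + season_factor * f["profile"][t % period]
        + f["residual"][t]
        for t in range(len(series))
    ]
```

With both factors left at 1.0 the original series is reproduced, so every scenario is a controlled deviation from the observed data; for example, `what_if(sales, 12, slope_factor=0.5)` would halve the growth rate of a hypothetical monthly `sales` series while keeping its seasonality and noise.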
Forecasting large collections of time series: feature-based methods
In economics and many other forecasting domains, the real-world problems are
too complex for a single model that assumes a specific data generation process.
The forecasting performance of different methods changes depending on the
nature of the time series. When forecasting large collections of time series,
two lines of approach have been developed using time series features, namely
feature-based model selection and feature-based model combination. This chapter
discusses the state-of-the-art feature-based methods, with reference to
open-source software implementations.
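To make "feature-based model selection" concrete, here is a minimal sketch under stated assumptions: each reference series is labelled with whichever of two toy forecasters (naive and mean) has the lower holdout error, series are described by a hypothetical two-element feature vector (autocorrelation at lags 1 and 2), and a new series receives the method of its nearest reference series in feature space. Practical systems typically use many more features and a trained classifier; the 1-nearest-neighbour rule and all names here are illustrative choices, not the chapter's method.

```python
from statistics import mean


def naive(history, h):
    # Repeat the last observed value h times
    return [history[-1]] * h


def average(history, h):
    # Repeat the historical mean h times
    return [mean(history)] * h


METHODS = {"naive": naive, "mean": average}


def mae(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def best_method(series, h):
    # Label a reference series with the method that wins on its last h points
    train, test = series[:-h], series[-h:]
    return min(METHODS, key=lambda name: mae(test, METHODS[name](train, h)))


def autocorr(x, lag):
    m = mean(x)
    den = sum((v - m) ** 2 for v in x)
    if den == 0:
        return 0.0
    return sum((x[i] - m) * (x[i - lag] - m) for i in range(lag, len(x))) / den


def feature_vector(x):
    # Hypothetical feature set: autocorrelation at lags 1 and 2
    return (autocorr(x, 1), autocorr(x, 2))


def select_method(new_series, reference, h):
    """Pick a forecaster for new_series via its nearest reference series."""
    labelled = [(feature_vector(s), best_method(s, h)) for s in reference]
    query = feature_vector(new_series)

    def sq_dist(v):
        return sum((a - b) ** 2 for a, b in zip(query, v))

    _, method = min(labelled, key=lambda item: sq_dist(item[0]))
    return method
```

Feature-based model combination would instead map the same features to weights over all methods and average their forecasts.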
Anomaly Detection in Cloud-Native Systems
In recent years, microservices have gained popularity due to benefits such as increased maintainability and scalability. The microservice architectural pattern was adopted to develop a large-scale system that is commonly deployed on public and private clouds, so the aim is to ensure that it always maintains an optimal level of performance. To this end, the system is monitored by collecting various metrics, including performance-related ones.
The first part of this thesis focuses on the creation of a dataset of realistic time series with anomalies at deterministic locations. This dataset addresses the lack of labeled data for training supervised models and the absence of publicly available data, since such data are usually not shared due to privacy concerns.
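The thesis does not describe its generator here, so the following is only a sketch of the general idea of placing anomalies at deterministic locations: a hypothetical metric (a sinusoidal daily-like load with seeded Gaussian noise) receives an additive spike at caller-chosen indices, and a 0/1 label is emitted for every point. The function name, the baseline shape, the spike size, and the seed are all assumptions.

```python
import math
import random


def make_labeled_series(n, anomaly_positions, period=24, spike=5.0, seed=0):
    """Hypothetical generator: seasonal metric with spikes at known indices."""
    rng = random.Random(seed)  # seeded, so the output is fully reproducible
    anomalies = set(anomaly_positions)
    values, labels = [], []
    for t in range(n):
        # Baseline load: periodic component plus Gaussian noise
        v = 10.0 + 3.0 * math.sin(2 * math.pi * t / period) + rng.gauss(0.0, 0.5)
        if t in anomalies:
            v += spike  # inject the anomaly at a deterministic location
        values.append(v)
        labels.append(1 if t in anomalies else 0)
    return values, labels
```

Because the anomaly indices are chosen by the caller and the noise is seeded, the labels are exact, which is what makes such data usable for training and evaluating supervised detectors.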
The second part consists of an empirical study on the detection of anomalies occurring in the different services that compose the system. Specifically, the aim is to understand whether anomalies can be predicted so that actions can be taken before system failures or performance degradation occur. To this end, eight different classification-based machine learning algorithms were compared in terms of accuracy, training time, and testing time to determine which technique might be most suitable for reducing system overload.
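The abstract does not name the eight algorithms, so the harness below is a hedged sketch of the comparison protocol only: it times `fit` and `predict` separately and computes accuracy for any object exposing those two methods. The two classifiers (`MajorityClass`, `NearestCentroid`) are hypothetical stand-ins chosen to keep the example dependency-free; estimators from a library such as scikit-learn, which expose the same `fit`/`predict` interface, could be passed in instead.

```python
import time
from collections import Counter


class MajorityClass:
    """Toy baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Toy classifier: assigns each point to the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        self.centroids = {lab: [s / counts[lab] for s in acc]
                          for lab, acc in sums.items()}
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: dist2(x, self.centroids[c]))
                for x in X]


def compare(models, X_train, y_train, X_test, y_test):
    """Return accuracy, training time and testing time (seconds) per model."""
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        train_time = time.perf_counter() - start
        start = time.perf_counter()
        predicted = model.predict(X_test)
        test_time = time.perf_counter() - start
        accuracy = sum(p == t for p, t in zip(predicted, y_test)) / len(y_test)
        results[name] = {"accuracy": accuracy,
                         "train_s": train_time, "test_s": test_time}
    return results
```

Timing training and prediction separately matters here because a detector used at runtime is constrained mostly by prediction latency, while retraining cost determines how often it can adapt.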
The results showed that there are strong correlations between metrics and that anomalies in the system can be predicted with approximately 90% accuracy. The most important outcome is that performance-related anomalies can be detected by monitoring a limited number of metrics collected at runtime, with a short training time. Future work includes the adoption of prediction-based approaches and the development of tools for predicting anomalies in cloud-native environments.
Feature-based Time Series Analytics
Time series analytics is a fundamental prerequisite for decision-making as well as automation and occurs in several applications such as energy load control, weather research, and consumer behavior analysis. It encompasses time series engineering, i.e., the representation of time series exhibiting important characteristics, and data mining, i.e., the application of the representation to a specific task. Due to the extensive data collection resulting from the "Industry 4.0" vision and its shift towards automation and digitalization, time series analytics is undergoing a revolution. Large datasets with very long time series are being collected, which challenges existing engineering techniques. Traditionally, one focus has been on raw-data-based or shape-based engineering. These techniques assess the similarity of time series in shape, which is only suitable for short time series. Another focus has been on model-based engineering. It assesses the similarity of time series in structure, which is suitable for long time series but requires large models or time-consuming modeling. Feature-based engineering tackles these challenges by efficiently representing time series and comparing their similarity in structure. However, current feature-based techniques are unsatisfactory because each is designed for a specific data-mining task.
In this work, we introduce a novel feature-based engineering technique. It efficiently provides a short representation of time series, focusing on their structural similarity. Based on a design rationale, we derive important time series characteristics such as the long-term and cyclically repeated characteristics as well as distribution and correlation characteristics. Moreover, we define a feature-based distance measure for their comparison. Both the representation technique and the distance measure provide desirable properties regarding storage and runtime.
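The abstract names the characteristic groups but not the concrete features or the distance measure. As a rough illustration only, the sketch below maps a series of any length to a fixed-length vector with one hypothetical feature per group (linear trend slope for long-term behaviour, autocorrelation at an assumed seasonal lag for cyclically repeated behaviour, mean and standard deviation for the distribution, lag-1 autocorrelation for correlation) and compares vectors with Euclidean distance. The feature choices and function names are assumptions, not the authors' technique.

```python
import math
from statistics import mean, pstdev


def autocorr(x, lag):
    m = mean(x)
    den = sum((v - m) ** 2 for v in x)
    if den == 0:
        return 0.0
    return sum((x[i] - m) * (x[i - lag] - m) for i in range(lag, len(x))) / den


def trend_slope(x):
    # Least-squares slope of x against t = 0..n-1
    n = len(x)
    t_mean = (n - 1) / 2
    m = mean(x)
    return sum((t - t_mean) * (v - m) for t, v in enumerate(x)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )


def represent(x, period):
    """Hypothetical fixed-length representation of a series of any length."""
    return (
        trend_slope(x),        # long-term characteristic
        autocorr(x, period),   # cyclically repeated characteristic (assumed lag)
        mean(x),               # distribution characteristic
        pstdev(x),             # distribution characteristic
        autocorr(x, 1),        # correlation characteristic
    )


def feature_distance(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
```

Because the vector length is fixed, storage and comparison cost no longer grow with the length of the series; in practice, features on different scales would usually be normalized before computing distances.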
Subsequently, we introduce techniques based on our feature-based engineering and apply them to important data-mining tasks such as time series generation, matching, classification, and clustering. First, our feature-based generation technique outperforms state-of-the-art techniques regarding the accuracy of evolved datasets. Second, with our features, a matching method retrieves a match for a time series query much faster than with current representations. Third, our features provide discriminative characteristics to classify datasets as accurately as state-of-the-art techniques, but orders of magnitude faster. Finally, our features recommend an appropriate clustering of time series, which is crucial for subsequent data-mining tasks. All these techniques are assessed on datasets from the energy, weather, and economic domains, thus demonstrating their applicability to real-world use cases. The findings demonstrate the versatility of our feature-based engineering and suggest several courses of action for designing and improving analytical systems for the paradigm shift of Industry 4.0.
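Once every series is stored as a short feature vector, matching and classification reduce to nearest-neighbour search over those vectors. The sketch below assumes feature vectors have already been computed (by whatever representation is used) and applies plain Euclidean distance via `math.dist`; the function names, keys, and labels are hypothetical, and the thesis's own matching and classification methods may differ.

```python
import math


def nearest_match(query, collection):
    """Return the key of the stored feature vector closest to the query.

    collection maps a series identifier to its precomputed feature vector.
    """
    return min(collection, key=lambda key: math.dist(query, collection[key]))


def classify(query, labeled):
    """1-nearest-neighbour classification over (feature_vector, label) pairs."""
    _, label = min(labeled, key=lambda item: math.dist(query, item[0]))
    return label
```

Each comparison touches only a handful of numbers instead of the raw series, which is the source of the speed-up that feature-based matching and classification aim for.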
Location of delivery lockers for urban logistics in a medium-sized Brazilian city: the case of Divinópolis, Minas Gerais
With the advancement of technology and the spread of the internet around the world,
electronic commerce has expanded strongly in all sectors. Consequently, the flow of
logistics operations responsible for delivering products purchased through this channel
has grown, giving rise to problems related to the transport of these goods. Among the
transport problems arising in the last stage of the distribution chain, known as the last
mile, are failed deliveries, excessive trips, high operating costs, and poorly sized
transport resources. Research in the area therefore evaluates ways to alleviate these
difficulties in cargo distribution so that logistics operations become more efficient and
offer a good level of service to their users. One alternative used in different regions of
the world for this purpose is collection and delivery points (PCEs, from the Portuguese
pontos de coleta e entrega), stations where customers pick up their online purchases
themselves. PCEs can be automated, in which case they are known as Delivery Lockers
(DLs), or not. The literature indicates that one of the difficulties in installing DLs is
determining which location will best serve consumers in each specific region. Therefore,
this study aimed to propose suitable locations for the installation of DLs in a
medium-sized city in Minas Gerais and to analyze which factors most influence the use of
these devices according to local consumers. Opening hours, distance from central regions,
and the safety of these operations were the main factors mentioned by consumers. A
multicriteria mathematical model based on the AHP method was developed to help choose
the establishments that would best meet the evaluated criteria. It was concluded that
several scenarios can satisfy the problem; however, the model preferred alternatives
located in central regions or in small commercial centers. Establishments that operate
outside conventional hours, such as supermarkets and hypermarkets, stood out positively in
the analyzed scenarios, unlike bank branches, which did not perform well due to their
limited space and opening hours. Finally, DLs are also viable for medium-sized cities;
however, for them to work effectively, e-commerce users should ideally be aware of the
benefits that this practice can bring, both in financial and operational terms.
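The abstract names AHP (Analytic Hierarchy Process) but reports no pairwise judgments or weights. The sketch below shows the standard AHP steps under that gap: criterion weights are taken as the principal eigenvector of a reciprocal pairwise-comparison matrix (computed by power iteration), Saaty's consistency ratio is checked (values below 0.1 are conventionally accepted), and alternatives are ranked by the weighted sum of their local priorities. The criteria order, every numeric judgment, and the site names are hypothetical placeholders, not the study's data.

```python
# Saaty's random consistency index by matrix size
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}

# Hypothetical judgments for (opening hours, distance from central regions, safety)
EXAMPLE_JUDGMENTS = [
    [1, 2, 1],
    [1 / 2, 1, 1 / 2],
    [1, 2, 1],
]


def priority_vector(matrix, iterations=100):
    """Approximate the principal eigenvector of a pairwise matrix, summing to 1."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w


def consistency_ratio(matrix, w):
    """Saaty's CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if RANDOM_INDEX[n] == 0:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return ((lambda_max - n) / (n - 1)) / RANDOM_INDEX[n]


def rank_alternatives(criteria_weights, local_scores):
    """local_scores maps each alternative to its priority under each criterion."""
    totals = {
        alt: sum(w * s for w, s in zip(criteria_weights, scores))
        for alt, scores in local_scores.items()
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

In a full AHP model, the local priorities of each candidate establishment would themselves come from pairwise comparisons under each criterion, using the same `priority_vector` step.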