    Advisor: Fernando Antonio Campos Gomide. Thesis (doctorate), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Summary: Recently there has been growing interest in computational modeling approaches for handling real-world data streams. Methods and algorithms have been proposed to extract knowledge from very large data sets that, in principle, have no apparent value. This work presents a computational framework for evolving granular modeling of uncertain data streams. Evolving granular systems encompass a variety of online modeling approaches inspired by the way humans deal with complexity. These systems exploit the information flow in a dynamic environment and extract from it models that can be understood linguistically. In particular, information granulation is a natural technique for disregarding unnecessary details and emphasizing the transparency, interpretability, and scalability of information systems. Uncertain (granular) data arise from imprecise perceptions or descriptions of the value of a variable. In general, several factors can affect the choice of data representation such that the representing object reflects the meaning of the concept it is being used to represent. This work considers numerical, interval, and fuzzy data, and interval, fuzzy, and neuro-fuzzy models. Learning in granular systems is based on incremental algorithms that build the model structure without prior knowledge of the process and adapt the model parameters whenever necessary. This learning paradigm is particularly important because it avoids rebuilding and retraining the model when the environment changes.
Application examples in classification, function approximation, time-series prediction, and control using synthetic and real data illustrate the usefulness of the proposed granular modeling approaches. The behavior of nonstationary data streams with gradual and abrupt regime shifts is also analyzed within the paradigm of evolving granular computing. We highlight the role of interval, fuzzy, and neuro-fuzzy computing in processing uncertain data and providing high-quality approximate solutions and rule summaries of input-output data sets. The approaches and paradigm introduced constitute a natural extension of evolving intelligent systems for processing numerical data to evolving granular systems for processing granular data.

Abstract: In recent years there has been increasing interest in computational modeling approaches to deal with real-world data streams. Methods and algorithms have been proposed to uncover meaningful knowledge from very large (often unbounded) data sets that, in principle, have no apparent value. This thesis introduces a framework for evolving granular modeling of uncertain data streams. Evolving granular systems comprise an array of online modeling approaches inspired by the way in which humans deal with complexity. These systems explore the information flow in dynamic environments and derive from it models that can be linguistically understood. In particular, information granulation is a natural technique to dispense with unnecessary details and emphasize the transparency, interpretability, and scalability of information systems. Uncertain (granular) data arise from imprecise perception or description of the value of a variable. Broadly stated, various factors can affect one's choice of data representation such that the representing object conveys the meaning of the concept it is being used to represent.
Of particular concern to this work are numerical, interval, and fuzzy types of granular data, and interval, fuzzy, and neurofuzzy modeling frameworks. Learning in evolving granular systems is based on incremental algorithms that build the model structure from scratch on a per-sample basis and adapt model parameters whenever necessary. This learning paradigm is meaningful because it avoids redesigning and retraining models whenever the system changes. Application examples in classification, function approximation, time-series prediction, and control using real and synthetic data illustrate the usefulness of the proposed granular approaches and framework. The behavior of nonstationary data streams with gradual and abrupt regime shifts is also analyzed in the realm of evolving granular computing. We shed light on the role of interval, fuzzy, and neurofuzzy computing in processing uncertain data and providing high-quality approximate solutions and rule summaries of input-output data sets. The approaches and framework introduced constitute a natural extension of evolving intelligent systems over numeric data streams to evolving granular systems over granular data streams.
Doctorate. Automation. Doctor of Electrical Engineering.
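To make the idea of building model structure "from scratch on a per-sample basis" concrete, the sketch below creates and expands hyperbox (interval) granules online. It is a minimal illustration under stated assumptions, not the thesis's algorithm: the names `Granule`, `learn`, and the maximum granule width `rho` are hypothetical, samples are plain numeric vectors (the degenerate case of interval data), and a sample joins the nearest granule only if covering it keeps every side no wider than `rho`.

```python
from dataclasses import dataclass


@dataclass
class Granule:
    """Hypothetical interval granule: a hyperbox [lower, upper] per dimension."""
    lower: list
    upper: list
    count: int = 1

    def fits(self, x, rho):
        # Would expanding the box to cover x keep every side no wider than rho?
        return all(max(u, xi) - min(l, xi) <= rho
                   for l, u, xi in zip(self.lower, self.upper, x))

    def expand(self, x):
        # Parameter adaptation: stretch the bounds just enough to cover x.
        self.lower = [min(l, xi) for l, xi in zip(self.lower, x)]
        self.upper = [max(u, xi) for u, xi in zip(self.upper, x)]
        self.count += 1


def center_distance(g, x):
    """L1 distance from x to the granule's midpoint (used to pick among candidates)."""
    return sum(abs((l + u) / 2 - xi) for l, u, xi in zip(g.lower, g.upper, x))


def learn(stream, rho):
    """One pass over the stream; the model starts empty and grows one sample at a time."""
    granules = []
    for x in stream:
        candidates = [g for g in granules if g.fits(x, rho)]
        if candidates:
            min(candidates, key=lambda g: center_distance(g, x)).expand(x)
        else:
            # Structural change: no granule can absorb x, so create a new one.
            granules.append(Granule(lower=list(x), upper=list(x)))
    return granules
```

Because each sample is processed once and then discarded, nothing has to be retrained when a new regime appears: a distant sample simply spawns a new granule.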

    With the development of Internet technologies, data volumes are doubling every two years, faster than predicted by Moore's Law. Big Data Analytics has become particularly important for enterprise business. Modern computational technologies provide effective tools to help understand vast accumulated data and leverage this information to gain insights into the finance industry. Because there are no physical products to manufacture in the finance industry, data has become the most valuable asset of financial organisations for gaining actionable business insights. This is where data mining techniques come to the rescue, by allowing access to the right information at the right time. The finance industry uses these techniques in areas such as fraud detection, intelligent forecasting, credit rating, loan management, customer profiling, money laundering, marketing, and prediction of price movements, to name a few. This work surveys research on data mining techniques applied to the finance industry from 2010 to 2015. The review finds that stock prediction and credit rating have received the most attention from researchers, compared with loan prediction, money laundering, and time-series prediction. Owing to the dynamics, uncertainty, and variety of the data, nonlinear mapping techniques have been studied more deeply than linear techniques. The review also finds that hybrid methods are the most accurate in prediction, closely followed by neural networks. This survey provides an overview of applications of data mining techniques in the finance industry and a summary of methodologies for researchers in this area. In particular, it offers a good overview of data mining techniques in computational finance for beginners who want to work in the field.

    An interval-valued time series (ITS) is a collection of interval-valued data whose entries are ordered by time. Modeling ITS is an ongoing issue pursued by many researchers, and diverse ITS models with good performance have been reported. This paper proposes a new ITS model using a possibility measure-based encoding-decoding mechanism drawn from fuzzy set theory. The proposed model consists of four modules: a linguistic variable generation module, an encoding module, an inference module, and a decoding module. The linguistic variable generation module provides a series of linguistic variables, expressed as fuzzy sets, that describe the dynamic characteristics of the ITS. The encoding module encodes the ITS into semantic embedding vectors with the aid of the possibility measure and the linguistic variables formed by the linguistic variable generation module. The inference module uses an artificial neural network to capture the relationships implicit in those semantic embedding vectors. The decoding module decodes the outputs of the inference module, using the possibility measure-based encoding-decoding mechanism, to produce output in both linguistic and interval formats. Compared with existing ITS models, the proposed model not only produces output in linguistic format but also exhibits better numerical performance.
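The encoding step can be illustrated with a small sketch. It assumes, as an illustration only, evenly spaced triangular fuzzy sets as the linguistic terms and the standard possibility of a crisp interval A under a fuzzy set F, Π(A) = sup over x in A of μ_F(x); the paper's exact term generation and measure may differ, and the helper names `triangular`, `possibility`, `make_terms`, and `encode` are hypothetical.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with support (a, c) and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def possibility(lo, hi, a, b, c):
    """Possibility that the interval [lo, hi] meets the fuzzy set: sup of membership on it."""
    if lo <= b <= hi:
        return 1.0  # the interval contains the peak
    # Membership is monotone on each side of the peak, so the sup is at the
    # interval endpoint nearest to b.
    return triangular(hi if hi < b else lo, a, b, c)


def make_terms(lo, hi, n):
    """n >= 2 evenly spaced triangular linguistic terms whose peaks span [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [(lo + i * step - step, lo + i * step, lo + i * step + step) for i in range(n)]


def encode(interval, terms):
    """Encode one interval observation as a vector of possibility degrees, one per term."""
    lo, hi = interval
    return [possibility(lo, hi, *t) for t in terms]
```

For example, with terms peaking at 0, 5, and 10 (`make_terms(0, 10, 3)`), the interval [4, 6] is encoded as roughly `[0.2, 1.0, 0.2]`: it fully matches the middle term and only weakly matches its neighbors. A sequence of such vectors is the kind of input an inference network could consume.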

    In this work we review the basic ideas and novel developments of Conformal Prediction, an innovative distribution-free, non-parametric forecasting method based on minimal assumptions that yields, in a very straightforward way, prediction sets that are statistically valid even in the finite-sample case. The in-depth discussion covers the theoretical underpinnings of Conformal Prediction and then surveys the more advanced developments and adaptations of the original idea.
Comment: arXiv admin note: text overlap with arXiv:0706.3188, arXiv:1604.04173, arXiv:1709.06233, arXiv:1203.5422 by other authors
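As an illustration of how finite-sample validity is obtained, the sketch below implements split (inductive) conformal regression, one common variant; the review may present others, and the function name `split_conformal_interval` is hypothetical. It assumes absolute residuals from a held-out calibration set that was not used to fit the model, and exchangeable data.

```python
import math


def split_conformal_interval(cal_residuals, y_hat, alpha):
    """Return a (1 - alpha) prediction interval around the point forecast y_hat.

    cal_residuals: absolute errors |y_i - f(x_i)| of a fitted model f on a
    held-out calibration set that was NOT used to fit f.
    """
    n = len(cal_residuals)
    # Finite-sample correction: use the ceil((n + 1)(1 - alpha))-th smallest residual
    # rather than the plain empirical (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points for this alpha: the interval is unbounded
        return (-math.inf, math.inf)
    q = sorted(cal_residuals)[k - 1]
    return (y_hat - q, y_hat + q)
```

Under exchangeability, an interval built this way covers a new observation with probability at least 1 - alpha for any sample size n, which is the finite-sample guarantee the abstract refers to. The `(n + 1)` term is what makes this hold without asymptotics.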

    With the pressing need to improve poorly rated transportation infrastructure, asset managers leverage predictive maintenance strategies to lower life-cycle costs while maximizing or maintaining the performance of highways. Hence, the limitations of prediction models can strongly affect how maintenance tasks are prioritized and budgets are allocated. This study investigates the potential of different predictive models for reaching an effective and efficient maintenance plan. This paper reviews the literature on predictive analytics for a set of highway assets. It also highlights the gaps and limitations of current methodologies, such as the subjective assumptions and simplifications applied in deterministic and probabilistic approaches. The article additionally discusses how these shortcomings affect the application and accuracy of the methods, and how advanced predictive analytics can mitigate these challenges. We discuss how advances in technology, coupled with ever-increasing computing power, are creating opportunities for a paradigm shift in predictive analytics. We also propose new research directions, including the application of advanced machine learning to develop extensible and scalable prediction models and the use of emerging sensing technologies for collecting, storing, and analyzing data. Finally, we address future directions of predictive analytics in the data-rich era that could help transportation agencies become information-rich.

    Given the complexity that characterizes large-scale optimization problems, complete exploration of the solution space quickly becomes an unattainable goal. Indeed, as problem size increases, increasingly sophisticated solution methods are required to ensure a certain level of efficiency. This has led a large part of the scientific community to develop specific tools for solving large-scale problems, such as hybrid methods. However, despite the effort devoted to developing hybrid approaches, most work has focused on adapting two or more specific methods, either by compensating for the weaknesses of some with the strengths of others or by adapting them to work together. To the best of our knowledge, no work to date has developed a conceptual framework for efficiently solving large-scale optimization problems that is at once flexible, based on information exchange, and independent of its component methods. The objective of this thesis is to explore this research avenue by proposing a conceptual framework for hybrid methods, called Iterative Restricted Space Search (IRSS), whose main idea is the successive definition and exploration of restricted regions of the solution space. These regions, which contain good solutions and are small enough to be explored completely, are called Restricted Spaces (RS). IRSS is thus a generic solution approach based on the interaction of two algorithmic phases with complementary objectives. The first phase identifies a promising restricted region, and the second phase explores it.
The hybrid scheme alternates between the two phases for a fixed number of iterations or until a time limit is reached. The key concepts associated with the development of this conceptual framework are introduced and validated progressively in this thesis. They are presented so that the reader can understand the problems we encountered during development and how the solutions were designed and implemented. To this end, the thesis is divided into four parts. The first is devoted to a synthesis of the state of the art in research on hybrid methods. It presents the main hybrid approaches developed and their applications. A brief description of approaches that use the concept of space restriction is also presented in this part. The second part presents the key concepts of the framework: the process of identifying restricted regions and the two search phases. These concepts are implemented in a hybrid scheme combining a heuristic and an exact method. The approach was applied to a scheduling problem with two decision levels arising in the pulp and paper industry, the "Pulp Production Scheduling Problem". The third part deepens the concepts developed and addresses the limitations identified in the second part by proposing an iterative search for exploring large RS and a binary tree structure for exploring multiple RS. This structure has the advantage of avoiding the re-exploration of previously explored spaces while providing natural diversification to the method. This extension of the method was tested on a location-allocation problem using an iterative heuristic-exact hybridization scheme.
The fourth part generalizes the previously developed concepts and designs a general framework that is flexible, independent of the methods used, and based on information exchange between the phases. This framework has the advantage of being general and could be applied to a wide range of problems.
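The alternation between identifying a restricted space and exploring it completely can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the thesis's method: it uses a hypothetical 0/1 knapsack instance rather than the pulp scheduling or location-allocation problems, the window-based identification rule and all function names are hypothetical, item weights are assumed positive, and a plain `explored` set stands in for the binary-tree bookkeeping that prevents re-exploring a space.

```python
import itertools
import random
import time


def identify(order, start, k):
    """Phase 1 (hypothetical rule): items ranked before the window are fixed to 1,
    items after it are fixed to 0, and the k items inside the window stay free."""
    return order[:start], order[start:start + k]


def explore(values, weights, capacity, fixed_in, free):
    """Phase 2: exhaustively enumerate every 0/1 setting of the free items."""
    base_w = sum(weights[i] for i in fixed_in)
    base_v = sum(values[i] for i in fixed_in)
    best_v, best_sel = None, None
    for bits in itertools.product((0, 1), repeat=len(free)):
        chosen = [i for i, b in zip(free, bits) if b]
        w = base_w + sum(weights[i] for i in chosen)
        v = base_v + sum(values[i] for i in chosen)
        if w <= capacity and (best_v is None or v > best_v):
            best_v, best_sel = v, sorted(fixed_in + chosen)
    return best_v, best_sel  # (None, None) if the restricted space is infeasible


def irss_knapsack(values, weights, capacity, k=4, max_iters=20, time_limit=1.0, seed=0):
    n = len(values)
    k = min(k, n)
    # Greedy ranking by value density drives the (hypothetical) region identification.
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    rng = random.Random(seed)
    best_v, best_sel = 0, []  # the empty knapsack is always feasible
    explored = set()  # stands in for the binary-tree record of visited spaces
    t0 = time.monotonic()
    for _ in range(max_iters):  # stop after a fixed number of iterations...
        if time.monotonic() - t0 > time_limit:  # ...or when the time limit is hit
            break
        start = rng.randrange(n - k + 1)
        if start in explored:  # never re-explore the same restricted space
            continue
        explored.add(start)
        v, sel = explore(values, weights, capacity, *identify(order, start, k))
        if v is not None and v > best_v:
            best_v, best_sel = v, sel
    return best_v, best_sel
```

The window size `k` controls the trade-off the abstract describes: each restricted space must be small enough to enumerate (2^k settings), while a small `k` may exclude the optimum, so the method relies on visiting several different spaces.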

