33 research outputs found

    A Survey on Concept Drift Adaptation

    Concept drift primarily refers to an online supervised learning scenario in which the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning, in this paper we characterize the adaptive learning process, categorize existing strategies for handling concept drift, discuss the most representative, distinct and popular techniques and algorithms, discuss the evaluation methodology of adaptive algorithms, and present a set of illustrative applications. This introduction to concept drift adaptation presents the state-of-the-art techniques and a collection of benchmarks for researchers, industry analysts and practitioners. The survey aims at covering the different facets of concept drift in an integrated way to reflect on the existing scattered state-of-the-art
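To make the scenario concrete, the following is a minimal sketch of a stream whose input-to-target relation reverses partway through, evaluated test-then-train. The synthetic stream, the `WindowedLearner` class, and the window size are illustrative assumptions and are not taken from the survey; a sliding window is used here only as one simple way to forget outdated examples.

```python
import random
from collections import deque


def make_stream(n, drift_at, seed=0):
    """Yield (x, y) pairs; the labelling rule reverses at step `drift_at`."""
    rng = random.Random(seed)
    for t in range(n):
        x = rng.random()
        # Concept 1: y = 1 when x > 0.5. Concept 2: the rule is reversed.
        y = int(x > 0.5) if t < drift_at else int(x <= 0.5)
        yield x, y


class WindowedLearner:
    """Hypothetical learner: majority label among remembered examples
    that fall on the same side of 0.5 as the query point."""

    def __init__(self, window):
        # window=None keeps every example (no forgetting).
        self.memory = deque(maxlen=window)

    def predict(self, x):
        side = x > 0.5
        votes = [y for (xs, y) in self.memory if (xs > 0.5) == side]
        if not votes:
            return 0
        return int(2 * sum(votes) >= len(votes))

    def learn(self, x, y):
        self.memory.append((x, y))


def prequential_errors(learner, stream):
    """Test-then-train: predict each example first, then learn from it."""
    errors = []
    for x, y in stream:
        errors.append(int(learner.predict(x) != y))
        learner.learn(x, y)
    return errors


static = prequential_errors(WindowedLearner(None), make_stream(2000, 1000))
adaptive = prequential_errors(WindowedLearner(50), make_stream(2000, 1000))

# Error rates over the last 500 steps, well after the drift point.
late_static = sum(static[1500:]) / 500
late_adaptive = sum(adaptive[1500:]) / 500
print(f"no forgetting: {late_static:.2f}, sliding window: {late_adaptive:.2f}")
```

The learner that never forgets keeps voting with examples from the old concept, while the windowed learner recovers once its memory holds only post-drift examples.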

    Booleovská faktorová analýza atraktorovou neuronovou sítí (Boolean Factor Analysis by an Attractor Neural Network)

    Methods for discovering hidden structure in high-dimensional binary data are among the most important challenges currently facing the machine learning research community. Many approaches in the literature attempt to solve this hitherto rather ill-defined task. Boolean factor analysis (BFA), studied in this work, represents the hidden structure of binary data as a Boolean superposition of binary factors that complies with the BFA generative model of signals, and a criterion of optimality for a BFA solution is given. In these terms, BFA is a well-defined task completely analogous to linear factor analysis. The main contributions of the dissertation are as follows. First, an efficient BFA method was developed, based on the original attractor neural network with increasing activity (ANNIA), which was subsequently improved by combining it with the expectation-maximization (EM) method to create the LANNIA method. Second, the characteristics of ANNIA that are important for the functioning of both the LANNIA and ANNIA methods were analyzed. The functioning of both methods was then validated on artificially generated data sets covering the full range of generative model parameters. Next, the methods were applied to real-world data from different areas of science to demonstrate their contribution to this type of analysis. Finally, the BFA methods were compared with related methods, including an analysis of their applicability.
    460 - Katedra informatiky (Department of Computer Science)
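The phrase "Boolean superposition of binary factors" can be illustrated with a small, noise-free sketch. The factor patterns and the function name `boolean_superposition` are hypothetical; the thesis's full generative model is not reproduced here, only the OR-combination that distinguishes BFA from linear factor analysis.

```python
def boolean_superposition(scores, factors):
    """Combine the factors whose score is 1 with element-wise OR."""
    signal = [0] * len(factors[0])
    for score, factor in zip(scores, factors):
        if score:
            signal = [a | b for a, b in zip(signal, factor)]
    return signal


def linear_superposition(scores, factors):
    """Linear counterpart: active factors are summed, not OR-ed."""
    signal = [0] * len(factors[0])
    for score, factor in zip(scores, factors):
        if score:
            signal = [a + b for a, b in zip(signal, factor)]
    return signal


# Three illustrative binary factors over six attributes (made-up values).
factors = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 1],
]
scores = [1, 0, 1]  # factors 0 and 2 are present in this observation

boolean_signal = boolean_superposition(scores, factors)
linear_signal = linear_superposition(scores, factors)
print(boolean_signal)  # [1, 1, 0, 0, 1, 1]
print(linear_signal)   # [1, 2, 0, 0, 1, 1]
```

Where two active factors overlap (the second attribute), the Boolean signal stays at 1 while the linear sum gives 2, which is why the binary case needs its own model.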

    Previsão do deslocamento de tempestades severas : abordagens por aprendizado de máquina (Forecasting the Displacement of Severe Storms: Machine Learning Approaches)

    Advisor: Prof. Dr. Paulo Henrique Siqueira. Co-advisor: Dr. Cesar Augustus Assis Beneti. Dissertation (master's) - Universidade Federal do Paraná, Setor de Tecnologia, Programa de Pós-Graduação em Métodos Numéricos em Engenharia. Defense: Curitiba, 03/08/2018. Includes references: 97-102. Area of concentration: Mathematical Programming.
    Abstract: Thunderstorm forecasting can support decision-making and operational measures, and can help mitigate and even anticipate damage by allowing the possible actions to be taken. There is therefore a need for reliable and fast techniques for storm monitoring, which consists of three main processes: identification of active storm cells, tracking, and forecasting of their displacement. This work focuses on the third step and studies machine learning methods for short-term storm forecasting on cells identified and tracked by the TITAN (Thunderstorm Identification, Tracking, Analysis, and Nowcasting) system at different stages. The analysis covers the south and southeast regions of Brazil and uses data from meteorological radars and atmospheric electrical discharges. Given the nature of the phenomena studied, machine learning methods were chosen because they can learn from the features and their relationships. Moreover, once a model has been learned by the chosen method, new inputs are processed quickly. Two types of regression techniques are studied: ensembles and linear models. The following methods were applied to the forecast: Bagging, Random Forest, Extra Trees, Theil-Sen, and Bayesian Ridge. The results are evaluated by comparing them with the forecast provided by TITAN for each cell, since TITAN is a well-established tool in the area. The best performance was achieved with the Random Forest algorithm; its results proved satisfactory for predicting displacement, making it a good alternative to the standard software. In addition, the proposed methods made a more evident contribution to the prediction of storm size. Keywords: Machine Learning. Regression. Thunderstorm Forecasting. Ensembles. Linear Models

    Semantic enrichment of knowledge sources supported by domain ontologies

    This thesis introduces a novel conceptual framework to support the creation of knowledge representations based on enriched Semantic Vectors, using the classical vector space model approach extended with ontological support. One of the primary research challenges addressed here relates to the formalization and representation of document contents, where most existing approaches are limited and take into account only the explicit, word-based information in the document. This research explores how traditional knowledge representations can be enriched by incorporating implicit information derived from the complex relationships (semantic associations) modelled by domain ontologies, in addition to the information presented in documents. The relevant achievements pursued by this thesis are the following: (i) conceptualization of a model that enables the semantic enrichment of knowledge sources supported by domain experts; (ii) development of a method for extending the traditional vector space using domain ontologies; (iii) development of a method to support ontology learning, based on the discovery of new ontological relations expressed in non-structured information sources; (iv) development of a process to evaluate the semantic enrichment; (v) implementation of a proof of concept, named SENSE (Semantic Enrichment kNowledge SourcEs), which makes it possible to validate the ideas established within the scope of this thesis; (vi) publication of several scientific articles and support for 4 master's dissertations carried out in the Department of Electrical and Computer Engineering at FCT/UNL.
    It is worth mentioning that the work developed under the semantic referential covered by this thesis has reused relevant achievements from European research projects, in order to adopt approaches that are considered scientifically sound and coherent and to avoid “reinventing the wheel”. European research projects - CoSpaces (IST-5-034245), CRESCENDO (FP7-234344) and MobiS (FP7-318452

    Fast Data Analytics by Learning

    Today, we collect a large amount of data, and the volume of the data we collect is projected to grow faster than computational power. This rapid growth of data inevitably increases query latencies, and horizontal scaling alone is not sufficient for real-time analytics of big data. Approximate query processing (AQP) speeds up data analytics at the cost of small quality losses in query answers. AQP produces query answers based on synopses of the original data. Because the synopses are smaller than the original data, AQP requires less computational effort and can therefore produce answers more quickly. In AQP, there is a general tradeoff between query latency and the quality of query answers; obtaining higher-quality answers requires longer query latencies. In this dissertation, we show that we can speed up approximate query processing without reducing the quality of the query answers by optimizing the synopses using two approaches. The two approaches we employ for optimizing the synopses are as follows: 1. Exploiting past computations: We exploit the answers to past queries. This approach relies on the fact that, if two aggregations involve common or correlated values, the aggregated results must also be correlated. We formally capture this idea using a probabilistic distribution function, which is then used to refine the answers to new queries. 2. Building task-aware synopses: By optimizing synopses for a few common types of data analytics, we can produce higher-quality answers (or produce answers more quickly for a given target quality) for those data analytics tasks. We use this approach for constructing synopses optimized for searching and visualizations. For exploiting past computations and building task-aware synopses, our work incorporates statistical inference and optimization techniques.
    The contributions in this dissertation resulted in up to 20x speedups for real-world data analytics workloads. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/138598/1/pyongjoo_1.pd
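As a rough illustration of how a synopsis trades a small quality loss for less work, the sketch below answers a SUM query from a uniform random sample instead of the full data. This is a generic sampling-based synopsis, not the dissertation's optimized synopses; the data, the 1% sampling fraction, and the function names are assumptions made for the example.

```python
import random


def build_sample_synopsis(rows, fraction, seed=0):
    """Build a synopsis by drawing a uniform random sample of the rows."""
    rng = random.Random(seed)
    size = max(1, int(len(rows) * fraction))
    return rng.sample(rows, size)


def approximate_sum(synopsis, total_rows):
    """Estimate SUM over all rows by scaling up the sample sum."""
    scale = total_rows / len(synopsis)
    return scale * sum(synopsis)


# Made-up data set: 100,000 values cycling through 0..99.
rows = [float(i % 100) for i in range(100_000)]

synopsis = build_sample_synopsis(rows, fraction=0.01)
exact = sum(rows)                            # scans every row
approx = approximate_sum(synopsis, len(rows))  # scans 1% of the rows
relative_error = abs(approx - exact) / exact

print(f"exact={exact:.0f} approx={approx:.0f} rel_err={relative_error:.4f}")
```

A larger sampling fraction would reduce the expected error but increase the work per query, which is the latency/quality tradeoff the abstract describes.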

    Communicating Science in 20th Century Europe. A Survey on Research and Comparative Perspectives
