463 research outputs found

    Dealing with imbalanced and weakly labelled data in machine learning using fuzzy and rough set methods


    Sistemas granulares evolutivos

    Advisor: Fernando Antonio Campos Gomide. Thesis (doctorate) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
    Abstract: In recent years there has been increasing interest in computational modeling approaches to deal with real-world data streams. Methods and algorithms have been proposed to uncover meaningful knowledge from very large (often unbounded) data sets that, at first sight, have no apparent value. This thesis introduces a framework for evolving granular modeling of uncertain data streams. Evolving granular systems comprise an array of online modeling approaches inspired by the way in which humans deal with complexity. These systems explore the information flow in dynamic environments and derive from it models that can be linguistically understood. In particular, information granulation is a natural technique to dispense with unnecessary details and emphasize transparency, interpretability and scalability of information systems. Uncertain (granular) data arise from imprecise perception or description of the value of a variable. Broadly stated, various factors can affect one's choice of data representation such that the representing object conveys the meaning of the concept it is being used to represent.
    Of particular concern to this work are numerical, interval, and fuzzy types of granular data; and interval, fuzzy, and neurofuzzy modeling frameworks. Learning in evolving granular systems is based on incremental algorithms that build the model structure from scratch on a per-sample basis, without prior knowledge about the process, and adapt model parameters whenever necessary. This learning paradigm is valuable because it avoids redesigning and retraining models whenever the system changes. Application examples in classification, function approximation, time-series prediction and control using real and synthetic data illustrate the usefulness of the proposed granular approaches and framework. The behavior of nonstationary data streams with gradual and abrupt regime shifts is also analyzed in the realm of evolving granular computing. We shed light upon the role of interval, fuzzy, and neurofuzzy computing in processing uncertain data and providing high-quality approximate solutions and rule summaries of input-output data sets. The approaches and framework introduced constitute a natural extension of evolving intelligent systems over numeric data streams to evolving granular systems over granular data streams.
    Doctorate. Automation. Doctor in Electrical Engineering.
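The per-sample, from-scratch learning described in this abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes numeric inputs, axis-aligned interval granules (hyperrectangles) with one class label each, and a hypothetical `max_width` parameter that bounds how far a granule may grow. All class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Granule:
    lo: list    # lower bound of the interval in each dimension
    hi: list    # upper bound of the interval in each dimension
    label: str  # class label attached to this granule


class EvolvingIntervalClassifier:
    """Illustrative incremental learner; assumed design, not the thesis's method."""

    def __init__(self, max_width):
        self.max_width = max_width  # hypothetical bound on granule width per dimension
        self.granules = []          # the structure starts empty and grows with the stream

    def _can_absorb(self, g, x):
        # The granule may expand to cover x only if no side exceeds max_width.
        return all(max(h, v) - min(l, v) <= self.max_width
                   for l, h, v in zip(g.lo, g.hi, x))

    def learn(self, x, label):
        # One pass per sample: adapt an existing granule or create a new one.
        for g in self.granules:
            if g.label == label and self._can_absorb(g, x):
                g.lo = [min(l, v) for l, v in zip(g.lo, x)]
                g.hi = [max(h, v) for h, v in zip(g.hi, x)]
                return
        self.granules.append(Granule(list(x), list(x), label))

    def predict(self, x):
        if not self.granules:
            return None

        def distance(g):
            # City-block distance from x to the hyperrectangle (0 if x is inside).
            return sum(max(l - v, 0.0, v - h) for l, h, v in zip(g.lo, g.hi, x))

        return min(self.granules, key=distance).label


model = EvolvingIntervalClassifier(max_width=1.0)
model.learn([0.0, 0.0], "a")
model.learn([0.5, 0.4], "a")   # absorbed by the first granule
model.learn([5.0, 5.0], "b")   # different label: new granule
model.learn([0.2, 3.0], "a")   # too far from the first granule: new granule
```

Because the structure is updated one sample at a time, a regime shift only adds or adapts granules; nothing is retrained from stored data.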

    A linguistic multi-criteria decision-aiding system to support university career services

    In this paper we introduce a linguistic multi-criteria decision-aiding model to support college students applying to the internship job market. It uses a fuzzy ordered weighted averaging (FOWA) operator in the matching to capture the inherent uncertainty and vagueness of personnel selection processes. The decision model is integrated in a software tool able to capture data from university student résumé and internship databases. The application assesses position characteristics implicitly by means of linguistic descriptions according to each student's preferences, and it can propose positions that match those preferences. The system selects a reduced list of alternatives from the set of job offers, helping students decide on which positions to focus their applications. Peer reviewed. Postprint (author's final draft).
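The ordered weighted averaging idea behind the matching step can be sketched as follows. This is a simplified, crisp (non-fuzzy) OWA over numeric scores in [0, 1], not the paper's FOWA operator on linguistic terms. The function names, offers, scores and weights are all hypothetical.

```python
from typing import Dict, List, Sequence


def owa(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights attach to rank positions, not to criteria."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(scores, reverse=True)  # best score first
    return sum(w * s for w, s in zip(weights, ordered))


def shortlist(offers: Dict[str, List[float]], weights: Sequence[float], top_k: int) -> List[str]:
    """Return the top_k offers ranked by their OWA score (highest first)."""
    ranked = sorted(offers, key=lambda name: owa(offers[name], weights), reverse=True)
    return ranked[:top_k]


# Hypothetical per-criterion match scores for three internship offers.
offers = {
    "A": [0.9, 0.2, 0.5],
    "B": [0.6, 0.6, 0.6],
    "C": [0.1, 0.3, 0.2],
}
# Placing more weight on the lowest-ranked scores gives a cautious aggregation.
cautious_weights = [0.2, 0.3, 0.5]
best_two = shortlist(offers, cautious_weights, top_k=2)
```

The weight vector controls the attitude of the aggregation: weight concentrated on the first position behaves like a maximum, and weight concentrated on the last position behaves like a minimum.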

    A survey of qualitative spatial representations

    Representing and reasoning with qualitative spatial relations is an important problem in artificial intelligence, with wide applications in geographic information systems, computer vision, autonomous robot navigation, natural language understanding, spatial databases and so on. The reasons for this interest in qualitative spatial relations include cognitive comprehensibility, efficiency and computational facility. This paper summarizes progress in qualitative spatial representation by describing key calculi representing different types of spatial relationships. The paper concludes with a discussion of current research and a glimpse of future work.

    Fuzzy rough and evolutionary approaches to instance selection


    Anomaly Detection in IoT: Recent Advances, AI and ML Perspectives and Applications

    IoT comprises sensors and other small devices interconnected locally and via the Internet. Typical IoT devices collect data from the environment through sensors, analyze it, and act back on the physical world through actuators. They are integrated into home appliances, healthcare, control systems, and wearables. This chapter presents a variety of applications where IoT devices are used for anomaly detection and correction. We review recent advances in machine learning and deep learning models and techniques for anomaly detection in IoT networks. We describe significant applications in depth across various domains: anomaly detection for IoT time-series data, cybersecurity, healthcare, smart cities, and more. The number of connected devices is increasing daily; by 2025, there will be approximately 85 billion IoT devices, spread across manufacturing (40%), medical (30%), retail, and security (20%). This significant shift toward the Internet of Things (IoT) has created opportunities for future IoT applications. The chapter examines the security issues of IoT standards, protocols, and practical operations and identifies the hazards associated with the existing IoT model. It analyzes new security protocols and solutions to mitigate these challenges. This chapter's outcome can benefit the research community by consolidating information related to IoT and proposing innovative solutions.

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its wide public popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to helping readers understand each section deeply, the two books present useful hints and strategies for solving problems in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.

    On indexing highly dynamic multidimensional datasets for interactive analytics

    Advisor: Prof. Dr. Luis Carlos Erpen de Bona. Thesis (doctorate) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 15/04/2016. Includes references: f. 77-91. Area of concentration: Computer science.
    Abstract: Indexing multidimensional data has been an active focus of research in the last few decades. In this work, we present a new type of OLAP workload found at Facebook, characterized by (a) high dynamicity and dimensionality, (b) scale and (c) interactivity and simplicity of queries, that is unsuited for most current OLAP DBMSs and multidimensional indexing techniques. To address this use case, we propose a novel multidimensional data organization and indexing strategy for in-memory DBMSs called Granular Partitioning. This technique extends the traditional view of database partitioning by range partitioning every dimension of the dataset and organizing the data within small containers in an unordered and sparse fashion, so as to provide high ingestion rates and indexed access through every dimension without maintaining any auxiliary data structures. We also describe how an OLAP DBMS able to support a multidimensional data model composed of cubes, dimensions and metrics (with operations such as roll-up and drill-down, as well as efficient slice-and-dice filtering) can be built on top of this new data organization technique. In order to experimentally validate the described technique we present Cubrick, a new in-memory distributed OLAP DBMS for interactive analytics based on Granular Partitioning, which we have written from the ground up at Facebook.
    Finally, we present results from a thorough experimental evaluation that leveraged datasets and queries collected from a few pilot Cubrick deployments. We show that by properly organizing the dataset according to Granular Partitioning and focusing the design on simplicity, we are able to achieve the target scale and store tens of terabytes of in-memory data, continuously ingest millions of records per second from real-time data streams and still execute sub-second queries.
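The data organization described in this abstract (range partitioning on every dimension, with records kept unordered in small sparse containers) can be illustrated with a toy single-process sketch. It is not Cubrick's implementation: it assumes integer-encoded dimension values and a fixed range size per dimension, and the `GranularStore` class and its methods are hypothetical.

```python
from collections import defaultdict


class GranularStore:
    """Toy in-memory store; an assumed design for illustration only."""

    def __init__(self, range_sizes):
        self.range_sizes = range_sizes    # width of each range, one entry per dimension
        self.bricks = defaultdict(list)   # brick id -> unordered rows; empty bricks never exist

    def brick_id(self, dims):
        # Range-partition every dimension: the tuple of range indexes names the container.
        return tuple(v // size for v, size in zip(dims, self.range_sizes))

    def ingest(self, dims, metric):
        # Append-only write: no sorting and no auxiliary index to maintain.
        self.bricks[self.brick_id(dims)].append((tuple(dims), metric))

    def sum_where(self, filters):
        """Sum the metric over rows matching filters of the form {dimension index: (lo, hi)}, inclusive."""
        total = 0
        for bid, rows in self.bricks.items():
            # Prune whole bricks whose range cannot overlap the filter on some dimension.
            if any(not (lo // self.range_sizes[d] <= bid[d] <= hi // self.range_sizes[d])
                   for d, (lo, hi) in filters.items()):
                continue
            # Scan the surviving bricks, since a brick may only partially match.
            for dims, metric in rows:
                if all(lo <= dims[d] <= hi for d, (lo, hi) in filters.items()):
                    total += metric
        return total


store = GranularStore(range_sizes=[4, 2])
store.ingest((1, 0), 10)
store.ingest((5, 1), 20)
store.ingest((6, 3), 30)
store.ingest((2, 3), 40)
```

In this sketch a filter on any dimension prunes bricks using only the brick id, which is how every dimension acts as an index without a separate index structure.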