2,636 research outputs found

    Data-driven Soft Sensors in the Process Industry

    Get PDF
    In the last two decades Soft Sensors established themselves as a valuable alternative to the traditional means for the acquisition of critical process variables, process monitoring and other tasks which are related to process control. This paper discusses characteristics of the process industry data which are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, like the chemical industry, bioprocess industry, steel industry, etc. The focus of this work is put on the data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness and huge, though yet not completely realised, potential. A comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques as well as a discussion of some open issues in the Soft Sensor development and maintenance and their possible solutions are the main contributions of this work

    Nature-Inspired Adaptive Architecture for Soft Sensor Modelling

    Get PDF
    This paper gives a general overview of the challenges present in the research field of Soft Sensor building and proposes a novel architecture for building of Soft Sensors, which copes with the identified challenges. The architecture is inspired and making use of nature-related techniques for computational intelligence. Another aspect, which is addressed by the proposed architecture, are the identified characteristics of the process industry data. The data recorded in the process industry consist usually of certain amount of missing values or sample exceeding meaningful values of the measurements, called data outliers. Other process industry data properties causing problems for the modelling are the collinearity of the data, drifting data and the different sampling rates of the particular hardware sensors. It is these characteristics which are the source of the need for an adaptive behaviour of Soft Sensors. The architecture reflects this need and provides mechanisms for the adaptation and evolution of the Soft Sensor at different levels. The adaptation capabilities are provided by maintaining a variety of rather simple models. These particular models, called paths in terms of the architecture, can for example focus on different partition of the input data space, or provide different adaptation speeds to changes in the data. The actual modelling techniques involved into the architecture are data-driven computational learning approaches like artificial neural networks, principal component regression, etc

    Classification under Streaming Emerging New Classes: A Solution using Completely Random Trees

    Get PDF
    This paper investigates an important problem in stream mining, i.e., classification under streaming emerging new classes or SENC. The common approach is to treat it as a classification problem and solve it using either a supervised learner or a semi-supervised learner. We propose an alternative approach by using unsupervised learning as the basis to solve this problem. The SENC problem can be decomposed into three sub problems: detecting emerging new classes, classifying for known classes, and updating models to enable classification of instances of the new class and detection of more emerging new classes. The proposed method employs completely random trees which have been shown to work well in unsupervised learning and supervised learning independently in the literature. This is the first time, as far as we know, that completely random trees are used as a single common core to solve all three sub problems: unsupervised learning, supervised learning and model update in data streams. We show that the proposed unsupervised-learning-focused method often achieves significantly better outcomes than existing classification-focused methods

    One or Two Things We know about Concept Drift -- A Survey on Monitoring Evolving Environments

    Full text link
    The world surrounding us is subject to constant change. These changes, frequently described as concept drift, influence many industrial and technical processes. As they can lead to malfunctions and other anomalous behavior, which may be safety-critical in many scenarios, detecting and analyzing concept drift is crucial. In this paper, we provide a literature review focusing on concept drift in unsupervised data streams. While many surveys focus on supervised data streams, so far, there is no work reviewing the unsupervised setting. However, this setting is of particular relevance for monitoring and anomaly detection which are directly applicable to many tasks and challenges in engineering. This survey provides a taxonomy of existing work on drift detection. Besides, it covers the current state of research on drift localization in a systematic way. In addition to providing a systematic literature review, this work provides precise mathematical definitions of the considered problems and contains standardized experiments on parametric artificial datasets allowing for a direct comparison of different strategies for detection and localization. Thereby, the suitability of different schemes can be analyzed systematically and guidelines for their usage in real-world scenarios can be provided. Finally, there is a section on the emerging topic of explaining concept drift

    A dependability framework for WSN-based aquatic monitoring systems

    Get PDF
    Wireless Sensor Networks (WSN) are being progressively used in several application areas, particularly to collect data and monitor physical processes. Moreover, sensor nodes used in environmental monitoring applications, such as the aquatic sensor networks, are often subject to harsh environmental conditions while monitoring complex phenomena. Non-functional requirements, like reliability, security or availability, are increasingly important and must be accounted for in the application development. For that purpose, there is a large body of knowledge on dependability techniques for distributed systems, which provides a good basis to understand how to satisfy these non-functional requirements of WSN-based monitoring applications. Given the data-centric nature of monitoring applications, it is of particular importance to ensure that data is reliable or, more generically, that it has the necessary quality. The problem of ensuring the desired quality of data for dependable monitoring using WSNs is studied herein. With a dependability-oriented perspective, it is reviewed the possible impairments to dependability and the prominent existing solutions to solve or mitigate these impairments. Despite the variety of components that may form a WSN-based monitoring system, it is given particular attention to understanding which faults can affect sensors, how they can affect the quality of the information, and how this quality can be improved and quantified. Open research issues for the specific case of aquatic monitoring applications are also discussed. One of the challenges in achieving a dependable system behavior is to overcome the external disturbances affecting sensor measurements and detect the failure patterns in sensor data. This is a particular problem in environmental monitoring, due to the difficulty in distinguishing a faulty behavior from the representation of a natural phenomenon. Existing solutions for failure detection assume that physical processes can be accurately modeled, or that there are large deviations that may be detected using coarse techniques, or more commonly that it is a high-density sensor network with value redundant sensors. This thesis aims at defining a new methodology for dependable data quality in environmental monitoring systems, aiming to detect faulty measurements and increase the sensors data quality. The framework of the methodology is overviewed through a generically applicable design, which can be employed to any environment sensor network dataset. The methodology is evaluated in various datasets of different WSNs, where it is used machine learning to model each sensor behavior, exploiting the existence of correlated data provided by neighbor sensors. It is intended to explore the data fusion strategies in order to effectively detect potential failures for each sensor and, simultaneously, distinguish truly abnormal measurements from deviations due to natural phenomena. This is accomplished with the successful application of the methodology to detect and correct outliers, offset and drifting failures in real monitoring networks datasets. In the future, the methodology can be applied to optimize the data quality control processes of new and already operating monitoring networks, and assist in the networks maintenance operations.As redes de sensores sem fios (RSSF) têm vindo cada vez mais a serem utilizadas em diversas áreas de aplicação, em especial para monitorizar e capturar informação de processos físicos em meios naturais. Neste contexto, os sensores que estão em contacto direto com o respectivo meio ambiente, como por exemplo os sensores em meios aquáticos, estão sujeitos a condições adversas e complexas durante o seu funcionamento. Esta complexidade conduz à necessidade de considerarmos, durante o desenvolvimento destas redes, os requisitos não funcionais da confiabilidade, da segurança ou da disponibilidade elevada. Para percebermos como satisfazer estes requisitos da monitorização com base em RSSF para aplicações ambientais, já existe uma boa base de conhecimento sobre técnicas de confiabilidade em sistemas distribuídos. Devido ao foco na obtenção de dados deste tipo de aplicações de RSSF, é particularmente importante garantir que os dados obtidos na monitorização sejam confiáveis ou, de uma forma mais geral, que tenham a qualidade necessária para o objetivo pretendido. Esta tese estuda o problema de garantir a qualidade de dados necessária para uma monitorização confiável usando RSSF. Com o foco na confiabilidade, revemos os possíveis impedimentos à obtenção de dados confiáveis e as soluções existentes capazes de corrigir ou mitigar esses impedimentos. Apesar de existir uma grande variedade de componentes que formam ou podem formar um sistema de monitorização com base em RSSF, prestamos particular atenção à compreensão das possíveis faltas que podem afetar os sensores, a como estas faltas afetam a qualidade dos dados recolhidos pelos sensores e a como podemos melhorar os dados e quantificar a sua qualidade. Tendo em conta o caso específico dos sistemas de monitorização em meios aquáticos, discutimos ainda as várias linhas de investigação em aberto neste tópico. Um dos desafios para se atingir um sistema de monitorização confiável é a deteção da influência de fatores externos relacionados com o ambiente monitorizado, que afetam as medições obtidas pelos sensores, bem como a deteção de comportamentos de falha nas medições. Este desafio é um problema particular na monitorização em ambientes naturais adversos devido à dificuldade da distinção entre os comportamentos associados às falhas nos sensores e os comportamentos dos sensores afetados pela à influência de um evento natural. As soluções existentes para este problema, relacionadas com deteção de faltas, assumem que os processos físicos a monitorizar podem ser modelados de forma eficaz, ou que os comportamentos de falha são caraterizados por desvios elevados do comportamento expectável de forma a serem facilmente detetáveis. Mais frequentemente, as soluções assumem que as redes de sensores contêm um número suficientemente elevado de sensores na área monitorizada e, consequentemente, que existem sensores redundantes relativamente à medição. Esta tese tem como objetivo a definição de uma nova metodologia para a obtenção de qualidade de dados confiável em sistemas de monitorização ambientais, com o intuito de detetar a presença de faltas nas medições e aumentar a qualidade dos dados dos sensores. Esta metodologia tem uma estrutura genérica de forma a ser aplicada a uma qualquer rede de sensores ambiental ou ao respectivo conjunto de dados obtido pelos sensores desta. A metodologia é avaliada através de vários conjuntos de dados de diferentes RSSF, em que aplicámos técnicas de aprendizagem automática para modelar o comportamento de cada sensor, com base na exploração das correlações existentes entre os dados obtidos pelos sensores da rede. O objetivo é a aplicação de estratégias de fusão de dados para a deteção de potenciais falhas em cada sensor e, simultaneamente, a distinção de medições verdadeiramente defeituosas de desvios derivados de eventos naturais. Este objectivo é cumprido através da aplicação bem sucedida da metodologia para detetar e corrigir outliers, offsets e drifts em conjuntos de dados reais obtidos por redes de sensores. No futuro, a metodologia pode ser aplicada para otimizar os processos de controlo da qualidade de dados quer de novos sistemas de monitorização, quer de redes de sensores já em funcionamento, bem como para auxiliar operações de manutenção das redes.Laboratório Nacional de Engenharia Civi

    A Comprehensive Survey of Data Mining-based Fraud Detection Research

    Full text link
    This survey paper categorises, compares, and summarises from almost all published technical and review articles in automated fraud detection within the last 10 years. It defines the professional fraudster, formalises the main types and subtypes of known fraud, and presents the nature of data evidence collected within affected industries. Within the business context of mining the data to achieve higher cost savings, this research presents methods and techniques together with their problems. Compared to all related reviews on fraud detection, this survey covers much more technical articles and is the only one, to the best of our knowledge, which proposes alternative data and solutions from related domains.Comment: 14 page
    corecore