13,282 research outputs found

    A systematic review of data quality issues in knowledge discovery tasks

    Get PDF
    Hay un gran crecimiento en el volumen de datos porque las organizaciones capturan permanentemente la cantidad colectiva de datos para lograr un mejor proceso de toma de decisiones. El desafĂ­o mas fundamental es la exploraciĂłn de los grandes volĂșmenes de datos y la extracciĂłn de conocimiento Ăștil para futuras acciones por medio de tareas para el descubrimiento del conocimiento; sin embargo, muchos datos presentan mala calidad. Presentamos una revisiĂłn sistemĂĄtica de los asuntos de calidad de datos en las ĂĄreas del descubrimiento de conocimiento y un estudio de caso aplicado a la enfermedad agrĂ­cola conocida como la roya del cafĂ©.Large volume of data is growing because the organizations are continuously capturing the collective amount of data for better decision-making process. The most fundamental challenge is to explore the large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks, nevertheless many data has poor quality. We presented a systematic review of the data quality issues in knowledge discovery tasks and a case study applied to agricultural disease named coffee rust

    Data-driven Soft Sensors in the Process Industry

    Get PDF
    In the last two decades Soft Sensors established themselves as a valuable alternative to the traditional means for the acquisition of critical process variables, process monitoring and other tasks which are related to process control. This paper discusses characteristics of the process industry data which are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, like the chemical industry, bioprocess industry, steel industry, etc. The focus of this work is put on the data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness and huge, though yet not completely realised, potential. A comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques as well as a discussion of some open issues in the Soft Sensor development and maintenance and their possible solutions are the main contributions of this work

    Nature-Inspired Adaptive Architecture for Soft Sensor Modelling

    Get PDF
    This paper gives a general overview of the challenges present in the research field of Soft Sensor building and proposes a novel architecture for building of Soft Sensors, which copes with the identified challenges. The architecture is inspired and making use of nature-related techniques for computational intelligence. Another aspect, which is addressed by the proposed architecture, are the identified characteristics of the process industry data. The data recorded in the process industry consist usually of certain amount of missing values or sample exceeding meaningful values of the measurements, called data outliers. Other process industry data properties causing problems for the modelling are the collinearity of the data, drifting data and the different sampling rates of the particular hardware sensors. It is these characteristics which are the source of the need for an adaptive behaviour of Soft Sensors. The architecture reflects this need and provides mechanisms for the adaptation and evolution of the Soft Sensor at different levels. The adaptation capabilities are provided by maintaining a variety of rather simple models. These particular models, called paths in terms of the architecture, can for example focus on different partition of the input data space, or provide different adaptation speeds to changes in the data. The actual modelling techniques involved into the architecture are data-driven computational learning approaches like artificial neural networks, principal component regression, etc

    Application of Computational Intelligence Techniques to Process Industry Problems

    Get PDF
    In the last two decades there has been a large progress in the computational intelligence research field. The fruits of the effort spent on the research in the discussed field are powerful techniques for pattern recognition, data mining, data modelling, etc. These techniques achieve high performance on traditional data sets like the UCI machine learning database. Unfortunately, this kind of data sources usually represent clean data without any problems like data outliers, missing values, feature co-linearity, etc. common to real-life industrial data. The presence of faulty data samples can have very harmful effects on the models, for example if presented during the training of the models, it can either cause sub-optimal performance of the trained model or in the worst case destroy the so far learnt knowledge of the model. For these reasons the application of present modelling techniques to industrial problems has developed into a research field on its own. Based on the discussion of the properties and issues of the data and the state-of-the-art modelling techniques in the process industry, in this paper a novel unified approach to the development of predictive models in the process industry is presented

    One-Class Classification: Taxonomy of Study and Review of Techniques

    Full text link
    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure

    Designing algorithms to aid discovery by chemical robots

    Get PDF
    Recently, automated robotic systems have become very efficient, thanks to improved coupling between sensor systems and algorithms, of which the latter have been gaining significance thanks to the increase in computing power over the past few decades. However, intelligent automated chemistry platforms for discovery orientated tasks need to be able to cope with the unknown, which is a profoundly hard problem. In this Outlook, we describe how recent advances in the design and application of algorithms, coupled with the increased amount of chemical data available, and automation and control systems may allow more productive chemical research and the development of chemical robots able to target discovery. This is shown through examples of workflow and data processing with automation and control, and through the use of both well-used and cutting-edge algorithms illustrated using recent studies in chemistry. Finally, several algorithms are presented in relation to chemical robots and chemical intelligence for knowledge discovery

    On the role of pre and post-processing in environmental data mining

    Get PDF
    The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed
    • 

    corecore