Search CORE

13,282 research outputs found

A systematic review of data quality issues in knowledge discovery tasks

Author: Corrales David Camilo
Corrales Juan Carlos
Ledezma Agapito Ismael
Publication venue: 'Universidad de Medellin'
Publication date: 07/11/2015
Field of study

Hay un gran crecimiento en el volumen de datos porque las organizaciones capturan permanentemente la cantidad colectiva de datos para lograr un mejor proceso de toma de decisiones. El desafío mas fundamental es la exploración de los grandes volúmenes de datos y la extracción de conocimiento útil para futuras acciones por medio de tareas para el descubrimiento del conocimiento; sin embargo, muchos datos presentan mala calidad. Presentamos una revisión sistemática de los asuntos de calidad de datos en las áreas del descubrimiento de conocimiento y un estudio de caso aplicado a la enfermedad agrícola conocida como la roya del café.Large volume of data is growing because the organizations are continuously capturing the collective amount of data for better decision-making process. The most fundamental challenge is to explore the large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks, nevertheless many data has poor quality. We presented a systematic review of the data quality issues in knowledge discovery tasks and a case study applied to agricultural disease named coffee rust

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Universidad de Medellín: Revistas Científicas

Repositorio Institucional Universidad de Medellín

DIALNET

Data-driven Soft Sensors in the Process Industry

Author: Abdi
Alhoniemi
Angelov
Angelov
Angelov
Arazo-Bravo
Atkeson
Bastin
Bauer
Bishop
Bogdan Gabrys
Bonne
Breiman
Bro
Casali
Chen
Chen
Chen
Chen
Chen
Choi
Chruy
Davies
Dayal
De Wolf
Desai
Devogelaere
Ding
Dong
Dong
Dote
Doyle
Dunia
Dunia
Dunia
Eriksson
Fellner
Fortuna
Fortuna
Frank
Freund
Funahashi
Gabrielsson
Gabrys
Gabrys
Gabrys
Gabrys
Gama
Geladi
Gomez
Gonzalez
Gonzalez
Goodwin
Gosset
Guyon
Han
Hastie
He
Hodge
Hotelling
Jackson
James
Jang
Jiang
Jolliffe
Jordaan
Jos de Assis
Kadlec
Kadlec
Kalos
Kampjarvi
Kittler
Kohavi
Kohonen
Kordon
Kourti
Kourti
Krogh
Kuncheva
Lee
Lee
Lee
Lee
Li
Li
Lin
Lin
Luo
Macias
Mandic
Marjanovic
Meleiro
Menold
Nauck
Neogi
Nomikos
Nomikos
Nomikos
Opitz
Park
Pearson
Pearson
Petr Kadlec
Poggio
Prasad
Principe
Qin
Qin
Qin
Qin
Radhakrishnan
Rnnar
Rong
Rotem
Ruta
Ruta
Schafer
Scheffer
Serneels
Sibylle Strandt
Stanimirova
Su
Tzanakou
van Sprang
van Sprang
Vapnik
Venkatasubramanian
Venkatasubramanian
Venkatasubramanian
Vilalta
Walczak
Walczak
Walczak
Wang
Wang
Wang
Wang
Warne
Weiss
Widmer
Wold
Wold
Wold
Wolpert
Yan
Yang
Zadeh
Zamprogna
Zamprogna
Zhang
Publication venue: 'Elsevier BV'
Publication date: 01/04/2009
Field of study

In the last two decades Soft Sensors established themselves as a valuable alternative to the traditional means for the acquisition of critical process variables, process monitoring and other tasks which are related to process control. This paper discusses characteristics of the process industry data which are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, like the chemical industry, bioprocess industry, steel industry, etc. The focus of this work is put on the data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness and huge, though yet not completely realised, potential. A comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques as well as a discussion of some open issues in the Soft Sensor development and maintenance and their possible solutions are the main contributions of this work

Crossref

Bournemouth University Research Online

Nature-Inspired Adaptive Architecture for Soft Sensor Modelling

Author: Gabrys Bogdan
Kadlec Petr
Publication venue
Publication date
Field of study

This paper gives a general overview of the challenges present in the research field of Soft Sensor building and proposes a novel architecture for building of Soft Sensors, which copes with the identified challenges. The architecture is inspired and making use of nature-related techniques for computational intelligence. Another aspect, which is addressed by the proposed architecture, are the identified characteristics of the process industry data. The data recorded in the process industry consist usually of certain amount of missing values or sample exceeding meaningful values of the measurements, called data outliers. Other process industry data properties causing problems for the modelling are the collinearity of the data, drifting data and the different sampling rates of the particular hardware sensors. It is these characteristics which are the source of the need for an adaptive behaviour of Soft Sensors. The architecture reflects this need and provides mechanisms for the adaptation and evolution of the Soft Sensor at different levels. The adaptation capabilities are provided by maintaining a variety of rather simple models. These particular models, called paths in terms of the architecture, can for example focus on different partition of the input data space, or provide different adaptation speeds to changes in the data. The actual modelling techniques involved into the architecture are data-driven computational learning approaches like artificial neural networks, principal component regression, etc

Bournemouth University Research Online

Application of Computational Intelligence Techniques to Process Industry Problems

Author: Gabrys Bogdan
Kadlec Petr
Publication venue: EXIT Publishing House
Publication date: 01/01/2008
Field of study

In the last two decades there has been a large progress in the computational intelligence research field. The fruits of the effort spent on the research in the discussed field are powerful techniques for pattern recognition, data mining, data modelling, etc. These techniques achieve high performance on traditional data sets like the UCI machine learning database. Unfortunately, this kind of data sources usually represent clean data without any problems like data outliers, missing values, feature co-linearity, etc. common to real-life industrial data. The presence of faulty data samples can have very harmful effects on the models, for example if presented during the training of the models, it can either cause sub-optimal performance of the trained model or in the worst case destroy the so far learnt knowledge of the model. For these reasons the application of present modelling techniques to industrial problems has developed into a research field on its own. Based on the discussion of the properties and issues of the data and the state-of-the-art modelling techniques in the process industry, in this paper a novel unified approach to the development of predictive models in the process industry is presented

Bournemouth University Research Online

One-Class Classification: Taxonomy of Study and Review of Techniques

Author: Khan Shehroz S.
Madden Michael G.
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 29/11/2013
Field of study

One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure

arXiv.org e-Print Archive

Access to Research at National University of Ireland, Galway

Designing algorithms to aid discovery by chemical robots

Author: Cronin Leroy
Gromski Piotr S.
Henson Alon B.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 25/07/2018
Field of study

Recently, automated robotic systems have become very efficient, thanks to improved coupling between sensor systems and algorithms, of which the latter have been gaining significance thanks to the increase in computing power over the past few decades. However, intelligent automated chemistry platforms for discovery orientated tasks need to be able to cope with the unknown, which is a profoundly hard problem. In this Outlook, we describe how recent advances in the design and application of algorithms, coupled with the increased amount of chemical data available, and automation and control systems may allow more productive chemical research and the development of chemical robots able to target discovery. This is shown through examples of workflow and data processing with automation and control, and through the use of both well-used and cutting-edge algorithms illustrated using recent studies in chemistry. Finally, several algorithms are presented in relation to chemical robots and chemical intelligence for knowledge discovery

Enlighten

On the role of pre and post-processing in environmental data mining

Author: Athanasiadis Ioannis
Comas Joaquim
Gibert Karina
Holmes Geoffrey
Izquierdo Joaquin
Sanchez-Marre Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2008
Field of study

The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed

Research Commons@Waikato