
    A systematic review of data quality issues in knowledge discovery tasks

    Data volumes are growing rapidly because organizations continuously capture data to support better decision making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.

    Uncertainty Management of Intelligent Feature Selection in Wireless Sensor Networks

    Wireless sensor networks (WSNs) are envisioned to revolutionize the paradigm of monitoring complex real-world systems at very high resolution. However, the deployment of large numbers of unattended sensor nodes in hostile environments, frequent changes in environment dynamics, and severe resource constraints pose uncertainties and limit the potential use of WSNs in complex real-world applications. Although uncertainty management in Artificial Intelligence (AI) is well developed and well investigated, its implications in wireless sensor environments are inadequately addressed. This dissertation addresses uncertainty management issues for spatio-temporal patterns generated from sensor data. It provides a framework for characterizing spatio-temporal patterns in WSNs. Using rough set theory and temporal reasoning, a novel formalism has been developed to characterize and quantify the uncertainties in predicting spatio-temporal patterns from sensor data. This research also uncovers the trade-offs among the uncertainty measures, which can be used to develop a multi-objective optimization model for real-time decision making in sensor data aggregation and sampling.
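The rough-set machinery mentioned above can be sketched in a few lines. This is an illustrative toy, not the dissertation's formalism: objects that are indiscernible by their measured attributes share an equivalence class, and the gap between the upper and lower approximations (the boundary region) quantifies prediction uncertainty.

```python
# Illustrative sketch: rough set lower and upper approximations of a target
# concept. The universe, key function, and target below are invented toy data.

def approximations(universe, key, target):
    """key maps each object to its equivalence class (objects indiscernible
    by the measured attributes share a key); target is the concept set."""
    classes = {}
    for obj in universe:
        classes.setdefault(key(obj), set()).add(obj)
    lower, upper = set(), set()
    for cls in classes.values():
        if cls <= target:       # class certainly inside the concept
            lower |= cls
        if cls & target:        # class possibly inside the concept
            upper |= cls
    return lower, upper

lower, upper = approximations([1, 2, 3, 4, 5, 6], lambda x: x // 2, {2, 3, 4})
# boundary region upper - lower holds the objects whose membership is uncertain
```

A larger boundary region means more indiscernibility-induced uncertainty, which is the kind of quantity a multi-objective optimizer could trade off against sampling cost.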

    Fuzzy Modeling of Geospatial Patterns


    Machine learning techniques implementation in power optimization, data processing, and bio-medical applications

    The rapid progress of machine-learning algorithms has become a key factor in determining the future of humanity. These algorithms and techniques have been utilized to solve a wide spectrum of problems, extending from data mining and knowledge discovery to unsupervised learning and optimization. This dissertation consists of two study areas. The first area investigates the use of reinforcement learning and adaptive critic design algorithms in the field of power grid control. The second area, consisting of three papers, focuses on developing and applying clustering algorithms to biomedical data. The first paper presents a novel modelling approach for demand-side management of electric water heaters using Q-learning and action-dependent heuristic dynamic programming. The implemented approaches provide an efficient load management mechanism that reduces overall power cost and smooths the grid load profile. The second paper implements an ensemble statistical and subspace-clustering model for analyzing heterogeneous autism spectrum disorder data. The paper implements a novel k-dimensional algorithm that handles heterogeneous datasets efficiently. The third paper provides a unified learning model for clustering neuroimaging data to identify potential risk factors for suboptimal brain aging. The last paper utilizes clustering and clustering validation indices to identify the groups of compounds responsible for plant uptake and contaminant transport from roots to the edible parts of plants.
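The Q-learning idea behind the water-heater paper can be sketched with a deliberately tiny toy model. Everything here is invented for illustration (a two-hour "day", made-up prices, a cold-water comfort penalty); the dissertation's actual environment is far richer. The point is only the tabular Q-update: the agent learns to heat during off-peak hours.

```python
import random

# Hypothetical toy sketch of tabular Q-learning for water-heater demand
# management. States are (hour, tank-temperature-level); action 1 heats, 0 idles.

PEAK_HOURS = {0}  # hour 0 is peak-priced, hour 1 is off-peak (two-hour "day")

def step(hour, temp, action):
    """One simulated hour: heating raises the temperature level and costs
    the current price; an empty (cold) tank incurs a comfort penalty."""
    price = 3.0 if hour in PEAK_HOURS else 1.0
    temp = min(temp + 1, 2) if action else max(temp - 1, 0)
    cost = price * action + (5.0 if temp == 0 else 0.0)
    return (hour + 1) % 2, temp, -cost          # reward = -cost

def train(episodes=2000, alpha=0.2, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        hour, temp = 0, 1
        for _ in range(24):
            s = (hour, temp)
            if rng.random() < eps:               # epsilon-greedy exploration
                a = rng.choice((0, 1))
            else:
                a = max((0, 1), key=lambda x: Q.get((s, x), 0.0))
            hour, temp, r = step(hour, temp, a)
            nxt = max(Q.get(((hour, temp), b), 0.0) for b in (0, 1))
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + alpha * (r + gamma * nxt - old)   # Q-update
    return Q
```

After training, the learned values prefer heating when the price is low and the tank is cold, which is exactly the load-shifting behaviour the abstract describes.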

    An Experimental Study on Microarray Expression Data from Plants under Salt Stress by using Clustering Methods

    Current genome-wide advancements in gene chip technology provide, in "omics" (genomics, proteomics, and transcriptomics) research, an opportunity to analyze the expression levels of thousands of genes across multiple experiments. In this regard, many machine learning approaches have been proposed to deal with this deluge of information. Clustering methods are one such approach. Their process consists of grouping data (gene profiles) into homogeneous clusters using distance measurements. Various clustering techniques are applied, but there is no consensus on the best one. In this context, seven clustering algorithms were compared and tested against the gene expression datasets of three model plants under salt stress. These techniques were evaluated by internal and relative validity measures. The AGNES algorithm performs best on the internal validity measures for the three plant datasets, while K-Means shows the best trend on the relative validity measures for these datasets.
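An internal validity measure of the kind used in this comparison can be illustrated with a 1-D silhouette index. This is a hedged sketch on toy values, not the microarray expression data or the paper's exact measures: a good clustering scores near 1, a bad one near or below 0.

```python
# Sketch of an internal validity measure: a 1-D silhouette index.
# The points and labels used below are toy values for illustration only.

def silhouette(points, labels):
    """Mean silhouette score; higher means tighter, better-separated clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [abs(p - q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        a = sum(same) / len(same) if same else 0.0    # mean intra-cluster distance
        other = {}
        for j, q in enumerate(points):
            if labels[j] != labels[i]:
                other.setdefault(labels[j], []).append(abs(p - q))
        b = min(sum(v) / len(v) for v in other.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)

good = silhouette([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1])  # compact, separated
bad = silhouette([0.0, 1.0, 10.0, 11.0], [0, 1, 0, 1])   # clusters interleaved
```

Ranking several algorithms by such an index on the same dataset is what lets one say, as above, that AGNES "performs best" on internal validity.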

    Qualitative Answering Surveys and Soft Computing

    In this work, we reflect on some questions about the measurement problem in economics and, especially, its relationship with the scientific method. Statistical sources frequently used by economists contain qualitative information obtained from individuals' verbal expressions by means of surveys, and we discuss the reasons why such information would be more adequately analyzed with soft methods than with traditional ones. Some comments on the techniques most commonly applied to these types of data with verbal answers are followed by our proposal to compute with words. In our view, an alternative use of the well-known Income Evaluation Question seems especially suggestive for a computing-with-words approach, since it would facilitate an empirical estimation of the corresponding linguistic variable adjectives. A new treatment of the information contained in such surveys would avoid some questions incorporated in the so-called Leyden approach that do not fit the actual world.

    Keywords: computing with words, Leyden approach, qualitative answering surveys, fuzzy logic
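The computing-with-words idea can be made concrete with a minimal sketch: verbal survey answers are mapped to linguistic labels modelled as triangular fuzzy numbers on a 0-1 scale and then aggregated into a crisp score. The label set and breakpoints below are assumptions for illustration, not the adjectives estimated in the paper.

```python
# Illustrative computing-with-words sketch. The labels and their triangular
# membership breakpoints (left, peak, right) are assumed, not from the paper.

LABELS = {
    "very bad":   (0.0, 0.0, 0.25),
    "bad":        (0.0, 0.25, 0.5),
    "sufficient": (0.25, 0.5, 0.75),
    "good":       (0.5, 0.75, 1.0),
    "very good":  (0.75, 1.0, 1.0),
}

def centroid(tri):
    """Defuzzify a triangular fuzzy number by its centroid."""
    left, peak, right = tri
    return (left + peak + right) / 3.0

def aggregate(answers):
    """Crisp summary score of a list of verbal survey answers."""
    return sum(centroid(LABELS[a]) for a in answers) / len(answers)

score = aggregate(["bad", "good"])   # two opposing answers average out
```

Estimating the breakpoints empirically from Income Evaluation Question responses, rather than assuming them as here, is precisely what the proposed approach would enable.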

    Diagnosis of Parkinson’s Disease using Fuzzy C-Means Clustering and Pattern Recognition

    Parkinson’s disease (PD) is a global public health problem of enormous dimension. In this study, we aimed to discriminate between healthy people and people with Parkinson’s disease (PD). Various studies have revealed that voice is one of the earliest indicators of PD, and for that reason a Parkinson’s dataset containing biomedical human voice measurements is used. The main goal of this paper is to automatically detect whether a person’s speech is affected by PD. We examined the performance of fuzzy c-means (FCM) clustering and pattern recognition methods on the Parkinson’s disease dataset. The first method aims to distinguish between two classes: normal speakers and speakers with PD. This method can be greatly improved by classifying the data first and then testing new data against the resulting two patterns; thus, the second method used here is pattern recognition. The experimental results demonstrate that the combination of fuzzy c-means and pattern recognition obtains promising results for the classification of PD.
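A minimal fuzzy c-means update loop shows what the first method computes: soft memberships of each sample in each cluster, alternated with membership-weighted center updates. This 1-D sketch with fuzzifier m = 2 uses toy values standing in for the voice features; the deterministic min/max initialization is a simplification for c = 2.

```python
# Minimal 1-D fuzzy c-means sketch (fuzzifier m = 2). Toy data only;
# these are not the Parkinson's voice measurements.

def fcm(points, c=2, m=2.0, iters=50):
    centers = [min(points), max(points)]  # simple deterministic init for c = 2
    u = []
    for _ in range(iters):
        # membership update: u[k][i] proportional to 1 / d(x_k, v_i)^(2/(m-1))
        u = []
        for x in points:
            d = [abs(x - v) or 1e-9 for v in centers]   # guard zero distance
            u.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0))
                                for j in range(c))
                      for i in range(c)])
        # center update: membership-weighted mean with weights u^m
        centers = [sum(u[k][i] ** m * x for k, x in enumerate(points)) /
                   sum(u[k][i] ** m for k in range(len(points)))
                   for i in range(c)]
    return centers, u

centers, u = fcm([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
```

The soft membership rows (each summing to 1) are what distinguish FCM from hard k-means, and they give the second-stage pattern recognition a graded, rather than binary, cluster assignment to work from.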

    A Simulation Model for Strategic Planning In Asset Management of Electricity Distribution Network

    Asset management of an electricity distribution network is required to improve network reliability and thereby reduce electricity distribution losses. Because strategic asset management requires long-term predictions, it calls for a simulation model. Simulation of asset management is an approach to predicting the consequences of long-term financing of maintenance and renewal strategies in electrical energy distribution networks. In this research, the simulation method used is System Dynamics, chosen because it can account for both internal and external influencing factors. To obtain the model parameters, we used PLN Pamekasan as the case study. The results show that the condition of low-voltage network assets declines by about 6% per year on average, transformer condition by about 6.6% per year, and the condition of medium-voltage network assets by about 4.4% per year. Overall, technical losses average 1,359,981.60 kWh/month, or about 16,319,779.24 kWh/year.
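The reported decline rates correspond to a simple stock-and-flow structure in the System Dynamics spirit: asset condition is a stock with an outflow proportional to its current level. This back-of-envelope sketch uses the 6.6%/year transformer figure from the abstract; the 100-point condition scale is an assumption for illustration.

```python
# Back-of-envelope stock-and-flow sketch of asset condition decline.
# The 6.6%/year rate is from the abstract; the 100-point scale is assumed.

def simulate(initial=100.0, yearly_decline=0.066, years=10):
    """Condition path of an asset pool losing `yearly_decline` of its
    remaining condition each year (outflow proportional to the stock)."""
    condition, path = initial, [initial]
    for _ in range(years):
        condition -= condition * yearly_decline
        path.append(condition)
    return path

path = simulate()   # transformer condition trajectory over ten years
```

Even this crude exponential form makes the planning point: at a constant fractional decline, the absolute yearly loss shrinks but the stock never recovers without a renewal inflow, which is what the financing strategies in the model supply.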