    On the role of pre and post-processing in environmental data mining

    The quality of discovered knowledge is highly dependent on data quality. Unfortunately, real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of getting low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions taken on the basis of KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details on how this can be achieved and discusses the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems.

    Data mining as a tool for environmental scientists

    Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and, more recently, Bayesian Decision Networks have found application in environmental modelling, while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete or heterogeneous.
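
    Association rule extraction, one of the methods named above as not yet widely taken up, rests on two simple measures: support and confidence. The following is a minimal brute-force sketch (not the Apriori algorithm, and not a method from the paper) that enumerates single-item rules; the discretized items such as "high_temp" and "bloom" and the thresholds are hypothetical, chosen only for illustration.

```python
from itertools import combinations

# Hypothetical, discretized observations: each set is one sampling event.
TRANSACTIONS = [
    {"high_temp", "low_flow", "bloom"},
    {"high_temp", "bloom"},
    {"high_temp", "low_flow", "bloom"},
    {"low_flow"},
    {"high_temp", "low_flow"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) = support(A ∪ C) / support(A)."""
    s_a = support(antecedent, transactions)
    if s_a == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / s_a

def single_item_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Enumerate rules A -> C between single items (brute force, toy scale only)."""
    items = sorted(set().union(*transactions))
    found = []
    for a, c in combinations(items, 2):
        for ante, cons in ((a, c), (c, a)):
            s = support({ante, cons}, transactions)
            conf = confidence({ante}, {cons}, transactions)
            if s >= min_support and conf >= min_confidence:
                found.append((ante, cons, s, conf))
    return found
```

    On the toy data, `single_item_rules(TRANSACTIONS)` keeps four rules, including bloom -> high_temp with confidence 1.0. Real datasets would need a pruning algorithm such as Apriori rather than this exhaustive loop.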

    Development of an intelligent dynamic modelling system for the diagnosis of wastewater treatment processes

    In the 21st century, water is already a limited and valuable resource, and fresh water sources in particular are in limited supply. The projected increase in global population from 6 billion people in 2010 to 9 billion in 2050 will only increase the need to identify and use additional water sources. This situation is common in many countries and is frequently exacerbated by drought conditions. Water management planning requires both the efficient use of water sources and, increasingly, the re-use of domestic and industrial wastewaters. A large body of published research spanning several decades is available, and this research study looks specifically at ways of improving the operation of wastewater treatment processes. Process fault diagnosis is a major challenge for the chemical and process industries, and is also important for wastewater treatment processes. Significant economic and environmental losses can be attributed to inappropriate Abnormal Event Management (AEM) in a chemical/processing operation, and this has been the focus of many researchers. Many researchers are now applying several fault diagnosis techniques simultaneously in order to overcome the limitations of the individual techniques. This approach requires resolving the conflicts between the individual methods, and incurs additional costs and resources when more than one technique is employed. The research study presented in this thesis details a new way of using the available techniques: different techniques are given different roles within the diagnostic approach, according to their inherent individual strengths. 
The techniques that are best at detecting a fault should be employed for fault detection, and those best suited to diagnosis should be used in the diagnosis section of a diagnostic system. Two different techniques are used here: a mathematical model for detection and data mining for diagnosis. The mathematical model, based on the principle of analytical redundancy, establishes the presence of a fault in a process (fault detection), and data mining produces production rules derived from historical data for the diagnosis. A dataset from an industrial wastewater treatment facility is used in this study. A diagnostic algorithm employing these techniques has been developed, and a Java application was built that allows the algorithm to be applied, eventually producing an intelligent modelling agent. Thus, the focus of this research work was to develop an intelligent dynamic modelling system (using components such as a mathematical model, data mining, a diagnostic algorithm and the dataset) for simulating, and diagnosing faults in, a wastewater treatment process, in which different techniques are assigned different roles in the diagnostic system. Results presented in Chapter 5 (section 5.5) show that this combined technique yields better results for the detection and diagnosis of faults in a process. Furthermore, the dynamic update of the set value for any process variable (presented in Chapter 5, section 5.2.1) allows the algorithm to detect any process disturbance, thereby mitigating the issue of false alarms. A key achievement of this work is the successful embedding of both a detection and a diagnostic technique in a single algorithm, which reduces the time taken to detect and diagnose a fault. 
In addition, the implementation of the algorithm in the purpose-built software platform demonstrated its practical applicability and its potential for use in the chemical and processing industries.
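
    The division of labour described above (a model-based residual for detection, mined production rules for diagnosis) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's Java implementation: the linear process model `model_outflow`, the residual threshold and the two rules in `RULES` are hypothetical stand-ins for the analytical-redundancy model and the rules mined from historical data, and the dynamic set-value update is not modelled.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """Hypothetical production rule: IF every variable lies in its range THEN fault."""
    conditions: dict  # variable name -> (low, high)
    fault: str

def model_outflow(inflow, gain=0.8):
    """Hypothetical process model: expected effluent flow = gain * inflow."""
    return gain * inflow

def fault_detected(measured, predicted, threshold):
    """Analytical redundancy: a residual larger than the threshold signals a fault."""
    return abs(measured - predicted) > threshold

def diagnose(sample, rules):
    """Return the fault label of the first rule whose conditions all match."""
    for rule in rules:
        if all(lo <= sample[var] <= hi for var, (lo, hi) in rule.conditions.items()):
            return rule.fault
    return "unknown fault"

def monitor(sample, rules, threshold=5.0):
    """Detection first (model residual); diagnosis runs only when a fault is detected."""
    predicted = model_outflow(sample["inflow"])
    if not fault_detected(sample["outflow"], predicted, threshold):
        return "normal"
    return diagnose(sample, rules)

# Hypothetical rules of the kind data mining might extract from historical data.
RULES = [
    Rule({"dissolved_oxygen": (0.0, 1.0)}, "aeration failure"),
    Rule({"ph": (0.0, 6.0)}, "acidic shock load"),
]
```

    Keeping detection and diagnosis as separate functions mirrors the idea of assigning each technique the role it is best at: the cheap residual check runs on every sample, and the rule base is consulted only when the residual crosses the threshold.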

    aTLP: a color-based model of uncertainty to evaluate the risk of decisions based on prototypes

    Clustering techniques find homogeneous and distinguishable prototypes. Careful interpretation of these prototypes is crucial to help experts organize their know-how better and genuinely improve their decision-making processes. The Traffic Lights Panel (TLP) was introduced in 2009 as a post-processing tool to support the understanding of clustering prototypes. In this work, the annotated Traffic Lights Panel (aTLP) is presented as an enrichment of the TLP that manages the intrinsic uncertainty related to the prototypes themselves. The aTLP handles uncertainty by quantifying the prototypes' purity based on the variation coefficients (VC) and an associated color-based uncertainty model with two dimensions, tone and saturation, representing the nominal trend and the purity of the prototype, respectively. An application to a wastewater treatment plant in Slovenia, using both a discrete and a continuous approach, suggests that aTLP is a useful and user-friendly tool that can reduce the gap between data mining and effective decision support, towards informed decisions. Peer Reviewed. Postprint (author's final draft).
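
    One way to picture the two aTLP dimensions is sketched below. This is an assumption-laden illustration rather than the published aTLP: the tone comes from the prototype's mean relative to two hypothetical cut points (assuming higher values are worse), and the saturation is a hypothetical linear mapping of the coefficient of variation, CV = sd / |mean|, so that a pure prototype (CV = 0) is fully saturated and a CV at or above `cv_cap` is fully washed out.

```python
import statistics

def coefficient_of_variation(values):
    """CV = population standard deviation / |mean|; infinite when the mean is zero."""
    mean = statistics.mean(values)
    if mean == 0:
        return float("inf")
    return statistics.pstdev(values) / abs(mean)

def tone(center, low_cut, high_cut):
    """Traffic-light tone from the prototype's central value (assumes higher = worse)."""
    if center < low_cut:
        return "green"
    if center < high_cut:
        return "yellow"
    return "red"

def saturation(cv, cv_cap=1.0):
    """Hypothetical mapping: CV 0 (pure prototype) -> 1.0, CV >= cv_cap -> 0.0."""
    return 1.0 - min(cv, cv_cap) / cv_cap

def annotate_cell(cluster_values, low_cut, high_cut, cv_cap=1.0):
    """Return (tone, saturation) for one variable within one cluster."""
    center = statistics.mean(cluster_values)
    cv = coefficient_of_variation(cluster_values)
    return tone(center, low_cut, high_cut), saturation(cv, cv_cap)
```

    The cut points could, for instance, be the tertiles of the whole dataset returned by `statistics.quantiles(all_values, n=3)`; that choice, like `cv_cap`, is an assumption for this sketch.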

    Modelling activated sludge wastewater treatment plants using artificial intelligence techniques (fuzzy logic and neural networks)

    The activated sludge process (ASP) is the most commonly used biological wastewater treatment system. Mathematical modelling of this process is important for improving its treatment efficiency and thus the quality of the effluent released into the receiving water body, because models can help operators predict the performance of the plant and take cost-effective, timely remedial actions that ensure consistent treatment efficiency and compliance with discharge consents. However, due to the highly complex and non-linear characteristics of this biological system, traditional mathematical modelling of this treatment process has remained a challenge. This thesis presents applications of Artificial Intelligence (AI) techniques for modelling the ASP. These include the Kohonen Self-Organising Map (KSOM), backpropagation artificial neural networks (BPANN) and the adaptive network-based fuzzy inference system (ANFIS). These techniques were compared, and hybrids between them were also investigated and tested. The study demonstrated that AI techniques offer a viable, flexible and effective alternative modelling methodology for the activated sludge system. The KSOM was found to be an attractive tool for data preparation because it easily accommodates missing data and outliers and because of its power in extracting salient features from raw data. As a consequence of the latter, the KSOM is an excellent tool for visualising high-dimensional data. In addition, the KSOM was used to develop a software sensor to predict biological oxygen demand (BOD). This soft sensor represents a significant advance in real-time BOD operational control, offering a very fast estimate of this important wastewater parameter compared with the traditional 5-day bioassay BOD test procedure. 
Furthermore, hybrids of KSOM-ANN and KSOM-ANFIS were shown to yield much better model performance than the respective modelling paradigms on their own. Damascus University
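
    The soft-sensor idea can be illustrated with a minimal one-dimensional self-organising map in NumPy. This is a sketch under assumptions, not the thesis's model: best-matching units are found using only the observed (non-NaN) components, which is one common way a SOM tolerates missing data, and a missing target such as BOD is then read from the winning prototype. The grid size and the learning-rate and neighbourhood schedules are hypothetical choices.

```python
import numpy as np

def bmu(weights, x):
    """Index of the best-matching unit, comparing only the observed (non-NaN) components."""
    mask = ~np.isnan(x)
    dist = np.sum((weights[:, mask] - x[mask]) ** 2, axis=1)
    return int(np.argmin(dist))

def train_som(data, n_units=10, epochs=30, lr0=0.5, sigma0=None, seed=0):
    """Train a 1-D SOM on the rows of `data`, which may contain NaN for missing values."""
    rng = np.random.default_rng(seed)
    # Initialise prototypes around the column means, ignoring missing values.
    mean = np.nanmean(data, axis=0)
    std = np.nanstd(data, axis=0) + 1e-9
    weights = rng.normal(mean, std, size=(n_units, data.shape[1]))
    sigma0 = sigma0 if sigma0 is not None else n_units / 2
    grid = np.arange(n_units)
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighbourhood radius
            winner = bmu(weights, x)
            h = np.exp(-((grid - winner) ** 2) / (2 * sigma ** 2))
            mask = ~np.isnan(x)
            # Pull prototypes towards x, updating only the observed components.
            weights[:, mask] += lr * h[:, None] * (x[mask] - weights[:, mask])
            step += 1
    return weights

def soft_sensor(weights, x, target_index):
    """Estimate a missing variable (e.g. BOD) from the best-matching prototype."""
    return float(weights[bmu(weights, x), target_index])
```

    With columns hypothetically ordered as (flow, COD, BOD), a call such as `soft_sensor(weights, np.array([flow, cod, np.nan]), target_index=2)` would return the BOD value stored in the matching prototype.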

    Machine learning techniques implementation in power optimization, data processing, and bio-medical applications

    The rapid progress of machine-learning algorithms has become a key factor in determining the future of humanity. These algorithms and techniques have been used to solve a wide spectrum of problems, ranging from data mining and knowledge discovery to unsupervised learning and optimization. This dissertation covers two study areas. The first area investigates the use of reinforcement learning and adaptive critic design algorithms in the field of power grid control. The second area, consisting of three papers, focuses on developing and applying clustering algorithms to biomedical data. The first paper presents a novel modelling approach for demand-side management of electric water heaters using Q-learning and action-dependent heuristic dynamic programming. The implemented approaches provide an efficient load management mechanism that reduces the overall power cost and smooths the grid load profile. The second paper implements an ensemble statistical and subspace-clustering model for analyzing the heterogeneous data of autism spectrum disorder, using a novel k-dimensional algorithm that handles heterogeneous datasets efficiently. The third paper provides a unified learning model for clustering neuroimaging data to identify potential risk factors for suboptimal brain aging. In the last paper, clustering and clustering validation indices are used to identify the groups of compounds responsible for plant uptake and contaminant transport from roots to the plants' edible parts.
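
    The Q-learning part of the water-heater study can be illustrated with the standard tabular update Q(s, a) += alpha * (r + gamma * max Q(s', .) - Q(s, a)). Everything in the environment below is hypothetical: a four-step tariff cycle (two cheap, two peak periods), a tank with five energy levels, one unit of hot-water demand per step and a fixed penalty for unmet demand. The sketch uses plain epsilon-greedy Q-learning and does not reproduce the action-dependent heuristic dynamic programming approach.

```python
import random

PRICES = [1.0, 1.0, 5.0, 5.0]  # hypothetical tariff per step: cheap, cheap, peak, peak
MAX_LEVEL = 4                  # hypothetical tank energy levels 0..4
PENALTY = 10.0                 # hypothetical cost of one unit of unmet demand
ACTIONS = (0, 1)               # 0 = idle, 1 = heat (adds 2 energy units)

def step(state, action):
    """Advance the toy environment one period; return (next_state, reward)."""
    t, level = state
    cost = 0.0
    if action == 1:
        level = min(level + 2, MAX_LEVEL)
        cost += PRICES[t]
    if level == 0:
        cost += PENALTY  # one unit of demand cannot be served
    else:
        level -= 1       # one unit of demand is served from the tank
    return ((t + 1) % len(PRICES), level), -cost

def greedy_action(q, state):
    """Action with the highest Q-value in this state (ties go to idle)."""
    return max(ACTIONS, key=lambda a: q[(*state, a)])

def q_learning(episodes=2000, steps=20, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration and random start states."""
    rng = random.Random(seed)
    q = {(t, lvl, a): 0.0
         for t in range(len(PRICES)) for lvl in range(MAX_LEVEL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = (rng.randrange(len(PRICES)), rng.randrange(MAX_LEVEL + 1))
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, state)
            nxt, reward = step(state, action)
            best_next = max(q[(*nxt, a)] for a in ACTIONS)
            key = (*state, action)
            q[key] += alpha * (reward + gamma * best_next - q[key])
            state = nxt
    return q
```

    After training, the greedy policy should heat when the tank is empty and leave a well-filled tank idle during peak periods, which is the load-shifting behaviour the abstract describes, here on a deliberately tiny problem.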

    Pertanika Journal of Science & Technology

    Review of Anaerobic Digestion Modeling and Optimization Using Nature-Inspired Techniques

    Although anaerobic digestion is a well-researched topic, the complexity of the process, the time it takes to stabilize and the associated economic factors call for offline simulation of the process with computer models. Nature-inspired techniques are a recently developed branch of artificial intelligence in which knowledge is transferred from natural systems to engineered systems. For soft computing applications, nature-inspired techniques have several advantages, including scope for parallel computing, dynamic behavior and self-organization. This paper presents a comprehensive review of such techniques and their application in anaerobic digestion modeling. We compiled and synthesized the literature on the application of nature-inspired techniques to anaerobic digestion. These techniques provide a balance between diversity and speed of convergence to the optimal solution, which has stimulated their use in anaerobic digestion modeling.
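
    As one concrete instance of a nature-inspired technique, the sketch below uses particle swarm optimization (PSO) to fit the two parameters of a first-order kinetic model of cumulative biogas yield, B(t) = B0 * (1 - exp(-k * t)). The choice of model, the PSO coefficients and the parameter bounds are assumptions for illustration; the review itself covers a broader family of techniques and models.

```python
import math
import random

def first_order_yield(t, b0, k):
    """Cumulative biogas yield of a first-order kinetic model: B(t) = B0 * (1 - exp(-k t))."""
    return b0 * (1.0 - math.exp(-k * t))

def sse(params, times, observed):
    """Sum of squared errors between the model and observed cumulative yields."""
    b0, k = params
    return sum((first_order_yield(t, b0, k) - y) ** 2 for t, y in zip(times, observed))

def pso(objective, bounds, n_particles=30, iterations=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO: inertia w, cognitive c1 and social c2 coefficients."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

if __name__ == "__main__":
    # Synthetic, noise-free data generated from B0 = 300, k = 0.15 (illustrative only).
    times = list(range(0, 31, 2))
    observed = [first_order_yield(t, 300.0, 0.15) for t in times]
    best, err = pso(lambda p: sse(p, times, observed), bounds=[(0.0, 600.0), (0.001, 1.0)])
    print(best, err)
```

    The swarm needs no gradient of the model, which is why such methods are attractive for the non-linear, multi-parameter kinetics typical of anaerobic digestion models.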

    A survey on pre-processing techniques: relevant issues in the context of environmental data mining

    One of the important issues common to all types of data analysis, whether statistical data analysis, machine learning, data mining, data science or any other form of data-driven modeling, is data quality. The more complex the reality to be analyzed, the higher the risk of getting low-quality data. Unfortunately, real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. Models built on incorrect or incomplete data will be useless, and as a consequence the quality of decisions made on the basis of these models also depends on data quality. This is why pre-processing is one of the most critical steps of data analysis in any of its forms. However, pre-processing has not yet been properly systematized, and little research focuses on it. This paper presents a survey of the most popular pre-processing steps required in environmental data analysis, together with a proposal to systematize them. Rather than providing technical details on specific pre-processing techniques, the paper focuses on providing general ideas to non-expert users, who, after reading them, can decide which technique is most suitable for their problem. Peer Reviewed. Postprint (author's final draft).
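
    A minimal pre-processing chain covering three of the kinds of steps discussed above (missing-value imputation, outlier detection and normalization) might look like the following. The specific choices, median imputation, Tukey's IQR fences with k = 1.5, treating flagged outliers as missing and min-max scaling, are assumptions for illustration, not the procedure the paper proposes.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lo or v > hi]

def min_max_scale(values):
    """Rescale to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def preprocess(values):
    """Impute, flag outliers, re-impute them as missing, then scale."""
    filled = impute_median(values)
    outliers = set(iqr_outliers(filled))
    # Hypothetical policy: treat flagged outliers as missing and re-impute.
    cleaned = impute_median([None if i in outliers else v for i, v in enumerate(filled)])
    return min_max_scale(cleaned), sorted(outliers)
```

    For a short, hypothetical pH series such as `[7.1, 7.3, None, 7.2, 25.0, 7.0, 7.4]`, the gap is filled with the median and the reading of 25.0 is flagged as an outlier before scaling. Whether an outlier is an error or a genuine extreme event is a domain decision that no automatic rule can make on its own.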