14,461 research outputs found

    On the role of pre and post-processing in environmental data mining

    Get PDF
    The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed

    QCBA: Postoptimization of Quantitative Attributes in Classifiers based on Association Rules

    Full text link
    The need to prediscretize numeric attributes before they can be used in association rule learning is a source of inefficiencies in the resulting classifier. This paper describes several new rule tuning steps aiming to recover information lost in the discretization of numeric (quantitative) attributes, and a new rule pruning strategy, which further reduces the size of the classification models. We demonstrate the effectiveness of the proposed methods on postoptimization of models generated by three state-of-the-art association rule classification algorithms: Classification based on Associations (Liu, 1998), Interpretable Decision Sets (Lakkaraju et al, 2016), and Scalable Bayesian Rule Lists (Yang, 2017). Benchmarks on 22 datasets from the UCI repository show that the postoptimized models are consistently smaller -- typically by about 50% -- and have better classification performance on most datasets

    Feature extraction and classification of movie reviews

    Get PDF

    Review and Comparison of Intelligent Optimization Modelling Techniques for Energy Forecasting and Condition-Based Maintenance in PV Plants

    Get PDF
    Within the field of soft computing, intelligent optimization modelling techniques include various major techniques in artificial intelligence. These techniques pretend to generate new business knowledge transforming sets of "raw data" into business value. One of the principal applications of these techniques is related to the design of predictive analytics for the improvement of advanced CBM (condition-based maintenance) strategies and energy production forecasting. These advanced techniques can be used to transform control system data, operational data and maintenance event data to failure diagnostic and prognostic knowledge and, ultimately, to derive expected energy generation. One of the systems where these techniques can be applied with massive potential impact are the legacy monitoring systems existing in solar PV energy generation plants. These systems produce a great amount of data over time, while at the same time they demand an important e ort in order to increase their performance through the use of more accurate predictive analytics to reduce production losses having a direct impact on ROI. How to choose the most suitable techniques to apply is one of the problems to address. This paper presents a review and a comparative analysis of six intelligent optimization modelling techniques, which have been applied on a PV plant case study, using the energy production forecast as the decision variable. The methodology proposed not only pretends to elicit the most accurate solution but also validates the results, in comparison with the di erent outputs for the di erent techniques

    Handling Concept Drift for Predictions in Business Process Mining

    Get PDF
    Predictive services nowadays play an important role across all business sectors. However, deployed machine learning models are challenged by changing data streams over time which is described as concept drift. Prediction quality of models can be largely influenced by this phenomenon. Therefore, concept drift is usually handled by retraining of the model. However, current research lacks a recommendation which data should be selected for the retraining of the machine learning model. Therefore, we systematically analyze different data selection strategies in this work. Subsequently, we instantiate our findings on a use case in process mining which is strongly affected by concept drift. We can show that we can improve accuracy from 0.5400 to 0.7010 with concept drift handling. Furthermore, we depict the effects of the different data selection strategies

    KACST Arabic Text Classification Project: Overview and Preliminary Results

    No full text
    Electronically formatted Arabic free-texts can be found in abundance these days on the World Wide Web, often linked to commercial enterprises and/or government organizations. Vast tracts of knowledge and relations lie hidden within these texts, knowledge that can be exploited once the correct intelligent tools have been identified and applied. For example, text mining may help with text classification and categorization. Text classification aims to automatically assign text to a predefined category based on identifiable linguistic features. Such a process has different useful applications including, but not restricted to, E-Mail spam detection, web pages content filtering, and automatic message routing. In this paper an overview of King Abdulaziz City for Science and Technology (KACST) Arabic Text Classification Project will be illustrated along with some preliminary results. This project will contribute to the better understanding and elaboration of Arabic text classification techniques
    corecore