
    Learning Interpretable Rules for Multi-label Classification

    Multi-label classification (MLC) is a supervised learning problem in which, contrary to standard multiclass classification, an instance can be associated with several class labels simultaneously. In this chapter, we advocate a rule-based approach to multi-label classification. Rule learning algorithms are often employed when one is not only interested in accurate predictions, but also requires an interpretable theory that can be understood, analyzed, and qualitatively evaluated by domain experts. Ideally, by revealing patterns and regularities contained in the data, a rule-based theory yields new insights in the application domain. Recently, several authors have started to investigate how rule-based models can be used for modeling multi-label data. Discussing this task in detail, we highlight some of the problems that make rule learning considerably more challenging for MLC than for conventional classification. While mainly focusing on our own previous work, we also provide a short overview of related work in this area. Comment: Preprint version. To appear in: Explainable and Interpretable Models in Computer Vision and Machine Learning. The Springer Series on Challenges in Machine Learning. Springer (2018). See http://www.ke.tu-darmstadt.de/bibtex/publications/show/3077 for further information.
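The idea of a rule-based multi-label model can be illustrated with a minimal sketch. It assumes a toy rule format (a conjunction of attribute tests as the body, a set of labels as the head) and hypothetical attribute and label names; it is not the algorithm from the chapter, only an illustration of how one instance can receive several labels from readable rules.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rule format: (body, head). The body is a conjunction of
# attribute tests; the head is the set of labels the rule predicts.
Rule = Tuple[Dict[str, bool], Set[str]]

# Illustrative rules only; attribute and label names are made up.
rules: List[Rule] = [
    ({"water": True}, {"sea"}),
    ({"outdoor": True, "water": True}, {"beach"}),
    ({"outdoor": False}, {"indoor"}),
]


def covers(body: Dict[str, bool], features: Dict[str, bool]) -> bool:
    """A rule covers an instance when every test in its body holds."""
    return all(features.get(attr) == value for attr, value in body.items())


def predict(rule_set: List[Rule], features: Dict[str, bool]) -> Set[str]:
    """Union the heads of all covering rules, so several labels can fire."""
    labels: Set[str] = set()
    for body, head in rule_set:
        if covers(body, features):
            labels |= head
    return labels


print(predict(rules, {"outdoor": True, "water": True}))
```

Because each rule is a plain conjunction, a domain expert can read off why a label was assigned, which is the interpretability argument the abstract makes.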

    Discovering Higher-order SNP Interactions in High-dimensional Genomic Data

    In this thesis, an associative classification method based on multifactor dimensionality reduction is employed to identify higher-order SNP interactions and thereby improve the understanding of the genetic architecture of complex diseases. Further, the thesis explores the application of deep learning techniques to provide new clues for interaction analysis. The performance of the deep learning method is maximized by unifying deep neural networks with a random forest to identify reliable interactions in the presence of noise.

    Data mining in manufacturing: a review based on the kind of knowledge

    In modern manufacturing environments, vast amounts of data are collected in database management systems and data warehouses from all involved areas, including product and process design, assembly, materials planning, quality control, scheduling, maintenance, fault detection, etc. Data mining has emerged as an important tool for knowledge acquisition from manufacturing databases. This paper reviews the literature dealing with knowledge discovery and data mining applications in the broad domain of manufacturing, with a special emphasis on the type of functions to be performed on the data. The major data mining functions to be performed include characterization and description, association, classification, prediction, clustering and evolution analysis. The papers reviewed have therefore been categorized in these five categories. It has been shown that there is a rapid growth in the application of data mining in the context of manufacturing processes and enterprises in the last 3 years. This review reveals the progressive applications and existing gaps identified in the context of data mining in manufacturing. A novel text mining approach has also been used on the abstracts and keywords of 150 papers to identify the research gaps and find the linkages between knowledge area, knowledge type and the applied data mining tools and techniques.

    Explainable Artificial Intelligence and Causal Inference based ATM Fraud Detection

    Gaining the trust of customers and showing them empathy are critical in the financial domain. Frequent fraudulent activity undermines both. Hence, financial organizations and banks must take the utmost care to mitigate it. Fraudulent ATM transactions in particular are a common problem faced by banks. Fraud datasets pose several critical challenges: the dataset is highly imbalanced, the fraud pattern is changing, etc. Owing to the rarity of fraudulent activities, fraud detection can be formulated as either a binary classification problem or a one-class classification (OCC) problem. In this study, we applied both formulations to an ATM transactions dataset collected from India. In binary classification, we investigated the effectiveness of various oversampling techniques, such as the Synthetic Minority Oversampling Technique (SMOTE) and its variants, and Generative Adversarial Networks (GAN). Further, we employed various machine learning techniques, viz., Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Gradient Boosting Tree (GBT), and Multi-layer Perceptron (MLP). GBT outperformed the rest of the models by achieving 0.963 AUC, and DT stands second with 0.958 AUC. DT is the winner if the complexity and interpretability aspects are considered. Among all the oversampling approaches, SMOTE and its variants were observed to perform better. In OCC, IForest attained 0.959 CR, and OCSVM secured second place with 0.947 CR. Further, we incorporated explainable artificial intelligence (XAI) and causal inference (CI) in the fraud detection framework and studied it through various analyses. Comment: 34 pages; 21 Figures; 8 Tables
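The core idea behind SMOTE-style oversampling can be sketched in a few lines: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The function name, parameters, and toy data below are assumptions for illustration; this is a simplified sketch, not the study's pipeline and not a replacement for a library implementation.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_oversample(
    minority: List[Point], n_new: int, k: int = 2, seed: int = 0
) -> List[List[float]]:
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (needs >= 2 samples)."""
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: squared_distance(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples (e.g. rare fraudulent transactions).
fraud_samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote_like_oversample(fraud_samples, n_new=6)
print(len(new_samples))
```

Interpolating only between minority samples keeps the synthetic points inside the region the minority class already occupies, which is why this kind of oversampling is often preferred to plain duplication on highly imbalanced data.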

    Applications of Artificial Intelligence in Power Systems

    Artificial intelligence tools, which are fast, robust and adaptive, can overcome the drawbacks of traditional solutions for several power systems problems. In this work, applications of AI techniques have been studied for solving two important problems in power systems. The first problem is static security evaluation (SSE). The objective of SSE is to identify the contingencies in planning and operations of power systems. Conventional numerical solutions are time-consuming, computationally expensive, and not suitable for online applications. SSE may be considered as a binary-classification, multi-classification or regression problem. In this work, multi-support vector machine is combined with several evolutionary computation algorithms, including particle swarm optimization (PSO), differential evolution, ant colony optimization for the continuous domain, and harmony search techniques to solve the SSE. Moreover, support vector regression is combined with modified PSO with a proposed modification on the inertia weight in order to solve the SSE. Also, the classification accuracy, the speed of training, and the final cost of using power equipment heavily depend on the selected input features. In this dissertation, multi-object PSO has been used to solve this problem. Furthermore, a multi-classifier voting scheme is proposed to get the final test output. The classifiers participating in the voting scheme include multi-SVM with different types of kernels and random forests with an adaptive number of trees. In short, the development and performance of different machine learning tools combined with evolutionary computation techniques have been studied to solve the online SSE. The performance of the proposed techniques is tested on several benchmark systems, namely the IEEE 9-bus, 14-bus, 39-bus, 57-bus, 118-bus, and 300-bus power systems. The second problem is the non-convex, nonlinear, and non-differentiable economic dispatch (ED) problem. The purpose of solving the ED is to improve the cost-effectiveness of power generation. To solve ED with multi-fuel options, prohibited operating zones, valve point effect, and transmission line losses, genetic algorithm (GA) variant-based methods, such as breeder GA, fast navigating GA, twin removal GA, kite GA, and United GA are used. The IEEE systems with 6-units, 10-units, and 15-units are used to study the efficiency of the algorithms.
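The abstract does not say how the inertia weight of PSO is modified, so the sketch below uses a common textbook choice as a stand-in: a weight that decreases linearly from `w_start` to `w_end` over the run. All names, default values, and the sphere test function are assumptions for illustration, not the dissertation's method.

```python
import random
from typing import Callable, List, Tuple


def sphere(x: List[float]) -> float:
    """Toy objective (assumed here): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso(
    objective: Callable[[List[float]], float],
    dim: int,
    n_particles: int = 20,
    iterations: int = 100,
    w_start: float = 0.9,
    w_end: float = 0.4,
    c1: float = 1.5,
    c2: float = 1.5,
    bound: float = 5.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest_pos = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest_pos, gbest_val = pbest_pos[g][:], pbest_val[g]

    for t in range(iterations):
        # Linearly decreasing inertia: explore early, exploit late.
        w = w_start - (w_start - w_end) * t / max(iterations - 1, 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (pbest_pos[i][d] - pos[i][d])
                    + c2 * r2 * (gbest_pos[d] - pos[i][d])
                )
                # Keep particles inside the search box.
                pos[i][d] = max(-bound, min(bound, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest_val[i], pbest_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


best_pos, best_val = pso(sphere, dim=2)
print(best_pos, best_val)
```

In a setting like the one described, the objective would be replaced by something such as the validation error of a support vector model as a function of its hyperparameters; that coupling is not shown here.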
