75 research outputs found

    Effectiveness evaluation of data mining based IDS

    Proceedings of: 6th Industrial Conference on Data Mining, ICDM 2006, Leipzig, Germany, July 14-15, 2006. Data mining has been widely applied to the problem of intrusion detection in computer networks. However, misconceptions about the underlying problem have led to results that are reported out of context. This paper shows that factors such as the probability of intrusion and the costs of responding to detected intrusions must be taken into account in order to compare the effectiveness of machine learning algorithms in the intrusion detection domain. Furthermore, we show the advantages of combining different detection techniques. Results on the well-known 1999 KDD dataset are presented.
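The dependence on intrusion probability and response cost can be illustrated with a small expected-cost calculation. This is a minimal sketch, not the paper's evaluation method: the function name `expected_cost`, the detection rates, and all cost values are hypothetical and chosen only to show that the ranking of two detectors can flip as the base rate changes.

```python
def expected_cost(p_intrusion, tpr, fpr, cost_miss, cost_false_alarm, cost_response=0.0):
    """Expected cost per observed event for a detector (illustrative model).

    p_intrusion      -- assumed prior probability that an event is an intrusion
    tpr, fpr         -- detector's true-positive and false-positive rates
    cost_miss        -- assumed cost of an undetected intrusion
    cost_false_alarm -- assumed cost of responding to a false alarm
    cost_response    -- assumed cost of responding to a real, detected intrusion
    """
    p_normal = 1.0 - p_intrusion
    missed = p_intrusion * (1.0 - tpr) * cost_miss
    false_alarms = p_normal * fpr * cost_false_alarm
    responses = p_intrusion * tpr * cost_response
    return missed + false_alarms + responses


# Two hypothetical detectors: A is sensitive but noisy, B is quiet but misses more.
detector_a = {"tpr": 0.99, "fpr": 0.05}
detector_b = {"tpr": 0.80, "fpr": 0.001}

for p in (0.001, 0.2):
    cost_a = expected_cost(p, cost_miss=100.0, cost_false_alarm=1.0, **detector_a)
    cost_b = expected_cost(p, cost_miss=100.0, cost_false_alarm=1.0, **detector_b)
    better = "A" if cost_a < cost_b else "B"
    print(f"p_intrusion={p}: cost A={cost_a:.4f}, cost B={cost_b:.4f}, cheaper: {better}")
```

Under these assumed numbers, the quiet detector is cheaper when intrusions are rare and the sensitive one is cheaper when they are common, which is why detection rates alone do not settle which algorithm is more effective.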

    Decision Support for Road Safety: Development of Key Performance Indicators for Police Analysts

    In 2017, five out of 100,000 people were killed in road accidents in Europe. In order to reduce this number with appropriate measures, the police currently define combinations of accident attributes manually (e.g., accidents on slippery road surfaces at night), which then form the basis for tracking the number of accidents over time. The aim of this paper is to combine the following data analysis approaches in order to detect interesting attribute combinations, also referred to as “itemsets”, relevant for current and future observations. The resulting combinations are proposed to the police as new key performance indicators and can also be used directly for planning police measures to increase road safety. A four-stage decision support system is introduced that employs frequent itemset mining in the first stage. The temporal aspect of traffic accident data is illustrated by time series containing, for each itemset, the relative frequencies of accidents with the corresponding attribute combination. In the second stage, the time series are grouped according to their shape by time series clustering and classification. In the third stage, we determine the optimal forecasting method for each generated cluster of time series. Based on the prediction of future frequencies, we identify the most interesting attribute combinations in the last stage. These are displayed geographically so that a police analyst can easily identify current and developing hot spots.
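The first stage and the per-itemset time series can be sketched as follows. This is a brute-force illustration under stated assumptions, not the system described in the abstract: it enumerates all attribute combinations up to a fixed size instead of using an optimized miner such as Apriori, and the function names, attribute values, and records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(records, min_support, max_size=2):
    """Return {itemset: relative frequency} for itemsets meeting min_support.

    Brute-force enumeration, adequate only for small illustrative inputs.
    """
    counts = Counter()
    for attrs in records:
        items = sorted(set(attrs))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    n = len(records)
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}


def frequency_series(records_by_period, itemset):
    """Relative frequency of one itemset in each time period."""
    target = set(itemset)
    series = []
    for records in records_by_period:
        hits = sum(1 for attrs in records if target <= set(attrs))
        series.append(hits / len(records) if records else 0.0)
    return series


# Hypothetical accident records, grouped into two consecutive periods.
periods = [
    [{"night", "slippery"}, {"night"}, {"day", "dry"}, {"night", "slippery"}],
    [{"night", "slippery"}, {"day", "dry"}, {"day", "dry"}, {"day"}],
]
all_records = [record for period in periods for record in period]

itemsets = frequent_itemsets(all_records, min_support=0.3)
series = frequency_series(periods, ("night", "slippery"))
print(itemsets[("night", "slippery")], series)
```

Series like this one, one per frequent itemset, would be the input to the clustering, forecasting, and ranking stages that the abstract describes.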

    Cascade evaluation of clustering algorithm

    This paper is about the evaluation of the results of clustering algorithms, and the comparison of such algorithms. We propose a new method based on the enrichment of a set of independent labeled datasets by the results of clustering, and the use of a supervised method to evaluate the interest of adding such new information to the datasets. We thus adapt the cascade generalization paradigm to the case where we combine an unsupervised and a supervised learner. We also consider the case where independent supervised learnings are performed on the different groups of data objects created by the clustering. We then conduct experiments using different supervised algorithms to compare various clustering algorithms. We thus show that our proposed method exhibits a coherent behavior, pointing out, for example, that the algorithms based on the use of complex probabilistic models outperform algorithms based on the use of simpler models.
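The enrichment idea can be sketched on a toy dataset: add each object's cluster identifier as extra attributes, then compare a supervised learner's accuracy with and without them. Everything here is an assumption made for illustration: the gap-based clustering, the nearest-centroid classifier, and the use of accuracy on the training data (rather than a held-out or cross-validated estimate) are stand-ins, not the algorithms or protocol used in the paper.

```python
def gap_clustering(values, max_gap):
    """Toy 1-D clustering: start a new cluster wherever sorted values jump by more than max_gap."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    cluster = 0
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] > max_gap:
            cluster += 1
        labels[cur] = cluster
    return labels


def one_hot(index, size):
    return [1.0 if j == index else 0.0 for j in range(size)]


def nearest_centroid_accuracy(rows, labels):
    """Accuracy of a nearest-class-centroid classifier, measured on the training data."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        members = [r for r, y in zip(rows, labels) if y == c]
        centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    def predict(row):
        return min(classes, key=lambda c: sum((x - m) ** 2 for x, m in zip(row, centroids[c])))

    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def cascade_gain(rows, labels, cluster_ids):
    """Accuracy gained by appending one-hot cluster membership to each row."""
    k = max(cluster_ids) + 1
    enriched = [r + one_hot(cid, k) for r, cid in zip(rows, cluster_ids)]
    return nearest_centroid_accuracy(enriched, labels) - nearest_centroid_accuracy(rows, labels)


# Hypothetical data: three groups along one axis, with the middle group in a different class.
rows = [[0.0], [1.0], [5.0], [6.0], [10.0], [11.0]]
labels = [0, 0, 1, 1, 0, 0]
values = [r[0] for r in rows]

good_gain = cascade_gain(rows, labels, gap_clustering(values, max_gap=2.0))
poor_gain = cascade_gain(rows, labels, gap_clustering(values, max_gap=100.0))
print(good_gain, poor_gain)
```

A clustering that recovers the three groups improves the learner, while one that lumps everything into a single cluster adds nothing, so the gain gives a basis for ranking clusterings in this toy setting.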

    Building a Document Corpus for Manufacturing Knowledge Retrieval

    When faced with challenging technical problems, R&D personnel often turn to technical papers to seek inspiration for a solution. Building a corpus of such papers, and making relevant papers easy to retrieve in response to a user's query, is an area that has not been systematically dealt with. This is an attempt to build such a corpus for manufacturing R&D personnel. Manufacturing Corpus Version 1 (MCV1) is an archive of more than 1400 relevant manufacturing engineering papers published between 1998 and 2000. In this paper, the origins and motivation of building MCV1 are discussed. The innovative coding process, which is specially designed for manufacturing companies, is presented. All other relevant issues, such as coding policy, category codes and input documents, are explained. Finally, two quality indicators which integrate all concerns about coding quality are examined. Singapore-MIT Alliance (SMA)

    Cluster Analysis of Smart Metering Data - An Implementation in Practice

    The introduction of smart meter technology is a great challenge for the German energy industry. It requires not only large investments in the communication and metering infrastructure, but also a redesign of traditional business processes. The newly incurred costs cannot be fully passed on to the end customers. One option to counterbalance these expenses is to exploit the newly generated smart metering data for the creation of new services and improved processes. For instance, performing a cluster analysis of smart metering data focused on the customers’ time-based consumption behavior allows for a detailed customer segmentation. In this article we present a cluster analysis performed on real-world consumption data from a smart meter project conducted by a German regional utilities company. We show how to integrate a cluster analysis approach into a business intelligence environment and evaluate this artifact as defined by design science. We discuss the results of the cluster analysis and highlight options to apply them to segment-specific tariff design.

    Knowledge extraction from medium voltage load diagrams to support the definition of electrical tariffs

    With the electricity market liberalization, distribution and retail companies are looking for better market strategies based on adequate information about the consumption patterns of their electricity customers. In this environment all consumers are free to choose their electricity supplier. A fair insight into the customers’ behaviour will permit the definition of specific contract aspects based on the different consumption patterns. In this paper Data Mining (DM) techniques are applied to electricity consumption data from a utility client’s database. To form the different customer classes, and find a set of representative consumption patterns, we have used the Two-Step algorithm, which is a hierarchical clustering algorithm. Each consumer class will be represented by its load profile resulting from the clustering operation. Next, to characterize each consumer class, a classification model will be constructed with the C5.0 classification algorithm.

    ORE - A Tool for Repairing and Enriching Knowledge Bases
