27 research outputs found

    Practical data mining in a large utility company

    We present in this paper the main applications of data mining techniques at Electricité de France, the French national electric power company. These include electric load curve analysis and prediction of customer characteristics. Closely related to data mining techniques are data warehouse management problems: we show that statistical methods can help manage data consistency and provide accurate reports even when data are missing.

    Recovering Multiple Nonnegative Time Series From a Few Temporal Aggregates

    Motivated by electricity consumption metering, we extend existing nonnegative matrix factorization (NMF) algorithms to use linear measurements as observations, instead of matrix entries. The objective is to estimate multiple time series at a fine temporal scale from temporal aggregates measured on each individual series. Furthermore, our algorithm is extended to take individual autocorrelation into account for better estimation, using a recent convex relaxation of quadratically constrained quadratic programs. Extensive experiments on synthetic and real-world electricity consumption datasets illustrate the effectiveness of our matrix recovery algorithms.

    Sliding HyperLogLog: Estimating cardinality in a data stream

    In this paper, a new algorithm estimating the number of active flows in a data stream is proposed. This algorithm adapts the HyperLogLog algorithm of Flajolet et al. to data stream processing by adding a sliding window mechanism. Its advantage is that it can estimate, at any time, the number of flows seen over any duration bounded by the length of the sliding window. The estimate is very accurate, with a standard error of about $1.04/\sqrt{m}$ (the same as in the HyperLogLog algorithm). As the new algorithm answers more flexible queries, it needs additional memory compared to the HyperLogLog algorithm. It is proved that this additional memory is at most $5m\ln(n/m)$ bytes, where $n$ is the actual number of flows in the sliding window. For instance, with an additional memory of only 35 kB, a standard error of about 3% can be achieved for a data stream of several million flows. Theoretical results are validated on both real and synthetic traffic.
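
    The two formulas quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the values m = 1024 registers and n = 1,000,000 flows are assumptions chosen to land near the abstract's "about 3%" and "35 kB" figures, not parameters stated in the source.

```python
import math


def hll_standard_error(m: int) -> float:
    """Standard error of the estimate, 1.04 / sqrt(m), as quoted in the abstract."""
    return 1.04 / math.sqrt(m)


def sliding_extra_memory_bytes(m: int, n: int) -> float:
    """Upper bound on the additional memory, 5 * m * ln(n / m) bytes, as quoted."""
    return 5 * m * math.log(n / m)


if __name__ == "__main__":
    m = 1024        # assumed number of registers (not given in the abstract)
    n = 1_000_000   # assumed number of flows in the window (not given in the abstract)
    print(f"standard error: {hll_standard_error(m):.4f}")
    print(f"extra memory bound: {sliding_extra_memory_bytes(m, n) / 1000:.1f} kB")
```

    Under these assumed values the standard error is 3.25% and the memory bound comes out at roughly 35 kB, which is consistent with the order of magnitude given in the abstract.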

    Exploratory Analysis of Functional Data via Clustering and Optimal Segmentation

    We propose in this paper an exploratory analysis algorithm for functional data. The method partitions a set of functions into $K$ clusters and represents each cluster by a simple prototype (e.g., piecewise constant). The total number of segments in the prototypes, $P$, is chosen by the user and optimally distributed among the clusters via two dynamic programming algorithms. The practical relevance of the method is shown on two real-world datasets.
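
    To give a feel for the segmentation part, here is a minimal sketch of the textbook dynamic program that splits a single sampled function into a fixed number of piecewise-constant segments with least squared error. It is not the paper's algorithm: the clustering step and the distribution of $P$ segments among clusters are not shown, and the function name and signature are hypothetical.

```python
def segment_piecewise_constant(values, num_segments):
    """Optimally split `values` into `num_segments` contiguous constant pieces.

    Returns (total squared error, list of half-open (start, end) index pairs).
    """
    n = len(values)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and len(values)")

    # Prefix sums of values and squared values give O(1) segment costs.
    prefix = [0.0]
    prefix_sq = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def cost(i, j):
        """Squared error of values[i:j] around its mean."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal error for covering values[:j] with k segments.
    best = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i

    # Walk the back-pointers from the end to recover segment boundaries.
    bounds = []
    j = n
    for k in range(num_segments, 0, -1):
        i = back[k][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return best[num_segments][n], bounds


if __name__ == "__main__":
    error, bounds = segment_piecewise_constant([1, 1, 1, 5, 5, 9, 9, 9], 3)
    print(error, bounds)
```

    This version runs in time proportional to the number of segments times the square of the number of samples, which is acceptable for a sketch but may need refinement for long series.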

    Open challenges for Machine Learning based Early Decision-Making research

    More and more applications require early decisions, i.e., decisions taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem at hand is enriched over time. Such a compromise between the earliness and the accuracy of decisions has been particularly studied in the field of Early Time Series Classification. This paper introduces a more general problem, called Machine Learning based Early Decision Making (ML-EDM), which consists in optimizing the decision times of models in a wide range of settings where data is collected over time. After defining the ML-EDM problem, ten challenges are identified and proposed to the scientific community to further research in this area. These challenges open important application perspectives, discussed in this paper.
