Practical data mining in a large utility company
We present in this paper the main applications of data mining techniques at Electricité de France, the French national electric power company. These include electric load curve analysis and the prediction of customer characteristics. Closely related to data mining techniques are data warehouse management problems: we show that statistical methods can help manage data consistency and provide accurate reports even when some data are missing.
Recovering Multiple Nonnegative Time Series From a Few Temporal Aggregates
Motivated by electricity consumption metering, we extend existing nonnegative matrix factorization (NMF) algorithms to use linear measurements as observations, instead of matrix entries. The objective is to estimate multiple time series at a fine temporal scale from temporal aggregates measured on each individual series. Furthermore, our algorithm is extended to take into account individual autocorrelation to provide better estimation, using a recent convex relaxation of quadratically constrained quadratic programs. Extensive experiments on synthetic and real-world electricity consumption datasets illustrate the effectiveness of our matrix recovery algorithms.
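To make the measurement model concrete, here is a minimal Python/NumPy sketch under stated assumptions. The unknown fine-scale series are the columns of a nonnegative matrix $V \approx WH$, and each series $n$ is observed only through its aggregates $b_n = A_n v_n$, where $A_n$ is a nonnegative aggregation matrix (for example, hourly values summed into blocks). The sketch fits $W$ and $H$ with Lee-Seung-style multiplicative updates on the loss $\sum_n \lVert A_n W h_n - b_n \rVert^2$. This is an assumed, simplified approach and not the paper's algorithm: it omits the autocorrelation term and the convex QCQP relaxation entirely. The function name `aggregate_nmf` is hypothetical.

```python
import numpy as np


def aggregate_nmf(As, bs, rank, n_iter=300, seed=0, eps=1e-12):
    """Hypothetical sketch: low-rank nonnegative V = W @ H from aggregates.

    As[n]: (K_n, T) nonnegative aggregation matrix of series n.
    bs[n]: (K_n,) aggregates observed on series n, bs[n] ~ As[n] @ V[:, n].
    Returns W (T, rank), H (rank, N) and the loss after each iteration.
    """
    rng = np.random.default_rng(seed)
    T, N = As[0].shape[1], len(As)
    W = rng.random((T, rank)) + 0.1  # strictly positive initialisation
    H = rng.random((rank, N)) + 0.1

    def loss():
        return float(sum(np.sum((A @ W @ H[:, n] - b) ** 2)
                         for n, (A, b) in enumerate(zip(As, bs))))

    history = [loss()]
    for _ in range(n_iter):
        # W step: the gradient sum_n A_n^T (A_n W h_n - b_n) h_n^T splits into
        # a positive part (den) and a negative part (num); W *= num / den.
        num = np.zeros_like(W)
        den = np.zeros_like(W)
        for n, (A, b) in enumerate(zip(As, bs)):
            h = H[:, n]
            num += np.outer(A.T @ b, h)
            den += np.outer(A.T @ (A @ (W @ h)), h)
        W *= num / (den + eps)
        # H step: each column is a nonnegative least-squares problem with
        # dictionary A_n @ W, improved by one multiplicative update.
        for n, (A, b) in enumerate(zip(As, bs)):
            D = A @ W
            H[:, n] *= (D.T @ b) / (D.T @ (D @ H[:, n]) + eps)
        history.append(loss())
    return W, H, history
```

Because the updates only multiply nonnegative quantities, $W$ and $H$ stay nonnegative, and each block update does not increase the loss. Giving different series different aggregation periods helps make the fine-scale values recoverable through the shared low-rank factor $W$; the estimated series are the columns of `W @ H`.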
Sliding HyperLogLog: Estimating cardinality in a data stream
In this paper, a new algorithm estimating the number of active flows in a data stream is proposed. This algorithm adapts the HyperLogLog algorithm of Flajolet et al. to data stream processing by adding a sliding window mechanism. It has the advantage of estimating, at any time, the number of flows seen over any duration bounded by the length of the sliding window. The estimate is very accurate, with a standard error of about $1.04/\sqrt{m}$, where $m$ is the number of registers (the same as in the HyperLogLog algorithm). Because the new algorithm answers more flexible queries, it needs additional memory compared to the HyperLogLog algorithm. It is proved that this additional memory is at most $5m\ln(n/m)$ bytes, where $n$ is the actual number of flows in the sliding window. For instance, with an additional memory of only 35 kB, a standard error of about 3% can be achieved for a data stream of several million flows. Theoretical results are validated on both real and synthetic traffic.
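The sliding window mechanism can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes timestamps arrive in nondecreasing order, hashes flow identifiers to 64 bits with BLAKE2b (so no large-range correction is applied), uses the classic HyperLogLog estimator with linear counting for small cardinalities, and keeps, for each register, a list of (timestamp, rank) pairs that could still be that register's maximum for some future query window. The class name `SlidingHyperLogLog` and its methods are hypothetical.

```python
import hashlib
import math
from collections import deque


class SlidingHyperLogLog:
    """Hypothetical sketch of a sliding-window HyperLogLog."""

    def __init__(self, b: int, window: float) -> None:
        self.b = b
        self.m = 1 << b  # number of registers
        self.window = window  # longest query window supported
        # Per register: (timestamp, rho) pairs with increasing timestamps
        # and strictly decreasing rho, i.e. the possible future maxima.
        self.pairs = [deque() for _ in range(self.m)]

    def add(self, flow_id: str, t: float) -> None:
        digest = hashlib.blake2b(flow_id.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")  # 64-bit hash
        j = h >> (64 - self.b)  # register index from the top b bits
        rest = h & ((1 << (64 - self.b)) - 1)
        rho = (64 - self.b) - rest.bit_length() + 1  # leftmost 1-bit position
        pairs = self.pairs[j]
        # Drop pairs that have left the longest supported window.
        while pairs and pairs[0][0] <= t - self.window:
            pairs.popleft()
        # An older pair with rho' <= rho can never be a maximum again.
        while pairs and pairs[-1][1] <= rho:
            pairs.pop()
        pairs.append((t, rho))

    def estimate(self, now: float, w: float) -> float:
        """Estimated number of distinct flows with timestamp in (now - w, now]."""
        if w > self.window:
            raise ValueError("query window exceeds the sliding window")
        registers = []
        for pairs in self.pairs:
            # rho decreases with time, so the oldest pair inside the query
            # window holds the register's maximum (0 if the window is empty).
            registers.append(next((r for ts, r in pairs if ts > now - w), 0))
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # HyperLogLog constant, m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in registers)
        zeros = registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            return m * math.log(m / zeros)  # linear counting, small range
        return raw
```

One structure answers any window length up to `window`: a query simply rebuilds the registers from the pairs whose timestamps fall inside it. With `b = 10` (1024 registers), the expected standard error is about $1.04/\sqrt{1024} \approx 3.3\%$.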
Exploratory Analysis of Functional Data via Clustering and Optimal Segmentation
We propose in this paper an exploratory analysis algorithm for functional data. The method partitions a set of functions into clusters and represents each cluster by a simple prototype (e.g., piecewise constant). The total number of segments in the prototypes is chosen by the user and optimally distributed among the clusters via two dynamic programming algorithms. The practical relevance of the method is shown on two real-world datasets.
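The two dynamic programming steps can be illustrated with a small Python sketch. It is not the paper's algorithm: it assumes squared-error costs and piecewise-constant prototypes, and reduces each cluster to one curve to segment. For squared error this reduction is exact up to a constant when the curve is the cluster mean $\bar{x}$ of $n$ functions, since $\sum_i \sum_t (x_{it} - c_t)^2 = \sum_i \sum_t (x_{it} - \bar{x}_t)^2 + n \sum_t (\bar{x}_t - c_t)^2$; a cluster's costs should then be multiplied by its size $n$ before allocation. The first function computes the optimal cost of every number of segments for one curve; the second distributes a total budget of segments among clusters. Both function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def segmentation_costs(y: Sequence[float], kmax: int) -> List[float]:
    """best[k - 1] = minimal SSE of a piecewise-constant fit with k segments."""
    n = len(y)
    kmax = min(kmax, n)
    s1 = [0.0] * (n + 1)  # prefix sums of y
    s2 = [0.0] * (n + 1)  # prefix sums of y ** 2
    for t, v in enumerate(y):
        s1[t + 1] = s1[t] + v
        s2[t + 1] = s2[t] + v * v

    def sse(i: int, j: int) -> float:
        # Squared error of y[i:j] around its own mean.
        total = s1[j] - s1[i]
        return s2[j] - s2[i] - total * total / (j - i)

    # d[j]: optimal cost of the prefix y[:j] with the current segment count.
    d = [sse(0, j) if j > 0 else 0.0 for j in range(n + 1)]
    best = [d[n]]
    for k in range(2, kmax + 1):
        new = [math.inf] * (n + 1)
        for j in range(k, n + 1):
            # The last segment is y[i:j]; the prefix uses k - 1 segments.
            new[j] = min(d[i] + sse(i, j) for i in range(k - 1, j))
        d = new
        best.append(d[n])
    return best


def allocate_segments(costs: Sequence[Sequence[float]],
                      total: int) -> Tuple[float, List[int]]:
    """Split `total` segments among clusters, at least one per cluster."""
    a = [0.0] + [math.inf] * total  # a[p]: best cost using p segments so far
    choices: List[List[int]] = []
    for cluster_costs in costs:
        new = [math.inf] * (total + 1)
        pick = [0] * (total + 1)
        for p in range(total + 1):
            for k, cost in enumerate(cluster_costs, start=1):
                if k <= p and a[p - k] + cost < new[p]:
                    new[p] = a[p - k] + cost
                    pick[p] = k
        a = new
        choices.append(pick)
    if math.isinf(a[total]):
        raise ValueError("budget cannot be split among these clusters")
    # Backtrack from the last cluster to recover each cluster's share.
    alloc = []
    p = total
    for pick in reversed(choices):
        alloc.append(pick[p])
        p -= pick[p]
    return a[total], alloc[::-1]
```

For example, a curve made of three constant runs reaches zero cost at three segments, so `allocate_segments` gives it three segments when the budget allows.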
Open challenges for Machine Learning based Early Decision-Making research
More and more applications require early decisions, i.e., decisions taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem at hand is enriched over time. Such a compromise between the earliness and the accuracy of decisions has been particularly studied in the field of Early Time Series Classification. This paper introduces a more general problem, called Machine Learning based Early Decision Making (ML-EDM), which consists in optimizing the decision times of models in a wide range of settings where data is collected over time. After defining the ML-EDM problem, ten challenges are identified and proposed to the scientific community to further research in this area. These challenges open important application perspectives, which are discussed in this paper.
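The earliness/accuracy compromise can be made concrete with a common baseline from early time series classification: decide as soon as the classifier's top class probability reaches a confidence threshold. This Python sketch is only an illustration of that trade-off, not the ML-EDM framework; it assumes a classifier that outputs class probabilities after each new observation, and the names `trigger` and `evaluate` are hypothetical.

```python
from typing import List, Sequence, Tuple


def trigger(probas: Sequence[Sequence[float]],
            threshold: float) -> Tuple[int, int]:
    """Return (decision time, predicted class) for one incoming series.

    probas[t] holds the class probabilities predicted from the first t + 1
    observations. The decision is taken at the first t whose top probability
    reaches `threshold`, and is forced at the last time step otherwise.
    """
    for t, p in enumerate(probas):
        label = max(range(len(p)), key=p.__getitem__)
        if p[label] >= threshold or t == len(probas) - 1:
            return t, label
    raise ValueError("empty sequence of predictions")


def evaluate(runs: List[Sequence[Sequence[float]]], labels: Sequence[int],
             threshold: float) -> Tuple[float, float]:
    """Accuracy and mean fraction of each series observed when deciding."""
    correct, observed = 0, 0.0
    for probas, y in zip(runs, labels):
        t, label = trigger(probas, threshold)
        correct += label == y
        observed += (t + 1) / len(probas)
    return correct / len(runs), observed / len(runs)
```

Raising `threshold` delays decisions (a larger observed fraction) in exchange for more confident predictions; sweeping it traces the earliness/accuracy curve that a learned decision-time policy would try to improve on.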