A Causality Based Feature Selection Approach for Multivariate Time Series Forecasting
The field of time series forecasting has progressed significantly in recent decades, especially with respect to the need for forecasting economic data. Some issues still arise, however, in particular when working with a set of time series that has a large number of variables. A selection step is therefore usually needed in order to reduce the number of variables that will contribute to the forecast of each target time series. In this paper, we propose a feature selection and/or dimension reduction algorithm for forecasting multivariate time series, based on (i) the notion of Granger causality and (ii) a selection step relying on a clustering strategy. Finally, we carry out experiments on different real data sets, comparing our proposal with some of the most widely used feature selection methods. The experiments show improved forecasting accuracy compared with the evaluated methods.
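The Granger-based selection step described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name, the bivariate setting, and the classical lagged-regression F-test formulation are all assumptions made for illustration.

```python
import numpy as np

def granger_f_stat(x, y, lags=2):
    """F-statistic testing whether past values of x help predict y.

    Restricted model:   y_t ~ const + y_{t-1..t-lags}
    Unrestricted model: y_t ~ const + y_{t-1..t-lags} + x_{t-1..t-lags}
    """
    n = len(y) - lags
    Y = y[lags:]
    # lagged design columns, most recent lag first
    y_lags = np.column_stack([y[lags - k:len(y) - k] for k in range(1, lags + 1)])
    x_lags = np.column_stack([x[lags - k:len(x) - k] for k in range(1, lags + 1)])
    const = np.ones((n, 1))
    Xr = np.hstack([const, y_lags])          # restricted design matrix
    Xu = np.hstack([const, y_lags, x_lags])  # unrestricted design matrix

    def rss(X):
        beta = np.linalg.lstsq(X, Y, rcond=None)[0]
        resid = Y - X @ beta
        return resid @ resid

    rss_r, rss_u = rss(Xr), rss(Xu)
    return ((rss_r - rss_u) / lags) / (rss_u / (n - Xu.shape[1]))
```

Candidate driver series for a given target could then be ranked by this statistic, keeping only the top-scoring ones before any clustering step.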
Convex cubes
In various approaches, data cubes are pre-computed in order to answer OLAP queries efficiently. The notion of data cube has taken various forms: iceberg cubes, range cubes, or differential cubes. In this paper, we introduce the concept of convex cube, which captures all the tuples of a datacube satisfying a constraint combination. It can be represented in a very compact way in order to optimize both computation time and required storage space. The convex cube is not an additional structure appended to the list of cube variants; rather, we propose it as a unifying structure that we use to characterize, in a simple, sound, and homogeneous way, the other cited types of cubes. Finally, we introduce the concept of emerging cube, which captures significant trend inversions.
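As a toy illustration of a set of cube cells defined by a constraint combination, the sketch below enumerates all group-bys of a tiny relation and keeps the cells whose COUNT satisfies both a monotone and an antimonotone threshold. The relation, the dimension names, and the COUNT-based constraints are assumptions; the paper's compact representation is not reproduced here.

```python
from itertools import chain, combinations

def convex_cube(rows, dims, min_count, max_count):
    """All cube cells (group-by tuples) whose COUNT lies in [min_count, max_count]:
    a conjunction of a monotone and an antimonotone constraint."""
    cells = {}
    # every subset of dimensions, from the apex () to the full group-by
    for subset in chain.from_iterable(combinations(dims, r) for r in range(len(dims) + 1)):
        counts = {}
        for row in rows:
            key = tuple((d, row[d]) for d in subset)
            counts[key] = counts.get(key, 0) + 1
        for key, c in counts.items():
            if min_count <= c <= max_count:
                cells[key] = c
    return cells
```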
Closed sets based discovery of small covers for association rules
In this paper, we address the problem of the understandability and usefulness of the set of discovered association rules. This problem is important since real-life databases most of the time lead to several thousands of rules with high confidence. We thus propose new algorithms based on Galois closed sets to limit the extraction to small informative covers for exact and approximate rules, and small structural covers for approximate rules. Once the frequent closed itemsets, which constitute a generating set for both frequent itemsets and association rules, have been discovered, no additional database pass is needed to derive these covers. Experiments conducted on real-life databases show that these algorithms are efficient and valuable in practice.
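The Galois closure on which these covers rest can be sketched directly: the closure of an itemset is the set of items common to every transaction containing it, and the frequent closed itemsets are the closures of the frequent itemsets. This is a brute-force illustration on a toy dataset, not the authors' algorithms, which avoid enumerating all itemsets.

```python
from itertools import combinations

def closure(itemset, transactions):
    """Galois closure: the items common to every transaction containing `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    return frozenset.intersection(*covering) if covering else frozenset(itemset)

def frequent_closed(transactions, minsup):
    """Frequent closed itemsets with their supports (naive enumeration)."""
    items = sorted(set().union(*transactions))
    closed = {}
    for r in range(1, len(items) + 1):
        for cand in combinations(items, r):
            s = frozenset(cand)
            sup = sum(1 for t in transactions if s <= t)
            if sup >= minsup:
                closed[closure(s, transactions)] = sup  # closure preserves support
    return closed
```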
Closed sets based discovery of small covers for association rules (extended version)
In this paper, we address the problem of the usefulness of the set of discovered association rules. This problem is important since real-life databases most of the time yield several thousands of rules with high confidence. We propose new algorithms based on Galois closed sets to reduce the extraction to small covers (or bases) for exact and approximate rules, adapted from lattice theory and the data analysis domain. Once the frequent closed itemsets, which constitute a generating set for both frequent itemsets and association rules, have been discovered, no additional database pass is needed to derive these bases. Experiments conducted on real-life databases show that these algorithms are efficient and valuable in practice.
Mining bases for association rules using closed sets
Association rules are conditional implications between frequent itemsets. The problem of the usefulness and the relevance of the set of discovered association rules is related to the huge number of rules extracted and the presence of many redundancies among these rules for many datasets. We address this important problem using the Galois connection framework, and we show that we can generate bases for association rules using the frequent closed itemsets extracted by the Close or A-Close algorithms.
Skyline concept lattices: multidimensional analysis of skylines based on agree sets
The skyline concept has been introduced in order to exhibit the best objects
according to all combinations of criteria, and it makes it possible to analyse
the relationships between skyline objects. Like the data cube, the skycube is
so voluminous that reduction approaches are really necessary. In this paper, we
define an approach which partially materializes the skycube. The underlying
idea is to discard from the representation the skycuboids that can be
recomputed most easily. To meet this reduction objective, we characterize a
formal framework: the agree concept lattice. From this structure, we derive the
skyline concept lattice, which is one of its constrained instances. The strong
points of our approach are: (i) it is attribute oriented; (ii) it provides a
bound on the number of lattice nodes; (iii) it facilitates navigation
within the skycuboids.
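The basic skyline test from which each skycuboid is built can be sketched as a minimal Pareto-dominance check. This assumes every criterion is to be minimized; the paper's lattice construction and materialization strategy are not reproduced.

```python
def dominates(a, b):
    """a dominates b: at least as good on every criterion (smaller is better)
    and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """The points dominated by no other point, i.e. one skycuboid of the skycube
    for a fixed subset of criteria."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```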
Computing iceberg concept lattices with Titanic
We introduce the notion of iceberg concept lattices and show their use in knowledge discovery in databases. Iceberg lattices are a conceptual clustering method, which is well suited for analyzing very large databases. They also serve as a condensed representation of frequent itemsets, as a starting point for computing bases of association rules, and as a visualization method for association rules. Iceberg concept lattices are based on the theory of Formal Concept Analysis, a mathematical theory with applications in data analysis, information retrieval, and knowledge discovery. We present a new algorithm called TITANIC for computing (iceberg) concept lattices. It is based on data mining techniques with a level-wise approach. In fact, TITANIC can be used for a more general problem: computing arbitrary closure systems when the closure operator comes along with a so-called weight function. The use of weight functions for computing closure systems has not been discussed in the literature up to now. Applications providing such a weight function include association rule mining, functional dependencies in databases, conceptual clustering, and ontology engineering. The algorithm is experimentally evaluated and compared with Ganter's Next-Closure algorithm. The evaluation shows an important gain in efficiency, especially for weakly correlated data.
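The distinctive TITANIC idea, computing closures from the weight function alone, can be illustrated on a toy dataset: with support as the weight, the closure of X is X plus every item whose addition leaves the support unchanged. The dataset and helper names are assumptions; the actual algorithm proceeds level-wise over key sets rather than precomputing all supports.

```python
from itertools import combinations

transactions = [frozenset('ACD'), frozenset('BCE'), frozenset('ABCE'),
                frozenset('BE'), frozenset('ABCE')]
items = 'ABCDE'

# the weight function: support of every itemset of size 1 and 2
supports = {frozenset(c): sum(1 for t in transactions if set(c) <= t)
            for r in (1, 2) for c in combinations(items, r)}

def titanic_closure(X, supports, items):
    """Closure from the weight function alone: X plus every item m outside X
    with supp(X + {m}) == supp(X). No transaction scan is needed."""
    X = frozenset(X)
    return X | {m for m in items if m not in X and supports[X | {m}] == supports[X]}
```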
Generating a condensed representation for association rules
Association rule extraction from operational datasets often produces several tens of thousands, and even millions, of association rules. Moreover, many of these rules are redundant and thus useless. Using a semantic based on the closure of the Galois connection, we define a condensed representation for association rules. This representation is characterized by frequent closed itemsets and their generators. It contains the non-redundant association rules having minimal antecedent and maximal consequent, called min-max association rules. We think that these rules are the most relevant since they are the most general non-redundant association rules. Furthermore, this representation is a basis, i.e., a generating set for all association rules, their supports and their confidences, and all of them can be retrieved without accessing the data. We introduce algorithms for extracting this basis and for reconstructing all association rules. Results of experiments carried out on real datasets show the usefulness of this approach. In order to generate this basis when an algorithm for extracting frequent itemsets, such as Apriori, is used, we also present an algorithm for deriving frequent closed itemsets and their generators from frequent itemsets without using the dataset.
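A brute-force sketch of the exact min-max rules (minimal antecedent, maximal consequent) can be written directly from the definitions: each generator g, an itemset whose every proper subset has strictly larger support, yields the rule g implies closure(g) minus g with confidence 1. The toy dataset and the naive enumeration are assumptions; the paper derives these rules from closed itemsets and generators without such enumeration.

```python
from itertools import combinations

def support(s, transactions):
    return sum(1 for t in transactions if s <= t)

def closure(s, transactions):
    """Items common to every transaction containing s."""
    covering = [t for t in transactions if s <= t]
    return frozenset.intersection(*covering) if covering else frozenset(s)

def min_max_exact_rules(transactions, minsup):
    """Exact min-max rules: each generator g (minimal antecedent) implies
    closure(g) - g (maximal consequent) with confidence 1."""
    items = sorted(set().union(*transactions))
    rules = []
    for r in range(1, len(items) + 1):
        for cand in combinations(items, r):
            g = frozenset(cand)
            sup = support(g, transactions)
            if sup < minsup:
                continue
            # g is a generator (key pattern) iff removing any item raises the support
            if any(support(g - {x}, transactions) == sup for x in g):
                continue
            consequent = closure(g, transactions) - g
            if consequent:
                rules.append((g, consequent, sup))
    return rules
```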
Pascal: an algorithm for extracting frequent patterns
In this paper, we propose the Pascal algorithm, which introduces a new optimization of the reference algorithm Apriori. This optimization is based on pattern counting inference, which uses the concept of key patterns. The support of frequent non-key patterns can be inferred from the support of key patterns without accessing the database. Experimentally, the comparison of Pascal with Apriori, Close, and Max-Miner shows its efficiency. Key patterns also make it possible to define informative association rules, which are potentially more useful than the complete set of association rules and far fewer in number.
Levelwise search of frequent patterns with counting inference
In this paper, we address the problem of the efficiency of the main phase of most data mining applications: frequent pattern extraction. This problem is mainly related to the number of operations required for counting pattern supports in the database, and we propose a new method, called pattern counting inference, that performs as few support counts as possible. Using this method, the support of a pattern is determined without accessing the database whenever possible, using the supports of some of its sub-patterns called key patterns. This method was implemented in the Pascal algorithm, an optimization of the simple and efficient Apriori algorithm. Experiments comparing Pascal to the Apriori, Close, and Max-Miner algorithms, each one representative of a frequent pattern discovery strategy, show that Pascal improves the efficiency of frequent pattern extraction from correlated data and that it does not induce additional execution time when data is weakly correlated.
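The counting inference principle can be illustrated in a few lines: for a non-key pattern, the support equals the minimum of the supports of its maximal proper sub-patterns, so no database scan is required. The toy dataset is an assumption; Pascal integrates this test into an Apriori-style level-wise pass rather than precomputing all supports.

```python
from itertools import combinations

transactions = [frozenset('ACD'), frozenset('BCE'), frozenset('ABCE'),
                frozenset('BE'), frozenset('ABCE')]

def count(s):
    """Actual support, counted against the database."""
    return sum(1 for t in transactions if s <= t)

# supports already counted for patterns of size 1 and 2
supports = {frozenset(c): count(frozenset(c))
            for r in (1, 2) for c in combinations('ABCDE', r)}

def inferred_support(pattern, supports):
    """Counting inference: for a non-key pattern, the support is the minimum
    of the supports of its maximal proper sub-patterns, with no database access."""
    return min(supports[pattern - {x}] for x in pattern)
```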
- …