84 research outputs found
On mining complex sequential data by means of FCA and pattern structures
Nowadays, data sets are available in very complex and heterogeneous forms.
Mining of such data collections is essential to support many real-world
applications ranging from healthcare to marketing. In this work, we focus on
the analysis of "complex" sequential data by means of interesting sequential
patterns. We approach the problem using the elegant mathematical framework of
Formal Concept Analysis (FCA) and its extension based on "pattern structures".
Pattern structures are used for mining complex data (such as sequences or
graphs) and are based on a subsumption operation, which in our case is defined
with respect to the partial order on sequences. We show how pattern structures,
along with projections (i.e., a data reduction of sequential structures), are
able to enumerate more meaningful patterns and increase the computing
efficiency of the approach. Finally, we show the applicability of the presented
method for discovering and analyzing interesting patient patterns from a French
healthcare data set on cancer. The quantitative and qualitative results (with
annotations and analysis from a physician) are reported in this use case, which
is the main motivation for this work.
Keywords: data mining; formal concept analysis; pattern structures;
projections; sequences; sequential data.
Comment: An accepted publication in International Journal of General Systems.
The paper was created in the wake of the conference on Concept Lattices and
Their Applications (CLA'2013). 27 pages, 9 figures, 3 tables.
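The subsumption on sequences mentioned in this abstract can be illustrated with a minimal sketch. It assumes that a sequence is a list of itemsets and that one sequence is "more general" than another when it embeds into it in order, with each element a subset of its match; the paper's own order and projections may differ, and the data and names below (`is_subsequence`, `support`, `database`) are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]
Seq = Sequence[Itemset]


def is_subsequence(s: Seq, t: Seq) -> bool:
    """True if s embeds into t in order, each element of s being
    a subset of the element of t it is matched to (greedy matching)."""
    pos = 0
    for element in s:
        # Advance to the earliest remaining element of t that contains `element`.
        while pos < len(t) and not element <= t[pos]:
            pos += 1
        if pos == len(t):
            return False
        pos += 1
    return True


def support(pattern: Seq, database: List[Seq]) -> int:
    """Number of database sequences that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


# Hypothetical toy trajectories (not taken from the paper's data set).
database = [
    [frozenset({"consult"}), frozenset({"surgery", "chemo"}), frozenset({"radio"})],
    [frozenset({"consult"}), frozenset({"chemo"})],
    [frozenset({"surgery"}), frozenset({"consult"})],
]
pattern = [frozenset({"consult"}), frozenset({"chemo"})]
pattern_support = support(pattern, database)
```

Here the pattern is contained in the first two trajectories only, so `pattern_support` is 2.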
Inferring DQN structure for high-dimensional continuous control
International audience. Despite recent advancements in the field of Deep Reinforcement Learning, Deep Q-network (DQN) models still show lackluster performance on problems with high-dimensional action spaces. The problem is even more pronounced for cases with high-dimensional continuous action spaces due to the combinatorial increase in the number of outputs. Recent works approach the problem by dividing the network into multiple parallel or sequential (action) modules responsible for different discretized actions. However, there are drawbacks to both the parallel and the sequential approaches: parallel module architectures lack coordination between action modules, leading to extra complexity in the task, while a sequential structure can result in the vanishing gradients problem and an exploding parameter space. In this work, we show that the compositional structure of the action modules has a significant impact on model performance, and we propose a novel approach to infer the network structure for DQN models operating with high-dimensional continuous actions. Our method is based on uncertainty estimation techniques and yields substantially higher scores for MuJoCo environments with high-dimensional continuous action spaces, as well as for a realistic AAA sailing simulator game.
Computing Closed Skycubes
International audience. In this paper, we tackle the problem of efficient skycube computation. We introduce a novel approach that significantly reduces both the number of domination tests for a given subspace and the number of subspaces searched. Technically, we identify two types of skyline points that can be directly derived without using any domination tests. Moreover, based on formal concept analysis, we introduce two closure operators that enable a concise representation of skyline cubes. We show that this concise representation is easy to compute and develop an efficient algorithm, which only needs to search a small portion of the huge search space. We show the merits of our approach with empirical results.
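For readers unfamiliar with the terms, the following is a naive baseline sketch of skyline and skycube computation, not the algorithm of the paper (which aims precisely at avoiding most of these domination tests). It assumes that smaller values are better on every dimension; the function names and the sample points are hypothetical.

```python
from itertools import combinations


def dominates(p, q, dims):
    """p dominates q on `dims`: no worse everywhere, strictly better somewhere."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)


def skyline(points, dims):
    """Points not dominated by any other point on the subspace `dims`."""
    return [p for p in points if not any(dominates(q, p, dims) for q in points)]


def skycube(points, n_dims):
    """Skyline of every non-empty subspace (exhaustive, exponential in n_dims)."""
    cube = {}
    for k in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), k):
            cube[dims] = skyline(points, dims)
    return cube


# Hypothetical 2-dimensional data.
points = [(1, 4), (2, 2), (3, 3), (4, 1)]
cube = skycube(points, 2)
```

On this toy input, `(3, 3)` is dominated by `(2, 2)` in the full space, so `cube[(0, 1)]` keeps the other three points, while each single-dimension skyline keeps only the minimum on that dimension.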
Sequential Pattern Mining using FCA and Pattern Structures for Analyzing Visitor Trajectories in a Museum
International audience. This paper presents our work on mining visitor trajectories in the Hecht Museum (Haifa, Israel), within the framework of the CrossCult European project on cultural heritage. We present theoretical and practical research on the characterization of visitor trajectories and the mining of these trajectories as sequences. The mining process is based on two approaches in the framework of FCA, namely the mining of subsequences without any constraint and the mining of frequent contiguous subsequences. Both approaches are based on pattern structures. In parallel, a similarity measure allows us to build a hierarchical classification, which is used for the interpretation and characterization of the trajectories w.r.t. four well-known visiting styles.
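The notion of a frequent contiguous subsequence can be made concrete with a short sketch. It uses plain counting rather than the pattern-structure machinery of the paper, and it assumes that a trajectory is a list of visited item identifiers; the function name, the threshold, and the trajectories are hypothetical.

```python
from collections import Counter


def frequent_contiguous(trajectories, min_support):
    """Contiguous subsequences occurring in at least `min_support` trajectories.

    Each subsequence is counted at most once per trajectory.
    """
    counts = Counter()
    for traj in trajectories:
        seen = set()
        for i in range(len(traj)):
            for j in range(i + 1, len(traj) + 1):
                seen.add(tuple(traj[i:j]))
        counts.update(seen)
    return {pattern: c for pattern, c in counts.items() if c >= min_support}


# Hypothetical visitor trajectories over exhibits A-D.
trajectories = [["A", "B", "C"], ["B", "C", "D"], ["A", "C"]]
frequent = frequent_contiguous(trajectories, min_support=2)
```

With these inputs, the only frequent pattern longer than one item is `("B", "C")`; `("A", "C")` is contiguous only in the third trajectory and is therefore not frequent.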
Sequence Classification Based on Delta-Free Sequential Pattern
International audience. Sequential pattern mining is one of the most studied and challenging tasks in data mining. However, the extension of well-known methods from many other classical patterns to sequences is not a trivial task. In this paper, we study the notion of δ-freeness for sequences. While this notion has been extensively discussed for itemsets, this work is the first to extend it to sequences. We define an efficient algorithm devoted to the extraction of δ-free sequential patterns. Furthermore, we show the advantage of δ-free sequences and highlight their importance when building sequence classifiers, and we show how they can be used to address the feature selection problem in statistical classifiers, as well as to build symbolic classifiers that optimize both the accuracy and the earliness of predictions.
Application of Biclustering to the Discovery of Constant and Gradual Patterns
International audience
Découverte de sous-groupes avec les arbres de recherche de Monte Carlo [Subgroup discovery with Monte Carlo tree search]
National audience. Discovering rules that clearly distinguish one class from another remains a difficult problem. Such patterns suggest hypotheses that may explain a class. Subgroup discovery (SD), a framework that formally defines this pattern mining task, still faces two major problems: (i) defining appropriate quality measures that characterize the singularity of a pattern, and (ii) choosing a sound heuristic for exploring the search space when complete enumeration is infeasible. To date, the most efficient SD algorithms are based on beam search (BS). However, the collection of extracted patterns lacks diversity because of the greedy nature of the exploration. We propose here to use a recent exploration technique, Monte Carlo tree search (MCTS). The trade-off between exploitation and exploration, together with the power of random search, yields a solution that is available at any time and generally outperforms BS-type approaches. Our empirical study, with several quality measures, on various benchmark and real-world data sets demonstrates the quality of our approach.
Sequential pattern mining for analyzing visitor trajectories
International audience
Combiner plongements de graphes et clustering pour l'alignement de connaissances pharmacogénomiques [Combining graph embeddings and clustering for the alignment of pharmacogenomic knowledge]
National audience. The concurrent publication and editing of knowledge graphs, particularly biomedical ones, within the Web of Data leads to overlaps between these graphs. Matching them is therefore an essential task if one wants to consider all the knowledge available for a domain, which includes identifying equivalent, more specific, or similar nodes within aggregated graphs. We propose to identify and type these matches by learning node embeddings with graph convolutional networks, then partitioning the nodes by clustering according to the similarity of their embeddings. We experimented with this approach by aligning pharmacogenomic knowledge, the real-world application that motivated this work. In particular, we examine the contribution of domain knowledge by measuring the improvement in alignment results after applying inference rules. We observe distances between embeddings that are consistent with known alignments, notably smaller distances for equivalences.