An Information Theoretic Approach to Modeling Neural Network Expert Systems
In this paper we propose several novel techniques for mapping rule bases, such as those used in rule-based expert systems, onto neural network architectures. Our objective is to achieve a system capable of incremental learning and distributed probabilistic inference. Such a system could perform inference many orders of magnitude faster than current serial rule-based expert systems, and hence be capable of true real-time operation. In addition, the rule-based formalism gives the system an explicit knowledge representation, unlike current neural models. We propose an information-theoretic approach to this problem, which has two aspects: first, learning the model and, second, performing inference with it. We show a clear pathway to implementing an expert system starting from raw data, via a learned rule-based model, to a neural network that performs distributed inference.
An Information Theoretic Approach to Rule-Based Connectionist Expert Systems
We discuss in this paper architectures for executing probabilistic rule bases in a parallel manner, using as a theoretical basis recently introduced information-theoretic models. We begin by describing our (non-neural) learning algorithm and theory of quantitative rule modelling, followed by a discussion of the exact nature of two particular models. Finally, we work through an example of our approach, going from database to rules to inference network, and compare the network's performance with the theoretical limits for specific problems.
New Learning Models for Generating Classification Rules Based on Rough Set Approach
Data sets, static or dynamic, are important for representing real-life features in many domains, including industry, medicine, and economics. Recently, various models have been used to generate knowledge from vague and uncertain data sets, such as decision tree induction, neural networks, fuzzy logic, genetic algorithms, and rough set theory. All of these models take a long time to learn on a huge and dynamic data set. The challenge is therefore to develop an efficient model that decreases the learning time without affecting the quality of the generated classification rules. Huge information systems or data sets also usually contain missing values due to unavailable data, which degrade the quality of the generated classification rules and make it difficult to extract useful information from the data set. A second challenge is therefore how to handle missing data. Rough set theory is a mathematical tool for dealing with vagueness and uncertainty, and a useful approach for uncovering classificatory knowledge and building classification rules. The application of the theory as part of the learning models was therefore proposed in this thesis.
Two models for learning from data sets were proposed, based on two different reduction algorithms. The split-condition-merge-reduct algorithm (SCMR) consists of three modules: partitioning the data set vertically into subsets, applying rough set reduction to each subset, and merging the reducts of all subsets to form the best reduct. The enhanced split-condition-merge-reduct algorithm (ESCMR) adds a fourth module that applies rough set reduction again to the reduct generated by SCMR, producing a best reduct that plays the same role as if all the attributes were present. Classification rules were then generated based on the best reduct.
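The abstract does not spell out the inner reduction step. As background, one standard way to compute a rough set reduct is a greedy dependency-preserving search; the sketch below uses illustrative names and is not the thesis's own SCMR implementation.

```python
def consistent(rows, attrs, decision):
    """True if no two objects agree on all attrs yet differ on the
    decision, i.e. attrs preserve the classification (dependency check)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen and seen[key] != row[decision]:
            return False
        seen.setdefault(key, row[decision])
    return True

def greedy_reduct(rows, attrs, decision):
    """Drop attributes one at a time, keeping a drop only if the
    remaining attributes still classify every object consistently."""
    reduct = list(attrs)
    for a in list(attrs):
        trial = [x for x in reduct if x != a]
        if trial and consistent(rows, trial, decision):
            reduct = trial
    return reduct
```

In the SCMR spirit, the attribute set would be split vertically into subsets, a reduction such as this applied to each, and the partial reducts merged; ESCMR would then run the reduction once more on the merged result.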
For the problem of missing data, a new approach was proposed based on data partitioning and the mode function. The data set is partitioned horizontally into subsets such that all objects in each subset share a single classification value. The mode function is then applied to each subset that has missing values in order to find the most frequently occurring value of each attribute, and the missing values of that attribute are replaced by the mode value.
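As a minimal sketch of this partition-and-mode idea (function and field names are ours, not from the thesis):

```python
from collections import Counter

def fill_missing_by_class_mode(rows, class_attr, missing=None):
    """Partition rows horizontally by classification value, then replace
    each missing attribute value with the mode of that attribute computed
    within the row's own class partition."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[class_attr], []).append(row)
    filled = []
    for part in partitions.values():
        # mode of each attribute, computed over the non-missing values
        modes = {}
        attrs = {a for r in part for a in r if a != class_attr}
        for a in attrs:
            values = [r[a] for r in part if r.get(a) != missing]
            if values:
                modes[a] = Counter(values).most_common(1)[0][0]
        for row in part:
            filled.append({a: (modes.get(a, v) if v == missing else v)
                           for a, v in row.items()})
    return filled
```

Replacing by the per-class mode, rather than the global mode, keeps the imputed value consistent with the object's classification, which is the point of partitioning horizontally first.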
The proposed approach for missing values produced better results than other approaches, and the proposed learning models generated classification rules faster than other methods, with accuracy that remained high compared to other models.
Fouille de données par extraction de motifs graduels : contextualisation et enrichissement
The work in this thesis belongs to the framework of knowledge extraction and data mining applied to numerical or fuzzy data, with the aim of extracting linguistic summaries in the form of gradual itemsets: the latter express correlations between attribute values, of the form « the more the temperature increases, the more the pressure increases ». Our goal is to contextualize and enrich these gradual itemsets by proposing several types of additional information, so as to increase their quality and provide a better interpretation. We propose four types of new itemsets. First, reinforced gradual itemsets, in the case of fuzzy data, perform a contextualization by integrating additional attributes linguistically introduced by the expression « all the more ». They can be illustrated by the example « the more the temperature decreases, the more the volume of air decreases, all the more so as its density increases ». Reinforcement is interpreted as an increased validity of the gradual itemset. In addition, we study the extension of the concept of reinforcement to association rules, discussing its possible interpretations and showing its limited contribution. We then propose to handle the contradictory itemsets that arise, for example, when « the more the temperature increases, the more the humidity increases » and « the more the temperature increases, the more the humidity decreases » are extracted simultaneously. To manage these contradictions, we define a constrained variant of gradual itemset support which depends not only on the considered itemset but also on its potential contradictors. We also propose two extraction methods: the first filters the contradictory itemsets after all itemsets have been generated, and the second integrates the filtering within the generation step.
We also introduce characterized gradual itemsets, defined by adding a clause linguistically introduced by the expression « especially if », as in « the more the temperature decreases, the more the humidity decreases, especially if the temperature varies in [0, 10] °C »: the additional clause specifies value ranges over which the validity of the itemset is increased. We formalize the quality of this enrichment as a trade-off between two constraints imposed on the identified interval, namely high validity and large size, together with an extension that takes the data density into account. We propose a method to automatically extract characterized gradual itemsets, based on mathematical morphology tools and the definition of an appropriate filter and its transcription.
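The constrained support itself is not given in the abstract. As background, a common pair-based definition of plain gradual itemset support (the fraction of object pairs whose values co-vary as prescribed) can be sketched as follows, with illustrative names:

```python
from itertools import combinations

def gradual_support(rows, itemset):
    """Support of a gradual itemset such as
    [("temperature", +1), ("pressure", +1)], read as 'the more the
    temperature increases, the more the pressure increases'.
    A pair of objects is concordant when, in one direction or the
    other, every attribute varies with the prescribed sign."""
    def respects(a, b):
        return all((b[attr] - a[attr]) * sign > 0 for attr, sign in itemset)
    pairs = list(combinations(rows, 2))
    concordant = sum(respects(a, b) or respects(b, a) for a, b in pairs)
    return concordant / len(pairs) if pairs else 0.0
```

Note that, with strict inequalities, a given pair can never be concordant for two contradictory itemsets at once: each pair supports at most one of them, which is what makes the support of an itemset naturally dependent on its contradictors.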
Génération de connaissances à l'aide du retour d'expérience : application à la maintenance industrielle
The research work presented in this thesis concerns the extraction of knowledge from past experiences in order to improve the performance of industrial processes. Knowledge is nowadays considered an important strategic resource that can provide a decisive competitive advantage to organizations.
Knowledge management (especially experience feedback) is used to preserve and enhance the information related to a company's activities in order to support decision-making and create new knowledge from the intangible heritage of the organization. In that context, advances in information and communication technologies play an essential role in gathering and processing knowledge. The generalised implementation of industrial information systems such as ERP (Enterprise Resource Planning) systems makes available a large amount of data related to past events, whose reuse is becoming a major issue. However, these fragments of knowledge (past experiences) are highly contextualized and require specific methodologies to be generalized. Given the great potential of the information collected in companies as a source of new knowledge, we propose in this work an original approach for generating new knowledge from the analysis of past experiences, building on the complementarity of two scientific threads: Experience Feedback (EF) and Knowledge Discovery in Databases (KDD). The proposed EF-KDD combination focuses mainly on: i) modelling the collected experiences using a knowledge representation formalism in order to facilitate their future exploitation, and ii) applying data mining techniques in order to extract new knowledge from the experiences in the form of rules. These rules must be evaluated and validated by domain experts before their reuse and/or integration into the industrial system. Throughout this approach, we give a privileged position to Conceptual Graphs (CGs), the knowledge representation formalism chosen to facilitate the storage, processing and understanding of the extracted knowledge by the user. This thesis is divided into four chapters.
The first chapter is a state of the art covering the two scientific threads that contribute to our proposal: EF and KDD. The second chapter presents the proposed EF-KDD approach and the tools used to generate new knowledge from the available information describing past experiences. The third chapter proposes a structured methodology for interpreting and evaluating the usefulness of the knowledge extracted during the post-processing phase of the KDD process. Finally, the last chapter discusses real case studies from the industrial maintenance domain to which the proposed approach has been applied.
Algorithmes automatiques pour la fouille visuelle de données et la visualisation de rÚgles d'association : application aux données aéronautiques
In the past few years, we have seen large-scale data production in many areas, such as social networks and e-business. This recent phenomenon is reinforced by the widespread use of permanently connected devices. The aeronautical field is part of this trend: its growing need for data, driven by the evolution of air traffic management systems and by events, has led to a broad awareness of how data is stored, made available and exploited. Hosting capacity has been adapted and is not a major challenge; the issue now lies in data processing and knowledge extraction. Visual Analytics is an emerging field, stemming from the September 2001 events, that combines automatic and visual approaches in order to benefit simultaneously from human flexibility, creativity and knowledge, and from the processing capacity of computers. This PhD thesis has focused on realizing this combination while giving the operator a central, decision-making role. On the one hand, the user's visual exploration of the data drives the extraction of association rules, which correspond to links between the data. On the other hand, these rules are exploited by automatically configuring the visualization of the data they concern, in order to highlight it. To achieve this, a bidirectional process between data and rules has been formalized and then illustrated on recent air traffic recordings, using the Videam platform that we have developed. By integrating several HMI and algorithmic components in a modular and upgradeable environment, the platform allows interactive exploration of both the data and the association rules.
The user keeps control of the global process, in particular by parameterizing and steering the algorithms.
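The abstract does not detail the rule extraction itself; the standard support and confidence measures behind association rule mining, which such a platform would compute before ranking rules for visualization, can be sketched as follows (names are illustrative):

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent): how often the consequent
    appears among the transactions already containing the antecedent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)
```

A rule « antecedent => consequent » is typically retained only when both measures exceed user-chosen thresholds, which is precisely the kind of parameter an interactive platform would expose to the operator.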