    First Elements on Knowledge Discovery guided by Domain Knowledge (KDDK)

    In this paper, we present research trends in the Orpailleur team at Loria, showing how knowledge discovery and knowledge processing may be combined. The knowledge discovery in databases (KDD) process consists in processing a large volume of data to extract significant and reusable knowledge units. From a knowledge representation perspective, the KDD process may take advantage of domain knowledge embedded in ontologies relative to the domain of the data, leading to the notion of "knowledge discovery guided by domain knowledge", or KDDK. The KDDK process is based on the classification process (and its multiple forms), e.g. for modeling, representing, reasoning, and discovering. Some applications are detailed, showing how KDDK can be instantiated in an application domain. Finally, an architecture of an integrated KDDK system is proposed and discussed.

    Ontology-guided data preparation for discovering genotype-phenotype relationships

    The complexity of post-genomic data and the multiplicity of mining strategies are two limits to Knowledge Discovery in Databases (KDD) in the life sciences. Because they provide a semantic frame for data and benefit from the progress of semantic web technologies, bio-ontologies should be considered as playing a key role in the KDD process. In a case study on the search for genotype-phenotype relationships, we demonstrate the capability of bio-ontologies to guide data selection during the preparation step of the KDD process. We propose three scenarios to illustrate how domain knowledge can be taken into account in order to select or aggregate the data to mine, and consequently how it can facilitate the interpretation of results at the end of the process.

    Linked data and online classifications to organise mined patterns in patient data

    In this paper, we investigate the use of web data resources in medicine, especially medical classifications made available using the principles of Linked Data, to support the interpretation of patterns mined from patient care trajectories. Interpreting such patterns is naturally a challenge for an analyst, as it requires going through large amounts of results and having access to sufficient background knowledge. We employ linked data, especially as exposed through the BioPortal system, to create a navigation structure within the patterns obtained from sequential pattern mining. We show how this approach provides a flexible way to explore data about trajectories of diagnoses and treatments according to different medical classifications.

    Why and How Knowledge Discovery Can Be Useful for Solving Problems with CBR

    In this talk, we discuss and illustrate the links between knowledge discovery in databases (KDD), knowledge representation and reasoning (KRR), and case-based reasoning (CBR). KDD techniques, especially those based on Formal Concept Analysis (FCA), are well formalized and allow the design of concept lattices from binary and complex data. These concept lattices provide a realistic basis for knowledge base organization and ontology engineering. More generally, they can be used for representing knowledge and for reasoning in knowledge systems and CBR systems alike.

    Semantic Indexing and Retrieval based on Formal Concept Analysis

    Semantic indexing and retrieval has become an important research area as the amount of information available on the Web keeps growing. In this paper, we introduce an original approach to semantic indexing and retrieval based on Formal Concept Analysis. The concept lattice is used as a semantic index, and we propose an original algorithm for traversing the lattice and answering user queries. This framework has been used and evaluated on song datasets.
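
    To make the idea of a concept lattice used as an index concrete, here is a minimal Python sketch. It is not the traversal algorithm proposed in the paper: the toy context, the song names and the functions `extent`, `intent`, `all_concepts` and `answer` are all hypothetical, and the concepts are enumerated by brute force, which is only workable for very small contexts.

```python
from itertools import combinations

# Hypothetical toy formal context: song -> set of descriptive attributes.
CONTEXT = {
    "song_a": {"rock", "guitar", "1970s"},
    "song_b": {"rock", "guitar"},
    "song_c": {"jazz", "piano"},
    "song_d": {"rock", "piano"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def extent(attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in CONTEXT.items() if attrs <= a)


def intent(objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = set(ATTRIBUTES)
    for o in objs:
        shared &= CONTEXT[o]
    return frozenset(shared)


def all_concepts():
    """Enumerate formal concepts by closing every attribute subset (toy sizes only)."""
    concepts = set()
    attrs = sorted(ATTRIBUTES)
    for r in range(len(attrs) + 1):
        for subset in combinations(attrs, r):
            ext = extent(frozenset(subset))
            concepts.add((ext, intent(ext)))
    return concepts


def answer(query):
    """Return the concept (extent, intent) whose intent is the closure of the query."""
    ext = extent(frozenset(query))
    return ext, intent(ext)
```

    In this sketch, a query on `{"guitar"}` lands on the concept whose extent is `song_a` and `song_b` and whose intent is `rock` and `guitar`, so the index also reveals that every guitar song in the toy context is a rock song. A real system would navigate the lattice order rather than recompute closures for each query.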

    Intelligence artificielle: Les défis actuels et l'action d'Inria - Livre blanc Inria

    Inria White Paper No. 01. Inria white papers look at major current challenges in informatics and mathematics and show the actions conducted by our project-teams to address these challenges. This document is the first produced by the Strategic Technology Monitoring & Prospective Studies Unit. Thanks to a reactive observation system, this unit plays a lead role in supporting Inria in developing its strategic and scientific orientations. It also enables the institute to anticipate the impact of digital sciences on all social and economic domains. The white paper was coordinated by Bertrand Braunschweig, with contributions from 45 researchers from Inria and from our partners. Special thanks to Peter Sturm for his precise and complete review. Thanks also to the STIP service of the Saclay – Île-de-France centre for the final proofreading of the French version.

    Atténuation des biais involontaires dans les modèles de langage masqué

    Algorithmic fairness is currently one of the most debated topics in artificial intelligence, primarily because of its importance and its critical impact on various aspects of human lives. One of the most discussed problems is the unintended bias observed in complex machine learning models, which has adverse effects in fields ranging from healthcare to policing. This work focuses on devising novel methodologies for mitigating unintended bias found in language models. It mainly concerns the gender bias associated with occupations, but it can be extended to other kinds of bias, such as demographic or racial bias. As the definition of bias is subjective and varies case by case, it is challenging to measure and reduce biases in pre-trained language models. This work proposes an advanced architecture based on Deep Reinforcement Learning to mitigate unintended bias in pre-trained language models. The proposed architecture tackles these challenges without compromising performance and defines a system that is portable and easy to adapt. The thesis is organised as follows. After defining the problem statement in Chapter 1, we lay down the prerequisites for this study in Chapter 2. In the following two chapters, we detail the proposed architecture and the experimental setup used for evaluation. Chapter 5 comprises a discussion based on the results and possible explanations of the model's behaviour. Finally, in Chapter 6, we discuss the limitations as well as promising perspectives for future work.

    Multilayer waveguide modes based analysis of 2-D photonic crystals-pertinent to modelling PCSELs

    Semiconductor lasers that combine large output power, single-mode operation and good beam quality are very often desired. The photonic crystal surface emitting laser (PCSEL) has shown significant promise and has received much attention as a way of achieving devices with these characteristics. Evaluating the resonant modes of the structure is a primary requirement in modelling PCSELs. However, conventional techniques such as PWE, CMT and FDTD are either computationally very time consuming or mathematically rather intensive. The aim of this thesis is to develop a new model for evaluating the resonance of a 2-D photonic crystal, pertinent to the lasing mode of a PCSEL. This aim is achieved by first studying the wave characteristics of a 1-D periodic structure and understanding the eigenmode and eigenfunction of both infinite and finite periodic structures. It is shown that the eigenmode of the infinite periodic structure is the Bloch mode, while the eigenmode of the finite periodic structure is represented by an optical-tunnelling type of solution. The solutions correspond to the characteristic impedance of the periodic structure. The concept of the eigenmode of a finite periodic structure is then used to establish the 2-D model of the photonic crystal. The essential underlying concept of the analysis procedure presented in this work is to view the 2-D photonic crystal as a laterally periodic multilayer waveguide which is longitudinally segmented. The model agrees well with conventional models and proves to be versatile, efficient and fast: 500×500 periods take ~7 min on a laptop (2 cores at 1.70 GHz, negligible memory usage), compared with 5 h for 20×20 periods using FDTD on a supercomputer system (12 cores, 24 GB RAM). Thus, the model has the potential to generate more comprehensive models of photonic crystal based devices. Experimental work, including fabrication and characterisation, further confirmed the validity of the model. A PCSEL with external reflection is also studied experimentally. It is shown that the lasing characteristics can be modified through the introduced external reflection.
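
    As a small illustration of the Bloch mode of an infinite 1-D periodic structure mentioned above, the following Python sketch evaluates the standard textbook dispersion relation for a lossless two-layer period at normal incidence, cos(KΛ) = cos(k1 d1) cos(k2 d2) − ½ (n1/n2 + n2/n1) sin(k1 d1) sin(k2 d2). It is not the multilayer waveguide model developed in the thesis; the function names and the refractive indices used in the usage note are assumptions chosen for illustration only.

```python
import math


def bloch_cosine(wavelength, n1, d1, n2, d2):
    """Return cos(K * Lambda) for a two-layer period at normal incidence.

    n1, n2 are real refractive indices and d1, d2 the layer thicknesses,
    in the same length unit as the vacuum wavelength (lossless layers assumed).
    """
    k1 = 2 * math.pi * n1 / wavelength
    k2 = 2 * math.pi * n2 / wavelength
    return (math.cos(k1 * d1) * math.cos(k2 * d2)
            - 0.5 * (n1 / n2 + n2 / n1) * math.sin(k1 * d1) * math.sin(k2 * d2))


def in_band_gap(wavelength, n1, d1, n2, d2):
    """True when |cos(K * Lambda)| > 1, i.e. K is complex and the Bloch wave is evanescent."""
    return abs(bloch_cosine(wavelength, n1, d1, n2, d2)) > 1.0
```

    For a quarter-wave stack (d1 = λ/(4 n1), d2 = λ/(4 n2)) with hypothetical indices n1 = 3.5 and n2 = 3.0, `in_band_gap` returns `True` at the design wavelength: the Bloch wavenumber is complex there, which is the evanescent regime that a finite structure can only sustain as a tunnelling-type solution.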

    Bogoliubov's Vision: Quasiaverages and Broken Symmetry to Quantum Protectorate and Emergence

    In the present interdisciplinary review we focus on the applications of symmetry principles to quantum and statistical physics in connection with some other branches of science. The profound and innovative idea of quasiaverages formulated by N.N. Bogoliubov gives the so-called macro-objectivation of the degeneracy in the domain of quantum statistical mechanics, quantum field theory and quantum physics in general. We discuss the complementary unifying ideas of modern physics, namely spontaneous symmetry breaking, quantum protectorate and emergence. The interrelation of the concepts of symmetry breaking, quasiaverages and quantum protectorate is analyzed in the context of quantum theory and statistical physics. The chief purposes of this paper are to demonstrate the connection and interrelation of these conceptual advances in many-body physics and to try to show explicitly that these concepts, though different in detail, have certain common features. Several problems in the statistical physics of complex materials and systems (e.g. the chirality of molecules) and the foundations of the microscopic theory of magnetism and superconductivity are discussed in relation to these ideas.

    A Theoretical Framework and Practical Toolkit for Ethical Library Assessment

    Library assessment practitioners face dual pressures to demonstrate library value and adhere to library values. To support a practice of library assessment that addresses both library value and library values, this dissertation examines the practice of library assessment through the lens of practical ethics and applied values. The main research question asks, "How can library assessment be practiced ethically?" I followed a three-step research design: a literature review, a survey, and interviews. The literature review focuses on the ethics, values, dilemmas, and practices of library assessment practitioners. A vignette-based survey further investigated values and ethics in assessment. Survey data was analyzed through constructivist grounded theory, and the resulting set of codes established a new framework and toolkit for ethical library assessment. Finally, the toolkit, named the Values-Sensitive Library Assessment Toolkit, was validated through interviews with assessment practitioners. Research findings indicate that library assessment practitioners seek an ethical practice but are challenged by a complex and decentralized values landscape that offers many competing choices for identifying and implementing values. The toolkit serves to model a value set that practitioners can apply to support an ethical assessment practice.