24 research outputs found

    Proceedings of the International Workshop "What can FCA do for Artificial Intelligence?" (FCA4AI 2014)

    This is the third edition of the FCA4AI workshop, whose first edition was organized at the ECAI 2012 Conference (Montpellier, August 2012) and second edition at the IJCAI 2013 Conference (Beijing, August 2013; see http://www.fca4ai.hse.ru/). Formal Concept Analysis (FCA) is a mathematically well-founded theory aimed at data analysis and classification that can be used for many purposes, especially for Artificial Intelligence (AI) needs. The objective of the workshop is to investigate two main issues: how can FCA support various AI activities (knowledge discovery, knowledge representation and reasoning, learning, data mining, NLP, information retrieval), and how can FCA be extended in order to help AI researchers solve new and complex problems in their domains.
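    The central FCA construction pairs each set of objects with the set of attributes they all share. A minimal sketch in Python, using an invented toy context (the animals and attributes are illustrative, not taken from the workshop proceedings):

```python
from itertools import combinations

# Toy formal context: objects -> set of attributes they have.
context = {
    "duck":  {"flies", "swims", "lays_eggs"},
    "eagle": {"flies", "lays_eggs"},
    "trout": {"swims", "lays_eggs"},
}
attributes = {"flies", "swims", "lays_eggs"}

def extent(attrs):
    """Objects possessing every attribute in attrs."""
    return {o for o, a in context.items() if attrs <= a}

def intent(objs):
    """Attributes common to every object in objs."""
    common = set(attributes)
    for o in objs:
        common &= context[o]
    return common

def all_concepts():
    """Naive enumeration: close every attribute subset (fine for tiny contexts)."""
    concepts = set()
    attrs = sorted(attributes)
    for r in range(len(attrs) + 1):
        for combo in combinations(attrs, r):
            objs = extent(set(combo))
            concepts.add((frozenset(objs), frozenset(intent(objs))))
    return concepts

concepts = all_concepts()
```

    Enumerating closed attribute subsets this way is exponential and only workable for tiny contexts; practical FCA tools use dedicated algorithms such as Ganter's NextClosure.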

    31st International Conference on Information Modelling and Knowledge Bases

    Information modelling is becoming an increasingly important topic for researchers, designers, and users of information systems. The amount and complexity of information itself, the number of abstraction levels of information, and the size of databases and knowledge bases are continuously growing. Conceptual modelling is one of the sub-areas of information modelling. The aim of this conference is to bring together experts from different areas of computer science and other disciplines who have a common interest in understanding and solving problems on information modelling and knowledge bases, as well as in applying the results of research to practice. We also aim to recognize and study new areas of modelling and knowledge bases to which more attention should be paid. Therefore philosophy and logic, cognitive science, knowledge management, linguistics and management science are relevant areas, too. The conference will feature three categories of presentations: full papers, short papers and position papers.

    Concept Trees: Building Dynamic Concepts from Semi-Structured Data using Nature-Inspired Methods

    This paper describes a method for creating structure from heterogeneous sources, as part of an information database or, more specifically, a 'concept base'. Structures called 'concept trees' can grow from the semi-structured sources when consistent sequences of concepts are presented. They might be considered dynamic databases, possibly a variation on the distributed Agent-Based or Cellular Automata models, or even related to Markov models. Semantic comparison of text is required, but the trees can be built largely from automatic knowledge and statistical feedback. This reduced model might also be attractive for security or privacy reasons, as not all of the potential data gets saved. The construction process maintains the key requirement of generality, allowing it to be used as part of a generic framework. The nature of the method also means that some level of optimisation or normalisation of the information will occur. This invites comparisons with databases or knowledge bases, but a database system would first model its environment or datasets and then populate the database with instance values. The concept base deals with a more uncertain environment and therefore cannot fully model it beforehand; the model itself evolves over time. Similar to databases, it also needs a good indexing system, where the construction process provides memory and indexing structures. These allow more complex concepts to be automatically created, stored and retrieved, possibly as part of a more cognitive model. There are also some arguments, or more abstract ideas, for merging physical-world laws into these automatic processes.
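    The paper's exact construction is not reproduced here, but the idea of trees that grow and are reinforced when consistent concept sequences are presented can be sketched as follows (the class design and the count-based reinforcement rule are illustrative assumptions):

```python
class ConceptNode:
    def __init__(self, name):
        self.name = name
        self.count = 0       # reinforcement from repeated presentations
        self.children = {}   # concept name -> ConceptNode

class ConceptTree:
    """Minimal sketch: a tree that grows when concept sequences are
    presented, reinforcing paths that recur consistently."""
    def __init__(self, root_name):
        self.root = ConceptNode(root_name)

    def add_sequence(self, concepts):
        node = self.root
        node.count += 1
        for c in concepts:
            node = node.children.setdefault(c, ConceptNode(c))
            node.count += 1

    def strongest_path(self):
        """Follow the most-reinforced child at each level."""
        path, node = [self.root.name], self.root
        while node.children:
            node = max(node.children.values(), key=lambda n: n.count)
            path.append(node.name)
        return path

tree = ConceptTree("document")
tree.add_sequence(["title", "author", "abstract"])
tree.add_sequence(["title", "author", "keywords"])
tree.add_sequence(["title", "date"])
```

    Because only counts and structure are kept, not every raw datum, this also hints at the privacy-friendly "reduced model" the abstract mentions.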

    Security Infrastructure Technology for Integrated Utilization of Big Data

    This open access book describes the technologies needed to construct a secure big data infrastructure that connects data owners, analytical institutions, and user institutions in a circle of trust. It begins by discussing the most relevant technical issues involved in creating safe and privacy-preserving big data distribution platforms, and especially focuses on cryptographic primitives and privacy-preserving techniques, which are essential prerequisites. The book also covers elliptic curve cryptosystems, which offer compact public key cryptosystems; and LWE-based cryptosystems, which are a type of post-quantum cryptosystem. Since big data distribution platforms require appropriate data handling, the book also describes a privacy-preserving data integration protocol and privacy-preserving classification protocol for secure computation. Furthermore, it introduces an anonymization technique and privacy risk evaluation technique. This book also describes the latest related findings in both the living safety and medical fields. In the living safety field, to prevent injuries occurring in everyday life, it is necessary to analyze injury data, find problems, and implement suitable measures. But most cases don’t include enough information for injury prevention because the necessary data is spread across multiple organizations, and data integration is difficult from a security standpoint. This book introduces a system for solving this problem by applying a method for integrating distributed data securely and introduces applications concerning childhood injury at home and school injury. In the medical field, privacy protection and patient consent management are crucial for all research. The book describes a medical test bed for the secure collection and analysis of electronic medical records distributed among various medical institutions. 
The system promotes big-data analysis of medical data with a cloud infrastructure and includes various security measures developed in our project to avoid privacy violations.
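    As a hedged illustration of the LWE-based cryptosystems the book covers, here is a toy single-bit scheme in the style of Regev's construction; the parameters are far too small to be secure and are chosen only so the arithmetic is easy to follow:

```python
import random

q, n, m = 97, 4, 8          # toy parameters: modulus, dimension, samples

def keygen(rng):
    s = [rng.randrange(q) for _ in range(n)]                  # secret key
    A, b = [], []
    for _ in range(m):
        a = [rng.randrange(q) for _ in range(n)]
        e = rng.choice([-1, 0, 1])                            # small error
        A.append(a)
        b.append((sum(x * y for x, y in zip(a, s)) + e) % q)
    return s, (A, b)                                          # public key: (A, b)

def encrypt(pub, bit, rng):
    A, b = pub
    subset = [i for i in range(m) if rng.random() < 0.5]      # random sample subset
    c1 = [sum(A[i][j] for i in subset) % q for j in range(n)]
    c2 = (sum(b[i] for i in subset) + bit * (q // 2)) % q
    return c1, c2

def decrypt(s, ct):
    c1, c2 = ct
    v = (c2 - sum(x * y for x, y in zip(c1, s))) % q
    # accumulated noise is at most m, far below q//4, so decode to the
    # nearest multiple of q//2
    return 0 if v < q // 4 or v > q - q // 4 else 1

rng = random.Random(0)
secret, public = keygen(rng)
```

    Decryption succeeds because the accumulated error (at most m in magnitude) stays well below q/4, so v lands near 0 for a 0-bit and near q/2 for a 1-bit.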

    Data-driven 2d materials discovery for next-generation electronics

    The development of material discovery and design has spanned centuries of human history. After modern chemistry and materials science were established, the strategy of material discovery relied on experiments. Such a strategy has become expensive and time-consuming given the number of materials of interest today. Therefore, a novel strategy that is faster and more comprehensive is urgently needed. In this dissertation, an experiment-guided material discovery strategy is developed and explained using metal-organic frameworks (MOFs) as an example. The advent of π-stacked layered MOFs, which offer electrical conductivity on top of permanent porosity and high surface area, opened up new horizons for designing compact MOF-based devices such as battery electrodes, supercapacitors, and spintronics. Structural building blocks, including metal nodes and organic linkers in these electrically conductive (EC) MOFs, are recognized, and taking permutations among the building blocks results in new systems with unprecedented and unexplored physical and chemical properties. With the ultimate goal of providing a platform for accelerated material design and discovery, here the foundation is laid for the creation of the first comprehensive database of EC MOFs with an experimentally guided approach. The first phase of this database, coined EC-MOF/Phase-I, is composed of 1,057 bulk and monolayer structures built from all possible combinations of experimentally reported organic linkers, functional groups, and metal nodes. A high-throughput (HT) workflow is constructed to implement density functional theory calculations with periodic boundary conditions to optimize the structures and calculate some of their most relevant properties.
Because research and development in the area of EC MOFs has long suffered from the lack of appropriate initial crystal structures, all of the geometries and property data have been made available to the community through an online platform developed during the course of this work. This database provides comprehensive physical and chemical data on EC-MOFs as well as the convenience of selecting appropriate materials for specific applications, thus accelerating the design and discovery of EC MOF-based compact devices. Machine learning (ML), a technique for learning patterns in numerical data and making predictions, can be utilized in material discovery. Taking advantage of the EC-MOF Database, ML is adopted to predict property data that would otherwise require expensive calculations, using the crystal structures alone. The ML predictions are much faster than the HT workflow as the number of structures grows.
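    The dissertation's ML models are not specified in the abstract; as a generic sketch of structure-to-property prediction, here is a k-nearest-neighbour regressor over invented feature vectors (the features, values, and their meanings are purely hypothetical, not EC-MOF data):

```python
import math

# Hypothetical training data: (feature vector, property value). Features
# might stand for e.g. (metal atomic number, linker length, pore size);
# a real workflow would use far richer structural descriptors.
train = [
    ((29, 3, 1.2), 0.8),
    ((29, 4, 1.5), 0.9),
    ((28, 3, 1.1), 0.5),
    ((30, 5, 1.8), 1.4),
]

def knn_predict(x, k=2):
    """Predict a property as the mean over the k nearest training structures."""
    dists = sorted((math.dist(x, feats), value) for feats, value in train)
    nearest = dists[:k]
    return sum(v for _, v in nearest) / k
```

    Once trained (here, simply stored), such a model answers property queries in microseconds, which is the speed advantage over the HT workflow that the abstract points to.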

    Towards interoperability in heterogeneous database systems

    Distributed heterogeneous databases consist of systems which differ physically and logically, containing different data models and data manipulation languages. Although these databases are independently created and administered, they must cooperate and interoperate. Users need to access and manipulate data from several databases, and applications may require data from a wide variety of independent databases. Therefore, a new system architecture is required to manipulate and manage distinct and multiple databases in a transparent way, while preserving their autonomy. This report contains an extensive survey on heterogeneous databases, analysing and comparing the different aspects, concepts and approaches related to the topic. It introduces an architecture to support interoperability among heterogeneous database systems. The architecture avoids the use of a centralised structure to assist in the different phases of the interoperability process. It aims to support scalability and to assure privacy and confidentiality of the data. The proposed architecture allows the databases to decide when to participate in the system, what type of data to share and with which other databases, thereby preserving their autonomy. The report also describes an approach to information discovery in the proposed architecture, without using any centralised structures such as repositories and dictionaries, and without broadcasting to all databases. It attempts to reduce the number of databases searched and to preserve the privacy of the shared data. The main idea is to visit a database that either contains the requested data or knows about another database that possibly contains this data.
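    The discovery idea, visiting a database that either holds the data or refers the query to a peer it believes does, can be sketched roughly like this (the class design and referral scheme are illustrative assumptions, not the report's architecture):

```python
class Database:
    """Sketch of decentralised discovery: each database either answers a
    query or forwards it to a peer it believes holds the data. No central
    registry and no broadcast are used."""
    def __init__(self, name, data, referrals=None):
        self.name = name
        self.data = data                  # key -> value this database shares
        self.referrals = referrals or {}  # key -> peer believed to hold it

    def query(self, key, visited=None):
        visited = visited or set()
        if self.name in visited:          # avoid referral cycles
            return None
        visited.add(self.name)
        if key in self.data:
            return (self.name, self.data[key])
        peer = self.referrals.get(key)
        return peer.query(key, visited) if peer else None

# Three autonomous databases; only db_a is queried directly.
db_c = Database("C", {"patients": "records_c"})
db_b = Database("B", {}, referrals={"patients": db_c})
db_a = Database("A", {"sensors": "records_a"}, referrals={"patients": db_b})
```

    Only the databases on the referral chain are visited, which matches the report's goal of reducing the number of databases searched while keeping each one in control of what it shares.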

    Protein microenvironments for topology analysis

    Previously held under moratorium from 1st December 2016 until 1st December 2021. Amino Acid Residues are often the focus of research on protein structures. However, in a folded protein, each residue finds itself in an environment that is defined by the properties of its surrounding residues. The term microenvironment is used herein to refer to these local ensembles. Not only do they have chemical properties but also topological properties which quantify concepts such as density, boundaries between domains and junction complexity. These quantifications are used to project a protein’s backbone structure into a series of scores. The hypothesis was that these sequences of scores can be used to discover protein domains and motifs and that they can be used to align and compare groups of 3D protein structures. This research sought to implement a system that could compute microenvironments efficiently enough to be applied routinely to large datasets. The computation of the microenvironments was the most challenging aspect in terms of performance, and the optimisations required are described. Methods of scoring microenvironments were developed to enable the extraction of domain and motif data without 3D alignment. The problem of allosteric site detection was addressed with a classifier that gave high rates of allosteric site detection. Overall, this work describes the development of a system that scales well with increasing dataset sizes. It builds on existing techniques in order to automatically detect the boundaries of domains, and demonstrates the ability to process large datasets by application to allosteric site detection, a problem that has not previously been adequately solved.
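    The thesis's actual scoring functions are not given in the abstract; as a crude stand-in for a density-style microenvironment score, one can count the neighbours within a distance cutoff of each residue (the coordinates and cutoff below are invented for illustration):

```python
import math

# Toy C-alpha coordinates for a short chain (purely illustrative; real
# microenvironments would use full residue properties, not just positions).
residues = [
    ("ALA", (0.0, 0.0, 0.0)),
    ("GLY", (3.8, 0.0, 0.0)),
    ("SER", (7.6, 0.0, 0.0)),
    ("LEU", (3.8, 3.8, 0.0)),
]

def density_scores(chain, cutoff=5.0):
    """Score each residue by how many neighbours fall within the cutoff,
    a crude stand-in for a microenvironment 'density' measure."""
    scores = []
    for i, (_, xyz_i) in enumerate(chain):
        n = sum(
            1
            for j, (_, xyz_j) in enumerate(chain)
            if i != j and math.dist(xyz_i, xyz_j) <= cutoff
        )
        scores.append(n)
    return scores
```

    Projecting the backbone into such a per-residue score sequence is what allows domain and motif comparison without full 3D alignment.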

    Artificial intelligence for crystal structure prediction

    Predicting the ground-state and metastable crystal structures of materials from just knowing their composition is a formidable challenge in computational materials discovery. Recent studies that were published in the group of M. Scheffler have investigated how the relative stability of compounds between two crystal-structure types can be predicted from the properties of their atomic constituents within the framework of symbolic regression. By using a novel compressed-sensing-based method, the sure independence screening and sparsifying operator (SISSO), the descriptor that best captured the structural stability was identified from billions of candidates. A descriptor is a vector of analytical formulas built from simple physical quantities. In the first part of the thesis, a multi-task-learning extension of SISSO (MT-SISSO) that enables the treatment of the structural stability of compounds among multiple structure types is introduced. We show how the multi-task method that identifies a single descriptor for all structure types enables the prediction of a well-defined structural stability and, therefore, the design of a crystal-structure map. Moreover, we present how MT-SISSO determines accurate, predictive models even when trained with largely incomplete databases. A different artificial-intelligence approach proposed for tackling the crystal-structure prediction challenge is based on approximating the Born-Oppenheimer potential-energy surface (PES). In particular, Gaussian Approximation Potentials that are typically composed of a combination of two-, three-, and many-body potentials and fitted to elemental systems have attracted attention in recent years. First examples that were published in the group of G. Csanyi have demonstrated how the ground-state and metastable phases could correctly be identified for Si, C, P, and B, by exploring the PES that was predicted by such machine-learning potentials (ML potentials). 
However, the ML potentials introduced so far show limited transferability, i.e., their accuracy rapidly decreases in regions of the PES that are distant from the training data. As a consequence, these ML potentials are usually fitted to large training databases. Moreover, such training data needs to be constructed for every new material (more precisely, tuple of species types) that was not in the initial training database, since the chemical-species information does not enter the ML potentials in the form of a variable. The second part of the thesis introduces a neural-network-based scheme to make ML potentials, specifically two- and three-body potentials, explicitly chemical-species-type dependent. We call the models chemical transferable potentials (CTP). The methodology enables the prediction of materials not included in the training data. As a showcase example, we consider a set of binary materials. The thesis tackles two challenges at the same time: a) the prediction of the PES of a material not contained in the training data and b) the construction of robust models from a limited set of crystal structures. In particular, our tests examine to what extent the ML potentials that were trained on such sparse data allow an accurate prediction of regions of the PES that are far from the training data (in the structural space) but are sampled in a global crystal-structure search. When performing both constrained structure searches among a set of considered crystal-structure prototypes and an unbiased global structure search, we find that missing data in those regions does not hinder our models from identifying the ground-state phases of materials, even if the materials are not in the training data. Moreover, we compare our method to two state-of-the-art ML methods that, similarly to CTP, are capable of predicting the potential energies of materials not included in the training data.
These are the extension of the smooth overlap of atomic positions by an alchemical similarity kernel (ASOAP), introduced in the group of M. Ceriotti, and the crystal graph convolutional neural networks (CGCNN), introduced in the group of J. C. Grossman. In the literature so far, ASOAP and CGCNN have been benchmarked on single-point energy calculations but have not been investigated in combination with global, unbiased structure-search scenarios. We include ASOAP and CGCNN in our structure-search tests. Our analysis reveals that, unlike CTP, these two approaches learn unphysical shapes of the PES in regions that surround the training data and are typically sampled in a structure-search application. This shortcoming is particularly evident in the unbiased global-search scenario.
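    SISSO itself screens billions of candidate formulas with compressed sensing; a drastically simplified sketch of the underlying idea, ranking a handful of hand-built candidate descriptors by correlation with an invented stability target (all features, formulas, and values below are illustrative assumptions):

```python
import math

# Hypothetical primary features per compound: (radius_A, radius_B,
# electronegativity_A, electronegativity_B), with a stability target.
compounds = [
    ((1.0, 0.5, 2.0, 3.0), 0.50),
    ((1.2, 0.6, 1.8, 3.2), 0.58),
    ((0.9, 0.7, 2.5, 2.8), 0.23),
    ((1.4, 0.5, 1.5, 3.5), 0.88),
]

# Candidate descriptors: small analytic formulas over the primary
# features, in the spirit of (but far simpler than) SISSO's search space.
candidates = {
    "rA - rB":   lambda f: f[0] - f[1],
    "rA / rB":   lambda f: f[0] / f[1],
    "|xA - xB|": lambda f: abs(f[2] - f[3]),
    "rA * xA":   lambda f: f[0] * f[2],
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def best_descriptor():
    """Pick the formula whose values correlate best with the target."""
    targets = [t for _, t in compounds]
    scored = {
        name: abs(pearson([f(feats) for feats, _ in compounds], targets))
        for name, f in candidates.items()
    }
    return max(scored, key=scored.get)
```

    The real method differs in essential ways (sure independence screening, sparsifying operators, vector-valued descriptors), but the core loop of generating analytic candidates and keeping the best-fitting one is the same.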

    Subsurface Characterization by Means of Geovisual Analytics

    This thesis is concerned with one of the major problems in subsurface characterization arising from the ever-increasing loads of data in recent decades: what kinds of technologies are well suited to extracting novel, valid and useful knowledge from persistent data repositories for the characterization of subsurface regions, and how can such technologies be implemented in an integrated, community-open software platform? In order to address those questions, an interactive, open-source software platform for geoscientific knowledge discovery has been developed, which enables domain experts to generate, optimize and validate prognostic models of the subsurface domain. Such a free tool has been missing in the geoscientific community so far. The extensible software platform GeoReVi (Geological Reservoir Virtualization) implements selected aspects of geovisual analytics, with special attention paid to an implementation of the knowledge-discovery-in-databases process. With GeoReVi the human expert can model and visualize static and dynamic systems in the subsurface in a feedback cycle. The created models can be analyzed and parameterized by means of modern approaches from geostatistics and data mining. Hence, knowledge that is useful both for the assessment of subsurface potentials and for supporting decision-making during the utilization of subsurface regions can be extracted and exchanged in a formalized manner. The modular software application is composed of integrated and centralized databases, a graphical user interface and business logic. In order to achieve low computing times despite the high computational complexity of spatial problems, the software system makes intensive use of parallelism and asynchronous programming.
The competitiveness of industries that aim to utilize the subsurface in unexplored regions, such as geothermal energy production or carbon capture and storage, depends especially on the quality of spatial forecasts for relevant rock and fluid properties. Thus, the focus of this work has been laid upon the implementation of algorithms which enhance the predictability of properties in space under consideration of uncertainty. The software system was therefore evaluated in numerous real-world scenarios by solving problems from scientific, educational and industrial projects. The implemented software system shows an excellent suitability for generically addressing spatial problems such as interpolation or stochastic simulation under consideration of numerical uncertainty. In this context, GeoReVi served as a tool for discovering new knowledge, with special regard to investigating the heterogeneity of rock media on multiple scales of investigation. Among others, it could be demonstrated that the three-dimensional scalar fields of different petrophysical and geochemical properties in sandstone media may diverge significantly at small scales. In fact, if the small-scale variability is not considered in field-scale projects, in which the sampling density is usually low, statistical correlations and thus empirical relationships might be spurious. Furthermore, it could be demonstrated that the simple kriging variance, which is used to simulate the natural variability in sequential simulations, systematically underestimates the intrinsic variability of the investigated sandstone media. If the small-scale variability can be determined by high-resolution sampling, it can be used to enhance conditional simulations at the scale of depositional environments.
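    The simple kriging variance mentioned above can be illustrated with a short sketch; the exponential covariance model and its parameters are assumptions for illustration, not GeoReVi's actual implementation:

```python
import math

def cov(h, sill=1.0, corr_range=10.0):
    """Exponential covariance model: practically zero beyond corr_range."""
    return sill * math.exp(-3.0 * h / corr_range)

def simple_kriging_variance(sample_xs, x0, sill=1.0, corr_range=10.0):
    """Simple-kriging variance at x0 from 1-D sample locations.
    Solves C w = c0 by Gaussian elimination (fine for tiny systems),
    then returns sigma^2 = C(0) - w . c0."""
    n = len(sample_xs)
    C = [[cov(abs(a - b), sill, corr_range) for b in sample_xs] for a in sample_xs]
    c0 = [cov(abs(a - x0), sill, corr_range) for a in sample_xs]
    # Gaussian elimination with partial pivoting on the augmented system.
    M = [row[:] + [rhs] for row, rhs in zip(C, c0)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return sill - sum(wi * ci for wi, ci in zip(w, c0))
```

    The variance is zero at a sampled location and approaches the sill far from all samples, as expected for simple kriging; it depends only on the sampling geometry, not on the measured values, which is one reason it can misrepresent intrinsic small-scale variability.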

    Disaster risk reduction and post disaster infrastructure reconstruction in Sri Lanka

    Disasters resulting from natural hazards such as floods, droughts, earthquakes and cyclones impact societies in several ways, damaging lives and infrastructure and resulting in financial and environmental losses. Therefore, preventing disasters by reducing disaster risk is critically important for minimising the impact of disasters. A proactive stance to reduce the toll of disasters requires an approach with both pre-disaster risk reduction and post-disaster recovery. The world is gradually shifting from disaster response to such a proactive approach to disaster management, and integration of disaster risk reduction has been identified as a key priority within the post-disaster reconstruction process. Accordingly, the main aim of this paper, which forms part of a doctoral research project, is to draw attention to the importance of integrating disaster risk reduction into post-disaster infrastructure reconstruction. Infrastructure reconstruction programs aim to change vulnerable conditions to support the development of the country. It is well recognised that all critical infrastructure facilities must be designed to a given level of safety from disaster impact. The research on which this paper is based aims to reveal the contribution of integrating disaster risk reduction into post-disaster infrastructure reconstruction to economic development. This paper presents the disaster risk reduction strategies used in general and specifically in the post-tsunami infrastructure sector in Sri Lanka, and examines their success rate. Further, the paper discusses the challenges associated with integrating disaster risk reduction into post-tsunami infrastructure reconstruction projects.