230 research outputs found

    Generative retrieval-augmented ontologic graph and multi-agent strategies for interpretive large language model-based materials design

    Transformer neural networks show promising capabilities, in particular for materials analysis, design and manufacturing, including their capacity to work effectively with human language, symbols, code, and numerical data. Here we explore the use of large language models (LLMs) as a tool that can support engineering analysis of materials, applied to retrieving key information about subject areas, developing research hypotheses, discovering mechanistic relationships across disparate areas of knowledge, and writing and executing simulation codes for active knowledge generation based on physical ground truths. When used as sets of AI agents with specific features, capabilities, and instructions, LLMs can provide powerful problem-solving strategies for analysis and design problems. Our experiments focus on a fine-tuned model, MechGPT, developed from training data in the mechanics-of-materials domain. We first confirm that fine-tuning endows LLMs with a reasonable understanding of domain knowledge. However, when queried outside the context of learned material, LLMs can struggle to recall correct information. We show how this can be addressed using retrieval-augmented Ontological Knowledge Graph strategies that discern which concepts the model considers important and how they are related. Illustrated for a use case relating distinct areas of knowledge (here, music and proteins), such strategies can also provide an interpretable graph structure with rich information at the node, edge and subgraph level. We discuss nonlinear sampling strategies and agent-based modeling applied to complex question answering, code generation and execution in the context of automated force-field development from actively learned Density Functional Theory (DFT) modeling, and data analysis.
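The ontological-knowledge-graph idea above (linking music to proteins through intermediate concepts) can be sketched as a small graph search. The triples and relation names here are invented for illustration, not taken from MechGPT's actual output:

```python
from collections import defaultdict, deque

# Hypothetical concept triples of the kind an ontological knowledge-graph
# strategy might distil from LLM output: (subject, relation, object).
triples = [
    ("protein", "is_composed_of", "amino acid sequence"),
    ("amino acid sequence", "maps_to", "note sequence"),
    ("note sequence", "forms", "melody"),
    ("melody", "is_part_of", "music"),
]

def build_graph(triples):
    """Adjacency list over concepts; edges keep their relation labels."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
        graph[obj].append((rel, subj))  # treat relations as traversable both ways
    return graph

def find_path(graph, start, goal):
    """Breadth-first search for the chain of concepts linking two distant ideas."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _, nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

graph = build_graph(triples)
path = find_path(graph, "music", "protein")
```

A path returned by such a search is itself interpretable: each hop names the relation that justifies the connection between the two knowledge areas.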

    Computational acquisition of knowledge in small-data environments: a case study in the field of energetics

    The UK’s defence industry is accelerating its implementation of artificial intelligence, including expert systems and natural language processing (NLP) tools designed to supplement human analysis. This thesis examines the limitations of NLP tools in the small-data environments common to defence, focusing on the defence-related energetic-materials domain. A literature review identifies the domain-specific challenges of developing an expert system (specifically an ontology): the absence of domain resources such as labelled datasets and, most significantly, the preprocessing of text resources. To address the latter, a novel general-purpose preprocessing pipeline tailored to the energetic-materials domain is developed and its effectiveness evaluated. Whether NLP tools in data-limited environments can supplement, or entirely replace, human analysis is examined in a study of the subjective concept of importance. A methodology for directly comparing the ability of NLP tools and experts to identify important points in a text is presented. Results show that the study participants exhibit little agreement, even on which points in the text are important. The NLP tools, the expert (the author of the text being examined) and the participants agree only on general statements; however, as a group, the participants agreed with the expert. In data-limited environments, the extractive-summarisation tools examined cannot identify the important points in a technical document as an expert would. A methodology for classifying journal articles by the technology readiness level (TRL) of the described technologies in a data-limited environment is proposed, and techniques to overcome challenges of real-world data, such as class imbalance, are investigated. A methodology to evaluate the reliability of human annotations is presented. Analysis identifies a lack of agreement and consistency in the expert evaluation of document TRL.
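The agreement comparison described above can be sketched as set overlap between the sentences each rater marks as important. The raters, sentence indices, and the use of Jaccard similarity are illustrative assumptions, not the thesis's actual methodology or data:

```python
# Hypothetical sentence indices each rater marked as "important" in one
# technical document; the study compares how much such selections overlap.
selections = {
    "extractive_tool": {0, 3, 7, 9},
    "domain_expert":   {0, 2, 7, 11},
    "participant_A":   {1, 2, 7, 12},
}

def jaccard(a, b):
    """Overlap of two selections: |intersection| / |union|, in [0, 1]."""
    return len(a & b) / len(a | b)

# Pairwise agreement between every pair of raters.
raters = list(selections)
pairwise = {
    (r1, r2): round(jaccard(selections[r1], selections[r2]), 3)
    for i, r1 in enumerate(raters)
    for r2 in raters[i + 1:]
}
```

Low pairwise scores of this kind would reflect the finding that raters agree on little beyond general statements.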

    A Quadruple-Based Text Analysis System for History and Philosophy of Science

    Computational tools in the digital humanities often work either on the macro-scale, enabling researchers to analyze huge amounts of data, or on the micro-scale, supporting scholars in the interpretation and analysis of individual documents. The research system developed in the context of this dissertation (the "Quadriga System") bridges these two extremes by offering tools to support close reading and interpretation of texts, while at the same time providing a means for collaboration and data collection that could lead to analyses based on big datasets. In the field of history of science, researchers usually work with unstructured data such as texts or images. To analyze such data computationally, it first has to be transformed into a machine-understandable format. The Quadriga System is based on the idea of representing texts as graphs of contextualized triples (or quadruples). These graphs (or networks) can then be mathematically analyzed and visualized. This dissertation describes two projects that use the Quadriga System for the analysis and exploration of texts and the creation of social networks. Furthermore, a model for digital humanities education is proposed that brings together students from the humanities and computer science in order to develop user-oriented, innovative tools, methods, and infrastructures.
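A contextualized triple can be sketched as a simple data structure: the usual subject-predicate-object statement plus a fourth slot recording where, or by whom, the relation was asserted. The field names and example statements below are illustrative, not the Quadriga System's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Quadruple:
    """A contextualized triple: subject-predicate-object plus the context
    (e.g. the passage or interpreter) in which the relation was asserted."""
    subject: str
    predicate: str
    obj: str
    context: str

# Two readings of the same text can coexist as distinct quadruples,
# because the context slot keeps the interpretations apart.
q1 = Quadruple("Darwin", "corresponded_with", "Hooker", "annotation_of_1844_letter")
q2 = Quadruple("Darwin", "influenced", "Hooker", "interpretation_by_editor_B")

def to_edge(q):
    """Project a quadruple onto a labelled graph edge for network analysis."""
    return (q.subject, q.obj, {"relation": q.predicate, "context": q.context})
```

Projecting quadruples onto labelled edges is what lets close-reading annotations accumulate into networks that can be analyzed at scale.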

    A database of battery materials auto-generated using ChemDataExtractor

    Funder: University of Cambridge | Christ's College; doi: https://doi.org/10.13039/501100000590
    A database of battery materials is presented which comprises a total of 292,313 data records, with 214,617 unique chemical-property data relations between 17,354 unique chemicals and up to five material properties: capacity, voltage, conductivity, Coulombic efficiency and energy. Of these, 117,403 data records are multivariate on a property, where the property is the dependent variable in part of a data series. The database was auto-generated by mining the text of 229,061 academic papers using the chemistry-aware natural language processing toolkit ChemDataExtractor version 1.5, which was modified for the specific domain of batteries. The collected data can be used as a representative overview of the battery-material information contained within the text of scientific papers. Public availability of these data will also enable battery-materials design and prediction via data-science methods. To the best of our knowledge, this is the first auto-generated database of battery materials extracted from a relatively large number of scientific papers. We also provide a Graphical User Interface (GUI) to aid the use of this database.
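Records in a chemical-property database of this shape can be queried with a few lines of code. The entries below are invented stand-ins (the values are not from the actual database), assuming each record links one chemical to one of the five properties:

```python
from statistics import mean

# Illustrative records in the shape the abstract describes; values invented.
records = [
    {"chemical": "LiFePO4", "property": "capacity", "value": 160.0, "unit": "mAh/g"},
    {"chemical": "LiFePO4", "property": "voltage",  "value": 3.4,   "unit": "V"},
    {"chemical": "LiCoO2",  "property": "capacity", "value": 140.0, "unit": "mAh/g"},
    {"chemical": "LiCoO2",  "property": "voltage",  "value": 3.9,   "unit": "V"},
]

def property_values(records, prop):
    """All values reported for one property across the database."""
    return [r["value"] for r in records if r["property"] == prop]

def chemicals_with(records, prop):
    """The set of chemicals for which the property was extracted."""
    return {r["chemical"] for r in records if r["property"] == prop}

avg_capacity = mean(property_values(records, "capacity"))
```

Aggregations like this (distributions of a property across thousands of chemicals) are the kind of data-science use the public release is meant to enable.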

    A decision support system for eco-efficient biorefinery process comparison using a semantic approach

    Enzymatic hydrolysis of the main components of lignocellulosic biomass is one of the promising methods for upgrading it into biofuels. Biomass pre-treatment is an essential step to reduce cellulose crystallinity, increase surface area and porosity, and separate the major constituents of biomass. Scientific literature in this domain is growing fast and could be a valuable source of data. As these abundant scientific data are mostly in textual format and heterogeneously structured, using them to compute biomass pre-treatment efficiency is not straightforward. This paper presents the implementation of a Decision Support System (DSS) based on an original pipeline coupling knowledge engineering (KE) based on semantic web technologies, soft-computing techniques and environmental-factor computation. The DSS makes it possible to use data found in the literature to assess the environmental sustainability of biorefinery systems. The pipeline allows one to: (1) structure and integrate relevant experimental data, (2) assess data-source reliability, and (3) compute and visualize green indicators taking into account data imprecision and source reliability. This pipeline has been made possible by innovative research on the coupling of ontologies with uncertainty management and propagation. In this first version, data acquisition is done by experts and facilitated by a termino-ontological resource. Data-source reliability assessment is based on domain knowledge and done by experts. The operational prototype has been used by field experts on a realistic use case (rice straw), and the results obtained have validated the usefulness of the system. Further work will address a higher level of automation for data acquisition and data-source reliability assessment.
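Step (3) above, computing an indicator that accounts for source reliability, can be sketched as a reliability-weighted aggregate. The measurements, the [0, 1] reliability scores, and the weighted mean itself are illustrative assumptions; the paper's actual uncertainty propagation is more sophisticated than this:

```python
# Hypothetical yield measurements for one pre-treatment, each taken from a
# different paper with an expert-assigned reliability score in [0, 1].
observations = [
    {"value": 0.62, "reliability": 0.9},
    {"value": 0.55, "reliability": 0.6},
    {"value": 0.70, "reliability": 0.3},
]

def weighted_indicator(observations):
    """Reliability-weighted mean: trusted sources dominate the indicator."""
    total_w = sum(o["reliability"] for o in observations)
    return sum(o["value"] * o["reliability"] for o in observations) / total_w

indicator = round(weighted_indicator(observations), 4)
```

The effect is that a low-reliability outlier (here 0.70 at weight 0.3) pulls the indicator far less than a well-trusted measurement would.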

    A Semantic e-Science Platform for 20th Century Paint Conservation


    Antennas and Electromagnetics Research via Natural Language Processing.

    Advanced natural language processing (NLP) techniques are being utilised to devise a methodology for collecting and analysing data derived from scientific literature. Despite significant advances in automated database generation and analysis in material chemistry and physics, the application of NLP techniques to metamaterial discovery, antenna design, and wireless communications remains in its early stages. This thesis proposes several novel approaches to advance research in material science. Firstly, an NLP method has been developed to automatically extract keywords from large-scale unstructured texts in the area of metamaterial research. This enables the uncovering of trends and relationships between keywords, facilitating the establishment of future research directions. In addition, a neural network model based on the encoder-decoder Long Short-Term Memory (LSTM) architecture has been trained to predict future research directions and provide insights into the influence of metamaterials research; this model lays the groundwork for a research roadmap of metamaterials. Furthermore, a novel weighting system has been designed to evaluate article attributes in antenna and propagation research, enabling more accurate assessment of the impact of each scientific publication. This approach goes beyond conventional numeric metrics to produce more meaningful predictions. Secondly, a framework has been proposed that leverages text summarisation, one of the primary NLP tasks, to enhance the quality of scientific reviews. It has been applied to review recent developments in antennas and propagation for body-centric wireless communications, and the validation is available for comparison with well-referenced text-summarisation datasets. Lastly, the effectiveness of automated database building in the domain of tunable materials and their properties has been demonstrated. The collected database will be used as input for training a surrogate machine learning model in an iterative active-learning cycle. This model will be utilised to facilitate high-throughput material processing, with the ultimate goal of discovering novel materials exhibiting high tunability. The approaches proposed in this thesis will help accelerate the discovery of new materials and enhance their applications in antennas, with the potential to transform electromagnetic-material research.
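The keyword-extraction step described first can be sketched with a classic TF-IDF scoring pass. The corpus below is invented toy text, and TF-IDF is one common baseline for this task, not necessarily the method the thesis uses:

```python
import math
from collections import Counter

# Toy abstracts standing in for a metamaterial-paper corpus (invented text).
corpus = [
    "metamaterial absorber design for terahertz antenna applications",
    "wideband antenna design using machine learning optimisation",
    "tunable metamaterial surface for wireless communication systems",
]

def tfidf_keywords(corpus, top_n=3):
    """Score each word by term frequency x inverse document frequency,
    then return the highest-scoring keywords per document."""
    docs = [doc.split() for doc in corpus]
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {w: (tf[w] / len(doc)) * math.log(n_docs / df[w]) for w in tf}
        ranked = sorted(scores, key=scores.get, reverse=True)
        results.append(ranked[:top_n])
    return results

keywords = tfidf_keywords(corpus)
```

Words shared across every document (e.g. "antenna" in two of three abstracts here scores low, and a word in all three scores zero) are suppressed, so the surviving keywords are the document-specific terms whose co-occurrence trends can then be tracked over time.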

    Spatial ontologies for architectural heritage

    Informatics and artificial intelligence have generated new requirements for digital archiving, information, and documentation. Semantic interoperability has become fundamental for the management and sharing of information. Constraints on data interpretation enable both database interoperability, for the sharing and reuse of data and schemas, and information retrieval in large datasets. Another challenging issue is the exploitation of automated-reasoning possibilities. The solution is the use of domain ontologies as a reference for data modelling in information systems. This thesis considers the architectural heritage (AH) domain. Documentation in this field, particularly complex and multifaceted, is well known to be critical for the preservation, knowledge, and promotion of monuments. For these reasons, digital inventories, also exploiting standards and new semantic technologies, are being developed by international organisations (the Getty Institute, the UN, the European Union). Geometric and geographic information is an essential part of a monument's documentation and comprises a number of aspects (spatial, topological, and mereological relations; accuracy; multi-scale representation; time; etc.). Currently, geomatics permits the acquisition of very accurate and dense 3D models (possibly enriched with textures) and derived products, in both raster and vector format. Many standards have been published for the geographic field and the cultural-heritage domain. However, the former are limited in their foreseen representation scales (the finest is achieved by OGC CityGML), and their semantic values do not capture the full semantic richness of AH. The latter (especially the core ontology CIDOC-CRM, the Conceptual Reference Model of the Documentation Committee of the International Council of Museums) were designed to document museum objects. Although CIDOC-CRM was recently extended to standing buildings and given a spatial extension, the integration of complex 3D models has not yet been achieved. This thesis analyses the aspects (especially spatial issues) to consider in the documentation of monuments. In light of these, OGC CityGML is extended for the management of AH complexity. An approach 'from the landscape to the detail' is used, considering the monument within a wider system, which is essential for analysis and reasoning about such complex objects. An implementation test is conducted on a case study, preferring open-source applications.
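Extending CityGML means adding heritage-specific elements in their own XML namespace alongside the standard building module. The sketch below is purely illustrative: the "bldg" namespace follows CityGML 2.0's building module, while the "ah" namespace and its element names are invented here and are not the thesis's actual extension schema:

```python
import xml.etree.ElementTree as ET

# "bldg" is CityGML 2.0's building-module namespace; "ah" is a hypothetical
# architectural-heritage extension namespace invented for this sketch.
NS = {
    "bldg": "http://www.opengis.net/citygml/building/2.0",
    "ah": "http://example.org/architectural-heritage",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

# A standard building carrying an extension element for a heritage feature.
building = ET.Element(f"{{{NS['bldg']}}}Building")
vault = ET.SubElement(building, f"{{{NS['ah']}}}HeritageElement")
vault.set("type", "vault")
accuracy = ET.SubElement(vault, f"{{{NS['ah']}}}surveyAccuracy")
accuracy.text = "0.02"  # metres, e.g. from a dense laser-scanning survey

xml_text = ET.tostring(building, encoding="unicode")
```

Keeping heritage semantics in a separate namespace is what lets standard CityGML tools still parse the file while extended tools read the AH-specific detail.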