25 research outputs found

    Teaching Specific Scientific Knowledge into Large Language Models through Additional Training

    Through additional training, we explore embedding specialized scientific knowledge into the Llama 2 Large Language Model (LLM). Key findings reveal that effective knowledge integration requires reading texts from multiple perspectives, especially in instructional formats. We utilize text augmentation, including style conversions and translations, to tackle the scarcity of specialized texts. Hyperparameter optimization proves crucial, with models of different sizes (7b, 13b, and 70b) reasonably undergoing additional training. To validate our methods, we construct a dataset of 65,000 scientific papers. Although we have succeeded in partially embedding knowledge, the study highlights the complexities and limitations of incorporating specialized information into LLMs, suggesting areas for further improvement. Comment: added token information for some texts, and fixed typ

    Integrating multiple materials science projects in a single neural network

    Traditionally, machine learning for materials science is based on database-specific models and is limited in the number of predictable parameters. Here, a versatile graph-based neural network can integrate multiple data sources, allowing the prediction of more than 40 parameters simultaneously.

    Quantum circuit learning as a potential algorithm to predict experimental chemical properties

    We introduce quantum circuit learning (QCL) as an emerging regression algorithm for chemo- and materials-informatics. The supervised model, functioning on the rules of quantum mechanics, can process linear and smooth non-linear functions from small datasets (< 100 records). Compared with conventional algorithms, such as random forest, support vector machine, and linear regression, QCL can offer better predictions with some one-dimensional functions and experimental chemical databases. Through its superior prediction performance, QCL will potentially help the virtual exploration of new molecules and materials proceed more efficiently.

    Using GPT-4 in Parameter Selection of Materials Informatics: Improving Predictive Accuracy Amidst Data Scarcity and 'Ugly Duckling' Dilemma

    Materials informatics and cheminformatics struggle with data scarcity, hindering the extraction of significant relationships between structures and properties. The "Ugly Duckling" theorem, suggesting the difficulty of data processing without assumptions or prior knowledge, exacerbates this problem. Current methodologies don't entirely bypass this theorem and may lead to decreased accuracy with unfamiliar data. We propose using the OpenAI GPT-4 language model for explanatory variable selection, leveraging its extensive knowledge and logical reasoning capabilities to embed domain knowledge in tasks predicting structure-property correlations, such as the refractive index of polymers. This can partially overcome challenges posed by the "Ugly Duckling" theorem and limited data availability.

    Prompt engineering of GPT-4 for chemical research: what can/cannot be done?

    This paper evaluates the capabilities and limitations of the Generative Pre-trained Transformer 4 (GPT-4) in chemical research. Although GPT-4 exhibits remarkable proficiencies, it is evident that the quality of input data significantly affects its performance. We explore GPT-4's potential in chemical tasks, such as foundational chemistry knowledge, cheminformatics, data analysis, problem prediction, and proposal abilities. While the language model partially outperformed traditional methods, such as black-box optimization, it fell short against specialized algorithms, highlighting the need for their combined use. The paper shares the prompts given to GPT-4 and its responses, providing a resource for prompt engineering within the community, and concludes with a discussion on the future of chemical research using large language models.

    Exploration of organic superionic glassy conductors by process and materials informatics with lossless graph database

    Data-driven material exploration is a ground-breaking research style; however, daily experimental results are difficult to record, analyze, and share. We report a new data platform that losslessly describes the relationships of structures, properties, and processes as graphs in electronic laboratory notebooks. As a model project, organic superionic glassy conductors were explored by recording over 500 different experiments. Automated data analysis revealed the essential factors for a remarkable room temperature ionic conductivity of 10^−4 to 10^−3 S/cm and a lithium transference number of around 0.8. In contrast to previous materials research, everyone can access all the experimental results, including graphs, raw measurement data, and data processing systems, at a public repository. Direct data sharing will improve scientific communication and accelerate integration of material knowledge.

    Automated experiment and data generation by foundation models for synthesizing polyamic acid particles

    This study proposes an automated system for synthesizing polyamic acid particles using a custom liquid-handling device and a robotic arm. Integrating cameras and a multimodal large language model facilitates continuous monitoring and documentation, enhancing objectivity in synthetic experiments and enabling future advancements in experimental research.

    Automated design of Li+-conducting polymer by quantum-inspired annealing

    Automated molecule design by computers has been an essential topic in materials informatics. Still, generating practical structures is not easy because of the difficulty in treating material stability, synthetic difficulty, mechanical properties, and other miscellaneous parameters, often leading to the generation of junk molecules. We tackle the problem by introducing supervised/unsupervised machine learning and quantum-inspired annealing. Our autonomous molecular design system can help experimental researchers discover practical materials more efficiently. Like the human design process, new molecules are explored based on knowledge of existing compounds. A new solid-state polymer electrolyte for lithium-ion batteries is designed and synthesized, giving a promising room temperature conductivity of 10^-5 S/cm with reasonable thermal, chemical, and mechanical properties.