Teaching Specific Scientific Knowledge into Large Language Models through Additional Training
Through additional training, we explore embedding specialized scientific
knowledge into the Llama 2 Large Language Model (LLM). Key findings reveal that
effective knowledge integration requires reading texts from multiple
perspectives, especially in instructional formats. We utilize text augmentation
to tackle the scarcity of specialized texts, including style conversions and
translations. Hyperparameter optimization proves crucial, and models of different
sizes (7B, 13B, and 70B) can all reasonably undergo additional training. To validate
our methods, we construct a dataset of 65,000 scientific papers. Although we
have succeeded in partially embedding knowledge, the study highlights the
complexities and limitations of incorporating specialized information into
LLMs, suggesting areas for further improvement.
Integrating multiple materials science projects in a single neural network
Traditionally, machine learning for materials science has relied on database-specific models and has been limited in the number of predictable parameters. Here, a versatile graph-based neural network integrates multiple data sources, allowing the prediction of more than 40 parameters simultaneously.
Quantum circuit learning as a potential algorithm to predict experimental chemical properties
We introduce quantum circuit learning (QCL) as an emerging regression algorithm for chemo- and materials informatics. The supervised model, which operates on the rules of quantum mechanics, can fit linear and smooth non-linear functions from small datasets (< 100 records). Compared with conventional algorithms such as random forest, support vector machine, and linear regression, QCL can offer better predictions for some one-dimensional functions and experimental chemical databases. With its superior prediction performance, QCL may make the virtual exploration of new molecules and materials more efficient.
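To make the idea of a trainable quantum circuit used as a regressor concrete, here is a minimal classical simulation in Python. It is only a sketch under strong assumptions: one qubit, an `acos(x)` RY encoding, a single trainable RY angle, and synthetic targets generated by the same circuit with a hidden angle. None of these choices, and none of the function names, come from the paper.

```python
import math

def ry(state, angle):
    """Apply a single-qubit RY(angle) rotation to a real 2-amplitude state."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    a, b = state
    return (c * a - s * b, s * a + c * b)

def predict(x, theta):
    """Encode x, apply the trainable rotation, and return the expectation of Z."""
    state = (1.0, 0.0)               # start in |0>
    state = ry(state, math.acos(x))  # data encoding (assumed choice); needs -1 <= x <= 1
    state = ry(state, theta)         # trainable layer with one parameter
    a, b = state
    return a * a - b * b             # <Z> = P(0) - P(1)

def train(xs, ys, theta=0.0, lr=0.2, steps=300):
    """Minimise the mean squared error by gradient descent on theta."""
    for _ in range(steps):
        grad = 0.0
        for x, y in zip(xs, ys):
            # Parameter-shift rule: exact derivative of <Z> with respect to theta.
            d_pred = (predict(x, theta + math.pi / 2)
                      - predict(x, theta - math.pi / 2)) / 2
            grad += 2 * (predict(x, theta) - y) * d_pred
        theta -= lr * grad / len(xs)
    return theta

# Synthetic toy data (not from the paper): targets come from a hidden angle.
HIDDEN_ANGLE = 0.5
xs = [-0.8, -0.4, 0.0, 0.4, 0.8]
ys = [predict(x, HIDDEN_ANGLE) for x in xs]
theta_fit = train(xs, ys)
```

On this toy problem the fitted angle should recover the hidden one from only five records. A realistic QCL model would use several qubits, entangling gates, and more parameters, evaluated on quantum hardware or a full simulator.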
Using GPT-4 in Parameter Selection of Materials Informatics: Improving Predictive Accuracy Amidst Data Scarcity and 'Ugly Duckling' Dilemma
Materials informatics and cheminformatics struggle with data scarcity, which hinders the extraction of significant relationships between structures and properties. The "Ugly Duckling" theorem, which suggests that data processing is difficult without assumptions or prior knowledge, exacerbates this problem. Current methodologies don't entirely bypass this theorem and may lose accuracy on unfamiliar data. We propose using OpenAI's GPT-4 language model for explanatory variable selection, leveraging its extensive knowledge and logical reasoning capabilities to embed domain knowledge in tasks that predict structure-property correlations, such as the refractive index of polymers. This can partially overcome the challenges posed by the "Ugly Duckling" theorem and limited data availability.
Prompt engineering of GPT-4 for chemical research: what can/cannot be done?
This paper evaluates the capabilities and limitations of the Generative Pre-trained Transformer 4 (GPT-4) in chemical research. Although GPT-4 exhibits remarkable proficiencies, it is evident that the quality of input data significantly affects its performance. We explore GPT-4's potential in chemical tasks, such as foundational chemistry knowledge, cheminformatics, data analysis, problem prediction, and proposal abilities. While the language model partially outperformed traditional methods, such as black-box optimization, it fell short against specialized algorithms, highlighting the need for their combined use. The paper shares the prompts given to GPT-4 and its responses, providing a resource for prompt engineering within the community, and concludes with a discussion on the future of chemical research using large language models.
Exploration of organic superionic glassy conductors by process and materials informatics with lossless graph database
Data-driven material exploration is a ground-breaking research style; however, daily experimental results are difficult to record, analyze, and share. We report a new data platform that losslessly describes the relationships of structures, properties, and processes as graphs in electronic laboratory notebooks. As a model project, organic superionic glassy conductors were explored by recording over 500 different experiments. Automated data analysis revealed the essential factors for a remarkable room temperature ionic conductivity of 10^−4 to 10^−3 S/cm and a lithium transference number of around 0.8. In contrast to previous materials research, everyone can access all the experimental results, including graphs, raw measurement data, and data processing systems, at a public repository. Direct data sharing will improve scientific communication and accelerate integration of material knowledge.
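As a rough illustration of recording structures, processes, and properties as one graph, the following Python sketch stores an experiment as typed nodes and labeled edges and then walks back from a measured property to everything that led to it. The class name, node kinds, relation labels, and the example record are all hypothetical placeholders. They do not reflect the platform's actual schema or any data from the paper.

```python
from collections import defaultdict

class ExperimentGraph:
    """Toy directed graph of structures, processes, and properties."""

    def __init__(self):
        self.nodes = {}                 # node_id -> attribute dict
        self.edges = defaultdict(list)  # source -> [(relation, target), ...]

    def add_node(self, node_id, kind, **attrs):
        self.nodes[node_id] = {"kind": kind, **attrs}

    def add_edge(self, source, relation, target):
        self.edges[source].append((relation, target))

    def upstream(self, node_id):
        """Return every node from which node_id can be reached."""
        reverse = defaultdict(list)
        for src, outs in self.edges.items():
            for _, dst in outs:
                reverse[dst].append(src)
        seen, stack = set(), [node_id]
        while stack:
            for parent in reverse[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

# Hypothetical record; names and values are placeholders, not data from the paper.
g = ExperimentGraph()
g.add_node("monomer", "structure")
g.add_node("mixing", "process")
g.add_node("drying", "process", temperature_c=60)  # placeholder value
g.add_node("sample", "structure")
g.add_node("conductivity", "property", unit="S/cm")
g.add_edge("monomer", "input_of", "mixing")
g.add_edge("mixing", "followed_by", "drying")
g.add_edge("drying", "yields", "sample")
g.add_edge("sample", "has_property", "conductivity")

provenance = g.upstream("conductivity")
```

Because each step stays connected to the result it produced, a query such as `g.upstream("conductivity")` can recover the full preparation history of a measurement. That traceability is the kind of relationship a graph record is meant to preserve.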
Automated experiment and data generation by foundation models for synthesizing polyamic acid particles
This study proposes an automated system for synthesizing polyamic acid particles using a custom liquid-handling device and a robotic arm. Integrating cameras and a multimodal large language model facilitates continuous monitoring and documentation, enhancing objectivity in synthetic experiments and enabling future advancements in experimental research.
Automated design of Li+-conducting polymer by quantum-inspired annealing
Automated molecule design by computers has been an essential topic in materials informatics. Still, generating practical structures is not easy because material stability, synthetic difficulty, mechanical properties, and other miscellaneous parameters are hard to handle, which often leads to the generation of junk molecules. We tackle the problem by introducing supervised/unsupervised machine learning and quantum-inspired annealing. Our autonomous molecular design system can help experimental researchers discover practical materials more efficiently. Like the human design process, new molecules are explored based on knowledge of existing compounds. A new solid-state polymer electrolyte for lithium-ion batteries is designed and synthesized, giving a promising room temperature conductivity of 10^-5 S/cm with reasonable thermal, chemical, and mechanical properties.
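Annealing-based optimizers typically search over binary variables that minimize a quadratic cost (a QUBO problem). The Python sketch below uses plain classical simulated annealing as a stand-in for a quantum-inspired annealer. It picks exactly `k` candidate fragments with the highest total score. The fragment scores, the penalty weight, and the problem encoding are illustrative assumptions, not the formulation used in the paper.

```python
import itertools
import math
import random

def build_qubo(scores, k, penalty):
    """Upper-triangular QUBO for: maximise the total score with exactly k bits set.
    Expands -sum(s_i x_i) + penalty * (sum(x_i) - k)**2, dropping the constant."""
    n = len(scores)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        q[i][i] = -scores[i] + penalty * (1 - 2 * k)
        for j in range(i + 1, n):
            q[i][j] = 2 * penalty
    return q

def qubo_energy(q, bits):
    n = len(bits)
    return sum(q[i][j] * bits[i] * bits[j] for i in range(n) for j in range(i, n))

def anneal(q, steps=2000, t_start=5.0, t_end=0.05, seed=0):
    """Single-bit-flip simulated annealing with geometric cooling."""
    rng = random.Random(seed)
    n = len(q)
    bits = [rng.randint(0, 1) for _ in range(n)]
    energy = qubo_energy(q, bits)
    best_bits, best_energy = bits[:], energy
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n)
        bits[i] ^= 1                      # propose flipping one bit
        new_energy = qubo_energy(q, bits)
        if new_energy <= energy or rng.random() < math.exp((energy - new_energy) / t):
            energy = new_energy           # accept the move
            if energy < best_energy:
                best_bits, best_energy = bits[:], energy
        else:
            bits[i] ^= 1                  # reject: undo the flip
    return best_bits, best_energy

def brute_force(q):
    """Exhaustive minimum, feasible only for tiny problems; used as a check."""
    n = len(q)
    return min(qubo_energy(q, list(b)) for b in itertools.product([0, 1], repeat=n))

# Hypothetical fragment scores (placeholders, not values from the paper).
q = build_qubo(scores=[3.0, 1.0, 2.0, 0.5], k=2, penalty=4.0)
best_bits, best_energy = anneal(q)
```

For a four-variable problem the annealer's result can be compared against `brute_force(q)`. Real design problems have far too many variables for exhaustive search, which is where annealing hardware or quantum-inspired solvers become relevant.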