In this work, we present the ChemNLP library, which can be used for 1) curating
open-access datasets for materials and chemistry literature, and for developing
and comparing traditional machine learning, transformer, and graph neural
network models for 2) classifying and clustering texts, 3) named entity recognition for
large-scale text mining, 4) abstractive summarization for generating titles of
articles from abstracts, 5) text generation for suggesting abstracts from
titles, 6) integration with a density functional theory dataset for identifying
potential candidate materials such as superconductors, and 7) web-interface
development for text and reference query. We primarily use the publicly
available arXiv and PubChem datasets, but the tools can be applied to other
datasets as well. Moreover, as new models are developed, they can be easily
integrated into the library. ChemNLP is available at
https://github.com/usnistgov/chemnlp and https://jarvis.nist.gov/jarvischemnlp