15 research outputs found

    Per- and Polyfluoroalkyl Substances (PFAS) in PubChem: 7 Million and Growing.

    Peer reviewed
    Per- and polyfluoroalkyl substances (PFAS) are of high concern, with calls to regulate them as a class. In 2021, the Organisation for Economic Co-operation and Development (OECD) revised the definition of PFAS to include any chemical containing at least one saturated CF2 or CF3 moiety. As a consequence, PubChem, one of the largest open chemical collections with 116 million compounds, now contains over 7 million PFAS under this revised definition. These numbers are several orders of magnitude higher than previously established PFAS lists (typically thousands of entries) and pose an enormous challenge to researchers and computational workflows alike. This article describes a dynamic, openly accessible effort to navigate and explore the >7 million PFAS and >21 million fluorinated compounds (September 2023) in PubChem by establishing the "PFAS and Fluorinated Compounds in PubChem" Classification Browser (or "PubChem PFAS Tree"). A total of 36,500 nodes support browsing of the content by several categories, including classification, structural properties, regulatory status, and presence in existing PFAS suspect lists. Additional annotation and associated data can be used to create subsets (and thus manageable suspect lists or databases) of interest for a wide range of environmental, regulatory, exposomics, and other applications.
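    The revised OECD criterion can be checked directly on a molecular graph. The following is a minimal sketch, not the implementation behind the PubChem PFAS Tree: it assumes a hypothetical representation in which each atom is an `(element, implicit_hydrogen_count)` pair and each bond an `(i, j, order)` triple, and it flags a structure if any saturated carbon carries at least two fluorines and no H, Cl, Br or I atom.

```python
from collections import defaultdict

# Atoms that, if bonded to the candidate carbon, disqualify it under the OECD (2021) wording.
DISQUALIFYING = {"H", "Cl", "Br", "I"}


def is_oecd_pfas(atoms, bonds):
    """Return True if any saturated carbon carries >= 2 F and no H/Cl/Br/I.

    atoms: list of (element, implicit_hydrogen_count) tuples (hypothetical format)
    bonds: list of (atom_i, atom_j, bond_order) tuples
    """
    neighbours = defaultdict(list)
    for i, j, order in bonds:
        neighbours[i].append((j, order))
        neighbours[j].append((i, order))

    for idx, (element, implicit_h) in enumerate(atoms):
        if element != "C" or implicit_h != 0:
            continue  # not carbon, or the carbon carries hydrogen
        nbrs = neighbours[idx]
        # A saturated (sp3) carbon without hydrogens has exactly four single bonds.
        if len(nbrs) != 4 or any(order != 1 for _, order in nbrs):
            continue
        nbr_elements = [atoms[j][0] for j, _ in nbrs]
        if any(el in DISQUALIFYING for el in nbr_elements):
            continue
        if nbr_elements.count("F") >= 2:
            return True  # fully fluorinated CF2/CF3-type carbon found
    return False


# Trifluoroacetic acid, CF3-C(=O)OH
tfa_atoms = [("C", 0), ("F", 0), ("F", 0), ("F", 0), ("C", 0), ("O", 0), ("O", 1)]
tfa_bonds = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1), (4, 5, 2), (4, 6, 1)]
print(is_oecd_pfas(tfa_atoms, tfa_bonds))  # True
```

    Under this sketch, difluoromethane (hydrogens on the carbon), chlorotrifluoromethane (Cl attached) and 1,1-difluoroethene (unsaturated CF2 carbon) are not flagged.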

    Dioxin2023 Plenary: Exploring Millions of PFAS with FAIR and Open Science

    Editorial reviewed
    Plenary presentation for Dioxin 2023 in Maastricht, Tuesday 12 September: "Exploring Millions of PFAS with FAIR and Open Science". This presentation features a soundtrack created by Jamie Perera (slide 27) on "Our Chemical Past, Present and Future", which can be downloaded from Vimeo (video) or SoundCloud (sound only). Please leave feedback there if you enjoy it.

    ELIXIR and Toxicology: a community in development [version 2; peer review: 2 approved]

    Toxicology has been an active research field for many decades, with academic, industrial and government involvement. Modern omics and computational approaches are changing the field from merely disease-specific observational models to target-specific predictive models. Traditionally, toxicology has strong links with other fields such as biology, chemistry, pharmacology, and medicine. With the rise of synthetic and new engineered materials, alongside ongoing prioritisation needs in chemical risk assessment for existing chemicals, early predictive evaluations are becoming of utmost importance for both scientific and regulatory purposes. ELIXIR is an intergovernmental organisation that brings together life science resources from across Europe. Establishing a new ELIXIR Community is seen as instrumental to coordinating the various life science efforts around modern predictive toxicology. In the past few years, joint efforts building on incidental overlap have been piloted in the context of ELIXIR. For example, EU-ToxRisk, diXa, HeCaToS, transQST, and the nanotoxicology community have worked with the ELIXIR TeSS, Bioschemas, and Compute Platforms and activities. In 2018, a core group of interested parties wrote a proposal outlining what this new ELIXIR Toxicology Community would look like. A recent workshop (held September 30th to October 1st, 2020) extended this into an ELIXIR Toxicology roadmap and a shortlist of limited-investment, high-gain collaborations to give shape to this new community. This Whitepaper outlines the results of these efforts and defines our vision of the ELIXIR Toxicology Community and how it complements other ELIXIR activities.

    PubChemLite for Exposomics

    PubChemLite is a subset of PubChem (https://pubchem.ncbi.nlm.nih.gov/) selected from major categories of the Table of Contents (TOC) page of the PubChem Classification Browser (https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72). With this release, there is now just one "exposomics" flavour: the former tier1 plus two new categories (Associated Disorders & Diseases, and Identification). PubChemLite "exposomics" contains 371,663 compounds (31 Oct 2020) compiled from 10 categories: AgroChemInfo, BioPathway, DrugMedicInfo, FoodRelated, PharmacoInfo, SafetyInfo, ToxicityInfo, KnownUse, DisorderDisease and Identification. PubChem CIDs have been collapsed by the first block of the InChIKey, reporting the structure of the most annotated CID plus the related CIDs. Entries that MetFrag would ignore (salts, disconnected substances) or that cause errors (e.g. transition metals) have been removed. The Patent and PubMed ID counts are extracted from files on the PubChem FTP site. The "AnnoTypeCount" term counts how many of the categories are represented; each subsequent column (named per category) counts the number of annotation categories available in the next sub-category of that TOC entry. These files can be used "as is" as a localCSV for MetFrag Command Line (https://ipb-halle.github.io/MetFrag/). Please do NOT upload these files directly to the web interface: they are too large, and will instead be made available in a drop-down menu. Further details are described in Schymanski et al. (2021) DOI:10.1186/s13321-021-00489-0. NOTE: The latest PubChemLite for Exposomics version can be downloaded at DOI:10.5281/zenodo.5995885 (currently updated monthly).
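    The filtering and collapsing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PubChemLite build code: each record is a hypothetical dict with `cid`, `inchikey`, `smiles` and a `categories` mapping of per-category annotation counts; disconnected structures are detected by a `.` in the SMILES (transition-metal filtering is omitted); and "most annotated" is approximated by the number of represented categories (AnnoTypeCount), with ties going to the lowest CID.

```python
from collections import defaultdict


def anno_type_count(categories):
    """Number of annotation categories with at least one entry (cf. "AnnoTypeCount")."""
    return sum(1 for count in categories.values() if count > 0)


def is_single_component(smiles):
    """Disconnected structures (e.g. salts) contain '.' in their SMILES."""
    return "." not in smiles


def collapse_by_first_block(records):
    """Group records on the first InChIKey block and keep one representative per group."""
    groups = defaultdict(list)
    for rec in records:
        if not is_single_component(rec["smiles"]):
            continue  # dropped: disconnected entries would be ignored downstream
        first_block = rec["inchikey"].split("-")[0]
        groups[first_block].append(rec)

    collapsed = []
    for first_block, members in groups.items():
        # Assumption: "most annotated" = highest AnnoTypeCount; ties go to the lowest CID.
        best = max(members, key=lambda r: (anno_type_count(r["categories"]), -r["cid"]))
        collapsed.append({
            "first_block": first_block,
            "cid": best["cid"],
            "smiles": best["smiles"],
            "related_cids": sorted(r["cid"] for r in members if r["cid"] != best["cid"]),
            "anno_type_count": anno_type_count(best["categories"]),
        })
    return sorted(collapsed, key=lambda entry: entry["cid"])
```

    Grouping on the first 14-character InChIKey block merges entries that share the same skeleton but differ in layers such as stereochemistry, so stereoisomers end up as related CIDs of a single entry.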

    Empowering Large Chemical Knowledge Bases for Exposomics: PubChemLite Meets MetFrag

    Compound (or chemical) databases are an invaluable resource for many scientific disciplines. Exposomics researchers need to find and identify relevant chemicals that cover the entirety of potential (chemical and other) exposures over entire lifetimes. This daunting task, with over 100 million chemicals in the largest chemical databases, coupled with broadly acknowledged knowledge gaps in these resources, leaves researchers with too much, yet not enough, information to perform comprehensive exposomics research. Furthermore, improvements in analytical technologies and computational mass spectrometry workflows, coupled with the rapid growth of databases and increasing demand from the research community for high-throughput “big data” services, present significant challenges for both data hosts and workflow developers. This article explores how to reduce candidate search spaces in non-target small molecule identification workflows while increasing content usability for environmental and exposomics analyses, so as to profit from the increasing size and information content of large compound databases while also gaining efficiency. These methods are explored using PubChem, the NORMAN Network Suspect List Exchange and the in silico fragmentation approach MetFrag. A subset of the PubChem database relevant for exposomics, PubChemLite, is presented as a database resource that can be (and has been) integrated into current workflows for high resolution mass spectrometry. Benchmarking datasets from earlier publications are used to show how experimental knowledge and existing datasets can be used to detect and fill gaps in compound databases, progressively improving large resources such as PubChem as well as topic-specific subsets such as PubChemLite.
    PubChemLite is a living collection, updated as annotation content in PubChem is updated, and exported to allow direct integration into existing workflows such as MetFrag. The source code and files necessary to recreate or adjust it are jointly hosted by the research parties (see data availability statement). This effort shows that enhancing the FAIRness (Findability, Accessibility, Interoperability and Reusability) of open resources can mutually enhance several resources for the benefit of the whole community. The authors explicitly welcome additional community input on ideas for future developments.
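    How a candidate search space is drawn from a compound database can be illustrated with a precursor mass-window query: a smaller, topic-specific subset simply contributes fewer entries to each window. The sketch below is hypothetical: it assumes each candidate is a dict with `monoisotopic_mass`, `patent_count` and `pubmed_count` keys (the latter two mirroring the Patent and PubMed ID counts mentioned for PubChemLite), selects candidates within a ppm tolerance of a neutral monoisotopic mass, and orders them by the summed counts. This ranking is an illustrative stand-in, not MetFrag's actual weighted scoring.

```python
def ppm_window(mass, ppm):
    """Return the (low, high) bounds of a +/- ppm window around a mass."""
    delta = mass * ppm * 1e-6
    return mass - delta, mass + delta


def select_candidates(candidates, neutral_mass, ppm=5.0):
    """Keep candidates inside the mass window, most-referenced first (illustrative ranking)."""
    low, high = ppm_window(neutral_mass, ppm)
    in_window = [c for c in candidates if low <= c["monoisotopic_mass"] <= high]
    # Hypothetical metadata score: total patent + PubMed references.
    return sorted(in_window, key=lambda c: c["patent_count"] + c["pubmed_count"], reverse=True)
```

    Tightening `ppm` or swapping in a smaller candidate list shrinks the set that any downstream in silico fragmentation has to score.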

    Retrospective non-target analysis to support regulatory water monitoring: from masses of interest to recommendations via in silico workflows

    Background: Applying non-target analysis (NTA) in regulatory environmental monitoring remains challenging: instead of asking exploratory questions, regulators usually already have specific questions related to environmental protection aims. Additionally, data analysis can seem overwhelming because of the large data volumes and many steps required. This work aimed to establish an open in silico workflow to identify environmental chemical unknowns via retrospective NTA within the scope of a pre-existing Swiss environmental monitoring campaign focusing on industrial chemicals. The research question addressed immediate regulatory priorities: identify pollutants with industrial point sources occurring at the highest intensities over two time points. Samples from 22 wastewater treatment plants obtained in 2018 and measured using liquid chromatography–high resolution mass spectrometry were retrospectively analysed by (i) performing peak-picking to identify masses of interest, (ii) prescreening and quality-controlling spectra, and (iii) tentatively identifying priority “known unknown” pollutants by leveraging environmentally relevant chemical information provided by Swiss, Swedish, EU-wide, and American regulators. This regulator-supplied information was incorporated into MetFrag, an in silico identification tool, making use of its “post-relaunch” features. The study’s unique regulatory context posed challenges in data quality and volume, which were addressed directly by the prescreening, quality control, and identification workflow developed here.
    Results: One confirmed and 21 tentative identifications were achieved, suggesting the presence of compounds as diverse as manufacturing reagents, adhesives, pesticides, and pharmaceuticals in the samples. More importantly, the results are interpreted in depth in the context of environmental regulation, and actionable next steps are discussed. The prescreening and quality control workflow is openly accessible within the R package Shinyscreen and adaptable to any (retrospective) analysis requiring automated quality control of mass spectra and non-target identification, with potential applications in environmental and metabolomics analyses.
    Conclusions: NTA in regulatory monitoring is critical for environmental protection, but bottlenecks in data analysis and results interpretation remain. The prescreening and quality control workflow, and the interpretation work performed here, are crucial steps towards scaling up NTA for environmental monitoring.
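    The research question (pollutants occurring at the highest intensities over two time points) can be expressed as a simple feature-matching step. The sketch below is hypothetical and is not part of Shinyscreen or the authors' workflow: it assumes each time point yields a list of `(mz, intensity)` peaks after peak-picking, matches peaks between the two time points within a ppm tolerance, and ranks matched masses by the lower of their two intensities, so only masses that are intense at both time points rise to the top.

```python
def match_across_time_points(peaks_t1, peaks_t2, ppm=5.0):
    """Pair (mz, intensity) peaks seen at both time points within a ppm tolerance.

    Returns (mz_t1, min_intensity) tuples, most intense first.
    """
    matched = []
    for mz1, int1 in peaks_t1:
        tol = mz1 * ppm * 1e-6
        hits = [(abs(mz2 - mz1), int2) for mz2, int2 in peaks_t2 if abs(mz2 - mz1) <= tol]
        if hits:
            _, int2 = min(hits)  # closest m/z at the second time point
            # Rank on the weaker signal: the mass must be intense at BOTH time points.
            matched.append((mz1, min(int1, int2)))
    return sorted(matched, key=lambda pair: pair[1], reverse=True)


def top_masses_of_interest(peaks_t1, peaks_t2, n, ppm=5.0):
    """Return the n highest-ranked m/z values from the first time point."""
    return [mz for mz, _ in match_across_time_points(peaks_t1, peaks_t2, ppm)[:n]]
```

    Using the minimum rather than the mean intensity is a design choice in this sketch: it discards masses that spike at only one time point.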

    Virtual Podium Keynote: Compound Identification and Exposomics: DIY Databases?

    In light of recent events, many of us have been impacted by the cancellation of conferences and meetings. We are losing not only the opportunity to present our research, but also a chance to connect with our community. Virtual Podium is a platform and an opportunity to present and learn about compelling scientific research. Our third session will focus on Compound Identification. Our keynote speaker this week will be Emma Schymanski, PI of Environmental Cheminformatics at the University of Luxembourg. Session 3: Compound Identification, Friday, April 10, 2020, 12:00-1:00 PM PDT (3:00-4:00 PM EDT). Session 3 - Compound Identification: https://www.eventbrite.com/e/10142661373
