196 research outputs found

Refining multiple sequence alignments with conserved core regions

Author: Bryant Stephen H.
Chakrabarti Saikat
Lanczycki Christopher J.
Panchenko Anna R.
Przytycka Teresa M.
Thiessen Paul A.
Publication venue: Oxford University Press
Publication date: 01/01/2006

Accurate multiple sequence alignments of proteins are very important to several areas of computational biology and provide an understanding of phylogenetic history of domain families, their identification and classification. This article presents a new algorithm, REFINER, that refines a multiple sequence alignment by iterative realignment of its individual sequences with the predetermined conserved core (block) model of a protein family. Realignment of each sequence can correct misalignments between a given sequence and the rest of the profile and at the same time preserves the family's overall block model. Large-scale benchmarking studies showed a noticeable improvement of alignment after refinement. This can be inferred from the increased alignment score and enhanced sensitivity for database searching using the sequence profiles derived from refined alignments compared with the original alignments. A standalone version of the program is available by ftp distribution () and will be incorporated into the next release of the Cn3D structure/alignment viewer.

CiteSeerX

PubMed Central

Per- and Polyfluoroalkyl Substances (PFAS) in PubChem: 7 Million and Growing.

Author: Bolton Evan E
CHIRSIR Parviel
KONDIC Todor
SCHYMANSKI Emma
Thiessen Paul A
Zhang Jian
Publication venue: American Chemical Society (ACS)
Publication date: 07/11/2023

Peer reviewed

Per- and polyfluoroalkyl substances (PFAS) are of high concern, with calls to regulate them as a class. In 2021, the Organisation for Economic Co-operation and Development (OECD) revised the definition of PFAS to include any chemical containing at least one saturated CF2 or CF3 moiety. The consequence is that one of the largest open chemical collections, PubChem, with 116 million compounds, now contains over 7 million PFAS under this revised definition. These numbers are several orders of magnitude higher than previously established PFAS lists (typically thousands of entries) and pose an incredible challenge to researchers and computational workflows alike. This article describes a dynamic, openly accessible effort to navigate and explore the >7 million PFAS and >21 million fluorinated compounds (September 2023) in PubChem by establishing the "PFAS and Fluorinated Compounds in PubChem" Classification Browser (or "PubChem PFAS Tree"). A total of 36500 nodes support browsing of the content according to several categories, including classification, structural properties, regulatory status, or presence in existing PFAS suspect lists. Additional annotation and associated data can be used to create subsets (and thus manageable suspect lists or databases) of interest for a wide range of environmental, regulatory, exposomics, and other applications.

Open Repository and Bibliography - Luxembourg

Open and FAIR transformation product data for improved suspect/non-target screening: REFTPs in the NORMAN-SLE, PubChem and patRoon

Author: Bolton Evan E
CHIRSIR Parviel
Helmus Rick
SCHYMANSKI Emma
Thiessen Paul A
Zhang Jian
Publication date: 13/06/2023

Peer reviewed

Presentation given at ICCE, Venice, 11-15 June 2023.

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Open Repository and Bibliography - Luxembourg

HESI UVCB Meeting - Integrating UVCBs and Related Data into Open Chemical Knowledgebases - PubChem and NORMAN-SLE

Author: Bolton Evan E.
ELAPAVALORE Anjana
Li Qingliang
SCHYMANSKI Emma
Thiessen Paul A.
Zaslavsky Leonid
Zhang Jian
Publication date: 18/09/2023

Presentation and poster (given remotely) for the HESI UVCB Meeting in Iceland, 18-19 September 2023.

Open Repository and Bibliography - Luxembourg

MMDB: annotating protein sequences with Entrez's 3D-structure database

Author: Addess Kenneth J.
Bryant Stephen H.
Chen Jie
Geer Lewis Y.
He Jane
He Siqian
Lu Shennan
Madej Thomas
Marchler-Bauer Aron
Thiessen Paul A.
Wang Yanli
Zhang Naigong
Publication venue: Oxford University Press
Publication date: 29/11/2006

Three-dimensional (3D) structure is now known for a large fraction of all protein families. Thus, it has become rather likely that one will find a homolog with known 3D structure when searching a sequence database with an arbitrary query sequence. Depending on the extent of similarity, such neighbor relationships may allow one to infer biological function and to identify functional sites such as binding motifs or catalytic centers. Entrez's 3D-structure database, the Molecular Modeling Database (MMDB), provides easy access to the richness of 3D structure data and its large potential for functional annotation. Entrez's search engine offers several tools to assist biologist users: (i) links between databases, such as between protein sequences and structures, (ii) pre-computed sequence and structure neighbors, (iii) visualization of structure and sequence/structure alignment. Here, we describe an annotation service that combines some of these tools automatically, Entrez's ‘Related Structure’ links. For all proteins in Entrez, similar sequences with known 3D structure are detected by BLAST and alignments are recorded. The ‘Related Structure’ service summarizes this information and presents 3D views mapping sequence residues onto all 3D structures available in MMDB ().

Crossref

PubMed Central

Universality and its Origins at the Amorphous Solidification Transition

Author: A. Zippelius
Annette Zippelius
C. Roos
E. Marinari
H. E. Castillo
Horacio E. Castillo
J.-P. Bouchaud
K. Binder
M. Huthmann
M. Mézard
O. Thiessen
P. G. de Gennes
P. M. Goldbart
Paul M. Goldbart
R. T. Deam
S. Franz
S. J. Barsky
W. H. Stockmayer
Weiqun Peng
Publication venue: American Physical Society (APS)
Publication date: 23/09/1997

Systems undergoing an equilibrium phase transition from a liquid state to an amorphous solid state exhibit certain universal characteristics. Chief among these are the fraction of particles that are randomly localized and the scaling functions that describe the order parameter and (equivalently) the statistical distribution of localization lengths for these localized particles. The purpose of this Paper is to discuss the origins and consequences of this universality, and in doing so, three themes are explored. First, a replica-Landau-type approach is formulated for the universality class of systems that are composed of extended objects connected by permanent random constraints and undergo amorphous solidification at a critical density of constraints. This formulation generalizes the cases of randomly cross-linked and end-linked macromolecular systems, discussed previously. The universal replica free energy is constructed, in terms of the replica order parameter appropriate to amorphous solidification, the value of the order parameter is obtained in the liquid and amorphous solid states, and the chief universal characteristics are determined. Second, the theory is reformulated in terms of the distribution of local static density fluctuations rather than the replica order parameter. It is shown that a suitable free energy can be constructed, depending on the distribution of static density fluctuations, and that this formulation yields precisely the same conclusions as the replica approach. Third, the universal predictions of the theory are compared with the results of extensive numerical simulations of randomly cross-linked macromolecular systems, due to Barsky and Plischke, and excellent agreement is found.

Comment: 10 pages, including 3 figures (REVTEX

arXiv.org e-Print Archive

Crossref

ShinyTPs: Curating Transformation Products from Text Mining Results.

Author: Bolton Evan E
CHIRSIR Parviel
KRIER Jessy
PALM Emma Helena
SCHYMANSKI Emma
Thiessen Paul A
Zhang Jian
Publication venue: American Chemical Society (ACS)
Publication date: 29/09/2023

Peer reviewed

Transformation product (TP) information is essential to accurately evaluate the hazards compounds pose to human health and the environment. However, information about TPs is often limited, and existing data is often not fully Findable, Accessible, Interoperable, and Reusable (FAIR). FAIRifying existing TP knowledge is a relatively easy path toward improving access to data for identification workflows and for machine-learning-based algorithms. ShinyTPs was developed to curate existing transformation information derived from text-mined data within the PubChem database. The application (available as an R package) visualizes the text-mined chemical names to facilitate the user validation of the automatically extracted reactions. ShinyTPs was applied to a case study using 436 tentatively identified compounds to prioritize TP retrieval. This resulted in the extraction of 645 reactions (associated with 496 compounds), of which 319 were not previously available in PubChem. The curated reactions were added to the PubChem Transformations library, which was used as a TP suspect list for identification of TPs using the open-source workflow patRoon. In total, 72 compounds from the library were tentatively identified, 18% of which were curated using ShinyTPs, showing that the app can help support TP identification in non-target analysis workflows.

U-AGR-8049 - H2020 - ZeroPM (01/10/2021 - 30/09/2026) - SCHYMANSKI Emm

Open Repository and Bibliography - Luxembourg

Civil society leaders and Northern Ireland's peace process: hopes and fears for the future

Author: Ahmed Shamima
Bew Paul
Bogdan Robert
Boulding Elise
Brown Michael
Byrne Sean
Cairns
Carter Jimmy
Charmaz Kathy
Chatfield Charles
Cochrane Feargal
Cox Michael
Crocker Chester A.
Darby John
Diamond Louise
Farrell Michael
Fitzduff Mari
Hawken Paul
Jeong Howon
Junne Gerd
Lederach John Paul
MacGinty Roger
Senehi Jessica
Shirlow Peter
Stedman Stephen
Stephen Fiona
Tajfel Henri
Taylor Rupert
Thiessen Chuck
Wallach John
Publication venue: SAGE Publications
Publication date: 01/01/2010

Crossref

Coventry University Pure Portal

CDD: a Conserved Domain Database for protein classification

Author: Anderson John B.
Bryant Stephen H.
Cherukuri Praveen F.
DeWeese-Scott Carol
Geer Lewis Y.
Gwadz Marc
He Siqian
Hurwitz David I.
Jackson John D.
Ke Zhaoxi
Lanczycki Christopher J.
Liebert Cynthia A.
Liu Chunlei
Lu Fu
Marchler Gabriele H.
Marchler-Bauer Aron
Mullokandov Mikhail
Shoemaker Benjamin A.
Simonyan Vahan
Song James S.
Thiessen Paul A.
Yamashita Roxanne A.
Yin Jodie J.
Zhang Dachuan
Publication venue: Oxford University Press
Publication date: 17/12/2004

The Conserved Domain Database (CDD) is the protein classification component of NCBI's Entrez query and retrieval system. CDD is linked to other Entrez databases such as Proteins, Taxonomy and PubMed®, and can be accessed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. CD-Search, which is available at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi, is a fast, interactive tool to identify conserved domains in new protein sequences. CD-Search results for protein sequences in Entrez are pre-computed to provide links between proteins and domain models, and computational annotation visible upon request. Protein–protein queries submitted to NCBI's BLAST search service at http://www.ncbi.nlm.nih.gov/BLAST are scanned for the presence of conserved domains by default. While CDD started out as essentially a mirror of publicly available domain alignment collections, such as SMART, Pfam and COG, we have continued an effort to update, and in some cases replace these models with domain hierarchies curated at the NCBI. Here, we report on the progress of the curation effort and associated improvements in the functionality of the CDD information retrieval system.

CiteSeerX

Crossref

PubMed Central