3 research outputs found
SELFormer: Molecular Representation Learning via SELFIES Language Models
Automated computational analysis of the vast chemical space is critical for
numerous fields of research such as drug discovery and material science.
Representation learning techniques have recently been employed with the primary
objective of generating compact and informative numerical expressions of
complex data. One approach to efficiently learn molecular representations is
processing string-based notations of chemicals via natural language processing
(NLP) algorithms. The majority of the methods proposed so far utilize SMILES
notations for this purpose; however, SMILES is associated with numerous
problems related to validity and robustness, which may prevent the model from
effectively uncovering the knowledge hidden in the data. In this study, we
propose SELFormer, a transformer architecture-based chemical language model
that utilizes a 100% valid, compact and expressive notation, SELFIES, as input,
in order to learn flexible and high-quality molecular representations.
SELFormer is pre-trained on two million drug-like compounds and fine-tuned for
diverse molecular property prediction tasks. Our performance evaluation has
revealed that SELFormer outperforms all competing methods, including graph
learning-based approaches and SMILES-based chemical language models, on
predicting aqueous solubility of molecules and adverse drug reactions. We also
visualized molecular representations learned by SELFormer via dimensionality
reduction, which indicated that even the pre-trained model can discriminate
molecules with differing structural properties. We shared SELFormer as a
programmatic tool, together with its datasets and pre-trained models. Overall,
our research demonstrates the benefit of using the SELFIES notation in the
context of chemical language modeling and opens up new possibilities for the
design and discovery of novel drug candidates with desired features.
Comment: 22 pages, 4 figures, 8 tables
In Silico Evaluation of the Functional Properties of TMCO1 Gene Sequence Variants
The Transmembrane and Coiled-Coil Domains 1 (TMCO1) protein is encoded by the
TMCO1 gene, which consists of 7 exons. Previous studies have identified
multiple TMCO1 variants in patients with cerebro-facio-thoracic dysplasia
(CFTD), and the TMCO1 locus has also been shown to be associated with primary
open angle glaucoma (POAG). However, only a limited number of studies report
associations of TMCO1 gene sequence variants, and the majority of the findings
affirm the pathogenicity of nonsense and frameshift TMCO1 variants and their
associations with clinical phenotypes. Thus, the functional properties of the
single nucleotide variants causing amino acid changes in TMCO1 are yet to be
comprehensively elucidated. In this study, we evaluated the effects of amino
acid substitutions on protein structure and identified their putative roles in
post-translational modifications (PTMs) and in regulatory mechanisms for the
TMCO1 protein. We classified 41 missense variants as pathogenic based on the
combined scores of common in silico tools (SIFT, MutationTaster2, PolyPhen-2).
Of these 41 variants, four (p.K211Q, p.K105E, p.S235F, p.K237R) were found to
be located in PTM sites and regulatory protein binding sites; thus, they were
proposed to be putative functional variants. Moreover, rs1387528611
(p.Lys128Gln) also had strong evidence (RegulomeDB score = 2b) for a possible
regulatory function. The results of our in silico analyses highlight the
functional importance of the missense TMCO1 variants that may contribute to
TMCO1-associated disease phenotypes; further in vivo evaluation is needed to
uncover their role in human diseases.
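
The TMCO1 abstract above classifies missense variants from the combined scores of SIFT, MutationTaster2 and PolyPhen-2 but does not say how the scores were combined. The sketch below shows one common way to do this, a simple vote across the three tools. The cutoffs, the MutationTaster2 label strings, the `min_votes` rule, and the variant names and scores are all illustrative assumptions, not values or methods taken from the study.

```python
from dataclasses import dataclass

# Assumed cutoffs (not stated in the abstract).
SIFT_DELETERIOUS_MAX = 0.05    # SIFT scores below this are conventionally called deleterious
POLYPHEN_DAMAGING_MIN = 0.85   # assumed PolyPhen-2 cutoff for "probably damaging"
# Assumed string encoding of MutationTaster2 calls treated as pathogenic.
MUTATIONTASTER_PATHOGENIC = {"disease_causing", "disease_causing_automatic"}


@dataclass(frozen=True)
class Prediction:
    """Per-variant outputs of the three tools (hypothetical container)."""
    variant: str
    sift: float
    polyphen2: float
    mutationtaster2: str


def votes(p: Prediction) -> int:
    """Count how many of the three tools flag the variant as damaging."""
    return sum([
        p.sift < SIFT_DELETERIOUS_MAX,
        p.polyphen2 > POLYPHEN_DAMAGING_MIN,
        p.mutationtaster2 in MUTATIONTASTER_PATHOGENIC,
    ])


def is_pathogenic(p: Prediction, min_votes: int = 3) -> bool:
    """Call a variant pathogenic when at least `min_votes` tools agree."""
    return votes(p) >= min_votes


# Placeholder variants with made-up scores, for illustration only.
examples = [
    Prediction("p.A1B", sift=0.01, polyphen2=0.99, mutationtaster2="disease_causing"),
    Prediction("p.C2D", sift=0.30, polyphen2=0.10, mutationtaster2="polymorphism"),
    Prediction("p.E3F", sift=0.02, polyphen2=0.40, mutationtaster2="disease_causing"),
]

calls = {p.variant: is_pathogenic(p) for p in examples}
print(calls)
```

Requiring all three tools to agree (`min_votes=3`) is the strictest choice; lowering it to 2 gives a majority vote and would also flag the third placeholder variant.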
Democratizing knowledge representation with BioCypher
Standardising the representation of biomedical knowledge among all researchers
is an insurmountable task, hindering the effectiveness of many computational
methods. To facilitate harmonisation and interoperability despite this
fundamental challenge, we propose to standardise the framework of knowledge
graph creation instead. We implement this standardisation in BioCypher, a FAIR
(findable, accessible, interoperable, reusable) framework to transparently
build biomedical knowledge graphs while preserving provenances of the source
data. Mapping the knowledge onto biomedical ontologies helps to balance the
needs for harmonisation, human and machine readability, and ease of use and
accessibility to non-specialist researchers. We demonstrate the usefulness of
the framework on a variety of use cases, from maintenance of task-specific
knowledge stores, to interoperability between biomedical domains, to on-demand
building of task-specific knowledge graphs for federated learning. BioCypher
(https://biocypher.org) thus facilitates automating knowledge-based biomedical
research, and we encourage the community to further develop and use it.