Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis
This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work.
IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research
Graph neural networks (GNNs) have shown high potential for a variety of
real-world, challenging applications, but one of the major obstacles in GNN
research is the lack of large-scale flexible datasets. Most existing public
datasets for GNNs are relatively small, which limits the ability of GNNs to
generalize to unseen data. The few existing large-scale graph datasets provide
very limited labeled data. This makes it difficult to determine if the GNN
model's low accuracy for unseen data is inherently due to insufficient training
data or if the model failed to generalize. Additionally, datasets used to train
GNNs need to offer flexibility to enable a thorough study of the impact of
various factors while training GNN models.
In this work, we introduce the Illinois Graph Benchmark (IGB), a research
dataset tool that developers can use to train, scrutinize, and
systematically evaluate GNN models with high fidelity. IGB includes both
homogeneous and heterogeneous academic graphs of enormous sizes, with more than
40% of their nodes labeled. Compared to the largest graph datasets publicly
available, the IGB provides over 162X more labeled data for deep learning
practitioners and developers to create and evaluate models with higher
accuracy. The IGB dataset is a collection of academic graphs designed to be
flexible, enabling the study of various GNN architectures and embedding generation
techniques, and the analysis of system performance issues for node classification
tasks. IGB is open-sourced, supports DGL and PyG frameworks, and comes with
releases of the raw text that we believe will foster emerging language models and
GNN research projects. An early public version of IGB is available at
https://github.com/IllinoisGraphBenchmark/IGB-Datasets.
Comment: Accepted at the KDD'23 conference. This is the final preprint version.
Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is
highly motivated by the necessity to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
designed from the genomic data from Plasmodium falciparum, but also using the
millions of genomic data from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes particularly metabolic pathways, 4)
the versatile methods to integrate genomic data, biological representations and
functional profiling obtained from X-omic experiments after drug treatments and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progress toward a grid-enabled
chemogenomic knowledge space is discussed.
Comment: 43 pages, 4 figures, to appear in Malaria Journal
Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction
The task of learning an expressive molecular representation is central to developing quantitative structure–activity and property relationships. Traditional approaches rely on group additivity rules, empirical measurements or parameters, or generation of thousands of descriptors. In this paper, we employ a convolutional neural network for this embedding task by treating molecules as undirected graphs with attributed nodes and edges. Simple atom and bond attributes are used to construct atom-specific feature vectors that take into account the local chemical environment using different neighborhood radii. By working directly with the full molecular graph, there is a greater opportunity for models to identify important features relevant to a prediction task. Unlike other graph-based approaches, our atom featurization preserves molecule-level spatial information that significantly enhances model performance. Our models learn to identify important features of atom clusters for the prediction of aqueous solubility, octanol solubility, melting point, and toxicity. Extensions and limitations of this strategy are discussed.
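The idea of building atom-specific feature vectors from neighborhoods of different radii can be illustrated with a minimal sketch. It is not the architecture from the paper: it assumes plain sum aggregation with no learned weights, ignores bond attributes, and uses a hypothetical three-atom toy molecule with made-up feature columns. All names (`ATOM_FEATURES`, `BONDS`, `embed_atoms`, `embed_molecule`) are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical toy molecule: heavy atoms of ethanol, C-C-O.
# Each atom vector is [atomic number, heavy-atom degree] (assumed features).
ATOM_FEATURES: Dict[int, List[float]] = {
    0: [6.0, 1.0],
    1: [6.0, 2.0],
    2: [8.0, 1.0],
}
BONDS: List[Tuple[int, int]] = [(0, 1), (1, 2)]


def adjacency(bonds: List[Tuple[int, int]], n_atoms: int) -> Dict[int, List[int]]:
    """Build an undirected adjacency list from the bond list."""
    adj: Dict[int, List[int]] = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def embed_atoms(features, bonds, radius: int) -> Dict[int, List[float]]:
    """Sum each atom's vector with its neighbors' vectors, `radius` times.

    After r rounds, an atom's vector depends on all atoms within r bonds.
    """
    adj = adjacency(bonds, len(features))
    current = {i: list(v) for i, v in features.items()}
    for _ in range(radius):
        updated = {}
        for atom, vec in current.items():
            summed = list(vec)
            for nb in adj[atom]:
                summed = [s + x for s, x in zip(summed, current[nb])]
            updated[atom] = summed
        current = updated
    return current


def embed_molecule(features, bonds, radius: int) -> List[float]:
    """Pool atom vectors into one molecule-level vector by summation."""
    atoms = embed_atoms(features, bonds, radius)
    dim = len(next(iter(atoms.values())))
    return [sum(vec[k] for vec in atoms.values()) for k in range(dim)]
```

A learned model would replace the plain sums with weighted, nonlinear updates and would also fold in bond attributes; the sketch only shows how the radius controls how much of the local environment each atom vector sees.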
AI in drug discovery and its clinical relevance
The COVID-19 pandemic has emphasized the need for novel drug discovery processes. However, the journey from conceptualizing a drug to its eventual implementation in clinical settings is a long, complex, and expensive process, with many potential points of failure. Over the past decade, a vast growth in medical information has coincided with advances in computational hardware (cloud computing, GPUs, and TPUs) and the rise of deep learning. Medical data generated from large molecular screening profiles, personal health or pathology records, and public health organizations could benefit from analysis by Artificial Intelligence (AI) approaches to speed up and prevent failures in the drug discovery pipeline. We present applications of AI at various stages of drug discovery pipelines, including the inherently computational approaches of de novo design and prediction of a drug's likely properties. Open-source databases and AI-based software tools that facilitate drug design are discussed along with their associated problems of molecule representation, data collection, complexity, labeling, and disparities among labels. How contemporary AI methods, such as graph neural networks, reinforcement learning, and generative models, along with structure-based methods (i.e., molecular dynamics simulations and molecular docking), can contribute to drug discovery applications and analysis of drug responses is also explored. Finally, recent developments and investments in AI-based start-up companies for biotechnology and drug design, along with their current progress, hopes, and promotions, are discussed in this article.
Other Information: Published in: Heliyon. License: https://creativecommons.org/licenses/by/4.0/ See article on publisher's website: https://doi.org/10.1016/j.heliyon.2023.e17575
A New Clustering Algorithm Based on Pattern Extraction in Molecular Fingerprints
In this paper, an algorithm for the extraction of patterns in chemical fingerprints is described. The algorithm takes as input a fingerprint representation of the molecule dataset and generates a group of consistent, disjoint patterns, also represented as binary arrays, which are satisfied by not necessarily disjoint subsets of molecules in the dataset. The algorithm was developed entirely in Java, allowing its integration into free computational chemistry applications. In tests, using the patterns instead of the original fingerprints increased the efficiency of dataset classification. The results show that it is possible to reconstruct the original fingerprints using the final group of patterns that characterize all the elements of the dataset.
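The relationship between fingerprints and patterns described above can be sketched as follows. The abstract does not give the extraction procedure, so this sketch does not implement it: the patterns are hand-picked, hypothetical values, and the only assumptions are that a molecule satisfies a pattern when every bit set in the pattern is also set in its fingerprint, and that a fingerprint is rebuilt as the union of the patterns it satisfies. Fingerprints are stored as Python integers used as bitmasks.

```python
from typing import Dict, List

# Hypothetical 4-bit fingerprints for two molecules.
FINGERPRINTS: Dict[str, int] = {
    "mol_A": 0b1101,
    "mol_B": 0b0111,
}

# Hypothetical, hand-picked patterns; no two patterns share a set bit.
PATTERNS: List[int] = [0b0101, 0b1000, 0b0010]


def are_disjoint(patterns: List[int]) -> bool:
    """Return True if no bit position is set in more than one pattern."""
    seen = 0
    for p in patterns:
        if seen & p:
            return False
        seen |= p
    return True


def satisfies(fingerprint: int, pattern: int) -> bool:
    """A fingerprint satisfies a pattern if it contains all of its set bits."""
    return fingerprint & pattern == pattern


def molecules_for(pattern: int, fingerprints: Dict[str, int]) -> List[str]:
    """List the molecules whose fingerprints satisfy the given pattern."""
    return sorted(name for name, fp in fingerprints.items() if satisfies(fp, pattern))


def reconstruct(fingerprint: int, patterns: List[int]) -> int:
    """Rebuild a fingerprint as the bitwise OR of the patterns it satisfies."""
    rebuilt = 0
    for p in patterns:
        if satisfies(fingerprint, p):
            rebuilt |= p
    return rebuilt
```

In this toy data the pattern `0b0101` is satisfied by both molecules, so the molecule subsets overlap even though the patterns themselves are disjoint, and each fingerprint is recovered exactly from the patterns it satisfies.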