29,063 research outputs found
Exploration of Reaction Pathways and Chemical Transformation Networks
For the investigation of chemical reaction networks, the identification of
all relevant intermediates and elementary reactions is mandatory. Many
algorithmic approaches exist that perform such explorations efficiently and
automatically. These approaches differ in their application range, the level of
completeness of the exploration, as well as the amount of heuristics and human
intervention required. Here, we describe and compare the different approaches
based on these criteria. Future directions leveraging the strengths of chemical
heuristics, human interaction, and physical rigor are discussed.
Comment: 48 pages, 4 figures
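As a toy illustration of what such an exploration produces, here is a minimal breadth-first enumeration of intermediates reachable in a reaction network represented as a graph. The species names and reactions are invented for the sketch; real exploration algorithms operate on quantum-chemically computed elementary steps, not a hand-written table.

```python
from collections import deque

# Hypothetical toy network: each species maps to the species reachable
# from it via one elementary reaction (names are illustrative only).
REACTIONS = {
    "A": ["B", "C"],   # A -> B, A -> C
    "B": ["D"],        # B -> D
    "C": ["D", "E"],   # C -> D, C -> E
    "D": [],
    "E": ["A"],        # E -> A closes a cycle
}

def explore(start):
    """Breadth-first enumeration of all intermediates reachable from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        species = queue.popleft()
        for product in REACTIONS.get(species, []):
            if product not in seen:
                seen.add(product)
                queue.append(product)
    return seen

print(sorted(explore("A")))  # every species in the toy network is reachable
```

The interesting algorithmic questions the abstract surveys begin where this sketch stops: deciding which elementary steps exist at all, and pruning the combinatorial explosion with heuristics or human input.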
Systematic Topology Analysis and Generation Using Degree Correlations
We present a new, systematic approach for analyzing network topologies. We
first introduce the dK-series of probability distributions specifying all
degree correlations within d-sized subgraphs of a given graph G. Increasing
values of d capture progressively more properties of G at the cost of more
complex representation of the probability distribution. Using this series, we
can quantitatively measure the distance between two graphs and construct random
graphs that accurately reproduce virtually all metrics proposed in the
literature. The nature of the dK-series implies that it will also capture any
future metrics that may be proposed. Using our approach, we construct graphs
for d=0,1,2,3 and demonstrate that these graphs reproduce, with increasing
accuracy, important properties of measured and modeled Internet topologies. We
find that the d=2 case is sufficient for most practical purposes, while d=3
essentially reconstructs the Internet AS- and router-level topologies exactly.
We hope that a systematic method to analyze and synthesize topologies offers a
significant improvement to the set of tools available to network topology and
protocol researchers.
Comment: Final version
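The d=2 member of the series can be made concrete: it is the joint degree distribution, i.e. how many edges connect a vertex of degree k1 to a vertex of degree k2. A minimal stdlib sketch over an undirected edge list (the example graph is invented, and this is not the authors' code):

```python
from collections import Counter

def joint_degree_counts(edges):
    """dK-2 statistic: count edges by the (sorted) degree pair of their endpoints."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    jdd = Counter()
    for u, v in edges:
        pair = tuple(sorted((degree[u], degree[v])))
        jdd[pair] += 1
    return jdd

# A small example graph: a star plus one extra edge.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
print(joint_degree_counts(edges))  # e.g. two edges join a degree-2 and a degree-3 node
```

d=0 would keep only the average degree, d=1 the degree distribution; d=2 adds these pairwise correlations, and higher d constrains larger subgraphs at growing representational cost.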
Using Graph Properties to Speed-up GPU-based Graph Traversal: A Model-driven Approach
While it is well-known and acknowledged that the performance of graph
algorithms is heavily dependent on the input data, there has been surprisingly
little research to quantify and predict the impact the graph structure has on
performance. Parallel graph algorithms, running on many-core systems such as
GPUs, are no exception: most research has focused on how to efficiently
implement and tune different graph operations on a specific GPU. However, the
performance impact of the input graph has only been taken into account
indirectly as a result of the graphs used to benchmark the system.
In this work, we present a case study investigating how to use the properties
of the input graph to improve the performance of the breadth-first search (BFS)
graph traversal. To do so, we first study the performance variation of 15
different BFS implementations across 248 graphs. Using this performance data,
we show that significant speed-up can be achieved by combining the best
implementation for each level of the traversal. To make use of this
data-dependent optimization, we must correctly predict the relative performance
of algorithms per graph level, and enable dynamic switching to the optimal
algorithm for each level at runtime.
We use the collected performance data to train a binary decision tree, to
enable high-accuracy predictions and fast switching. We demonstrate empirically
that our decision tree is both fast enough to allow dynamic switching between
implementations, without noticeable overhead, and accurate enough in its
prediction to enable significant BFS speedup. We conclude that our model-driven
approach (1) enables BFS to outperform state-of-the-art GPU algorithms, and (2)
can be adapted for other BFS variants, other algorithms, or more specific
datasets.
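The per-level switching idea can be sketched with a simple frontier-size heuristic choosing between a top-down and a bottom-up step, in the spirit of direction-optimizing BFS. The threshold rule below is a simplified stand-in for the paper's learned decision tree, and the graph is illustrative:

```python
def bfs_switching(adj, source, threshold=0.25):
    """Level-synchronous BFS that picks an expansion strategy per level.

    `adj` maps vertex -> list of neighbours. The frontier-size threshold
    is a hand-written stand-in for a trained per-level predictor.
    """
    n = len(adj)
    dist = {source: 0}
    frontier = {source}
    level = 0
    while frontier:
        level += 1
        if len(frontier) < threshold * n:
            # Top-down: expand the (small) frontier outward.
            nxt = {w for v in frontier for w in adj[v] if w not in dist}
        else:
            # Bottom-up: each unvisited vertex looks for a parent in the frontier.
            nxt = {v for v in adj if v not in dist
                   and any(w in frontier for w in adj[v])}
        for v in nxt:
            dist[v] = level
        frontier = nxt
    return dist

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(bfs_switching(adj, 0))
```

On a GPU the two strategies have very different memory-access and load-balance behaviour per level, which is why predicting the right one per level, rather than per graph, pays off.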
Formal Context Generation using Dirichlet Distributions
We suggest an improved way to randomly generate formal contexts based on
Dirichlet distributions. For this purpose we investigate the predominant way to
generate formal contexts, a coin-tossing model, recapitulate some of its
shortcomings and examine its stochastic model. Building on this, we propose
our Dirichlet model and develop an algorithm employing this idea. By comparing
our generation model to a coin-tossing model we show that our approach is a
significant improvement with respect to the variety of contexts generated.
Finally, we outline a possible application in null model generation for formal
contexts.
Comment: 16 pages, 7 figures
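A loose sketch of the idea, not the authors' algorithm: instead of one fixed coin-toss probability for every cell of the binary object-attribute table, per-row densities are drawn from a Dirichlet prior, which yields more varied contexts. The Dirichlet sample is built from normalized Gamma draws using only the standard library:

```python
import random

def dirichlet(alphas, rng):
    """Sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_context(n_objects, n_attributes, alphas, seed=0):
    """Random formal context (binary object x attribute table).

    Unlike a plain coin-tossing model with one global density, each
    object's row density is drawn from a Dirichlet prior.
    """
    rng = random.Random(seed)
    context = []
    for _ in range(n_objects):
        # Row density: first component of a fresh Dirichlet sample.
        p = dirichlet(alphas, rng)[0]
        context.append([rng.random() < p for _ in range(n_attributes)])
    return context

for row in generate_context(5, 4, alphas=[2.0, 2.0]):
    print(["x" if cell else "." for cell in row])
```

Setting both alphas equal recovers a symmetric prior around density 0.5; skewed alphas bias contexts toward sparse or dense rows.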
Towards Data-centric Graph Machine Learning: Review and Outlook
Data-centric AI, with its primary focus on the collection, management, and
utilization of data to drive AI models and applications, has attracted
increasing attention in recent years. In this article, we conduct an in-depth
and comprehensive review, offering a forward-looking outlook on the current
efforts in data-centric AI pertaining to graph data, the fundamental data
structure for representing and capturing intricate dependencies among massive
and diverse real-life entities. We introduce a systematic framework,
Data-centric Graph Machine Learning (DC-GML), that encompasses all stages of
the graph data lifecycle, including graph data collection, exploration,
improvement, exploitation, and maintenance. A thorough taxonomy of each stage
is presented to answer three critical graph-centric questions: (1) how to
enhance graph data availability and quality; (2) how to learn from graph data
with limited availability and low quality; (3) how to build graph MLOps systems
from the graph data-centric view. Lastly, we pinpoint the future prospects of
the DC-GML domain, providing insights to navigate its advancements and
applications.
Comment: 42 pages, 9 figures
Modeling and design of heterogeneous hierarchical bioinspired spider web structures using generative deep learning and additive manufacturing
Spider webs are incredible biological structures, comprising thin but strong
silk filaments arranged into complex hierarchical architectures with
striking mechanical properties (e.g., lightweight but high strength, achieving
diverse mechanical responses). While simple 2D orb webs can easily be mimicked,
the modeling and synthesis of 3D-based web structures remain challenging,
partly due to the rich set of design features. Here we provide a detailed
analysis of the heterogeneous graph structures of spider webs, and use deep
learning as a way to model and then synthesize artificial, bio-inspired 3D web
structures. The generative AI models are conditioned based on key geometric
parameters (including average edge length, number of nodes, average node
degree, and others). To identify graph construction principles, we use
inductive representation sampling of large experimentally determined spider web
graphs, to yield a dataset that is used to train three conditional generative
models: 1) an analog diffusion model inspired by nonequilibrium thermodynamics,
with sparse neighbor representation, 2) a discrete diffusion model with full
neighbor representation, and 3) an autoregressive transformer architecture with
full neighbor representation. All three models are scalable, produce complex,
de novo bio-inspired spider web mimics, and successfully construct graphs that
meet the design objectives. We further propose an algorithm that assembles web
samples produced by the generative models into larger-scale structures based on
a series of geometric design targets, including helical and parametric shapes,
mimicking and extending natural design principles towards integration with
diverging engineering objectives. Several webs are manufactured using 3D
printing and tested to assess mechanical properties.
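The geometric conditioning parameters named above (node count, average node degree, average edge length) are straightforward to extract from a spatial graph. A minimal sketch with invented 3D coordinates, not the authors' pipeline:

```python
import math

def graph_features(nodes, edges):
    """Conditioning features of a spatial graph: node count, average
    degree, and average Euclidean edge length. `nodes` maps id -> (x, y, z)."""
    degree = {v: 0 for v in nodes}
    lengths = []
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        lengths.append(math.dist(nodes[u], nodes[v]))
    return {
        "num_nodes": len(nodes),
        "avg_degree": sum(degree.values()) / len(nodes),
        "avg_edge_length": sum(lengths) / len(lengths),
    }

nodes = {0: (0, 0, 0), 1: (1, 0, 0), 2: (1, 1, 0)}
edges = [(0, 1), (1, 2), (0, 2)]
print(graph_features(nodes, edges))
```

In a conditional generative setting, feature vectors like this serve as the conditioning input, so that sampled graphs can be steered toward specified web geometries.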
Domain-Agnostic Molecular Generation with Self-feedback
The generation of molecules with desired properties has gained tremendous
popularity, revolutionizing the way scientists design molecular structures and
providing valuable support for chemical and drug design. However, despite the
potential of language models in molecule generation, they face numerous
challenges such as the generation of syntactically or chemically flawed
molecules, narrow domain focus, and limitations in creating diverse and
directionally feasible molecules due to a dearth of annotated data or external
molecular databases. To this end, we introduce MolGen, a pre-trained molecular
language model tailored specifically for molecule generation. MolGen acquires
intrinsic structural and grammatical insights by reconstructing over 100
million molecular SELFIES, while facilitating knowledge transfer between
different domains through domain-agnostic molecular prefix tuning. Moreover, we
present a self-feedback paradigm that inspires the pre-trained model to align
with the ultimate goal of producing molecules with desirable properties.
Extensive experiments on well-known benchmarks confirm MolGen's optimization
capabilities, encompassing penalized logP, QED, and molecular docking
properties. Further analysis shows that MolGen can accurately capture molecule
distributions, implicitly learn their structural characteristics, and
efficiently explore chemical space. The pre-trained model, codes, and datasets
are publicly available for future research at https://github.com/zjunlp/MolGen.
Comment: Work in progress. Add results of binding affinity
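In spirit, the self-feedback paradigm scores the model's own samples with the target property and feeds the best-scoring ones back for the next alignment step. A deliberately toy sketch, in which a made-up string property stands in for penalized logP or QED and a random string generator stands in for MolGen:

```python
import random

def toy_property(molecule):
    """Hypothetical stand-in for a real objective such as penalized logP:
    it simply rewards 'C' characters in a toy string encoding."""
    return molecule.count("C") - 0.5 * molecule.count("O")

def self_feedback_round(model_samples, top_fraction=0.2):
    """One round of the self-feedback idea: rank the model's own samples
    by the property and keep the best ones as targets for alignment."""
    ranked = sorted(model_samples, key=toy_property, reverse=True)
    keep = max(1, int(len(ranked) * top_fraction))
    return ranked[:keep]

rng = random.Random(0)
samples = ["".join(rng.choice("CO") for _ in range(8)) for _ in range(10)]
preferred = self_feedback_round(samples)
print(preferred, [toy_property(m) for m in preferred])
```

The real system differs in every particular (SELFIES representations, a pre-trained language model, chemically meaningful objectives), but the loop structure of generating, scoring, and re-aligning is the same.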