29,063 research outputs found
Exploration of Reaction Pathways and Chemical Transformation Networks
For the investigation of chemical reaction networks, the identification of
all relevant intermediates and elementary reactions is mandatory. Many
algorithmic approaches exist that perform such explorations efficiently and
automatically. These approaches differ in their application range, the level of
completeness of the exploration, as well as the amount of heuristics and human
intervention required. Here, we describe and compare the different approaches
based on these criteria. Future directions leveraging the strengths of chemical
heuristics, human interaction, and physical rigor are discussed.
Comment: 48 pages, 4 figures
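As a toy illustration of what such an exploration produces, here is a minimal breadth-first enumeration of intermediates reachable in a reaction network represented as a graph. The species names and reactions are invented for the sketch; real exploration algorithms operate on quantum-chemically computed elementary steps, not a hand-written table.

```python
from collections import deque

# Hypothetical toy network: each species maps to the species reachable
# from it via one elementary reaction (names are illustrative only).
REACTIONS = {
    "A": ["B", "C"],   # A -> B, A -> C
    "B": ["D"],        # B -> D
    "C": ["D", "E"],   # C -> D, C -> E
    "D": [],
    "E": ["A"],        # E -> A closes a cycle
}

def explore(start):
    """Breadth-first enumeration of all intermediates reachable from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        species = queue.popleft()
        for product in REACTIONS.get(species, []):
            if product not in seen:
                seen.add(product)
                queue.append(product)
    return seen

print(sorted(explore("A")))  # every species in the toy network is reachable
```

The interesting algorithmic questions the abstract surveys begin where this sketch stops: deciding which elementary steps exist at all, and pruning the combinatorial explosion with heuristics or human input.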
Systematic Topology Analysis and Generation Using Degree Correlations
We present a new, systematic approach for analyzing network topologies. We
first introduce the dK-series of probability distributions specifying all
degree correlations within d-sized subgraphs of a given graph G. Increasing
values of d capture progressively more properties of G at the cost of more
complex representation of the probability distribution. Using this series, we
can quantitatively measure the distance between two graphs and construct random
graphs that accurately reproduce virtually all metrics proposed in the
literature. The nature of the dK-series implies that it will also capture any
future metrics that may be proposed. Using our approach, we construct graphs
for d=0,1,2,3 and demonstrate that these graphs reproduce, with increasing
accuracy, important properties of measured and modeled Internet topologies. We
find that the d=2 case is sufficient for most practical purposes, while d=3
essentially reconstructs the Internet AS- and router-level topologies exactly.
We hope that a systematic method to analyze and synthesize topologies offers a
significant improvement to the set of tools available to network topology and
protocol researchers.
Comment: Final version
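The d=2 member of the series can be made concrete: it is the joint degree distribution, i.e. how many edges connect a vertex of degree k1 to a vertex of degree k2. A minimal stdlib sketch over an undirected edge list (the example graph is invented, and this is not the authors' code):

```python
from collections import Counter

def joint_degree_counts(edges):
    """dK-2 statistic: count edges by the (sorted) degree pair of their endpoints."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    jdd = Counter()
    for u, v in edges:
        pair = tuple(sorted((degree[u], degree[v])))
        jdd[pair] += 1
    return jdd

# A small example graph: a star plus one extra edge.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
print(joint_degree_counts(edges))  # e.g. two edges join a degree-2 and a degree-3 node
```

d=0 would keep only the average degree, d=1 the degree distribution; d=2 adds these pairwise correlations, and higher d constrains larger subgraphs at growing representational cost.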
Using Graph Properties to Speed-up GPU-based Graph Traversal: A Model-driven Approach
While it is well-known and acknowledged that the performance of graph
algorithms is heavily dependent on the input data, there has been surprisingly
little research to quantify and predict the impact the graph structure has on
performance. Parallel graph algorithms, running on many-core systems such as
GPUs, are no exception: most research has focused on how to efficiently
implement and tune different graph operations on a specific GPU. However, the
performance impact of the input graph has only been taken into account
indirectly as a result of the graphs used to benchmark the system.
In this work, we present a case study investigating how to use the properties
of the input graph to improve the performance of the breadth-first search (BFS)
graph traversal. To do so, we first study the performance variation of 15
different BFS implementations across 248 graphs. Using this performance data,
we show that significant speed-up can be achieved by combining the best
implementation for each level of the traversal. To make use of this
data-dependent optimization, we must correctly predict the relative performance
of algorithms per graph level, and enable dynamic switching to the optimal
algorithm for each level at runtime.
We use the collected performance data to train a binary decision tree, to
enable high-accuracy predictions and fast switching. We demonstrate empirically
that our decision tree is both fast enough to allow dynamic switching between
implementations, without noticeable overhead, and accurate enough in its
prediction to enable significant BFS speedup. We conclude that our model-driven
approach (1) enables BFS to outperform state-of-the-art GPU algorithms, and (2)
can be adapted for other BFS variants, other algorithms, or more specific
datasets.
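The per-level switching idea can be sketched with a simple frontier-size heuristic choosing between a top-down and a bottom-up step, in the spirit of direction-optimizing BFS. The threshold rule below is a simplified stand-in for the paper's learned decision tree, and the graph is illustrative:

```python
def bfs_switching(adj, source, threshold=0.25):
    """Level-synchronous BFS that picks an expansion strategy per level.

    `adj` maps vertex -> list of neighbours. The frontier-size threshold
    is a hand-written stand-in for a trained per-level predictor.
    """
    n = len(adj)
    dist = {source: 0}
    frontier = {source}
    level = 0
    while frontier:
        level += 1
        if len(frontier) < threshold * n:
            # Top-down: expand the (small) frontier outward.
            nxt = {w for v in frontier for w in adj[v] if w not in dist}
        else:
            # Bottom-up: each unvisited vertex looks for a parent in the frontier.
            nxt = {v for v in adj if v not in dist
                   and any(w in frontier for w in adj[v])}
        for v in nxt:
            dist[v] = level
        frontier = nxt
    return dist

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(bfs_switching(adj, 0))
```

On a GPU the two strategies have very different memory-access and load-balance behaviour per level, which is why predicting the right one per level, rather than per graph, pays off.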
Formal Context Generation using Dirichlet Distributions
We suggest an improved way to randomly generate formal contexts based on
Dirichlet distributions. For this purpose we investigate the predominant way to
generate formal contexts, a coin-tossing model, recapitulate some of its
shortcomings and examine its stochastic model. Building on this, we propose
our Dirichlet model and develop an algorithm employing this idea. By comparing
our generation model to a coin-tossing model we show that our approach is a
significant improvement with respect to the variety of contexts generated.
Finally, we outline a possible application in null model generation for formal
contexts.
Comment: 16 pages, 7 figures
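A loose sketch of the idea, not the authors' algorithm: instead of one fixed coin-toss probability for every cell of the binary object-attribute table, per-row densities are drawn from a Dirichlet prior, which yields more varied contexts. The Dirichlet sample is built from normalized Gamma draws using only the standard library:

```python
import random

def dirichlet(alphas, rng):
    """Sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_context(n_objects, n_attributes, alphas, seed=0):
    """Random formal context (binary object x attribute table).

    Unlike a plain coin-tossing model with one global density, each
    object's row density is drawn from a Dirichlet prior.
    """
    rng = random.Random(seed)
    context = []
    for _ in range(n_objects):
        # Row density: first component of a fresh Dirichlet sample.
        p = dirichlet(alphas, rng)[0]
        context.append([rng.random() < p for _ in range(n_attributes)])
    return context

for row in generate_context(5, 4, alphas=[2.0, 2.0]):
    print(["x" if cell else "." for cell in row])
```

Setting both alphas equal recovers a symmetric prior around density 0.5; skewed alphas bias contexts toward sparse or dense rows.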
Towards Data-centric Graph Machine Learning: Review and Outlook
Data-centric AI, with its primary focus on the collection, management, and
utilization of data to drive AI models and applications, has attracted
increasing attention in recent years. In this article, we conduct an in-depth
and comprehensive review, offering a forward-looking outlook on the current
efforts in data-centric AI pertaining to graph data, the fundamental data
structure for representing and capturing intricate dependencies among massive
and diverse real-life entities. We introduce a systematic framework,
Data-centric Graph Machine Learning (DC-GML), that encompasses all stages of
the graph data lifecycle, including graph data collection, exploration,
improvement, exploitation, and maintenance. A thorough taxonomy of each stage
is presented to answer three critical graph-centric questions: (1) how to
enhance graph data availability and quality; (2) how to learn from graph data
with limited availability and low quality; (3) how to build graph MLOps systems
from the graph data-centric view. Lastly, we pinpoint the future prospects of
the DC-GML domain, providing insights to navigate its advancements and
applications.
Comment: 42 pages, 9 figures
Modeling and design of heterogeneous hierarchical bioinspired spider web structures using generative deep learning and additive manufacturing
Spider webs are incredible biological structures, comprising thin but strong
silk filaments arranged into complex hierarchical architectures with
striking mechanical properties (e.g., lightweight but high strength, achieving
diverse mechanical responses). While simple 2D orb webs can easily be mimicked,
the modeling and synthesis of 3D-based web structures remain challenging,
partly due to the rich set of design features. Here we provide a detailed
analysis of the heterogeneous graph structures of spider webs, and use deep
learning as a way to model and then synthesize artificial, bio-inspired 3D web
structures. The generative AI models are conditioned based on key geometric
parameters (including average edge length, number of nodes, average node
degree, and others). To identify graph construction principles, we use
inductive representation sampling of large experimentally determined spider web
graphs, to yield a dataset that is used to train three conditional generative
models: 1) an analog diffusion model inspired by nonequilibrium thermodynamics,
with sparse neighbor representation, 2) a discrete diffusion model with full
neighbor representation, and 3) an autoregressive transformer architecture with
full neighbor representation. All three models are scalable, produce complex,
de novo bio-inspired spider web mimics, and successfully construct graphs that
meet the design objectives. We further propose an algorithm that assembles web
samples produced by the generative models into larger-scale structures based on
a series of geometric design targets, including helical and parametric shapes,
mimicking and extending natural design principles towards integration with
diverging engineering objectives. Several webs are manufactured using 3D
printing and tested to assess mechanical properties.
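The geometric conditioning parameters named above (node count, average node degree, average edge length) are straightforward to extract from a spatial graph. A minimal sketch with invented 3D coordinates, not the authors' pipeline:

```python
import math

def graph_features(nodes, edges):
    """Conditioning features of a spatial graph: node count, average
    degree, and average Euclidean edge length. `nodes` maps id -> (x, y, z)."""
    degree = {v: 0 for v in nodes}
    lengths = []
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        lengths.append(math.dist(nodes[u], nodes[v]))
    return {
        "num_nodes": len(nodes),
        "avg_degree": sum(degree.values()) / len(nodes),
        "avg_edge_length": sum(lengths) / len(lengths),
    }

nodes = {0: (0, 0, 0), 1: (1, 0, 0), 2: (1, 1, 0)}
edges = [(0, 1), (1, 2), (0, 2)]
print(graph_features(nodes, edges))
```

In a conditional generative setting, feature vectors like this serve as the conditioning input, so that sampled graphs can be steered toward specified web geometries.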
Domain-Agnostic Molecular Generation with Self-feedback
The generation of molecules with desired properties has gained tremendous
popularity, revolutionizing the way scientists design molecular structures and
providing valuable support for chemical and drug design. However, despite the
potential of language models in molecule generation, they face numerous
challenges such as the generation of syntactically or chemically flawed
molecules, narrow domain focus, and limitations in creating diverse and
directionally feasible molecules due to a dearth of annotated data or external
molecular databases. To this end, we introduce MolGen, a pre-trained molecular
language model tailored specifically for molecule generation. MolGen acquires
intrinsic structural and grammatical insights by reconstructing over 100
million molecular SELFIES, while facilitating knowledge transfer between
different domains through domain-agnostic molecular prefix tuning. Moreover, we
present a self-feedback paradigm that inspires the pre-trained model to align
with the ultimate goal of producing molecules with desirable properties.
Extensive experiments on well-known benchmarks confirm MolGen's optimization
capabilities, encompassing penalized logP, QED, and molecular docking
properties. Further analysis shows that MolGen can accurately capture molecule
distributions, implicitly learn their structural characteristics, and
efficiently explore chemical space. The pre-trained model, codes, and datasets
are publicly available for future research at https://github.com/zjunlp/MolGen.
Comment: Work in progress. Add results of binding affinity
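In spirit, the self-feedback paradigm scores the model's own samples with the target property and feeds the best-scoring ones back for the next alignment step. A deliberately toy sketch, in which a made-up string property stands in for penalized logP or QED and a random string generator stands in for MolGen:

```python
import random

def toy_property(molecule):
    """Hypothetical stand-in for a real objective such as penalized logP:
    it simply rewards 'C' characters in a toy string encoding."""
    return molecule.count("C") - 0.5 * molecule.count("O")

def self_feedback_round(model_samples, top_fraction=0.2):
    """One round of the self-feedback idea: rank the model's own samples
    by the property and keep the best ones as targets for alignment."""
    ranked = sorted(model_samples, key=toy_property, reverse=True)
    keep = max(1, int(len(ranked) * top_fraction))
    return ranked[:keep]

rng = random.Random(0)
samples = ["".join(rng.choice("CO") for _ in range(8)) for _ in range(10)]
preferred = self_feedback_round(samples)
print(preferred, [toy_property(m) for m in preferred])
```

The real system differs in every particular (SELFIES representations, a pre-trained language model, chemically meaningful objectives), but the loop structure of generating, scoring, and re-aligning is the same.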