The Cost of Perfection for Matchings in Graphs
Perfect matchings and maximum weight matchings are two fundamental
combinatorial structures. We consider the ratio between the maximum weight of a
perfect matching and the maximum weight of a general matching. Motivated by the
computer graphics application in triangle meshes, where we seek to convert a
triangulation into a quadrangulation by merging pairs of adjacent triangles, we
focus mainly on bridgeless cubic graphs. First, we characterize graphs that
attain the extreme ratios. Second, we present a lower bound for all bridgeless
cubic graphs. Third, we present upper bounds for subclasses of bridgeless cubic
graphs, most of which are shown to be tight. Additionally, we present tight
bounds for the class of regular bipartite graphs.
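The ratio studied above can be illustrated on a small instance. The following is a minimal brute-force sketch, not the paper's method: the graph (the triangular prism, a bridgeless cubic graph on six vertices) and the edge weights are assumptions chosen for illustration, and the helper names are hypothetical.

```python
from itertools import combinations


def is_matching(edges):
    """Return True if no two edges in `edges` share an endpoint."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def best_matching_weight(weights, n_vertices, perfect_only):
    """Brute-force the maximum matching weight.

    `weights` maps each edge (u, v) to its weight. When `perfect_only`
    is True, only matchings that cover every vertex are considered.
    """
    edges = list(weights)
    best = None
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            if not is_matching(subset):
                continue
            if perfect_only and 2 * len(subset) != n_vertices:
                continue
            total = sum(weights[e] for e in subset)
            if best is None or total > best:
                best = total
    return best


# Triangular prism: triangles 0-1-2 and 3-4-5 joined by spokes 0-3, 1-4, 2-5.
# Illustrative weights: two heavy edges that cannot both be in a perfect matching.
weights = {
    (0, 1): 1, (1, 2): 0, (0, 2): 0,
    (3, 4): 0, (4, 5): 1, (3, 5): 0,
    (0, 3): 0, (1, 4): 0, (2, 5): 0,
}

max_perfect = best_matching_weight(weights, 6, perfect_only=True)
max_general = best_matching_weight(weights, 6, perfect_only=False)
ratio = max_perfect / max_general
print(max_perfect, max_general, ratio)
```

Here the general matching {(0, 1), (4, 5)} leaves vertices 2 and 3 uncovered and non-adjacent, so it cannot be extended to a perfect matching; every perfect matching contains at most one of the two heavy edges. Exhaustive enumeration is only practical for tiny graphs.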
Improving Molecular Properties Prediction Through Latent Space Fusion
Pre-trained Language Models have emerged as promising tools for predicting
molecular properties, yet their development is in its early stages,
necessitating further research to enhance their efficacy and address challenges
such as generalization and sample efficiency. In this paper, we present a
multi-view approach that combines latent spaces derived from state-of-the-art
chemical models. Our approach relies on two pivotal elements: the embeddings
derived from MHG-GNN, which represent molecular structures as graphs, and
MoLFormer embeddings rooted in chemical language. The attention mechanism of
MoLFormer can identify relations between two atoms even when they are far
apart, while the GNN of MHG-GNN more precisely captures relations among
multiple closely located atoms. In this work, we demonstrate
the superior performance of our proposed multi-view approach compared to
existing state-of-the-art methods, including MoLFormer-XL, which was trained on
1.1 billion molecules, particularly in intricate tasks such as predicting
clinical trial drug toxicity and inhibition of HIV replication. We assessed our
approach using six benchmark datasets from MoleculeNet, where it outperformed
competitors in five of them. Our study highlights the potential of latent space
fusion and feature integration for advancing molecular property prediction. In
this work, we use small versions of MHG-GNN and MoLFormer, which opens up an
opportunity for further improvement when our approach uses a larger-scale
dataset.
Comment: 8 pages, 4 figures. Submitted to the AI4Science Workshop - NeurIPS 202
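As a rough illustration of combining two latent spaces, the sketch below fuses a graph-based and a language-based embedding per molecule. It assumes fusion by plain concatenation, which the abstract does not confirm; the embedding values, dimensions, and the `fuse_embeddings` helper are hypothetical and do not come from MHG-GNN or MoLFormer.

```python
def fuse_embeddings(graph_emb, language_emb):
    """Concatenate two per-molecule embedding vectors into one fused vector."""
    return list(graph_emb) + list(language_emb)


# Hypothetical toy embeddings keyed by SMILES string; real model outputs
# would be much higher-dimensional.
graph_view = {"CCO": [0.1, 0.4], "c1ccccc1": [0.9, 0.2]}
language_view = {"CCO": [0.3, 0.7, 0.5], "c1ccccc1": [0.6, 0.1, 0.8]}

# One fused feature vector per molecule, ready for a downstream predictor.
fused = {
    smiles: fuse_embeddings(graph_view[smiles], language_view[smiles])
    for smiles in graph_view
}
print(fused["CCO"])
```

The fused vectors would then feed any downstream property predictor; other fusion schemes (for example, weighted sums or learned projections) are equally plausible readings of the abstract.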
Position Paper on Dataset Engineering to Accelerate Science
Data is a critical element in any discovery process. In recent decades, we have
observed exponential growth in the volume of available data and the technology
to manipulate it. However, data is only practical when one can structure it for
a well-defined task. For instance, we need a corpus of text broken into
sentences to train a natural language machine-learning model. In this work, we
will use the token "dataset" to designate a structured set of data built
to perform a well-defined task. Moreover, the dataset will be used in most
cases as a blueprint of an entity that at any moment can be stored as a table.
Specifically, in science, each area has its own ways of organizing, gathering,
and handling its datasets. We believe that datasets must be first-class entities
in any knowledge-intensive process, and that all workflows should pay special
attention to the dataset lifecycle, from gathering to use and evolution.
We argue that science and engineering discovery processes are extreme
instances of the need for such organization of datasets, calling for new
approaches and tooling. Furthermore, these requirements are more evident when
the discovery workflow uses artificial intelligence methods to empower the
subject-matter expert. In this work, we discuss an approach to making
datasets a critical entity in the scientific discovery process. We
illustrate some concepts using material discovery as a use case. We chose this
domain because it raises many significant problems that can be generalized
to other science fields.
Comment: Published at 2nd Annual AAAI Workshop on AI to Accelerate Science and
Engineering (AI2ASE)
https://ai-2-ase.github.io/papers/16%5cSubmission%5cAAAI_Dataset_Engineering-8.pd
Beyond Chemical Language: A Multimodal Approach to Enhance Molecular Property Prediction
We present a novel multimodal language model approach for predicting
molecular properties by combining chemical language representation with
physicochemical features. Our approach, MULTIMODAL-MOLFORMER, utilizes a causal
multistage feature selection method that identifies physicochemical features
based on their direct causal effect on a specific target property. These causal
features are then integrated with the vector space generated by molecular
embeddings from MOLFORMER. In particular, we employ Mordred descriptors as
physicochemical features and identify the Markov blanket of the target
property, which theoretically contains the most relevant features for accurate
prediction. Our results demonstrate the superior performance of our proposed
approach compared to existing state-of-the-art algorithms, including the
chemical language-based MOLFORMER and graph neural networks, on complex
tasks such as biodegradability prediction and PFAS toxicity estimation. Moreover,
we demonstrate the effectiveness of our feature selection method in reducing
the dimensionality of the Mordred feature space while maintaining or improving
the model's performance. Our approach opens up promising avenues for future
research in molecular property prediction by harnessing the synergistic
potential of both chemical language and physicochemical features, leading to
enhanced performance and advancements in the field.
Comment: 14 pages, 6 figures, 5 tables. Submitted to NeurIPS 2023, under review
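The Markov blanket mentioned above has a simple graphical definition: in a directed acyclic graph, it is the target's parents, its children, and the other parents of its children. The sketch below computes it for a known DAG; this is an assumption for illustration only, since the paper's multistage method identifies the blanket from data. The node names are hypothetical and are not Mordred descriptors.

```python
def markov_blanket(parents, target):
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps every node to the set of its direct parents.
    The blanket is: parents of the target, children of the target,
    and the other parents of those children (spouses).
    """
    children = {node for node, ps in parents.items() if target in ps}
    spouses = set()
    for child in children:
        spouses |= parents[child]
    blanket = set(parents.get(target, set())) | children | spouses
    blanket.discard(target)  # the target is never part of its own blanket
    return blanket


# Hypothetical causal graph over features and a target property.
parents = {
    "ring_count": set(),
    "mol_weight": {"ring_count"},
    "logP": {"mol_weight"},
    "solvent": set(),
    "toxicity": {"logP"},
    "assay_signal": {"toxicity", "solvent"},
}

selected = markov_blanket(parents, "toxicity")
print(sorted(selected))
```

In this toy graph, `mol_weight` and `ring_count` influence the target only through `logP`, so they fall outside the blanket; this is the sense in which restricting to the blanket reduces dimensionality without discarding relevant information.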
Toward Human-AI Co-creation to Accelerate Material Discovery
There is an increasing need in our society to achieve faster advances in
science to tackle urgent problems, such as climate change, environmental
hazards, sustainable energy systems, and pandemics. In certain
domains like chemistry, scientific discovery carries the extra burden of
assessing risks of the proposed novel solutions before moving to the
experimental stage. Despite several recent advances in Machine Learning and AI
to address some of these challenges, there is still a gap in technologies to
support end-to-end discovery applications, integrating the myriad of available
technologies into a coherent, orchestrated, yet flexible discovery process.
Such applications need to handle complex knowledge management at scale,
enabling knowledge consumption and production in a timely and efficient way for
subject matter experts (SMEs). Furthermore, the discovery of novel functional
materials strongly relies on the development of exploration strategies in the
chemical space. For instance, generative models have gained attention within
the scientific community due to their ability to generate enormous volumes of
novel molecules across material domains. These models exhibit extreme
creativity that often translates into low viability of the generated candidates.
In this work, we propose a workbench framework that aims to enable
human-AI co-creation to reduce the time until the first discovery and the
opportunity costs involved. This framework relies on a knowledge base with
domain and process knowledge, and user-interaction components to acquire
knowledge and advise the SMEs. Currently, the framework supports four main
activities: generative modeling, dataset triage, molecule adjudication, and
risk assessment.
Comment: 9 pages, 5 figures, NeurIPS 2022 WS: AI4Science