DocPrompt: Large-scale continue pretrain for zero-shot and few-shot document question answering
In this paper, we propose DocPrompt for document question answering tasks
with powerful zero-shot and few-shot performance. We propose a novel weakly
supervised data generation method, a novel multi-stage training method and a
novel ensemble method that combines an understanding model and a generation model. Experimental
results show that the DocPrompt model after continued pretraining significantly
outperforms the existing strong baseline models on document question answering
tasks. This method greatly improves the delivery efficiency and model
performance of document question answering customer projects, reducing
annotation costs and labor costs. Our demo can be found at
https://huggingface.co/spaces/PaddlePaddle/ERNIE-Layout
UniMAP: Universal SMILES-Graph Representation Learning
Molecular representation learning is fundamental for many drug-related
applications. Most existing molecular pre-training models are limited to a
single molecular modality, either SMILES or graph representation. To
effectively leverage both modalities, we argue that it is critical to capture
the fine-grained 'semantics' between SMILES and graph, because subtle
sequence/graph differences may lead to contrary molecular properties. In this
paper, we propose a universal SMILES-graph representation learning model, namely
UniMAP. First, an embedding layer is employed to obtain the token and
node/edge representations in SMILES and graph, respectively. A multi-layer
Transformer is then utilized to conduct deep cross-modality fusion. Specifically,
four kinds of pre-training tasks are designed for UniMAP, including Multi-Level
Cross-Modality Masking (CMM), SMILES-Graph Matching (SGM), Fragment-Level
Alignment (FLA), and Domain Knowledge Learning (DKL). In this way, both global
(i.e. SGM and DKL) and local (i.e. CMM and FLA) alignments are integrated to
achieve comprehensive cross-modality fusion. We evaluate UniMAP on various
downstream tasks, i.e. molecular property prediction, drug-target affinity
prediction and drug-drug interaction. Experimental results show that UniMAP
outperforms current state-of-the-art pre-training methods. We also visualize the
learned representations to demonstrate the effect of multi-modality
integration.
Protein-ligand binding representation learning from fine-grained interactions
The binding between proteins and ligands plays a crucial role in the realm of
drug discovery. Previous deep learning approaches have shown promising results
over traditional computationally intensive methods, but generalize
poorly due to limited supervised data. In this paper, we propose to
learn protein-ligand binding representation in a self-supervised learning
manner. Unlike existing pre-training approaches, which treat proteins
and ligands individually, we focus on discerning the intricate binding
patterns from fine-grained interactions. Specifically, this self-supervised
learning problem is formulated as predicting the final binding
complex structure, given a pocket and a ligand, with a Transformer-based
interaction module, which naturally emulates the binding process. To ensure that the
representation carries rich binding information, we introduce two pre-training
tasks, i.e. atomic pairwise distance map prediction and mask ligand
reconstruction, which comprehensively model the fine-grained interactions from
both structure and feature space. Extensive experiments have demonstrated the
superiority of our method across various binding tasks, including
protein-ligand affinity prediction, virtual screening and protein-ligand
docking.
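As a rough illustration of the first pre-training task named above, the target for atomic pairwise distance map prediction can be pictured as a pocket-by-ligand matrix of interatomic distances. The sketch below only assumes that shape for the target; it is not the authors' implementation, and the function name and toy coordinates are hypothetical.

```python
import math

def pairwise_distance_map(pocket_xyz, ligand_xyz):
    """Return a len(pocket) x len(ligand) matrix of Euclidean distances."""
    return [[math.dist(p, l) for l in ligand_xyz] for p in pocket_xyz]

# Toy coordinates (hypothetical): two pocket atoms and one ligand atom.
pocket = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
ligand = [(0.0, 4.0, 0.0)]
dmap = pairwise_distance_map(pocket, ligand)
```

A model trained on this task would regress each entry of `dmap` from the pocket and ligand inputs, so every pocket-ligand atom pair contributes a supervision signal.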
Fractional Denoising for 3D Molecular Pre-training
Coordinate denoising is a promising 3D molecular pre-training method, which
has achieved remarkable performance in various downstream drug discovery tasks.
Theoretically, the objective is equivalent to learning the force field, which
has been shown to be helpful for downstream tasks. Nevertheless, coordinate denoising faces two
challenges in learning an effective force field, i.e. low
sample coverage and an isotropic force field. The underlying reason is that
molecular distributions assumed by existing denoising methods fail to capture
the anisotropic characteristic of molecules. To tackle these challenges, we
propose a novel hybrid noise strategy, which adds noise to both dihedral angles
and coordinates. However, denoising such hybrid noise in the traditional way is no
longer equivalent to learning the force field. Through theoretical deductions, we
find that the problem is caused by the dependence of the covariance on the
input conformation. To this end, we propose to decouple the two types of noise and
design a novel fractional denoising method (Frad), which only denoises the
latter coordinate part. In this way, Frad enjoys both the merits of sampling
more low-energy structures and the force field equivalence. Extensive
experiments show the effectiveness of Frad in molecular representation, with a
new state-of-the-art on 9 out of 12 tasks of QM9 and on 7 out of 8 targets of
MD17.
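The two-stage noise described above can be sketched on a toy four-atom chain: first perturb one dihedral angle, then add Gaussian coordinate noise, and keep only the coordinate noise as the regression target. This is a minimal sketch under strong assumptions, not the Frad implementation: the single rotatable bond, the function names, and the noise scales `dihedral_sigma` and `coord_sigma` are all hypothetical.

```python
import math
import random

def rotate_about_axis(point, origin, axis, angle):
    """Rodrigues rotation of `point` about the line through `origin` along `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]
    v = [p - o for p, o in zip(point, origin)]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(a * b for a, b in zip(k, v))
    rotated = [v[i] * cos_a + k_cross_v[i] * sin_a + k[i] * k_dot_v * (1 - cos_a)
               for i in range(3)]
    return [r + o for r, o in zip(rotated, origin)]

def hybrid_noise(coords, rng, dihedral_sigma=0.5, coord_sigma=0.04):
    """Return (noisy_coords, coord_noise) for a four-atom chain.

    Stage 1 rotates the last atom about the bond between atoms 1 and 2
    (dihedral noise). Stage 2 adds Gaussian noise to every coordinate.
    Only the stage-2 noise is returned as the denoising target.
    """
    angle = rng.gauss(0.0, dihedral_sigma)
    axis = [b - a for a, b in zip(coords[1], coords[2])]
    stage1 = [list(c) for c in coords[:3]]
    stage1.append(rotate_about_axis(coords[3], coords[2], axis, angle))
    noise = [[rng.gauss(0.0, coord_sigma) for _ in range(3)] for _ in stage1]
    noisy = [[x + n for x, n in zip(c, nz)] for c, nz in zip(stage1, noise)]
    return noisy, noise

# Toy conformation (hypothetical coordinates).
rng = random.Random(0)
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (3.5, 1.4, 0.5)]
noisy, target = hybrid_noise(coords, rng)
```

Because the dihedral rotation preserves bond lengths, stage 1 moves the sample to a different but still plausible conformation, while the model is asked to predict only `target`.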
Multimodal Molecular Pretraining via Modality Blending
Self-supervised learning has recently gained growing interest in molecular
modeling for scientific tasks such as AI-assisted drug discovery. Current
studies consider leveraging both 2D and 3D molecular structures for
representation learning. However, relying on straightforward alignment
strategies that treat each modality separately, these methods fail to exploit
the intrinsic correlation between 2D and 3D representations that reflect the
underlying structural characteristics of molecules, and only perform
coarse-grained molecule-level alignment. To derive fine-grained alignment and
promote structural molecule understanding, we introduce an atomic-relation
level "blend-then-predict" self-supervised learning approach, MoleBLEND, which
first blends atom relations represented by different modalities into one
unified relation matrix for joint encoding, then recovers modality-specific
information for 2D and 3D structures individually. By treating atom
relationships as anchors, MoleBLEND organically aligns and integrates visually
dissimilar 2D and 3D modalities of the same molecule at fine-grained atomic
level, painting a more comprehensive depiction of each molecule. Extensive
experiments show that MoleBLEND achieves state-of-the-art performance across
major 2D/3D molecular benchmarks. We further provide theoretical insights from
the perspective of mutual-information maximization, demonstrating that our
method unifies contrastive, generative (cross-modality prediction) and
mask-then-predict (single-modality prediction) objectives into a single
cohesive framework.
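The "blend-then-predict" idea can be illustrated with two small atom-relation matrices, one from each modality. The abstract does not say how entries are assigned to modalities, so the independent per-entry sampling below is an assumption, and the function name, the `p_2d` parameter, and the toy matrices are hypothetical.

```python
import random

def blend_relations(rel_2d, rel_3d, rng, p_2d=0.5):
    """Pick, per atom pair, which modality supplies the relation value.

    Returns (blended, from_2d), where from_2d[i][j] is True when the blended
    entry was taken from rel_2d and False when it was taken from rel_3d.
    """
    n = len(rel_2d)
    blended, from_2d = [], []
    for i in range(n):
        row, mask = [], []
        for j in range(n):
            pick_2d = rng.random() < p_2d
            row.append(rel_2d[i][j] if pick_2d else rel_3d[i][j])
            mask.append(pick_2d)
        blended.append(row)
        from_2d.append(mask)
    return blended, from_2d

# Toy three-atom molecule (hypothetical values).
rel_2d = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]                     # bond-path lengths
rel_3d = [[0.0, 1.1, 1.8], [1.1, 0.0, 1.2], [1.8, 1.2, 0.0]]   # distances
blended, from_2d = blend_relations(rel_2d, rel_3d, random.Random(0))
```

In this picture, an encoder would consume `blended`, and the entries left out of each modality (where `from_2d` is False for 2D, True for 3D) would serve as the recovery targets.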
Semi-supervised Graph Neural Networks for Pileup Noise Removal
The high instantaneous luminosity of the CERN Large Hadron Collider leads to
multiple proton-proton interactions in the same or nearby bunch crossings
(pileup). Advanced pileup mitigation algorithms are designed to remove this
noise from pileup particles and improve the performance of crucial physics
observables. This study implements a semi-supervised graph neural network for
particle-level pileup noise removal, by identifying individual particles
produced from pileup. The graph neural network is first trained on charged
particles with known labels, which can be obtained from detector measurements
on data or simulation, and is then applied at inference to neutral particles for which such
labels are missing. This semi-supervised approach does not depend on the ground
truth information from simulation and thus allows us to perform training
directly on experimental data. The performance of this approach is found to be
consistently better than widely-used domain algorithms and comparable to the
fully-supervised training using simulation truth information. The study serves
as the first attempt at applying semi-supervised learning techniques to pileup
mitigation, and opens up a new direction of fully data-driven machine learning
pileup mitigation studies.
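The label-masking step at the heart of this semi-supervised setup can be sketched as a loss that averages only over charged particles, leaving neutral particles unlabeled. The sketch below leaves out the graph neural network and takes per-particle probabilities as given; the function name, the label convention (1 for pileup), and the toy values are hypothetical.

```python
import math

def masked_bce(probs, labels, is_charged):
    """Binary cross-entropy averaged over charged particles only.

    Neutral particles carry no label (None) and are skipped by the filter,
    so they never contribute to the training loss.
    """
    terms = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
             for p, y, c in zip(probs, labels, is_charged) if c]
    return sum(terms) / len(terms)

# Toy event (hypothetical): two charged particles with labels, one neutral.
probs = [0.9, 0.2, 0.5]          # predicted pileup probabilities
labels = [1, 0, None]            # 1 = pileup, 0 = primary interaction
is_charged = [True, True, False]
loss = masked_bce(probs, labels, is_charged)
```

At inference time the same model would simply be evaluated on the neutral particles, whose predictions were never constrained by labels during training.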
Understanding How to Inform Blind and Low-Vision Users about Data Privacy through Privacy Question Answering Assistants
Understanding and managing data privacy in the digital world can be
challenging for sighted users, let alone blind and low-vision (BLV) users.
There is limited research on how BLV users, who have special accessibility
needs, navigate data privacy, and how potential privacy tools could assist
them. We conducted an in-depth qualitative study with 21 US BLV participants to
understand their data privacy risk perception and mitigation, as well as their
information behaviors related to data privacy. We also explored BLV users'
attitudes towards potential privacy question answering (Q&A) assistants that
enable them to better navigate data privacy information. We found that BLV
users face heightened security and privacy risks, but their risk mitigation is
often insufficient. They do not necessarily seek data privacy information but
clearly recognize the benefits of a potential privacy Q&A assistant. They also
expect privacy Q&A assistants to possess cross-platform compatibility, support
multi-modality, and demonstrate robust functionality. Our study sheds light on
BLV users' expectations when it comes to usability, accessibility, trust and
equity issues regarding digital data privacy.
Comment: This research paper is accepted by USENIX Security '2