MolFM: A Multimodal Molecular Foundation Model
Molecular knowledge resides within three different modalities of information
sources: molecular structures, biomedical documents, and knowledge bases.
Effective incorporation of molecular knowledge from these modalities holds
paramount significance in facilitating biomedical research. However, existing
multimodal molecular foundation models exhibit limitations in capturing
intricate connections between molecular structures and texts, and more
importantly, none of them attempt to leverage a wealth of molecular expertise
derived from knowledge graphs. In this study, we introduce MolFM, a multimodal
molecular foundation model designed to facilitate joint representation learning
from molecular structures, biomedical texts, and knowledge graphs. We propose
cross-modal attention between atoms of molecular structures, neighbors of
molecule entities and semantically related texts to facilitate cross-modal
comprehension. We provide theoretical analysis that our cross-modal
pre-training captures local and global molecular knowledge by minimizing the
distance in the feature space between different modalities of the same
molecule, as well as molecules sharing similar structures or functions. MolFM
achieves state-of-the-art performance on various downstream tasks. On
cross-modal retrieval, MolFM outperforms existing models with 12.13% and 5.04%
absolute gains under the zero-shot and fine-tuning settings, respectively.
Furthermore, qualitative analysis showcases MolFM's implicit ability to provide
grounding from molecular substructures and knowledge graphs. Code and models
are available on https://github.com/BioFM/OpenBioMed.
Comment: 31 pages, 15 figures, and 15 tables
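The pre-training objective described above (minimizing the feature-space distance between different modalities of the same molecule) can be illustrated with a minimal sketch. This is not MolFM's actual objective; it is a generic InfoNCE-style cross-modal contrastive loss with hypothetical names, using toy random embeddings in place of real structure and text encoders.

```python
import numpy as np

def cross_modal_infonce(struct_emb, text_emb, temperature=0.1):
    """InfoNCE-style loss pulling the two modality embeddings of the
    same molecule together while pushing mismatched pairs apart.
    Rows index molecules; matching pairs share a row (hypothetical
    sketch, not MolFM's actual pre-training loss)."""
    s = struct_emb / np.linalg.norm(struct_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature                 # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # diagonal = true pairs

rng = np.random.default_rng(0)
struct = rng.normal(size=(4, 8))                   # toy structure embeddings
aligned_loss = cross_modal_infonce(struct, struct) # perfectly aligned modalities
random_loss = cross_modal_infonce(struct, rng.normal(size=(4, 8)))
```

When the two modalities agree, the diagonal of the similarity matrix dominates and the loss is small; mismatched embeddings yield a higher loss.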
PolyDiffuse: Polygonal Shape Reconstruction via Guided Set Diffusion Models
This paper presents PolyDiffuse, a novel structured reconstruction algorithm
that transforms visual sensor data into polygonal shapes with Diffusion Models
(DM), an emerging machinery amid exploding generative AI, while formulating
reconstruction as a generation process conditioned on sensor data. The task of
structured reconstruction poses two fundamental challenges to DM: 1) A
structured geometry is a ``set'' (e.g., a set of polygons for a floorplan
geometry), where a sample of elements has different but equivalent
representations, making the denoising highly ambiguous; and 2) A
``reconstruction'' task has a single solution, where an initial noise needs to
be chosen carefully, while any initial noise works for a generation task. Our
technical contribution is the introduction of a Guided Set Diffusion Model
where 1) the forward diffusion process learns guidance networks to control
noise injection so that one representation of a sample remains distinct from
its other permutation variants, thus resolving denoising ambiguity; and 2) the
reverse denoising process reconstructs polygonal shapes, initialized and
directed by the guidance networks, as a conditional generation process subject
to the sensor data. We have evaluated our approach for reconstructing two types
of polygonal shapes: floorplan as a set of polygons and HD map for autonomous
cars as a set of polylines. Through extensive experiments on standard
benchmarks, we demonstrate that PolyDiffuse significantly advances the current
state of the art and enables broader practical applications.
Comment: Project page: https://poly-diffuse.github.io
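For readers unfamiliar with the diffusion machinery the abstract builds on, the standard closed-form forward process is easy to sketch. Note this is the generic DDPM forward step, not PolyDiffuse's guided variant; the guidance networks that shape the noise injection are omitted, and the polygon data here are random stand-ins.

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Closed-form DDPM forward step: x_t = sqrt(abar_t)*x0 +
    sqrt(1 - abar_t)*noise, where abar_t is the cumulative product
    of (1 - beta). PolyDiffuse's learned guidance over this noise
    injection is not modeled here."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

rng = np.random.default_rng(1)
betas = np.linspace(1e-4, 0.02, 1000)   # linear noise schedule
polygon = rng.normal(size=(8, 2))       # toy polygon: 8 vertices in 2D
x_early = forward_diffuse(polygon, 10, betas, rng)   # mostly signal
x_late = forward_diffuse(polygon, 999, betas, rng)   # essentially noise
```

Early timesteps retain the polygon almost intact, while late timesteps are nearly pure noise; the reverse denoising process, conditioned on sensor data, runs this corruption backwards.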
Trading-off Mutual Information on Feature Aggregation for Face Recognition
Despite the advances in the field of Face Recognition (FR), the precision of
these methods is not yet sufficient. To improve the FR performance, this paper
proposes a technique to aggregate the outputs of two state-of-the-art (SOTA)
deep FR models, namely ArcFace and AdaFace. In our approach, we leverage the
transformer attention mechanism to exploit the relationship between different
parts of two feature maps. By doing so, we aim to enhance the overall
discriminative power of the FR system. One of the challenges in feature
aggregation is the effective modeling of both local and global dependencies.
Conventional transformers are known for their ability to capture long-range
dependencies, but they often struggle with modeling local dependencies
accurately. To address this limitation, we augment the self-attention mechanism
to capture both local and global dependencies effectively. This allows our
model to take advantage of the overlapping receptive fields present in
corresponding locations of the feature maps. However, fusing two feature maps
from different FR models might introduce redundancies to the face embedding.
Since these models often share identical backbone architectures, the resulting
feature maps may contain overlapping information, which can mislead the
training process. To overcome this problem, we leverage the principle of
Information Bottleneck to obtain a maximally informative facial representation.
This ensures that the aggregated features retain the most relevant and
discriminative information while minimizing redundant or misleading details. To
evaluate the effectiveness of our proposed method, we conducted experiments on
popular benchmarks and compared our results with state-of-the-art algorithms.
The consistent improvement we observed in these benchmarks demonstrates the
efficacy of our approach in enhancing FR performance.
Comment: Accepted to the 22nd IEEE International Conference on Machine
Learning and Applications 2023 (ICMLA)
Knowledge Graph Embedding: An Overview
Many mathematical models have been leveraged to design embeddings for
representing Knowledge Graph (KG) entities and relations for link prediction
and many downstream tasks. These mathematically-inspired models are not only
highly scalable for inference in large KGs, but also have many explainable
advantages in modeling different relation patterns that can be validated
through both formal proofs and empirical results. In this paper, we provide a
comprehensive overview of the current state of research in KG completion. In
particular, we focus on two main branches of KG embedding (KGE) design: 1)
distance-based methods and 2) semantic matching-based methods. We discover the
connections between recently proposed models and present an underlying trend
that might help researchers invent novel and more effective models. Next, we
delve into CompoundE and CompoundE3D, which draw inspiration from 2D and 3D
affine operations, respectively. They encompass a broad spectrum of techniques
including distance-based and semantic-based methods. We will also discuss an
emerging approach for KG completion which leverages pre-trained language models
(PLMs) and textual descriptions of entities and relations and offer insights
into the integration of KGE methods with PLMs for KG completion.
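The two branches the overview contrasts can be made concrete with their best-known representatives: TransE (distance-based) and DistMult (semantic matching). The scoring functions below are the standard published ones; the tiny 2-dimensional embeddings are illustrative stand-ins.

```python
import numpy as np

def transe_score(h, r, t):
    """Distance-based scoring: a plausible triple has h + r close to t,
    so a LOWER score means more plausible."""
    return np.linalg.norm(h + r - t)

def distmult_score(h, r, t):
    """Semantic-matching scoring: trilinear product <h, r, t>, where a
    HIGHER score means more plausible."""
    return np.sum(h * r * t)

h = np.array([1.0, 2.0])
r = np.array([0.5, -1.0])
t_good = h + r                      # tail exactly satisfying h + r = t
t_bad = np.array([4.0, 4.0])
print(transe_score(h, r, t_good))   # 0.0, a perfect distance-based fit
```

CompoundE-style models generalize the translation in TransE to compositions of affine operations (translation, rotation, scaling), which is why they subsume both families as special cases.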
Order vs. Chaos: A Language Model Approach for Side-channel Attacks
We introduce the Order vs. Chaos (OvC) classifier, a novel language-model approach for side-channel attacks combining the strengths of multitask learning (via the use of a language model), multimodal learning, and deep metric learning. Our methodology offers a viable substitute for the multitask classifiers used for learning multiple targets, as put forward by Masure et al. We highlight some well-known issues with multitask classifiers, like scalability, balancing multiple tasks, slow learning, large model sizes, and the need for complex hyperparameter tuning. Thus, we advocate the use of language models in side-channel attacks.
We demonstrate improvements in results on different variants of ASCAD-V1 and ASCAD-V2 datasets compared to the existing state-of-the-art results. Additionally, we delve deeper with experiments on protected simulated datasets, allowing us to control noise levels and simulate specific leakage models. This exploration facilitates an understanding of the ramifications when the protective scheme's masks do not leak and allows us to further compare our approach with other approaches. Furthermore, with the help of unprotected simulated datasets, we demonstrate that the OvC classifier, uninformed of the leakage model, can match the proficiency of a conventional multi-class classifier that is leakage model-aware. This finding implies that our methodology sidesteps the need for a predetermined leakage model in side-channel attacks.
Face Recognition using Deep Learning and TensorFlow framework
Detecting human faces and recognizing faces and facial expressions have always been an area of interest for different applications such as games, utilities and even security. With the advancement of machine learning, the techniques of detection and recognition have become more accurate and precise than ever before. However, machine learning remains a relatively complex field that could feel intimidating or inaccessible to many of us. Luckily, in the last couple of years, several organizations and open-source communities have been developing tools and libraries that help abstract the complex mathematical algorithms in order to encourage developers to easily create learning models and train them using any programming languages.
As part of this project, we will create a Face Detection framework in Python built on top of the work of several open-source projects and models with the hope to reduce the entry barrier for developers and to encourage them to focus more on developing innovative applications that make use of face detection and recognition.
Tuned Contrastive Learning
In recent times, contrastive learning based loss functions have become
increasingly popular for visual self-supervised representation learning owing
to their state-of-the-art (SOTA) performance. Most of the modern contrastive
learning methods generalize only to one positive and multiple negatives per
anchor. A recent state-of-the-art, supervised contrastive (SupCon) loss,
extends self-supervised contrastive learning to supervised setting by
generalizing to multiple positives and negatives in a batch and improves upon
the cross-entropy loss. In this paper, we propose a novel contrastive loss
function -- Tuned Contrastive Learning (TCL) loss, that generalizes to multiple
positives and negatives in a batch and offers parameters to tune and improve
the gradient responses from hard positives and hard negatives. We provide
theoretical analysis of our loss function's gradient response and show
mathematically how it is better than that of SupCon loss. We empirically
compare our loss function with SupCon loss and cross-entropy loss in supervised
setting on multiple classification-task datasets to show its effectiveness. We
also show the stability of our loss function to a range of hyper-parameter
settings. Unlike SupCon loss which is only applied to supervised setting, we
show how to extend TCL to self-supervised setting and empirically compare it
with various SOTA self-supervised learning methods. Hence, we show that TCL
loss achieves performance on par with SOTA methods in both supervised and
self-supervised settings.
Comment: Preprint Version
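The supervised contrastive setting that TCL extends can be sketched with the baseline SupCon-style loss. This is a hedged illustration, not the TCL formulation itself: TCL's extra parameters that tune hard-positive and hard-negative gradient responses are omitted, and only the shared temperature is kept; the clustered/scattered batches are synthetic.

```python
import numpy as np

def supcon_like_loss(features, labels, temperature=0.1):
    """SupCon-style loss over L2-normalized features: every same-label
    sample in the batch is a positive for an anchor, everything else a
    negative. TCL additionally exposes tunable parameters for hard
    positives and negatives, which this sketch does not include."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature
    n, total = len(labels), 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        pos = [j for j in others if labels[j] == labels[i]]
        log_denom = np.log(np.sum(np.exp(sim[i, others])))
        total += -np.mean([sim[i, p] - log_denom for p in pos])
    return total / n

rng = np.random.default_rng(3)
labels = np.array([0, 0, 1, 1])
c0, c1 = rng.normal(size=8), rng.normal(size=8)
clustered = np.vstack([c0 + 0.05 * rng.normal(size=8) for _ in range(2)]
                      + [c1 + 0.05 * rng.normal(size=8) for _ in range(2)])
scattered = rng.normal(size=(4, 8))
```

A batch whose same-label samples already lie close together incurs a lower loss than a randomly scattered one, which is exactly the signal the gradient pushes toward.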
Efficient Large-Scale Visual Representation Learning
In this article, we present our approach to single-modality visual
representation learning. Understanding visual representations of product
content is vital for recommendations, search, and advertising applications in
e-commerce. We detail and contrast techniques used to fine-tune large-scale
visual representation learning models in an efficient manner under low-resource
settings, including several pretrained backbone architectures, both in the
convolutional neural network as well as the vision transformer family. We
highlight the challenges for e-commerce applications at-scale and highlight the
efforts to more efficiently train, evaluate, and serve visual representations.
We present ablation studies evaluating the representation offline performance
for several downstream tasks, including our visually similar ad
recommendations. To this end, we present a novel text-to-image generative
offline evaluation method for visually similar recommendation systems. Finally,
we include online results from deployed machine learning systems in production
at Etsy.
Manufacturing Quality Control with Autoencoder-Based Defect Localization and Unsupervised Class Selection
Manufacturing industries require efficient and voluminous production of
high-quality finished goods. In the context of Industry 4.0, visual anomaly
detection offers a promising solution for automatically controlling product
quality with high precision. Automation based on computer vision can prevent
bottlenecks at the product quality checkpoint. We
considered recent advancements in machine learning to improve visual defect
localization, but challenges persist in obtaining a balanced feature set and
database of the wide variety of defects occurring in the production line. This
paper proposes a defect localizing autoencoder with unsupervised class
selection by clustering with k-means the features extracted from a pre-trained
VGG-16 network. The selected classes of defects are augmented with natural wild
textures to simulate artificial defects. The study demonstrates the
effectiveness of the defect localizing autoencoder with unsupervised class
selection for improving defect detection in manufacturing industries. The
proposed methodology shows promising results with precise and accurate
localization of quality defects on melamine-faced boards for the furniture
industry. Incorporating artificial defects into the training data shows
significant potential for practical implementation in real-world quality
control scenarios.
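The unsupervised class-selection step (k-means over features extracted from a pre-trained VGG-16) reduces to standard Lloyd's k-means. The sketch below is hypothetical in its inputs: random well-separated vectors stand in for the pooled VGG-16 features of board images.

```python
import numpy as np

def kmeans(features, k, iters=50, seed=0):
    """Plain Lloyd's k-means used as an unsupervised class selector:
    each cluster of pooled backbone features becomes one candidate
    class for the defect-localizing autoencoder. Real inputs would be
    VGG-16 features; random stand-ins are used here."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)].copy()
    assign = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None] - centers[None], axis=2)
        assign = dists.argmin(axis=1)          # nearest-center assignment
        for j in range(k):
            if np.any(assign == j):            # recompute non-empty centers
                centers[j] = features[assign == j].mean(axis=0)
    return assign, centers

rng = np.random.default_rng(4)
feats = np.vstack([rng.normal(m, 0.1, size=(20, 16)) for m in (0.0, 3.0)])
assign, centers = kmeans(feats, k=2)
```

Each resulting cluster is then treated as one class whose training images can be augmented with natural wild textures to simulate artificial defects.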
Vec2Face-v2: Unveil Human Faces from their Blackbox Features via Attention-based Network in Face Recognition
In this work, we investigate the problem of face reconstruction given a
facial feature representation extracted from a blackbox face recognition
engine. This is a very challenging problem in practice because the engine
exposes only limited, abstracted information. We, therefore, introduce
a new method named Attention-based Bijective Generative Adversarial Networks in
a Distillation framework (DAB-GAN) to synthesize the faces of a subject given
his/her extracted face recognition features. Given any unconstrained unseen
facial features of a subject, the DAB-GAN can reconstruct his/her facial images
in high definition. The DAB-GAN method includes a novel attention-based
generative structure with the newly defined Bijective Metrics Learning
approach. The framework starts by introducing a bijective metric so that the
distance measurement and metric learning process can be directly adopted in the
image domain for an image reconstruction task. The information from the
blackbox face recognition engine will be optimally exploited using the global
distillation process. Then an attention-based generator is presented for a
highly robust generator to synthesize realistic faces with ID preservation. We
have evaluated our method on the challenging face recognition databases, i.e.,
CelebA, LFW, CFP-FP, CP-LFW, AgeDB, CA-LFW, and consistently achieved
state-of-the-art results. The advancement of DAB-GAN is also proven in both
image realism and ID preservation properties.
Comment: arXiv admin note: substantial text overlap with arXiv:2003.0695