85 research outputs found
MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation
The goal of sequential recommendation (SR) is to predict the items a user is
likely to be interested in, based on her/his historical interaction sequences. Most
existing sequential recommenders are developed based on ID features, which,
despite their widespread use, often underperform with sparse IDs and struggle
with the cold-start problem. Besides, inconsistent ID mappings hinder the
model's transferability, isolating similar recommendation domains that could
have been co-optimized. This paper aims to address these issues by exploring
the potential of multi-modal information in learning robust and generalizable
sequence representations. We propose MISSRec, a multi-modal pre-training and
transfer learning framework for SR. On the user side, we design a
Transformer-based encoder-decoder model, where the contextual encoder learns to
capture the sequence-level multi-modal synergy while a novel interest-aware
decoder is developed to grasp item-modality-interest relations for better
sequence representation. On the candidate item side, we adopt a dynamic fusion
module to produce user-adaptive item representation, providing more precise
matching between users and items. We pre-train the model with contrastive
learning objectives and fine-tune it in an efficient manner. Extensive
experiments demonstrate the effectiveness and flexibility of MISSRec, promising
a practical solution for real-world recommendation scenarios. Comment: Accepted to ACM MM 202
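The abstract above mentions pre-training with contrastive learning objectives without giving their form. As a minimal sketch only, assuming an InfoNCE-style loss over paired sequence and item embeddings (the function names, the temperature value, and the toy vectors are all hypothetical and not taken from MISSRec), the idea can be illustrated as:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def info_nce(seq_embs, item_embs, temperature=0.1):
    """Mean InfoNCE loss; seq_embs[i] is assumed to be paired with item_embs[i].

    All other items in the batch act as negatives for sequence i.
    """
    seqs = [normalize(s) for s in seq_embs]
    items = [normalize(t) for t in item_embs]
    total = 0.0
    for i, s in enumerate(seqs):
        logits = [dot(s, t) / temperature for t in items]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(seqs)

# Toy check: correctly paired embeddings should give a lower loss
# than the same embeddings with the pairing swapped.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each sequence embedding is closest to its own item and large otherwise, which is the general behavior a contrastive objective is meant to encourage.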
Deep learning in food category recognition
Integrating artificial intelligence with food category recognition has been a field of interest for research for the
past few decades. It is potentially one of the next steps in revolutionizing human interaction with food. The
modern advent of big data and the development of data-oriented fields like deep learning have provided advancements
in food category recognition. With increasing computational power and ever-larger food datasets,
the approach’s potential has yet to be realized. This survey provides an overview of methods that can be applied
to various food category recognition tasks, including detecting type, ingredients, quality, and quantity. We
survey the core components for constructing a machine learning system for food category recognition, including
datasets, data augmentation, hand-crafted feature extraction, and machine learning algorithms. We place a
particular focus on the field of deep learning, including the utilization of convolutional neural networks, transfer
learning, and semi-supervised learning. We provide an overview of relevant studies to promote further developments
in food category recognition for research and industrial applications.
Funding: MRC (MC_PC_17171); Royal Society (RP202G0230); BHF (AA/18/3/34220); Hope Foundation for Cancer Research (RM60G0680); GCRF (P202PF11); Sino-UK Industrial Fund (RP202G0289); LIAS (P202ED10; Data Science Enhancement Fund (P202RE237); Fight for Sight (24NN201); Sino-UK Education Fund (OP202006); BBSRC (RM32G0178B8
SPA: A Graph Spectral Alignment Perspective for Domain Adaptation
Unsupervised domain adaptation (UDA) is a pivotal form of machine learning that
extends an in-domain model to distinct target domains where the data
distributions differ. Most prior works focus on capturing the inter-domain
transferability but largely overlook rich intra-domain structures, which
empirically results in even worse discriminability. In this work, we introduce
a novel graph SPectral Alignment (SPA) framework to tackle the tradeoff. The
core of our method is briefly condensed as follows: (i) by casting the DA
problem to graph primitives, SPA composes a coarse graph alignment mechanism
with a novel spectral regularizer towards aligning the domain graphs in
eigenspaces; (ii) we further develop a fine-grained message propagation module,
built upon a novel neighbor-aware self-training mechanism, to enhance
discriminability in the target domain. On standardized benchmarks, the
extensive experiments of SPA demonstrate that its performance has surpassed the
existing cutting-edge DA methods. Coupled with dense model analysis, we
conclude that our approach indeed possesses superior efficacy, robustness,
discriminability, and transferability. Code and data are available at:
https://github.com/CrownX/SPA. Comment: NeurIPS 2023 camera read
Efficient Representation Learning With Graph Neural Networks
Graph neural networks (GNNs) have emerged as the dominant paradigm for graph representation learning, igniting widespread interest in utilizing sophisticated GNNs for diverse computer vision tasks in various domains, including visual SLAM, 3D object recognition and segmentation, as well as visual perception with event cameras. However, the applications of these GNNs often rely on cumbersome GNN architectures for favorable performance, posing challenges for real-time interaction, particularly in edge computing scenarios. This is particularly relevant in cases such as autonomous driving, where timely responses are crucial for handling complex traffic conditions. The objective of this thesis is to contribute to the advancement of learning efficient representations using lightweight GNNs, enabling their effective deployment in resource-constrained environments. To achieve this goal, the thesis explores various efficient learning schemes, focusing on four key aspects: the data side, the model side, the data-model side, and the application side. In terms of data-driven efficient learning, the thesis proposes an adaptive data modification scheme that allows a pre-trained model to be repurposed for multiple designated downstream tasks in a resource-efficient manner, without the need for re-training or fine-tuning. For model-centric efficiency, the thesis introduces a multi-talented and lightweight architecture, without accessing human annotations, that can integrate the expertise of the pre-trained complex GNNs specializing in different tasks. Furthermore, the thesis explores a dedicated binarization scheme on the data-model side that converts both input data and model parameters into 1-bit representations, resulting in lightweight 1-bit architectures. Finally, the thesis investigates an application-specific efficient learning scheme that models the style transfer process as message passing in GNNs, enabling efficient semi-parametric stylization
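The abstract above describes converting both input data and model parameters into 1-bit representations. The sketch below is not the thesis's scheme; it assumes the common sign-based binarization with a mean-absolute-value scaling factor for the weights, and every name and value in it is hypothetical:

```python
def binarize(values):
    """Map each value to +1 or -1 by its sign (zero maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]

def binary_dot(x, w):
    """Approximate dot(x, w) using 1-bit inputs and 1-bit weights.

    Assumption: the weight magnitudes are summarized by a single scale,
    the mean absolute weight, applied after the +/-1 dot product.
    """
    xb, wb = binarize(x), binarize(w)
    scale = sum(abs(v) for v in w) / len(w)
    return scale * sum(a * b for a, b in zip(xb, wb))

# Hypothetical node features and layer weights.
features = [0.7, -1.2, 0.1, -0.4]
weights = [0.5, -0.5, 1.0, 1.0]
approx = binary_dot(features, weights)
```

Because every operand is +1 or -1, the inner sum can in principle be computed with bitwise operations instead of floating-point multiplies, which is where the efficiency of 1-bit architectures generally comes from.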
Fundamentals
Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters
Data Optimization in Deep Learning: A Survey
Large-scale, high-quality data are considered an essential factor for the
successful application of many deep learning techniques. Meanwhile, numerous
real-world deep learning tasks still have to contend with the lack of
sufficient amounts of high-quality data. Additionally, issues such as model
robustness, fairness, and trustworthiness are also closely related to training
data. Consequently, a huge number of studies in the existing literature have
focused on the data aspect in deep learning tasks. Some typical data
optimization techniques include data augmentation, logit perturbation, sample
weighting, and data condensation. These techniques usually come from different
deep learning divisions and their theoretical inspirations or heuristic
motivations may seem unrelated to each other. This study aims to organize a
wide range of existing data optimization methodologies for deep learning from
the previous literature, and makes the effort to construct a comprehensive
taxonomy for them. The constructed taxonomy considers the diversity of split
dimensions, and deep sub-taxonomies are constructed for each dimension. On the
basis of the taxonomy, connections among the extensive data optimization
methods for deep learning are built in terms of four aspects. We also outline
several promising and interesting future directions. The constructed
taxonomy and the revealed connections should improve the understanding
of existing methods and inform the design of novel data optimization techniques.
Furthermore, our aspiration for this survey is to promote data optimization as
an independent subdivision of deep learning. A curated, up-to-date list of
resources related to data optimization in deep learning is available at
https://github.com/YaoRujing/Data-Optimization
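Sample weighting is one of the data optimization techniques the survey abstract lists. As an illustrative sketch only, assuming inverse-class-frequency weights applied to a cross-entropy loss (one of many possible weighting rules, not one attributed to the survey; all names and numbers are hypothetical):

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count) so rare classes count more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean of -log p(true class); probs[i][c] is P(class c | sample i)."""
    total, total_weight = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(p[y])
        total_weight += w
    return total / total_weight

# Toy imbalanced batch: three samples of class 0, one of class 1.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels)
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights the single minority-class sample contributes as much total weight as the three majority-class samples combined, so its error is not drowned out during training.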
Deep representation learning: Fundamentals, Perspectives, Applications, and Open Challenges
Machine Learning algorithms have had a profound impact on the field of
computer science over the past few decades. These algorithms' performance is
greatly influenced by the representations that are derived from the data in the
learning process. The representations learned in a successful learning process
should be concise, discrete, meaningful, and able to be applied across a
variety of tasks. A recent effort has been directed toward developing Deep
Learning models, which have proven to be particularly effective at capturing
high-dimensional, non-linear, and multi-modal characteristics. In this work, we
discuss the principles and developments that have been made in the process of
learning representations, and converting them into desirable applications. In
addition, for each framework or model, the key issues and open challenges, as
well as the advantages, are examined
Multimodal Adversarial Learning
Deep Convolutional Neural Networks (DCNN) have proven to be an exceptional tool for object recognition, generative modelling, and multi-modal learning in various computer vision applications. However, recent findings have shown that such state-of-the-art models can be easily deceived by inserting slight, imperceptible perturbations into key pixels of the input. A good target detection system can accurately identify targets by localizing their coordinates on the input image of interest. This is ideally achieved by labeling each pixel in an image as a background or a potential target pixel. However, prior research confirms that such state-of-the-art target detection models are susceptible to adversarial attacks. In the case of generative models, facial sketches drawn by artists, mostly used by law enforcement agencies, depend on the ability of the artist to clearly replicate all the key facial features that aid in capturing the true identity of a subject. Recent works have attempted to synthesize these sketches into plausible visual images to improve visual recognition and identification. However, synthesizing photo-realistic images from sketches proves to be an even more challenging task, especially for sensitive applications such as suspect identification. The incorporation of hybrid discriminators, which perform attribute classification of multiple target attributes, and a quality-guided encoder, which minimizes the perceptual dissimilarity of the latent space embeddings of the synthesized and real images at different layers in the network, has been shown to be a powerful tool towards better multi-modal learning techniques. In general, our overall approach was aimed at improving target detection systems and the visual appeal of synthesized images while incorporating multiple attribute assignment to the generator without compromising the identity of the synthesized image.
We synthesized sketches using the XDoG filter for the CelebA, Multi-modal and CelebA-HQ datasets and from an auxiliary generator trained on sketches from the CUHK, IIT-D and FERET datasets. Overall, our results for different model applications are impressive compared to the current state of the art
Stinging the Predators: A collection of papers that should never have been published
This ebook collects academic papers and conference abstracts that were meant to be so terrible that nobody in their right mind would publish them. All were submitted to journals and conferences to expose weak or non-existent peer review and other exploitative practices. Each paper has a brief introduction. Short essays round out the collection