Self-supervised learning for transferable representations
Machine learning has undeniably achieved remarkable advances thanks to large labelled datasets and supervised learning. However, this progress is constrained by the labour-intensive annotation process: it is not feasible to generate extensive labelled datasets for every problem we aim to address. Consequently, there has been a notable recent shift toward approaches that leverage raw data alone. Among these, self-supervised learning has emerged as a particularly powerful approach, offering scalability to massive datasets and showing considerable potential for effective knowledge transfer. This thesis investigates self-supervised representation learning with a strong focus on computer vision applications. We provide a comprehensive survey of self-supervised methods across various modalities, introducing a taxonomy that categorises them into four distinct families while also highlighting practical considerations for real-world implementation. We then focus on the computer vision modality, where we perform a comprehensive benchmark evaluation of state-of-the-art self-supervised models against many diverse downstream transfer tasks. Our findings reveal that self-supervised models often outperform supervised learning across a spectrum of tasks, albeit with correlations weakening as tasks move beyond classification, particularly for datasets with distribution shifts. Digging deeper, we investigate the influence of data augmentation on the transferability of contrastive learners, uncovering a trade-off between spatial and appearance-based invariances that generalises to real-world transformations. This begins to explain the differing empirical performance of self-supervised learners on different downstream tasks, and it showcases the advantages of specialised representations produced with tailored augmentation.
Finally, we introduce a novel self-supervised pre-training algorithm for object detection, aligning pre-training with the downstream architecture and objectives and leading to reduced localisation errors and improved label efficiency. In conclusion, this thesis contributes a comprehensive understanding of self-supervised representation learning and its role in enabling effective transfer across computer vision tasks.
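The trade-off between spatial and appearance-based invariances arises from the two augmentation families that contrastive learners typically apply. A minimal NumPy sketch of one representative from each family (function names and parameters are illustrative, not the thesis's actual pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img, size):
    """Spatial augmentation: encourages invariance to object position and scale."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def color_jitter(img, strength=0.5):
    """Appearance augmentation: encourages invariance to brightness changes."""
    factor = 1.0 + rng.uniform(-strength, strength)
    return np.clip(img * factor, 0.0, 1.0)

img = rng.random((32, 32, 3))
spatial_view = random_crop(img, 24)       # same content, different framing
appearance_view = color_jitter(img)       # same framing, different appearance
assert spatial_view.shape == (24, 24, 3)
assert appearance_view.shape == img.shape
```

Emphasising one family over the other tailors the learned representation toward tasks that require the corresponding invariance.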
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Novel Methods for Natural Language Modeling and Pretraining
This thesis is about modeling language sequences to achieve lower perplexity and better generation and to benefit downstream language tasks. Specifically, it addresses the importance of natural language features, including segmentation, lexical, and alignment features. We present three new techniques that improve language sequence modeling with different language features.
1. Segment-Aware Language Modeling is a novel model architecture leveraging the text segmentation feature for text sequence modeling. It encodes richer positional information for language modeling by replacing the original token position encoding with a combined position encoding of paragraph, sentence, and token. By applying our approach to Transformer-XL, we train a new language model, Segatron-XL, that achieves a 6.6-7.8% relative reduction in perplexity. Additionally, BERT pretrained with our method -- SegaBERT -- outperforms BERT on general language understanding, sentence representation learning, and machine reading comprehension tasks. Furthermore, our SegaBERT-large model outperforms RoBERTa-large on zero-shot STS tasks. These experimental results demonstrate that our proposed Segatron works on both language models with relative position embeddings and pretrained language models with absolute position embeddings.
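The combined position encoding can be sketched as the sum of three embedding lookups, one per granularity. Table sizes and names below are illustrative assumptions, not Segatron's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8

# Hypothetical embedding tables, one per granularity, each mapping an
# index to a d_model-dimensional vector.
MAX_PARA, MAX_SENT, MAX_TOK = 16, 64, 512
para_emb = rng.normal(size=(MAX_PARA, d_model))
sent_emb = rng.normal(size=(MAX_SENT, d_model))
tok_emb = rng.normal(size=(MAX_TOK, d_model))

def combined_position_encoding(para_idx, sent_idx, tok_idx):
    """Replace the single token-position encoding with the sum of
    paragraph-, sentence-, and token-level position embeddings."""
    return para_emb[para_idx] + sent_emb[sent_idx] + tok_emb[tok_idx]

# Encoding for token 3 of sentence 1 in paragraph 0:
vec = combined_position_encoding(0, 1, 3)
assert vec.shape == (d_model,)
```

Two tokens at the same within-sentence offset but in different sentences or paragraphs thus receive distinct positional vectors.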
2. Hypernym-Instructed Language Modeling is a novel training method leveraging the lexical feature for rare word modeling. It maps words that have a common WordNet hypernym to the same class and trains large neural LMs by gradually annealing from predicting the class to token prediction during training. Class-based prediction leads to an implicit context aggregation for similar words and thus can improve generalization for rare words. Empirically, this curriculum learning strategy consistently reduces perplexity over various large, highly-performant state-of-the-art Transformer-based models on two datasets, WikiText-103 and ArXiv. Our analysis shows that the performance improvement is achieved without sacrificing performance on rare words.
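The gradual annealing from class prediction to token prediction can be sketched with a simple mixing weight. The linear schedule below is an illustrative assumption; the abstract does not specify the exact schedule:

```python
def anneal_weight(step, total_steps):
    """Linearly anneal from pure class prediction (weight 1.0) at the
    start of training to pure token prediction (weight 0.0) at the end."""
    return max(0.0, 1.0 - step / total_steps)

def mixed_loss(class_loss, token_loss, step, total_steps):
    """Curriculum objective: early training emphasises predicting the
    WordNet-hypernym class, later training the exact token."""
    w = anneal_weight(step, total_steps)
    return w * class_loss + (1.0 - w) * token_loss

# At the midpoint the two objectives contribute equally:
assert mixed_loss(2.0, 4.0, step=500, total_steps=1000) == 3.0
```

Because all words sharing a hypernym share a class target early on, gradient signal is pooled across similar words before the model specialises to individual (possibly rare) tokens.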
3. Alignment-Aware Acoustic and Text Modeling is a novel pretraining method leveraging both the segmentation and alignment features for text-speech sequence modeling. It reconstructs masked acoustic signals from text input and acoustic-text alignment during training. In this way, the pretrained model can generate high-quality reconstructed spectrograms, which can be applied directly to speech editing and new-speaker TTS. Experiments show that A3T outperforms SOTA models on speech editing and improves multi-speaker speech synthesis without an external speaker verification model.
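The masked-reconstruction objective can be illustrated with a minimal sketch: frames that the alignment ties to one phoneme are masked out, and the loss is computed only on those frames. The array shapes, mask placement, and L1 loss here are illustrative assumptions, not A3T's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_mels = 20, 4
spectrogram = rng.normal(size=(T, n_mels))

# Hypothetical alignment: suppose frames 8-11 belong to one phoneme.
mask = np.zeros(T, dtype=bool)
mask[8:12] = True

masked_input = spectrogram.copy()
masked_input[mask] = 0.0  # the model sees zeros here, plus the full text

def reconstruction_loss(pred, target, mask):
    """L1 loss computed only over the masked frames, as in masked
    acoustic-signal reconstruction objectives."""
    return np.abs(pred[mask] - target[mask]).mean()

# A perfect reconstruction yields zero loss on the masked region.
assert reconstruction_loss(spectrogram, spectrogram, mask) == 0.0
```

At inference time the same mechanism supports speech editing: the frames to be edited are masked and regenerated from the new text and alignment.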
Comparing Storm Resolving Models and Climates via Unsupervised Machine Learning
Storm-resolving models (SRMs) have gained widespread interest because of the
unprecedented detail with which they resolve the global climate. However, it
remains difficult to quantify objective differences in how SRMs resolve complex
atmospheric formations. This lack of appropriate tools for comparing model
similarities is a problem in many disparate fields that involve simulation
tools for complex data. To address this challenge we develop methods to
estimate distributional distances based on both nonlinear dimensionality
reduction and vector quantization. Our approach automatically learns
appropriate notions of similarity from low-dimensional latent data
representations that the different models produce. This enables an
intercomparison of nine SRMs based on their high-dimensional simulation data
and reveals that only six are similar in their representation of atmospheric
dynamics. Furthermore, we uncover signatures of the convective response to
global warming in a fully unsupervised way. Our study provides a path toward
evaluating future high-resolution simulation data more objectively.
Comment: 22 pages, 19 figures. Submitted to a journal for consideration.
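The comparison pipeline described above (low-dimensional latents, vector quantization, then a distributional distance between code-usage histograms) can be sketched in NumPy. Fixed random centroids stand in for a learned codebook, and the Jensen-Shannon divergence is one reasonable choice of distance; both are illustrative assumptions, not the study's exact method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the low-dimensional latent representations two models
# produce (in the study these come from nonlinear dimensionality reduction).
latents_a = rng.normal(loc=0.0, size=(500, 2))
latents_b = rng.normal(loc=3.0, size=(500, 2))

# Vector quantization against a shared codebook (k-means centroids would
# be typical; fixed random centroids keep this sketch dependency-free).
codebook = rng.normal(loc=1.5, size=(8, 2))

def code_histogram(latents, codebook):
    """Assign each latent to its nearest code and return usage frequencies."""
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    counts = np.bincount(d.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()

def jensen_shannon(p, q, eps=1e-12):
    """Symmetric distributional distance between code-usage histograms."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

h_a = code_histogram(latents_a, codebook)
h_b = code_histogram(latents_b, codebook)
# A model is at distance ~0 from itself and further from a shifted one.
assert jensen_shannon(h_a, h_a) < 1e-9
assert jensen_shannon(h_a, h_b) > jensen_shannon(h_a, h_a)
```

Because the distance operates on learned latent codes rather than raw fields, the notion of similarity adapts to whatever structure the models actually represent.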
Semantic Communications with Variable-Length Coding for Extended Reality
Wireless extended reality (XR) has attracted wide attention as a promising
technology to improve users' mobility and quality of experience. However, the
ultra-high data rate requirement of wireless XR has hindered its development
for many years. To overcome this challenge, we develop a semantic communication
framework, where semantically-unimportant information is highly-compressed or
discarded in semantic coders, significantly improving the transmission
efficiency. Moreover, since some source content may carry less semantic
information or tolerate channel noise better,
we propose a universal variable-length semantic-channel coding method. In
particular, we first use a rate allocation network to estimate the best code
length for semantic information and then adjust the coding process accordingly.
By adopting some proxy functions, the whole framework is trained in an
end-to-end manner. Numerical results show that our semantic system
significantly outperforms traditional transmission methods and the proposed
variable-length coding scheme is superior to fixed-length coding methods.
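The rate-allocation idea, where a network first estimates the best code length for the semantic information and the coder adjusts accordingly, can be sketched as follows. This is a toy stand-in: the candidate lengths, thresholds, and magnitude-based scoring rule are illustrative assumptions, not the paper's learned network:

```python
import numpy as np

CODE_LENGTHS = [16, 32, 64]  # candidate code lengths (illustrative values)

def rate_allocation(semantic_features):
    """Toy stand-in for the rate allocation network: pick a code length
    from the magnitude of the semantic features (a real system would
    learn this mapping end-to-end)."""
    score = np.abs(semantic_features).mean()
    if score < 0.5:
        return CODE_LENGTHS[0]   # little semantic information -> short code
    if score < 1.0:
        return CODE_LENGTHS[1]
    return CODE_LENGTHS[2]       # dense semantics -> longest code

def encode(semantic_features):
    """Truncate or zero-pad the feature vector to the allocated length,
    mimicking a variable-length semantic-channel coder."""
    n = rate_allocation(semantic_features)
    code = np.zeros(n)
    m = min(n, len(semantic_features))
    code[:m] = semantic_features[:m]
    return code

sparse = np.full(64, 0.1)   # low-information source content
dense = np.full(64, 2.0)    # high-information source content
assert len(encode(sparse)) < len(encode(dense))
```

The hard length selection is not differentiable on its own, which is why the paper introduces proxy functions so the whole framework can still be trained end-to-end.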
Proceedings of the 8th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2023)
This volume gathers the papers presented at the Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023), held in Tampere, Finland, on 21–22 September 2023.
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Efficiency and Sustainability of the Distributed Renewable Hybrid Power Systems Based on the Energy Internet, Blockchain Technology and Smart Contracts-Volume II
The climate changes that are becoming visible today are a challenge for the global research community. In this context, renewable energy sources, fuel cell systems, and other energy-generating sources must be optimally combined and connected to the grid using advanced energy-transaction methods. This reprint presents the latest solutions for implementing fuel cells and renewable energy in mobile and stationary applications, such as hybrid and microgrid power systems based on the Energy Internet, Blockchain technology, and smart contracts, and we hope it will be of interest to readers working in these fields.