Generating Diverse and Meaningful Captions: Unsupervised Specificity Optimization for Image Captioning
Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty.
We make our source code publicly available online: https://github.com/AnnikaLindh/Diverse_and_Specific_Image_Captionin
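The abstract does not spell out the training objective, but the core idea of using an Image Retrieval model as an unsupervised learning signal can be sketched as a REINFORCE-style reward: captions that retrieve their own image well are reinforced. The sketch below assumes hypothetical generator and retrieval-model interfaces and is not the authors' exact formulation.

```python
# Sketch (assumed interfaces, not the paper's exact method): use an
# image-text retrieval model's similarity score as a REINFORCE-style
# reward so the caption generator is pushed toward captions that are
# specific enough to retrieve their own image.
import torch
import torch.nn.functional as F

def specificity_loss(generator, retrieval_model, images):
    # Sample captions and keep their per-token log-probabilities
    # (hypothetical generator API).
    captions, log_probs = generator.sample(images)
    with torch.no_grad():
        img_emb = retrieval_model.encode_images(images)    # hypothetical API
        txt_emb = retrieval_model.encode_texts(captions)   # hypothetical API
        # Reward: how well each sampled caption retrieves its own image.
        reward = F.cosine_similarity(img_emb, txt_emb, dim=-1)
        baseline = reward.mean()  # simple baseline for variance reduction
    # REINFORCE: raise the likelihood of captions with above-average reward.
    return -((reward - baseline) * log_probs.sum(dim=-1)).mean()
```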
From Pointwise to Powerhouse: Initialising Neural Networks with Generative Models
Traditional initialisation methods, e.g. He and Xavier, have been effective
in avoiding the problem of vanishing or exploding gradients in neural networks.
However, they only use simple pointwise distributions, which model
one-dimensional variables. Moreover, they ignore most information about the
architecture and disregard past training experiences. These limitations can be
overcome by employing generative models for initialisation. In this paper, we
introduce two groups of new initialisation methods. First, we locally
initialise weight groups by employing variational autoencoders. Secondly, we
globally initialise full weight sets by employing graph hypernetworks. We
thoroughly evaluate the impact of the employed generative models on
state-of-the-art neural networks in terms of accuracy, convergence speed and
ensembling. Our results show that global initialisations result in higher
accuracy and faster initial convergence speed. However, the implementation
through graph hypernetworks leads to diminished ensemble performance on
out-of-distribution data. To counteract this, we propose a modification called
the noise graph hypernetwork, which encourages diversity in the produced
ensemble members.
Furthermore, our approach might be able to transfer learned knowledge to
different image distributions. Our work provides insights into the potential,
the trade-offs and possible modifications of these new initialisation methods.
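As a rough illustration of the "local" variant, one can imagine sampling a layer's weight groups from the decoder of a generative model trained on the weights of previously trained networks. The sketch below uses a toy VAE decoder and treats each output neuron's fan-in as one weight group; the architecture, grouping scheme, and class names are assumptions, not the paper's implementation.

```python
# Toy sketch of the "local" idea: initialise a layer by sampling its weight
# groups from the decoder of a VAE trained on weights of previously trained
# networks. WeightVAE, its sizes, and the grouping scheme are assumptions
# made for illustration only.
import torch
import torch.nn as nn

class WeightVAE(nn.Module):
    def __init__(self, weight_dim, latent_dim=32):
        super().__init__()
        self.latent_dim = latent_dim
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, weight_dim)
        )

    def sample_weights(self, n):
        z = torch.randn(n, self.latent_dim)   # sample latent codes
        return self.decoder(z)                # decode them into weight vectors

def init_linear_from_vae(layer, vae):
    # Treat each output neuron's fan-in vector as one "weight group".
    with torch.no_grad():
        w = vae.sample_weights(layer.out_features)  # (out_features, in_features)
        layer.weight.copy_(w.view_as(layer.weight))
        layer.bias.zero_()

# Usage: initialise one fully connected layer from a (pre-trained) WeightVAE.
vae = WeightVAE(weight_dim=784)
init_linear_from_vae(nn.Linear(784, 256), vae)
```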
Representation Learning with Fine-grained Patterns
With the development of computational power and techniques for data
collection, deep learning demonstrates superior performance over most existing
algorithms on benchmark data sets. Many efforts have been devoted to
studying the mechanism of deep learning. One important observation is that deep
learning can learn the discriminative patterns from raw materials directly in a
task-dependent manner. Therefore, the representations obtained by deep learning
outperform hand-crafted features significantly. However, those patterns are
often learned from super-class labels due to a limited availability of
fine-grained labels, while fine-grained patterns are desired in many real-world
applications such as visual search in online shopping. To mitigate the
challenge, we propose an algorithm to learn the fine-grained patterns
sufficiently when only super-class labels are available. The effectiveness of
our method is guaranteed by our theoretical analysis. Extensive
experiments on real-world data sets demonstrate that the proposed method can
significantly improve the performance on target tasks corresponding to
fine-grained classes, when only super-class information is available for
training.
Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection
Self-supervised speech models are a rapidly developing research topic in fake
audio detection. Many pre-trained models can serve as feature extractors,
learning richer and higher-level speech features. However, when fine-tuning
pre-trained models, there is often a challenge of excessively long training
times and high memory consumption, and complete fine-tuning is also very
expensive. To alleviate this problem, we apply low-rank adaptation (LoRA) to the
wav2vec2 model, freezing the pre-trained model weights and injecting a
trainable rank-decomposition matrix into each layer of the transformer
architecture, greatly reducing the number of trainable parameters for
downstream tasks. Compared with fine-tuning with Adam on the wav2vec2 model
containing 317M trainable parameters, LoRA achieved similar performance while
reducing the number of trainable parameters by a factor of 198.
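For illustration, a minimal LoRA layer can be sketched as a frozen pre-trained projection plus a trainable rank-r update, which is the mechanism the abstract describes for the transformer layers of wav2vec2. The rank, scaling, and module sizes below are placeholder values, not the paper's configuration.

```python
# Minimal LoRA sketch: the pre-trained projection is frozen and a trainable
# low-rank update B @ A is added in parallel, as in the transformer attention
# layers of wav2vec2. Rank, scaling, and dimensions are illustrative only.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base, r=8, alpha=16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # freeze pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen path plus the low-rank trainable update.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Example: wrap one attention projection and count trainable parameters.
lora_proj = LoRALinear(nn.Linear(768, 768))
print(sum(p.numel() for p in lora_proj.parameters() if p.requires_grad))
```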
A fast multi-object tracking system using an object detector ensemble
Multiple-Object Tracking (MOT) is of crucial importance for applications such
as retail video analytics and video surveillance. Object detectors are often
the computational bottleneck of modern MOT systems, limiting their use for
real-time applications. In this paper, we address this issue by leveraging
an ensemble of detectors, each running every f frames. We measured the
performance of our system in the MOT16 benchmark. The proposed model surpassed
other online entries of the MOT16 challenge in speed, while maintaining an
acceptable accuracy. Published in the 2019 IEEE Colombian Conference on
Applications in Computational Intelligence (ColCACI).
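A minimal sketch of the scheduling idea follows: each detector in the ensemble is invoked only on every f-th frame, staggered round-robin, while a tracker propagates tracks between detections. The detector and tracker objects are placeholders; the paper's association logic is not reproduced here.

```python
# Sketch of the frame-scheduling idea: each detector in the ensemble runs
# only on every f-th frame (staggered round-robin), so the per-frame
# detection cost stays low, while the tracker propagates tracks in between.
# `detectors` and `tracker` are placeholder objects for illustration.
def run_tracking(frames, detectors, tracker, f):
    for t, frame in enumerate(frames):
        slot = t % f
        # Only one detector fires on this frame; each individual detector
        # therefore runs once every f frames.
        detections = detectors[slot].detect(frame) if slot < len(detectors) else []
        tracker.update(frame, detections)   # associate detections with tracks
        yield tracker.current_tracks()
```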
Visual In-Context Learning for Few-Shot Eczema Segmentation
Automated diagnosis of eczema from digital camera images is crucial for
developing applications that allow patients to self-monitor their recovery. An
important component of this is the segmentation of eczema region from such
images. Current methods for eczema segmentation rely on deep neural networks
such as convolutional (CNN)-based U-Net or transformer-based Swin U-Net. While
effective, these methods require a high volume of annotated data, which can be
difficult to obtain. Here, we investigate the capabilities of visual in-context
learning that can perform few-shot eczema segmentation with just a handful of
examples and without any need for retraining models. Specifically, we propose a
strategy for applying in-context learning for eczema segmentation with a
generalist vision model called SegGPT. When benchmarked on a dataset of
annotated eczema images, we show that SegGPT with just 2 representative example
images from the training dataset performs better (mIoU: 36.69) than a CNN U-Net
trained on 428 images (mIoU: 32.60). We also discover that using a larger
number of examples for SegGPT may in fact be harmful to its performance. Our
results highlight the importance of visual in-context learning in developing
faster and better solutions to skin imaging tasks. They also pave the way for
developing inclusive solutions that cater to demographic minorities who are
typically heavily under-represented in training data.
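To make the in-context protocol concrete, the sketch below shows the usage pattern described above: a couple of annotated (image, mask) example pairs serve as the prompt for a generalist segmentation model, with no retraining, and predictions are scored with a binary mIoU as in the abstract. The model.segment interface is a hypothetical stand-in, not the actual SegGPT API.

```python
# Sketch of the few-shot in-context protocol: a handful of annotated
# (image, mask) pairs act as the prompt, the query image is segmented with
# no retraining, and the result is scored with a binary mIoU.
# `model.segment` is a hypothetical interface, not the real SegGPT API.
import numpy as np

def few_shot_segment(model, prompt_pairs, query_image):
    # prompt_pairs: e.g. two representative (image, mask) tuples.
    prompt_images = [img for img, _ in prompt_pairs]
    prompt_masks = [msk for _, msk in prompt_pairs]
    return model.segment(query_image, prompt_images, prompt_masks)

def mean_iou(pred_mask, gt_mask):
    # Mean IoU over the background (0) and eczema (1) classes.
    ious = []
    for cls in (0, 1):
        inter = np.logical_and(pred_mask == cls, gt_mask == cls).sum()
        union = np.logical_or(pred_mask == cls, gt_mask == cls).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```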