Improving novelty detection using the reconstructions of nearest neighbours
We show that using nearest neighbours in the latent space of autoencoders
(AE) significantly improves performance of semi-supervised novelty detection in
both single and multi-class contexts. Autoencoding methods detect novelty by
learning to differentiate between the non-novel training class(es) and all
other unseen classes. Our method harnesses a combination of the reconstructions
of the nearest neighbours and the latent-neighbour distances of a given input's
latent representation. We demonstrate that our nearest-latent-neighbours (NLN)
algorithm is memory- and time-efficient, does not require significant data
augmentation, and does not rely on pre-trained networks. Furthermore, we show that
the NLN-algorithm is easily applicable to multiple datasets without
modification. Additionally, the proposed algorithm is agnostic to autoencoder
architecture and reconstruction error method. We validate our method across
several standard datasets for a variety of different autoencoding architectures
such as vanilla, adversarial and variational autoencoders using either
reconstruction, residual or feature-consistent losses. The results show that
the NLN algorithm grants up to a 17% increase in Area Under the Receiver
Operating Characteristic (AUROC) curve performance for the multi-class case
and 8% for single-class novelty detection.
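The scoring rule described above, which combines the reconstructions of an input's nearest latent neighbours with the latent-neighbour distances, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the `alpha` weighting and the helper names are assumptions:

```python
import numpy as np

def nln_score(x, encoder, decoder, train_latents, k=5, alpha=0.5):
    """Sketch of an NLN-style novelty score (illustrative, not the paper's code).

    Combines (a) the error between the input and reconstructions decoded from
    its k nearest training latents and (b) the latent distances to those
    neighbours. `alpha` (an assumed hyperparameter) balances the two terms.
    """
    z = encoder(x)                                   # latent representation of the input
    d = np.linalg.norm(train_latents - z, axis=1)    # distances to all training latents
    idx = np.argsort(d)[:k]                          # indices of the k nearest neighbours
    recons = np.stack([decoder(train_latents[i]) for i in idx])
    recon_err = np.mean((recons - x) ** 2)           # reconstruction term
    latent_term = np.mean(d[idx])                    # latent-distance term
    return alpha * recon_err + (1 - alpha) * latent_term
```

Inputs far from the training manifold both reconstruct poorly from their neighbours and sit far from them in latent space, so both terms push the score up.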
Learning to detect radio frequency interference in radio astronomy without seeing it
Radio Frequency Interference (RFI) corrupts astronomical measurements, thus
affecting the performance of radio telescopes. To address this problem,
supervised segmentation models have been proposed as candidate solutions to RFI
detection. However, the unavailability of large labelled datasets, due to the
prohibitive cost of annotation, makes these solutions impractical. To overcome
this, we invert the problem: we train models on only
uncontaminated emissions, thereby learning to discriminate RFI from all known
astronomical signals and system noise. We use Nearest-Latent-Neighbours (NLN),
an algorithm that utilises both the reconstructions and the latent distances to the
nearest-neighbours in the latent space of generative autoencoding models for
novelty detection. The uncontaminated regions are selected using weak-labels in
the form of RFI flags (generated by classical RFI flagging methods) available
from most radio astronomical data archives at no additional cost. We evaluate
performance on two independent datasets, one simulated from the HERA telescope
and another consisting of real observations from the LOFAR telescope. Additionally,
we provide a small expert-labelled LOFAR dataset (i.e., strong labels) for
evaluation of our and other methods. Performance is measured using AUROC, AUPRC
and the maximum F1-score for a fixed threshold. For the simulated data we
outperform the current state-of-the-art by approximately 1% in AUROC and 3% in
AUPRC for the HERA dataset. Furthermore, for the LOFAR dataset our algorithm
offers a 4% increase in both AUROC and AUPRC, at the cost of some degradation
in F1-score performance, without any manual labelling.
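The weak-label selection step can be sketched as follows: the output of a classical flagger serves as a mask, and only fully unflagged regions are kept for training. This is an illustrative sketch under assumed array shapes, not the paper's pipeline; the patch size and function name are hypothetical:

```python
import numpy as np

def select_clean_patches(spectrogram, flags, patch=8):
    """Keep only patches containing no flagged pixels (illustrative sketch).

    `spectrogram`: 2-D time-frequency magnitudes; `flags`: boolean RFI mask
    from a classical flagger (True = contaminated). Patches touching any
    flagged pixel are discarded, so a model trains on clean emission only.
    """
    patches = []
    H, W = spectrogram.shape
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            if not flags[i:i + patch, j:j + patch].any():
                patches.append(spectrogram[i:i + patch, j:j + patch])
    return np.stack(patches) if patches else np.empty((0, patch, patch))
```

Because such flags ship with most radio astronomical archives, this selection requires no additional annotation effort.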
Lightning: Scaling the GPU Programming Model Beyond a Single GPU
The GPU programming model is primarily aimed at the development of applications that run on one GPU. However, this limits the scalability of GPU code to the capabilities of a single GPU in terms of compute power and memory capacity. To scale GPU applications further, a great engineering effort is typically required: work and data must be divided over multiple GPUs by hand, possibly in multiple nodes, and data must be manually spilled from GPU memory to higher-level memories. We present Lightning: a framework that follows the common GPU programming paradigm but enables scaling to large problems with ease. Lightning supports multi-GPU execution of GPU kernels, even across multiple nodes, and seamlessly spills data to higher-level memories (main memory and disk). Existing CUDA kernels can easily be adapted for use in Lightning, with data access annotations on these kernels allowing Lightning to infer their data requirements and the dependencies between subsequent kernel launches. Lightning efficiently distributes the work/data across GPUs and maximizes efficiency by overlapping scheduling, data movement, and kernel execution when possible. We present the design and implementation of Lightning, as well as experimental results on up to 32 GPUs for eight benchmarks and one real-world application. Evaluation shows excellent performance and scalability, such as a speedup of 57.2x over the CPU using Lightning with 16 GPUs over 4 nodes and 80 GB of data, far beyond the memory capacity of one GPU.
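The manual work/data division that the abstract says Lightning automates can be illustrated with a toy stand-in. This is not Lightning's API; `run_partitioned` and the pure-Python "kernel" are hypothetical, standing in for per-GPU kernel launches:

```python
import numpy as np

def run_partitioned(kernel, data, n_gpus=4):
    """Toy stand-in for manual multi-GPU partitioning (NOT Lightning's API).

    The array is split into one chunk per GPU, the "kernel" runs on each
    chunk (in a real setting, each launch would target its own device),
    and the partial results are gathered back into one array.
    """
    chunks = np.array_split(data, n_gpus)   # divide the data over devices by hand
    results = [kernel(c) for c in chunks]   # each would be a per-GPU kernel launch
    return np.concatenate(results)          # gather the partial results
```

Lightning removes exactly this boilerplate: the annotated kernel's data requirements let the framework perform the split, the per-device launches, and the gather automatically.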
The distributed ASCI supercomputer project
The Distributed ASCI Supercomputer (DAS) is a homogeneous wide-area distributed system consisting of four cluster computers at different locations. DAS has been used for research on communication software, parallel languages and programming systems, schedulers, parallel applications, and distributed applications. The paper gives a preview of the most interesting research results obtained so far in the DAS project.