High-Performance Compression of Multibeam Echosounders Water Column Data
Over the last few decades, multibeam echosounders (MBES) have become the dominant technique for efficiently and accurately mapping the seafloor. They now make it possible to collect water column acoustic images along with the bathymetry, which opens a wealth of new possibilities in ocean exploration. However, water column imagery generates vast amounts of data, posing obvious logistic, economic, and technical challenges. Surprisingly, very few studies have addressed this problem by providing efficient lossless or lossy data compression solutions. Currently, the available options are lossless only, providing low compression ratios at low speeds. In this paper, we adapt a data compression algorithm, the Fully Adaptive Prediction Error Coder (FAPEC), which was created to offer outstanding performance under the strong requirements of space data transmission. We have added to this entropy coder a specific pre-processing stage tailored to the Kongsberg Maritime water column file formats. Here, we test it on data acquired with Kongsberg MBES models EM302, EM710, and EM2040. With this bespoke pre-processing, FAPEC provides good lossless compression ratios at high speeds, whereas lossy ratios shrink water column files to sizes even smaller than raw bathymetry files, still with good image quality. We show the advantages over other lossless compression solutions, both in terms of compression ratios and speed. We illustrate the quality of water column images after lossy FAPEC compression, as well as its resilience to datagram errors and its potential for automatic detection of water column targets. We also show its successful integration on ARM microprocessors (like those used by smartphones and by autonomous underwater vehicles), which provides a real-time solution for MBES water column data compression.
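The abstract does not detail FAPEC's pre-processing stage, but the general idea of a prediction-error transform ahead of an entropy coder can be sketched as follows. This is a minimal illustration under assumed simplifications: a previous-sample predictor and an empirical-entropy measure, not the Kongsberg-format-aware stage described in the paper.

```python
import numpy as np

def delta_preprocess(samples: np.ndarray) -> np.ndarray:
    """Replace each sample with its prediction error (previous-sample predictor).

    Hypothetical stand-in for a format-aware pre-processing stage: real
    water-column pre-processors exploit beam/range structure rather than
    a plain 1-D predictor.
    """
    residuals = np.empty_like(samples)
    residuals[0] = samples[0]
    residuals[1:] = samples[1:] - samples[:-1]
    return residuals

def empirical_entropy(values: np.ndarray) -> float:
    """Shannon entropy (bits/symbol) of the empirical value distribution."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

On smoothly varying acoustic amplitudes, the residuals concentrate near zero, so their entropy (and hence the size achievable by any entropy coder applied afterwards) drops well below that of the raw samples, while the transform remains exactly invertible via a cumulative sum.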
Parallel Construction of Wavelet Trees on Multicore Architectures
The wavelet tree has become a very useful data structure to efficiently
represent and query large volumes of data in many different domains, from
bioinformatics to geographic information systems. One problem with wavelet
trees is their construction time. In this paper, we introduce two algorithms
that reduce the time complexity of a wavelet tree's construction by taking
advantage of today's ubiquitous multicore machines.
Our first algorithm constructs all the levels of the wavelet tree in parallel in
time and bits of working space, where
is the size of the input sequence and is the size of the alphabet. Our
second algorithm constructs the wavelet tree in a domain-decomposition fashion,
using our first algorithm in each segment, reaching time and
bits of extra space, where is the
number of available cores. Both algorithms are practical and report good
speedup for large real datasets.
Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
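The key observation behind per-level parallel construction can be illustrated with a small sketch: a symbol's position in level l's bitmap depends only on its own code bits, so every level can be computed from the input alone, independently of the others. This is an illustrative toy (a stable sort stands in for the linear-time counting pass a real implementation would use), not the paper's algorithm.

```python
import math

def wavelet_levels(seq, sigma):
    """Build the bitmap of every wavelet-tree level independently.

    For level l, a stable sort by the bits above l reproduces the
    left-to-right order of the tree nodes at that level; the emitted
    bit is bit l (from the top) of each symbol's code. Because no
    level depends on another, all of them could be built in parallel.
    """
    nlevels = max(1, math.ceil(math.log2(sigma)))
    levels = []
    for l in range(nlevels):
        # Stable sort by the high bits groups symbols into this level's nodes.
        order = sorted(range(len(seq)), key=lambda i: seq[i] >> (nlevels - l))
        levels.append([(seq[i] >> (nlevels - 1 - l)) & 1 for i in order])
    return levels
```

For the sequence [3, 1, 0, 2] over alphabet size 4, the root bitmap marks symbols >= 2 in input order, and the second level concatenates the low bits of the left subtree's symbols followed by the right subtree's.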
On the design of architecture-aware algorithms for emerging applications
This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics. For example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from image processing, complex network analysis, and computational biology.
We map these problems to diverse multicore processors and manycore accelerators. We also use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in these problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also identify several limitations of current system software and architectures, and directions for improving them. The discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced via collaborative efforts among researchers and practitioners from different domains. This dissertation contributes to these efforts by providing benchmarks and suggestions to improve system software and architectures.
Ph.D.
Committee Chair: Bader, David; Committee Member: Hong, Bo; Committee Member: Riley, George; Committee Member: Vuduc, Richard; Committee Member: Wills, Scot
ReCoDe: A Data Reduction and Compression Description for High Throughput Time-Resolved Electron Microscopy
Fast, direct electron detectors have significantly improved the
spatio-temporal resolution of electron microscopy movies. Preserving both
spatial and temporal resolution in extended observations, however, requires
storing prohibitively large amounts of data. Here, we describe an efficient and
flexible data reduction and compression scheme (ReCoDe) that retains both
spatial and temporal resolution by preserving individual electron events.
Running ReCoDe on a workstation we demonstrate on-the-fly reduction and
compression of raw data streaming off a detector at 3 GB/s, for hours of
uninterrupted data collection. The output was 100-fold smaller than the raw
data and saved directly onto network-attached storage drives over a 10 GbE
connection. We discuss calibration techniques that support electron detection
and counting (e.g. estimate electron backscattering rates, false positive
rates, and data compressibility), and novel data analysis methods enabled by
ReCoDe (e.g. recalibration of data post acquisition, and accurate estimation of
coincidence loss).
Comment: 53 pages, 20 figures
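The core reduction step that preserves individual electron events can be sketched in a few lines: keep only the sparse above-threshold pixels and discard the empty background. This is a simplified illustration, assuming a single global threshold; the actual ReCoDe scheme offers several reduction levels and per-pixel calibrated thresholds, followed by compression.

```python
import numpy as np

def reduce_frame(frame: np.ndarray, threshold: float):
    """Keep only pixels above threshold as sparse (row, col, value) events.

    Minimal sketch of event-based data reduction: electron hits are
    sparse, so storing their coordinates and values is far smaller
    than storing the dense frame.
    """
    rows, cols = np.nonzero(frame > threshold)
    return rows, cols, frame[rows, cols]

def restore_frame(shape, rows, cols, values):
    """Rebuild a dense frame, with sub-threshold pixels zeroed."""
    out = np.zeros(shape, dtype=values.dtype)
    out[rows, cols] = values
    return out
```

The reduction is lossy only below the threshold: every retained event round-trips exactly, which is what lets spatial and temporal resolution survive the 100-fold size reduction reported above.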
Approachable Error Bounded Lossy Compression
Compression is commonly used in HPC applications to move and store data. Traditional lossless compression, however, does not provide adequate compression of the floating point data often found in scientific codes. Recently, researchers and scientists have turned to lossy compression techniques that approximate the original data rather than reproduce it, in order to achieve the desired levels of compression. Typical lossy compressors do not bound the errors introduced into the data, which has led to the development of error bounded lossy compressors (EBLC). These tools provide the desired levels of compression with mathematical guarantees on the errors introduced. However, the current state of EBLC leaves much to be desired. Existing EBLC all have different interfaces, requiring codes to be changed to adopt new techniques; EBLC have many more configuration options than their predecessors, making them more difficult to use; and EBLC typically bound quantities like pointwise errors rather than higher level metrics, such as spectra, p-values, or test statistics, that scientists typically use. My dissertation aims to provide a uniform interface to compression and to develop tools that allow application scientists to understand and apply EBLC. This dissertation proposal presents three groups of work: LibPressio, a standard interface for compression and analysis; FRaZ/LibPressio-Opt, frameworks for the automated configuration of compressors using LibPressio; and tools for analyzing errors in particular domains.
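The pointwise guarantee an EBLC makes can be demonstrated with a minimal sketch of error-bounded uniform quantization. This shows only the bound itself, under assumed simplifications; real compressors such as those behind LibPressio combine prediction, quantization, and entropy coding.

```python
import numpy as np

def quantize_abs(data: np.ndarray, eb: float) -> np.ndarray:
    """Uniform scalar quantization honoring a pointwise absolute error bound.

    Each value is snapped to the nearest multiple of 2*eb, so the
    reconstruction error is at most eb for every point. The integer
    codes are what a real EBLC would then entropy code.
    """
    return np.round(data / (2.0 * eb)).astype(np.int64)

def dequantize_abs(q: np.ndarray, eb: float) -> np.ndarray:
    """Reconstruct values; each differs from the original by at most eb."""
    return q * (2.0 * eb)
```

This is exactly the kind of guarantee ("no reconstructed value deviates by more than eb") that pointwise-bounded EBLC provide, and which the dissertation's analysis tools aim to lift to higher-level statistics.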
Improving Middleware Performance with AdOC: an Adaptive Online Compression Library for Data Transfer
http://csdl2.computer.org/
In this article, we present the AdOC (Adaptive Online Compression) library. It is a user-level set of functions that enables data transmission with compression. Compression is performed dynamically during the transmission, and the compression level is constantly adapted to the environment. To ease the integration of AdOC into existing software, the API is very close to the read and write UNIX system calls and respects their semantics. Moreover, the library is thread-safe and has been ported to many UNIX-like systems. We have tested AdOC under various conditions and with various data types. Results show that the library outperforms the POSIX read/write system calls on a broad range of networks (up to 100 Mbit LAN), whereas on Gbit Ethernet it provides similar performance.
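The adaptation loop at the heart of such a library can be sketched as follows. This is an illustration of the idea, not AdOC's actual API: the function name, the callable `send`, and the level-adjustment heuristic are assumptions for the sketch.

```python
import time
import zlib

def send_adaptive(chunks, send, level=6):
    """Compress and transmit chunks, adapting the zlib level on the fly.

    Sketch of adaptive online compression: if compressing a chunk took
    longer than transmitting it, the CPU is the bottleneck, so compress
    less; if transmission dominates, the network is the bottleneck, so
    compress more. `send` is a caller-supplied callable that transmits
    a payload and returns the time it spent doing so.
    """
    for chunk in chunks:
        t0 = time.perf_counter()
        payload = zlib.compress(chunk, level) if level > 0 else chunk
        compress_time = time.perf_counter() - t0
        send_time = send(payload)
        if compress_time > send_time and level > 0:
            level -= 1  # CPU-bound: back off toward no compression
        elif compress_time < send_time / 2 and level < 9:
            level += 1  # network-bound: spend more CPU to shrink payloads
    return level
```

On a fast link the level drifts toward 0 (raw writes, matching the Gbit Ethernet result above), while on a slow link it drifts toward 9, which is the behavior the abstract describes.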
Interactive Visualization of the Largest Radioastronomy Cubes
3D visualization is an important data analysis and knowledge discovery tool,
however, interactive visualization of large 3D astronomical datasets poses a
challenge for many existing data visualization packages. We present a solution
to interactively visualize larger-than-memory 3D astronomical data cubes by
utilizing a heterogeneous cluster of CPUs and GPUs. The system partitions the
data volume into smaller sub-volumes that are distributed over the rendering
workstations. A GPU-based ray casting volume rendering is performed to generate
images for each sub-volume, which are composited to generate the whole volume
output, and returned to the user. Datasets including the HI Parkes All Sky
Survey (HIPASS - 12 GB) southern sky and the Galactic All Sky Survey (GASS - 26
GB) data cubes were used to demonstrate our framework's performance. The
framework can render the GASS data cube with a maximum render time < 0.3 second
with 1024 x 1024 pixels output resolution using 3 rendering workstations and 8
GPUs. Our framework will scale to visualize larger datasets, even of Terabyte
order, if proper hardware infrastructure is available.
Comment: 15 pages, 12 figures, Accepted New Astronomy July 201
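The domain-decomposition step that makes larger-than-memory rendering possible can be sketched as a partition of the cube's index space. This is a schematic illustration (the function and its even-split policy are assumptions); the actual framework distributes the sub-volumes to GPU nodes for ray casting and then composites the partial images.

```python
import numpy as np

def partition_volume(shape, splits):
    """Slice a 3-D volume's index space into sub-volume slices.

    `splits = (sx, sy, sz)` gives the number of sub-volumes per axis;
    each returned tuple of slices selects one sub-volume, which a
    rendering node could ray-cast independently before compositing.
    The last block on each axis absorbs any remainder.
    """
    blocks = []
    for ax_ranges in np.ndindex(*splits):
        slc = tuple(
            slice(i * (n // s), (i + 1) * (n // s) if i < s - 1 else n)
            for i, n, s in zip(ax_ranges, shape, splits)
        )
        blocks.append(slc)
    return blocks
```

Because the blocks tile the cube exactly, each node only ever holds its own sub-volume in memory, which is what lets a 26 GB cube like GASS be rendered on a cluster of modest workstations.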
Techniques of design optimisation for algorithms implemented in software
The overarching objective of this thesis was to develop tools for parallelising, optimising,
and implementing algorithms on parallel architectures, in particular General Purpose
Graphics Processors (GPGPUs). Two projects were chosen from different application areas
in which GPGPUs are used: a defence application involving image compression, and a
modelling application in bioinformatics (computational immunology). Each project had its
own specific objectives, as well as supporting the overall research goal.
The defence / image compression project was carried out in collaboration with the Jet
Propulsion Laboratory. The specific questions were: to what extent an algorithm designed
for bit-serial hardware implementation, for the lossless compression of hyperspectral
images on board unmanned aerial vehicles (UAVs), could be parallelised; whether GPGPUs
could be used to implement that algorithm; and whether a software implementation, with or
without GPGPU acceleration, could match the throughput of a dedicated hardware (FPGA) implementation.
The dependencies within the algorithm were analysed, and the algorithm parallelised. The
algorithm was implemented in software for GPGPU, and optimised. During the optimisation
process, profiling revealed less than optimal device utilisation, but no further optimisations
resulted in an improvement in speed. The design had hit a local-maximum of performance.
Analysis of the arithmetic intensity and data-flow exposed flaws in the standard optimisation
metric of kernel occupancy used for GPU optimisation. Redesigning the implementation
with revised criteria (fused kernels, lower occupancy, and greater data locality) led to a new
implementation with 10x higher throughput. GPGPUs were shown to be viable for on-board
implementation of the CCSDS lossless hyperspectral image compression algorithm,
exceeding the performance of the hardware reference implementation, and providing
sufficient throughput for the next generation of image sensor as well.
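The arithmetic-intensity analysis described above follows the standard roofline reasoning, which a short sketch makes concrete. The numbers in the usage below are illustrative, not measurements from the thesis.

```python
def attainable_gflops(flops, bytes_moved, peak_gflops, bw_gbs):
    """Roofline estimate: performance is capped by compute or by bandwidth.

    A kernel's arithmetic intensity (flops per byte moved) determines
    which cap binds. For a bandwidth-bound kernel, raising occupancy
    cannot help; fusing kernels to raise intensity (fewer bytes moved
    per flop) can, which is the redesign the thesis describes.
    """
    intensity = flops / bytes_moved              # flop / byte
    return min(peak_gflops, intensity * bw_gbs)  # GB/s * flop/B = Gflop/s
```

For example, a kernel doing 1 Gflop over 4 GB of traffic on a 200 GB/s device is capped at 50 Gflop/s regardless of occupancy, whereas fusing stages so the same work moves a quarter of the bytes quadruples the ceiling; this is why occupancy alone proved a misleading optimisation metric.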
The second project was carried out in collaboration with biologists at the University of
Arizona and involved modelling a complex biological system – VDJ recombination involved
in the formation of T-cell receptors (TCRs). Generation of immune receptors (T cell receptor
and antibodies) by VDJ recombination is an enormously complex process, which can
theoretically synthesize greater than 10^18 variants. Originally thought to be a random
process, the underlying mechanisms clearly have a non-random nature that preferentially
creates a small subset of immune receptors in many individuals. Understanding this bias is a
longstanding problem in the field of immunology. Modelling the process of VDJ
recombination to determine the number of ways each immune receptor can be synthesized,
previously thought to be untenable, is a key first step in determining how this special
population is made. The computational tools developed in this thesis have allowed
immunologists for the first time to comprehensively test and invalidate a longstanding theory
(convergent recombination) for how this special population is created, while generating the
data needed to develop novel hypotheses.
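The counting problem at the heart of this work ("in how many ways can a given receptor be synthesized?") can be illustrated with a deliberately simplified sketch. Real VDJ models also account for nucleotide trimming rules and random N-region insertions, and the segment lists here are hypothetical examples rather than germline data.

```python
def count_recombinations(target, v_segments, d_segments, j_segments):
    """Count the ways `target` arises as V-prefix + D-segment + J-suffix.

    Simplified model: a recombination is a split target = v + d + j
    where v is a prefix of some V segment (after 3' trimming), d is a
    whole D segment, and j is a suffix of some J segment (after 5'
    trimming). Receptors with many decompositions are 'convergent'.
    """
    ways = 0
    n = len(target)
    for i in range(n + 1):            # end of the V-derived prefix
        for j in range(i, n + 1):     # end of the D-derived middle
            v, d, suffix = target[:i], target[i:j], target[j:]
            ways += (
                sum(s.startswith(v) for s in v_segments)       # V choices
                * d_segments.count(d)                          # D choices
                * sum(s.endswith(suffix) for s in j_segments)  # J choices
            )
    return ways
```

Exhaustively enumerating such decompositions for every observed receptor is exactly the computation that was "previously thought to be untenable" at scale, and that the thesis's tools make tractable.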