Random Access to Grammar Compressed Strings
Grammar based compression, where one replaces a long string by a small
context-free grammar that generates the string, is a simple and powerful
paradigm that captures many popular compression schemes. In this paper, we
present a novel grammar representation that allows efficient random access to
any character or substring without decompressing the string.
Let $S$ be a string of length $N$ compressed into a context-free grammar
$\mathcal{S}$ of size $n$. We present two representations of $\mathcal{S}$
achieving $O(\log N)$ random access time, and either $O(n \cdot \alpha_k(n))$
construction time and space on the pointer machine model, or $O(n)$
construction time and space on the RAM. Here, $\alpha_k(n)$ is the inverse of
the $k^{th}$ row of Ackermann's function. Our representations also efficiently
support decompression of any substring in $S$: we can decompress any substring
of length $m$ in the same complexity as a single random access query and
additional $O(m)$ time. Combining these results with fast algorithms for
uncompressed approximate string matching leads to several efficient algorithms
for approximate string matching on grammar-compressed strings without
decompression. For instance, we can find all approximate occurrences of a
pattern $P$ with at most $k$ errors in time
$O(n(\min\{|P|k, k^4 + |P|\} + \log N) + occ)$, where $occ$ is the number of
occurrences of $P$ in $S$. Finally, we
generalize our results to navigation and other operations on grammar-compressed
ordered trees.
All of the above bounds significantly improve the currently best known
results. To achieve these bounds, we introduce several new techniques and data
structures of independent interest, including a predecessor data structure, two
"biased" weighted ancestor data structures, and a compact representation of
heavy paths in grammars.
Comment: Preliminary version in SODA 201
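To make the random access problem concrete, the following is a minimal sketch that assumes the grammar is given as a straight-line program (each rule is a terminal or a pair of earlier rules). The rule encoding and the function names are illustrative assumptions, not taken from the paper. The sketch stores the expansion length of every rule and walks down from the root, so a query costs time proportional to the height of the grammar; it does not implement the heavy-path and weighted-ancestor machinery the abstract describes for reaching logarithmic access time.

```python
# Sketch: random access in a straight-line program (SLP), i.e. a grammar
# that derives exactly one string.
# Hypothetical encoding: rules[i] is either a single character (terminal)
# or a pair (left, right) of indices of earlier rules.

def expansion_lengths(rules):
    """Length of the string derived by each rule, computed bottom-up."""
    lengths = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def access(rules, lengths, root, i):
    """Return character i (0-based) of the string derived from rule `root`."""
    if not 0 <= i < lengths[root]:
        raise IndexError(i)
    node = root
    while not isinstance(rules[node], str):
        left, right = rules[node]
        if i < lengths[left]:
            node = left            # position lies in the left expansion
        else:
            i -= lengths[left]     # skip the left expansion entirely
            node = right
    return rules[node]


def expand(rules, root):
    """Full decompression, used here only to check `access`."""
    out = []
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(rules[node], str):
            out.append(rules[node])
        else:
            left, right = rules[node]
            stack.append(right)
            stack.append(left)
    return "".join(out)


# Fibonacci-string grammar: F0 = "b", F1 = "a", Fk = F(k-1) F(k-2).
rules = ["b", "a"]
for k in range(2, 12):
    rules.append((k - 1, k - 2))
lengths = expansion_lengths(rules)
root = len(rules) - 1
```

Here 12 rules describe a string of 144 characters, and `access(rules, lengths, root, i)` reads any position without expanding the string.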
Web tools for large-scale 3D biological images and atlases
Background: Large-scale volumetric biomedical image data of three or more dimensions are a significant challenge for distributed browsing and visualisation. Many images now exceed 10 GB, which for most users is too large to handle in terms of computer RAM and network bandwidth. This is aggravated when users need to access tens or hundreds of such images from an archive. Here we solve the problem for 2D section views through archive data by delivering compressed tiled images, enabling users to browse through very large volume data in the context of a standard web browser. The system provides an interactive visualisation for grey-level and colour 3D images, including multiple image layers and spatial-data overlay.
Results: The standard Internet Imaging Protocol (IIP) has been extended to enable arbitrary 2D sectioning of 3D data as well as multi-layered images and indexed overlays. The extended protocol is termed IIP3D, and we have implemented a matching server to deliver the protocol and a series of Ajax/JavaScript client codes that will run in an Internet browser. We have tested the server software on a low-cost Linux-based server for image volumes up to 135 GB and 64 simultaneous users. The section views are delivered with response times independent of scale and orientation. The exemplar client provided multi-layer image views with user-controlled colour-filtering and overlays.
Conclusions: Interactive browsing of arbitrary sections through large biomedical-image volumes is made possible by use of an extended internet protocol and efficient server-based image tiling. The tools open the possibility of enabling fast access to large image archives without the requirement of whole image download and client computers with very large memory configurations. The system was demonstrated using a range of medical and biomedical image data extending up to 135 GB for a single image volume.
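The idea of serving arbitrary 2D sections as tiles can be illustrated with a small sketch. It is not the IIP3D protocol or the authors' server: the plane parametrisation (an origin plus two in-plane step vectors), nearest-neighbour sampling, and all function names are assumptions chosen for brevity.

```python
# Sketch: sample an arbitrary 2D section from a 3D voxel volume and cut it
# into tiles. Nearest-neighbour sampling; all names are illustrative.

def section(volume, origin, u, v, width, height, fill=0):
    """Sample a width x height image on the plane origin + c*u + r*v.

    volume is indexed volume[z][y][x]; origin, u, v are (x, y, z) tuples.
    Samples that fall outside the volume receive the `fill` value.
    """
    depth, rows, cols = len(volume), len(volume[0]), len(volume[0][0])
    image = []
    for r in range(height):
        line = []
        for c in range(width):
            x = round(origin[0] + c * u[0] + r * v[0])
            y = round(origin[1] + c * u[1] + r * v[1])
            z = round(origin[2] + c * u[2] + r * v[2])
            inside = 0 <= x < cols and 0 <= y < rows and 0 <= z < depth
            line.append(volume[z][y][x] if inside else fill)
        image.append(line)
    return image


def tiles(image, size):
    """Split an image into size x size tiles keyed by (tile_row, tile_col)."""
    out = {}
    for top in range(0, len(image), size):
        for left in range(0, len(image[0]), size):
            out[(top // size, left // size)] = [
                row[left:left + size] for row in image[top:top + size]
            ]
    return out


# Tiny synthetic volume whose voxel value encodes its coordinates.
volume = [[[100 * z + 10 * y + x for x in range(4)] for y in range(4)]
          for z in range(4)]

# Axis-aligned section at z = 2, and an oblique section where z grows with y.
axial = section(volume, (0, 0, 2), (1, 0, 0), (0, 1, 0), 4, 4)
diagonal = section(volume, (0, 0, 0), (1, 0, 0), (0, 1, 1), 4, 4)
```

A client that requests only the tiles covering its viewport never needs the full section, let alone the full volume, which is the property the abstract relies on.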
Learning curves of generic features maps for realistic datasets with a teacher-student model
Teacher-student models provide a framework in which the typical-case
performance of high-dimensional supervised learning can be described in closed
form. The assumptions of Gaussian i.i.d. input data underlying the canonical
teacher-student model may, however, be perceived as too restrictive to capture
the behaviour of realistic data sets. In this paper, we introduce a Gaussian
covariate generalisation of the model where the teacher and student can act on
different spaces, generated with fixed, but generic feature maps. While still
solvable in a closed form, this generalization is able to capture the learning
curves for a broad range of realistic data sets, thus redeeming the potential
of the teacher-student framework. Our contribution is then two-fold: First, we
prove a rigorous formula for the asymptotic training loss and generalisation
error. Second, we present a number of situations where the learning curve of
the model captures the one of a realistic data set learned with kernel
regression and classification, with out-of-the-box feature maps such as random
projections or scattering transforms, or with pre-learned ones - such as the
features learned by training multi-layer neural networks. We discuss both the
power and the limitations of the framework.
Comment: v3: NeurIPS camera-read
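For orientation, the canonical setting that the abstract calls too restrictive can be simulated in a few lines: i.i.d. Gaussian inputs, labels produced by a fixed teacher, and a student fitted on a growing number of samples. The sketch below is not the Gaussian covariate model of the paper. It assumes a noiseless linear teacher, a ridge-regression student acting on the same space as the teacher, and illustrative names and parameter values throughout.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_student(xs, ys, lam):
    """Ridge regression: solve (X^T X + lam I) w = X^T y."""
    d = len(xs[0])
    gram = [[sum(x[i] * x[j] for x in xs) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    return solve(gram, rhs)


def learning_curve(d, sample_sizes, lam=1e-3, seed=0):
    """Generalisation error of the student for each training-set size."""
    rng = random.Random(seed)
    teacher = [rng.gauss(0, 1) for _ in range(d)]
    curve = []
    for n in sample_sizes:
        xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
        ys = [sum(t * xi for t, xi in zip(teacher, x)) for x in xs]
        w = ridge_student(xs, ys, lam)
        # For isotropic Gaussian inputs the expected squared test error
        # equals the squared distance between student and teacher weights.
        curve.append(sum((a - b) ** 2 for a, b in zip(w, teacher)))
    return curve


curve = learning_curve(d=5, sample_sizes=[2, 5, 20, 200])
```

Plotting `curve` against the sample sizes gives a learning curve for this toy setting; the paper's contribution is to obtain such curves in closed form when teacher and student act on different, generic feature spaces.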
Approaching phase retrieval with deep learning
Phase retrieval is the process of reconstructing images from only magnitude measurements. The problem is particularly challenging as most of the information about the image is contained in the missing phase. An important phase retrieval problem is Fourier phase retrieval, where the magnitudes of the Fourier transform are given. This problem is relevant in many areas of science, e.g., in X-ray crystallography, astronomy, microscopy, array imaging, and optics. In addition to Fourier phase retrieval, we also take a closer look at two additional phase retrieval problems: Fourier phase retrieval with a reference image and compressive Gaussian phase retrieval.
Most methods for phase retrieval, e.g., the error-reduction algorithm or Fienup's hybrid input-output algorithm, are optimization-based algorithms which solely minimize an error function to reconstruct the image. These methods usually make strong assumptions about the measured magnitudes which do not always hold in practice. Thus, they only work reliably for easy instances of the phase retrieval problems but fail drastically for difficult instances.
With the recent advances in the development of graphics processing units (GPUs), deep neural networks (DNNs) have become fashionable again and have led to breakthroughs in many research areas. In this thesis, we show how DNNs can be applied to solve the more difficult instances of phase retrieval problems when training data is available. On the one hand, we show how supervised learning can be used to greatly improve the reconstruction quality when training images and their corresponding measurements are available. We analyze the ability of these methods to generalize to out-of-distribution data. On the other hand, we take a closer look at an existing unsupervised method that relies on generative models. Unsupervised methods are agnostic toward the measurement process, which is particularly useful for Gaussian phase retrieval. We apply this method to the Fourier phase retrieval problem and demonstrate how the reconstruction performance can be further improved with different initialization schemes. Furthermore, we demonstrate how optimizing intermediate representations of the underlying generative model can help overcome the limited range of the model and, thus, can help to reach better solutions. Finally, we show how backpropagation can be used to learn reference images using a modification of the well-established error-reduction algorithm and discuss whether learning a reference image is always efficient. As is common in machine learning research, we evaluate all methods on benchmark image datasets, as this allows for easy reproducibility of the experiments and comparability to related methods. To better understand how the methods work, we perform extensive ablation experiments and also analyze the influence of measurement noise and missing measurements.
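The error-reduction algorithm named above alternates between imposing the measured Fourier magnitudes and imposing known signal constraints. The following is a minimal 1D sketch of that classical scheme, not of the thesis's learned methods. It assumes a real, non-negative signal with known support, uses a naive unitary DFT to stay dependency-free, and all names and test values are illustrative.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform with unitary scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    scale = 1 / math.sqrt(n)
    return [scale * sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                        for t in range(n))
            for k in range(n)]


def magnitude_error(x, magnitudes):
    """Euclidean distance between |F x| and the measured magnitudes."""
    return math.sqrt(sum((abs(c) - m) ** 2
                         for c, m in zip(dft(x), magnitudes)))


def error_reduction(magnitudes, support, iterations=50, seed=0):
    rng = random.Random(seed)
    n = len(magnitudes)
    # Start from a random signal that already satisfies the constraints.
    x = [rng.random() if t < support else 0.0 for t in range(n)]
    errors = [magnitude_error(x, magnitudes)]
    for _ in range(iterations):
        spectrum = dft(x)
        # Fourier-domain step: keep the phase, impose measured magnitudes.
        fixed = [m * cmath.exp(1j * cmath.phase(c))
                 for c, m in zip(spectrum, magnitudes)]
        estimate = dft(fixed, inverse=True)
        # Signal-domain step: project onto real, non-negative signals
        # supported on the first `support` samples.
        x = [max(e.real, 0.0) if t < support else 0.0
             for t, e in enumerate(estimate)]
        errors.append(magnitude_error(x, magnitudes))
    return x, errors


# Synthetic measurement: magnitudes of the DFT of a known test signal.
truth = [0.0] * 16
truth[:6] = [1.0, 3.0, 0.5, 2.0, 0.0, 1.5]
measured = [abs(c) for c in dft(truth)]
reconstruction, errors = error_reduction(measured, support=6)
```

The magnitude error never increases from one iteration to the next, but the iteration may stagnate at a signal that differs from the true one, which is the kind of failure on difficult instances that motivates the learned approaches.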
Deep synthesis of cloud lighting
Current appearance models for the sky are able to represent clear sky illumination to a high degree of accuracy. However, these models all lack a common feature of real skies: clouds. These are an essential component for many applications which rely on realistic skies, such as image editing and synthesis. While clouds can be added to existing sky models through rendering, this is hard to achieve due to the difficulties of representing clouds and the complexities of volumetric light transport. In this work, an alternative approach to this problem is proposed whereby clouds are synthesized using a learned data-driven representation. This leverages a captured collection of High Dynamic Range cloudy sky imagery, and combines this dataset with clear sky models to produce plausible cloud appearance from a coarse representation of cloud positions. This representation is artist controllable, allowing for novel cloudscapes to be rapidly synthesized and used for lighting virtual environments.