518 research outputs found

    Random Access to Grammar Compressed Strings

    Full text link
    Grammar based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. In this paper, we present a novel grammar representation that allows efficient random access to any character or substring without decompressing the string. Let SS be a string of length NN compressed into a context-free grammar S\mathcal{S} of size nn. We present two representations of S\mathcal{S} achieving O(logN)O(\log N) random access time, and either O(nαk(n))O(n\cdot \alpha_k(n)) construction time and space on the pointer machine model, or O(n)O(n) construction time and space on the RAM. Here, αk(n)\alpha_k(n) is the inverse of the kthk^{th} row of Ackermann's function. Our representations also efficiently support decompression of any substring in SS: we can decompress any substring of length mm in the same complexity as a single random access query and additional O(m)O(m) time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern PP with at most kk errors in time O(n(min{Pk,k4+P}+logN)+occ)O(n(\min\{|P|k, k^4 + |P|\} + \log N) + occ), where occocc is the number of occurrences of PP in SS. Finally, we generalize our results to navigation and other operations on grammar-compressed ordered trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy paths in grammars.Comment: Preliminary version in SODA 201

    Web tools for large-scale 3D biological images and atlases

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Large-scale volumetric biomedical image data of three or more dimensions are a significant challenge for distributed browsing and visualisation. Many images now exceed 10GB which for most users is too large to handle in terms of computer RAM and network bandwidth. This is aggravated when users need to access tens or hundreds of such images from an archive. Here we solve the problem for 2D section views through archive data delivering compressed tiled images enabling users to browse through very-large volume data in the context of a standard web-browser. The system provides an interactive visualisation for grey-level and colour 3D images including multiple image layers and spatial-data overlay.</p> <p>Results</p> <p>The standard Internet Imaging Protocol (IIP) has been extended to enable arbitrary 2D sectioning of 3D data as well a multi-layered images and indexed overlays. The extended protocol is termed IIP3D and we have implemented a matching server to deliver the protocol and a series of Ajax/Javascript client codes that will run in an Internet browser. We have tested the server software on a low-cost linux-based server for image volumes up to 135GB and 64 simultaneous users. The section views are delivered with response times independent of scale and orientation. The exemplar client provided multi-layer image views with user-controlled colour-filtering and overlays.</p> <p>Conclusions</p> <p>Interactive browsing of arbitrary sections through large biomedical-image volumes is made possible by use of an extended internet protocol and efficient server-based image tiling. The tools open the possibility of enabling fast access to large image archives without the requirement of whole image download and client computers with very large memory configurations. The system was demonstrated using a range of medical and biomedical image data extending up to 135GB for a single image volume.</p

    Learning curves of generic features maps for realistic datasets with a teacher-student model

    Get PDF
    Teacher-student models provide a framework in which the typical-case performance of high-dimensional supervised learning can be described in closed form. The assumptions of Gaussian i.i.d. input data underlying the canonical teacher-student model may, however, be perceived as too restrictive to capture the behaviour of realistic data sets. In this paper, we introduce a Gaussian covariate generalisation of the model where the teacher and student can act on different spaces, generated with fixed, but generic feature maps. While still solvable in a closed form, this generalization is able to capture the learning curves for a broad range of realistic data sets, thus redeeming the potential of the teacher-student framework. Our contribution is then two-fold: First, we prove a rigorous formula for the asymptotic training loss and generalisation error. Second, we present a number of situations where the learning curve of the model captures the one of a realistic data set learned with kernel regression and classification, with out-of-the-box feature maps such as random projections or scattering transforms, or with pre-learned ones - such as the features learned by training multi-layer neural networks. We discuss both the power and the limitations of the framework.Comment: v3: NeurIPS camera-read

    Approaching phase retrieval with deep learning

    Get PDF
    Phase retrieval is the process of reconstructing images from only magnitude measurements. The problem is particularly challenging as most of the information about the image is contained in the missing phase. An important phase retrieval problem is Fourier phase retrieval, where the magnitudes of the Fourier transform are given. This problem is relevant in many areas of science, e.g., in X-ray crystallography, astronomy, microscopy, array imaging, and optics. In addition to Fourier phase retrieval, we also take a closer look at two additional phase retrieval problems: Fourier phase retrieval with a reference image and compressive Gaussian phase retrieval. Most methods for phase retrieval, e.g., the error-reduction algorithm or Fienup's hybrid-input output algorithms are optimization-based algorithms which solely minimize an error-function to reconstruct the image. These methods usually make strong assumptions about the measured magnitudes which do not always hold in practice. Thus, they only work reliably for easy instances of the phase retrieval problems but fail drastically for difficult instances. With the recent advances in the development of graphics processing units (GPUs), deep neural networks (DNNs) have become fashionable again and have led to breakthroughs in many research areas. In this thesis, we show how DNNs can be applied to solve the more difficult instances of phase retrieval problems when training data is available. On the one hand, we show how supervised learning can be used to greatly improve the reconstruction quality when training images and their corresponding measurements are available. We analyze the ability of these methods to generalize to out-of-distribution data. On the other hand, we take a closer look at an existing unsupervised method that relies on generative models. Unsupervised methods are agnostic toward the measurement process which is particularly useful for Gaussian phase retrieval. We apply this method to the Fourier phase retrieval problem and demonstrate how the reconstruction performance can be further improved with different initialization schemes. Furthermore, we demonstrate how optimizing intermediate representations of the underlying generative model can help overcoming the limited range of the model and, thus, can help to reach better solutions. Finally, we show how backpropagation can be used to learn reference images using a modification of the well-established error-reduction algorithm and discuss whether learning a reference image is always efficient. As it is common in machine learning research, we evaluate all methods on benchmark image datasets as it allows for easy reproducibility of the experiments and comparability to related methods. To better understand how the methods work, we perform extensive ablation experiments, and also analyze the influence of measurement noise and missing measurements

    Deep synthesis of cloud lighting

    Get PDF
    Current appearance models for the sky are able to represent clear sky illumination to a high degree of accuracy. However, these models all lack a common feature of real-skies: clouds. These are an essential component for many applications which rely on realistic skies, such as image editing and synthesis. While clouds can be added to existing sky models through rendering, this is hard to achieve due to the difficulties of representing clouds and the complexities of volumetric light transport. In this work, an alternative approach to this problem is proposed whereby clouds are synthesized using a learned data-driven representation. This leverages a captured collection of High Dynamic Range cloudy sky imagery, and combines this dataset with clear sky models to produce plausible cloud appearance from a coarse representation of cloud positions. This representation is artist controllable, allowing for novel cloudscapes to be rapidly synthesized and used for lighting virtual environments

    Deep Synthesis of Cloud Lighting

    Get PDF
    Current appearance models for the sky are able to represent clear sky illumination to a high degree of accuracy. However, these models all lack a common feature of real-skies: clouds. These are an essential component for many applications which rely on realistic skies, such as image editing and synthesis. While clouds can be added to existing sky models through rendering, this is hard to achieve due to the difficulties of representing clouds and the complexities of volumetric light transport. In this work, an alternative approach to this problem is proposed whereby clouds are synthesized using a learned data-driven representation. This leverages a captured collection of High Dynamic Range cloudy sky imagery, and combines this dataset with clear sky models to produce plausible cloud appearance from a coarse representation of cloud positions. This representation is artist controllable, allowing for novel cloud scapes to be rapidly synthesized, and used for lighting virtual environments
    corecore