
    Deferred Maintenance of Disk-Based Random Samples

    Random sampling is a well-known technique for approximate processing of large datasets. We introduce a set of algorithms for incremental maintenance of large random samples on secondary storage. We show that the sample maintenance cost can be reduced by refreshing the sample in a deferred manner. We introduce a novel type of log file which follows the intuition that only a “sample” of the operations on the base data has to be considered to maintain a random sample in a statistically correct way. Additionally, we develop a deferred refresh algorithm which updates the sample by using fast sequential disk access only, and which does not require any main memory. We conducted an extensive set of experiments and found that our algorithms reduce maintenance cost by several orders of magnitude.
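    The abstract describes the deferred-refresh idea only at a high level; the toy Python sketch below is our own illustration of the general principle, under the assumption of an insert-only stream: insertions that would enter a size-k reservoir sample are recorded in a small log, and the materialized sample is later brought up to date in one sequential pass. The class and method names are ours, and the sketch deliberately omits the paper's disk layout and deletion handling.

```python
import random

class DeferredSample:
    """Toy sketch of deferred sample maintenance (not the paper's exact
    algorithm): insertions that would enter a size-k reservoir sample are
    appended to a small log, and the materialized sample is refreshed
    later by replaying the log in a single sequential pass."""

    def __init__(self, k: int):
        self.k = k            # target sample size
        self.n = 0            # base-data insertions seen so far
        self.log = []         # deferred operations: (slot or None, item)
        self.sample = []      # the materialized sample (on disk in practice)

    def insert(self, item) -> None:
        self.n += 1
        if len(self.sample) + len(self.log) < self.k:
            self.log.append((None, item))            # sample is still filling up
        elif random.random() < self.k / self.n:      # reservoir acceptance test
            slot = random.randrange(self.k)          # slot to overwrite at refresh
            self.log.append((slot, item))
        # rejected items never touch the log or the sample

    def refresh(self) -> None:
        """Apply all logged operations in one sequential pass."""
        for slot, item in self.log:
            if slot is None:
                self.sample.append(item)
            else:
                self.sample[slot] = item
        self.log.clear()
```

    Only accepted items ever reach the log, which mirrors the intuition that a "sample" of the operations suffices to keep the sample statistically correct.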

    Structural characterization and statistical-mechanical model of epidermal patterns

    In proliferating epithelia of mammalian skin, cells of irregular polygonal-like shapes pack into complex, nearly flat two-dimensional structures that are pliable to deformations. In this work, we employ various sensitive correlation functions to quantitatively characterize structural features of evolving packings of epithelial cells across length scales in mouse skin. We find that the pair statistics in direct and Fourier spaces of the cell centroids in the early stages of embryonic development show structural directional dependence, while in the late stages the patterns tend towards statistically isotropic states. We construct a minimalist four-component statistical-mechanical model involving effective isotropic pair interactions consisting of hard-core repulsion and extra short-ranged soft-core repulsion beyond the hard core, whose length scale is roughly the same as the hard core. The model parameters are optimized to match the sample pair statistics in both direct and Fourier spaces. In this way, the parameters are biologically constrained. Our model predicts essentially the same polygonal shape distribution and size disparity of cells found in experiments as measured by Voronoi statistics. Moreover, our simulated equilibrium liquid-like configurations are able to match other nontrivial unconstrained statistics, which is a testament to the power and novelty of the model. We discuss ways in which our model might be extended so as to better understand morphogenesis (in particular the emergence of planar cell polarity), wound-healing, and disease progression processes in skin, and how it could be applied to the design of synthetic tissues.
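    The abstract names the ingredients of the effective pair interaction (hard-core repulsion plus a short-ranged soft-core repulsion whose range is roughly the hard-core diameter) but not its functional form. The linear ramp below is only a schematic stand-in consistent with that description; the notation (σ for the hard-core diameter, ε for the soft-core strength, δ ≈ σ for the soft-core width) and the specific ramp shape are our assumptions.

```latex
u(r) =
\begin{cases}
  \infty, & r \le \sigma \quad \text{(hard core)}, \\
  \epsilon \left( 1 - \dfrac{r - \sigma}{\delta} \right), & \sigma < r \le \sigma + \delta \quad \text{(soft core, } \delta \approx \sigma \text{)}, \\
  0, & r > \sigma + \delta .
\end{cases}
```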

    Space Station Furnace Facility. Volume 2: Requirements definition and conceptual design study

    The Space Station Freedom Furnace (SSFF) Project is divided into two phases: phase 1, a definition study phase, and phase 2, a design and development phase. TBE was awarded a research study entitled 'Space Station Furnace Facility Requirements Definition and Conceptual Design Study' on June 2, 1989. This report addresses the definition study phase only; phase 2 is to be completed after phase 1. The contract encompassed a requirements definition study and culminated in hardware/facility conceptual designs and hardware demonstration development models to test these conceptual designs. The study was divided into two parts. Part 1 (the basic part of the effort) encompassed preliminary requirements definition and assessment; conceptual design of the SSFF Core; fabrication of mockups; and preparation for and support of a conceptual design review (CoDR). Part 2 (the optional part of the effort) included detailed definition of the engineering and design requirements, as derived from the science requirements; refinement of the conceptual design of the SSFF Core; fabrication and testing of the 'breadboards' or development models; and preparation for and support of a requirements definition review.

    VerdictDB: Universalizing Approximate Query Processing

    Despite 25 years of research in academia, approximate query processing (AQP) has had little industrial adoption. One of the major causes of this slow adoption is the reluctance of traditional vendors to make radical changes to their legacy codebases, and the preoccupation of newer vendors (e.g., SQL-on-Hadoop products) with implementing standard features. Additionally, the few AQP engines that are available are each tied to a specific platform and require users to completely abandon their existing databases, an unrealistic expectation given the infancy of AQP technology. Therefore, we argue that a universal solution is needed: a database-agnostic approximation engine that will widen the reach of this emerging technology across various platforms. Our proposal, called VerdictDB, uses a middleware architecture that requires no changes to the backend database and thus can work with all off-the-shelf engines. Operating at the driver level, VerdictDB intercepts analytical queries issued to the database and rewrites them into another query that, if executed by any standard relational engine, will yield sufficient information for computing an approximate answer. VerdictDB uses the returned result set to compute an approximate answer and error estimates, which are then passed on to the user or application. However, lack of access to the query execution layer introduces significant challenges in terms of generality, correctness, and efficiency. This paper shows how VerdictDB overcomes these challenges and delivers up to 171× speedup (18.45× on average) for a variety of existing engines, such as Impala, Spark SQL, and Amazon Redshift, while incurring less than 2.6% relative error. VerdictDB is open-sourced under the Apache License. Comment: Extended technical report of the paper that appeared in Proceedings of the 2018 International Conference on Management of Data, pp. 1461-1476. ACM, 2018.
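    As a concrete, heavily simplified illustration of the driver-level approach, the Python below scales a predicate count observed on a pre-built uniform sample up to the full table and attaches a normal-approximation error bound. It is a generic sample-based estimator whose names and formulas are our own; it does not reproduce VerdictDB's actual rewrite rules or error estimation.

```python
import math

def approximate_count(sample_matches: int, sample_size: int,
                      population_size: int, z: float = 1.96):
    """Generic sample-based COUNT estimator (illustration only, not
    VerdictDB's estimator): scale the fraction of matching sample rows
    up to the full table and report a normal-approximation margin."""
    p_hat = sample_matches / sample_size               # matching fraction in the sample
    estimate = p_hat * population_size                 # scaled-up count
    # standard error of the scaled count under simple random sampling
    se = population_size * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return estimate, z * se                            # estimate and +/- margin

# e.g., 1,230 of 100,000 sampled rows satisfy the predicate; the table has 50M rows
est, margin = approximate_count(1_230, 100_000, 50_000_000)
print(f"approximate count: {est:.0f} +/- {margin:.0f}")
```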

    Sampling Algorithms for Evolving Datasets

    Perhaps the most flexible synopsis of a database is a uniform random sample of the data; such samples are widely used to speed up the processing of analytic queries and data-mining tasks, to enhance query optimization, and to facilitate information integration. Most of the existing work on database sampling focuses on how to create or exploit a random sample of a static database, that is, a database that does not change over time. The assumption of a static database, however, severely limits the applicability of these techniques in practice, where data is often not static but continuously evolving. In order to maintain the statistical validity of the sample, any changes to the database have to be appropriately reflected in the sample.

    In this thesis, we study efficient methods for incrementally maintaining a uniform random sample of the items in a dataset in the presence of an arbitrary sequence of insertions, updates, and deletions. We consider instances of the maintenance problem that arise when sampling from an evolving set, from an evolving multiset, from the distinct items in an evolving multiset, or from a sliding window over a data stream. Our algorithms completely avoid any accesses to the base data and can be several orders of magnitude faster than algorithms that do rely on such expensive accesses. The improved efficiency of our algorithms comes at virtually no cost: the resulting samples are provably uniform and only a small amount of auxiliary information is associated with the sample. We show that the auxiliary information not only facilitates efficient maintenance, but it can also be exploited to derive unbiased, low-variance estimators for counts, sums, averages, and the number of distinct items in the underlying dataset.

    In addition to sample maintenance, we discuss methods that greatly improve the flexibility of random sampling from a system's point of view. More specifically, we initiate the study of algorithms that resize a random sample upwards or downwards. Our resizing algorithms can be exploited to dynamically control the size of the sample when the dataset grows or shrinks; they facilitate resource management and help to avoid under- or oversized samples. Furthermore, in large-scale databases with data being distributed across several remote locations, it is usually infeasible to reconstruct the entire dataset for the purpose of sampling. To address this problem, we provide efficient algorithms that directly combine the local samples maintained at each location into a sample of the global dataset. We also consider a more general problem, where the global dataset is defined as an arbitrary set or multiset expression involving the local datasets, and provide efficient solutions based on hashing.
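    The distributed-sampling idea mentioned above can be made concrete with a small sketch: given uniform samples of two disjoint local datasets, a uniform sample of their union can be drawn without reconstructing the global data. The Python below is our own hedged illustration of that idea (the function name and the size check are ours; it is not the thesis' exact merge procedure), assuming s1 and s2 are lists holding uniform without-replacement samples of local datasets of sizes n1 and n2.

```python
import random

def merge_local_samples(s1, n1, s2, n2, k):
    """Illustrative merge of two local uniform samples into a size-k uniform
    sample of the (disjoint) union, without touching the base data."""
    # Decide how many of the k global picks come from the first dataset.
    # This is equivalent to drawing k items without replacement from the
    # n1 + n2 global items and counting how many land in the first dataset.
    from_d1 = sum(1 for idx in random.sample(range(n1 + n2), k) if idx < n1)
    if from_d1 > len(s1) or k - from_d1 > len(s2):
        raise ValueError("local samples are too small for the requested merge")
    # Uniform subsamples of uniform samples are themselves uniform samples.
    return random.sample(s1, from_d1) + random.sample(s2, k - from_d1)
```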

    Mobile graphics: SIGGRAPH Asia 2017 course

    Peer Reviewed. Postprint (published version).

    A dip in the reservoir: Maintaining sample synopses of evolving datasets

    Perhaps the most flexible synopsis of a database is a random sample of the data; such samples are widely used to speed up processing of analytic queries and data-mining tasks, enhance query optimization, and facilitate information integration. In this paper, we study methods for incrementally maintaining a uniform random sample of the items in a dataset in the presence of an arbitrary sequence of insertions and deletions. For “stable” datasets whose size remains roughly constant over time, we provide a novel sampling scheme, called “random pairing” (RP), which maintains a bounded-size uniform sample by using newly inserted data items to compensate for previous deletions. The RP algorithm is the first extension of the almost 40-year-old reservoir sampling algorithm to handle deletions. Experiments show that, when dataset-size fluctuations over time are not too extreme, RP is the algorithm of choice with respect to speed and sample-size stability. For “growing” datasets, we consider algorithms for periodically “resizing” a bounded-size random sample upwards. We prove that any such algorithm cannot avoid accessing the base data, and provide a novel resizing algorithm that minimizes the time needed to increase the sample size.
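    To make the pairing idea tangible, here is a compact Python sketch in the spirit of the RP scheme described above: deletions are remembered as "uncompensated", and each subsequent insertion is paired with one of them, entering the sample exactly when the paired deletion had removed a sample item. This is our own simplified reading of the abstract (the stable-dataset bookkeeping is abbreviated), not the authors' reference implementation.

```python
import random

class RandomPairingSample:
    """Simplified sketch of random pairing (RP): insertions compensate for
    earlier uncompensated deletions so that a bounded-size sample stays
    uniform without ever accessing the base data."""

    def __init__(self, bound: int):
        self.bound = bound
        self.sample = set()
        self.c_in = 0     # uncompensated deletions that had been in the sample
        self.c_out = 0    # uncompensated deletions that had not

    def delete(self, item) -> None:
        if item in self.sample:
            self.sample.remove(item)
            self.c_in += 1
        else:
            self.c_out += 1

    def insert(self, item) -> None:
        if self.c_in + self.c_out == 0:
            # nothing to compensate: fill up to the bound (a full treatment
            # would fall back to reservoir-style acceptance when already full)
            if len(self.sample) < self.bound:
                self.sample.add(item)
        elif random.random() < self.c_in / (self.c_in + self.c_out):
            self.sample.add(item)     # the paired deletion was a sample item
            self.c_in -= 1
        else:
            self.c_out -= 1           # the paired deletion was outside the sample
```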

    Forage Quality of Intensive Rotationally Grazed Pastures 1988-1990 Seneca Trail RC&D

    WVU-Extension fact sheet.