
    Improved Bounds and Schemes for the Declustering Problem

    The declustering problem is to allocate given data on parallel working storage devices in such a manner that typical requests find their data evenly distributed on the devices. Using deep results from discrepancy theory, we improve previous work of several authors concerning range queries to higher-dimensional data. We give a declustering scheme with an additive error of $O_d(\log^{d-1} M)$ independent of the data size, where $d$ is the dimension, $M$ the number of storage devices, and $d-1$ does not exceed the smallest prime power in the canonical decomposition of $M$ into prime powers. In particular, our schemes work for arbitrary $M$ in dimensions two and three. For general $d$, they work for all $M \geq d-1$ that are powers of two. Concerning lower bounds, we show that a recent proof of an $\Omega_d(\log^{\frac{d-1}{2}} M)$ bound contains an error. We close the gap in the proof and thus establish the bound. Comment: 19 pages, 1 figure
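
    The flavour of such constructions can be illustrated with a classical lattice-style scheme. The sketch below is a generic member of this family, not the specific scheme of the paper; the multiplier c, the grid coordinates and the device count M are illustrative assumptions.

```python
# Minimal sketch of a lattice-style declustering for a 2-D grid of tiles:
# tile (i, j) is stored on disk (i + c*j) mod M. Good multipliers c spread
# the tiles of any rectangular range query nearly evenly over the M devices.

def decluster(i: int, j: int, M: int, c: int) -> int:
    """Assign grid tile (i, j) to one of M storage devices."""
    return (i + c * j) % M

def query_load(i0, i1, j0, j1, M, c):
    """Count how many tiles of the range query [i0, i1] x [j0, j1] land on each device."""
    load = [0] * M
    for i in range(i0, i1 + 1):
        for j in range(j0, j1 + 1):
            load[decluster(i, j, M, c)] += 1
    return load  # response time ~ max(load); the ideal is ceil(#tiles / M)
```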

    Zonal Estimation and Interpolation as Simultaneous Approaches in the Case of a Small Number of Input Data (the Šandrovac Field Example, Northern Croatia)

    The Bjelovar Subdepression area in Northern Croatia was analysed, especially the Šandrovac Field located in the northern part of the subdepression. In this example, the depth of the e-log marker Z', i.e. the Pannonian-Pontian boundary, was used as input data. The data were statistically analysed for the entire subdepression using 497 readings taken from a regular grid with a cell size of 1x1 km laid over the existing palaeostructural map. Then, 18 well locations within the Šandrovac Field where the e-log marker is recognised were selected (an example of a small number of input data). Their depths were also read directly from the same structural map and mapped using one of the declustering methods, i.e. the Thiessen polygon method, and by kriging. It is concluded that when mapping involves a small number of data, and consequently significant local uncertainties, the subsurface mapping should be done in both ways and the resulting maps compared.
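
    As a rough illustration of the declustering idea referred to above, the sketch below computes cell-based declustering weights for scattered well locations; the grid-cell variant stands in for the Thiessen-polygon approach, and the coordinates, cell size and marker depths are placeholders rather than the Šandrovac data.

```python
import numpy as np

def cell_declustering_weights(x, y, cell_size):
    """Weight each data point inversely to how crowded its grid cell is."""
    ix = np.floor(np.asarray(x) / cell_size).astype(int)
    iy = np.floor(np.asarray(y) / cell_size).astype(int)
    cells = list(zip(ix, iy))
    counts = {c: cells.count(c) for c in set(cells)}
    w = np.array([1.0 / counts[c] for c in cells])
    return w / w.sum()                      # weights sum to 1

# Declustered mean of the e-log marker depth z at well locations (x, y):
# z_decl = np.sum(cell_declustering_weights(x, y, 1000.0) * z)
```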

    Scalability analysis of declustering methods for multidimensional range queries

    Efficient storage and retrieval of multiattribute data sets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multiattribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files across multiple disks to obtain high performance for disk accesses. Although the scalability of the declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. In this paper, we derive formulas describing the scalability of two popular declustering methods, Disk Modulo and Fieldwise Xor, for range queries, which are the most common type of queries. These formulas disclose the limited scalability of the declustering methods, and this is corroborated by extensive simulation experiments. From the practical point of view, the formulas given in this paper provide a simple measure that can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions.
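
    For reference, the two allocation rules analysed in the paper are easy to state. The sketch below is a straightforward reading of the standard Disk Modulo and Fieldwise XOR definitions; the bucket tuple and disk count are chosen only for illustration.

```python
from functools import reduce
from operator import xor

def disk_modulo(bucket, M):
    """Disk Modulo: disk = (i1 + i2 + ... + id) mod M."""
    return sum(bucket) % M

def fieldwise_xor(bucket, M):
    """Fieldwise XOR: disk = (i1 ^ i2 ^ ... ^ id) mod M (typically M is a power of two)."""
    return reduce(xor, bucket, 0) % M

# Example: the bucket indexed (2, 5, 7) in a 3-attribute Cartesian product file, M = 8 disks
print(disk_modulo((2, 5, 7), 8))    # 6
print(fieldwise_xor((2, 5, 7), 8))  # 0
```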

    A Survey on Array Storage, Query Languages, and Systems

    Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is a renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to array partitioning into chunks. The identification of a reduced set of array operators to form the foundation for an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey on array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete though. We greatly appreciate pointers towards any work we might have forgotten to mention. Comment: 44 pages
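
    A worked micro-example of the chunking idea that the storage part of the survey revolves around may help; the chunk shape and cell coordinates below are arbitrary assumptions, not tied to any particular system discussed in the survey.

```python
def chunk_of(coords, chunk_shape):
    """Index of the regular (aligned) chunk containing an array cell."""
    return tuple(c // s for c, s in zip(coords, chunk_shape))

def offset_in_chunk(coords, chunk_shape):
    """Position of the cell inside its chunk."""
    return tuple(c % s for c, s in zip(coords, chunk_shape))

# Cell (1037, 42) of a 2-D array stored in 1000 x 50 chunks:
print(chunk_of((1037, 42), (1000, 50)))        # -> (1, 0)
print(offset_in_chunk((1037, 42), (1000, 50))) # -> (37, 42)
```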

    Asymptotically optimal declustering schemes for 2-dim range queries

    Declustering techniques have been widely adopted in parallel storage systems (e.g. disk arrays) to speed up bulk retrieval of multidimensional data. A declustering scheme distributes data items among multiple disks, thus enabling parallel data access and reducing query response time. We measure the performance of any declustering scheme as its worst-case additive deviation from the ideal scheme. The goal thus is to design declustering schemes with as small an additive error as possible. We describe a number of declustering schemes with additive error $O(\log M)$ for 2-dimensional range queries, where $M$ is the number of disks. These are the first results giving an $O(\log M)$ upper bound for all values of $M$. Our second result is a lower bound on the additive error. It is known that, except for a few stringent cases, the additive error of any 2-dimensional declustering scheme is at least one. We strengthen this lower bound to $\Omega((\log M)^{\frac{d-1}{2}})$ for $d$-dimensional schemes and to $\Omega(\log M)$ for 2-dimensional schemes, thus proving that the 2-dimensional schemes described in this paper are (asymptotically) optimal. These results are obtained by establishing a connection to geometric discrepancy. We also present simulation results to evaluate the performance of these schemes in practice.
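
    The additive-deviation measure used above can be made concrete with a small sketch; `scheme` below is any tile-to-disk mapping (such as the lattice example sketched earlier on this page), and the query bounds are illustrative.

```python
from math import ceil

def additive_deviation(scheme, i0, i1, j0, j1, M):
    """(Max tiles on any disk) minus the ideal ceil(|Q| / M) for one range query."""
    load = [0] * M
    for i in range(i0, i1 + 1):
        for j in range(j0, j1 + 1):
            load[scheme(i, j, M)] += 1
    q_size = (i1 - i0 + 1) * (j1 - j0 + 1)
    return max(load) - ceil(q_size / M)

# Example mapping: scheme = lambda i, j, M: (i + 3 * j) % M
# The additive error of a scheme on an N x N grid is the maximum of this
# quantity over all rectangular queries [i0, i1] x [j0, j1] inside the grid.
```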

    Decomposing spatio-temporal seismicity patterns

    Seismicity is a distributed process of great spatial and temporal variability and complexity. Efforts to characterise and describe the evolution of seismicity patterns have a long history. Today, the detection of changes in the spatial distribution of seismicity is still regarded as one of the most important approaches in monitoring and understanding seismicity. The problem of how to best describe these spatio-temporal changes remains, also in view of the detection of possible precursors for large earthquakes. In particular, it is difficult to separate the superimposed effects of different origin and to unveil the subtle (precursory) effects in the presence of stronger but irrelevant constituents. I present an approach to the latter two problems that relies on principal component analysis (PCA), a method based on eigen-structure analysis: taking a time-series view, it separates the seismicity rate patterns into a background component and components of change. I show a sample application to the Southern California area and discuss the promising results in view of their implications, potential applications and possible precursory qualities.
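
    A minimal sketch of the eigen-structure decomposition described above, assuming seismicity has already been binned into a time-by-cell rate matrix; the interpretation of the leading component as "background" follows the abstract's framing, and the binning itself is an assumed preprocessing step.

```python
import numpy as np

def decompose_rates(R):
    """R: (n_time_windows, n_spatial_cells) matrix of seismicity rates or counts."""
    X = R - R.mean(axis=0)            # remove each cell's mean rate
    C = np.cov(X, rowvar=False)       # covariance between spatial cells
    vals, vecs = np.linalg.eigh(C)    # eigen-structure (ascending eigenvalues)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    scores = X @ vecs                 # time series of each principal component
    return vals, vecs, scores         # leading component ~ background pattern,
                                      # higher components ~ patterns of change
```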

    Soil Spatial Scaling: Modelling variability of soil properties across scales using legacy data

    Understanding how soil variability changes with spatial scale is critical to our ability to understand and model soil processes at scales relevant to decision makers. This thesis uses legacy data to address the ongoing challenge of understanding soil spatial variability in a number of complementary ways. We use a range of information: precision agriculture studies; compiled point datasets; and remotely observed raster datasets. We use classical geostatistics, but introduce a new framework for comparing variability of spatial properties across scales. My thesis considers soil spatial variability from a number of geostatistical angles. We find the following:
    • Field-scale variograms show variances that differ across several orders of magnitude. Further work is required to ensure consistency between survey design, experimental methodology and statistical methodology if these results are to become useful for comparison.
    • Declustering is a useful tool for dealing with the patchy design of legacy data. It is not a replacement for an evenly distributed dataset, but it does allow the use of legacy data which would otherwise have limited utility.
    • A framework which allows 'roughness' to be expressed as a continuous variable appears to fit the data better than the mono-fractal or multi-fractal framework generally associated with multi-scale modelling of soil spatial variability.
    • Soil appears to have a similar degree of stochasticity to short-range topographic variability, and a higher degree of stochasticity at short ranges (less than 10 km and 100 km) than vegetation and Radiometrics respectively.
    • At longer ranges of variability (i.e. around 100 km) only rainfall and height above sea level show distinctly different stochasticity.
    • Global variograms show strong isotropy, unlike the variograms for the Australian continent.
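
    The variogram comparisons above rest on the classical empirical (Matheron) semivariogram; the sketch below is a generic implementation with assumed lag bins, not the thesis code.

```python
import numpy as np

def empirical_variogram(coords, z, lag_edges):
    """coords: (n, 2) locations; z: (n,) property values; lag_edges: ascending bin edges."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    g = 0.5 * (z[:, None] - z[None, :]) ** 2          # halved squared differences
    iu = np.triu_indices(len(z), k=1)                  # each pair counted once
    d, g = d[iu], g[iu]
    gamma = []
    for lo, hi in zip(lag_edges[:-1], lag_edges[1:]):
        m = (d >= lo) & (d < hi)
        gamma.append(g[m].mean() if m.any() else np.nan)
    return np.asarray(gamma)                           # semivariance per lag bin
```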

    An R*-Tree Based Semi-Dynamic Clustering Method for the Efficient Processing of Spatial Join in a Shared-Nothing Parallel Database System

    The growing importance of geospatial databases has made it essential to perform complex spatial queries efficiently. To achieve acceptable performance levels, database systems have been increasingly required to make use of parallelism. The spatial join is a computationally expensive operator. Efficient implementation of the join operator is, thus, desirable. The work presented in this document attempts to improve the performance of spatial join queries by distributing the data set across several nodes of a cluster and executing queries across these nodes in parallel. This document discusses a new parallel algorithm that implements the spatial join in an efficient manner. This algorithm is compared to an existing parallel spatial-join algorithm, the clone join. Both algorithms have been implemented on a Beowulf cluster and compared using real datasets. An extensive experimental analysis reveals that the proposed algorithm exhibits superior performance both in declustering time as well as in the execution time of the join query
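
    To make the declustering step concrete, the sketch below shows one simple way to spread rectangles (MBRs) over nodes with a uniform grid before a local join; it is an illustrative partitioning scheme under assumed parameters, not the algorithm proposed in this work.

```python
def overlaps(a, b):
    """Axis-aligned MBR intersection test; each MBR is (x0, y0, x1, y1)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def decluster_mbrs(mbrs, cell_size, n_nodes):
    """Assign each MBR to the node owning the grid cell of its lower-left corner."""
    partitions = {n: [] for n in range(n_nodes)}
    for m in mbrs:
        cx, cy = int(m[0] // cell_size), int(m[1] // cell_size)
        partitions[hash((cx, cy)) % n_nodes].append(m)
    return partitions

# Each node would then join its partitions of the two datasets locally (e.g. by
# probing an R*-tree); MBRs spanning several cells would need replication,
# which is omitted here for brevity.
```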

    Fractal Dimension Study of Southern California Temporospatial Seismicity Patterns from 1982 to 2020

    Thesis advisor: John E. Ebel. Power-law scaling relationships concerning the earthquake frequency-magnitude distribution and the fractal geometry of spatial seismicity patterns may provide applications to earthquake forecasting and earthquake hazard studies. Past studies on the fractal characteristics of seismic phenomena have observed spatial and temporal differences in earthquake clustering and b value in relation to fractal dimension value. In this thesis, an investigation of the spatiotemporal seismicity patterns in southern California for the years 1982 to 2020 was conducted. The range and temporospatial distribution of b and D2 values for earthquake hypocenters contained in the Southern California Earthquake Data Center catalogue were calculated and shown in time series and spatial distribution maps. b values were calculated using both the Least Squares Method and the Maximum Likelihood Method, while D2 values were calculated for length scales between 1 km and 10 km. A set of b and D2 values was calculated after declustering for foreshocks and aftershocks using Gardner and Knopoff's declustering algorithm. b values decreased while D2 values increased on the dates of M > 6.0 earthquakes, whereas b values increased and D2 values decreased on the dates after M > 6.0 earthquakes. The declustering results suggest that aftershocks tend to increase D2 values while decreasing b values. The ability of b values and D2 values to delineate both the temporal and spatial extent of aftershock sequences for large earthquakes may prove to have an application in earthquake hazard studies. Thesis (MS), Boston College, 2022. Submitted to: Boston College, Graduate School of Arts and Sciences. Discipline: Earth and Environmental Sciences.
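
    Two of the quantities computed in the thesis have standard estimators that can be sketched briefly. The code below uses the Aki/Utsu maximum-likelihood b-value formula and a Grassberger-Procaccia style correlation sum for D2 over the 1-10 km range mentioned above; the catalogue arrays, completeness magnitude and magnitude bin width are assumptions.

```python
import numpy as np

def b_value_mle(mags, mc, dm=0.1):
    """Aki/Utsu maximum-likelihood b value above completeness magnitude mc (bin width dm)."""
    m = np.asarray(mags)
    m = m[m >= mc]
    return np.log10(np.e) / (m.mean() - (mc - dm / 2.0))

def correlation_dimension(xyz_km, r_min=1.0, r_max=10.0, n_r=10):
    """D2 from the slope of log C(r) vs log r over the assumed 1-10 km scaling range."""
    p = np.asarray(xyz_km)                                   # (n, 3) hypocentre coordinates in km
    d = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    d = d[np.triu_indices(len(p), k=1)]                      # pairwise distances
    rs = np.logspace(np.log10(r_min), np.log10(r_max), n_r)
    C = np.array([(d < r).mean() for r in rs])               # correlation integral
    slope, _ = np.polyfit(np.log10(rs), np.log10(C), 1)
    return slope
```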