
    Data Cube Approximation and Mining using Probabilistic Modeling

    On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes according to different analysis axes (dimensions) and under different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data. Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique, we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions and discover possible outliers in data cells. A real-life example is used to (i) discuss the potential benefits of the modeling output for cube exploration and mining, (ii) show how OLAP queries can be answered approximately, and (iii) illustrate the strengths and limitations of these modeling approaches.
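
    As a rough illustration of the log-linear idea in the abstract above, the sketch below fits the simplest such model, mutual independence of two dimensions (log m_ij = λ + λ_i^A + λ_j^B), to a small two-way table and computes Pearson residuals that could flag outlying cells. The function names and the toy table are assumptions made for illustration; this is not the authors' model-selection procedure, and it assumes every row and column total is positive.

```python
import math


def independence_fit(table):
    """Expected cell values under the two-way independence log-linear model.

    Under independence, the fitted value of cell (i, j) is
    row_total[i] * col_total[j] / grand_total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def pearson_residuals(table, fitted):
    """(observed - expected) / sqrt(expected); large magnitudes hint at outliers."""
    return [
        [(obs - exp) / math.sqrt(exp) for obs, exp in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, fitted)
    ]


# Hypothetical 2x2 aggregate table (e.g. counts by region and product).
table = [[20, 30], [30, 20]]
fitted = independence_fit(table)
residuals = pearson_residuals(table, fitted)
```

    Only the row and column totals need to be stored to regenerate the fitted cells, which is the sense in which a parsimonious model can serve as a compressed, approximate answer to cell-level queries.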

    Hierarchical Structure of Magnetohydrodynamic Turbulence In Position-Position-Velocity Space

    Magnetohydrodynamic turbulence is able to create hierarchical structures in the interstellar medium that are correlated on a wide range of scales via the energy cascade. We use hierarchical tree diagrams known as dendrograms to characterize structures in synthetic Position-Position-Velocity (PPV) emission cubes of optically thin isothermal magnetohydrodynamic turbulence. We show that the structures and degree of hierarchy observed in PPV space are related to the physics of the gas, i.e. self-gravity and the global sonic and Alfvenic Mach number. Simulations with higher Alfvenic Mach number, self-gravity and supersonic flows display enhanced hierarchical structure. We observe a strong sonic and Alfvenic dependency when we apply the statistical moments (i.e. mean, variance, skewness, kurtosis) to the dendrogram distribution. Larger magnetic field and sonic Mach number correspond to larger values of the moments. Application of the dendrogram to 3D density cubes, also known as Position-Position-Position (PPP) cubes, reveals that the dominant emission contours in PPP and PPV are related for supersonic gas but not for subsonic gas. We also explore the effects of smoothing, thermal broadening and velocity resolution on the dendrograms in order to make our study more applicable to observational data. These results all point to hierarchical tree diagrams as a promising additional tool for studying ISM turbulence and star-forming regions, with a view to obtaining information on the degree of self-gravity, the Mach numbers and the complicated relationship between PPV and PPP. Comment: submitted to Ap

    JP3D compression of solar data-cubes: photospheric imaging and spectropolarimetry

    Hyperspectral imaging is a ubiquitous technique in solar physics observations, and recent advances in solar instrumentation have enabled us to acquire and record data at an unprecedented rate. The huge amount of data which will be archived in the upcoming solar observatories presses us to compress the data in order to reduce storage space and transfer times. The correlation present over all dimensions (spatial, temporal and spectral) of solar data-sets suggests the use of a 3D base wavelet decomposition to achieve higher compression rates. In this work, we evaluate the performance of the recent JPEG2000 Part 10 standard, known as JP3D, for the lossless compression of several types of solar data-cubes. We explore the differences in: a) the compressibility of broad-band or narrow-band time-sequences, and of I or V Stokes profiles in spectropolarimetric data-sets; b) compressing data in [x,y,λ] packages at different times or data in [x,y,t] packages of different wavelengths; c) compressing a single large data-cube or several smaller data-cubes; d) compressing data which is under-sampled or super-sampled with respect to the diffraction cut-off.

    A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing

    The rapidly growing volume of stored data has spurred researchers to seek methods for exploiting it, most of which suffer from long response times because of the sheer size of the data. Most solutions favour materialization. However, materialization cannot deliver Real-Time answers. In this paper we propose a framework illustrating the barriers to, and suggested solutions for, achieving Real-Time OLAP answers, which are heavily used in decision support systems and data warehouses.

    Attribute Value Reordering For Efficient Hybrid OLAP

    The normalization of a data cube is the ordering of the attribute values. For large multidimensional arrays where dense and sparse chunks are stored differently, proper normalization can lead to improved storage efficiency. We show that it is NP-hard to compute an optimal normalization even for 1x3 chunks, although we find an exact algorithm for 1x2 chunks. When dimensions are nearly statistically independent, we show that dimension-wise attribute frequency sorting is an optimal normalization and takes time O(d n log(n)) for data cubes of size n^d. When dimensions are not independent, we propose and evaluate several heuristics. The hybrid OLAP (HOLAP) storage mechanism is already 19%-30% more efficient than ROLAP, but normalization can improve it further by 9%-13%, for a total gain of 29%-44% over ROLAP.
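
    The dimension-wise attribute frequency sorting mentioned in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the cube is given as a list of allocated cell coordinates, values are ordered from most to least frequent (the direction is an assumption), and ties are broken by the original value. The function names are hypothetical and not taken from the paper.

```python
from collections import Counter


def frequency_sort_normalization(cells, num_dims):
    """Return one {old_value: new_index} mapping per dimension.

    Each dimension is handled independently: count how often every
    attribute value occurs among the allocated cells, then sort.
    """
    mappings = []
    for dim in range(num_dims):
        counts = Counter(cell[dim] for cell in cells)
        # Most frequent values first; ties broken by the original value,
        # which assumes values within a dimension are mutually comparable.
        ordered = sorted(counts, key=lambda value: (-counts[value], value))
        mappings.append({value: index for index, value in enumerate(ordered)})
    return mappings


def apply_normalization(cells, mappings):
    """Relabel every cell coordinate using the per-dimension mappings."""
    return [tuple(m[v] for m, v in zip(mappings, cell)) for cell in cells]


# Hypothetical sparse 2-D cube: only the allocated cells are listed.
cells = [(0, 2), (1, 2), (2, 2), (2, 0)]
mappings = frequency_sort_normalization(cells, num_dims=2)
normalized = apply_normalization(cells, mappings)
```

    Only values that occur in at least one allocated cell receive a new index here. After relabelling, the frequent values sit at the low indices of each dimension, so allocated cells tend to cluster in the same chunks.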