Data Cube Approximation and Mining using Probabilistic Modeling
On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes according to different analysis axes (dimensions) and under different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data.
Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate multidimensional values. With the first technique, we compute the set of components that best fits the initial data set and whose superposition coincides with the original data; with the second, we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions, and discover possible outliers in data cells. A real-life example is
used to (i) discuss the potential benefits of the modeling output for cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches
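The outlier-detection idea in this abstract can be illustrated with the simplest log-linear model, the independence (main-effects-only) model, whose fitted counts are products of the marginals. A minimal sketch in pure Python, on a made-up 2-way table (the data and the residual threshold of 2 are assumptions, not the paper's):

```python
import math

# Toy 2-way "data cube": counts by (region, product) -- hypothetical data.
table = [
    [30, 20, 10],
    [20, 15,  5],
    [10,  5, 40],   # the (2, 2) cell is deliberately inflated
]

n_rows, n_cols = len(table), len(table[0])
total = sum(sum(row) for row in table)
row_marg = [sum(row) for row in table]
col_marg = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]

# Independence log-linear model: log m_ij = u + u_row(i) + u_col(j),
# whose fitted counts are m_ij = row_i * col_j / total.
expected = [[row_marg[i] * col_marg[j] / total for j in range(n_cols)]
            for i in range(n_rows)]

# Pearson residuals flag cells the parsimonious model cannot explain;
# the inflated cell also distorts the marginals, so its row and column
# neighbours may be flagged along with it.
outliers = [(i, j)
            for i in range(n_rows) for j in range(n_cols)
            if abs(table[i][j] - expected[i][j]) / math.sqrt(expected[i][j]) > 2]
print(outliers)  # (2, 2) is among the flagged cells
```

Answering an OLAP query from the fitted counts `expected` instead of `table` is exactly the approximate query answering the abstract mentions, with the residuals quantifying the error.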
Hierarchical Structure of Magnetohydrodynamic Turbulence In Position-Position-Velocity Space
Magnetohydrodynamic turbulence is able to create hierarchical structures in
the interstellar medium that are correlated on a wide range of scales via the
energy cascade. We use hierarchical tree diagrams known as dendrograms to
characterize structures in synthetic Position-Position-Velocity (PPV) emission
cubes of optically thin isothermal magnetohydrodynamic turbulence. We show that
the structures and degree of hierarchy observed in PPV space are related to the
physics of the gas, i.e. self-gravity and the global sonic and Alfvenic Mach
number. Simulations with higher Alfvenic Mach number, self-gravity and
supersonic flows display enhanced hierarchical structure. We observe a strong
sonic and Alfvenic dependence when we apply the statistical moments (i.e.
mean, variance, skewness, kurtosis) to the dendrogram distribution. Larger
magnetic field and sonic Mach number correspond to larger values of the
moments. Application of the dendrogram to 3D density cubes, also known as
Position-Position-Position cubes (PPP), reveals that the dominant emission
contours in PPP and PPV are related for supersonic gas but not for subsonic. We
also explore the effects of smoothing, thermal broadening and velocity
resolution on the dendrograms in order to make our study more applicable to
observational data. These results all point to hierarchical tree diagrams as a
promising additional tool for studying ISM turbulence and star-forming regions,
providing information on the degree of self-gravity, the Mach numbers, and the
complicated relationship between PPV and PPP.
Comment: submitted to Ap
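The moment statistics applied to the dendrogram distribution are standard sample moments. A minimal pure-Python sketch, assuming the dendrogram has already been reduced to a list of structure intensities (the values below are made up):

```python
import math

def moments(xs):
    """Mean, variance, skewness, and excess kurtosis of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    std = math.sqrt(var)
    skew = sum(((x - mean) / std) ** 3 for x in xs) / n
    kurt = sum(((x - mean) / std) ** 4 for x in xs) / n - 3.0
    return mean, var, skew, kurt

# Hypothetical peak intensities of dendrogram structures from a PPV cube;
# the long tail (7.5) mimics the enhanced hierarchy of supersonic runs.
intensities = [1.2, 1.5, 1.7, 2.0, 2.3, 2.9, 4.1, 7.5]
mean, var, skew, kurt = moments(intensities)
print(mean, var, skew, kurt)
```

Larger skewness and kurtosis of this distribution are what the abstract associates with larger sonic Mach number and magnetic field.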
JP3D compression of solar data-cubes: photospheric imaging and spectropolarimetry
Hyperspectral imaging is a ubiquitous technique in solar physics observations,
and recent advances in solar instrumentation have enabled us to acquire and
record data at an unprecedented rate. The huge amount of data that will be
archived by the upcoming solar observatories presses us to compress the data in
order to reduce storage space and transfer times. The correlation present over
all dimensions of solar data-sets, spatial, temporal and spectral, suggests the
use of a 3D wavelet decomposition to achieve higher compression rates. In this
work, we evaluate the performance of the recent JPEG2000 Part 10 standard,
known as JP3D, for the lossless compression of several types of solar
data-cubes. We explore the differences in: a) the compressibility of broad-band
or narrow-band time-sequences, and of Stokes I or V profiles in
spectropolarimetric data-sets; b) compressing data in [x,y,λ] packages at
different times or data in [x,y,t] packages at different wavelengths; c)
compressing a single large data-cube or several smaller data-cubes; d)
compressing data which is under-sampled or super-sampled with respect to the
diffraction cut-off
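The premise that correlation drives compressibility is easy to demonstrate. The sketch below is not JP3D: it uses stdlib zlib as a stand-in coder on two synthetic byte "cubes", one smooth along every axis (as solar data is) and one pure noise:

```python
import os
import zlib

# Two synthetic "data-cubes" serialized to bytes: one smoothly varying
# (highly correlated along every axis) and one of random noise.
# zlib stands in here for a real 3D wavelet coder such as JP3D.
n = 16
smooth = bytes((x + y + z) % 256
               for x in range(n) for y in range(n) for z in range(n))
noise = os.urandom(n * n * n)

# Compressed size divided by original size: lower is better.
ratio_smooth = len(zlib.compress(smooth, 9)) / len(smooth)
ratio_noise = len(zlib.compress(noise, 9)) / len(noise)
print(ratio_smooth, ratio_noise)  # the correlated cube compresses far better
```

The same effect is why exploiting all three dimensions jointly, as JP3D does, beats slice-by-slice 2D coding on correlated data.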
A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing
The ever-increasing amount of stored data has spurred researchers to seek
methods to exploit it optimally, most of which face a response-time problem
caused by the enormous size of the data. Most solutions suggest materialization
as the favoured remedy; however, such a solution cannot attain Real-Time
answers. In this paper we propose a framework illustrating the barriers, and
suggested solutions, on the way to achieving Real-Time OLAP answers, which are
widely used in decision support systems and data warehouses
Attribute Value Reordering For Efficient Hybrid OLAP
The normalization of a data cube is the ordering of the attribute values. For
large multidimensional arrays where dense and sparse chunks are stored
differently, proper normalization can lead to improved storage efficiency. We
show that it is NP-hard to compute an optimal normalization even for 1x3
chunks, although we find an exact algorithm for 1x2 chunks. When dimensions are
nearly statistically independent, we show that dimension-wise attribute
frequency sorting is an optimal normalization and takes time O(d n log(n)) for
data cubes of size n^d. When dimensions are not independent, we propose and
evaluate several heuristics. The hybrid OLAP (HOLAP) storage mechanism is
already 19%-30% more efficient than ROLAP, but normalization can improve it
further by 9%-13% for a total gain of 29%-44% over ROLAP
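The dimension-wise frequency sorting described above is simple to sketch. A minimal pure-Python illustration (the sparse cube below is made up), relabeling each dimension's attribute values by decreasing marginal frequency so that non-empty cells cluster toward one corner of the array:

```python
from collections import Counter

# Non-empty cells of a tiny 2-dimensional sparse data cube, given as
# (row_value, col_value) pairs -- hypothetical data.
cells = [(0, 2), (0, 4), (2, 2), (2, 4), (2, 1), (4, 2), (4, 4), (3, 0)]

def frequency_sort_normalization(cells, dims=2):
    """Relabel each dimension's attribute values by decreasing frequency."""
    remaps = []
    for d in range(dims):
        counts = Counter(cell[d] for cell in cells)
        order = sorted(counts, key=lambda v: -counts[v])
        remaps.append({v: rank for rank, v in enumerate(order)})
    return [tuple(remaps[d][cell[d]] for d in range(dims)) for cell in cells]

normalized = frequency_sort_normalization(cells)
# The most frequent attribute value in each dimension maps to index 0,
# packing the non-empty cells toward the origin and leaving fewer
# half-full chunks to store in the dense representation.
print(sorted(normalized))
```

This packing is what improves chunk density: cells that share frequent attribute values end up adjacent, so more 1x2 (or larger) chunks are either completely full or completely empty.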