A new analytics model for large scale multidimensional data visualization
© Springer International Publishing Switzerland 2015. With the rise of big data, the challenge for modern multidimensional data analysis and visualization is that data grows very quickly in size and complexity. In this paper, we first present a classification method called the 5Ws dimensions, which classifies multidimensional data into the 5Ws definitions. The 5Ws dimensions can be applied to multiple kinds of datasets, such as text, audio and video datasets. Second, we establish a pair-density model that analyzes data patterns and compares multidimensional data on the 5Ws patterns. Third, we create two additional parallel axes by using pair-density for visualization. The attributes have been shrunk to reduce data over-crowding in pair-density parallel coordinates, achieving more than 80% clutter reduction without loss of information. The experiments show that our model can be used efficiently for big data analysis and visualization.
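The abstract does not give the pair-density formula, so the following is only a rough sketch of one plausible reading: the relative frequency of value pairs across two of the 5Ws dimensions. The field names and records below are invented for illustration.

```python
from collections import Counter

# Hypothetical records already classified into the 5Ws dimensions
records = [
    {"who": "alice", "what": "login",  "when": "t1", "where": "web", "why": "access"},
    {"who": "alice", "what": "login",  "when": "t2", "where": "web", "why": "access"},
    {"who": "bob",   "what": "upload", "when": "t1", "where": "app", "why": "share"},
]

def pair_density(records, dim_a, dim_b):
    """Relative frequency of each (dim_a, dim_b) value pair across records."""
    counts = Counter((r[dim_a], r[dim_b]) for r in records)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

density = pair_density(records, "who", "what")
# ("alice", "login") occurs in 2 of the 3 records
```

Densities like these could then feed the two extra parallel axes the paper describes, with low-density pairs dropped to reduce clutter.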
Progressive Analytics: A Computation Paradigm for Exploratory Data Analysis
Exploring data requires a fast feedback loop from the analyst to the system,
with a latency below about 10 seconds because of human cognitive limitations.
When data becomes large or analysis becomes complex, sequential computations
can no longer be completed in a few seconds and data exploration is severely
hampered. This article describes a novel computation paradigm called
Progressive Computation for Data Analysis or, more concisely, Progressive
Analytics, which provides a low-latency guarantee at the programming-language
level by performing computations in a progressive fashion. Moving progressive
computation to the language level relieves the programmer of exploratory data
analysis systems from implementing the whole analytics pipeline progressively
from scratch, streamlining the implementation of scalable exploratory data
analysis systems. This article describes the new paradigm through a prototype
implementation called ProgressiVis, and explains the requirements it implies
through examples.
Comment: 10 pages
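The core idea can be illustrated with a minimal sketch: a computation that yields refined partial results after each chunk, so the analyst sees feedback within the latency budget instead of waiting for the full pass. ProgressiVis's actual API is not shown in the abstract; this is a generic Python analogue.

```python
def progressive_mean(stream, chunk_size=1000):
    """Yield successively refined estimates of the mean as chunks arrive."""
    total, count = 0.0, 0
    chunk = []
    for x in stream:
        chunk.append(x)
        if len(chunk) == chunk_size:
            total += sum(chunk)
            count += len(chunk)
            chunk = []
            yield total / count   # partial result, available early
    if chunk:                     # flush the last, possibly short, chunk
        total += sum(chunk)
        count += len(chunk)
        yield total / count       # final, exact result

estimates = list(progressive_mean(range(10000), chunk_size=2500))
# estimates refine toward the true mean 4999.5
```

A real progressive system would additionally bound the time per chunk (adapting `chunk_size` dynamically) rather than fixing the chunk length, but the contract is the same: every intermediate value is a usable answer.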
Benchmarking SciDB Data Import on HPC Systems
SciDB is a scalable, computational database management system that uses an
array model for data storage. The array data model of SciDB makes it ideally
suited for storing and managing large amounts of imaging data. SciDB is
designed to support advanced analytics in database, thus reducing the need for
extracting data for analysis. It is designed to be massively parallel and can
run on commodity hardware in a high performance computing (HPC) environment. In
this paper, we present the performance of SciDB using simulated image data. The
Dynamic Distributed Dimensional Data Model (D4M) software is used to implement
the benchmark on a cluster running the MIT SuperCloud software stack. A peak
performance of 2.2M database inserts per second was achieved on a single node
of this system. We also show that SciDB and the D4M toolbox provide more
efficient ways to access random sub-volumes of massive datasets compared to the
traditional approaches of reading volumetric data from individual files. This
work describes the D4M and SciDB tools we developed and presents the initial
performance results. This performance was achieved by using parallel inserts,
in-database merging of arrays, and supercomputing techniques such as
distributed arrays and single-program-multiple-data programming.
Comment: 5 pages, 4 figures, IEEE High Performance Extreme Computing (HPEC) 2016, best paper finalist
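The SciDB and D4M calls themselves are not shown in the abstract. As a stand-in, the following sketch illustrates one ingredient behind high insert rates, amortizing per-insert overhead by grouping rows into large batches, using SQLite with a hypothetical coordinate/value table rather than the paper's actual schema.

```python
import sqlite3

def batched_insert(rows, batch_size=10000):
    """Insert (i, j, val) rows in large batches -- amortizing per-insert
    overhead, analogous in spirit to the paper's parallel bulk loads."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE img (i INTEGER, j INTEGER, val REAL)")
    for start in range(0, len(rows), batch_size):
        con.executemany("INSERT INTO img VALUES (?, ?, ?)",
                        rows[start:start + batch_size])
    con.commit()
    return con

# Simulated image data: 50,000 sparse pixel entries
rows = [(i, i % 100, float(i)) for i in range(50000)]
con = batched_insert(rows)
n = con.execute("SELECT COUNT(*) FROM img").fetchone()[0]
```

The paper's peak of 2.2M inserts per second per node additionally relies on running many such loaders in parallel and merging the resulting arrays inside the database.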