Big Data Visualization Tools
Data visualization is the presentation of data in a pictorial or graphical
format, and a data visualization tool is the software that generates this
presentation. Data visualization provides users with intuitive means to
interactively explore and analyze data, enabling them to effectively identify
interesting patterns, infer correlations and causalities, and support
sense-making activities.
Comment: This article appears in Encyclopedia of Big Data Technologies, Springer, 201
Progressive Analytics: A Computation Paradigm for Exploratory Data Analysis
Exploring data requires a fast feedback loop from the analyst to the system,
with a latency below about 10 seconds because of human cognitive limitations.
When data becomes large or analysis becomes complex, sequential computations
can no longer be completed in a few seconds and data exploration is severely
hampered. This article describes a novel computation paradigm called
Progressive Computation for Data Analysis or, more concisely, Progressive
Analytics, which provides a low-latency guarantee at the programming-language
level by performing computations in a progressive fashion. Moving progressive
computation to the language level relieves the programmer of exploratory data
analysis systems from implementing the whole analytics pipeline progressively
from scratch, streamlining the implementation of scalable exploratory data
analysis systems. This article describes the new paradigm through a prototype
implementation called ProgressiVis, and explains the requirements it implies
through examples.
Comment: 10 pages
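The core idea of progressive computation described above can be sketched in a few lines. This is a hedged toy illustration in plain Python, not the ProgressiVis API: a long aggregation is split into chunks, and a partial result is emitted after each chunk so the analyst gets feedback well within the latency budget.

```python
def progressive_mean(data, chunk_size=4):
    """Yield a running estimate of the mean after each chunk of data.

    Each yielded value is a usable partial result that is refined as
    more data is processed, instead of one answer at the very end.
    """
    total, count = 0.0, 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        total += sum(chunk)
        count += len(chunk)
        yield total / count  # partial result, refined as more data arrives

estimates = list(progressive_mean([2, 4, 6, 8, 10, 12], chunk_size=2))
print(estimates)  # the final estimate equals the exact mean
```

The point of the paradigm is that this chunked, early-yielding structure is provided by the language runtime rather than hand-written for every operator in the pipeline.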
Applications and Challenges of Real-time Mobile DNA Analysis
DNA sequencing is the process of identifying the exact order of nucleotides
within a given DNA molecule. New portable and relatively inexpensive DNA
sequencers, such as the Oxford Nanopore MinION, have the potential to move DNA
sequencing outside the laboratory, leading to faster and more accessible
DNA-based diagnostics. However, portable DNA sequencing and analysis
are challenging for mobile systems, owing to high data throughputs and
computationally intensive processing performed in environments with unreliable
connectivity and power.
In this paper, we analyze the challenges that mobile systems and mobile
computing must address to maximize the potential of portable DNA sequencing
and in situ DNA analysis. We explain the DNA sequencing process and highlight
the main differences between traditional and portable DNA sequencing in the
context of current and envisioned applications. We look at the identified
challenges from the perspective of both algorithm and systems design, showing
the need for careful co-design.
Interactive Data Exploration of Distributed Raw Files: A Systematic Mapping Study
When exploring large amounts of data without a clear target, providing an
interactive experience becomes really difficult, since this tentative
inspection usually defeats any early decision on data structures or indexing
strategies. This is also true in the physics domain, specifically in
high-energy physics, where the huge volume of data generated by the detectors
is normally explored via C++ code using batch processing, which introduces
considerable latency. An interactive tool, when integrated into the existing
data management systems, can add great value to the usability of these
platforms. Here, we review the current state of the art of interactive data
exploration, aiming to satisfy three requirements: access to raw data files,
stored in a distributed environment, with reasonably low latency. This paper
follows the guidelines for systematic mapping studies, which are well suited
for gathering and classifying available studies. We summarize the results
after classifying the 242 papers that passed our inclusion criteria. While
many proposed solutions tackle the problem in different manners, there is
little evidence available about their implementation in practice. Almost all
of the solutions found by this paper cover a subset of our requirements, with
only one partially satisfying all three. Solutions for data exploration
abound. It is an active research area and, considering the continuous growth
of data volume and variety, the problem is only going to become harder. There
is a niche for research on a solution that covers our requirements, and the
required building blocks are there.
Combining in-situ and in-transit processing to enable extreme-scale scientific analysis
Pre-print.
With the onset of extreme-scale computing, I/O constraints make it increasingly difficult for scientists to save a sufficient amount of raw simulation data to persistent storage. One potential solution is to change the data analysis pipeline from a post-process centric to a concurrent approach based on either in-situ or in-transit processing. In this context computations are considered in-situ if they utilize the primary compute resources, while in-transit processing refers to offloading computations to a set of secondary resources using asynchronous data transfers. In this paper we explore the design and implementation of three common analysis techniques typically performed on large-scale scientific simulations: topological analysis, descriptive statistics, and visualization. We summarize algorithmic developments, describe a resource scheduling system to coordinate the execution of various analysis workflows, and discuss our implementation using the DataSpaces and ADIOS frameworks that support efficient data movement between in-situ and in-transit computations. We demonstrate the efficiency of our lightweight, flexible framework by deploying it on the Jaguar XK6 to analyze data generated by S3D, a massively parallel turbulent combustion code. Our framework allows scientists dealing with the data deluge at extreme scale to perform analyses at increased temporal resolutions, mitigate I/O costs, and significantly improve the time to insight.
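The in-situ versus in-transit distinction drawn in this abstract can be illustrated with a small sketch. This is a toy example in plain Python (not the DataSpaces or ADIOS APIs): the in-situ path runs the analysis synchronously on the "simulation" thread, while the in-transit path hands data off asynchronously to a secondary worker through a queue.

```python
import queue
import threading

def analyze(step_data):
    """Stand-in for an expensive analysis, e.g. descriptive statistics."""
    return sum(step_data) / len(step_data)

# In-transit: a secondary worker drains a queue of asynchronously shipped data.
results = []
work = queue.Queue()

def in_transit_worker():
    while True:
        item = work.get()
        if item is None:          # sentinel: the simulation has finished
            break
        results.append(analyze(item))

worker = threading.Thread(target=in_transit_worker)
worker.start()

in_situ_results = []
for step in range(3):
    step_data = [step, step + 1, step + 2]      # fake per-step simulation output
    in_situ_results.append(analyze(step_data))  # in-situ: blocks the simulation
    work.put(step_data)                         # in-transit: non-blocking hand-off

work.put(None)
worker.join()
print(in_situ_results == results)  # both paths compute the same statistics
```

The trade-off the paper explores follows directly from this structure: the in-situ call steals cycles from the primary compute resources but needs no data movement, while the in-transit hand-off frees the simulation to proceed at the cost of transferring data to secondary resources.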