Interactive Visualization of the Largest Radioastronomy Cubes
3D visualization is an important data analysis and knowledge discovery tool;
however, interactive visualization of large 3D astronomical datasets poses a
challenge for many existing data visualization packages. We present a solution
to interactively visualize larger-than-memory 3D astronomical data cubes by
utilizing a heterogeneous cluster of CPUs and GPUs. The system partitions the
data volume into smaller sub-volumes that are distributed over the rendering
workstations. GPU-based ray-casting volume rendering is performed to generate
an image for each sub-volume; these images are composited into the output for
the whole volume, which is returned to the user. Datasets including the HI Parkes All Sky
Survey (HIPASS - 12 GB) southern sky and the Galactic All Sky Survey (GASS - 26
GB) data cubes were used to demonstrate our framework's performance. The
framework can render the GASS data cube with a maximum render time below 0.3
seconds at an output resolution of 1024 x 1024 pixels, using 3 rendering
workstations and 8 GPUs. Our framework will scale to visualize larger datasets,
even of terabyte order, if the appropriate hardware infrastructure is available. Comment: 15 pages, 12 figures, Accepted New Astronomy July 201
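The partition, render and composite steps can be illustrated with a small NumPy sketch. It assumes a simple emission-absorption model, an orthographic view along the first array axis, and slab-shaped sub-volumes rendered in a single process. The function names are hypothetical, and the actual framework ray casts general sub-volumes on GPUs spread over several workstations. The sketch shows that front-to-back accumulation makes the per-sub-volume images combinable with the Porter-Duff "over" operator, so compositing them in depth order reproduces the single-volume render exactly.

```python
import numpy as np


def render_brick(emission, opacity):
    """Front-to-back emission-absorption ray casting along axis 0 (orthographic).

    emission, opacity: arrays of shape (depth, height, width), opacity in [0, 1].
    Returns premultiplied colour C and accumulated alpha A, each (height, width).
    """
    color = np.zeros(emission.shape[1:])
    alpha = np.zeros(emission.shape[1:])
    for z in range(emission.shape[0]):  # march all rays one sample deeper
        a = opacity[z]
        color += (1.0 - alpha) * a * emission[z]  # uses alpha *before* this sample
        alpha += (1.0 - alpha) * a
    return color, alpha


def composite_over(front, back):
    """Porter-Duff 'over' for premultiplied (colour, alpha) image pairs."""
    cf, af = front
    cb, ab = back
    return cf + (1.0 - af) * cb, af + (1.0 - af) * ab


def distributed_render(emission, opacity, n_bricks):
    """Split the volume into slabs along the view axis, render each slab
    independently (as separate workers would), then composite in depth order."""
    partials = [
        render_brick(e, o)
        for e, o in zip(np.array_split(emission, n_bricks),
                        np.array_split(opacity, n_bricks))
    ]
    result = partials[0]
    for image in partials[1:]:
        result = composite_over(result, image)
    return result
```

With general 3D bricks, the partial images would first be sorted by distance from the viewer. Because "over" is associative, the partial images can also be merged pairwise in a tree, which maps naturally onto a cluster.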
Hillview: A trillion-cell spreadsheet for big data
Hillview is a distributed spreadsheet for browsing very large datasets that
cannot be handled by a single machine. As a spreadsheet, Hillview provides a
high degree of interactivity that permits data analysts to explore information
quickly along many dimensions while switching visualizations on a whim. To
provide the required responsiveness, Hillview introduces visualization
sketches, or vizketches, as a simple idea to produce compact data
visualizations. Vizketches combine algorithmic techniques for data
summarization with computer graphics principles for efficient rendering. While
simple, vizketches are effective at scaling the spreadsheet by parallelizing
computation, reducing communication, providing progressive visualizations, and
offering precise accuracy guarantees. Using Hillview running on eight servers,
we can navigate and visualize datasets of tens of billions of rows and
trillions of cells, far beyond the published capabilities of competing
systems.
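A minimal sketch of the idea behind a histogram vizketch follows. It assumes a fixed numeric range and equal-width buckets, and all names are hypothetical; this is not Hillview's API. The properties it illustrates are the ones the abstract relies on: each worker summarizes only its own shard, the summary's size is set by the number of buckets (ultimately by screen resolution) rather than by the number of rows, and partial summaries merge associatively, so results can be combined in any order as servers respond.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Sequence


@dataclass
class HistogramSketch:
    """Mergeable histogram summary: its size depends on the bucket count,
    never on the number of rows summarized."""
    lo: float
    hi: float
    counts: List[int]
    out_of_range: int = 0

    @classmethod
    def empty(cls, lo: float, hi: float, buckets: int) -> "HistogramSketch":
        return cls(lo, hi, [0] * buckets)

    def add(self, x: float) -> None:
        if not (self.lo <= x < self.hi):
            self.out_of_range += 1
            return
        width = (self.hi - self.lo) / len(self.counts)
        # min() guards against float rounding pushing x into a bucket past the end
        i = min(int((x - self.lo) / width), len(self.counts) - 1)
        self.counts[i] += 1

    def merge(self, other: "HistogramSketch") -> "HistogramSketch":
        # Associative and commutative: partial results from different
        # servers can be combined in any order, enabling progressive display.
        assert (self.lo, self.hi, len(self.counts)) == (other.lo, other.hi, len(other.counts))
        return HistogramSketch(
            self.lo, self.hi,
            [a + b for a, b in zip(self.counts, other.counts)],
            self.out_of_range + other.out_of_range,
        )


def summarize_partition(rows: Sequence[float], lo: float, hi: float,
                        buckets: int) -> HistogramSketch:
    """Runs on each worker over its local shard of the data."""
    sketch = HistogramSketch.empty(lo, hi, buckets)
    for x in rows:
        sketch.add(x)
    return sketch


def distributed_histogram(shards, lo: float, hi: float, buckets: int) -> HistogramSketch:
    """Only the small sketches, not the rows, travel back to the client."""
    partials = [summarize_partition(s, lo, hi, buckets) for s in shards]
    return reduce(HistogramSketch.merge, partials)
```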
Slicer
Explorative data visualization is a widespread tool for gaining insights from datasets. Investigating data in linked visualizations lets users explore potential relationships in their data at will. Furthermore, this type of analysis does not require any technical knowledge, widening the user base beyond developers to anyone. Implementing explorative data visualizations in web browsers makes data analysis accessible to anyone with a PC. In addition to accessibility, the available types of visualizations and their interactive latency are essential for the utility of data exploration. The set of available visualizations limits which datasets the application can serve, and latency limits how much exploration users are willing to do.
Existing solutions often perform all computation either in the client application or on a backend server. However, relying on the client limits performance and data size, since hardware resources in web browsers are scarce and sending large datasets over a network is not feasible. Server-based computation, on the other hand, often demands powerful server hardware and is limited by network latency and bandwidth on every interaction.
This thesis presents Slicer, a framework for creating explorative data visualizations in web browsers. Applications can be created with minimal developer effort, requiring only a description of the visualizations. Slicer implements bar charts and choropleth maps. The visualizations are linked and can be filtered either by brushing or clicking on single targets. To overcome the hurdles of pure client- and server-reliant solutions, Slicer uses a hybrid approach, where prioritized interactions are handled client-side.
Recognizing that different types of interactions have different latency thresholds, we accept a higher cost when switching views in exchange for low latency when filtering. To achieve real-time filtering performance, we follow the principle that the chosen resolution of the visualizations, not data size, should limit interactive scalability. We describe the use of data tiles that accommodate more interactions than shown in earlier work, using an approach based on delta differencing, which ensures constant time complexity when filtering. For computing data tiles, we present techniques for efficient computation on consumer hardware.
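To make the resolution-bounded filtering idea concrete, here is a sketch of one common way to build such a data tile: a cumulative (prefix-sum) table over the bins of the view being brushed, so that any brushed range reduces to the difference of two rows. This is an assumption for illustration; Slicer's delta-differencing scheme may differ in detail, and the function names are hypothetical. Building the tile costs one pass over the records (the price paid when the user switches the active view), after which every brush update costs time proportional to the passive view's bin count only, independent of data size.

```python
import numpy as np


def build_tile(active_bins, passive_bins, n_active, n_passive):
    """Precompute a cumulative data tile for one (active, passive) view pair.

    tile[i, j] = number of records whose active-view bin is < i and whose
    passive-view bin is j. Costs O(records), paid once per active-view switch.
    """
    counts = np.zeros((n_active, n_passive), dtype=np.int64)
    np.add.at(counts, (active_bins, passive_bins), 1)  # 2D histogram of bin pairs
    tile = np.zeros((n_active + 1, n_passive), dtype=np.int64)
    tile[1:] = np.cumsum(counts, axis=0)
    return tile


def filter_passive(tile, start, stop):
    """Passive-view bar heights for a brush covering active bins [start, stop).

    A single row difference: cost depends on the passive view's resolution,
    not on the number of records.
    """
    return tile[stop] - tile[start]
```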
Our results show that Slicer can offer real-time interactivity on latency-sensitive interactions regardless of data size, averaging above 150 Hz on a consumer laptop. For less sensitive interactions, acceptable latency is shown for datasets with tens of millions of records, depending on the resolution of the visualizations.
The archive solution for distributed workflow management agents of the CMS experiment at LHC
The CMS experiment at the CERN LHC developed the Workflow Management Archive
system to persistently store unstructured framework job report documents
produced by distributed workflow management agents. In this paper we present
its architecture, implementation, deployment, and integration with the CMS and
CERN computing infrastructures, such as the central HDFS and Hadoop Spark cluster.
The system leverages modern technologies such as a document oriented database
and the Hadoop eco-system to provide the necessary flexibility to reliably
process, store, and aggregate O(1M) documents on a daily basis. We
describe the data transformation, the short and long term storage layers, the
query language, along with the aggregation pipeline developed to visualize
various performance metrics to assist CMS data operators in assessing the
performance of the CMS computing system. Comment: This is a pre-print of an article published in Computing and Software
for Big Science. The final authenticated version is available online at:
https://doi.org/10.1007/s41781-018-0005-
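The aggregation step that turns raw job reports into performance metrics can be sketched as a group-by over documents. This is a single-process illustration of the map, group and reduce logic only: at CMS scale such a pipeline runs as distributed jobs over HDFS, and the field names used here ('site', 'timestamp', 'exit_code', 'cpu_time') are hypothetical stand-ins, not the real framework job report schema.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_daily(reports):
    """Group job-report documents by (UTC day, site) and compute summary metrics.

    Field names are hypothetical placeholders for the real report schema.
    """
    acc = defaultdict(lambda: {"jobs": 0, "failures": 0, "cpu_time": 0.0})
    for doc in reports:
        day = datetime.fromtimestamp(doc["timestamp"], tz=timezone.utc).date().isoformat()
        bucket = acc[(day, doc.get("site", "unknown"))]
        bucket["jobs"] += 1
        # Treat any non-zero exit code as a failed job
        bucket["failures"] += int(doc.get("exit_code", 0) != 0)
        bucket["cpu_time"] += doc.get("cpu_time", 0.0)
    # Reduce each group to the metrics a dashboard would plot
    return {
        key: {
            "jobs": v["jobs"],
            "failures": v["failures"],
            "mean_cpu_time": v["cpu_time"] / v["jobs"],
        }
        for key, v in acc.items()
    }
```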
End-User Visualization and Manipulation of Distributed Aggregate Data
Aggregate visualization and manipulation enables viewing of and interaction with dynamically changing data sets in a graphically meaningful way. However, off-the-shelf applications typically provide only limited ways to view static aggregates and generally do not support manipulation of aggregate data through the resulting visualization. To be fully dynamic, an aggregate visualization should be customizable to suit the individual user's needs and should allow end-users to modify the data through direct manipulation. This paper describes a software system that empowers end-users to create interactive aggregate visualizations through a visual language interface. Included are mechanisms for specifying how aggregate data is processed from multiple sources of a distributed application, providing functionality similar to project, select, join, and cross product of relational databases. This approach gives end-users the power to create customized, interactive visualizations of dynamically changing ...
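The relational operators named above can be illustrated in a few lines. The paper exposes them through a visual language, so the textual Python below is only a sketch of the operator semantics, not the system's interface; it assumes rows from each distributed source arrive as lists of dictionaries, and every function name is hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, Iterable, List, Sequence

Row = Dict[str, object]


def select(rows: Iterable[Row], pred: Callable[[Row], bool]) -> List[Row]:
    """Keep only rows satisfying the predicate."""
    return [r for r in rows if pred(r)]


def project(rows: Iterable[Row], cols: Sequence[str]) -> List[Row]:
    """Keep only the named columns."""
    return [{c: r[c] for c in cols} for r in rows]


def cross(left: Iterable[Row], right: Sequence[Row]) -> List[Row]:
    """Cartesian product; assumes the two sources have disjoint column names."""
    return [{**l, **r} for l, r in product(left, right)]


def join(left: Iterable[Row], right: Iterable[Row], key: str) -> List[Row]:
    """Equi-join on a shared column, implemented as a hash join."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]
```

For example, joining per-node readings from one source with ownership data from another, then selecting and projecting, yields the aggregate a visualization would bind to.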
MOLNs: A cloud platform for interactive, reproducible and scalable spatial stochastic computational experiments in systems biology using PyURDME
Computational experiments using spatial stochastic simulations have led to
important new biological insights, but they require specialized tools, a
complex software stack, as well as large and scalable compute and data analysis
resources due to the large computational cost associated with Monte Carlo
computational workflows. The complexity of setting up and managing a
large-scale distributed computation environment to support productive and
reproducible modeling can be prohibitive for practitioners in systems biology.
This results in a barrier to the adoption of spatial stochastic simulation
tools, effectively limiting the type of biological questions addressed by
quantitative modeling. In this paper, we present PyURDME, a new, user-friendly
spatial modeling and simulation package, and MOLNs, a cloud computing appliance
for distributed simulation of stochastic reaction-diffusion models. MOLNs is
based on IPython and provides an interactive programming platform for
development of shareable and reproducible distributed parallel computational
experiments.
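To show what a spatial stochastic simulation computes, and why its Monte Carlo ensembles are expensive enough to warrant a cloud platform, here is a self-contained pure-Python sketch of the Gillespie direct method applied to the reaction-diffusion master equation (RDME), restricted to pure diffusion on a 1D lattice. It does not use or reproduce PyURDME's API; the function names are hypothetical. It uses the standard 1D discretization in which each molecule jumps to each neighbouring voxel with rate D/h², with reflecting boundaries.

```python
import random
from typing import List, Sequence


def rdme_diffusion_ssa(counts: Sequence[int], D: float, h: float,
                       t_end: float, seed: int = 0) -> List[int]:
    """Gillespie direct-method simulation of pure diffusion on a 1D lattice.

    counts: molecules per voxel; D: diffusion constant; h: voxel width.
    Reflecting boundaries: molecules in end voxels can only jump inward.
    Returns the voxel counts at time t_end.
    """
    rng = random.Random(seed)
    x = list(counts)
    n = len(x)
    k = D / h**2  # per-molecule jump rate to each neighbouring voxel
    t = 0.0
    while True:
        # Propensity of voxel i: k * (molecules in i) * (number of neighbours)
        props = [k * x[i] * ((i > 0) + (i < n - 1)) for i in range(n)]
        total = sum(props)
        if total == 0.0:
            return x  # nothing can move
        t += rng.expovariate(total)  # exponential waiting time to the next jump
        if t > t_end:
            return x
        i = rng.choices(range(n), weights=props)[0]  # voxel whose molecule jumps
        j = rng.choice([v for v in (i - 1, i + 1) if 0 <= v < n])  # target voxel
        x[i] -= 1
        x[j] += 1


def ensemble_mean(counts: Sequence[int], D: float, h: float,
                  t_end: float, n_runs: int) -> List[float]:
    """Average occupancy over independent realizations: the embarrassingly
    parallel Monte Carlo loop that a platform like MOLNs would distribute."""
    totals = [0.0] * len(counts)
    for seed in range(n_runs):
        for i, v in enumerate(rdme_diffusion_ssa(counts, D, h, t_end, seed)):
            totals[i] += v
    return [v / n_runs for v in totals]
```

Each realization is independent, which is why the ensemble loop, rather than a single trajectory, is the natural unit to spread across cloud workers.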
- …