A Model-Driven Approach to Automate Data Visualization in Big Data Analytics
In big data analytics, advanced analytic techniques operate on big data sets, aimed at complementing the role of traditional OLAP in decision making. To enable companies to benefit from these techniques despite the lack of in-house technical skills, the H2020 TOREADOR Project adopts a model-driven architecture for streamlining analysis processes, from data preparation to visualization. In this paper we propose a new approach named SkyViz focused on the visualization area, in particular on (i) how to specify the user's objectives and describe the dataset to be visualized, (ii) how to translate this specification into a platform-independent visualization type, and (iii) how to concretely implement this visualization type on the target execution platform. To support step (i) we define a visualization context based on seven prioritizable coordinates for assessing the user's objectives and conceptually describing the data to be visualized. To automate step (ii) we propose a skyline-based technique that translates a visualization context into a set of most-suitable visualization types. Finally, to automate step (iii) we propose a skyline-based technique that, with reference to a specific platform, finds the best bindings between the columns of the dataset and the graphical coordinates used by the visualization type chosen by the user. SkyViz can be transparently extended to include additional visualization types as well as additional visualization coordinates. The paper is completed by an evaluation of SkyViz based on a case study excerpted from the pilot applications of the TOREADOR Project.
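The skyline operation at the heart of steps (ii) and (iii) is a Pareto-front computation: a candidate survives only if no other candidate is at least as good on every coordinate and strictly better on one. As a minimal sketch of that idea, and not the SkyViz algorithm itself, the following assumes hypothetical candidates scored on two coordinates where lower is better:

```python
# Minimal skyline (Pareto-front) sketch. The candidate visualization
# types and their coordinate scores below are hypothetical examples,
# not taken from the SkyViz paper. Lower scores are better.

def dominates(a, b):
    """a dominates b if a is no worse on every coordinate and
    strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(candidates):
    """Keep only candidates whose score vector no other candidate dominates."""
    return {name: score for name, score in candidates.items()
            if not any(dominates(other, score)
                       for other_name, other in candidates.items()
                       if other_name != name)}

# Hypothetical scores: (interaction cost, visual complexity)
candidates = {
    "bar chart":    (1, 2),
    "scatter plot": (2, 1),
    "treemap":      (3, 3),  # dominated by both of the others
}
print(skyline(candidates))  # treemap is filtered out
```

The same routine applies unchanged to step (iii) if each candidate is a column-to-graphical-coordinate binding scored on the platform's criteria.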
Continuous Performance Benchmarking Framework for ROOT
Foundational software libraries such as ROOT are under intense pressure to
avoid software regression, including performance regressions. Continuous
performance benchmarking, as a part of continuous integration and other code
quality testing, is an industry best-practice to understand how the performance
of a software product evolves over time. We present a framework, built from
industry best practices and tools, to help understand ROOT code performance
and to monitor code efficiency across several processor architectures.
It additionally supports historical performance measurements for the ROOT I/O,
vectorization, and parallelization sub-systems.

Comment: 8 pages, 5 figures, CHEP 2018 - 23rd International Conference on
Computing in High Energy and Nuclear Physics
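The core loop of any such framework is: measure a workload, compare against a stored baseline, and flag a regression when the measurement exceeds a tolerance. The sketch below illustrates that pattern only; the workload, baseline file name, and tolerance are all hypothetical, and this is not ROOT's actual benchmarking infrastructure:

```python
# Minimal continuous performance-regression check (illustrative only;
# not the ROOT framework). Times a workload, compares it with a stored
# baseline, and raises if the run is more than TOLERANCE times slower.
import json
import pathlib
import timeit

BASELINE_FILE = pathlib.Path("baseline.json")  # hypothetical location
TOLERANCE = 1.10                               # fail if >10% slower

def workload():
    # Stand-in for a real benchmark kernel (e.g. an I/O or
    # vectorization micro-benchmark).
    sum(i * i for i in range(10_000))

def check_regression():
    # Take the minimum of several repeats to reduce timing noise.
    elapsed = min(timeit.repeat(workload, number=100, repeat=5))
    if BASELINE_FILE.exists():
        baseline = json.loads(BASELINE_FILE.read_text())["elapsed"]
        if elapsed > baseline * TOLERANCE:
            raise RuntimeError(
                f"regression: {elapsed:.4f}s vs baseline {baseline:.4f}s")
    # Record the measurement so the next CI run compares against it.
    BASELINE_FILE.write_text(json.dumps({"elapsed": elapsed}))
    return elapsed
```

In a real continuous-integration setup the baseline would be keyed by commit, processor architecture, and sub-system, which is what makes historical measurements per architecture possible.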
Data-driven model reduction and transfer operator approximation
In this review paper, we will present different data-driven dimension
reduction techniques for dynamical systems that are based on transfer operator
theory as well as methods to approximate transfer operators and their
eigenvalues, eigenfunctions, and eigenmodes. The goal is to point out
similarities and differences between methods developed independently by the
dynamical systems, fluid dynamics, and molecular dynamics communities such as
time-lagged independent component analysis (TICA), dynamic mode decomposition
(DMD), and their respective generalizations. As a result, extensions and best
practices developed for one particular method can be carried over to other
related methods.
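Of the methods surveyed, exact DMD is the most compact to state: given snapshots in time order, it fits the best linear operator mapping each state to the next and reads off that operator's eigenvalues. A minimal NumPy sketch on synthetic data with known eigenvalues, assuming nothing beyond the standard exact-DMD formulation:

```python
# Minimal exact-DMD sketch: from snapshot data x_0, x_1, ..., fit the
# best linear operator A with x_{k+1} ~ A x_k and return the
# eigenvalues of its SVD-reduced form. Data below is synthetic.
import numpy as np

def dmd_eigenvalues(X, r=None):
    """X: (n, m) snapshot matrix, columns ordered in time.
    r: optional truncation rank. Returns eigenvalues of the
    reduced best-fit linear operator A_tilde."""
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    if r is not None:
        U, s, Vh = U[:, :r], s[:r], Vh[:r]
    # A_tilde = U^H X2 V S^{-1} projects A onto the leading POD modes.
    A_tilde = U.conj().T @ X2 @ Vh.conj().T @ np.diag(1.0 / s)
    return np.linalg.eigvals(A_tilde)

# Snapshots generated by a known linear map with eigenvalues 0.9 and 0.5.
A = np.diag([0.9, 0.5])
x0 = np.array([1.0, 1.0])
X = np.column_stack([np.linalg.matrix_power(A, k) @ x0 for k in range(10)])
print(np.sort(dmd_eigenvalues(X).real))  # recovers ~[0.5, 0.9]
```

TICA follows the same data-driven template but works with time-lagged covariance matrices instead of the raw snapshot pair, which is one of the cross-community parallels the review draws out.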
Software Challenges For HL-LHC Data Analysis
The high energy physics community is discussing where investment is needed to
prepare software for the HL-LHC and its unprecedented challenges. The ROOT
project is one of the central software players in high energy physics since
decades. From its experience and expectations, the ROOT team has distilled a
comprehensive set of areas that should see research and development in the
context of data analysis software, for making best use of HL-LHC's physics
potential. This work shows what these areas could be, why the ROOT team
believes investing in them is needed, which gains are expected, and where
related work is ongoing. It can serve as an indication for future research
proposals and collaborations.
Cytoscape: the network visualization tool for GenomeSpace workflows.
Modern genomic analysis often requires workflows incorporating multiple best-of-breed tools. GenomeSpace is a web-based visual workbench that combines a selection of these tools with mechanisms that create data flows between them. One such tool is Cytoscape 3, a popular application that enables analysis and visualization of graph-oriented genomic networks. As Cytoscape runs on the desktop, and not in a web browser, integrating it into GenomeSpace required special care in creating a seamless user experience and enabling appropriate data flows. In this paper, we present the design and operation of the Cytoscape GenomeSpace app, which accomplishes this integration, thereby providing critical analysis and visualization functionality for GenomeSpace users. It has been downloaded over 850 times since the release of its first version in September 2013.
Obvious: a meta-toolkit to encapsulate information visualization toolkits. One toolkit to bind them all
This article describes “Obvious”: a meta-toolkit that abstracts and encapsulates information visualization toolkits implemented in the Java language. It aims to unify their use and postpone the choice of which concrete toolkit(s) to use until later in the development of visual analytics applications. We also report on the lessons we have learned when wrapping popular toolkits with Obvious, namely Prefuse, the InfoVis Toolkit, partly Improvise, JUNG, and other data management libraries. We show several examples of the use of Obvious and of how the different toolkits can be combined, for instance by sharing their data models. We also show how Weka and RapidMiner, two popular machine-learning toolkits, have been wrapped with Obvious and can be used directly with all the other wrapped toolkits. We expect Obvious to start a co-evolution process: Obvious is meant to evolve as more components of information visualization systems become consensual. It is also designed to help information visualization systems adhere to best practices, providing a higher level of interoperability and leveraging the domain of visual analytics.
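The meta-toolkit approach is essentially the adapter pattern applied to data models: a toolkit-neutral abstraction, plus thin wrappers so each concrete toolkit exposes it. As a rough illustration of the pattern only (Obvious itself defines Java interfaces; the classes below are invented for the sketch):

```python
# Adapter-pattern sketch of a meta-toolkit data model. The adapter and
# "toolkit" names are hypothetical, invented for illustration; they are
# not the Obvious API.
from abc import ABC, abstractmethod

class Table(ABC):
    """Toolkit-neutral tabular data model that every adapter exposes."""
    @abstractmethod
    def rows(self):
        """Return the table contents as a list of row tuples."""

class ListToolkitAdapter(Table):
    """Wraps a toolkit whose native model is a list of tuples."""
    def __init__(self, rows):
        self._rows = list(rows)
    def rows(self):
        return self._rows

class DictToolkitAdapter(Table):
    """Wraps a toolkit whose native model is a key/value mapping."""
    def __init__(self, mapping):
        self._mapping = dict(mapping)
    def rows(self):
        return [(k, v) for k, v in self._mapping.items()]

def render(table: Table):
    # A downstream "visualization" that depends only on the shared model,
    # so either adapter can feed it.
    return "\n".join(str(row) for row in table.rows())
```

Because `render` depends only on the abstract `Table`, swapping the underlying toolkit requires no change to the visualization code, which is the interoperability benefit the article claims for the meta-toolkit design.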
Metabolomics Data Processing and Data Analysis—Current Best Practices
Metabolomics data analysis strategies are central to transforming raw metabolomics data files into meaningful biochemical interpretations that answer biological questions or generate novel hypotheses. This book contains a variety of papers from a Special Issue around the theme “Best Practices in Metabolomics Data Analysis”. Reviews and strategies for the whole metabolomics pipeline are included, and key areas such as metabolite annotation and identification, compound and spectral databases and repositories, and statistical analysis are highlighted in individual papers. Altogether, this book contains valuable information for researchers just starting in their metabolomics career as well as for those who are more experienced and looking for additional knowledge and best practices to complement key parts of their metabolomics workflows.