Enhancing Energy Production with Exascale HPC Methods
High Performance Computing (HPC) resources have become key to tackling more ambitious challenges in many disciplines. For this step forward, an explosion in the available parallelism and the use of special-purpose processors are crucial. With this goal, the HPC4E project applies new exascale HPC techniques to energy industry simulations, customizing them where necessary and going beyond the state of the art in the exascale HPC simulations required for different energy sources. In this paper, a general overview of these methods is presented, together with some specific preliminary results.
The research leading to these results has received funding from the European Union's Horizon 2020 Programme (2014-2020) under the HPC4E Project (www.hpc4e.eu), grant agreement n° 689772, the Spanish Ministry of Economy and Competitiveness under the CODEC2 project (TIN2015-63562-R), and from the Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP). Computer time on Endeavour cluster is provided by the Intel Corporation, which enabled us to obtain the presented experimental results in uncertainty quantification in seismic imaging.
Postprint (author's final draft)
Applying future Exascale HPC methodologies in the energy sector
The application of new exascale HPC techniques to energy industry simulations is now essential. The common procedure is to customize these techniques to the specific energy sector of interest in order to go beyond the state of the art in the required exascale HPC simulations. With this aim, the HPC4E project is developing new exascale methodologies for three energy sources that represent the present and the future of energy: wind energy production and design, efficient combustion systems for biomass-derived fuels (biogas), and exploration geophysics for hydrocarbon reservoirs. In this work, the general exascale advances proposed as part of HPC4E are presented, together with the specific results they yield in different domains.
The research leading to these results has received funding from the European Union's Horizon 2020 Programme (2014-2020) under the HPC4E Project (www.hpc4e.eu), grant agreement n° 689772, the Spanish Ministry of Economy and Competitiveness under the CODEC2 project (TIN2015-63562-R), and from the Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP). Computer time on Endeavour cluster is provided by the Intel Corporation, which enabled us to obtain the presented experimental results in uncertainty quantification in seismic imaging.
Postprint (author's final draft)
Active provenance for Data-Intensive workflows: engaging users and developers
We present a practical approach to provenance capture in Data-Intensive workflow systems. It provides contextualisation by recording injected domain metadata with the provenance stream. It offers control over lineage precision, combining automation with specified adaptations. We address provenance tasks such as extraction of domain metadata, injection of custom annotations, accuracy, and integration of records from multiple independent workflows running in distributed contexts. To allow such flexibility, we introduce the concepts of programmable Provenance Types and Provenance Configuration. Provenance Types handle domain contextualisation and allow developers to model lineage patterns by re-defining API methods, composing easy-to-use extensions. Provenance Configuration, instead, enables users of a Data-Intensive workflow execution to prepare it for provenance capture, by configuring the attribution of Provenance Types to components and by specifying grouping into semantic clusters. This enables better searches over the lineage records. Provenance Types and Provenance Configuration are demonstrated in a system being used by computational seismologists. It is based on an extended provenance model, S-PROV.
Published, San Diego (CA, USA). 3IT. Calcolo scientific
explAIner: A Visual Analytics Framework for Interactive and Explainable Machine Learning
We propose a framework for interactive and explainable machine learning that
enables users to (1) understand machine learning models; (2) diagnose model
limitations using different explainable AI methods; as well as (3) refine and
optimize the models. Our framework combines an iterative XAI pipeline with
eight global monitoring and steering mechanisms, including quality monitoring,
provenance tracking, model comparison, and trust building. To operationalize
the framework, we present explAIner, a visual analytics system for interactive
and explainable machine learning that instantiates all phases of the suggested
pipeline within the commonly used TensorBoard environment. We performed a
user study with nine participants across different expertise levels to examine
their perception of our workflow and to collect suggestions to fill the gap
between our system and framework. The evaluation confirms that our tightly
integrated system leads to an informed machine learning process while
disclosing opportunities for further extensions.
Comment: 9 pages paper, 2 pages references, 5 pages supplementary material (ancillary files)
Exploratory search in time-oriented primary data
In a variety of research fields, primary data that describes scientific phenomena in an original condition is obtained.
Time-oriented primary data, in particular, is an indispensable data type, derived from complex measurements depending
on time. Today, time-oriented primary data is collected at rates that exceed the domain experts’ abilities to seek
valuable information undiscovered in the data. It is widely accepted that the magnitudes of uninvestigated data will
disclose tremendous knowledge in data-driven research, provided that domain experts are able to gain insight into the
data. Domain experts involved in data-driven research urgently require analytical capabilities. In scientific practice,
predominant activities are the generation and validation of hypotheses. In analytical terms, these activities are often
expressed in confirmatory and exploratory data analysis. Ideally, analytical support would combine the strengths of
both types of activities.
Exploratory search (ES) is a concept that seamlessly includes information-seeking behaviors ranging from search
to exploration. ES supports domain experts in both gaining an understanding of huge and potentially unknown data
collections and the drill-down to relevant subsets, e.g., to validate hypotheses. As such, ES combines predominant tasks
of domain experts applied to data-driven research. For the design of useful and usable ES systems (ESS), data scientists
have to incorporate different sources of knowledge and technology. Of particular importance is the state-of-the-art
in interactive data visualization and data analysis. Research in these factors is at the heart of Information Visualization
(IV) and Visual Analytics (VA). Approaches in IV and VA provide meaningful visualization and interaction designs,
allowing domain experts to perform the information-seeking process in an effective and efficient way. Today, best-practice
ESS almost exclusively exist for textual data content, e.g., put into practice in digital libraries to facilitate the
reuse of digital documents. For time-oriented primary data, ES mainly remains at a theoretical state.
Motivation and Problem Statement. This thesis is motivated by two main assumptions. First, we expect that
ES will have a tremendous impact on data-driven research for many research fields. In this thesis, we focus on
time-oriented primary data, as a complex and important data type for data-driven research. Second, we assume that
research conducted in IV and VA will particularly facilitate ES. For time-oriented primary data, however, novel
concepts and techniques are required that enhance the design and the application of ESS. In particular, we observe a
lack of methodological research in ESS for time-oriented primary data. In addition, the size, the complexity, and the
quality of time-oriented primary data hamper content-based access, as well as the design of visual interfaces
for gaining an overview of the data content. Furthermore, the question arises how ESS can incorporate techniques
for seeking relations between data content and metadata to foster data-driven research. Overarching challenges for
data scientists are to create usable and useful designs, urgently requiring the involvement of the targeted user group
and support techniques for choosing meaningful algorithmic models and model parameters. Throughout this thesis,
we will resolve these challenges from conceptual, technical, and systemic perspectives. In turn, domain experts can
benefit from novel ESS as a powerful analytical support to conduct data-driven research.
Concepts for Exploratory Search Systems (Chapter 3). We postulate concepts for ES in time-oriented primary
data. Based on a survey of analysis tasks supported in IV and VA research, we present a comprehensive selection of
tasks and techniques relevant for search and exploration activities. The assembly guides data scientists in the choice of
meaningful techniques presented in IV and VA. Furthermore, we present a reference workflow for the design and
the application of ESS for time-oriented primary data. The workflow divides the data processing and transformation
process into four steps, and thus divides the complexity of the design space into manageable parts. In addition, the
reference workflow describes how users can be involved in the design. The reference workflow is the framework for
the technical contributions of this thesis.
Visual-Interactive Preprocessing of Time-Oriented Primary Data (Chapter 4). We present a visual-interactive
system that enables users to construct workflows for preprocessing time-oriented primary data. In this way, we
introduce a means of providing content-based access. Based on a rich set of preprocessing routines, users can create
individual solutions for data cleansing, normalization, segmentation, and other preprocessing tasks. In addition, the
system supports the definition of time series descriptors and time series distance measures. Guidance concepts support
users in assessing the workflow generalizability, which is important for large data sets. The execution of the workflows
transforms time-oriented primary data into feature vectors, which can subsequently be used for downstream search
and exploration techniques. We demonstrate the applicability of the system in usage scenarios and case studies.
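The preprocessing idea described above (cleansing and normalization, segmentation, then a descriptor that yields feature vectors) can be illustrated with a minimal sketch. It is not the thesis's system: the function names, the choice of z-normalization, the piecewise aggregate approximation (PAA) descriptor, and the Euclidean distance are all assumptions picked for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def z_normalize(series):
    """Rescale a series to zero mean and unit variance (a constant series maps to zeros)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def segment(series, length):
    """Split a series into consecutive, non-overlapping segments; a short tail is dropped."""
    return [series[i:i + length] for i in range(0, len(series) - length + 1, length)]


def paa_descriptor(series, n_frames):
    """Hypothetical descriptor: mean of each of n_frames equal-width frames (PAA)."""
    size = len(series) // n_frames
    return [mean(series[i * size:(i + 1) * size]) for i in range(n_frames)]


def euclidean(a, b):
    """Hypothetical distance measure between two feature vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def run_workflow(series, segment_length=8, n_frames=4):
    """Chain normalization, segmentation, and description; one feature vector per segment."""
    normalized = z_normalize(series)
    return [paa_descriptor(seg, n_frames) for seg in segment(normalized, segment_length)]


# Toy input: a sawtooth pattern that repeats every 8 samples.
raw = [float(i % 8) for i in range(16)]
features = run_workflow(raw)
```

Each step is a plain function, so steps can be swapped or reordered, which loosely mirrors the idea of users assembling individual preprocessing workflows.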
Content-Based Overviews (Chapter 5). We introduce novel guidelines and techniques for the design of content-based
overviews. The three key factors are the creation of meaningful data aggregates, the visual mapping of these
aggregates into the visual space, and the view transformation providing layouts of these aggregates in the display
space. For each of these steps, we characterize important visualization and interaction design parameters allowing the
involvement of users. We introduce guidelines supporting data scientists in choosing meaningful solutions. In addition,
we present novel visual-interactive quality assessment techniques enhancing the choice of algorithmic model and
model parameters. Finally, we present visual interfaces enabling users to formulate visual queries of the time-oriented
data content. In this way, we provide means of combining content-based exploration with content-based search.
Relation Seeking Between Data Content and Metadata (Chapter 6). We present novel visual interfaces enabling
domain experts to seek relations between data content and metadata. These interfaces can be integrated into ESS
to bridge analytical gaps between the data content and attached metadata. In three different approaches, we focus
on different types of relations and define algorithmic support to guide users towards most interesting relations.
Furthermore, each of the three approaches comprises individual visualization and interaction designs, enabling users
to explore both the data and the relations in an efficient and effective way. We demonstrate the applicability of our
interfaces with usage scenarios, each conducted together with domain experts. The results confirm that our techniques
are beneficial for seeking relations between data content and metadata, particularly for data-centered research.
Case Studies - Exploratory Search Systems (Chapter 7). In two case studies, we put our concepts and techniques
into practice. We present two ESS constructed in design studies with real users, real ES tasks, and real time-oriented
primary data collections. The web-based VisInfo ESS is a digital library system facilitating the visual access to
time-oriented primary data content. A content-based overview enables users to explore large collections of time series
measurements and serves as a baseline for content-based queries by example. In addition, VisInfo provides a visual
interface for querying time-oriented data content by sketch. A result visualization combines different views of the data
content and metadata with faceted search functionality. The MotionExplorer ESS supports domain experts in human
motion analysis. Two content-based overviews enhance the exploration of large collections of human motion capture
data from two perspectives. MotionExplorer provides a search interface, allowing domain experts to query human
motion sequences by example. Retrieval results are depicted in a visual-interactive view enabling the exploration of
variations of human motions. Field study evaluations performed for both ESS confirm the applicability of the systems
in the environment of the involved user groups. The systems yield a significant improvement of both the effectiveness
and the efficiency in the day-to-day work of the domain experts. As such, both ESS demonstrate how large collections
of time-oriented primary data can be reused to enhance data-centered research.
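Query by example, as mentioned for both systems, can be sketched as a nearest-neighbour ranking over feature vectors. The abstract does not state which distance measure or index VisInfo or MotionExplorer use, so the Euclidean distance, the function names, and the toy collection below are assumptions for illustration only.

```python
from math import sqrt


def euclidean(a, b):
    """Assumed distance measure between two feature vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_by_example(example, collection, k=3):
    """Return the k (identifier, distance) pairs closest to the example vector."""
    ranked = sorted(
        ((name, euclidean(example, vector)) for name, vector in collection.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:k]


# Hypothetical collection: identifiers mapped to precomputed feature vectors.
collection = {
    "series_a": [0.0, 0.1, 0.9, 1.0],
    "series_b": [1.0, 0.9, 0.1, 0.0],
    "series_c": [0.1, 0.2, 0.8, 0.9],
}
results = query_by_example([0.0, 0.0, 1.0, 1.0], collection, k=2)
```

With this toy data the two rising series rank ahead of the falling one. A linear scan like this is only practical for small collections; larger ones would typically need an index structure.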
In essence, our contributions cover the entire time series analysis process starting from accessing raw time-oriented
primary data, processing and transforming time series data, to visual-interactive analysis of time series. We present
visual search interfaces providing content-based access to time-oriented primary data. In a series of novel exploration-support
techniques, we facilitate both gaining an overview of large and complex time-oriented primary data collections
and seeking relations between data content and metadata. Throughout this thesis, we introduce VA as a means of
designing effective and efficient visual-interactive systems. Our VA techniques empower data scientists to choose
appropriate models and model parameters, as well as to involve users in the design. With both principles, we support
the design of usable and useful interfaces which can be included into ESS. In this way, our contributions bridge the gap
between search systems requiring exploration support and exploratory data analysis systems requiring visual querying
capability. In the ESS presented in two case studies, we prove that our techniques and systems support data-driven
research in an efficient and effective way.
KNIT: Ontology reusability through knowledge graph exploration
Ontologies have become a standard for knowledge representation across several domains. In Life Sciences, numerous ontologies have been introduced to represent human knowledge, often providing overlapping or conflicting perspectives. These ontologies are usually published as OWL or OBO, and are often registered in open repositories, e.g., BioPortal. However, the task of finding the concepts (classes and their properties) defined in the existing ontologies and the relationships between these concepts across different ontologies – for example, for developing a new ontology aligned with the existing ones – requires a great deal of manual
effort in searching through the public repositories for candidate ontologies and their entities. In this work, we develop a new tool, KNIT, to automatically explore open repositories to help users fetch the previously designed concepts using keywords. User-specified keywords are then used to retrieve matching names of classes or properties. KNIT then creates a draft knowledge graph populated with the concepts and relationships retrieved from the existing ontologies. Furthermore, following the process of ontology learning, our tool refines this first draft of an ontology. We present three BioPortal-specific use cases for our tool. These use cases outline the development of new knowledge graphs and ontologies in the sub-domains of biology: genes and diseases,
virome and drugs. This work has been funded by grant PID2020-112540RB-C4121, AETHER-UMA (A smart data holistic approach for context-aware data analytics: semantics and context exploitation).
Funding for open access charge: Universidad de Málaga / CBUA