Policies for Web Services
Web services are predominantly used to implement service-oriented architectures (SOA). However, there are several areas, such as temporal dimensions, real-time and streaming communication, or efficient and flexible file transfers, where web service functionality should be extended. These extensions can, for example, be achieved by using policies. Since there are often alternative solutions for providing functionality (e.g., different protocols can be used to transfer data), the WS-Policy standard is especially useful for extending web services with policies. It allows the creation of policies that state the general properties under which a service is provided and that explicitly express alternative properties. To extend the functionality of web services, two policies are introduced in this thesis: the Temporal Policy and the Communication Policy.
The temporal policy is the foundation for adding temporal dimensions to a WS-Policy. The temporal policy itself is not a WS-Policy but an independent policy language that describes temporal dimensions of and dependencies between temporal policies and WS-Policies. Switching of protocol dependencies, pricing of services, quality of service, and security are example areas for using a temporal policy.
To describe the protocol dependencies of a service for streaming, real-time communication, and file transfers, a communication policy can be used. The communication policy is a concrete WS-Policy. With the communication policy, a service can expose the protocols it depends on for communication after its invocation. Thus, a web service client knows which protocols it must support to communicate with the service and can evaluate beforehand whether invoking the service is reasonable. On top of the newly introduced policies, novel mechanisms and tools are provided to simplify service use and to enable flexible and efficient data handling. Furthermore, the end user can be involved in the development process more easily.
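To illustrate the idea of evaluating a communication policy before invocation, the following is a minimal sketch; the XML vocabulary (Protocol elements inside WS-Policy alternatives) and the client-side check are assumptions for illustration, not the schema defined in the thesis.

```python
# Hypothetical sketch: a client checks a communication policy before invoking a service.
# The element names are illustrative assumptions, not the thesis's actual schema.
import xml.etree.ElementTree as ET

POLICY_XML = """
<Policy>
  <ExactlyOne>
    <All><Protocol>GridFTP</Protocol></All>
    <All><Protocol>HTTP</Protocol><Protocol>RTP</Protocol></All>
  </ExactlyOne>
</Policy>
"""

CLIENT_PROTOCOLS = {"HTTP", "RTP"}  # protocols this client can handle

def supported_alternative(policy_xml, client_protocols):
    """Return the first policy alternative whose protocols the client supports, or None."""
    root = ET.fromstring(policy_xml)
    for alternative in root.find("ExactlyOne"):
        required = {p.text for p in alternative.findall("Protocol")}
        if required <= client_protocols:
            return required
    return None

if __name__ == "__main__":
    match = supported_alternative(POLICY_XML, CLIENT_PROTOCOLS)
    if match:
        print("Invocation is reasonable; use protocols:", match)
    else:
        print("No supported alternative; skip the invocation.")
```

In this reading, the client only proceeds with the invocation when at least one policy alternative lies entirely within its own protocol capabilities.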
The Flex-SwA architecture, the first component in this thesis based on the newly introduced policies, implements the actual file transfers and streaming protocols that are described as dependencies in a communication policy. Several communication patterns support the flexible handling of the communication. A reference concept enables seamless message forwarding with reduced data movement.
Based on the Flex-SwA implementation and the communication policy, it is possible to improve usability, especially in the area of service-oriented Grids, by integrating data transfers into an automatically generated web and Grid service client. The Web and Grid Service Browser is introduced in this thesis as such a generic client. It provides a familiar environment for using services by offering client generation as part of the browser. Data transfers are directly integrated into the service invocation, so data transmissions do not have to be performed explicitly. For multimedia MIME types, special plugins allow the consumption of multimedia data.
To enable an end user to build applications that also leverage high-performance computing resources, the Service-enabled Mashup Editor is presented, which lets the user combine popular web applications with web and Grid services. Again, the communication policy provides descriptive means for file transfers, and Flex-SwA's reference concept is used for data exchange.
To show the applicability of these novel concepts, several use cases from the area of multimedia processing have been selected. Based on the temporal policy, the communication policy, Flex-SwA, the Web and Grid Service Browser, and the Service-enabled Mashup Editor, the development of a scalable service-oriented multimedia architecture is presented. The multimedia SOA offers, among other things, a face detection workflow, a video-on-demand service, and an audio resynthesis service.
More precisely, a video-on-demand service describes its dependency on a multicast protocol by using a communication policy. A temporal policy is then used to describe a protocol switch from one multicast protocol to another by changing the communication policy at the end of its validity period. The Service-enabled Mashup Editor is used as a client for the new multicast protocol after the switch has taken place. Flex-SwA is used to stream single frames from a frame decoder service to a face detection service (both part of the face detection workflow) and to transfer audio files to an audio resynthesis service using the different Flex-SwA communication patterns. The invocation of the face detection workflow and the audio resynthesis service is realized with the Web and Grid Service Browser.
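The protocol-switch use case can be pictured with the following hedged sketch: a temporal policy associates a validity period with the current communication policy and names its successor, so that clients resolve a different multicast protocol once the period ends. The field names and the two multicast protocols chosen here are illustrative assumptions, not the thesis's concrete policy syntax.

```python
# Hypothetical sketch of a temporal-policy-driven protocol switch: when the validity
# period of the current communication policy ends, the successor policy applies.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CommunicationPolicy:
    protocols: set

@dataclass
class TemporalPolicy:
    valid_until: datetime
    current: CommunicationPolicy
    successor: CommunicationPolicy

def effective_policy(tp: TemporalPolicy, now: datetime) -> CommunicationPolicy:
    """Return the communication policy that applies at the given point in time."""
    return tp.current if now < tp.valid_until else tp.successor

switch = TemporalPolicy(
    valid_until=datetime(2025, 1, 1),
    current=CommunicationPolicy({"PGM"}),      # old multicast protocol (example)
    successor=CommunicationPolicy({"NORM"}),   # new multicast protocol (example)
)
print(effective_policy(switch, datetime.now()).protocols)
```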
Optimisation of the enactment of fine-grained distributed data-intensive workflows
The emergence of data-intensive science as the fourth science paradigm has posed a data deluge challenge for enacting scientific work-flows. The scientific community is facing an imminent flood of data from the next generation of experiments and simulations, besides dealing with the heterogeneity and complexity of data, applications and execution environments. New scientific work-flows involve execution on distributed and heterogeneous computing resources across organisational and geographical boundaries, processing gigabytes of live data streams and petabytes of archived and simulation data, in various formats and from multiple sources. Managing the enactment of such work-flows requires not only larger storage space and faster machines, but also the capability to support scalability and diversity of the users, applications, data, computing resources and the enactment technologies.
We argue that the enactment process can be made efficient using optimisation techniques in an appropriate architecture. This architecture should support the creation of diversified applications and their enactment on diversified execution environments, with a standard interface, i.e. a work-flow language. The work-flow language should be both human readable and suitable for communication between the enactment environments. The data-streaming model central to this architecture provides a scalable approach to large-scale data exploitation: data-flow between computational elements in the scientific work-flow is implemented as streams, as sketched below. To cope with the exploratory nature of scientific work-flows, the architecture should support fast work-flow prototyping and the re-use of work-flows and work-flow components. Above all, the enactment process should be easily repeated and automated.
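As a generic illustration of the data-streaming model (not DISPEL itself, whose syntax is not shown here), the sketch below connects processing elements by streams implemented as Python generators; the element names are assumptions.

```python
# Generic sketch of stream-based data-flow between computational elements.
# Each processing element consumes an input stream and yields an output stream,
# so data flows through the work-flow without being fully materialised in between.
def read_records(source):
    for record in source:              # e.g. live data arriving record by record
        yield record

def detect_events(records, threshold=0.8):
    for r in records:
        if r["score"] >= threshold:
            yield r

def write_results(events):
    for e in events:
        print("event:", e["id"])

if __name__ == "__main__":
    source = ({"id": i, "score": i / 10} for i in range(12))
    write_results(detect_events(read_records(source)))
```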
In this thesis, we present a candidate data-intensive architecture that includes an intermediate work-flow language, named DISPEL. We create a new fine-grained measurement framework to capture performance-related data during enactments, and design a performance database to organise them systematically. We propose a new enactment strategy to demonstrate that optimisation of data-streaming work-flows can be automated by exploiting performance data gathered during previous enactments.
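A minimal sketch of the idea behind such a measurement framework is given below, under assumed names (the thesis's actual framework and database schema are not reproduced here): per-stream observations are recorded during an enactment and later queried to inform the optimisation of the next enactment.

```python
# Hypothetical sketch: record fine-grained, per-stream measurements during an
# enactment into a small performance database, then query them afterwards.
# Table and column names are illustrative only.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE stream_obs
              (run_id TEXT, stream TEXT, bytes INTEGER, seconds REAL)""")

def record_observation(run_id, stream, nbytes, seconds):
    db.execute("INSERT INTO stream_obs VALUES (?, ?, ?, ?)",
               (run_id, stream, nbytes, seconds))

# Pretend enactment: measure how long each stream took to move its data.
for stream, nbytes, seconds in [("decode->detect", 5_000_000, 2.1),
                                ("detect->store", 200_000, 0.3)]:
    record_observation("run-001", stream, nbytes, seconds)

# Later, an optimiser can rank streams by observed throughput to decide, for
# example, where to add buffering or parallel instances in the next enactment.
for stream, rate in db.execute(
        "SELECT stream, SUM(bytes)/SUM(seconds) FROM stream_obs GROUP BY stream"):
    print(f"{stream}: {rate:,.0f} B/s")
```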
Active provenance for data intensive research
The role of provenance information in data-intensive research is a significant topic of
discussion among technical experts and scientists. Typical use cases addressing traceability,
versioning and reproducibility of the research findings are extended with more
interactive scenarios in support, for instance, of computational steering and results
management. In this thesis we investigate the impact that lineage records can have on
the early phases of the analysis, for instance performed through near-real-time systems
and Virtual Research Environments (VREs) tailored to the requirements of a specific
community. By positioning provenance at the centre of the computational research
cycle, we highlight the importance of having mechanisms at the data scientists' side
that, by integrating with the abstractions offered by the processing technologies, such
as scientific workflows and data-intensive tools, facilitate the experts' contribution to
the lineage at runtime. Ultimately, by encouraging tuning and use of provenance for
rapid feedback, the thesis aims at improving the synergy between different user groups
to increase productivity and understanding of their processes.
We present a model of provenance, called S-PROV, that uses and further extends
PROV and ProvONE. The relationships and properties characterising the workflow's
abstractions and their concrete executions are re-elaborated to include aspects related
to delegation, distribution and steering of stateful streaming operators. The model is
supported by the Active framework for tuneable and actionable lineage ensuring the
user's engagement by fostering rapid exploitation. Here, concepts such as provenance
types, configuration and explicit state management allow users to capture complex
provenance scenarios and activate selective controls based on domain and user-defined
metadata. We outline how the traces are recorded in a new comprehensive system,
called S-ProvFlow, enabling different classes of consumers to explore the provenance
data with services and tools for monitoring, in-depth validation and comprehensive
visual-analytics. The work of this thesis will be discussed in the context of an existing
computational framework and the experience gained in implementing provenance-aware
tools for seismology and climate VREs. It will continue to evolve through
newly funded projects, thereby providing generic and user-centred solutions for data-intensive
research.
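To make the idea of tuneable, actionable lineage more concrete, here is a hedged sketch; the class and method names (ProvenanceType, capture, the selectivity rule) are assumptions for illustration, not the S-PROV or Active framework API. A streaming operator records a lineage entry per output element and applies a user-defined selectivity rule based on domain metadata.

```python
# Hypothetical sketch of runtime lineage capture in a streaming operator.
import time
import uuid

class ProvenanceType:
    """Decides which data-flow events become lineage records."""
    def __init__(self, selectivity_rule=None):
        self.selectivity_rule = selectivity_rule or (lambda meta: True)
        self.trace = []

    def capture(self, operator, inputs, output, metadata):
        if self.selectivity_rule(metadata):          # user-defined, domain metadata
            self.trace.append({
                "id": str(uuid.uuid4()),
                "operator": operator,
                "derived_from": [i["id"] for i in inputs],
                "timestamp": time.time(),
                "metadata": metadata,
            })
        return output

# Only keep lineage for elements the scientist flagged as interesting.
prov = ProvenanceType(selectivity_rule=lambda m: m.get("magnitude", 0) > 5.0)

element = {"id": "raw-42", "waveform": [0.1, 0.9, 0.3]}
result = prov.capture("pick_arrivals", [element], {"picks": 3},
                      metadata={"station": "XYZ", "magnitude": 6.1})
print(len(prov.trace), "lineage record(s) kept")
```

The point of such selective capture is that the expert, not the infrastructure, decides at runtime which parts of the computation are worth tracing in detail.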
A Model for Scientific Workflows with Parallel and Distributed Computing
In the last decade we have witnessed an immense evolution of computing infrastructures in terms of processing, storage and communication. On the one hand, developments in hardware architectures have made it possible to run multiple virtual machines on a single physical machine. On the other hand, the increase in available network communication bandwidth has enabled the widespread use of distributed computing infrastructures, for example based on clusters, grids and clouds. These factors have enabled different scientific communities to develop and implement complex scientific applications, possibly involving large amounts of data. However, due to their structural complexity, these applications require decomposition models that allow multiple tasks to run in parallel and distributed environments.
The scientific workflow concept arises naturally as a way to model applications composed of multiple activities. In fact, in the past decades many initiatives have been
undertaken to model application development using the workflow paradigm, both in business and in scientific domains. However, despite such intensive efforts, current
scientific workflow systems and tools still have limitations, which pose difficulties to the
development of emerging large-scale, distributed and dynamic applications.
This dissertation proposes the AWARD model for scientific workflows with parallel
and distributed computing. AWARD is an acronym for Autonomic Workflow Activities
Reconfigurable and Dynamic.
The AWARD model has the following main characteristics.
It is based on a decentralized execution control model where multiple autonomic
workflow activities interact by exchanging tokens through input and output ports. The
activities can be executed separately in diverse computing environments, for example on a single computer or on multiple virtual machines running on distributed infrastructures such as clusters and clouds.
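As a generic illustration of this execution model (the names and the token format are assumptions, not the AWARD implementation), the two activities below run independently and interact only by exchanging tokens through ports, here modelled as queues.

```python
# Hypothetical sketch of decentralised activities exchanging tokens through ports.
# Each activity runs on its own (here: a thread; in an AWARD-like setting it could
# be a separate machine or VM) and communicates only via its input/output ports.
import queue
import threading

def producer_activity(out_port: queue.Queue):
    for i in range(5):
        out_port.put({"token": i})        # emit a token on the output port
    out_port.put(None)                    # end-of-stream marker

def consumer_activity(in_port: queue.Queue):
    while (tok := in_port.get()) is not None:
        print("consumed", tok["token"])

port = queue.Queue()                      # connects producer output to consumer input
threads = [threading.Thread(target=producer_activity, args=(port,)),
           threading.Thread(target=consumer_activity, args=(port,))]
for t in threads: t.start()
for t in threads: t.join()
```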
It provides basic workflow patterns for parallel and distributed application decomposition, as well as other useful patterns supporting feedback loops and load balancing. The model is suitable for expressing applications based on a finite or infinite number of iterations, thus making it possible to model long-running workflows, which are typical in scientific experimentation. A distinctive contribution of the AWARD model is the support for dynamic reconfiguration of long-running workflows. A dynamic reconfiguration makes it possible to modify the structure of the workflow, for example to introduce new activities or to modify the connections between activity input and output ports. The activity behavior can also be modified, for example by dynamically replacing the activity algorithm.
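The kind of dynamic reconfiguration described above can be pictured, in a purely hypothetical form, by an activity that resolves its algorithm on every iteration, so that replacing the algorithm takes effect while the workflow keeps running; the class and method names are assumptions.

```python
# Hypothetical sketch of replacing an activity's algorithm at run time.
# The activity looks up its algorithm on each token, so a reconfiguration
# request simply swaps the callable without stopping the long-running workflow.
class Activity:
    def __init__(self, algorithm):
        self.algorithm = algorithm

    def reconfigure(self, new_algorithm):
        self.algorithm = new_algorithm            # takes effect on the next token

    def process(self, token):
        return self.algorithm(token)

double = lambda x: 2 * x
square = lambda x: x * x

act = Activity(double)
print(act.process(3))      # 6, using the original algorithm
act.reconfigure(square)    # dynamic reconfiguration while the workflow runs
print(act.process(3))      # 9, using the replaced algorithm
```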
In addition to the proposal of a new workflow model, this dissertation presents the
implementation of a fully functional software architecture that supports the AWARD
model. The implemented prototype was used to validate and refine the model across
multiple workflow scenarios, with experimental results clearly demonstrating the advantages of the major characteristics and contributions of the AWARD model. The implemented prototype was also used to develop application cases, such as a workflow to support the implementation of the MapReduce model and a workflow to support a text mining application developed by an external user.
The extensive experimental work confirmed the adequacy of the AWARD model and
its implementation for developing applications that exploit parallelism and distribution
using the scientific workflow paradigm.
Technologies and Applications for Big Data Value
This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications for the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions to major industrial areas. The book starts with an introductory chapter that provides an overview of the book by positioning the following chapters in terms of their contributions to technology frameworks which are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is then arranged in two parts. The first part, "Technologies and Methods", contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part, "Processes and Applications", details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the European data community's nucleus to bring together businesses with leading researchers to harness the value of data to benefit society, business, science, and industry. The book is of interest to two primary audiences: first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, and machine learning and AI; and second, practitioners and industry experts engaged in data-driven systems, software design and deployment projects who are interested in employing these advanced methods to address real-world problems.
Density-Aware Linear Algebra in a Column-Oriented In-Memory Database System
Linear algebra operations appear in nearly every application in advanced analytics, machine learning, and various science domains. To this day, many data analysts and scientists tend to use statistics software packages or hand-crafted solutions for their analysis. In the era of the data deluge, however, external statistics packages and custom analysis programs that often run on single workstations are incapable of keeping up with the vast increase in data volume and size. In particular, there is an increasing demand from scientists for large-scale data manipulation, orchestration, and advanced data management capabilities. These are among the key features of a mature relational database management system (DBMS). With the rise of main-memory database systems, it has now become feasible to also consider applications that build on linear algebra.
This thesis presents a deep integration of linear algebra functionality into an in-memory column-oriented database system. In particular, this work shows that it has become feasible to execute linear algebra queries on large data sets directly in a DBMS-integrated engine (LAPEG), without the need to transfer data and without being restricted by hard disk latencies. From various application examples cited in this work, we deduce a number of requirements that are relevant for a database system that includes linear algebra functionality. Besides the deep integration of matrices and numerical algorithms, these include the optimization of expressions, transparent matrix handling, scalability and data-parallelism, and data manipulation capabilities. These requirements are addressed by our linear algebra engine. In particular, the core contributions of this thesis are as follows. Firstly, we show that the columnar storage layer of an in-memory DBMS allows an easy adoption of efficient sparse matrix data types and algorithms. Furthermore, we show that the execution of linear algebra expressions significantly benefits from different techniques that are inspired by database technology. In a novel way, we implemented several of these optimization strategies in LAPEG's optimizer (SpMachO), which uses an advanced density estimation method (SpProdest) to predict the matrix density of intermediate results. Moreover, we present an adaptive matrix data type, AT Matrix, to obviate the need for scientists to select appropriate matrix representations. The tiled substructure of AT Matrix is exploited by our matrix multiplication to saturate the different sockets of a multicore main-memory platform, reaching a speed-up of up to 6x compared to alternative approaches. Finally, a major part of this thesis is devoted to the topic of data manipulation, where we propose a matrix manipulation API and present different mutable matrix types to enable fast insertions and deletions.
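As background for the density-estimation step, the following sketch shows a simple textbook estimate (not SpProdest itself) for the expected density of a sparse matrix product, derived under the assumption that nonzeros are independently and uniformly distributed.

```python
# A simple, commonly used estimate (not SpProdest) for the density of C = A @ B:
# an entry c_ij is zero only if all k partial products a_il * b_lj are zero,
# so with operand densities d_a and d_b the expected density of C is
#   1 - (1 - d_a * d_b) ** k
def product_density(d_a: float, d_b: float, k: int) -> float:
    """Expected nonzero density of an (n x k) * (k x m) sparse matrix product."""
    return 1.0 - (1.0 - d_a * d_b) ** k

# Example: two matrices with 1% nonzeros and inner dimension 10,000.
est = product_density(0.01, 0.01, 10_000)
print(f"estimated density of the product: {est:.3f}")   # ~0.63, i.e. fairly dense

# An optimizer such as SpMachO could use estimates like this to decide whether an
# intermediate result should be stored and processed as sparse or dense.
```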
We conclude that our linear algebra engine is well suited to processing dynamic, large matrix workloads in an optimized way. In particular, the DBMS-integrated LAPEG fills the linear algebra gap and makes columnar in-memory DBMSs attractive as an efficient, scalable ad-hoc analysis platform for scientists.
Querying heterogeneous data in an in-situ unified agile system
Data integration provides a unified view of data by combining different data sources. In today's multi-disciplinary and collaborative research environments, data is often produced and consumed by various means, and multiple researchers operate on the data in different divisions to satisfy various research requirements, using different query processors and analysis tools. This makes data integration a crucial component of any successful data-intensive research activity. The fundamental difficulty is that data is heterogeneous not only in syntax, structure, and semantics, but also in the way it is accessed and queried. We introduce QUIS (QUery In-Situ), an agile query system equipped with a unified query language and a federated execution engine. It is capable of running queries on heterogeneous data sources in an in-situ manner. Its language provides advanced features such as virtual schemas, heterogeneous joins, and polymorphic result set representation. QUIS utilizes a federation of agents to transform a given input query written in its language into a (set of) computation models that are executable on the designated data sources. Federated query virtualization has the disadvantage that some aspects of a query may not be supported by the designated data sources. QUIS ensures that input queries are always fully satisfied: if the target data sources do not fulfill all of the query requirements, QUIS detects the features that are lacking and complements them in a transparent manner. QUIS provides union and join capabilities over an unbound list of heterogeneous data sources; in addition, it offers solutions for heterogeneous query planning and optimization. In brief, QUIS is intended to mitigate data access heterogeneity through query virtualization, on-the-fly transformation, and federated execution. It offers in-situ querying, agile querying, heterogeneous data source querying, unified execution, late-bound virtual schemas, and remote execution.
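To illustrate the general pattern of in-situ federation with transparent complementing (not the QUIS language or agent API, whose details are assumptions here), a query can be split across source adapters, and any capability a source lacks is supplied by a fallback step over the retrieved rows.

```python
# Hypothetical sketch of federated, in-situ query execution with complementing.
# Adapter names and capabilities are illustrative; they are not the QUIS API.
import csv
import io
import sqlite3

class CsvAdapter:
    """Reads a CSV source in place; it can only filter, not aggregate."""
    capabilities = {"filter"}
    def __init__(self, text): self.text = text
    def rows(self, predicate):
        return [r for r in csv.DictReader(io.StringIO(self.text)) if predicate(r)]

class SqlAdapter:
    """Pushes both filtering and aggregation down to the SQL engine."""
    capabilities = {"filter", "aggregate"}
    def __init__(self, conn): self.conn = conn
    def rows(self, where_sql):
        return [dict(zip(("name", "score"), r)) for r in
                self.conn.execute(f"SELECT name, score FROM t WHERE {where_sql}")]

def average_score(rows):
    """Fallback aggregation, used to complement sources that cannot aggregate."""
    scores = [float(r["score"]) for r in rows]
    return sum(scores) / len(scores) if scores else None

# One logical query ("average score where score > 50") over two heterogeneous sources.
csv_src = CsvAdapter("name,score\nann,80\nbob,40\ncat,95\n")
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (name TEXT, score REAL)")
db.executemany("INSERT INTO t VALUES (?, ?)", [("dan", 70), ("eve", 30)])
sql_src = SqlAdapter(db)

rows = csv_src.rows(lambda r: float(r["score"]) > 50) + sql_src.rows("score > 50")
print("average over both sources:", average_score(rows))
```

The design intent sketched here mirrors the described behaviour: each source executes the parts of the query it supports in place, and the missing aggregation capability of the CSV source is complemented transparently on the combined result.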