High Energy Physics Forum for Computational Excellence: Working Group Reports (I. Applications Software II. Software Libraries and Tools III. Systems)
Computing plays an essential role in all aspects of high energy physics. As
computational technology evolves rapidly in new directions, and data throughput
and volume continue to follow a steep trend-line, it is important for the HEP
community to develop an effective response to a series of expected challenges.
In order to help shape the desired response, the HEP Forum for Computational
Excellence (HEP-FCE) initiated a roadmap planning activity with two key
overlapping drivers -- 1) software effectiveness, and 2) infrastructure and
expertise advancement. The HEP-FCE formed three working groups, 1) Applications
Software, 2) Software Libraries and Tools, and 3) Systems (including systems
software), to provide an overview of the current status of HEP computing and to
present findings and opportunities for the desired HEP computational roadmap.
The final versions of the reports are combined in this document, and are
presented along with introductory material.
Intelligent business processes composition based on mas, semantic and cloud integration (IPCASCI)
Component reuse is one of the techniques that most clearly contributes to the
evolution of the software industry by providing efficient mechanisms to create quality
software. Reuse increases both software reliability, due to the fact that it uses
previously tested software components, and development productivity, and leads to a
clear reduction in cost.
Web services have become a standard for application development in cloud
computing environments and are essential in business process development. These
services facilitate software construction that is relatively fast and efficient, two
aspects which can be improved by defining suitable models of reuse. This research
work is intended to define a model which contains the construction requirements of
new services from service composition. To this end, the composition is based on
tested Web services and the artificial intelligence tools at our disposal.
It is believed that a multi-agent architecture based on virtual organizations is a
suitable tool to facilitate the construction of cloud computing environments for
business processes from other existing environments, and with help from ontological
models as well as tools providing the standard BPEL (Business Process Execution
Language). In the context of this proposal, we must generate a new business process
from the available services in the platform, starting with the requirement
specifications that the process should meet. These specifications will be composed of a
semi-free description of requirements to describe the new service.
The virtual organizations based on a multi-agent system will manage the tasks
requiring intelligent behaviour. This system will analyse the input (textual description
of the proposal) in order to deconstruct it into computable functionalities, which will
be processed in subsequent stages. Web services (or business processes) stored for reuse
have been created from the perspective of SOA architectures and associated with an
ontological component, which allows the multi-agent system (based on virtual
organizations) to identify the services to complete the reuse process.
The proposed model develops a service composition by applying a standard BPEL
once the services that will compose the solution business process have been
identified. This standard allows us to compose Web services easily and
provides the advantage of a direct mapping from Business Process Model and
Notation (BPMN) diagrams.
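For illustration only, the orchestration that a BPEL sequence of invoke activities expresses can be sketched imperatively; the endpoints, payloads, and field names below are hypothetical and are not part of the IPCASCI platform or any real service.

```python
# Hypothetical sketch of composing two previously tested REST services in
# sequence, the pattern a BPEL <sequence> of <invoke> activities describes
# declaratively. Endpoints and payloads are invented for illustration.
import requests

def compose_order_process(customer_id):
    # Invoke an existing "credit check" service (hypothetical endpoint).
    credit = requests.get(
        f"https://services.example.org/credit-check/{customer_id}",
        timeout=10,
    ).json()

    # Feed its output into an existing "order approval" service, mirroring
    # a BPEL <assign> that copies variables between invocations.
    approval = requests.post(
        "https://services.example.org/order-approval",
        json={"customer": customer_id, "score": credit.get("score")},
        timeout=10,
    ).json()

    # The combined result is the response of the new, composite business process.
    return {"customer": customer_id, "approved": approval.get("approved")}

if __name__ == "__main__":
    print(compose_order_process("C-42"))
```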
What is the contribution of personal information management systems (PIMS) to the Working Model and personal work system of knowledge workers?
The thesis reports research into a phenomenon which it calls the personal working model of an individual knowledge worker.
The principal conjecture addressed in this thesis is that each of us has a personal working model which is supported by a personal work system enabled by a personal information management system. For some people, these are well defined; for most they are not even explicit. By means of structured self-reflection aided by conceptual knowledge modelling within the context of a process of action learning they can be improved. That personal working model is predicted by Ashby's law of requisite variety and by the good regulator theorem of Conant and Ashby. The latter theorem states that the only good regulator of a system is a model of that system.
The thesis and the work it reports result from a systemic approach to identifying the personal information management system and personal work system which together contribute to the personal working model. Starting with abductive conjecture, the author has sought to understand what models are and to explore ways in which those models can themselves be expressed. The thesis shows how a new approach to the conceptual modelling of aspects of the personal knowledge of a knowledge worker was designed, built and then used. Similarly, the actual data used by a knowledge worker had to be stored, and for this purpose a personal information management system was also designed. Both these artefacts are evaluated in accordance with principles drawn from the literature of design science research. The research methodology adopted in the first phase of the research, now ending, also included a relatively novel approach in which the PhD student attempted to observe himself over the last five years of his PhD research; this approach is sometimes called autoethnography. This autoethnographic element is one of a number of methods used within an overall framework grounded in the philosophical approach called critical realism.
The work reported in the thesis is initial exploratory research which, it is planned, will continue as empirical action research involving mentored action learning undertaken by professional knowledge workers.
Storage and aggregation for fast analytics systems
Computing in the last decade has been characterized by the rise of data-intensive scalable computing (DISC) systems. In particular, recent years have witnessed a rapid growth in the popularity of fast analytics systems. These systems exemplify a trend where queries that previously involved batch processing (e.g., running a MapReduce job) on a massive amount of data are increasingly expected to be answered in near real-time with low latency. This dissertation addresses the problem that existing designs for various components used in the software stack for DISC systems do not meet the requirements demanded by fast analytics applications. In this work, we focus specifically on two components:
1. Key-value storage: Recent work has focused primarily on supporting reads with high throughput and low latency. However, fast analytics applications require that new data entering the system (e.g., newly crawled web pages, currently trending topics) be quickly made available to queries and analysis codes. This means that along with supporting reads efficiently, these systems must also support writes with high throughput, which current systems fail to do. In the first part of this work, we solve this problem by proposing a new key-value storage system, called the WriteBuffer (WB) Tree, that provides up to 30× higher write performance and similar read performance compared to current high-performance systems (see the write-buffering sketch after this list).
2. GroupBy-Aggregate: Fast analytics systems require support for fast, incremental aggregation of data with low-latency access to results. Existing techniques are memory-inefficient and do not support incremental aggregation efficiently when aggregate data overflows to disk. In the second part of this dissertation, we propose a new data structure called the Compressed Buffer Tree (CBT) to implement memory-efficient in-memory aggregation. We also show how the WB Tree can be modified to support efficient disk-based aggregation (see the incremental-aggregation sketch after this list).
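To make the write-buffering idea in item 1 concrete, here is a minimal, illustrative sketch, not the WB Tree's actual algorithm: writes are absorbed by an in-memory buffer and periodically flushed as sorted runs, so puts stay cheap while reads consult the buffer first and then the runs. All names and thresholds are invented for illustration.

```python
import bisect

class BufferedKVStore:
    """Write-optimised store: cheap in-memory puts, periodic flushes of sorted runs."""

    def __init__(self, buffer_limit=4):
        self.buffer = {}       # in-memory write buffer (key -> value)
        self.runs = []         # immutable sorted runs: (keys, values) pairs
        self.buffer_limit = buffer_limit

    def put(self, key, value):
        # Writes only touch the in-memory buffer until it fills up.
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_limit:
            self._flush()

    def get(self, key):
        # Freshest data first: the buffer, then runs from newest to oldest.
        if key in self.buffer:
            return self.buffer[key]
        for keys, values in reversed(self.runs):
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def _flush(self):
        # On disk this would be a single sequential write of a sorted run.
        items = sorted(self.buffer.items())
        self.runs.append(([k for k, _ in items], [v for _, v in items]))
        self.buffer.clear()

store = BufferedKVStore()
for i in range(10):
    store.put(f"page:{i}", f"content-{i}")
print(store.get("page:7"))  # found in a flushed run or the buffer
```

Similarly, a minimal sketch of incremental GroupBy-Aggregate for item 2 (not the CBT itself, and with invented example data): one small partial aggregate is kept per group and updated as records arrive, so results can be read with low latency at any point.

```python
from collections import defaultdict

class IncrementalAggregator:
    """Keeps one partial aggregate per group, updated record by record."""

    def __init__(self):
        self.partials = defaultdict(lambda: [0, 0.0])  # group -> [count, sum]

    def update(self, group, value):
        partial = self.partials[group]
        partial[0] += 1
        partial[1] += value

    def result(self, group):
        count, total = self.partials[group]
        return {"count": count, "sum": total,
                "avg": total / count if count else 0.0}

agg = IncrementalAggregator()
for topic, mentions in [("#ai", 3), ("#hep", 1), ("#ai", 5)]:
    agg.update(topic, mentions)
print(agg.result("#ai"))  # {'count': 2, 'sum': 8.0, 'avg': 4.0}
```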
Optimisation of the enactment of fine-grained distributed data-intensive work-flows
The emergence of data-intensive science as the fourth science paradigm has posed a
data deluge challenge for enacting scientific work-flows. The scientific community is
facing an imminent flood of data from the next generation of experiments and simulations,
besides dealing with the heterogeneity and complexity of data, applications and
execution environments. New scientific work-flows involve execution on distributed and
heterogeneous computing resources across organisational and geographical boundaries,
processing gigabytes of live data streams and petabytes of archived and simulation data,
in various formats and from multiple sources. Managing the enactment of such work-flows requires not only larger storage space and faster machines, but also the capability to
support scalability and diversity of the users, applications, data, computing resources
and the enactment technologies.
We argue that the enactment process can be made efficient using optimisation techniques
in an appropriate architecture. This architecture should support the creation
of diversified applications and their enactment on diversified execution environments,
with a standard interface, i.e. a work-flow language. The work-flow language should
be both human readable and suitable for communication between the enactment environments.
The data-streaming model central to this architecture provides a scalable
approach to large-scale data exploitation. Data-flow between computational elements
in the scientific work-flow is implemented as streams. To cope with the exploratory
nature of scientific work-flows, the architecture should support fast work-flow prototyping,
and the re-use of work-flows and work-flow components. Above all, the enactment
process should be easily repeated and automated.
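As a minimal illustration of the data-streaming model described above (generator-based Python with invented element names, not DISPEL), processing elements can be connected so that data flows between them as streams rather than as materialised intermediate datasets:

```python
def read_events(n):
    # Source element: emits a stream of raw readings.
    for i in range(n):
        yield {"id": i, "value": float(i)}

def calibrate(stream, offset):
    # Intermediate element: transforms each item as it arrives.
    for event in stream:
        event["value"] += offset
        yield event

def running_mean(stream):
    # Sink element: consumes the stream incrementally.
    total, count = 0.0, 0
    for event in stream:
        total += event["value"]
        count += 1
    return total / count if count else 0.0

# Elements are composed by connecting outputs to inputs, so data streams
# through the pipeline one item at a time.
print(running_mean(calibrate(read_events(1000), offset=0.5)))
```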
In this thesis, we present a candidate data-intensive architecture that includes an intermediate
work-flow language, named DISPEL. We create a new fine-grained measurement
framework to capture performance-related data during enactments, and design
a performance database to organise them systematically. We propose a new enactment
strategy to demonstrate that optimisation of data-streaming work-flows can be
automated by exploiting performance data gathered during previous enactments.
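A minimal sketch of this kind of fine-grained measurement, with an invented schema and element names rather than the thesis's actual framework or the DISPEL tooling: each processing element is wrapped so that its per-enactment timings land in a small performance database that later optimisation can query.

```python
import sqlite3
import time

# Hypothetical performance database: one row per element per enactment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (element TEXT, items INTEGER, seconds REAL)")

def measured(name, element, stream):
    # Wrap a processing element and record how long it spends on the stream.
    start, count = time.perf_counter(), 0
    for item in element(stream):
        count += 1
        yield item
    conn.execute("INSERT INTO measurements VALUES (?, ?, ?)",
                 (name, count, time.perf_counter() - start))

def double(stream):
    # Example processing element.
    for x in stream:
        yield 2 * x

list(measured("double", double, iter(range(10000))))
conn.commit()
print(conn.execute("SELECT * FROM measurements").fetchall())
```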
Compilation and Code Optimization for Data Analytics
The trade-offs between the use of modern high-level and low-level programming languages in constructing complex software artifacts are well known. High-level languages allow for greater programmer productivity: abstraction and genericity allow for the same functionality to be implemented with significantly less code compared to low-level languages. Modularity, object-orientation, functional programming, and powerful type systems allow programmers not only to create clean abstractions and protect them from leaking, but also to define code units that are reusable and easily composable, and software architectures that are adaptable and extensible. The abstraction, succinctness, and modularity of high-level code help to avoid software bugs and facilitate debugging and maintenance.
The use of high-level languages comes at a performance cost: increased indirection due to abstraction, virtualization, and interpretation, and superfluous work, particularly in the form of temporary memory allocation and deallocation to support objects and encapsulation.
As a result of this, the cost of high-level languages for performance-critical systems may seem prohibitive.
The vision of abstraction without regret argues that it is possible to use high-level languages for building performance-critical systems that allow for both productivity and high performance, instead of trading off the former for the latter. In this thesis, we realize this vision for building different types of data analytics systems. Our means of achieving this is by employing compilation. The goal is to compile away expensive language features -- to compile high-level code down to efficient low-level code.
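A toy sketch of the "compile away abstraction" idea, using invented names and Python code generation rather than the compilers developed in the thesis: a small interpreted filter expression is specialised into straight-line code, so the per-row interpretation overhead disappears while the high-level interface stays the same.

```python
def interpret(expr, row):
    # High-level and generic, but pays dispatch cost on every row.
    op, field, const = expr
    if op == ">":
        return row[field] > const
    if op == "==":
        return row[field] == const
    raise ValueError(op)

def compile_filter(expr):
    # Generate specialised code for this particular expression once,
    # then reuse it for every row with no interpreter loop.
    op, field, const = expr
    src = f"def _pred(row):\n    return row[{field!r}] {op} {const!r}\n"
    namespace = {}
    exec(src, namespace)
    return namespace["_pred"]

rows = [{"price": p} for p in range(5)]
expr = (">", "price", 2)
pred = compile_filter(expr)
print([r for r in rows if interpret(expr, r)])  # interpreted
print([r for r in rows if pred(r)])             # compiled, same result
```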