
    Data-Intensive architecture for scientific knowledge discovery

    This paper presents a data-intensive architecture that demonstrates the ability to support applications from a wide range of domains, and to support the different types of users involved in defining, designing and executing data-intensive processing tasks. The prototype architecture is introduced, and the pivotal role of DISPEL as a canonical language is explained. The architecture promotes the exploration and exploitation of distributed and heterogeneous data, and spans the complete knowledge discovery process, from data preparation, to analysis, to evaluation and reiteration. The architecture evaluation included large-scale applications from astronomy, cosmology, hydrology, functional genetics, image processing and seismology.
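
    The prepare/analyse/evaluate/reiterate cycle named in the abstract can be pictured with a short sketch. The Python below is purely illustrative: the stage names, the toy quality test and the widening-window reiteration are assumptions, not DISPEL or anything from the paper.

```python
# Purely illustrative sketch of the knowledge discovery cycle named in the
# abstract: data preparation -> analysis -> evaluation -> reiteration.
# Stage names, the toy quality test and the widening window are assumptions.

def prepare(raw_records):
    """Data preparation: drop records with missing values (placeholder rule)."""
    return [r for r in raw_records if None not in r.values()]

def analyse(records):
    """Analysis: a trivial aggregate standing in for a real mining step."""
    values = [r["value"] for r in records] or [float("nan")]
    return {"mean": sum(values) / len(values), "n": len(records)}

def evaluate(model):
    """Evaluation: accept the result once enough records contributed."""
    return model["n"] >= 3

def discover(raw_records):
    """Reiterate the prepare/analyse/evaluate cycle over a widening window."""
    window = 2
    while True:
        model = analyse(prepare(raw_records[:window]))
        if evaluate(model) or window >= len(raw_records):
            return model
        window += 1  # reiterate with more data

if __name__ == "__main__":
    data = [{"value": v} for v in (1.0, 2.0, 3.0, 4.0)]
    print(discover(data))  # {'mean': 2.0, 'n': 3}
```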

    Towards optimising distributed data streaming graphs using parallel streams

    Modern scientific collaborations have opened up the opportunity of solving complex problems that involve multi-disciplinary expertise and large-scale computational experiments. These experiments usually involve large amounts of data that are located in distributed data repositories running various software systems, and managed by different organisations. A common strategy to make the experiments more manageable is executing the processing steps as a workflow. In this paper, we look into the implementation of fine-grained data-flow between computational elements in a scientific workflow as streams. We model the distributed computation as a directed acyclic graph where the nodes represent the processing elements that incrementally implement specific subtasks. The processing elements are connected in a pipelined streaming manner, which allows task executions to overlap. We further optimise the execution by splitting pipelines across processes and by introducing extra parallel streams. We identify performance metrics and design a measurement tool to evaluate each enactment. We conducted experiments to evaluate our optimisation strategies with a real-world problem in the Life Sciences, EURExpress-II. The paper presents our distributed data-handling model, the optimisation and instrumentation strategies and the evaluation experiments. We demonstrate linear speed-up and argue that this use of data-streaming to enable both overlapped pipeline and parallelised enactment is a generally applicable optimisation strategy.
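
    A minimal sketch of the streaming model described above: processing elements connected by queues so that stage executions overlap, with one element replicated into parallel streams. The element logic, the sentinel protocol and the thread layout are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of pipelined streaming between processing elements, with one
# element replicated into parallel streams. All logic here is an illustrative
# assumption, not the paper's implementation.
import threading
import queue

SENTINEL = None  # marks end-of-stream

def produce(out_q, items):
    """Source processing element: emit data items incrementally."""
    for item in items:
        out_q.put(item)
    out_q.put(SENTINEL)

def transform(in_q, out_q):
    """Middle element; several instances form parallel streams of one subtask."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            in_q.put(SENTINEL)   # let sibling streams see end-of-stream too
            out_q.put(SENTINEL)  # announce this stream's completion downstream
            return
        out_q.put(item * item)   # placeholder per-item computation

def consume(in_q, n_streams, results):
    """Sink element: drain until every parallel stream has finished."""
    finished = 0
    while finished < n_streams:
        item = in_q.get()
        if item is SENTINEL:
            finished += 1
        else:
            results.append(item)

if __name__ == "__main__":
    n_streams = 3
    q1, q2, results = queue.Queue(), queue.Queue(), []
    workers = [threading.Thread(target=transform, args=(q1, q2))
               for _ in range(n_streams)]
    sink = threading.Thread(target=consume, args=(q2, n_streams, results))
    for t in workers + [sink]:
        t.start()
    produce(q1, range(10))  # producer overlaps with downstream stages
    for t in workers + [sink]:
        t.join()
    print(sorted(results))  # [0, 1, 4, ..., 81]
```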

    Effective Computation Resilience in High Performance and Distributed Environments

    The work described in this paper aims at effective computation resilience for complex simulations in high-performance and distributed environments. Computation resilience is a complicated and delicate area: it deals with many types of simulation cores, many types of data at various input levels, and many types of end-users, who have different requirements and expectations. Predictions about system and computation behavior must be based on deep knowledge of the underlying infrastructures and of the simulations' mathematical and implementation backgrounds. Our conceptual framework is intended to allow independent collaboration between domain experts, as end-users, and providers of computational power, by taking on all of the deployment troubles that arise within a given computing environment. The goal of our work is to provide a generalized approach to effective, scalable use of computing power and to help domain experts concentrate more intensively on their domain solutions, without having to invest effort in learning and adapting to new IT backbone technologies.
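
    The abstract does not spell out mechanisms, but one common resilience tactic consistent with its goals is checkpointed retry, so that a failed simulation resumes rather than restarts. The sketch below assumes a toy failure model; every name in it is hypothetical.

```python
# Sketch of one common resilience tactic consistent with the goals above:
# checkpoint partial progress so a failed simulation resumes rather than
# restarts. The failure model and all names are assumptions, not the paper's.
import json
import os
import random

CHECKPOINT = "sim_checkpoint.json"  # hypothetical checkpoint file

def load_checkpoint():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"step": 0, "state": 0.0}

def save_checkpoint(cp):
    with open(CHECKPOINT, "w") as f:
        json.dump(cp, f)

def simulate(total_steps=100):
    """Toy simulation with injected transient faults; resumes from checkpoint."""
    cp = load_checkpoint()
    while cp["step"] < total_steps:
        if random.random() < 0.01:      # injected transient fault
            raise RuntimeError("node failure")
        cp["state"] += 0.5              # placeholder computation step
        cp["step"] += 1
        if cp["step"] % 10 == 0:
            save_checkpoint(cp)         # persist progress periodically
    return cp["state"]

if __name__ == "__main__":
    for attempt in range(20):           # supervisor retries failed runs
        try:
            print("result:", simulate())
            break
        except RuntimeError:
            continue                    # next attempt resumes from checkpoint
    if os.path.exists(CHECKPOINT):
        os.remove(CHECKPOINT)
```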

    Using Advanced Data Mining and Integration in Environmental Prediction Scenarios

    We present one of the meteorological and hydrological experiments performed in the FP7 project ADMIRE. It serves as an experimental platform for hydrologists, and we have also used it as a testing platform for a suite of advanced data integration and data mining (DMI) tools developed within ADMIRE. The idea of ADMIRE is to develop an advanced DMI platform accessible even to users who are not familiar with data mining techniques. To this end, we have designed a novel DMI architecture, supported by a set of software tools, managed by DMI process descriptions written in a specialized high-level DMI language called DISPEL, and controlled via several different user interfaces, each performing a different set of tasks and targeting a different user group.
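
    To give a feel for what a high-level DMI process description expresses, the sketch below wires named processing elements into a pipeline and enacts it on a small record stream. It is written in Python, not DISPEL, and every element name and connection is a hypothetical illustration.

```python
# Loose Python illustration of a high-level DMI process description: named
# processing elements wired into a pipeline and then enacted. Actual DISPEL
# syntax differs; every element and connection here is hypothetical.

class ProcessingElement:
    """A named step with a single input and a single output."""
    def __init__(self, name, fn):
        self.name, self.fn, self.downstream = name, fn, None

    def __rshift__(self, other):
        """Connect this element's output to another element's input."""
        self.downstream = other
        return other

    def push(self, item):
        out = self.fn(item)
        if self.downstream is not None:
            self.downstream.push(out)
        else:
            print(f"{self.name}: {out}")

# Describe the DMI process: integrate -> clean -> mine (all placeholders).
integrate = ProcessingElement("integrate", lambda rec: {**rec, "source": "merged"})
clean = ProcessingElement("clean",
                          lambda rec: {k: v for k, v in rec.items() if v is not None})
mine = ProcessingElement("mine", lambda rec: ("pattern", rec["value"] > 0))

integrate >> clean >> mine

# Enact the description on a small stream of records.
for record in ({"value": 3, "noise": None}, {"value": -1, "noise": None}):
    integrate.push(record)
```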