
    Distributed data mining in grid computing environments

    The computing-intensive mining of inherently Internet-wide distributed data, referred to as Distributed Data Mining (DDM), calls for the support of a powerful Grid with an effective scheduling framework. DDM typically follows the computing paradigm of local processing followed by global synthesizing, and it involves every phase of the Data Mining (DM) process; the resulting DDM workflow is therefore highly complex and can only be modelled as a Directed Acyclic Graph (DAG) with multiple data entries. Motivated by the need for a practical solution to the Grid scheduling problem for DDM workflows, this paper proposes a novel two-phase scheduling framework, consisting of External Scheduling and Internal Scheduling, on a two-level Grid architecture (InterGrid, IntraGrid). A DM IntraGrid named DMGCE (Data Mining Grid Computing Environment) has been developed with a dynamic scheduling framework for competitive DAGs in a heterogeneous computing environment. The system is implemented in an established Multi-Agent System (MAS) environment, in which existing DM algorithms are reused by encapsulating them into agents. Practical classification problems from oil well logging analysis are used to measure the system performance. The detailed experiment procedure and result analysis are also discussed in the paper.
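    As a rough illustration of the workflow model described in this abstract, the sketch below (in Python, with entirely hypothetical names; it is not the DMGCE or MAS API) models a DDM workflow as a DAG whose entry tasks carry local datasets, and separates scheduling into an external phase that maps whole workflows onto IntraGrids and an internal phase that dispatches tasks inside one IntraGrid.

    # Hypothetical sketch: a DDM workflow as a DAG with multiple data entries,
    # scheduled in two phases. Names are illustrative, not the DMGCE API.
    from collections import defaultdict, deque

    class DAGWorkflow:
        """A DAG whose entry tasks each carry a local dataset (multiple data entries)."""
        def __init__(self):
            self.edges = defaultdict(list)    # task -> downstream tasks
            self.indegree = defaultdict(int)  # task -> number of unmet dependencies

        def add_dependency(self, upstream, downstream):
            self.edges[upstream].append(downstream)
            self.indegree[downstream] += 1
            self.indegree.setdefault(upstream, 0)

        def topological_order(self):
            """Order tasks so every local-processing step precedes its global synthesis."""
            remaining = dict(self.indegree)
            ready = deque(task for task, deps in remaining.items() if deps == 0)
            order = []
            while ready:
                task = ready.popleft()
                order.append(task)
                for nxt in self.edges[task]:
                    remaining[nxt] -= 1
                    if remaining[nxt] == 0:
                        ready.append(nxt)
            return order

    def external_schedule(workflows, intragrids):
        """Phase 1 (InterGrid level): assign each whole workflow to an IntraGrid."""
        return {wf: intragrids[i % len(intragrids)] for i, wf in enumerate(workflows)}

    def internal_schedule(workflow, nodes):
        """Phase 2 (IntraGrid level): dispatch tasks to heterogeneous nodes in DAG order."""
        return [(task, nodes[i % len(nodes)]) for i, task in enumerate(workflow.topological_order())]

    # Example: two local-mining entry tasks feeding one global synthesis step.
    wf = DAGWorkflow()
    wf.add_dependency("mine_site_A", "synthesize")
    wf.add_dependency("mine_site_B", "synthesize")
    print(external_schedule(["oil_well_classification"], ["DMGCE"]))
    print(internal_schedule(wf, ["agent_node_1", "agent_node_2"]))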

    Grid Data Management in Action: Experience in Running and Supporting Data Management Services in the EU DataGrid Project

    In the first phase of the EU DataGrid (EDG) project, a Data Management System has been implemented and provided for deployment. The components of the current EDG Testbed are: a prototype of a Replica Manager Service built around the basic services provided by Globus, a centralised Replica Catalogue to store information about the physical locations of files, and the Grid Data Mirroring Package (GDMP), which is widely used in various HEP collaborations in Europe and the US for data mirroring. During this year these services have been refined and made more robust so that they are fit to be used in a pre-production environment. Application users have been using this first release of the Data Management Services for more than a year. In the paper we present the components and their interaction, our implementation and experience, as well as the feedback received from our user communities. We have resolved not only issues regarding integration with other EDG service components but also many of the interoperability issues with components of our partner projects in Europe and the US. The paper concludes with the basic lessons learned during this operation. These conclusions provide the motivation for the architecture of the next generation of Data Management Services that will be deployed in EDG during 2003. Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics conference (CHEP03), La Jolla, CA, USA, March 2003; 9 pages, LaTeX, PSN: TUAT007; all figures are in the directory "figures".
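    To make the role of a Replica Catalogue concrete, here is a minimal sketch of a catalogue mapping logical file names (LFNs) to physical file names (PFNs); the class, methods, and example storage URLs are purely illustrative assumptions, not the actual EDG Replica Manager or Globus replica catalogue API.

    # Toy replica catalogue: one logical file name may have many physical replicas.
    class ReplicaCatalogue:
        def __init__(self):
            self._replicas = {}  # LFN -> set of PFNs

        def register(self, lfn, pfn):
            """Record that a physical copy of the logical file exists at pfn."""
            self._replicas.setdefault(lfn, set()).add(pfn)

        def locate(self, lfn):
            """Return all known physical locations of a logical file."""
            return sorted(self._replicas.get(lfn, set()))

        def unregister(self, lfn, pfn):
            """Remove one physical copy; drop the entry when no replicas remain."""
            pfns = self._replicas.get(lfn, set())
            pfns.discard(pfn)
            if not pfns:
                self._replicas.pop(lfn, None)

    # Example usage with hypothetical storage elements.
    catalogue = ReplicaCatalogue()
    catalogue.register("lfn:higgs-candidates.root", "gsiftp://se.cern.ch/data/higgs-candidates.root")
    catalogue.register("lfn:higgs-candidates.root", "gsiftp://se.fnal.gov/data/higgs-candidates.root")
    print(catalogue.locate("lfn:higgs-candidates.root"))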

    A batch scheduler with high level components

    In this article we present the design choices and the evaluation of a batch scheduler for large clusters, named OAR. This batch scheduler is based upon an original design that emphasizes low software complexity through the use of high-level tools. The global architecture is built upon the scripting language Perl and the relational database engine MySQL. The goal of the OAR project is to prove that it is possible today to build a complex system for resource management using such tools without sacrificing efficiency and scalability. Currently, our system offers most of the important features implemented by other batch schedulers, such as priority scheduling (by queues), reservations, backfilling and some global computing support. Despite the use of high-level tools, our experiments show that our system achieves performance close to that of other systems. Furthermore, OAR is currently used to manage 700 nodes (a metropolitan GRID) and has shown good efficiency and robustness.
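    To illustrate how priority queues and backfilling interact in a batch scheduler, the toy sketch below implements a simplified, EASY-style backfilling rule: jobs start in priority order, the blocked head job gets a reservation (its "shadow time"), and only jobs that cannot delay that reservation jump the queue. The Job fields and the scheduling function are assumptions for illustration, not OAR's actual data model or algorithm.

    # Toy priority scheduling with conservative (EASY-style) backfilling.
    from dataclasses import dataclass

    @dataclass
    class Job:
        name: str
        nodes: int      # nodes requested
        walltime: int   # requested walltime (minutes)
        priority: int   # queue priority: higher starts first

    def schedule(waiting, running, total_nodes, now=0):
        """Start jobs in priority order; backfill only jobs that cannot delay the head job.

        `running` is a list of (job, end_time) pairs for jobs already on the cluster.
        """
        waiting = sorted(waiting, key=lambda j: -j.priority)
        free = total_nodes - sum(job.nodes for job, _ in running)
        started = []
        # Start jobs from the head of the priority queue while nodes are available.
        while waiting and waiting[0].nodes <= free:
            job = waiting.pop(0)
            started.append(job)
            running.append((job, now + job.walltime))
            free -= job.nodes
        if not waiting:
            return started
        # The head job is blocked: its shadow time is the earliest moment enough
        # running jobs have finished for it to start.
        head, avail, shadow = waiting[0], free, now
        for job, end in sorted(running, key=lambda pair: pair[1]):
            avail += job.nodes
            if avail >= head.nodes:
                shadow = end
                break
        # Conservative backfill: only jobs that fit now and finish before the
        # shadow time may jump the queue, so the head job is never delayed.
        for job in list(waiting[1:]):
            if job.nodes <= free and now + job.walltime <= shadow:
                waiting.remove(job)
                started.append(job)
                running.append((job, now + job.walltime))
                free -= job.nodes
        return started

    # Example: one long job holds most of a 64-node cluster; only the short job backfills.
    running = [(Job("long-run", 56, 240, 10), 240)]
    waiting = [Job("head", 32, 120, 9), Job("short", 4, 60, 1)]
    print([j.name for j in schedule(waiting, running, total_nodes=64)])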