Simultaneous Scheduling of Replication and Computation for Data-Intensive Applications on the Grid
One of the first motivations for using grids comes from applications managing large data sets, for example in High Energy Physics or Life Sciences. To improve the global throughput of software environments, replicas are usually placed at carefully selected sites. Moreover, computation requests have to be scheduled among the available resources. To get the best performance, scheduling and data replication have to be tightly coupled, which is not always the case in existing approaches. This paper presents an algorithm that combines data management and scheduling using a steady-state approach. Our theoretical results are validated using simulation and logs from a large life science application (ACI GRID GriPPS).
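For intuition, the steady-state idea can be expressed as a small linear program that maximizes sustained request throughput under per-site compute and bandwidth limits. The following Python sketch is illustrative only: the per-request costs, site capacities, and the scipy-based formulation are assumptions, not the paper's algorithm.

# Minimal steady-state sketch (assumed model, not the paper's): maximize total
# requests/s across sites subject to per-site compute and ingress-bandwidth caps.
from scipy.optimize import linprog

flops_per_request = 2e9        # compute cost of one request (assumed)
bytes_per_request = 5e7        # data shipped per request (assumed)
site_flops = [1e10, 4e9, 8e9]  # per-site compute capacity (flops/s, assumed)
site_bw    = [1e8, 5e7, 2e8]   # per-site ingress bandwidth (bytes/s, assumed)

n = len(site_flops)
c = [-1.0] * n  # linprog minimizes, so negate to maximize total throughput

# Two constraints per site: x_i * flops_per_request <= site_flops[i]
# and x_i * bytes_per_request <= site_bw[i]
A, b = [], []
for i in range(n):
    row_cpu = [0.0] * n; row_cpu[i] = flops_per_request
    row_net = [0.0] * n; row_net[i] = bytes_per_request
    A += [row_cpu, row_net]
    b += [site_flops[i], site_bw[i]]

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None)] * n)
print("requests/s per site:", res.x, "total:", -res.fun)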
Hadoop performance modeling and job optimization for big data analytics
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Big data has gained momentum in both academia and industry. The MapReduce model has emerged as a major computing model in support of big data analytics. Hadoop, an open source implementation of the MapReduce model, has been widely taken up by the community, and cloud service providers such as Amazon EC2 now support Hadoop user applications. However, a key challenge is that the cloud service providers do not have a resource provisioning mechanism to satisfy user jobs with deadline requirements. Currently, it is solely the user's responsibility to estimate the required amount of resources for a job running in a public cloud. This thesis presents a Hadoop performance model that accurately estimates the execution duration of a job and further provisions the required amount of resources for a job to be completed within a deadline. The proposed model employs a Locally Weighted Linear Regression (LWLR) model to estimate the execution time of a job and the Lagrange Multiplier technique for resource provisioning to satisfy user jobs with a given deadline. The performance of the proposed model is extensively evaluated on both an in-house Hadoop cluster and the Amazon EC2 cloud. Experimental results show that the proposed model is highly accurate in job execution estimation and that jobs are completed within the required deadlines under the resource provisioning scheme of the proposed model. In addition, the Hadoop framework has over 190 configuration parameters, some of which have significant effects on the performance of a Hadoop job. Manually setting the optimum values for these parameters is a challenging and time-consuming task. This thesis presents optimization work that enhances the performance of Hadoop by automatically tuning its parameter values. It employs the Gene Expression Programming (GEP) technique to build an objective function that represents the performance of a job and the correlation among the configuration parameters. For the purpose of optimization, Particle Swarm Optimization (PSO) is employed to automatically find optimal or near-optimal configuration settings. The performance of the proposed work is extensively evaluated on a Hadoop cluster, and the experimental results show that it enhances the performance of Hadoop significantly compared with the default settings.
Abdul Wali Khan University Mardan
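As an illustration of the estimation step, the following is a minimal Python sketch of Locally Weighted Linear Regression; the training data, the Gaussian weighting, and the bandwidth tau are illustrative assumptions, not the thesis's actual model or measurements.

# LWLR sketch: predict y at a query point by fitting a weighted least-squares
# line, weighting training points by their distance to the query.
import numpy as np

def lwlr_predict(x_query, X, y, tau=0.5):
    Xb = np.column_stack([np.ones_like(X), X])        # add intercept column
    w = np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))  # Gaussian weights
    W = np.diag(w)
    # Solve the weighted normal equations (Xb^T W Xb) theta = Xb^T W y
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
    return np.array([1.0, x_query]) @ theta

# input size (GB) vs measured job duration (s) -- illustrative numbers only
X = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = np.array([40.0, 70.0, 150.0, 290.0, 600.0])
print(lwlr_predict(6.0, X, y))   # estimated duration for a 6 GB input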
Project Report on DOE Young Investigator Grant (Contract No. DE-FG02-02ER25525) Dynamic Scheduling and Fusion of Irregular Computation (August 15, 2002 to August 14, 2005)
Computer simulation has become increasingly important in many scientific disciplines, but its performance and scalability are severely limited by the memory throughput on today's computer systems. With the support of this grant, we first designed training-based prediction, which accurately predicts the memory performance of large applications before their execution. Then we developed optimization techniques using dynamic computation fusion and large-scale data transformation. The research work has three major components. The first is modeling and prediction of cache behavior. We have developed a new technique which uses reuse distance information from training inputs and then extracts a parameterized model of the program's cache miss rates for any input size and for any size of fully associative cache. Using the model we have built a web-based tool using three-dimensional visualization. The new model can help to build cost-effective computer systems, design better benchmark suites, and improve task scheduling on heterogeneous systems. The second component is global computation fusion for improving cache performance. We have developed an algorithm for dynamic data partitioning using sampling theory and probability distribution. Recent work from a number of groups shows that manual or semi-manual computation fusion has significant benefits in physical, mechanical, and biological simulations as well as information retrieval and machine verification. We have developed an automatic tool that measures the potential of computation fusion. The new system can be used by high-performance application programmers to estimate the potential of locality improvement for a program before trying complex transformations for a specific cache system. The last component studies models of spatial locality and the problem of data layout. In scientific programs, most data are stored in arrays. Grand challenge problems such as hydrodynamics simulation and data mining may use an enormous number of data elements. To optimize the layout across multiple arrays, we have developed a formal model called reference affinity. We collaborated with the IBM production compiler group and designed an efficient compiler analysis that performs as well as data or code profiling does. Based on these results, the IBM group has filed a patent and is including this technique in their product compiler. A major part of the project is the development of software tools. We have developed web-based visualization for program locality. In addition, we have implemented a prototype of array regrouping in the IBM compiler. The full implementation is expected to come out of IBM in the near future and to benefit scientific applications running on IBM supercomputers. We have also developed a test environment for studying the limit of computation fusion. Finally, our work has directly influenced the design of the Intel Itanium compiler. The project has strengthened the research relationship between the PI's group and groups in DoE labs. The PI was an invited speaker at the Center for Applied Scientific Computing Seminar Series at the early stage of the project. The question that most of the audience was curious about was the limit of computation fusion, which has been studied in depth in this research. In addition, the seminar directly helped a group at Lawrence Livermore to achieve a four times speedup on an important DoE code.
The PI helped to organize a number of high-performance computing forums, including the founding of a workshop on memory system performance (MSP). In the past two years, one fourth of the papers in the workshop came from researchers in the Lawrence Livermore, Argonne, Los Alamos, and Lawrence Berkeley national laboratories. The PI lectured frequently on DoE funded research. In a broader context, high performance computing is central to America's scientific and economic stature in the world, and addresses many of the most scientifically and socially important problems of our day. This research has improved the programming support for a variety of computational paradigms, including dynamic mesh, hydrodynamics, molecular dynamics, multi-grid methods, matrix algebra, and sequential and parallel sorting. In the process, the PI's group has developed and strengthened relationships with DoE laboratories and major hardware and software vendors.
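To make the first component concrete, the following is a minimal Python sketch of measuring reuse distance (the number of distinct addresses touched between two accesses to the same address), the signal underlying the cache model described above; the naive quadratic implementation and the sample trace are illustrative, not the project's tool.

# Reuse-distance sketch: for each access, count the distinct addresses touched
# strictly between it and the previous access to the same address.
def reuse_distances(trace):
    last_pos = {}          # address -> index of its previous access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_pos:
            window = set(trace[last_pos[addr] + 1 : i])  # distinct addresses between
            distances.append(len(window))
        else:
            distances.append(None)  # first access: infinite reuse distance
        last_pos[addr] = i
    return distances

print(reuse_distances(['a', 'b', 'c', 'a', 'b', 'b']))
# -> [None, None, None, 2, 2, 0]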
Resilient networking in wireless sensor networks
This report deals with security in wireless sensor networks (WSNs), especially at the network layer. Multiple secure routing protocols have been proposed in the literature; however, they typically rely on cryptography to secure routing functionality, and cryptography alone is not enough to defend against attacks based on node compromise. More algorithmic solutions are therefore needed. In this report, we focus on the behavior of routing protocols to determine which properties make them more resilient to attacks. Our aim is to find answers to the following questions. Are there existing protocols, not initially designed for security, that already possess inherently resilient properties against attacks in which some portion of the network nodes is compromised? If so, which specific behaviors make these protocols more resilient? We give an overview of security strategies for WSNs in general, including existing attacks and defensive measures, then focus on the network layer and analyze the behavior of four routing protocols to determine their inherent resilience to insider attacks. The protocols considered are Dynamic Source Routing (DSR), Gradient-Based Routing (GBR), Greedy Forwarding (GF) and Random Walk Routing (RWR).
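As an illustration, here is a minimal Python sketch of Greedy Forwarding, the simplest of the four protocols: each node hands the packet to the neighbour geographically closest to the destination, and delivery fails at a routing void. The topology, coordinates, and function names are illustrative assumptions, not the report's simulation code.

# Greedy Forwarding sketch on a tiny hand-made topology.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def greedy_forward(pos, neighbours, src, dst):
    """Return the hop-by-hop path, or None at a local minimum (routing void)."""
    path, node = [src], src
    while node != dst:
        nxt = min(neighbours[node], key=lambda n: dist(pos[n], pos[dst]))
        if dist(pos[nxt], pos[dst]) >= dist(pos[node], pos[dst]):
            return None   # no neighbour makes progress toward the destination
        path.append(nxt)
        node = nxt
    return path

pos = {'s': (0, 0), 'a': (1, 1), 'b': (2, 0), 'd': (3, 1)}
neighbours = {'s': ['a', 'b'], 'a': ['s', 'b', 'd'],
              'b': ['s', 'a', 'd'], 'd': ['a', 'b']}
print(greedy_forward(pos, neighbours, 's', 'd'))   # -> ['s', 'b', 'd']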
Metascheduling of HPC Jobs in Day-Ahead Electricity Markets
High performance grid computing is a key enabler of large scale collaborative
computational science. With the promise of exascale computing, high performance
grid systems are expected to incur electricity bills that grow super-linearly
over time. In order to achieve cost effectiveness in these systems, it is
essential for the scheduling algorithms to exploit electricity price
variations, both in space and time, that are prevalent in the dynamic
electricity price markets. In this paper, we present a metascheduling algorithm
to optimize the placement of jobs in a compute grid which consumes electricity
from the day-ahead wholesale market. We formulate the scheduling problem as a
Minimum Cost Maximum Flow problem and leverage queue waiting time and
electricity price predictions to accurately estimate the cost of job execution
at a system. Using trace-based simulation with real and synthetic workload traces, and real electricity price data sets, we demonstrate our approach on two currently operational grids, XSEDE and NorduGrid. Our experimental setup collectively constitutes more than 433K processors spread across 58 compute systems in 17 geographically distributed locations. Experiments show that our approach simultaneously optimizes the total electricity cost and the average response time of the grid, without being unfair to users of the local batch systems. (Appears in IEEE Transactions on Parallel and Distributed Systems.)
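To illustrate the formulation, the following Python sketch casts job placement as Minimum Cost Maximum Flow using networkx; the job set, slot counts, and per-edge costs (electricity price times predicted runtime) are invented for illustration and are not the paper's data or full model.

# Min Cost Max Flow sketch: jobs flow from a source to compute systems; each
# job->system edge is priced by an estimated electricity cost.
import networkx as nx

G = nx.DiGraph()
jobs = ['j1', 'j2', 'j3']
systems = {'xsede_a': 2, 'nordu_b': 1}   # free slots per system (assumed)

# Estimated cost of running job j on system s (illustrative numbers).
cost = {('j1', 'xsede_a'): 5, ('j1', 'nordu_b'): 3,
        ('j2', 'xsede_a'): 4, ('j2', 'nordu_b'): 6,
        ('j3', 'xsede_a'): 2, ('j3', 'nordu_b'): 8}

for j in jobs:
    G.add_edge('src', j, capacity=1, weight=0)       # each job placed once
for (j, s), c in cost.items():
    G.add_edge(j, s, capacity=1, weight=c)
for s, slots in systems.items():
    G.add_edge(s, 'sink', capacity=slots, weight=0)  # respect free slots

flow = nx.max_flow_min_cost(G, 'src', 'sink')
placement = {j: s for j in jobs for s in systems if flow[j].get(s, 0) > 0}
print(placement, "total cost:", nx.cost_of_flow(G, flow))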
Faculty Publications and Creative Works 2004
Faculty Publications & Creative Works is an annual compendium of scholarly and creative activities of University of New Mexico faculty during the noted calendar year. Published by the Office of the Vice President for Research and Economic Development, it serves to illustrate the robust and active intellectual pursuits conducted by the faculty in support of teaching and research at UNM
AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments
This report considers the application of Artificial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided, covering inter alia rule-based systems, model-based systems, case-based reasoning, pattern matching, clustering and feature extraction, artificial neural networks, genetic algorithms, artificial immune systems, agent-based systems, data mining and a variety of hybrid approaches. The report then considers the central issue of event correlation, which is at the heart of many misuse detection and localisation systems. The notion of being able to infer misuse by the correlation of individual temporally distributed events within a multiple data stream environment is explored, and a range of techniques is examined, covering model-based approaches, 'programmed' AI and machine learning paradigms. It is found that, in general, correlation is best achieved via rule-based approaches, but that these suffer from a number of drawbacks, such as the difficulty of developing and maintaining an appropriate knowledge base, and the lack of ability to generalise from known misuses to new, unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and uses this to screen events. This approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to 'learn' the features of event patterns that constitute normal behaviour and, by observing patterns that do not match expected behaviour, detect when a misuse has occurred. This approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise. Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems, these mechanisms even update each other to increase detection rates and lower false positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule- or state-based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems such that learning, generalisation and adaptation are more readily facilitated.
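As a concrete example of the rule-based correlation the report identifies as the most natural fit, here is a minimal Python sketch in which a rule fires when several failed logins on the same account are followed by a success within a time window; the event format, threshold, and window are illustrative assumptions.

# Temporal event-correlation sketch: flag a possible misuse when >= THRESHOLD
# failed logins precede a successful one on the same account within WINDOW s.
from collections import defaultdict

WINDOW, THRESHOLD = 60.0, 3   # seconds, failed attempts (assumed values)

def correlate(events):
    """events: time-ordered (timestamp, account, outcome) tuples."""
    failures = defaultdict(list)          # account -> recent failure times
    alerts = []
    for t, account, outcome in events:
        recent = [f for f in failures[account] if t - f <= WINDOW]
        if outcome == 'fail':
            recent.append(t)
        elif outcome == 'ok' and len(recent) >= THRESHOLD:
            alerts.append((t, account, 'possible credential guessing'))
        failures[account] = recent
    return alerts

stream = [(1, 'bob', 'fail'), (5, 'bob', 'fail'), (9, 'bob', 'fail'),
          (12, 'bob', 'ok'), (30, 'alice', 'ok')]
print(correlate(stream))   # -> [(12, 'bob', 'possible credential guessing')]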
Parallel scientific computing with message-passing toolboxes
Users of Scientific Computing Environments (SCEs) always demand more computing power for their CPU-intensive SCE applications. Using the proposed toolboxes, users of the well-known Matlab® and Octave platforms in a computer cluster can parallelize their interpreted applications using the native multi-computer programming paradigm of message-passing, such as that provided by PVM (Parallel Virtual Machine) and MPI (Message Passing Interface). For many SCE applications, a parallelization scheme can be found so that the resulting speedup is nearly linear in the number of computers used. The toolboxes are almost comprehensive interfaces to the corresponding libraries, they support all the compatible data types in the base SCE, and they have been designed with performance and maintainability in mind. In this paper, we summarize our previous work, its repercussion, and some results obtained by end users. Focusing on our most recent MPI Toolbox for Octave, we briefly describe its main features and introduce a case study: the Mandelbrot set.
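The paper's case study uses the MPI Toolbox for Octave; as an analogous illustration in Python, the following mpi4py sketch splits Mandelbrot rows across MPI ranks with a static interleaved schedule. The image size, iteration count, filename, and row partitioning are assumptions, not the paper's code.

# Mandelbrot over MPI: each rank computes an interleaved subset of rows.
# Run with e.g.: mpiexec -n 4 python mandelbrot_mpi.py   (hypothetical filename)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
H, W, MAX_IT = 400, 600, 100   # illustrative image size and iteration cap

def mandel_row(y):
    c = np.linspace(-2.0, 1.0, W) + 1j * y
    z = np.zeros(W, dtype=complex)
    counts = np.zeros(W, dtype=int)
    for _ in range(MAX_IT):
        mask = np.abs(z) <= 2.0          # points that have not escaped yet
        z[mask] = z[mask] ** 2 + c[mask]
        counts += mask                   # escape-time counter per pixel
    return counts

ys = np.linspace(-1.5, 1.5, H)
local = np.array([mandel_row(ys[i]) for i in range(rank, H, size)])
rows = comm.gather(local, root=0)        # collect row blocks on rank 0
if rank == 0:
    image = np.empty((H, W), dtype=int)
    for r, block in enumerate(rows):
        image[r::size] = block           # re-interleave each rank's rows
    print("done:", image.shape)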