
    Choosing between remote I/O versus staging in distributed environments

    Today, scientific applications and experiments have become increasingly complex and more demanding in terms of their computational and data requirements. The amount of data generated and used has grown at a very rapid rate. Tens or hundreds of terabytes of data for a single application are already common today, and petabytes and even exabytes will be common in a few years. One of the major challenges in distributed computing environments is how to access these large datasets remotely over the network. Data staging and remote I/O are the most widely used data access methods for distributed applications. Application developers generally choose one over the other intuitively, without making any scientific comparison specific to their applications, because no generic model is available for them to use. In this thesis, we develop generic models and set guidelines that help application developers choose the most appropriate data access method for their application. We define the parameters that potentially affect the end-to-end performance of distributed applications that need to access remote data. To achieve our goal, we implement a series of synthetic benchmark applications to simulate different data access patterns. We run these benchmark applications on different distributed computing settings with different parameters, such as network bandwidth, server and client capabilities, and data access ratio. We also use different remote I/O protocols to show the importance of the protocol in making a decision. We use regression analysis to develop applicable generic models for comparing different data access methods, and test our models on a real-life application. The main contribution of our thesis is a set of generic models that can be applied to most data-intensive distributed applications to decide the best data access technique for those applications.
Our models give scientists and application developers an opportunity to choose the best data access method before actually running the application.
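To make the trade-off concrete, here is a minimal back-of-the-envelope sketch. It assumes that staging copies the entire dataset before reading the needed part from local disk, that remote I/O transfers only the accessed fraction but pays a fixed latency per request, and that bandwidth is the only network constraint. The `Scenario` fields and both formulas are illustrative assumptions, not the regression models developed in the thesis.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    # Hypothetical parameters chosen for illustration only.
    dataset_bytes: float   # total size of the remote dataset
    access_ratio: float    # fraction of the dataset the application reads (0..1)
    bandwidth_bps: float   # network bandwidth in bytes per second
    requests: int          # number of remote read requests under remote I/O
    latency_s: float       # round-trip latency per remote request, in seconds
    local_read_bps: float  # local disk read throughput in bytes per second


def staging_time(s: Scenario) -> float:
    # Copy the whole dataset first, then read only the needed part locally.
    transfer = s.dataset_bytes / s.bandwidth_bps
    local_read = s.access_ratio * s.dataset_bytes / s.local_read_bps
    return transfer + local_read


def remote_io_time(s: Scenario) -> float:
    # Move only the accessed bytes over the network, paying latency per request.
    transfer = s.access_ratio * s.dataset_bytes / s.bandwidth_bps
    return transfer + s.requests * s.latency_s


def choose(s: Scenario) -> str:
    return "staging" if staging_time(s) < remote_io_time(s) else "remote I/O"
```

With a 1 GB dataset, a 10% access ratio, 100 MB/s of bandwidth, and 10 ms latency, this toy model favours staging at 1000 requests (about 10.2 s versus 11 s) but remote I/O at 100 requests (about 2 s versus 10.2 s), which illustrates why the choice depends on the access pattern rather than on intuition.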

    A survey of general-purpose experiment management tools for distributed systems

    In the field of large-scale distributed systems, experimentation is particularly difficult. The studied systems are complex, often nondeterministic and unreliable, software is plagued with bugs, and experiment workflows are unclear and hard to reproduce. These obstacles have led many independent researchers to design tools to control their experiments, boost productivity, and improve the quality of scientific results. Despite much research on experiment management for distributed systems, the current fragmentation of efforts calls for a general analysis. We therefore propose to build a framework to uncover missing functionality in these tools, enable meaningful comparisons between them, and derive recommendations for future improvements and research. The contribution of this paper is twofold. First, we provide an extensive list of features offered by general-purpose experiment management tools dedicated to distributed systems research on real platforms. We then use it to assess and compare existing solutions, outlining possible future paths for improvement.

    Leveraging business workflows in distributed systems research for the orchestration of reproducible and scalable experiments

    Although research on distributed systems is progressing rapidly, experiments in this field are often difficult to design, describe, conduct, and reproduce. Overcoming these difficulties could further stimulate research and lend more credibility to results in distributed systems research. The key factors responsible for this situation are technical (software bugs and hardware errors), methodological (incorrect practices), and social (reluctance to share work). In this paper, existing approaches to managing experiments on distributed systems are described, and a novel approach using business process management (BPM) is presented to address their shortcomings. The questions that arise when such an approach is taken are then addressed. We show that it can be a better alternative to the traditional way of performing experiments, as it encourages better scientific practices and results in more valuable research and publications. Finally, a plan for our future work is outlined and other applications of this work are discussed.

    3rd EGEE User Forum

    We have organized this book as a sequence of chapters, each associated with an application or technical theme and introduced by an overview of its contents and a summary of the main conclusions the Forum reached on that topic. The first chapter gathers all the plenary session keynote addresses, followed by a sequence of chapters covering the application-oriented sessions. These are followed by chapters on Computer Science and Grid Technology. The final chapter covers the large number of practical demonstrations and posters exhibited at the Forum. Much of the work presented has a direct link to specific areas of science, so we have created a Science Index, presented below. In addition, at the end of this book, we provide a complete list of the institutes and countries involved in the User Forum.

    Automatic Algorithm Selection for Complex Simulation Problems

    Selecting the most suitable simulation algorithm for a given task is often difficult. This is due to intricate interactions between model features, implementation details, and the runtime environment, which may strongly affect overall performance. The thesis consists of three parts. The first part surveys existing approaches to the algorithm selection problem and discusses techniques for analyzing simulation algorithm performance. The second part introduces a software framework for automatic simulation algorithm selection, which is evaluated in the third part.
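One simple way to automate such a choice is to reuse past measurements: describe each simulation model by a numeric feature vector, record how long each candidate algorithm took on it, and for a new model pick the algorithm that was fastest on the most similar previously seen model. The sketch below is a 1-nearest-neighbour selector under those assumptions; the data layout, the algorithm names, and the runtimes in the usage line are invented placeholders, not the framework or results from the thesis.

```python
import math


def select_algorithm(history, features):
    """Pick the algorithm that ran fastest on the most similar past problem.

    history:  list of (feature_vector, {algorithm_name: runtime_seconds})
              pairs recorded from earlier runs (hypothetical layout)
    features: feature vector of the new problem, same length as stored ones
    """
    if not history:
        raise ValueError("need at least one past measurement")
    # 1-nearest-neighbour lookup by Euclidean distance in feature space.
    _, runtimes = min(history, key=lambda entry: math.dist(entry[0], features))
    # Within that neighbour, choose the algorithm with the smallest runtime.
    return min(runtimes, key=runtimes.get)


# Placeholder history: (model size, fraction of fast reactions) -> runtimes.
history = [
    ((10.0, 0.1), {"direct_ssa": 2.0, "tau_leaping": 5.0}),
    ((1000.0, 0.9), {"direct_ssa": 40.0, "tau_leaping": 3.0}),
]
print(select_algorithm(history, (900.0, 0.8)))
```

In practice the features would usually be normalized first, since otherwise the feature with the largest numeric range dominates the distance.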

    Workload characterization, modeling, and prediction in grid Computing

    Workloads play an important role in experimental performance studies of computer systems. This thesis presents a comprehensive characterization of real workloads on production clusters and Grids. A variety of correlation structures and rich scaling behavior are identified in workload attributes such as job arrivals and run times, including pseudo-periodicity, long-range dependence, and strong temporal locality. Based on these analytic results, workload models are developed to fit the real data. For job arrivals, three different kinds of autocorrelation are investigated. For short- to middle-range dependent data, Markov modulated Poisson processes (MMPP) are good models because they can capture correlations between interarrival times while remaining analytically tractable. For long-range dependent and multifractal processes, the multifractal wavelet model (MWM) is able to reconstruct the scaling behavior, and it provides a coherent wavelet framework for analysis and synthesis. Pseudo-periodicity is a special kind of autocorrelation and can be modeled by a matching pursuit approach. For workload attributes such as run time, a new model is proposed that can fit not only the marginal distribution but also second-order statistics such as the autocorrelation function (ACF). The development of workload models enables simulation studies of Grid scheduling strategies. Using the synthetic traces, the performance impacts of workload correlations on Grid scheduling are quantitatively evaluated. The results indicate that autocorrelations in workload attributes can cause performance degradation; in some situations the difference can be up to several orders of magnitude. The larger the autocorrelation, the worse the performance; this is demonstrated at both the cluster and the Grid level. This study shows the importance of realistic workload models in performance evaluation studies.
Regarding performance prediction, this thesis treats the targeted resources as a "black box" and takes a statistical approach. It is shown that statistical learning methods, with a well-thought-out and fine-tuned design, are able to deliver good accuracy and performance.
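As an illustration of the MMPP idea mentioned above, the sketch below simulates a two-state Markov modulated Poisson process: a hidden state switches at exponential times, and while in state `i` jobs arrive as a Poisson process with rate `rates[i]`. Because both the next arrival and the next state switch are exponential, the simulation draws one exponential with the combined rate and then decides which event happened. The function name and parameters are hypothetical, and this is a generic simulator, not the fitting procedure used in the thesis.

```python
import random


def mmpp2_arrivals(rates, switch_rates, horizon, seed=0):
    """Simulate arrival times of a two-state MMPP on [0, horizon).

    rates[i]:        Poisson arrival rate while in state i (jobs per time unit)
    switch_rates[i]: rate of leaving state i for the other state
    All rates must be positive.
    """
    rng = random.Random(seed)
    t, state, arrivals = 0.0, 0, []
    while True:
        # Competing exponential clocks: next arrival vs next state switch.
        total = rates[state] + switch_rates[state]
        t += rng.expovariate(total)
        if t >= horizon:
            return arrivals
        if rng.random() < rates[state] / total:
            arrivals.append(t)   # the arrival clock fired first
        else:
            state = 1 - state    # the modulating chain switched state
```

Choosing very different rates, for example `rates=(0.1, 10.0)` with slow switching, yields bursty traces in which short interarrival times cluster together, which is the kind of correlation a plain Poisson process cannot express.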

    Deriving Goal-oriented Performance Models by Systematic Experimentation

    Creating and maintaining performance models for software systems built on existing software can require substantial effort. This thesis therefore addresses the challenge of performance prediction in such scenarios. It proposes a novel goal-oriented method for experimental, measurement-based performance modelling. We validated the approach in a number of case studies, including standard industry benchmarks as well as a real development scenario at SAP.
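A minimal sketch of measurement-based modelling with an explicit accuracy goal, assuming a linear relationship between a load parameter `x` (for example, requests per second) and a measured response `y`: fit the model to one set of measurements, then check whether it predicts a separate set of validation measurements within a relative-error goal. If the goal is not met, a systematic experimentation loop would collect more measurements or try a richer model. The linear form, the function names, and the error criterion are illustrative assumptions, not the method proposed in the thesis.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired measurements")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def meets_goal(model, xs, ys, max_rel_error):
    """True if the model predicts every validation measurement within the
    relative-error goal (for example 0.1 for 10%)."""
    a, b = model
    return all(abs((a + b * x) - y) <= max_rel_error * abs(y)
               for x, y in zip(xs, ys))
```

Separating the fitting measurements from the validation measurements keeps the goal check honest: a model can fit its own training points closely and still extrapolate poorly to load levels it has not seen.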

    Intelligent instrumentation techniques to improve the traces information-volume ratio

    With ever more powerful machines constantly being deployed, it is crucial to manage computational resources efficiently. This matters both from the point of view of the individual user, who expects fast results, and from that of the supercomputing center hosting the whole infrastructure, which is interested in maximizing its overall productivity. Nevertheless, the real sustained performance achieved by applications can be significantly lower than the theoretical peak performance of the machines. A key factor in bridging this performance gap is understanding how parallel computers behave. Performance analysis tools are essential not only to understand the behavior of parallel applications, but also to identify why performance expectations might not have been met, serving as guidelines to fix the inefficiencies that caused poor performance and driving both software and hardware optimizations. However, detailed analysis of the behavior of a parallel application requires processing a large amount of data that also grows extremely fast. Current large-scale systems already comprise hundreds of thousands of cores, and upcoming exascale systems are expected to assemble more than a million processing elements. With such a number of hardware components, traditional analysis methodologies, which consist of blindly collecting as much data as possible and then performing exhaustive lookups, are no longer applicable, because the volume of performance data generated becomes unmanageable to store, process, and analyze. The evolution of the tools suggests that more sophisticated approaches are needed, incorporating intelligence to perform the challenging and important task of detailed analysis competently. In this thesis, we address the problem of the scalability of performance analysis tools on large-scale systems.
In such scenarios, an in-depth understanding of the interactions between all system components is more crucial than ever for making effective use of parallel resources. To this end, our work includes a thorough review of techniques that have been successfully applied to Big Data analytics in fields such as machine learning, data mining, signal processing, and computer vision. We have leveraged these techniques to improve the analysis of large-scale parallel applications by automatically uncovering repetitive patterns, finding data correlations, detecting performance trends, and extracting other useful analysis information. By combining them, we have minimized the volume of performance data captured from an execution while maximizing the benefit and insight gained from this data, and have proposed new, more effective methodologies for single- and multi-experiment performance analysis.
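One generic signal-processing way to uncover a repetitive pattern is to compute the autocorrelation of a per-interval metric and take the lag with the strongest correlation as the period; a tracer could then keep detailed data for one representative period instead of the whole run. The sketch below assumes the metric is a plain list of numbers (for example, hypothetical per-iteration durations) and uses the biased sample autocorrelation, which favours the shortest true period over its multiples. It illustrates the general idea only and is not the specific technique developed in the thesis.

```python
def autocorrelation(series, lag):
    """Biased sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series)
    if variance == 0:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / variance


def dominant_period(series, max_lag=None):
    """Return the lag (>= 1) with the highest autocorrelation, taken here as
    the length of the repeating pattern. Needs at least two samples."""
    if max_lag is None:
        max_lag = len(series) // 2
    return max(range(1, max_lag + 1),
               key=lambda lag: autocorrelation(series, lag))


def reduce_trace(series):
    """Keep a single representative period instead of the full series."""
    period = dominant_period(series)
    return period, series[:period]
```

For a series that repeats `[1, 5, 2, 8]` ten times, the autocorrelation peaks at lag 4, so `reduce_trace` keeps only the four values of one period.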

    ZENTURIO: An Experiment Management System for Cluster and Grid Computing

    The need to conduct and manage large sets of experiments for scientific applications has increased dramatically over the last decade. However, there is still very little tool support for this complex and tedious process. In this paper we introduce the ZENTURIO experiment management system for parameter studies, performance analysis, and software testing on cluster and Grid architectures. ZENTURIO uses the ZEN directive-based language to specify arbitrarily complex program executions. ZENTURIO is designed as a collection of Grid services that comprise: (1) a registry service that supports registering and locating Grid services; (2) an experiment generator that parses files with ZEN directives and instruments applications for performance analysis and parameter studies; and (3) an experiment executor that compiles and controls the execution of experiments on the target machine. A graphical user portal allows the user to control and monitor the experiments and to automatically visualise performance and output data across multiple experiments. ZENTURIO has been implemented based on Java/Jini distributed technology. It supports experiment management on cluster architectures via PBS and on Grid infrastructures through GRAM. We report results of using ZENTURIO for the performance analysis of an ocean simulation application and a parameter study of a computational finance code.
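The core of a parameter-study generator is expanding a set of parameter value lists into one concrete experiment per combination. The sketch below shows only that cross-product expansion; it does not use ZEN directive syntax (which the abstract does not show), and it omits the instrumentation, compilation, and PBS/GRAM submission that ZENTURIO performs. The parameter names `processes` and `grid_size` are hypothetical.

```python
from itertools import product


def expand_parameter_study(space):
    """Expand {parameter: [values]} into one configuration dict per
    experiment, covering the full cross product of parameter values."""
    names = sorted(space)  # fixed order so experiment numbering is reproducible
    return [dict(zip(names, combo))
            for combo in product(*(space[name] for name in names))]


experiments = expand_parameter_study({"processes": [1, 2, 4], "grid_size": [64, 128]})
for i, config in enumerate(experiments):
    print(i, config)
```

Sorting the parameter names makes the experiment order independent of how the mapping was written, so experiment numbers stay stable across runs, which helps when results from several experiments are compared and visualised together.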