6,614 research outputs found

    MonALISA : A Distributed Monitoring Service Architecture

    Full text link
    The MonALISA (Monitoring Agents in A Large Integrated Services Architecture) system provides a distributed monitoring service. MonALISA is based on a scalable Dynamic Distributed Services Architecture which is designed to meet the needs of physics collaborations for monitoring global Grid systems, and is implemented using JINI/JAVA and WSDL/SOAP technologies. The scalability of the system derives from the use of multithreaded Station Servers to host a variety of loosely coupled self-describing dynamic services, the ability of each service to register itself and then to be discovered and used by any other services, or clients that require such information, and the ability of all services and clients subscribing to a set of events (state changes) in the system to be notified automatically. The framework integrates several existing monitoring tools and procedures to collect parameters describing computational nodes, applications and network performance. It has built-in SNMP support and network-performance monitoring algorithms that enable it to monitor end-to-end network performance as well as the performance and state of site facilities in a Grid. MonALISA is currently running around the clock on the US CMS test Grid as well as an increasing number of other sites. It is also being used to monitor the performance and optimize the interconnections among the reflectors in the VRVS system.Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, Ca, USA, March 2003, 8 pages, pdf. PSN MOET00

    Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication

    Full text link
    This paper presents FT-GAIA, a software-based fault-tolerant parallel and distributed simulation middleware. FT-GAIA has being designed to reliably handle Parallel And Distributed Simulation (PADS) models, which are needed to properly simulate and analyze complex systems arising in any kind of scientific or engineering field. PADS takes advantage of multiple execution units run in multicore processors, cluster of workstations or HPC systems. However, large computing systems, such as HPC systems that include hundreds of thousands of computing nodes, have to handle frequent failures of some components. To cope with this issue, FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes. Moreover, FT-GAIA offers some protection against Byzantine failures, since interaction messages among the simulated entities are replicated as well, so that the receiving entity can identify and discard corrupted messages. Results from an analytical model and from an experimental evaluation show that FT-GAIA provides a high degree of fault tolerance, at the cost of a moderate increase in the computational load of the execution units.Comment: arXiv admin note: substantial text overlap with arXiv:1606.0731

    Partial replication in distributed software transactional memory

    Get PDF
    Dissertação para obtenção do Grau de Mestre em Engenharia InformáticaDistributed software transactional memory (DSTM) is emerging as an interesting alternative for distributed concurrency control. Usually, DSTM systems resort to data distribution and full replication techniques in order to provide scalability and fault tolerance. Nevertheless, distribution does not provide support for fault tolerance and full replication limits the system’s total storage capacity. In this context, partial data replication rises as an intermediate solution that combines the best of the previous two trying to mitigate their disadvantages. This strategy has been explored by the distributed databases research field, but has been little addressed in the context of transactional memory and, to the best of our knowledge, it has never before been incorporated into a DSTM system for a general-purpose programming language. Thus, we defend the claim that it is possible to combine both full and partial data replication in such systems. Accordingly, we developed a prototype of a DSTM system combining full and partial data replication for Java programs. We built from an existent DSTM framework and extended it with support for partial data replication. With the proposed framework, we implemented a partially replicated DSTM. We evaluated the proposed system using known benchmarks, and the evaluation showcases the existence of scenarios where partial data replication can be advantageous, e.g., in scenarios with small amounts of transactions modifying fully replicated data. The results of this thesis show that we were able to sustain our claim by implementing a prototype that effectively combines full and partial data replication in a DSTM system. The modularity of the presented framework allows the easy implementation of its various components, and it provides a non-intrusive interface to applications.Fundação para a Ciência e Tecnologia - (FCT/MCTES) in the scope of the research project PTDC/EIA-EIA/113613/2009 (Synergy-VM

    Design and Evaluation of a Collective IO Model for Loosely Coupled Petascale Programming

    Full text link
    Loosely coupled programming is a powerful paradigm for rapidly creating higher-level applications from scientific programs on petascale systems, typically using scripting languages. This paradigm is a form of many-task computing (MTC) which focuses on the passing of data between programs as ordinary files rather than messages. While it has the significant benefits of decoupling producer and consumer and allowing existing application programs to be executed in parallel with no recoding, its typical implementation using shared file systems places a high performance burden on the overall system and on the user who will analyze and consume the downstream data. Previous efforts have achieved great speedups with loosely coupled programs, but have done so with careful manual tuning of all shared file system access. In this work, we evaluate a prototype collective IO model for file-based MTC. The model enables efficient and easy distribution of input data files to computing nodes and gathering of output results from them. It eliminates the need for such manual tuning and makes the programming of large-scale clusters using a loosely coupled model easier. Our approach, inspired by in-memory approaches to collective operations for parallel programming, builds on fast local file systems to provide high-speed local file caches for parallel scripts, uses a broadcast approach to handle distribution of common input data, and uses efficient scatter/gather and caching techniques for input and output. We describe the design of the prototype model, its implementation on the Blue Gene/P supercomputer, and present preliminary measurements of its performance on synthetic benchmarks and on a large-scale molecular dynamics application.Comment: IEEE Many-Task Computing on Grids and Supercomputers (MTAGS08) 200

    Image Information Mining Systems

    Get PDF

    Analysis of simulation environment

    Get PDF
    In this paper the requirements for an ALN simulation environment are analysed, as needed in the CATNETS Project. A number of grid and general purpose simulators are evaluated regarding the identified requirements for simulating economical resource allocation mechanisms in ALNs. Subsequently a suitable simulator is chosen for usage in the CATNETS project. --CATNETS simulator,requirements analysis,simulator selection

    Enhancing reliability with Latin Square redundancy on desktop grids.

    Get PDF
    Computational grids are some of the largest computer systems in existence today. Unfortunately they are also, in many cases, the least reliable. This research examines the use of redundancy with permutation as a method of improving reliability in computational grid applications. Three primary avenues are explored - development of a new redundancy model, the Replication and Permutation Paradigm (RPP) for computational grids, development of grid simulation software for testing RPP against other redundancy methods and, finally, running a program on a live grid using RPP. An important part of RPP involves distributing data and tasks across the grid in Latin Square fashion. Two theorems and subsequent proofs regarding Latin Squares are developed. The theorems describe the changing position of symbols between the rows of a standard Latin Square. When a symbol is missing because a column is removed the theorems provide a basis for determining the next row and column where the missing symbol can be found. Interesting in their own right, the theorems have implications for redundancy. In terms of the redundancy model, the theorems allow one to state the maximum makespan in the face of missing computational hosts when using Latin Square redundancy. The simulator software was developed and used to compare different data and task distribution schemes on a simulated grid. The software clearly showed the advantage of running RPP, which resulted in faster completion times in the face of computational host failures. The Latin Square method also fails gracefully in that jobs complete with massive node failure while increasing makespan. Finally an Inductive Logic Program (ILP) for pharmacophore search was executed, using a Latin Square redundancy methodology, on a Condor grid in the Dahlem Lab at the University of Louisville Speed School of Engineering. All jobs completed, even in the face of large numbers of randomly generated computational host failures

    PYDAC: A DISTRIBUTED RUNTIME SYSTEM AND PROGRAMMING MODEL FOR A HETEROGENEOUS MANY-CORE ARCHITECTURE

    Get PDF
    Heterogeneous many-core architectures that consist of big, fast cores and small, energy-efficient cores are very promising for future high-performance computing (HPC) systems. These architectures offer a good balance between single-threaded perfor- mance and multithreaded throughput. Such systems impose challenges on the design of programming model and runtime system. Specifically, these challenges include (a) how to fully utilize the chip’s performance, (b) how to manage heterogeneous, un- reliable hardware resources, and (c) how to generate and manage a large amount of parallel tasks. This dissertation proposes and evaluates a Python-based programming framework called PyDac. PyDac supports a two-level programming model. At the high level, a programmer creates a very large number of tasks, using the divide-and-conquer strategy. At the low level, tasks are written in imperative programming style. The runtime system seamlessly manages the parallel tasks, system resilience, and inter- task communication with architecture support. PyDac has been implemented on both an field-programmable gate array (FPGA) emulation of an unconventional het- erogeneous architecture and a conventional multicore microprocessor. To evaluate the performance, resilience, and programmability of the proposed system, several micro-benchmarks were developed. We found that (a) the PyDac abstracts away task communication and achieves programmability, (b) the micro-benchmarks are scalable on the hardware prototype, but (predictably) serial operation limits some micro-benchmarks, and (c) the degree of protection versus speed could be varied in redundant threading that is transparent to programmers

    Perturbing and imaging nuclear compartments to reveal mechanisms of transcription regulation and telomere maintenance

    Get PDF
    The cell nucleus is organized into functional domains that form around chromatin, which serves as a scaffold composed of DNA, proteins, and associated RNAs. On the 0.1-1 µm mesoscale these domains can form spatially defined compartments with distinct composition and properties that enrich specific genomic activities like transcription, chromatin modification or DNA repair. In addition, extrachromosomal DNA elements and RNAs can separate from the chromatin template and assemble with proteins into nuclear bodies. The resulting accumulations of proteins and nucleic acids in the nucleus modulate chromatin-templated processes and their organization. The assembly of these compartments occurs in a self-organizing manner via direct and indirect binding of proteins to DNA and/or RNA. Recently, it has been proposed that multivalent interactions drive compartmentalization by inducing phase separation with a non-stoichiometric accumulation of factors into biomolecular condensates. Despite the importance of compartments for genome regulation, insights into their structure and material properties and how these affect their function is limited. To address this issue, it is important to devise approaches that can perturb nuclear compartments in a targeted manner, while also measuring changes in genome activities within the same cell. In this thesis, the methodology to reveal the underlying structure-function relationships of nuclear compartments has been advanced and applied to compartments involved in activation and silencing of chromatin, and telomere maintenance in cancer cells. I first established a toolbox of chromatin effector constructs to probe and perturb properties of nuclear compartments in living cells that comprised different combinations of DNA binding, transcription activation and light-dependent interaction domains. In addition, I developed workflows to quantitatively assess relevant compartment features by fluorescence microscopy. These methods were employed to study the compaction mechanism of mouse pericentric heterochromatin (PCH) foci and to investigate the interplay between transcriptional co-activators, phase separation and transcription at an inducible reporter gene cluster. It revealed determinants of PCH compaction and identified differential co-activator usage and multivalent interactions as contributors to transcription factor (TF) strength. The results furthermore challenged the model of TF phase separation as a general positive driver of gene transcription. In the second part, I focused on exploiting the detection of compartments for measuring activity of the alternative lengthening of telomeres (ALT) pathway used by cancer cells to extend their telomeres in absence of telomerase. I developed ALT-FISH, a scalable and quantitative imaging assay that detects ALT pathway-specific compartments containing large amounts of single-stranded telomeric nucleic acids. I applied the method to cell line models from different cancer entities and to tumor tissue from leiomyosarcoma and neuroblastoma patients. By devising automated ALT-FISH data acquisition and analysis IV workflows, I implemented an approach, which enabled ALT activity measurements in hundreds of thousands of single cells. These technological advancements provided a quantitative description of ALT activity at single cell resolution and were used to characterize the spatial distribution of ALT activity in relation to other biological features and in response to perturbations. Finally, a novel approach for studying the regulation of ALT in tumors could be established by integrating the method with the spatially resolved detection of single cell transcriptomes. In summary, this thesis introduced and utilized several methods to establish connections between nuclear compartment organization, chromatin features, transcription regulation, and telomere maintenance. These perturbation and imaging techniques are versatile and may be applied to dissect nuclear activities related to other compartments and biological model systems. Furthermore, the detection of ALT activity has demonstrated that compartments can offer valuable biological insights into how phenotypic cellular heterogeneity is encoded and linked to diseases such as cancer
    corecore