116 research outputs found

    Distributed computing practice for large-scale science and engineering applications

    Get PDF
    It is generally accepted that the ability to develop large-scale distributed applications has lagged seriously behind other developments in cyberinfrastructure. In this paper, we provide insight into how such applications have been developed and an understanding of why developing applications for distributed infrastructure is hard. Our approach is unique in the sense that it is centered around half a dozen existing scientific applications; we posit that these scientific applications are representative of the characteristics, requirements, as well as the challenges of the bulk of current distributed applications on production cyberinfrastructure (such as the US TeraGrid). We provide a novel and comprehensive analysis of such distributed scientific applications. Specifically, we survey existing models and methods for large-scale distributed applications and identify commonalities, recurring structures, patterns and abstractions. We find that there are many ad hoc solutions employed to develop and execute distributed applications, which result in a lack of generality and the inability of distributed applications to be extensible and independent of infrastructure details. In our analysis, we introduce the notion of application vectors: a novel way of understanding the structure of distributed applications. Important contributions of this paper include identifying patterns that are derived from a wide range of real distributed applications, as well as an integrated approach to analyzing applications, programming systems and patterns, resulting in the ability to provide a critical assessment of the current practice of developing, deploying and executing distributed applications. Gaps and omissions in the state of the art are identified, and directions for future research are outlined

    PROPOSED MIDDLEWARE SOLUTION FOR RESOURCE-CONSTRAINED DISTRIBUTED EMBEDDED NETWORKS

    Get PDF
    The explosion in processing power of embedded systems has enabled distributed embedded networks to perform more complicated tasks. Middleware are sets of encapsulations of common and network/operating system-specific functionality into generic, reusable frameworks to manage such distributed networks. This thesis will survey and categorize popular middleware implementations into three adapted layers: host-infrastructure, distribution, and common services. This thesis will then apply a quantitative approach to grading and proposing a single middleware solution from all layers for two target platforms: CubeSats and autonomous unmanned aerial vehicles (UAVs). CubeSats are 10x10x10cm nanosatellites that are popular university-level space missions, and impose power and volume constraints. Autonomous UAVs are similarly-popular hobbyist-level vehicles that exhibit similar power and volume constraints. The MAVLink middleware from the host-infrastructure layer is proposed as the middleware to manage the distributed embedded networks powering these platforms in future projects. Finally, this thesis presents a performance analysis on MAVLink managing the ARM Cortex-M 32-bit processors that power the target platforms

    Development and Performance Evaluation Of A Lan-Based EDGE-Detection Tool

    Full text link

    Reliable massively parallel symbolic computing : fault tolerance for a distributed Haskell

    Get PDF
    As the number of cores in manycore systems grows exponentially, the number of failures is also predicted to grow exponentially. Hence massively parallel computations must be able to tolerate faults. Moreover new approaches to language design and system architecture are needed to address the resilience of massively parallel heterogeneous architectures. Symbolic computation has underpinned key advances in Mathematics and Computer Science, for example in number theory, cryptography, and coding theory. Computer algebra software systems facilitate symbolic mathematics. Developing these at scale has its own distinctive set of challenges, as symbolic algorithms tend to employ complex irregular data and control structures. SymGridParII is a middleware for parallel symbolic computing on massively parallel High Performance Computing platforms. A key element of SymGridParII is a domain specific language (DSL) called Haskell Distributed Parallel Haskell (HdpH). It is explicitly designed for scalable distributed-memory parallelism, and employs work stealing to load balance dynamically generated irregular task sizes. To investigate providing scalable fault tolerant symbolic computation we design, implement and evaluate a reliable version of HdpH, HdpH-RS. Its reliable scheduler detects and handles faults, using task replication as a key recovery strategy. The scheduler supports load balancing with a fault tolerant work stealing protocol. The reliable scheduler is invoked with two fault tolerance primitives for implicit and explicit work placement, and 10 fault tolerant parallel skeletons that encapsulate common parallel programming patterns. The user is oblivious to many failures, they are instead handled by the scheduler. An operational semantics describes small-step reductions on states. A simple abstract machine for scheduling transitions and task evaluation is presented. It defines the semantics of supervised futures, and the transition rules for recovering tasks in the presence of failure. The transition rules are demonstrated with a fault-free execution, and three executions that recover from faults. The fault tolerant work stealing has been abstracted in to a Promela model. The SPIN model checker is used to exhaustively search the intersection of states in this automaton to validate a key resiliency property of the protocol. It asserts that an initially empty supervised future on the supervisor node will eventually be full in the presence of all possible combinations of failures. The performance of HdpH-RS is measured using five benchmarks. Supervised scheduling achieves a speedup of 757 with explicit task placement and 340 with lazy work stealing when executing Summatory Liouville up to 1400 cores of a HPC architecture. Moreover, supervision overheads are consistently low scaling up to 1400 cores. Low recovery overheads are observed in the presence of frequent failure when lazy on-demand work stealing is used. A Chaos Monkey mechanism has been developed for stress testing resiliency with random failure combinations. All unit tests pass in the presence of random failure, terminating with the expected results

    Adaptive object management for distributed systems

    Get PDF
    This thesis describes an architecture supporting the management of pluggable software components and evaluates it against the requirement for an enterprise integration platform for the manufacturing and petrochemical industries. In a distributed environment, we need mechanisms to manage objects and their interactions. At the least, we must be able to create objects in different processes on different nodes; we must be able to link them together so that they can pass messages to each other across the network; and we must deliver their messages in a timely and reliable manner. Object based environments which support these services already exist, for example ANSAware(ANSA, 1989), DEC's Objectbroker(ACA,1992), Iona's Orbix(Orbix,1994)Yet such environments provide limited support for composing applications from pluggable components. Pluggability is the ability to install and configure a component into an environment dynamically when the component is used, without specifying static dependencies between components when they are produced. Pluggability is supported to a degree by dynamic binding. Components may be programmed to import references to other components and to explore their interfaces at runtime, without using static type dependencies. Yet thus overloads the component with the responsibility to explore bindings. What is still generally missing is an efficient general-purpose binding model for managing bindings between independently produced components. In addition, existing environments provide no clear strategy for dealing with fine grained objects. The overhead of runtime binding and remote messaging will severely reduce performance where there are a lot of objects with complex patterns of interaction. We need an adaptive approach to managing configurations of pluggable components according to the needs and constraints of the environment. Management is made difficult by embedding bindings in component implementations and by relying on strong typing as the only means of verifying and validating bindings. To solve these problems we have built a set of configuration tools on top of an existing distributed support environment. Specification tools facilitate the construction of independent pluggable components. Visual composition tools facilitate the configuration of components into applications and the verification of composite behaviours. A configuration model is constructed which maintains the environmental state. Adaptive management is made possible by changing the management policy according to this state. Such policy changes affect the location of objects, their bindings, and the choice of messaging system

    04451 Abstracts Collection -- Future Generation Grids

    Get PDF
    The Dagstuhl Seminar 04451 "Future Generation Grid" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl from 1st to 5th November 2004. The focus of the seminar was on open problems and future challenges in the design of next generation Grid systems. A total of 45 participants presented their current projects, research plans, and new ideas in the area of Grid technologies. Several evening sessions with vivid discussions on future trends complemented the talks. This report gives an overview of the background and the findings of the seminar

    Designing Distributed, Component-Based Systems for Industrial Robotic Applications

    Get PDF
    none3noneM. Amoretti; S. Caselli; M. ReggianiM., Amoretti; S., Caselli; Reggiani, Monic

    A Service-Oriented Architecture and Language for Abstracted Distributed Algorithms

    Get PDF
    corecore