
    Solving Multi-choice Secretary Problem in Parallel: An Optimal Observation-Selection Protocol

    The classical secretary problem investigates the question of how to hire the best secretary from n candidates who arrive in a uniformly random order. In this work we investigate a parallel generalization of this problem introduced by Feldman and Tennenholtz [14], which we call the shared Q-queue J-choice K-best secretary problem. In this problem, n candidates are evenly distributed into Q queues, and instead of hiring the best one, the employer wants to hire J candidates from among the best K persons, with the J quotas shared by all queues. This problem is a generalized version of the J-choice K-best problem, which has been extensively studied, and it has more practical value as it characterizes the parallel situation. Although some work has been done on this generalization, to the best of our knowledge no optimal deterministic protocol was known for general Q queues. In this paper, we provide an optimal deterministic protocol for this problem. The protocol is in the same style as the 1/e-solution for the classical secretary problem, but with multiple phases and adaptive criteria. Our protocol is very simple and efficient, and we show that several generalizations, such as the fractional J-choice K-best secretary problem and the exclusive Q-queue J-choice K-best secretary problem, can be solved optimally by this protocol with slight modification; the latter settles an open problem of Feldman and Tennenholtz [14]. In addition, we provide theoretical analysis for two typical cases: the 1-queue 1-choice K-best problem and the shared 2-queue 2-choice 2-best problem. For the former, we prove a lower bound of 1 - O((ln K)²/K²) on the competitive ratio. For the latter, we show that the optimal competitive ratio is ≈ 0.372, while the previously best known result is 0.356 [14]. Comment: This work is accepted by ISAAC 201
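    The paper's multi-phase protocol is not given in the abstract; as background, here is a minimal sketch of the classical 1/e observation-selection rule it generalizes, i.e. the 1-queue 1-choice 1-best baseline (an illustration, not the paper's protocol):

```python
import math
import random

def secretary_one_over_e(values):
    """Classical 1/e rule: observe the first ~n/e candidates without
    hiring, then hire the first one better than everything observed."""
    n = len(values)
    cutoff = round(n / math.e)             # length of the observation phase
    best_seen = max(values[:cutoff], default=float("-inf"))
    for i in range(cutoff, n):
        if values[i] > best_seen:
            return i                       # first candidate beating the phase
    return n - 1                           # forced to take the last candidate

# Monte Carlo check: the best candidate is hired with probability ~ 1/e.
random.seed(0)
n, trials = 100, 20000
wins = sum(
    secretary_one_over_e(perm) == perm.index(n - 1)   # n - 1 is the best value
    for perm in (random.sample(range(n), n) for _ in range(trials))
)
print(wins / trials)  # ≈ 0.37
```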

    A software architecture for consensus based replication

    Advisor: Luiz Eduardo Buzato. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação. Abstract: This thesis explores one of the fundamental tools for the construction of distributed systems: the replication of software components. Specifically, we attempt to solve the problem of simplifying the construction of replicated applications that combine high availability and high performance. We have developed Treplica, a replication library, as the main tool to reach this research objective. Treplica allows the construction of distributed applications that behave as centralized applications, presenting the programmer with a simple interface based on an object-oriented specification of active replication. The conclusion we defend in this thesis is that it is possible to create modular and simple-to-use support for replication that provides high performance, low latency, and fast recovery in the presence of failures. We believe the proposed software architecture is applicable to any distributed system, but it is of special interest to systems that remain centralized due to the lack of a simple, efficient, and reliable way to replicate them. Doctorate in Computer Science (Computer Systems).
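    The abstract does not spell out Treplica's interface; the following is a minimal sketch of what an object-oriented active-replication API of this general shape could look like, assuming deterministic actions and totally ordered delivery of them. All names here are hypothetical, not Treplica's:

```python
from abc import ABC, abstractmethod

class Action(ABC):
    """A deterministic, serializable state transition (hypothetical API)."""
    @abstractmethod
    def execute(self, state):
        ...

class Deposit(Action):
    def __init__(self, amount):
        self.amount = amount
    def execute(self, state):
        state["balance"] = state.get("balance", 0) + self.amount
        return state["balance"]

class Replica:
    """Active replication: every replica applies the same totally ordered
    sequence of actions to its own copy of the state, so all copies stay
    identical (consensus/total-order delivery is assumed, not shown)."""
    def __init__(self):
        self.state = {}
        self.log = []
    def apply(self, action):
        self.log.append(action)        # would be persisted in a real library
        return action.execute(self.state)

# All replicas applying the same ordered log converge to the same state.
r1, r2 = Replica(), Replica()
for r in (r1, r2):
    r.apply(Deposit(100))
    r.apply(Deposit(-30))
assert r1.state == r2.state == {"balance": 70}
```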

    Sixth Workshop and Tutorial on Practical Use of Coloured Petri Nets and the CPN Tools Aarhus, Denmark, October 24-26, 2005

    This booklet contains the proceedings of the Sixth Workshop on Practical Use of Coloured Petri Nets and the CPN Tools, October 24-26, 2005. The workshop is organised by the CPN group at the Department of Computer Science, University of Aarhus, Denmark. The papers are also available in electronic form via the web pages: http://www.daimi.au.dk/CPnets/workshop0

    Performance measurement methodology for integrated services networks

    With the emergence of advanced integrated services networks, the need for effective performance analysis techniques has become extremely important. Further advancements in these networks are only possible if the practical performance issues of existing networks are clearly understood. This thesis is concerned with the design and development of a measurement system which has been implemented on a large experimental network. The measurement system is based on dedicated traffic generators which have been designed and implemented on the Project Unison network. The Unison project is a multi-site networking experiment for conducting research into the interconnection and interworking of local area network based multimedia application systems. The traffic generators were first developed for the Cambridge Ring based Unison network. Once their usefulness and effectiveness were proven, high-performance traffic generators using transputer technology were built for the Cambridge Fast Ring based Unison network. The measurement system is capable of measuring conventional performance parameters such as throughput and packet delay, and is able to characterise the operational performance of network bridging components under various loading conditions. In particular, the measurement system has been used in a 'measure and tune' fashion in order to improve the performance of a complex bridging device. Accurate measurement of packet delay in wide area networks is a recognised problem, associated with the synchronisation of the clocks of distant machines. A chronological timestamping technique has been introduced in which the clocks are synchronised using a broadcast synchronisation technique; Rugby time clock receivers have been interfaced to each generator for this purpose. In order to design network applications, accurate knowledge of the expected network performance under different loading conditions is essential. Using the measurement system, this has been achieved by examining the network characteristics at the network/user interface. Also, the generators are capable of emulating a variety of application traffic which can be injected into the network along with the traffic from real applications, thus enabling user-oriented performance parameters to be evaluated in a mixed traffic environment. A number of performance measurement experiments have been conducted using the measurement system. Experimental results obtained from the Unison network serve to emphasise the power and effectiveness of the measurement methodology.
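    The core of the chronological timestamping idea is that one-way packet delay can be computed directly from sender and receiver timestamps only because both clocks track the same broadcast reference. A simplified illustration of that calculation, not the thesis's implementation (field names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Packet:
    seq: int
    tx_timestamp: float  # stamped by the sender from its broadcast-synchronised clock

def one_way_delay(pkt: Packet, rx_timestamp: float, residual_skew: float = 0.0):
    """One-way delay = receive time - transmit time. This is valid only
    because both clocks follow the same broadcast time reference; any
    residual skew between them appears directly as measurement error."""
    return (rx_timestamp - pkt.tx_timestamp) - residual_skew

pkt = Packet(seq=1, tx_timestamp=1000.000)
print(one_way_delay(pkt, rx_timestamp=1000.042))  # 0.042 s one-way delay
```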

    Dataflow computers: a tutorial and survey

    The demand for very high performance computers has encouraged some researchers in the computer science field to consider alternatives to the conventional notions of program and computer organization. The dataflow computer is one attempt to form a new collection of consistent systems ideas that both improve computer performance and alleviate the software design problems induced by the construction of highly concurrent programs.
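    The dataflow model the survey covers executes an operation as soon as all of its input operands are available, rather than following a program counter. A toy simulation of that firing rule, as a generic illustration rather than any specific dataflow machine:

```python
# Toy dataflow firing rule: a node fires when all its operands have arrived.
# Nodes map name -> (operation, input names); tokens carry operand values.
graph = {
    "add": (lambda a, b: a + b, ["x", "y"]),
    "mul": (lambda a, b: a * b, ["add", "z"]),   # depends on "add"
}
tokens = {"x": 2, "y": 3, "z": 10}  # initial operand tokens

fired = True
while fired:
    fired = False
    for node, (op, inputs) in graph.items():
        # Fire any node whose inputs are all present and that hasn't fired yet.
        if node not in tokens and all(i in tokens for i in inputs):
            tokens[node] = op(*(tokens[i] for i in inputs))
            fired = True

print(tokens["mul"])  # (2 + 3) * 10 = 50
```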

    Computer structures for distributed systems


    Formalisation of asynchronous interactions

    Large computing systems are generally built by connecting several distributed subsystems. The way these entities communicate is crucial to the proper functioning of the overall composed system. An in-depth study of these interactions makes sense in the context of the formal development and verification of such systems. The interactions fall into two categories: synchronous and asynchronous communication. In synchronous communication, the transmission of a piece of information - the message - is instantaneous. Asynchronous communication, on the other hand, splits the transmission into a send operation and a receive operation. This makes the interleaving of other events possible and leads to new behaviours that may or may not be desirable. The asynchronous world is often viewed as a monolithic counterpart of the synchronous world. It actually comes in multiple models that provide a wide range of properties that can be studied and compared. This thesis focuses on communication models that order the delivery of messages: for instance, the "FIFO" models ensure that some messages are received in the order of their emission. We consider classic communication models from the literature as well as a few variations, and we highlight differences that are sometimes overlooked. First, we propose an abstract, logical, and homogeneous formalisation of the communication models and we establish a hierarchy that extends existing results. Second, we provide an operational approach with a tool that verifies the compatibility of compositions of peers. We mechanise this tool with the TLA+ specification language and its model checker TLC. The tool is designed in a modular fashion: the communicating peers, the temporal compatibility properties, and the communication models are specified independently. We rely on a set of uniform operational specifications of the communication models that are based on the concept of message history. We identify and prove the conditions under which they conform to the logical definitions, and thus show the tool is trustworthy. Third, we consider concrete specifications of the communication models that are often found in the literature. The models are thus classified in terms of ordering properties and according to the level of abstraction of the different specifications; the concept of refinement covers these two aspects. We model asynchronous point-to-point communication along several levels of refinement and then, with the Event-B method, we establish and prove all the refinements between the communication models and the alternative specifications of each given model. This work results in a detailed map one can use to develop a new model or find the one that best fits given needs. Finally, we explore ways to extend our work to multicast communication, which consists in sending messages to several recipients at once. In particular, we highlight the differences in the hierarchy of the models and how we modify our verification tool to handle this communication paradigm.
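    As an illustration of the kind of ordering property these models capture, here is a small check that a message history respects FIFO 1-1 delivery between each sender/receiver pair. This is a simplified sketch; the thesis works with logical definitions and TLA+/Event-B specifications, not this code:

```python
from collections import defaultdict

def is_fifo(history):
    """history: list of ("send" | "recv", sender, receiver, msg_id) events
    in global order. FIFO 1-1: between any given sender/receiver pair,
    messages are received in the order of their emission."""
    sent = defaultdict(list)      # (src, dst) -> msg ids in send order
    delivered = defaultdict(int)  # (src, dst) -> number delivered so far
    for event, src, dst, mid in history:
        if event == "send":
            sent[(src, dst)].append(mid)
        else:  # recv: must be the oldest not-yet-delivered message of this pair
            k = delivered[(src, dst)]
            if k >= len(sent[(src, dst)]) or sent[(src, dst)][k] != mid:
                return False
            delivered[(src, dst)] += 1
    return True

ok  = [("send","p","q",1), ("send","p","q",2), ("recv","p","q",1), ("recv","p","q",2)]
bad = [("send","p","q",1), ("send","p","q",2), ("recv","p","q",2), ("recv","p","q",1)]
print(is_fifo(ok), is_fifo(bad))  # True False
```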

    Reliability for exascale computing: system modelling and error mitigation for task-parallel HPC applications

    As high performance computing (HPC) systems continue to grow, their fault rate increases, and applications running on these systems have to deal with faults arriving on the order of hours or days. Furthermore, some studies for future Exascale systems predict fault rates on the order of minutes. As a result, efficient fault tolerance solutions are needed to tolerate frequent failures. A fault tolerance solution for future HPC and Exascale systems must be low-cost, efficient and highly scalable. It should have low overhead in fault-free execution and provide fast restart, because long-running applications are expected to experience many faults during their execution. Meanwhile, task-based dataflow parallel programming models (PMs) are becoming a popular paradigm in HPC applications at large scale; for instance, task-based dataflow parallelism has been adopted in OpenMP 4.0, the OmpSs PM, Argobots and Intel Threading Building Blocks. In this thesis we propose fault tolerance solutions for task-parallel dataflow HPC applications. Specifically, we first design and implement a checkpoint/restart and message-logging framework to recover from errors. We then develop performance models to investigate the benefits of our task-level frameworks when integrated with system-wide checkpointing. Moreover, we design and implement selective task replication mechanisms to detect and recover from silent data corruptions in task-parallel dataflow HPC applications. Finally, we introduce a runtime-based coding scheme to detect and recover from memory errors in these applications. Considering the span of all our schemes, they provide fairly high failure coverage, with both computation and memory protected against errors.
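    The general shape of task replication for silent-data-corruption detection is to execute a task twice, compare results, and re-execute on mismatch. A minimal sketch of that technique under the assumption of pure, deterministic tasks; the names are hypothetical and this is not the thesis's runtime:

```python
def run_replicated(task, *args, max_retries=3):
    """Detect silent data corruption by executing a pure task twice and
    comparing the results; on mismatch, retry, assuming the corruption
    is transient. Raises if results keep disagreeing."""
    for _ in range(max_retries):
        primary, replica = task(*args), task(*args)
        if primary == replica:       # results agree: accept the value
            return primary
    raise RuntimeError("persistent result mismatch: unrecoverable SDC")

# Example. In a real runtime, only tasks selected as critical would be
# replicated, to bound the overhead of duplicate execution.
print(run_replicated(lambda xs: sum(x * x for x in xs), [1, 2, 3]))  # 14
```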