9 research outputs found

    Constructing fail-controlled nodes for distributed systems: a software approach

    Get PDF
    PhD ThesisDesigning and implementing distributed systems which continue to provide specified services in the presence of processing site and communication failures is a difficult task. To facilitate their development, distributed systems have been built assuming that their underlying hardware components are Jail-controlled, i.e. present a well defined failure mode. However, if conventional hardware cannot provide the assumed failure mode, there is a need to build processing sites or nodes, and communication infra-structure that present the fail-controlled behaviour assumed. Coupling a number of redundant processors within a replicated node is a well known way of constructing fail-controlled nodes. Computation is replicated and executed simultaneously at each processor, and by employing suitable validation techniques to the outputs generated by processors (e.g. majority voting, comparison), outputs from faulty processors can be prevented from appearing at the application level. One way of constructing replicated nodes is by introducing hardwired mechanisms to couple replicated processors with specialised validation hardware circuits. Processors are tightly synchronised at the clock cycle level, and have their outputs validated by a reliable validation hardware. Another approach is to use software mechanisms to perform synchronisation of processors and validation of the outputs. The main advantage of hardware based nodes is the minimum performance overhead incurred. However, the introduction of special circuits may increase the complexity of the design tremendously. Further, every new microprocessor architecture requires considerable redesign overhead. Software based nodes do not present these problems, on the other hand, they introduce much bigger performance overheads to the system. In this thesis we investigate alternative ways of constructing efficient fail-controlled, software based replicated nodes. In particular, we present much more efficient order protocols, which are necessary for the implementation of these nodes. Our protocols, unlike others published to date, do not require processors' physical clocks to be explicitly synchronised. The main contribution of this thesis is the precise definition of the semantics of a software based Jail-silent node, along with its efficient design, implementation and performance evaluation.The Brazilian National Research Council (CNPq/Brasil)

    Sistemas de gerenciamento de aprendizagem: uma metodologia de avaliação

    No full text
    O mercado mundial de e-learning vem despertando interesse de grandes e pequenas empresas, dispostas a desenvolver tecnologias ou oferecer serviços variados. Das tecnologias disponíveis, as mais relevantes para o e-learning estão relacionadas à autoria de cursos e sistemas de gerenciamento de aprendizado os LMSs (Learning Management Systems). Um Learning Management System, é um software que controla o desenvolvimento, gerenciamento e acompanhamento de cursos de aprendizagem online. A maioria dos LMSs disponíveis no mercado possibilita acessar cursos online e outras atividades de aprendizagem; matricular alunos; coletar e armazenar dados sobre a atuação dos estudantes e administrar de estudantes e cursos. Existem atualmente no mercado diversos sistemas de gerenciamento de aprendizado, alguns desenvolvidos por empresas e outros por instituições de ensino ou pesquisa. Essa diversidade pode ser um agente complicador para a implantação de projetos de e-learning na fase de escolha do sistema de gerenciamento. Este artigo tem a intenção de propor uma metodologia de avaliação desses sistemas. A metodologia proposta apresenta um critério de pontuação e de pesos para as características dos produtos, gerando um resultado que pode ser utilizado para auxiliar o processo de aquisição de um desses produtos. Como forma de exemplificar e validar a aplicação dessa metodologia, mostraremos a nossa avaliação dos sistemas LearningSpace, TopClass, UniverSite e WebCT. A avaliação desses produtos foi realizada na Universidade Tiradentes como parte da implantação do projeto de elearning

    Efficient and robust adaptive consensus services based on oracles

    Get PDF
    Due to their fundamental role in the design of faulttolerant distributed systems, consensus protocols have been widely studied. Most of the research in this area has focused on providing ways for circumventing the impossibility of reaching consensus on a purely asynchronous system subject to failures. Of particular interest are the indulgent consensus protocols based upon weak failure detection oracles. Following the first works that were more concerned with the correctness of such protocols, performance issues related to them are now a topic that has gained considerable attention. In particular, a few studies have been conducted to analyze the impact that the quality of service of the underlying failure detection oracle has on the performance of consensus protocols. To achieve better performance, adaptive failure detectors have been proposed. Also, slowness oracles have been proposed to allow consensus protocols to adapt themselves to the changing conditions of the environment, enhancing their performance when there are substantial changes on the load to which the system is exposed. In this paper we further investigate the use of these oracles to design efficient consensus services. In particular, we provide efficient and robust implementations of slowness oracles based on techniques that have been previously used to implement adaptive failure detection oracles. Our experiments on a widearea distributed system show that by using a slowness oracle that is well matched with a failure detection oracle, one can achieve performance as much as 53.5% better than the alternative that does not use a slowness oracle

    Efficient and robust adaptive consensus services based on oracles

    Get PDF
    Due to their fundamental role in the design of faulttolerant distributed systems, consensus protocols have been widely studied. Most of the research in this area has focused on providing ways for circumventing the impossibility of reaching consensus on a purely asynchronous system subject to failures. Of particular interest are the indulgent consensus protocols based upon weak failure detection oracles. Following the first works that were more concerned with the correctness of such protocols, performance issues related to them are now a topic that has gained considerable attention. In particular, a few studies have been conducted to analyze the impact that the quality of service of the underlying failure detection oracle has on the performance of consensus protocols. To achieve better performance, adaptive failure detectors have been proposed. Also, slowness oracles have been proposed to allow consensus protocols to adapt themselves to the changing conditions of the environment, enhancing their performance when there are substantial changes on the load to which the system is exposed. In this paper we further investigate the use of these oracles to design efficient consensus services. In particular, we provide efficient and robust implementations of slowness oracles based on techniques that have been previously used to implement adaptive failure detection oracles. Our experiments on a widearea distributed system show that by using a slowness oracle that is well matched with a failure detection oracle, one can achieve performance as much as 53.5% better than the alternative that does not use a slowness oracle

    Trading Cycles for Information:

    No full text
    Scheduling independent tasks on heterogeneous environments, like grids, is not trivial. To make a good scheduling plan on this kind of environments, the scheduler usually needs some information such as host speed, host load, and task size. This kind of information is not always available and is often difficult to obtain. In this paper we propose a scheduling approach that does not use any kind of information but still delivers good performance. Our approach uses task replication to cope with the dynamic and heterogeneous nature of grids without depending on any information about machines or tasks. Our results show that task replication can deliver good and stable performance at the expense of additional resource consumption. By limiting replication, however, additional resource consumption can be controlled with little effect on performance
    corecore