
    MASA-StarPU: Parallel Sequence Comparison with Multiple Scheduling Policies and Pruning

    Sequence comparison tools based on the Smith-Waterman (SW) algorithm provide the optimal result but have high execution times when the compared sequences are long, since a huge dynamic programming (DP) matrix must be computed. Block pruning is an optimization that skips the computation of parts of the DP matrix and can considerably reduce the execution time when the compared sequences are similar. However, the task graph that results from block pruning is dynamic and irregular, and since different pruning scenarios lead to different pruning shapes, we advocate that no single scheduling policy behaves best for all scenarios. This paper proposes MASA-StarPU, a sequence aligner that integrates the domain-specific framework MASA with the generic programming environment StarPU, creating a tool that combines the benefits of StarPU (i.e., multiple task scheduling policies) and MASA (i.e., fast sequence alignment). MASA-StarPU was executed on two different multicore platforms, and the results show that a bad choice of scheduling policy may have a great impact on performance: for instance, using 24 cores, the 5M x 5M comparison took 1484 s with the dmdas policy, whereas the same comparison took 3601 s with lws. We also show that no scheduling policy behaves best for all scenarios.
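The pruning criterion that makes the task graph irregular can be illustrated with a simplified sketch: a block of the DP matrix is skipped when even a perfect match over all of its remaining cells could not beat the best score found so far. The names and the match-score constant below are illustrative assumptions, not MASA's actual API.

```python
MATCH = 1  # score gained per perfectly matched base pair (assumed)

def is_prunable(block_max, cells_remaining, best_so_far):
    """A block can be skipped when even a perfect match over every
    remaining cell could not improve on the best score found so far."""
    return block_max + cells_remaining * MATCH < best_so_far
```

Because similar sequences raise the best-so-far score quickly, many blocks satisfy this test, which is why pruning pays off most when the compared sequences are alike; the irregular shape of the pruned region is what makes the resulting task graph dynamic.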

    A machine consciousness approach to the control of urban traffic lights

    Advisor: Ricardo Ribeiro Gudwin. Doctoral thesis - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. In this work, we present a distributed cognitive architecture used to control the traffic in an urban network. This architecture relies on a machine consciousness approach - Global Workspace Theory - in order to use competition and broadcast, allowing a group of local traffic controllers to interact and resulting in better group performance. The main idea is that local controllers usually perform a purely reactive behavior, defining the times of red and green lights according only to local information. These local controllers compete in order to define which of them is experiencing the most critical traffic situation. The controller in the worst condition gains access to the global workspace and then broadcasts its condition (and its location) to all other controllers, asking for their help in dealing with its situation. This call from the controller accessing the global workspace interferes with the reactive local behavior of those controllers that have some chance of helping the critical one, by containing traffic in its direction. This group behavior, coordinated by the global workspace strategy, turns the once purely reactive behavior into a kind of deliberative one. We show that this strategy is capable of improving the overall mean travel time of vehicles flowing through the urban network.
A consistent performance gain with the "Machine Consciousness" traffic signal controller, ranging from around 10% to more than 20%, was observed over the whole simulation time and across different simulated scenarios, when compared to the "Parallel Reactive" controller without the artificial consciousness mechanism. This provides evidence for the hypothesis that an artificial consciousness mechanism, which serially broadcasts content to automatic processes, can benefit a global task performed by a society of parallel agents working together toward a common goal. (Doctorate in Computer Engineering, Doutor em Engenharia Elétrica; grant 153206/2010-1; CNPq, CAPES, FAPES.)
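The competition/broadcast cycle described above can be sketched in a few lines: every controller reports how critical its local situation is, the worst one wins access to the workspace, and its condition is broadcast to all the others. The controller structure and field names below are assumptions for illustration, not the thesis' actual implementation.

```python
def global_workspace_cycle(controllers):
    """One Global Workspace cycle: the controller reporting the most
    critical traffic situation wins access to the workspace; its
    condition is then broadcast to every other controller, which may
    use it to adapt its otherwise purely reactive behavior."""
    winner = max(controllers, key=lambda c: c["criticality"])
    for c in controllers:
        if c is not winner:
            c.setdefault("inbox", []).append(
                {"from": winner["id"], "criticality": winner["criticality"]}
            )
    return winner
```

The serial bottleneck is deliberate: only one controller's state is broadcast per cycle, which is what turns a set of independent reactive agents into a loosely coordinated deliberative group.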

    Collective Mind, Part II: technical report

    Nowadays, engineers often have to develop software without even knowing which hardware it will eventually run on, across numerous mobile phones, tablets, laptops, data centers, supercomputers and cloud services. Unfortunately, optimizing compilers often fail to produce fast and energy-efficient code across all hardware configurations. In this technical report, we present the first practical, collaborative, publicly available, Wikipedia-inspired solution to this problem that we are aware of, based on our recent Collective Mind Infrastructure and Repository.

    MASA-StarPU: a strategy with multiple task scheduling policies for sequence alignment with pruning

    Master's dissertation - Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2020. The comparison of biological sequences is an important task performed frequently in the genetic analysis of organisms. Algorithms that perform this procedure with an exact method have quadratic time complexity, demanding high computational power and, consequently, parallelization techniques. Many of the solutions proposed to address this problem use accelerators such as GPUs and FPGAs, but few use only CPUs. MASA is a domain-specific platform for performing biological sequence comparison. One of its greatest virtues is the block pruning optimization, which prunes the dynamic programming matrix at run time, accelerating the computation but introducing load imbalance. StarPU is a generic parallel programming tool that provides several dynamic task scheduling policies. In this work, we propose and evaluate MASA-StarPU, a tool that uses the MASA structure to carry out the comparison of biological sequences and StarPU's scheduling policies to accelerate the computation and mitigate the load imbalance. MASA-StarPU was tested in two environments, evaluating pairs of DNA sequences whose sizes vary between 10 KBP (thousands of base pairs) and 47 MBP (millions of base pairs), with the task scheduling policies evaluated in different cases. When compared to other CPU-only solutions in the literature, MASA-StarPU obtained the best result for all comparisons, reaching a maximum of 18.4 GCUPS (billions of cell updates per second).
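The GCUPS metric quoted above is simply the number of DP-matrix cells (the product of the two sequence lengths) divided by the execution time, in billions:

```python
def gcups(rows, cols, seconds):
    """GCUPS = billions of DP-matrix cell updates per second for a
    rows x cols sequence comparison that ran in `seconds`."""
    return (rows * cols) / seconds / 1e9
```

For example, comparing two 10,000-base sequences in 0.01 s corresponds to 10 GCUPS; the quadratic growth of the cell count is why long comparisons (tens of MBP) demand parallelization.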

    A discrete filled function method for the design of FIR filters with signed-powers-of-two coefficients


    A Conceptual and Computational Model of Moral Decision Making in Human and Artificial Agents

    Recently there has been a resurgence of interest in general, comprehensive models of human cognition. Such models aim to explain higher order cognitive faculties, such as deliberation and planning. Given a computational representation, the validity of these models can be tested in computer simulations such as software agents or embodied robots. The push to implement computational models of this kind has created the field of Artificial General Intelligence, or AGI. Moral decision making is arguably one of the most challenging tasks for computational approaches to higher order cognition. The need for increasingly autonomous artificial agents to factor moral considerations into their choices and actions has given rise to another new field of inquiry variously known as Machine Morality, Machine Ethics, Roboethics or Friendly AI. In this paper we discuss how LIDA, an AGI model of human cognition, can be adapted to model both affective and rational features of moral decision making. Using the LIDA model we will demonstrate how moral decisions can be made in many domains using the same mechanisms that enable general decision making. Comprehensive models of human cognition typically aim for compatibility with recent research in the cognitive and neural sciences. Global Workspace Theory (GWT), proposed by the neuropsychologist Bernard Baars (1988), is a highly regarded model of human cognition that is currently being computationally instantiated in several software implementations. LIDA (Franklin et al. 2005) is one such computational implementation. LIDA is both a set of computational tools and an underlying model of human cognition, which provides mechanisms that are capable of explaining how an agent’s selection of its next action arises from bottom-up collection of sensory data and top-down processes for making sense of its current situation. 
We will describe how the LIDA model helps integrate emotions into the human decision-making process, and elucidate a process whereby an agent can work through an ethical problem to reach a solution that takes ethically relevant factors into account.

    Continuum: an architecture for user evolvable collaborative virtual environments

    Continuum is a software platform for collaborative virtual environments. Continuum's architecture supplies a world model and defines how to combine object state, behavior code, and resource data into this single shared structure. The system frees distributed users from the constraints of monolithic, centralized virtual world architectures and instead allows individual users to extend and evolve the virtual world by creating and controlling their own individual pieces of the larger world model. The architecture provides support for data distribution, code management, resource management, and rapid deployment through standardized viewers. This work not only defines the architecture but also includes a proven implementation and the associated development tools for creating these worlds.

    An Efficient NoC-based Framework To Improve Dataflow Thread Management At Runtime

    This doctoral thesis focuses on how application threads based on the dataflow execution model can be managed at the Network-on-Chip (NoC) level. The roots of the dataflow execution model date back to the early 1970s. Applications adhering to this execution model follow a simple producer-consumer communication scheme for synchronising parallel thread-related activities. In a dataflow execution environment, a thread can run if and only if all its required inputs are available. Applications running on large and complex computing environments can significantly benefit from adopting the dataflow model. The first part of the thesis focuses on the thread distribution mechanism; it shows how a scalable hash-based thread distribution mechanism can be implemented at the router level with low overhead. To further enhance the support, a tool to monitor the status of dataflow threads and a simple functional model are also incorporated into the design. Next, a software-defined NoC is proposed that manages the distribution of dataflow threads by exploiting its reconfigurability. The second part of this work focuses on the NoC microarchitecture level. The traditional 2D-mesh topology is combined with a standard ring to examine how such a hybrid network topology can outperform the traditional one. Finally, a mixed-integer linear programming based analytical model is proposed to verify whether the mapping of application threads onto the free cores is optimal. The proposed mathematical model can be used as a yardstick to verify the solution quality of the newly developed mapping policy. Providing a complete low-level framework for dataflow thread execution with better resource and power management is not trivial; this work can, however, be considered a primary framework on which improvements can be built.
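The dataflow firing rule above (a thread runs only when all of its inputs are available) is commonly implemented with a per-thread synchronization counter, and a hash of the thread identifier lets any router locate that counter without central coordination. The class and function names below are illustrative assumptions, not the thesis' actual hardware design.

```python
class DataflowThread:
    """A thread that fires only once all of its declared inputs arrive."""
    def __init__(self, name, num_inputs):
        self.name = name
        self.pending = num_inputs  # synchronization counter

    def write_input(self, ready_queue):
        """Called by a producer delivering one input; the consumer is
        enqueued for execution once its counter reaches zero."""
        self.pending -= 1
        if self.pending == 0:
            ready_queue.append(self)

def home_node(thread_id, num_nodes):
    """Hash-based thread distribution: every node computes the same owner
    for a given thread id, so no central table is needed."""
    return hash(thread_id) % num_nodes
```

In the thesis this bookkeeping is pushed down into the NoC routers themselves, which is what keeps the distribution mechanism scalable and low-overhead.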

    Multi-GPU support on the marrow algorithmic skeleton framework

    Dissertation for the degree of Master in Informatics Engineering. With the proliferation of general-purpose GPUs, workload parallelization and data-transfer optimization became an increasing concern. The natural evolution from using a single GPU is multiplying the number of available processors, which presents new challenges, such as tuning workload decomposition and load balancing when dealing with heterogeneous systems. Higher-level programming is a very important asset in a multi-GPU environment, due to the complexity inherent in the currently used GPGPU APIs (OpenCL and CUDA), because of their low-level nature and code overhead. This can be obtained by introducing an abstraction layer, which has the advantage of enabling implicit optimizations and orchestrations, such as a transparent load balancing mechanism, and of reducing explicit code overhead. Algorithmic skeletons, previously used in cluster environments, have recently been adapted to the GPGPU context. Skeletons abstract most sources of code overhead by defining computation patterns of commonly used algorithms. The Marrow algorithmic skeleton library is one of these, taking advantage of the abstractions to automate the orchestration needed for efficient GPU execution. This thesis proposes extending Marrow to leverage algorithmic skeletons for the modular and efficient programming of multiple heterogeneous GPUs within a single machine. We were able to achieve a good balance between simplicity of the programming model and performance, obtaining good scalability when using multiple GPUs, with an efficient load distribution, although at the price of some overhead when using a single GPU. (Projects PTDC/EIA-EIA/102579/2008 and PTDC/EIA-EIA/111518/200.)
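The core of the multi-GPU extension described above is a weighted workload decomposition: a map skeleton splits its input into one partition per device, sized by a relative device weight, and applies the user function to each partition. Python stands in here for Marrow's C++/OpenCL implementation, and all names are assumptions.

```python
def partition(data, weights):
    """Return one contiguous slice of `data` per weight, with sizes
    proportional to the weights (the last boundary is exactly len(data),
    so no element is dropped)."""
    total, acc, bounds = sum(weights), 0, [0]
    for w in weights:
        acc += w
        bounds.append(round(len(data) * acc / total))
    return [data[bounds[i]:bounds[i + 1]] for i in range(len(weights))]

def map_skeleton(func, data, weights):
    """Apply `func` elementwise to each partition; in a real runtime each
    partition would be dispatched to a different GPU."""
    return [[func(x) for x in part] for part in partition(data, weights)]
```

Uneven weights model heterogeneous devices, e.g. `weights=[3, 1]` gives a faster GPU three quarters of the input; tuning these weights is exactly the load-balancing challenge the dissertation discusses.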

    Portability and performance in heterogeneous many core Systems

    Master's dissertation in Informatics. Current computing systems have a multiplicity of computational resources with different architectures, such as multi-core CPUs and GPUs. These platforms are known as heterogeneous many-core systems (HMS), and as computational resources evolve they offer more parallelism as well as becoming more heterogeneous. Exploiting these devices requires the programmer to be aware of the multiplicity of associated architectures, computing models and development frameworks. Portability issues, disjoint memory address spaces, work distribution and irregular workload patterns are major examples of what must be tackled in order to efficiently exploit the computational resources of an HMS. This dissertation's goal is to design and evaluate a base architecture that enables the identification and preliminary evaluation of the potential bottlenecks and limitations of a runtime system that addresses HMS. It proposes a runtime system that eases the programmer's burden of handling all the devices available in a heterogeneous system. The runtime provides a programming and execution model with a unified address space managed by a data management system. An API is proposed that enables the programmer to express applications and data in an intuitive way. Four different scheduling approaches that combine different data partitioning mechanisms with different work assignment policies are evaluated, and a performance model is used to provide performance insights to the scheduler. The runtime's efficiency was evaluated with three different applications - matrix multiplication, image convolution and an n-body Barnes-Hut simulation - running on multicore CPUs and GPUs. In terms of productivity the results look promising; however, combining scheduling and data partitioning revealed some inefficiencies that compromise load balancing and need to be revised, as does the data management system, which plays a crucial role in such systems.
Performance-model-driven decisions were also evaluated, revealing that the accuracy of the performance model is also a compromising component.
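One way a performance model can drive work assignment, as evaluated above, is a greedy policy that places each chunk on the device with the earliest predicted finish time. This is a minimal sketch, not the dissertation's actual scheduler; `predict_time` stands in for the performance model and all names are assumptions.

```python
def assign(chunks, devices, predict_time):
    """Greedy model-driven assignment: each chunk goes to the device whose
    predicted finish time (current load + model estimate) is lowest.
    `predict_time(device, chunk)` is the performance model's estimate."""
    load = {d: 0.0 for d in devices}   # predicted busy time per device
    plan = {d: [] for d in devices}
    for c in chunks:
        best = min(devices, key=lambda d: load[d] + predict_time(d, c))
        plan[best].append(c)
        load[best] += predict_time(best, c)
    return plan
```

An inaccurate `predict_time` skews every placement decision, which matches the observation above that the accuracy of the performance model is itself a compromising component.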