    On the adequacy of lightweight thread approaches for high-level parallel programming models

    High-level parallel programming models (PMs) are becoming crucial for exploiting the computational power of current on-node multi-threaded parallelism. The most popular PMs, such as OpenMP or OmpSs, are directive-based: the complexity of the hardware is hidden by the underlying runtime system, improving coding productivity. Implementations of OpenMP usually rely on POSIX threads (pthreads), which offer excellent performance for coarse-grained parallelism and a perfect match with current hardware. OmpSs is a task-oriented PM based on an ad hoc runtime solution called Nanos++; it is the precursor of the tasking parallelism in the OpenMP specification. A recent trend in runtimes and applications points to leveraging massive on-node parallelism in conjunction with fine-grained and dynamic scheduling paradigms. In this paper we analyze the behavior of the OpenMP and OmpSs PMs on top of the recently emerged Generic Lightweight Threads (GLT) API. GLT exposes a common API for lightweight thread (LWT) libraries, making it possible to run the same application over different native LWT solutions. We describe the design details of these high-level PMs implemented on top of GLT and analyze different scenarios to assess where the use of LWTs may benefit application performance. Our work reveals the scenarios where LWTs outperform pthread-based solutions and compares the performance of an ad hoc solution with that of a generic implementation.
    The researchers from the Universitat Jaume I de Castelló were supported by project TIN2014-53495-R of the MINECO, Spain and FEDER, Spain, and by the Generalitat Valenciana fellowship programme Vali+d 2015, Spain. Antonio J. Peña is co-financed by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva fellowship number IJCI-2015-23266. This work was partially supported by the U.S. Dept. of Energy, Office of Science, Office of Advanced Scientific Computing Research (SC-21), under contract DE-AC02-06CH11357. We gratefully acknowledge Enrique S. Quintana-Ortí (Universitat Jaume I) and Sangmin Seo (Samsung Corp.) for their advice on this work, and the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory.
    Peer Reviewed. Postprint (author's final draft).
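    The difference between kernel-managed pthreads and lightweight threads can be illustrated with a minimal sketch of cooperative user-level threading. This is an analogy under stated assumptions, not GLT's API: Python generators stand in for LWTs, and the hypothetical `Scheduler` class (with `spawn` and `run`) plays the role of a runtime that multiplexes many cooperatively scheduled tasks onto a single OS thread. Each `yield` is a voluntary context switch, which is why creating and switching LWTs is typically cheaper than doing the same with pthreads.

```python
from collections import deque
from typing import Callable, Generator, List

# Assumption: an LWT is modeled as a generator; each `yield` is a voluntary context switch.
ULT = Generator[None, None, None]


class Scheduler:
    """Hypothetical round-robin scheduler for cooperative lightweight threads.

    All LWTs run on the calling OS thread; no kernel thread is created per task.
    """

    def __init__(self) -> None:
        self.ready: deque = deque()

    def spawn(self, fn: Callable[..., ULT], *args) -> None:
        # Creating an LWT only allocates a generator object, not an OS thread.
        self.ready.append(fn(*args))

    def run(self) -> None:
        # Resume each LWT until it yields or finishes; requeue the ones that yielded.
        while self.ready:
            ult = self.ready.popleft()
            try:
                next(ult)
            except StopIteration:
                continue  # this LWT has finished
            self.ready.append(ult)


def worker(name: str, steps: int, log: List[str]) -> ULT:
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand the processor back to the scheduler


if __name__ == "__main__":
    log: List[str] = []
    sched = Scheduler()
    sched.spawn(worker, "a", 2, log)
    sched.spawn(worker, "b", 3, log)
    sched.run()
    print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

    The interleaving in the output comes entirely from the explicit `yield` points; a preemptive pthread scheduler would instead decide switch points itself.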

    Sensor networks and distributed CSP: communication, computation and complexity

    We introduce SensorDCSP, a naturally distributed benchmark based on a real-world application that arises in the context of networked distributed systems. To study the performance of Distributed CSP (DisCSP) algorithms in a truly distributed setting, we use a discrete-event network simulator, which allows us to model the impact of different network traffic conditions on the performance of the algorithms. We consider two complete DisCSP algorithms, asynchronous backtracking (ABT) and asynchronous weak commitment search (AWC), and compare their performance on both satisfiable and unsatisfiable instances of SensorDCSP. We found that random delays (due to network traffic or, in some cases, actively introduced by the agents) combined with a dynamic decentralized restart strategy can improve the performance of DisCSP algorithms. In addition, we introduce GSensorDCSP, a plane-embedded version of SensorDCSP that is closely related to various real-life dynamic tracking systems. We perform both an analytical and an empirical study of this benchmark domain. In particular, this benchmark allows us to study the attractiveness of solution repairing for solving a sequence of DisCSPs that represent the dynamic tracking of a set of moving objects.
    This work was supported in part by AFOSR (F49620-01-1-0076, Intelligent Information Systems Institute and MURI F49620-01-1-0361), CICYT (TIC2001-1577-C03-03 and TIC2003-00950), DARPA (F30602-00-2-0530), an NSF CAREER award (IIS-9734128), and an Alfred P. Sloan Research Fellowship. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the US Government.
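    The benefit of randomization plus restarts can be illustrated with a small centralized sketch. It assumes a simplified SensorDCSP-like model (not the paper's exact constraints): each target must be covered by `k` distinct sensors that can see it, and each sensor serves at most one target. Randomized backtracking with a backtrack cutoff and restarts stands in for the paper's decentralized restart strategy; ABT and AWC are not reproduced, and `solve_once`, `solve_with_restarts`, `budget`, and `max_restarts` are hypothetical names.

```python
import random
from itertools import combinations
from typing import Dict, Optional, Set, Tuple

Assignment = Dict[str, Tuple[int, ...]]


class Cutoff(Exception):
    """Raised when one randomized search run exceeds its backtrack budget."""


def solve_once(visible: Dict[str, Set[int]], k: int, rng: random.Random,
               budget: int) -> Optional[Assignment]:
    """One randomized backtracking run; raises Cutoff after `budget` backtracks."""
    targets = list(visible)
    rng.shuffle(targets)  # randomized variable (target) order
    used: Set[int] = set()
    assignment: Assignment = {}
    backtracks = 0

    def search(i: int) -> bool:
        nonlocal backtracks
        if i == len(targets):
            return True
        t = targets[i]
        options = list(combinations(sorted(visible[t] - used), k))
        rng.shuffle(options)  # randomized value (sensor tuple) order
        for sensors in options:
            assignment[t] = sensors
            used.update(sensors)
            if search(i + 1):
                return True
            used.difference_update(sensors)  # undo and try the next option
            del assignment[t]
            backtracks += 1
            if backtracks > budget:
                raise Cutoff
        return False

    return dict(assignment) if search(0) else None


def solve_with_restarts(visible: Dict[str, Set[int]], k: int, seed: int = 0,
                        budget: int = 10, max_restarts: int = 100) -> Optional[Assignment]:
    """Restart with fresh random choices whenever a run hits the cutoff.

    Returns None if a full run proved the instance unsatisfiable or if every
    restart was cut off (in which case the answer is unknown).
    """
    rng = random.Random(seed)
    for _ in range(max_restarts):
        try:
            return solve_once(visible, k, rng, budget)
        except Cutoff:
            continue
    return None


if __name__ == "__main__":
    visible = {"t1": {0, 1, 2, 3}, "t2": {2, 3, 4, 5}}
    print(solve_with_restarts(visible, k=3, seed=1))
```

    The cutoff keeps a single unlucky run from exploring a large unproductive subtree; the paper's point is that randomness (including network-induced delays) can play a similar role in the distributed setting.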

    Fibers are not (P)Threads: The Case for Loose Coupling of Asynchronous Programming Models and MPI Through Continuations

    Asynchronous programming models (APMs) are gaining traction, allowing applications to expose the available concurrency to a runtime system tasked with coordinating the execution. While MPI has long provided support for multi-threaded communication and non-blocking operations, it falls short of adequately supporting APMs, as correctly and efficiently handling MPI communication in different models is still a challenge. Meanwhile, new low-level implementations of lightweight, cooperatively scheduled execution contexts (fibers, also known as user-level threads (ULTs)) are meant to serve as a basis for higher-level APMs, and their integration in MPI implementations has been proposed as a replacement for traditional POSIX thread support to alleviate these challenges. In this paper, we first establish a taxonomy in an attempt to clearly distinguish different concepts in the parallel software stack. We argue that the proposed tight integration of fiber implementations with MPI is neither warranted nor beneficial and is instead detrimental to the goal of MPI being a portable communication abstraction. We propose MPI Continuations as an extension to the MPI standard that provides callback-based notifications on completed operations, leading to a clear separation of concerns through a loose coupling mechanism between MPI and APMs. We show that this interface is flexible and interacts well with different APMs, namely OpenMP detached tasks, OmpSs-2, and Argobots.
    Comment: 12 pages, 7 figures. Published in the proceedings of EuroMPI/USA '20, September 21-24, 2020, Austin, TX, USA.
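    The loose coupling described above (the communication layer only promises to invoke a callback when an operation completes, and the APM decides what that callback does) can be sketched as follows. This is a hypothetical model in Python, not the proposed MPI Continuations interface: `Operation`, `ContinuationRequest`, `attach`, and `test` are illustrative names, and completion is simulated by setting a flag.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Operation:
    """Stand-in for a non-blocking communication request (hypothetical)."""
    name: str
    done: bool = False


Callback = Callable[[Operation], None]


@dataclass
class ContinuationRequest:
    """Hypothetical registry that runs a callback once its operation completes.

    The communication side only invokes callbacks; whether a callback releases a
    task, fulfills an event, or wakes a fiber is decided by the APM.
    """
    pending: List[Tuple[Operation, Callback]] = field(default_factory=list)

    def attach(self, op: Operation, callback: Callback) -> None:
        self.pending.append((op, callback))

    def test(self) -> int:
        """Poll once: fire callbacks of completed operations, return how many fired."""
        still_pending: List[Tuple[Operation, Callback]] = []
        fired = 0
        for op, cb in self.pending:
            if op.done:
                cb(op)
                fired += 1
            else:
                still_pending.append((op, cb))
        self.pending = still_pending
        return fired


if __name__ == "__main__":
    released: List[str] = []
    cont = ContinuationRequest()
    send, recv = Operation("send"), Operation("recv")
    # The APM registers "release the dependent task" as the continuation.
    cont.attach(send, lambda op: released.append(op.name))
    cont.attach(recv, lambda op: released.append(op.name))
    recv.done = True  # pretend the network completed the receive
    print(cont.test(), released)  # 1 ['recv']
```

    Neither side needs to know how the other schedules work, which is the separation of concerns the abstract argues for over fiber-aware MPI implementations.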

    A Unified Framework for Parallel Anisotropic Mesh Adaptation

    Finite-element methods are a critical component of the design and analysis procedures of many (bio-)engineering applications. Mesh adaptation is one of the most crucial components, since it discretizes the physics of the application at a relatively low cost to the solver. Highly scalable parallel mesh adaptation methods for High-Performance Computing (HPC) are essential to meet the ever-growing demand for higher-fidelity simulations. Moreover, the continuous growth in the complexity of HPC systems requires a systematic approach to exploit their full potential. Anisotropic mesh adaptation captures features of the solution at multiple scales while minimizing the required number of elements. However, it also introduces new challenges on top of mesh generation. In addition, the increased complexity of the targeted cases requires moving away from traditional surface-constrained approaches toward approaches that use CAD (Computer-Aided Design) kernels. Alongside these functionality requirements is the need to take advantage of ubiquitous multi-core machines. More importantly, the parallel implementation needs to handle the ever-increasing complexity of the mesh adaptation code. In this work, we develop a parallel mesh adaptation method that uses a metric-based approach for generating anisotropic meshes. We also enhance our method by interfacing it with a CAD kernel, enabling its use on complex geometries. We evaluate our method both with fixed-resolution benchmarks and within a simulation pipeline in which the resolution of the discretization increases incrementally. Using the Telescopic Approach for scalable mesh generation as a guide, we propose a node-level (multi-core) parallel method for mesh adaptation that is expected to scale efficiently to upcoming exascale machines.
    To facilitate an effective implementation, we introduce an abstract layer between the application and the runtime system that enables the use of task-based parallelism for concurrent mesh operations. Our evaluation indicates results comparable to state-of-the-art methods for fixed-resolution meshes in terms of both performance and quality. The integration with an adaptive pipeline offers promising results regarding the ability of the proposed method to function as part of an adaptive simulation. Moreover, our abstract tasking layer allows the separation of different aspects of the implementation without any impact on the functionality of the method.
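    The idea of an abstract tasking layer between the application and the runtime can be sketched as an interface that the mesh code depends on, with interchangeable backends. Everything here is hypothetical: `TaskLayer`, `SerialLayer`, `ThreadPoolLayer`, and the toy `refine_edges` operation are illustrative names, and the tasks are assumed independent. Real concurrent mesh operations must also resolve conflicts between neighboring regions, which this sketch ignores.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class TaskLayer(ABC):
    """Hypothetical tasking interface: application code depends only on this class."""

    @abstractmethod
    def map_tasks(self, fn: Callable[[T], R], items: Sequence[T]) -> List[R]:
        """Run fn on every item as an independent task; return results in input order."""


class SerialLayer(TaskLayer):
    """Backend that runs tasks one after another (useful for debugging)."""

    def map_tasks(self, fn, items):
        return [fn(x) for x in items]


class ThreadPoolLayer(TaskLayer):
    """Backend that runs tasks on a pool of worker threads."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def map_tasks(self, fn, items):
        # Executor.map yields results in the same order as the inputs.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(fn, items))


def refine_edges(layer: TaskLayer, edge_lengths: Sequence[float], target: float) -> List[int]:
    """Toy mesh operation: number of halvings each edge needs to reach `target` length."""

    def splits(length: float) -> int:
        n = 0
        while length > target:
            length /= 2
            n += 1
        return n

    return layer.map_tasks(splits, edge_lengths)


if __name__ == "__main__":
    edges = [1.0, 0.3, 2.5]
    print(refine_edges(SerialLayer(), edges, 0.5))      # [1, 0, 3]
    print(refine_edges(ThreadPoolLayer(2), edges, 0.5))  # [1, 0, 3]
```

    Because `refine_edges` never names a concrete backend, the runtime can be swapped without touching the mesh logic, which mirrors the separation of concerns the abstract attributes to its tasking layer.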

    A TrustZone-assisted secure silicon on a co-design framework

    Master's dissertation in Industrial Electronics and Computer Engineering.
    For a long time, embedded systems were single-purpose, closed systems, characterized by hardware resource constraints and real-time requirements. Nowadays, their functionality is ever-growing, coupled with increasing complexity and heterogeneity. Embedded applications increasingly demand the use of general-purpose operating systems (GPOSs) to handle operator interfaces and general-purpose computing tasks, while still meeting strict timing requirements. Virtualization, which enables multiple operating systems (OSs) to run on top of the same hardware platform, is gaining momentum in the embedded systems arena, driven by the growing interest in consolidating and isolating multiple heterogeneous environments. The penalties incurred by classic virtualization approaches are pushing research toward hardware-assisted solutions. Among the existing commercial off-the-shelf (COTS) technologies for virtualization, ARM TrustZone is gaining momentum due to the dominance and lower cost of TrustZone-enabled processors. Programmable systems-on-chip (SoCs) are becoming leading players in the embedded systems space, because the combination of a plethora of hard resources with programmable logic enables the efficient implementation of systems that fit the heterogeneous nature of embedded applications. Moreover, novel disruptive approaches use field-programmable gate array (FPGA) technology to enhance virtualization mechanisms. This master's thesis proposes a hardware-software co-design framework that makes it more economical to meet the requirements of the new generation of embedded systems. ARM TrustZone is exploited to implement the root of trust of a virtualization-based architecture that allows a GPOS to run side by side with a real-time OS (RTOS). RTOS services are offloaded to hardware to improve both performance and determinism. Instead of focusing on a specific application, the goal is to provide a complete framework, specifically tailored to Zynq-based devices, that developers can use to accelerate a range of distinct applications across different embedded industries.
    For many years, embedded systems were simple, single-purpose systems characterized by limited hardware resources and real-time requirements. Today, the number of features is growing, as is the degree of complexity and heterogeneity. Embedded applications increasingly require general-purpose operating systems (GPOSs) to handle graphical interfaces and general-purpose computing tasks, yet their primary real-time requirements remain. Virtualization allows several operating systems to run on the same hardware platform. Driven by the growing interest in consolidating and isolating multiple heterogeneous environments, virtualization has gained increasing relevance in the embedded systems domain. The drawbacks of classic virtualization approaches are steering research toward hardware-assisted solutions. Among existing commercial technologies, ARM TrustZone is gaining considerable relevance due to the dominance and lower cost of the processors that support it. Hybrid platforms, which combine processors with programmable logic, are increasingly common in embedded systems because they provide a large set of resources that suit the heterogeneous nature of current systems. In addition, recent solutions use FPGA technology to improve virtualization mechanisms. This dissertation proposes a hardware-software framework to meet the requirements of the new generation of embedded systems. TrustZone technology is exploited to implement an architecture that allows a GPOS to run side by side with a real-time operating system (RTOS). The services provided by the RTOS are migrated to hardware to improve the OS's performance and determinism. Rather than focusing on a specific application, the goal is to provide a framework specifically tailored to Zynq system-on-chip devices that developers can use to accelerate a wide range of applications in different sectors.

    Application of a proposed TLS model in a Lean Productive System

    The current market, which is becoming increasingly stringent, forces companies to continuously seek innovation and improvement of their processes and products as a way to remain competitive and gain strategic advantages. Due to the global economic crisis, more and more companies are adopting new management methodologies that allow better performance in terms of revenue, profit, and cost reduction. This article proposes an integrated TOC (Theory of Constraints), Lean and Six Sigma (TLS) model aimed at continuously improving a production system, although it is flexible enough to be applied to other kinds of systems. The model synergistically integrates the best practices found in existing TOC, Lean, and Six Sigma models. The proposed model derives mainly from Eliyahu Goldratt's TOC model of the “5 focus steps” and from the TLS model “Ultimate Improvement Cycle” developed by Bob Sproull. The proposed TLS model was tested at a major Portuguese manufacturer. A first continuous improvement cycle was completed and a second cycle began. The main results obtained from implementing the TLS model were extremely satisfactory.

    Discrete and Continuous Optimization Methods for Self-Organization in Small Cell Networks - Models and Algorithms

    Self-organization is discussed in terms of distributed computational methods and algorithms for resource allocation in cellular networks. To develop algorithms for different self-organization problems pertinent to small cell networks (SCNs), a number of concepts from discrete and continuous optimization theory are employed. Self-organized resource allocation problems such as physical cell identifier (PCI) assignment and primary component carrier selection are formulated as discrete optimization problems, which are solved with distributed graph coloring and constraint satisfaction algorithms. PCI assignment is also discussed for multi-operator heterogeneous networks. Furthermore, different variants of simulated annealing are proposed for solving a graph coloring formulation of the orthogonal resource allocation problem. In the continuous optimization domain, a network utility maximization approach is considered for solving different resource allocation problems. Network synchronization is addressed using greedy and gradient search algorithms. Primal and dual decomposition are discussed for transmit power and scheduling weight optimization under a network-wide power constraint. Joint optimization over transmit powers and multi-user scheduling weights is considered in a multi-carrier SCN, for both maximum rate and proportional-fair rate utilities. This formulation is extended to multiple-input multiple-output (MIMO) SCNs, where, apart from transmit powers and multi-user scheduling weights, the transmit precoders are also optimized for a generic alpha-fair utility function. Optimizing network resources over multiple degrees of freedom is particularly effective in reducing mutual interference, leading to significant gains in network utility. Finally, an alternative formulation of transmit power allocation is considered, in which the network transmit power is minimized subject to the users' data rate constraints.
    Thus, network resource allocation algorithms inspired by optimization theory constitute an effective approach to self-organization in contemporary as well as future cellular networks.
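    The graph coloring formulation of PCI or orthogonal resource assignment can be sketched with a basic simulated annealing loop. Assumptions not taken from the work itself: cells are graph vertices, an edge joins two cells whose identifiers or resources would collide, and each of the `k` colors is one PCI or resource block. This is a single centralized run that minimizes the number of conflicting edges under geometric cooling; the distributed variants and annealing schedules studied in the work are not reproduced, and `anneal_coloring`, `t0`, `alpha`, and `steps` are hypothetical names.

```python
import math
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def conflicts(coloring: List[int], edges: List[Edge]) -> int:
    """Number of interfering neighbor pairs that share a color (PCI/resource)."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])


def anneal_coloring(n: int, edges: List[Edge], k: int, seed: int = 0,
                    t0: float = 2.0, alpha: float = 0.995, steps: int = 20000) -> List[int]:
    """Simulated annealing over colorings of n vertices with k colors."""
    rng = random.Random(seed)
    neighbors: Dict[int, List[int]] = {v: [] for v in range(n)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    coloring = [rng.randrange(k) for _ in range(n)]
    cost = conflicts(coloring, edges)
    temp = t0
    for _ in range(steps):
        if cost == 0:
            break  # conflict-free assignment found
        v = rng.randrange(n)
        new = rng.randrange(k)
        # Only edges incident to v change, so the cost delta can be computed locally.
        old_c = sum(coloring[w] == coloring[v] for w in neighbors[v])
        new_c = sum(coloring[w] == new for w in neighbors[v])
        delta = new_c - old_c
        # Always accept improvements; accept worsening moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            coloring[v] = new
            cost += delta
        temp = max(temp * alpha, 1e-3)  # geometric cooling with a floor
    return coloring


if __name__ == "__main__":
    # Five cells in a ring: an odd cycle needs three colors to avoid all conflicts.
    ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    col = anneal_coloring(5, ring, k=3, seed=7)
    print(col, conflicts(col, ring))
```

    Accepting occasional worsening moves at high temperature lets the search leave local minima; as the temperature falls, the loop degenerates into greedy conflict repair.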

    Play JBT – Mobile Application for the Tropical Botanical Garden of Lisbon

    Master's project, Informatics, Universidade de Lisboa, Faculdade de Ciências, 2020.
    With the progress of information and communication technologies (ICT), cultural institutions have diversified the ways in which they interact with people. Today, ICT allows cultural institutions to take on different roles in the community (for example, educating citizens and their associations, training in various skills, and acting as experts in various government programs for community development). This document presents the development of a mobile application for the Tropical Botanical Garden of Lisbon. Various techniques were used in developing the mobile application (for example, interviews, content listing, prototyping, heuristic evaluation, usability testing). Details are given on the technologies used (software and hardware), the implementation procedures, and the final architecture of the developed system. The mobile application allows visitors to the Tropical Botanical Garden to interact in different ways with the garden's components (plants, birds, and buildings). Several educational resources are included in the application and are adapted automatically to the user's profile. The application also captures and stores the data produced by its users so that it can be used to improve the visitors' experience of the garden. Several web services were included to improve the presentation of content and the garden's services. Tests were also carried out with experts at the garden, and feedback was collected from users, from whom we received positive reviews and suggestions that were integrated into the application. A set of server performance tests was also carried out.
    With the progress of information and communication technologies (ICT), cultural institutions have diversified the ways in which they interact with people. Today, ICT allows cultural institutions to take on different roles in the community (e.g., educating citizens and their associations, developing various skills, and supporting government programs for community development). This document describes the development of a mobile application that acts mainly as a guide for visitors to the Lisbon Tropical Botanical Garden. The application allows visitors to interact in different ways with garden components (plants, buildings, and birds) and to access the educational resources it includes, which are adapted to the user's profile. The application also captures and stores the data that visitors produce, which is used to help improve the garden's services. Web services have been developed to provide content and to centrally store data on visitors' trajectories in the garden and on their demographics. Various techniques were used in the development process (e.g., interviews, content listing, prototyping, heuristic evaluation, usability testing). The technologies used (software and hardware), the implementation procedures, and the final architecture of the developed system are described. Finally, the results of a set of usability tests, which yielded positive feedback from users, and of performance tests executed on the server are presented.