    Vector-thread architecture and implementation

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 181-186). This thesis proposes vector-thread architectures as a performance-efficient solution for all-purpose computing. The VT architectural paradigm unifies the vector and multithreaded compute models. VT provides the programmer with a control processor and a vector of virtual processors. The control processor can use vector-fetch commands to broadcast instructions to all the VPs, or each VP can use thread-fetches to direct its own control flow. A seamless intermixing of the vector and threaded control mechanisms allows a VT architecture to flexibly and compactly encode application parallelism and locality. VT architectures can efficiently exploit a wide variety of loop-level parallelism, including non-vectorizable loops with cross-iteration dependencies or internal control flow. The Scale VT architecture is an instantiation of the vector-thread paradigm designed for low-power and high-performance embedded systems. Scale includes a scalar RISC control processor and a four-lane vector-thread unit that can execute 16 operations per cycle and supports up to 128 simultaneously active virtual processor threads. Scale provides unit-stride and strided-segment vector loads and stores, and it implements cache refill/access decoupling. The Scale memory system includes a four-port, non-blocking, 32-way set-associative, 32 KB cache. A prototype Scale VT processor was implemented in 180 nm technology using an ASIC-style design flow. The chip has 7.1 million transistors and a core area of 16.6 mm², and it runs at 260 MHz while consuming 0.4-1.1 W. This thesis evaluates Scale using a diverse selection of embedded benchmarks, including example kernels for image processing, audio processing, text and data processing, cryptography, network processing, and wireless communication. Larger applications also include a JPEG image encoder and an IEEE 802.11a wireless transmitter. Scale achieves high performance on a range of different types of codes, generally executing 3-11 compute operations per cycle. Unlike other architectures that improve performance at the expense of increased energy consumption, Scale is generally even more energy efficient than a scalar RISC processor. By Ronny Meir Krashinsky, Ph.D.
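
    As an illustration of the loop class mentioned above (non-vectorizable loops with cross-iteration dependencies and internal control flow), the hypothetical C kernel below is not taken from the thesis; it is a minimal sketch of the kind of loop a traditional vector ISA cannot vectorize but that a vector-thread machine is designed to handle, mapping iterations onto virtual processors and using thread-fetches to follow the per-iteration branches.

```c
#include <stddef.h>

/* Hypothetical example loop: each iteration both depends on the previous
 * one (the running value `acc`) and contains data-dependent control flow.
 * A classic vector ISA cannot vectorize it, whereas a VT machine can map
 * iterations onto virtual processors, letting thread-fetches follow the
 * per-iteration branch while the cross-iteration value is passed along. */
int filter_and_accumulate(const int *in, int *out, size_t n, int threshold)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] > threshold)      /* internal control flow      */
            acc += in[i];           /* cross-iteration dependency */
        else
            acc -= 1;
        out[i] = acc;
    }
    return acc;
}
```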

    Hydrodynamics-Biology Coupling for Algae Culture and Biofuel Production

    Biofuel production from microalgae represents an acute optimization problem for industry. There is a wide range of parameters that must be taken into account in the development of this technology. Here, mathematical modelling has a vital role to play. The potential of microalgae as a source of biofuel and as a technological solution for CO2 fixation is the subject of intense academic and industrial research. Large-scale production of microalgae has potential for biofuel applications owing to the high productivity that can be attained in high-rate raceway ponds. We show, through 3D numerical simulations, that our approach is capable of discriminating between situations where the paddle wheel is rapidly moving water or slowly agitating the process. Moreover, the simulated velocity fields can provide Lagrangian trajectories of the algae. The resulting light pattern to which each cell is subjected when travelling from light (surface) to dark (bottom) can then be derived. It will then be reproduced in lab experiments to study photosynthesis under realistic light patterns.
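
    As a minimal sketch of how a light pattern could be derived from a simulated Lagrangian depth trajectory, the C program below assumes simple Beer-Lambert exponential attenuation of surface light with depth; the trajectory, surface irradiance and attenuation coefficient are placeholder values, not data from the paper.

```c
#include <math.h>
#include <stdio.h>

/* Light intensity seen by a cell at depth z (m), assuming Beer-Lambert
 * attenuation: I(z) = I0 * exp(-k * z). I0 and k are placeholders. */
static double light_at_depth(double z_m)
{
    const double I0 = 2000.0;  /* surface irradiance (placeholder), umol photons m^-2 s^-1 */
    const double k  = 10.0;    /* attenuation coefficient (placeholder), m^-1 */
    return I0 * exp(-k * z_m);
}

int main(void)
{
    /* Toy Lagrangian depth trajectory sampled once per second: the cell
     * is mixed from the surface down to the pond bottom and back up. */
    const double depth_m[] = {0.0, 0.05, 0.15, 0.30, 0.30, 0.15, 0.05, 0.0};
    const int n = (int)(sizeof depth_m / sizeof depth_m[0]);
    for (int i = 0; i < n; i++)
        printf("t=%ds depth=%.2fm light=%.1f\n", i, depth_m[i], light_at_depth(depth_m[i]));
    return 0;
}
```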

    Proceedings of the 7th International Conference on PGAS Programming Models

    Run-time support for multi-level disjoint memory address spaces

    High Performance Computing (HPC) systems have become widely used tools in many industry areas and research fields. Research to produce more powerful and efficient systems has grown in step with their popularity. As a consequence, the complexity of modern HPC architectures has increased in order to provide systems with the highest levels of performance. This increased complexity has also affected the way HPC systems are programmed. HPC users have to deal with new devices, languages and tools, and this can be a significant access barrier for people who do not have a deep background in computer science. Alongside the evolution of HPC systems, programming models have also evolved to ease the task of developing applications for these machines. Two well-known examples are OpenMP and MPI. The former can be used in shared memory systems and is praised for offering an easy software development methodology. The latter is more popular because it targets distributed environments, but it is considered burdensome to use. Besides these two, many programming models have emerged to propose new methodologies or to handle new hardware devices. One of these models is OmpSs. OmpSs is a programming model for modern HPC systems that is based on OpenMP and StarSs. Developed by the Programming Models group at the Barcelona Supercomputing Center, it targets the latest generation of HPC systems while benefiting from the ease of use of OpenMP. OmpSs offers asynchronous parallelism through the concept of tasks with data dependencies. These tasks allow the specification of sections of code that can be executed in parallel, while the dependencies specify restrictions on the order in which the tasks can be executed. With this, OmpSs programs can adapt to many different system configurations while fundamentally still being sequential programs with annotations. This thesis explores the benefits of providing OmpSs with the capability to target architectures with complex memory hierarchies. An example of such systems is the new generation of clusters that use accelerators to power their computing capabilities. The memory hierarchy of these machines is composed of a first level of distributed memory formed by the memory of each individual node, and a second level formed by the private memory of each accelerator device. Our first contribution presents the implementation of support for clusters of multi-cores in the OmpSs programming model. We also present two optimizations to boost the performance of applications running on top of cluster systems: a specific task scheduling policy and the addition of slave-to-slave transfers. We evaluate our implementation using a set of benchmarks coded in OmpSs, and we also compare them against the same applications implemented using MPI, the most widely used programming model for these systems. Our second contribution extends this initial implementation to provide OmpSs with support for clusters of GPUs. We show that OmpSs programs targeting these complex systems are capable of achieving good performance when compared against MPI+CUDA implementations. The third contribution of this thesis presents an implementation and evaluation of the performance and programmability impact of supporting non-contiguous memory regions. Offering this feature allows applications with complex data accesses to be easily annotated with OmpSs.
    This is important to widen the spectrum of applications that can be handled by the programming model.
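
    As a minimal sketch of the OmpSs programming style described above (tasks with data dependencies annotated on otherwise sequential C code), the fragment below follows the OmpSs pragma syntax with in()/out() array-section dependences; it is an illustrative example rather than code from the thesis, and it would be built with the OmpSs toolchain (Mercurium/Nanos++).

```c
#include <stdio.h>

#define N   1024
#define BS  256   /* block (task) size */

static double a[N], b[N];

/* Scale one block of the input array into the output array. */
static void scale_block(const double *src, double *dst, int n, double f)
{
    for (int i = 0; i < n; i++)
        dst[i] = f * src[i];
}

int main(void)
{
    for (int i = 0; i < N; i++)
        a[i] = (double)i;

    /* One task per block. The in()/out() array sections ([start;length])
     * tell the runtime which data each task reads and writes, so
     * independent blocks may run in parallel while dependent work is
     * ordered automatically by the data flow. */
    for (int i = 0; i < N; i += BS) {
        #pragma omp task in(a[i;BS]) out(b[i;BS])
        scale_block(&a[i], &b[i], BS, 2.0);
    }
    #pragma omp taskwait   /* block until all tasks complete */

    printf("b[%d] = %.1f\n", N - 1, b[N - 1]);
    return 0;
}
```

    A plain C compiler ignores the pragmas and runs the same code sequentially, which reflects the point above that OmpSs programs remain fundamentally sequential programs with annotations.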

    Cognitive Hyperconnected Digital Transformation

    Cognitive Hyperconnected Digital Transformation provides an overview of the current Internet of Things (IoT) landscape, ranging from research, innovation and development priorities to enabling technologies in a global context. It is intended as a standalone book in a series that covers the Internet of Things activities of the IERC-Internet of Things European Research Cluster, including both research and technological innovation, validation and deployment. The book builds on the ideas put forward by the European Research Cluster, the IoT European Platform Initiative (IoT-EPI) and the IoT European Large-Scale Pilots Programme, presenting global views and state-of-the-art results regarding the challenges facing IoT research, innovation, development and deployment in the coming years. Hyperconnected environments integrating industrial/business/consumer IoT technologies and applications require new IoT open systems architectures integrated with network architecture (a knowledge-centric network for IoT), IoT system design and open, horizontal and interoperable platforms managing things that are digital, automated and connected and that function in real time with remote access and control based on Internet-enabled tools. The IoT is bridging the physical world with the virtual world by combining augmented reality (AR), virtual reality (VR), machine learning and artificial intelligence (AI) to support physical-digital integration in the Internet of mobile things based on sensors/actuators, communication, analytics technologies, cyber-physical systems, software, cognitive systems and IoT platforms with multiple functionalities. These IoT systems have the potential to understand, learn, predict, adapt and operate autonomously. They can change future behaviour, while the combination of extensive parallel processing power, advanced algorithms and data sets feeds the cognitive algorithms that allow IoT systems to develop new services and propose new solutions. IoT technologies are moving into the industrial space and enhancing traditional industrial platforms with solutions that break free of device, operating-system and protocol dependency. Secure edge computing solutions replace local networks, web services replace software, and devices with networked programmable logic controllers (NPLCs) based on Internet protocols replace devices that use proprietary protocols. Information captured by edge devices on the factory floor is secure and accessible from any location in real time, opening the communication gateway both vertically (connecting machines across the factory and enabling the instant availability of data to stakeholders within operational silos) and horizontally (with one framework for the entire supply chain, across departments, business units, global factory locations and other markets). End-to-end security and privacy solutions in the IoT space require agile, context-aware and scalable components with mechanisms that are both fluid and adaptive. The convergence of IT (information technology) and OT (operational technology) makes security and privacy by default an important new element, where security is addressed at the architecture level, across applications and domains, using multi-layered distributed security measures. Blockchain is transforming industry operating models by adding trust to untrusted environments, providing distributed security mechanisms and transparent access to the information in the chain. Digital technology platforms are evolving, with IoT platforms integrating complex information systems, customer experience, analytics and intelligence to enable new capabilities and business models for digital business.

    ICS Materials. Towards a re-Interpretation of material qualities through interactive, connected, and smart materials.

    The domain of materials for design is changing under the influence of increasing technological advancement, miniaturization and democratization. Materials are becoming connected, augmented, computational, interactive, active, responsive, and dynamic. These are ICS Materials, an acronym that stands for Interactive, Connected and Smart. While labs around the world are experimenting with these new materials, there is a need to reflect on their potential and impact on design. This paper is a first step in this direction: to interpret and describe the qualities of ICS materials, considering their experiential pattern, their expressive sensorial dimension, and their aesthetics of interaction. Through case studies, we analyse and classify these emerging ICS Materials and identify common characteristics and challenges, e.g. the ability to change over time or their programmability by designers and users. On that basis, we argue there is a need to reframe and redesign existing models to describe ICS materials, making their qualities emerge.

    Building the Future Internet through FIRE

    The Internet as we know it today is the result of continuous efforts to improve network communications, end-user services, computational processes and information technology infrastructures. The Internet has become a critical infrastructure for humanity, offering complex networking services and end-user applications that together have transformed all aspects of our lives, especially economic ones. Recently, with the advent of new paradigms, progress in wireless technology, sensor networks and information systems, and the inexorable shift towards an everything-connected paradigm, first known as the Internet of Things and lately envisioned as the Internet of Everything, a data-driven society has been created. In a data-driven society, productivity, knowledge, and experience are dependent on increasingly open, dynamic, interdependent and complex Internet services. The challenge for the design of the Future Internet is to build robust enabling technologies, to implement and deploy adaptive systems, and to create business opportunities in the face of increasing uncertainty and emergent systemic behaviours where humans and machines seamlessly cooperate.

    Mechanisms for service-oriented resource allocation in IoT

    Although several IoT applications have recently been deployed in fields including environmental and industrial monitoring, Smart Home, Smart Hospital and Smart Agriculture, current deployments are mostly host-oriented, which undoubtedly limits the benefits brought by IoT. Indeed, future IoT applications shall benefit from service-oriented communications, where establishing communication between end-points does not depend on prior knowledge of the host devices in charge of executing the service. Rather, an end-user service execution request is mapped onto the most suitable resources able to provide the requested service. Furthermore, this model is a key enabler for the design of future services in Smart Cities, e-Health and Intelligent Transportation Systems, among other smart scenarios. Given the benefits of this model for future applications, considerable research effort must be devoted to addressing several unsolved challenges, such as those caused by the high dynamicity and heterogeneity inherent to these scenarios. In fact, service-oriented communication requires an up-to-date view of available resources, the mapping of service requests onto the most suitable resources while taking several constraints and requirements into account, resilience provisioning and QoS-aware service allocation, to name a few. This thesis aims at proposing and evaluating mechanisms for efficient resource allocation in service-oriented IoT scenarios through the employment of two distinct baseline technologies. In the first approach, the so-called Path Computation Element (PCE), designed to decouple the host-oriented routing function from GMPLS switches into a centralized element, is extended to the service-oriented PCE (S-PCE) architecture, where a service identifier (SID) is used to identify the service required by an end-user. In this approach, the service request is mapped to one resource or a set of resources by a two-step mapping scheme that enables both the selection of suitable resources according to the request and resource characteristics, and the avoidance of service disruption due to possible changes in resource location. Meanwhile, the inception of fog computing as an extension of the cloud computing concept, leveraging idle computing resources at the edge of the network by organizing them as highly virtualized micro data centers (MDCs), has reduced the network latency observed by services launched at edge devices, further reducing traffic at the core network and the energy consumed by network and cloud data center equipment, among other benefits. Envisioning the benefits of the distributed and coordinated employment of both fog and cloud resources, the Fog-to-Cloud (F2C) architecture has recently been proposed, further empowering the distributed allocation of services onto the most suitable resources, be they in the cloud, the fog or both. Since future IoT applications will present strict demands that may be satisfied through a combined fog-cloud solution aligned with the F2C architecture, the second approach to service-oriented resource allocation considered in this thesis aims at providing QoS-aware resource allocation through the deployment of a hierarchical F2C topology, where resources are logically distributed into layers with distinct characteristics in terms of network latency, disruption probability, IT power, etc.
    Therefore, distinct strategies for service distribution in F2C architectures are proposed, taking into consideration features such as service transmission delay, energy consumption and network load. Concerning the need for failure recovery mechanisms, the distinct demands of heterogeneous services are considered in order to assess different strategies for the allocation of protection resources in the F2C hierarchy. In addition, the impact of the layered control topology on the efficient allocation of resources in F2C is further evaluated. Finally, avenues for future work are presented.
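
    As a minimal sketch of the two-step, service-oriented mapping idea described above, the C fragment below first resolves a service identifier (SID) to its current set of candidate resources and only then selects one against a latency constraint; all names, data and the selection policy are hypothetical, chosen to illustrate that a request never names a host up front.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical resource descriptor: what a registry might track per node. */
struct resource {
    const char *name;        /* e.g. a fog node or cloud data center      */
    double      latency_ms;  /* current network latency to the requester  */
    int         available;   /* 1 if the resource can accept the service  */
};

/* Hypothetical catalogue entry: one SID mapped to its candidate resources. */
struct service_entry {
    const char     *sid;
    struct resource candidates[3];
};

static const struct service_entry catalogue[] = {
    { "sid:video-transcode",
      { { "fog-node-7", 4.0, 1 }, { "fog-node-2", 6.5, 0 }, { "cloud-dc-1", 45.0, 1 } } },
};

/* Step 1: resolve the SID to its candidate set.
 * Step 2: pick the available candidate with the lowest latency. */
static const struct resource *allocate(const char *sid)
{
    for (size_t s = 0; s < sizeof catalogue / sizeof catalogue[0]; s++) {
        if (strcmp(catalogue[s].sid, sid) != 0)
            continue;
        const struct resource *best = NULL;
        for (int i = 0; i < 3; i++) {
            const struct resource *r = &catalogue[s].candidates[i];
            if (r->available && (!best || r->latency_ms < best->latency_ms))
                best = r;
        }
        return best;
    }
    return NULL;
}

int main(void)
{
    const struct resource *r = allocate("sid:video-transcode");
    if (r)
        printf("sid:video-transcode -> %s (%.1f ms)\n", r->name, r->latency_ms);
    return 0;
}
```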

    High performance Java for multi-core systems

    The interest in Java within the High Performance Computing (HPC) community has been rising in recent years thanks to its noticeable performance improvements and its productivity features. In a context where the trend to increase the number of cores per processor is leading to the generalization of many-core processors and accelerators, multithreading, as an inherent feature of the language, makes Java extremely interesting for exploiting the performance provided by multi- and many-core architectures. This PhD Thesis presents a thorough analysis of the current state of the art regarding multi- and many-core programming in Java and provides the design, implementation and evaluation of several solutions to enable Java for the many-core era. To achieve this, a shared memory message-passing solution has been implemented to provide shared memory programming with the scalability of distributed memory paradigms, together with the benefits of a portable programming model that allows the developed codes to run on distributed memory systems. Moreover, representative collective operations, involving computation and communication among different processes or threads, have been optimized, also introducing into Java new scalability features from the MPI 3.0 specification, namely nonblocking collectives. Regarding the exploitation of many-core architectures, the lack of direct Java support forces developers to resort to wrappers or higher-level solutions to translate Java code into CUDA or OpenCL. The most relevant among these solutions have been evaluated and thoroughly analyzed in terms of performance and productivity. Guidelines for taking advantage of shared memory environments have been derived during the analysis and development of the proposed solutions, and the main conclusion is that the use of Java for shared memory programming on multi- and many-core systems is not only productive but can also provide competitive high-performance results. However, in order to effectively take advantage of the underlying multi- and many-core architectures, the key is the availability of optimized middleware that abstracts multithreading details from the user, like the one proposed in this Thesis, and the optimization of common operations such as collective communications.
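
    The thesis introduces MPI 3.0 nonblocking collectives into Java messaging; as a reference for the underlying concept (not the thesis code, whose Java API may differ), the sketch below uses the standard C MPI 3.0 call MPI_Iallreduce to start a global reduction, overlap it with independent local work, and complete it with MPI_Wait.

```c
#include <mpi.h>
#include <stdio.h>

/* Nonblocking collective (MPI 3.0): start a global reduction, perform
 * independent local work while it progresses, then wait for the result. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int local = rank + 1, global = 0;
    MPI_Request req;

    /* Start the reduction without blocking. */
    MPI_Iallreduce(&local, &global, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, &req);

    /* ... independent computation could overlap with the collective here ... */

    MPI_Wait(&req, MPI_STATUS_IGNORE);   /* complete the collective */
    if (rank == 0)
        printf("sum of (rank+1) over all ranks = %d\n", global);

    MPI_Finalize();
    return 0;
}
```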