675 research outputs found

    Using the Spring Physical Model to Extend a Cooperative Caching Protocol for Many-Core Processors

    As the number of embedded cores grows, the off-chip memory wall becomes an overwhelming bottleneck. As a consequence, efficiently exploiting on-chip data storage is increasingly important. In a previous work, we proposed a data sliding mechanism that allows data to be stored in the closest neighborhood, even under heavy stress loads. However, each cache block is allowed to migrate only once to a neighbor's cache (e.g., 1-Chance Forwarding). In this paper, we propose an extension of our mechanism in order to expand the cooperative caching area. Our work is based on an adaptive physical model, where each cache block is considered as a mass connected to a spring. This technique constrains data migration according to the spring constant and the difference in workloads between cores. This adaptive data sliding approach leads to a balanced spread of data on the chip and therefore improves on-chip storage. On-chip data access has been evaluated using an analytical approach. Results show that the extended data sliding increases the global cache hit rate on the chip, especially in the context of juxtaposed hot spots.

    A Fast Evaluation Approach of Data Consistency Protocols within a Compilation Toolchain

    Shared memory is a critical issue for large distributed systems. Although several data consistency protocols have been proposed, selecting the protocol that best suits the application requirements and system constraints remains a challenge. The development of multi-consistency systems, where different protocols can be deployed during runtime, appears to be an interesting alternative. In order to explore the design space of the consistency protocols, a fast and accurate method should be used. In this work we rely on a compilation toolchain that transparently handles data consistency decisions for a multi-protocol platform. We focus on the analytical evaluation of the consistency configuration that stands within the optimization loop. We propose to use a TLM NoC simulator to get feedback on expected network contentions. We evaluate the approach using five workloads and three different data consistency protocols. As a result, we are able to obtain a fast and accurate evaluation of the different consistency alternatives.

    A shared-disk parallel cluster file system

    Dissertation presented to obtain the degree of Doctor in Informatics from the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia. Today, clusters are the de facto cost-effective platform both for high performance computing (HPC) and for IT environments. HPC and IT are quite different environments, and the differences include, among others, their choices of file systems and storage: HPC favours parallel file systems geared towards maximum I/O bandwidth, but which are not fully POSIX-compliant and were devised to run on top of (fault-prone) partitioned storage; conversely, IT data centres favour both external disk arrays (to provide highly available storage) and POSIX-compliant file systems (either general purpose or shared-disk cluster file systems, CFSs). These specialised file systems do perform very well in their target environments provided that applications do not require some lateral features, e.g., file locking (unavailable on parallel file systems) and high performance writes over cluster-wide shared files (unavailable on CFSs). In brief, we can say that none of the above approaches solves the problem of providing high levels of reliability and performance to both worlds. Our pCFS proposal makes a contribution to change this situation: the rationale is to take advantage of the best of both, namely the reliability of cluster file systems and the high performance of parallel file systems. We don’t claim to provide the absolute best of each, but we aim at full POSIX compliance, a rich feature set, and levels of reliability and performance good enough for broad usage, e.g., traditional as well as HPC applications, support of clustered DBMS engines that may run over regular files, and video streaming. pCFS’ main ideas include:
    · Cooperative caching, a technique that has been used in file systems for distributed disks but, as far as we know, was never used either in SAN-based cluster file systems or in parallel file systems. As a result, pCFS may use all infrastructures (LAN and SAN) to move data.
    · Fine-grain locking, whereby processes running across distinct nodes may define non-overlapping byte-range regions in a file (instead of the whole file) and access them in parallel, reading and writing over those regions at the infrastructure’s full speed (provided that no major metadata changes are required).
    A prototype was built on top of GFS (a Red Hat shared-disk CFS): GFS’ kernel code was slightly modified, and two kernel modules and a user-level daemon were added. In the prototype, fine-grain locking is fully implemented and a cluster-wide coherent cache is maintained through data (page fragments) movement over the LAN. Our benchmarks for non-overlapping writers over a single file shared among processes running on different nodes show that pCFS’ bandwidth is 2 times greater than NFS’ while being comparable to that of the Parallel Virtual File System (PVFS), both requiring about 10 times more CPU. And pCFS’ bandwidth also surpasses GFS’ (600 times for small record sizes, e.g., 4 KB, decreasing down to 2 times for large record sizes, e.g., 4 MB), at about the same CPU usage. Lusitania, Companhia de Seguros S.A, Programa IBM Shared University Research (SUR)

    Scalable Storage for Digital Libraries

    I propose a storage system optimised for digital libraries. Its key features are its heterogeneous scalability; its integration and exploitation of rich semantic metadata associated with digital objects; its use of a name space; and its aggressive performance optimisation in the digital library domain

    SCALABLE MULTI-HOP DATA DISSEMINATION IN VEHICULAR AD HOC NETWORKS

    Vehicular Ad hoc Networks (VANETs) aim at improving road safety and travel comfort by providing self-organizing environments to disseminate traffic data, without requiring fixed infrastructure or centralized administration. Since traffic data is of public interest and usually benefits a group of users rather than a specific individual, it is more appropriate to rely on broadcasting for data dissemination in VANETs. However, broadcasting in dense networks suffers from a high percentage of data redundancy that wastes the limited radio channel bandwidth. Moreover, packet collisions may lead to the broadcast storm problem when a large number of vehicles in the same vicinity rebroadcast nearly simultaneously. The broadcast storm problem is still challenging in the context of VANETs, due to the rapid changes in the network topology, which are difficult to predict and manage. Existing solutions either do not scale well under high density scenarios, or require extra communication overhead to estimate traffic density, so as to manage data dissemination accordingly. In this dissertation, we specifically aim at providing an efficient solution for the broadcast storm problem in VANETs, in order to support different types of applications. A novel approach is developed to provide scalable broadcast without extra communication overhead, by relying on traffic regime estimation using speed data. We theoretically validate the use of speed instead of density to estimate traffic flow. The results of simulating our approach under different density scenarios show its efficiency in providing scalable multi-hop data dissemination for VANETs.

    Internet of Vehicles and Real-Time Optimization Algorithms: Concepts for Vehicle Networking in Smart Cities

    Achieving sustainable freight transport and citizens’ mobility operations in modern cities is becoming a critical issue for many governments. By analyzing big data streams generated through IoT devices, city planners now have the possibility to optimize traffic and mobility patterns. IoT combined with innovative transport concepts as well as emerging mobility modes (e.g., ridesharing and carsharing) constitutes a new paradigm in sustainable and optimized traffic operations in smart cities. Still, these are highly dynamic scenarios, which are also subject to a high degree of uncertainty. Hence, factors such as real-time optimization and re-optimization of routes, stochastic travel times, and evolving customers’ requirements and traffic status also have to be considered. This paper discusses the main challenges associated with Internet of Vehicles (IoV) and vehicle networking scenarios, identifies the underlying optimization problems that need to be solved in real time, and proposes an approach to combine the use of IoV with parallelization approaches. To this aim, agile optimization and distributed machine learning are envisaged as the best candidate algorithms to develop efficient transport and mobility systems.

    Memory and information processing in neuromorphic systems

    A striking difference between brain-inspired neuromorphic processors and current von Neumann processor architectures is the way in which memory and processing are organized. As Information and Communication Technologies continue to address the need for increased computational power by increasing the number of cores within a digital processor, neuromorphic engineers and scientists can complement this need by building processor architectures where memory is distributed with the processing. In this paper we present a survey of brain-inspired processor architectures that support models of cortical networks and deep neural networks. These architectures range from serial clocked implementations of multi-neuron systems to massively parallel asynchronous ones, and from purely digital systems to mixed analog/digital systems which implement more biological-like models of neurons and synapses together with a suite of adaptation and learning mechanisms analogous to the ones found in biological nervous systems. We describe the advantages of the different approaches being pursued and present the challenges that need to be addressed for building artificial neural processing systems that can display the richness of behaviors seen in biological systems. Comment: Submitted to Proceedings of IEEE, review of recently proposed neuromorphic computing platforms and system

    Parallel and Distributed Immersive Real-Time Simulation of Large-Scale Networks
