
    Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems

    This dissertation addresses the complexities of 2D/3D multi-/many-core processor-memory systems, focusing on two key areas: enhancing timing predictability in real-time multi-core processors and optimizing performance within thermal constraints. The integration of an increasing number of transistors into compact chip designs, while boosting computational capacity, presents challenges in resource contention and thermal management. The first part of the thesis improves timing predictability. We enhance shared cache interference analysis for set-associative caches, advancing the calculation of Worst-Case Execution Time (WCET). This development enables accurate assessment of cache interference and the effectiveness of partitioned schedulers in real-world scenarios. We introduce TCPS, a novel task and cache-aware partitioned scheduler that optimizes cache partitioning based on task-specific WCET sensitivity, leading to improved schedulability and predictability. Our research explores various cache and scheduling configurations, providing insights into their performance trade-offs. The second part focuses on thermal management in 2D/3D many-core systems. Recognizing the limitations of Dynamic Voltage and Frequency Scaling (DVFS) in S-NUCA many-core processors, we propose synchronous thread migrations as a thermal management strategy. This approach culminates in the HotPotato scheduler, which balances performance and thermal safety. We also introduce 3D-TTP, a transient temperature-aware power budgeting strategy for 3D-stacked systems, reducing the need for Dynamic Thermal Management (DTM) activation. Finally, we present 3QUTM, a novel method for 3D-stacked systems that combines core DVFS and memory bank Low Power Modes with a learning algorithm, optimizing response times within thermal limits. This research contributes significantly to enhancing performance and thermal management in advanced processor-memory systems
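    As a rough illustration of cache-aware partitioning driven by WCET sensitivity (a minimal sketch, not the TCPS algorithm itself), the following Python snippet greedily assigns cache ways to the task whose estimated WCET drops the most from one extra way; the function name allocate_ways and the example WCET curves are hypothetical.

# Illustrative sketch (not the TCPS algorithm): greedily hand out cache ways
# to the task whose estimated WCET shrinks the most from one extra way.

def allocate_ways(wcet_curves, total_ways):
    """wcet_curves[t][w] = estimated WCET of task t when given w cache ways."""
    alloc = {t: 0 for t in wcet_curves}
    for _ in range(total_ways):
        # Marginal WCET reduction each task would get from one more way.
        gain = {t: wcet_curves[t][alloc[t]] - wcet_curves[t][alloc[t] + 1]
                for t in wcet_curves if alloc[t] + 1 < len(wcet_curves[t])}
        if not gain:
            break
        winner = max(gain, key=gain.get)
        alloc[winner] += 1
    return alloc

# Example: task "a" is highly cache-sensitive, task "b" barely benefits.
curves = {"a": [100, 70, 55, 50, 48], "b": [40, 36, 35, 35, 35]}
print(allocate_ways(curves, 4))   # {'a': 3, 'b': 1}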

    Specialized translation at work for a small, expanding company: my experience with the Chinese-language internationalization of Bioretics© S.r.l.

    Global markets are currently immersed in two all-encompassing and unstoppable processes: internationalization and globalization. While the former pushes companies to look beyond the borders of their country of origin to forge relationships with foreign trading partners, the latter fosters standardization across countries by reducing spatiotemporal distances and breaking down geographical, political, economic and socio-cultural barriers. In recent decades, another domain has emerged to propel these unifying drives: Artificial Intelligence, together with its advanced technologies that aim to implement human cognitive abilities in machines. The “Language Toolkit – Le lingue straniere al servizio dell’internazionalizzazione dell’impresa” project, promoted by the Department of Interpreting and Translation (Forlì Campus) in collaboration with the Romagna Chamber of Commerce (Forlì-Cesena and Rimini), seeks to help Italian SMEs make their way into the global market. It is precisely within this project that this dissertation was conceived. Indeed, its purpose is to present the translation and localization project from English into Chinese of a series of texts produced by Bioretics© S.r.l.: an investor deck, the company website and part of the installation and use manual of the Aliquis© framework software, its flagship product. This dissertation is structured as follows: Chapter 1 presents the project and the company in detail; Chapter 2 outlines the internationalization and globalization processes and the Artificial Intelligence market in both Italy and China; Chapter 3 provides the theoretical foundations for every aspect of specialized translation, including website localization; Chapter 4 describes the resources and tools used to perform the translations; Chapter 5 proposes an analysis of the source texts; and Chapter 6 is a commentary on translation strategies and choices.

    Secure storage systems for untrusted cloud environments

    The cloud has become established for applications that need to be scalable and highly available. However, moving data to data centers owned and operated by a third party, i.e., the cloud provider, raises security concerns because a cloud provider could easily access and manipulate the data or program flow, preventing the cloud from being used for certain applications, such as medical or financial ones. Hardware vendors are addressing these concerns by developing Trusted Execution Environments (TEEs) that make the CPU state and parts of memory inaccessible from the host software. While TEEs protect the current execution state, they do not provide security guarantees for data that does not fit or reside in the protected memory area, such as network traffic and persistent storage. In this work, we aim to address TEEs’ limitations in three different ways: first, we extend the trust of TEEs to persistent storage; second, we extend the trust to multiple nodes in a network; and third, we propose a compiler-based solution for accessing heterogeneous memory regions. More specifically: • SPEICHER extends the trust provided by TEEs to persistent storage. SPEICHER implements a key-value interface. Its design is based on LSM data structures, but extends them to provide confidentiality, integrity, and freshness for the stored data. Thus, SPEICHER can prove to the client that the data has not been tampered with by an attacker. • AVOCADO is a distributed in-memory key-value store (KVS) that extends the trust that TEEs provide across the network to multiple nodes, allowing KVSs to scale beyond the boundaries of a single node. On each node, AVOCADO carefully divides data between trusted memory and untrusted host memory to maximize the amount of data that can be stored on each node. AVOCADO leverages the fact that network attacks can be modeled as crash faults to trust other nodes with a hardened ABD replication protocol. • TOAST is based on the observation that modern high-performance systems often use several different heterogeneous memory regions that are not easily distinguishable by the programmer. The number of regions is increased by the fact that TEEs divide memory into trusted and untrusted regions. TOAST is a compiler-based approach that unifies access to different heterogeneous memory regions and provides programmability and portability. TOAST uses a load/store interface to abstract most library interfaces for different memory regions.
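    The following toy Python sketch illustrates the integrity and freshness idea behind a TEE-backed store; it is not SPEICHER's actual LSM-based design. The trusted side keeps only a secret key and a per-key version counter, while values live in untrusted storage together with a MAC, so tampering or rollback is detected on read; a real system would additionally encrypt the values for confidentiality. The class name TrustedKV and the example keys are hypothetical.

# Toy sketch (not SPEICHER's design) of integrity + freshness for a TEE-backed
# key-value store: the trusted side holds a key and per-entry version counters,
# the untrusted side holds (version, value, MAC) tuples.
import hmac, hashlib

class TrustedKV:
    def __init__(self, secret: bytes):
        self.secret = secret          # stays inside the TEE
        self.versions = {}            # trusted, small: key -> latest version
        self.untrusted = {}           # models host memory / persistent storage

    def _mac(self, key, version, value):
        msg = key.encode() + version.to_bytes(8, "big") + value
        return hmac.new(self.secret, msg, hashlib.sha256).digest()

    def put(self, key: str, value: bytes):
        version = self.versions.get(key, 0) + 1
        self.versions[key] = version
        self.untrusted[key] = (version, value, self._mac(key, version, value))

    def get(self, key: str) -> bytes:
        version, value, tag = self.untrusted[key]
        fresh = version == self.versions[key]                       # rollback check
        valid = hmac.compare_digest(tag, self._mac(key, version, value))
        if not (fresh and valid):
            raise ValueError("untrusted storage was tampered with or rolled back")
        return value

kv = TrustedKV(secret=b"tee-sealed-key")
kv.put("patient/42", b"record v1")
kv.put("patient/42", b"record v2")
# Attacker replays the old (validly MAC'd) record captured earlier: a rollback.
kv.untrusted["patient/42"] = (1, b"record v1", kv._mac("patient/42", 1, b"record v1"))
try:
    kv.get("patient/42")
except ValueError as e:
    print(e)                          # detected via the trusted version counter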

    Design and Real-World Evaluation of Dependable Wireless Cyber-Physical Systems

    The ongoing effort for an efficient, sustainable, and automated interaction between humans, machines, and our environment will make cyber-physical systems (CPS) an integral part of the industry and our daily lives. At their core, CPS integrate computing elements, communication networks, and physical processes that are monitored and controlled through sensors and actuators. New and innovative applications become possible by extending or replacing static and expensive cable-based communication infrastructures with wireless technology. The flexibility of wireless CPS is a key enabler for many envisioned scenarios, such as intelligent factories, smart farming, personalized healthcare systems, autonomous search and rescue, and smart cities. High dependability, efficiency, and adaptivity requirements complement the demand for wireless and low-cost solutions in such applications. For instance, industrial and medical systems should work reliably and predictably with performance guarantees, even if parts of the system fail. Because emerging CPS will feature mobile and battery-driven devices that can execute various tasks, the systems must also quickly adapt to frequently changing conditions. Moreover, as applications become ever more sophisticated, featuring compact embedded devices that are deployed densely and at scale, efficient designs are indispensable to achieve desired operational lifetimes and satisfy high bandwidth demands. Meeting these partly conflicting requirements, however, is challenging due to imperfections of wireless communication and resource constraints along several dimensions, for example, computing, memory, and power constraints of the devices. More precisely, frequent and correlated message losses paired with very limited bandwidth and varying delays for the message exchange significantly complicate the control design. In addition, since communication ranges are limited, messages must be relayed over multiple hops to cover larger distances, such as an entire factory. Although the resulting mesh networks are more robust against interference, efficient communication is a major challenge as wireless imperfections get amplified, and significant coordination effort is needed, especially if the networks are dynamic. CPS combine various research disciplines, which are often investigated in isolation, ignoring their complex interaction. However, to address this interaction and build trust in the proposed solutions, evaluating CPS using real physical systems and wireless networks paired with formal guarantees of a system’s end-to-end behavior is necessary. Existing works that take this step can only satisfy a few of the abovementioned requirements. Most notably, multi-hop communication has only been used to control slow physical processes while providing no guarantees. One of the reasons is that the current communication protocols are not suited for dynamic multi-hop networks. This thesis closes the gap between existing works and the diverse needs of emerging wireless CPS. The contributions address different research directions and are split into two parts. In the first part, we specifically address the shortcomings of existing communication protocols and make the following contributions to provide a solid networking foundation: • We present Mixer, a communication primitive for the reliable many-to-all message exchange in dynamic wireless multi-hop networks. 
Mixer runs on resource-constrained low-power embedded devices and combines synchronous transmissions and network coding for a highly scalable and topology-agnostic message exchange. As a result, it supports mobile nodes and can serve any possible traffic pattern, for example, to efficiently realize distributed control, as required by emerging CPS applications. • We present Butler, a lightweight and distributed synchronization mechanism with formally guaranteed correctness properties to improve the dependability of synchronous transmissions-based protocols. These protocols require precise time synchronization provided by a specific node. Upon failure of this node, the entire network cannot communicate. Butler removes this single point of failure by quickly synchronizing all nodes in the network without affecting the protocols’ performance. In the second part, we focus on the challenges of integrating communication and various control concepts using classical time-triggered and modern event-based approaches. Based on the design, implementation, and evaluation of the proposed solutions using real systems and networks, we make the following contributions, which in many ways push the boundaries of previous approaches: • We are the first to demonstrate and evaluate fast feedback control over low-power wireless multi-hop networks. Essential for this achievement is a novel co-design and integration of communication and control. Our wireless embedded platform tames the imperfections impairing control, for example, message loss and varying delays, and considers the resulting key properties in the control design. Furthermore, the careful orchestration of control and communication tasks enables real-time operation and makes our system amenable to an end-to-end analysis. Due to this, we can provably guarantee closed-loop stability for physical processes with linear time-invariant dynamics. • We propose control-guided communication, a novel co-design for distributed self-triggered control over wireless multi-hop networks. Self-triggered control can save energy by transmitting data only when needed. However, there are no solutions that bring those savings to multi-hop networks and that can reallocate freed-up resources, for example, to other agents. Our control system informs the communication system of its transmission demands ahead of time so that communication resources can be allocated accordingly. Thus, we can transfer the energy savings from the control to the communication side and achieve an end-to-end benefit. • We present a novel co-design of distributed control and wireless communication that resolves overload situations in which the communication demand exceeds the available bandwidth. As systems scale up, featuring more agents and higher bandwidth demands, the available bandwidth will be quickly exceeded, resulting in overload. While event-triggered control and self-triggered control approaches reduce the communication demand on average, they cannot prevent situations in which all agents want to communicate simultaneously. We address this limitation by dynamically allocating the available bandwidth to the agents with the highest need.
Thus, we can formally prove that our co-design guarantees closed-loop stability for physical systems with stochastic linear time-invariant dynamics.
Contents:
Abstract; Acknowledgements; List of Abbreviations; List of Figures; List of Tables
1 Introduction (1.1 Motivation; 1.2 Application Requirements; 1.3 Challenges; 1.4 State of the Art; 1.5 Contributions and Road Map)
2 Mixer: Efficient Many-to-All Broadcast in Dynamic Wireless Mesh Networks (2.1 Introduction; 2.2 Overview; 2.3 Design; 2.4 Implementation; 2.5 Evaluation; 2.6 Discussion; 2.7 Related Work)
3 Butler: Increasing the Availability of Low-Power Wireless Communication Protocols (3.1 Introduction; 3.2 Motivation and Background; 3.3 Design; 3.4 Analysis; 3.5 Implementation; 3.6 Evaluation; 3.7 Related Work)
4 Feedback Control Goes Wireless: Guaranteed Stability over Low-Power Multi-Hop Networks (4.1 Introduction; 4.2 Related Work; 4.3 Problem Setting and Approach; 4.4 Wireless Embedded System Design; 4.5 Control Design and Analysis; 4.6 Experimental Evaluation; 4.A Control Details)
5 Control-Guided Communication: Efficient Resource Arbitration and Allocation in Multi-Hop Wireless Control Systems (5.1 Introduction; 5.2 Problem Setting; 5.3 Co-Design Approach; 5.4 Wireless Communication System Design; 5.5 Self-Triggered Control Design; 5.6 Experimental Evaluation)
6 Scaling Beyond Bandwidth Limitations: Wireless Control With Stability Guarantees Under Overload (6.1 Introduction; 6.2 Problem and Related Work; 6.3 Overview of Co-Design Approach; 6.4 Predictive Triggering and Control System; 6.5 Adaptive Communication System; 6.6 Integration and Stability Analysis; 6.7 Testbed Experiments; 6.A Proof of Theorem 4; 6.B Usage of the Network Bandwidth for Control)
7 Conclusion and Outlook (7.1 Contributions; 7.2 Future Directions)
Bibliography; List of Publications
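    As a minimal illustration of the bandwidth arbitration under overload described in this abstract (a sketch, not the thesis's actual co-design), the snippet below grants the limited number of communication slots per round to the agents reporting the highest need, e.g., the largest control error; the agent names and numbers are illustrative.

# Sketch of priority-based slot arbitration: each agent reports a need metric,
# and the network grants the few available slots to the highest-need agents.

def allocate_slots(priorities: dict, slots: int) -> set:
    """Return the set of agents allowed to transmit in this round."""
    ranked = sorted(priorities, key=priorities.get, reverse=True)
    return set(ranked[:slots])

# Five agents compete for two slots; the two largest control errors win.
errors = {"agent1": 0.1, "agent2": 2.3, "agent3": 0.7, "agent4": 1.9, "agent5": 0.2}
print(allocate_slots(errors, slots=2))   # {'agent2', 'agent4'} (set order may vary)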

    ACiS: smart switches with application-level acceleration

    Network performance has contributed fundamentally to the growth of supercomputing over the past decades. In parallel, High Performance Computing (HPC) peak performance has depended first on ever faster and denser CPUs, and then on increasing density alone. As operating frequency, and now feature size, have levelled off, two new approaches are becoming central to achieving higher net performance: configurability and integration. Configurability enables hardware to map to the application, as well as vice versa. Integration enables system components that have generally been single-function (e.g., a network that transports data) to take on additional functionality, e.g., also to operate on that data. More generally, integration enables compute-everywhere: not just in the CPU and accelerator, but also in the network and, more specifically, the communication switches. In this thesis, we propose four novel methods of enhancing HPC performance through Advanced Computing in the Switch (ACiS). More specifically, we propose various flexible and application-aware accelerators that can be embedded into or attached to existing communication switches to improve the performance and scalability of HPC and Machine Learning (ML) applications. We follow a modular design discipline by introducing composable plugins to successively add ACiS capabilities. In the first work, we propose an inline accelerator for communication switches that supports user-definable collective operations. MPI collective operations can often be performance killers in HPC applications; we seek to remove this bottleneck by offloading them to reconfigurable hardware within the switch itself. We also introduce a novel mechanism that enables the hardware to support MPI communicators of arbitrary shape and that is scalable to very large systems. In the second work, we propose a look-aside accelerator for communication switches that is capable of processing packets at line rate. Functions requiring loops and state are addressed in this method. The proposed in-switch accelerator is based on a RISC-V-compatible Coarse-Grained Reconfigurable Array (CGRA). To facilitate usability, we have developed a framework to compile user-provided C/C++ code into appropriate back-end instructions for configuring the accelerator. In the third work, we extend ACiS to support fused collectives and the combining of collectives with map operations. We observe that there is an opportunity to fuse communication (collectives) with computation. Since the computation can vary across applications, ACiS support in this method must be programmable. In the fourth work, we propose that switches with ACiS support can control and manage the execution of applications, i.e., that the switch be an active device with decision-making capabilities. Switches have a central view of the network; they can collect telemetry information and monitor application behavior and then use this information for control, decision-making, and coordination of nodes. We evaluate the feasibility of ACiS through extensive RTL-based simulation as well as deployment in an open-access cloud infrastructure. Using this simulation framework, with a Graph Convolutional Network (GCN) application as a case study, a speedup of 3.4x on average across five real-world datasets is achieved on 24 nodes compared to a CPU cluster without ACiS capabilities.
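    A simplified software model of the collective-offload idea follows; it is not the ACiS hardware, and the function names and example tree are illustrative. Each switch sums the partial vectors arriving on its downlinks before forwarding a single combined vector upstream, so the reduction happens inside the network rather than on the end hosts.

# Software model of an in-switch reduction: a switch combines its children's
# partial results element-wise before passing one vector to its parent.

def switch_reduce(children_vectors):
    """Element-wise sum of the vectors received on a switch's downlinks."""
    result = list(children_vectors[0])
    for vec in children_vectors[1:]:
        for i, x in enumerate(vec):
            result[i] += x
    return result

# A two-level tree: two leaf switches each combine two nodes, and the root
# combines the two partial results into the final allreduce value.
leaf0 = switch_reduce([[1, 2, 3], [4, 5, 6]])     # nodes 0 and 1
leaf1 = switch_reduce([[7, 8, 9], [1, 1, 1]])     # nodes 2 and 3
print(switch_reduce([leaf0, leaf1]))              # [13, 16, 19]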

    Adaptive Microarchitectural Optimizations to Improve Performance and Security of Multi-Core Architectures

    With the current technological barriers, microarchitectural optimizations are increasingly important to ensure performance scalability of computing systems. The shift to multi-core architectures increases the demands on the memory system and amplifies the role of microarchitectural optimizations in performance improvement. In a multi-core system, microarchitectural resources such as the cache are usually shared to maximize utilization, but sharing can also lead to contention and lower performance. This can be mitigated through partitioning of shared caches. However, microarchitectural optimizations, which were long assumed to be fundamentally secure, can be used in side-channel attacks to exploit secrets such as cryptographic keys. Timing-based side-channels exploit predictable timing variations due to the interaction with microarchitectural optimizations during program execution. Going forward, there is a strong need to be able to leverage microarchitectural optimizations for performance without compromising security. This thesis contributes three adaptive microarchitectural resource management optimizations to improve the security and/or performance of multi-core architectures, and a systematization of knowledge of timing-based side-channel attacks. We observe that to achieve high-performance cache partitioning in a multi-core system, three requirements need to be met: i) fine granularity of partitions, ii) locality-aware placement, and iii) frequent changes. These requirements lead to high overheads for current centralized partitioning solutions, especially as the number of cores in the system increases. To address this problem, we present an adaptive and scalable cache partitioning solution (DELTA) using a distributed and asynchronous allocation algorithm. The allocations occur through core-to-core challenges, where applications with a larger performance benefit gain cache capacity. The solution is implementable in hardware, due to its low computational complexity, and can scale to large core counts. According to our analysis, better performance can be achieved by coordinating multiple optimizations for different resources, e.g., off-chip bandwidth and cache, but this is challenging due to the increased number of possible allocations which need to be evaluated. Based on these observations, we present a solution (CBP) for coordinated management of the optimizations: cache partitioning, bandwidth partitioning and prefetching. Efficient allocations, considering the inter-resource interactions and trade-offs, are achieved using local resource managers to limit the solution space. The continuously growing number of side-channel attacks leveraging microarchitectural optimizations prompts us to review attacks and defenses to understand the vulnerabilities of different microarchitectural optimizations. We identify the four root causes of timing-based side-channel attacks: determinism, sharing, access violation, and information flow. Our key insight is that eliminating any of the exploited root causes, in any of the attack steps, is enough to provide protection. Based on our framework, we present a systematization of the attacks and defenses on a wide range of microarchitectural optimizations, which highlights their key similarities. Shared caches are an attractive attack surface for side-channel attacks, while defenses need to be efficient since the cache is crucial for performance. To address this issue, we present an adaptive and scalable cache partitioning solution (SCALE) for protection against cache side-channel attacks. The solution leverages randomness and provides quantifiable, information-theoretic security guarantees using differential privacy. The solution closes the performance gap to a state-of-the-art non-secure allocation policy for a mix of secure and non-secure applications.
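    The following sketch illustrates the core-to-core challenge idea in a distributed cache-partitioning scheme; it is not the actual DELTA algorithm, and the benefit curves, core names, and function names are hypothetical. A challenger core takes one cache way from a peer only if its estimated benefit from an extra way exceeds the current owner's estimated loss.

# Sketch of a core-to-core challenge: a cache way migrates to whichever core
# values it more, based on estimated performance curves.

def challenge(alloc, benefit, challenger, owner):
    """benefit(core, ways) estimates a core's performance at a given way count."""
    if alloc[owner] == 0:
        return alloc                              # nothing to take from this peer
    gain = benefit(challenger, alloc[challenger] + 1) - benefit(challenger, alloc[challenger])
    loss = benefit(owner, alloc[owner]) - benefit(owner, alloc[owner] - 1)
    if gain > loss:                               # challenger wins one way
        alloc[challenger] += 1
        alloc[owner] -= 1
    return alloc

# Example: core A has strongly diminishing returns, core B still benefits a lot.
curve = {"A": [0, 10, 12, 13, 13], "B": [0, 9, 17, 24, 30]}
perf = lambda core, ways: curve[core][ways]
alloc = {"A": 3, "B": 1}
print(challenge(alloc, perf, challenger="B", owner="A"))   # {'A': 2, 'B': 2}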

    Research and development for the data, trigger and control card in preparation for the Hi-Lumi LHC

    When the Large Hadron Collider (LHC) increases its luminosity by an order of magnitude in the coming decade, the experiments that sit upon it must also be upgraded to maintain their physics performance in the increasingly demanding environment. To achieve this, the Compact Muon Solenoid (CMS) experiment will make use of tracking information in the Level-1 trigger for the first time, meaning that track reconstruction must be achieved in less than 4 μs in an all-FPGA architecture. MUonE is an experiment aiming to make an accurate measurement of the hadronic contribution to the anomalous magnetic moment of the muon. It will achieve this by making use of apparatus similar to that designed for CMS and will benefit from the research and development efforts there. This thesis presents both development and testing work for the readout chain from tracker module to back-end processing card, as well as the results and analysis of a beam test used to validate this chain for both CMS and the MUonE experiment.

    Design and Implementation of HD Wireless Video Transmission System Based on Millimeter Wave

    With the build-out of optical fiber communication networks and the improvement of camera technology, the video that terminals can receive has become much clearer, with resolutions up to 4K. Although optical fiber communication offers high bandwidth and fast transmission, it is not the best solution for indoor short-distance video transmission in terms of cost, cable-laying difficulty and speed of deployment. In this context, this thesis proposes to design and implement a multi-channel wireless HD video transmission system with high transmission performance using 60 GHz millimeter-wave technology, aiming to increase the bandwidth from optical nodes to wireless terminals and improve the quality of video transmission. This thesis mainly covers the following parts: (1) It implements the wireless video transmission algorithms, divided into wireless transmission algorithms and video algorithms, such as 64QAM modulation and demodulation, the H.264 video codec and YUV420P format conversion. (2) It designs the hardware of the wireless HD video transmission system, including the network processing unit (NPU) and the millimeter-wave module. The millimeter-wave module uses the RWM6050 baseband chip and the TRX-BF01 RF chip; the corresponding hardware circuits, such as a 10 Gb/s network port and PCIe, are designed around these chips. (3) It realizes the software design of the wireless HD video transmission system: FFmpeg and Nginx are selected to build the sending platform of the video transmission system on the NPU, and multi-channel video transmission is realized with Docker. On the receiving platform, FFmpeg and Qt are selected to realize video decoding, combined with OpenGL for video playback. (4) Finally, the thesis completes testing of the wireless HD video transmission system, including stress tests, web tests and application-scenario tests. The tests verify that the HD wireless video transmission system can transmit HD VR video at a three-channel bit rate of 1.2 GB/s and can reach rates of up to 3.7 GB/s, which meets the research goal.
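    As an illustration of the Gray-mapped 64QAM modulation and demodulation mentioned above (a generic textbook mapping, not the thesis's implementation), the following sketch maps 6-bit groups onto a 64-point constellation and recovers them with a hard decision per axis.

# Minimal Gray-mapped 64QAM mapper/demapper sketch: 3 bits per axis are mapped
# to the amplitude levels {-7, -5, ..., 5, 7} via a Gray code.

LEVELS = [-7, -5, -3, -1, 1, 3, 5, 7]

def gray_to_index(g):            # 3-bit Gray code -> 0..7
    b = g
    b ^= b >> 1
    b ^= b >> 2
    return b

def index_to_gray(i):            # 0..7 -> 3-bit Gray code
    return i ^ (i >> 1)

def modulate(bits6):
    """bits6: six bits, MSB first; returns one complex constellation point."""
    g_i = bits6[0] << 2 | bits6[1] << 1 | bits6[2]
    g_q = bits6[3] << 2 | bits6[4] << 1 | bits6[5]
    return complex(LEVELS[gray_to_index(g_i)], LEVELS[gray_to_index(g_q)])

def demodulate(symbol):
    """Hard decision: pick the nearest level per axis, then undo the Gray mapping."""
    def axis_bits(value):
        idx = min(range(8), key=lambda i: abs(LEVELS[i] - value))
        g = index_to_gray(idx)
        return [(g >> 2) & 1, (g >> 1) & 1, g & 1]
    return axis_bits(symbol.real) + axis_bits(symbol.imag)

sym = modulate([1, 0, 1, 0, 1, 1])
print(sym, demodulate(sym + complex(0.4, -0.6)))   # survives small channel noise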

    Efficient Security Protocols for Constrained Devices

    During the last decades, more and more devices have been connected to the Internet. Today, there are more devices connected to the Internet than humans. An increasingly common type of device is the cyber-physical device, a device that interacts with its environment. Sensors that measure their environment and actuators that alter the physical environment are both cyber-physical devices. Devices connected to the Internet risk being compromised by threat actors such as hackers. Cyber-physical devices have become a preferred target for threat actors, since the consequence of an intrusion disrupting or destroying a cyber-physical system can be severe. Cyber attacks against power and energy infrastructure have caused significant disruptions in recent years. Many cyber-physical devices are categorized as constrained devices. A constrained device is characterized by one or more of the following limitations: limited memory, a less powerful CPU, or a limited communication interface. Many constrained devices are also powered by a battery or energy harvesting, which limits the available energy budget. Devices must be efficient to make the most of the limited resources. Mitigating cyber attacks is a complex task, requiring technical and organizational measures. Constrained cyber-physical devices require efficient security mechanisms to avoid overloading the system's limited resources. In this thesis, we present research on efficient security protocols for constrained cyber-physical devices. We have implemented and evaluated two state-of-the-art protocols, OSCORE and Group OSCORE. These protocols allow end-to-end protection of CoAP messages in the presence of untrusted proxies. Next, we have performed a formal protocol verification of WirelessHART, a protocol for communications in an industrial control system setting. In our work, we present a novel attack against the protocol. We have developed a novel architecture for industrial control systems utilizing the Digital Twin concept. Using a state synchronization protocol, we propagate state changes between the digital and physical twins. The Digital Twin can then monitor and manage devices. We have also designed a protocol for secure ownership transfer of constrained wireless devices. Our protocol allows the owner of a wireless sensor network to transfer control of the devices to a new owner. With a formal protocol verification, we can guarantee the security of both the old and new owners. Lastly, we have developed an efficient Private Stream Aggregation (PSA) protocol. PSA allows devices to send encrypted measurements to an aggregator. The aggregator can combine the encrypted measurements and calculate the decrypted sum of the measurements. No party learns an individual measurement except the device that generated it.
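    The following toy sketch shows the basic Private Stream Aggregation idea, not the protocol developed in the thesis: device keys are chosen so that they cancel in the sum, letting the aggregator recover only the total. A practical scheme would derive fresh per-round keys, e.g., with a PRF, and could add noise for differential privacy; all names and numbers here are illustrative.

# Toy additive-masking PSA sketch: the aggregator's key cancels the device
# keys, so only the sum of the measurements is revealed.
import random

MOD = 2**32

def setup(num_devices):
    device_keys = [random.randrange(MOD) for _ in range(num_devices)]
    aggregator_key = (-sum(device_keys)) % MOD        # cancels the device keys
    return device_keys, aggregator_key

def encrypt(measurement, key):
    return (measurement + key) % MOD                  # hides the individual value

def aggregate(ciphertexts, aggregator_key):
    return (sum(ciphertexts) + aggregator_key) % MOD  # only the sum is revealed

device_keys, agg_key = setup(3)
readings = [21, 17, 30]
ciphertexts = [encrypt(x, k) for x, k in zip(readings, device_keys)]
print(aggregate(ciphertexts, agg_key))                # 68, without seeing 21/17/30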

    Application-centric bandwidth allocation in datacenters

    Today's datacenters host a large number of concurrently executing applications with diverse intra-datacenter latency and bandwidth requirements. Some of these applications, such as data analytics, graph processing, and machine learning training, are data-intensive and require high bandwidth to function properly. However, these bandwidth-hungry applications can often congest the datacenter network, leading to queuing delays that hurt application completion time. To remove the network as a potential performance bottleneck, datacenter operators have begun deploying high-end HPC-grade networks like InfiniBand. These networks offer fully offloaded network stacks, remote direct memory access (RDMA) capability, and non-discarding links, which allow them to provide both low latency and high bandwidth for a single application. However, it is unclear how well such networks accommodate a mix of latency- and bandwidth-sensitive traffic in a real-world deployment. In this thesis, we aim to answer the above question. To do so, we develop RPerf, a latency measurement tool for RDMA-based networks that can precisely measure the InfiniBand switch latency without hardware support. Using RPerf, we benchmark a rack-scale InfiniBand cluster in both isolated and mixed-traffic scenarios. Our key finding is that the evaluated switch can provide either low latency or high bandwidth, but not both simultaneously in a mixed-traffic scenario. We also evaluate several options to improve the latency-bandwidth trade-off and demonstrate that none are ideal. We find that while queue separation is a solution to protect latency-sensitive applications, it fails to properly manage the bandwidth of other applications. We also aim to resolve the problem with bandwidth management for non-latency-sensitive applications. Previous efforts to address this problem have generally focused on achieving max-min fairness at the flow level. However, we observe that different workloads exhibit varying levels of sensitivity to network bandwidth. For some workloads, even a small reduction in available bandwidth can significantly increase completion time, while for others, completion time is largely insensitive to available network bandwidth. As a result, simply splitting the bandwidth equally among all workloads is sub-optimal for overall application-level performance. To address this issue, we first propose a robust methodology capable of effectively measuring the sensitivity of applications to bandwidth. We then design Saba, an application-aware bandwidth allocation framework that distributes network bandwidth based on application-level sensitivity. Saba combines ahead-of-time application profiling to determine bandwidth sensitivity with runtime bandwidth allocation using lightweight software support, with no modifications to network hardware or protocols. Experiments with a 32-server hardware testbed show that Saba can significantly increase overall performance by reducing the job completion time for bandwidth-sensitive jobs
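    As a hedged sketch of application-aware allocation in the spirit of Saba (not its actual policy), the function below splits link bandwidth in proportion to each job's measured bandwidth sensitivity instead of giving every job an equal share; the job names and sensitivity values are hypothetical.

# Sketch of sensitivity-weighted bandwidth allocation: bandwidth-sensitive jobs
# receive a larger share of the link than insensitive ones.

def allocate_bandwidth(sensitivity: dict, link_gbps: float) -> dict:
    total = sum(sensitivity.values())
    return {job: link_gbps * s / total for job, s in sensitivity.items()}

# Profiled sensitivities (hypothetical): how strongly each job's completion
# time degrades when its bandwidth is halved.
profiles = {"graph-analytics": 0.9, "ml-training": 0.6, "backup": 0.1}
print(allocate_bandwidth(profiles, link_gbps=100))
# {'graph-analytics': 56.25, 'ml-training': 37.5, 'backup': 6.25}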