Perceptions, Actors, Innovations
With Agenda 2030, the UN adopted wide-ranging Sustainable Development Goals (SDGs) that integrate development and environmental agendas. This book has a unique focus on the political tensions between environmental and socio-economic objectives and advocates for a cooperative shift towards environmentally sound sustainability
Business Functions Capabilities and Small and Medium Enterprises’ Internationalization
Ineffective global expansion can adversely affect the business outcomes of small and medium enterprises (SMEs). Business leaders are concerned with developing effective global expansion strategies to penetrate potential international markets, thus enhancing sustainability. Grounded in the business management systems theory, the purpose of this qualitative multi-case study was to explore strategies that leaders of Sub-Saharan Africa manufacturing SMEs use for global expansion. The participants were five manufacturing value-adding SME leaders participating in export markets. Using Yin’s five-step data analysis process, six themes emerged: (a) enterprise characterization, (b) understanding the enterprise’s product, (c) intra-enterprise factor-based strategies for export participation, (d) the enterprise’s external factor-based strategies for successful export venture, (e) global expansion strategies, and (f) serendipitous findings. A key recommendation for SME leaders is to analyze the critical components of their products and prepare to adjust them to the demand dimensions of the target market. The implications for positive social change include the potential to increase the enterprise’s wealth, increase employment, reduce poverty for all value chain participants, and grow gross domestic product.
Porting and optimizing BWA-MEM2 using the Fujitsu A64FX processor
Sequence alignment pipelines for human genomes are an emerging workload that will dominate in the precision medicine field. BWA-MEM2 is a tool widely used in the scientific community to perform read mapping studies. In this paper, we port BWA-MEM2 to the AArch64 architecture using the ARMv8-A specification, and we compare the resulting version against an Intel Skylake system both in performance and in energy-to-solution. The porting effort entails numerous code modifications, since BWA-MEM2 implements certain kernels using x86_64-specific intrinsics, e.g., AVX-512. To adapt this code we use Arm’s recently introduced Scalable Vector Extension (SVE). More specifically, we use Fujitsu’s A64FX processor, the first to implement SVE. The A64FX powers the Fugaku Supercomputer that led the Top500 ranking from June 2020 to November 2021. After porting BWA-MEM2 we define and implement a number of optimizations to improve performance in the A64FX target architecture. We show that while the A64FX performance is lower than that of the Skylake system, A64FX delivers 11.6% better energy-to-solution on average. All the code used for this article is available at https://gitlab.bsc.es/rlangari/bwa-a64fx
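The abstract's closing claim, lower performance but better energy-to-solution, can look contradictory at first. A minimal sketch of the arithmetic follows, assuming energy-to-solution is average power multiplied by runtime. The function names and every number below are hypothetical illustrations and are not measurements from the paper.

```python
def energy_to_solution(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy in joules consumed to finish one run (assumed: average power x time)."""
    return avg_power_watts * runtime_seconds


def relative_saving(baseline_joules: float, candidate_joules: float) -> float:
    """Fraction of the baseline energy that the candidate system saves."""
    return (baseline_joules - candidate_joules) / baseline_joules


# Hypothetical numbers only: the candidate is slower but draws less power.
baseline = energy_to_solution(avg_power_watts=200.0, runtime_seconds=100.0)
candidate = energy_to_solution(avg_power_watts=100.0, runtime_seconds=150.0)
saving = relative_saving(baseline, candidate)
```

With these made-up inputs the slower system still finishes the job with less energy, because its power draw falls by more than its runtime grows.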
Policy options for food system transformation in Africa and the role of science, technology and innovation
As recognized by the Science, Technology and Innovation Strategy for Africa – 2024 (STISA-2024), science, technology and innovation (STI) offer many opportunities for addressing the main constraints to embracing transformation in Africa, while important lessons can be learned from successful interventions, including policy and institutional innovations, from those African countries that have already made significant progress towards food system transformation. This chapter identifies opportunities for African countries and the region to take proactive steps to harness the potential of the food and agriculture sector so as to ensure future food and nutrition security by applying STI solutions and by drawing on transformational policy and institutional innovations across the continent. Potential game-changing solutions and innovations for food system transformation serving people and ecology apply to (a) raising production efficiency and restoring and sustainably managing degraded resources; (b) finding innovation in the storage, processing and packaging of foods; (c) improving human nutrition and health; (d) addressing equity and vulnerability at the community and ecosystem levels; and (e) establishing preparedness and accountability systems. Effectiveness in these areas will require institutional coordination; clear regulatory environments that are conscious of food safety and health; greater and timely access to information; and transparent monitoring and accountability systems.
Optimisation for Optical Data Centre Switching and Networking with Artificial Intelligence
Cloud and cluster computing platforms have become standard across almost every domain of business, and their scale quickly approaches servers in a single warehouse. However, the tier-based opto-electronically packet switched network infrastructure that is standard across these systems gives rise to several scalability bottlenecks, including resource fragmentation and high energy requirements. Experimental results show that optical circuit switched networks pose a promising alternative that could avoid these.
However, optimality challenges are encountered at realistic commercial scales. Where exhaustive optimisation techniques are not applicable for problems at the scale of Cloud-scale computer networks, and expert-designed heuristics are performance-limited and typically biased in their design, artificial intelligence can discover more scalable and better performing optimisation strategies.
This thesis demonstrates these benefits through experimental and theoretical work spanning component, system and commercial optimisation problems that stand in the way of practical Cloud-scale computer network systems. Firstly, optical components are optimised to gate in and are demonstrated in a proof-of-concept switching architecture for optical data centres with better wavelength and component scalability than previous demonstrations. Secondly, network-aware resource allocation schemes for optically composable data centres are learnt end-to-end with deep reinforcement learning and graph neural networks, where fewer networking resources are required to achieve the same resource efficiency compared to conventional methods. Finally, a deep reinforcement learning based method for optimising PID-control parameters is presented which generates tailored parameters for unseen devices in . This method is demonstrated on a market leading optical switching product based on piezoelectric actuation, where switching speed is improved with no compromise to optical loss and the manufacturing yield of actuators is improved. This method was licensed to and integrated within the manufacturing pipeline of this company. As such, crucial public and private infrastructure utilising these products will benefit from this work.
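To make the PID-control parameters mentioned above concrete, here is a minimal sketch of a discrete PID loop. The toy first-order plant, the function name `simulate_pid`, and all gain values are assumptions chosen for illustration. They do not model the piezoelectric actuators or the learning method described in the thesis.

```python
def simulate_pid(kp: float, ki: float, kd: float,
                 setpoint: float = 1.0, dt: float = 0.01, steps: int = 2000) -> float:
    """Drive a toy plant dx/dt = -x + u with a discrete PID controller.

    Returns the plant state after `steps` explicit-Euler steps of size `dt`.
    """
    x = 0.0                      # plant state (hypothetical actuator position)
    integral = 0.0               # running integral of the error
    prev_error = setpoint - x    # makes the first derivative term zero
    for _ in range(steps):
        error = setpoint - x
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative   # control signal
        x += (-x + u) * dt       # explicit Euler update of the plant
        prev_error = error
    return x


# Proportional-only control leaves a steady-state offset on this plant;
# adding an integral term removes it.
p_only = simulate_pid(kp=2.0, ki=0.0, kd=0.0)
with_integral = simulate_pid(kp=2.0, ki=1.0, kd=0.0)
```

On this toy plant the proportional-only run settles near 2/3 of the setpoint, while the run with an integral term settles close to the setpoint itself. A parameter-tuning method, learned or otherwise, searches over `kp`, `ki` and `kd` to trade off effects like these for each device.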
Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques
The rapid growth of demanding applications in domains applying multimedia
processing and machine learning has marked a new era for edge and cloud
computing. These applications involve massive data and compute-intensive tasks,
and thus, typical computing paradigms in embedded systems and data centers are
stressed to meet the worldwide demand for high performance. Concurrently, the
landscape of the semiconductor field in the last 15 years has established power
as a first-class design concern. As a result, the community of computing
systems is forced to find alternative design approaches to facilitate
high-performance and/or power-efficient computing. Among the examined
solutions, Approximate Computing has attracted an ever-increasing interest,
with research works applying approximations across the entire traditional
computing stack, i.e., at software, hardware, and architectural levels. Over
the last decade, a plethora of approximation techniques has emerged in software
(programs, frameworks, compilers, runtimes, languages), hardware (circuits,
accelerators), and architectures (processors, memories). The current article is
Part I of our comprehensive survey on Approximate Computing, and it reviews its
motivation, terminology and principles, and it also classifies and presents the
technical details of the state-of-the-art software and hardware approximation
techniques.
Comment: Under Review at ACM Computing Surveys
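As a small illustration of what a software approximation technique can look like, the sketch below uses loop perforation, a widely known technique that skips a fraction of loop iterations to trade accuracy for work. The abstract does not name this technique, so it is an illustrative choice. The function names and the `stride` parameter are hypothetical.

```python
def mean_exact(values: list[float]) -> float:
    """Exact mean: visits every element."""
    return sum(values) / len(values)


def mean_perforated(values: list[float], stride: int = 4) -> float:
    """Approximate mean: visits only every `stride`-th element (loop perforation)."""
    sampled = values[::stride]
    return sum(sampled) / len(sampled)


data = [float(i) for i in range(1000)]
exact = mean_exact(data)                    # 499.5
approx = mean_perforated(data, stride=4)    # 498.0, using a quarter of the elements
relative_error = abs(exact - approx) / exact
```

Here the perforated loop does roughly a quarter of the work for a relative error of about 0.3%. Whether such an error is acceptable depends entirely on the application, and real inputs may behave far less smoothly than this evenly spaced example.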
NetClone: Fast, Scalable, and Dynamic Request Cloning for Microsecond-Scale RPCs
Spawning duplicate requests, called cloning, is a powerful technique to
reduce tail latency by masking service-time variability. However, traditional
client-based cloning is static and harmful to performance under high load,
while a recent coordinator-based approach is slow and not scalable. Both
approaches are insufficient to serve modern microsecond-scale Remote Procedure
Calls (RPCs). To this end, we present NetClone, a request cloning system that
makes cloning decisions dynamically within nanoseconds at scale. Rather than
the client or the coordinator, NetClone performs request cloning in the network
switch by leveraging the capability of programmable switch ASICs. Specifically,
NetClone replicates requests based on server states and blocks redundant
responses using request fingerprints in the switch data plane. To realize the
idea while satisfying the strict hardware constraints, we address several
technical challenges when designing a custom switch data plane. NetClone can be
integrated with emerging in-network request schedulers like RackSched. We
implement a NetClone prototype with an Intel Tofino switch and a cluster of
commodity servers. Our experimental results show that NetClone can improve the
tail latency of microsecond-scale RPCs for synthetic and real-world application
workloads and is robust to various system conditions.
Comment: 13 pages, ACM SIGCOMM 202
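The two mechanisms the abstract names, cloning based on server state and blocking redundant responses by request fingerprint, can be sketched as a toy software model. This is not the NetClone data plane. The class name, the "clone only when at least two servers are idle" policy, the use of the request ID as the fingerprint, and the unbounded `seen` set are all simplifying assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CloningSwitch:
    """Toy model of a switch that clones requests and filters duplicate responses."""
    idle: dict[str, bool]                          # server name -> last known idle state
    seen: set[int] = field(default_factory=set)    # fingerprints already answered

    def route(self, request_id: int) -> list[str]:
        """Pick the servers that receive this request and mark them busy."""
        idle_servers = sorted(s for s, is_idle in self.idle.items() if is_idle)
        if len(idle_servers) >= 2:
            targets = idle_servers[:2]         # spare capacity: send a clone as well
        elif idle_servers:
            targets = idle_servers[:1]         # one idle server: no clone
        else:
            targets = sorted(self.idle)[:1]    # all busy: no clone, fixed fallback server
        for server in targets:
            self.idle[server] = False
        return targets

    def on_response(self, request_id: int, server: str) -> bool:
        """Return True if the response is forwarded, False if it is a dropped duplicate."""
        self.idle[server] = True
        if request_id in self.seen:
            return False
        self.seen.add(request_id)
        return True


switch = CloningSwitch(idle={"s1": True, "s2": True, "s3": False})
first_targets = switch.route(1)             # two idle servers, so the request is cloned
second_targets = switch.route(2)            # no idle servers left, so no clone
fast_reply = switch.on_response(1, "s2")    # first response is forwarded
slow_reply = switch.on_response(1, "s1")    # redundant response is dropped
```

A hardware switch has strict memory and per-packet processing limits, so a realistic design would need a bounded fingerprint table rather than the ever-growing set used here.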
Adaptive Microarchitectural Optimizations to Improve Performance and Security of Multi-Core Architectures
With the current technological barriers, microarchitectural optimizations are increasingly important to ensure performance scalability of computing systems. The shift to multi-core architectures increases the demands on the memory system, and amplifies the role of microarchitectural optimizations in performance improvement. In a multi-core system, microarchitectural resources, such as the cache, are usually shared to maximize utilization, but sharing can also lead to contention and lower performance. This can be mitigated through partitioning of shared caches. However, microarchitectural optimizations, which were assumed to be fundamentally secure for a long time, can be used in side-channel attacks to leak secrets, such as cryptographic keys. Timing-based side-channels exploit predictable timing variations due to the interaction with microarchitectural optimizations during program execution. Going forward, there is a strong need to be able to leverage microarchitectural optimizations for performance without compromising security. This thesis contributes three adaptive microarchitectural resource management optimizations to improve security and/or performance of multi-core architectures and a systematization-of-knowledge of timing-based side-channel attacks. We observe that to achieve high-performance cache partitioning in a multi-core system three requirements need to be met: i) fine-granularity of partitions, ii) locality-aware placement and iii) frequent changes. These requirements lead to high overheads for current centralized partitioning solutions, especially as the number of cores in the system increases. To address this problem, we present an adaptive and scalable cache partitioning solution (DELTA) using a distributed and asynchronous allocation algorithm. The allocations occur through core-to-core challenges, where applications with larger performance benefit will gain cache capacity.
The solution is implementable in hardware, due to low computational complexity, and can scale to large core counts. According to our analysis, better performance can be achieved by coordinating multiple optimizations for different resources, e.g., off-chip bandwidth and cache, but this is challenging due to the increased number of possible allocations which need to be evaluated. Based on these observations, we present a solution (CBP) for coordinated management of the optimizations: cache partitioning, bandwidth partitioning and prefetching. Efficient allocations, considering the inter-resource interactions and trade-offs, are achieved using local resource managers to limit the solution space. The continuously growing number of side-channel attacks leveraging microarchitectural optimizations prompts us to review attacks and defenses to understand the vulnerabilities of different microarchitectural optimizations. We identify the four root causes of timing-based side-channel attacks: determinism, sharing, access violation and information flow. Our key insight is that eliminating any of the exploited root causes, in any of the attack steps, is enough to provide protection. Based on our framework, we present a systematization of the attacks and defenses on a wide range of microarchitectural optimizations, which highlights their key similarities. Shared caches are an attractive attack surface for side-channel attacks, while defenses need to be efficient since the cache is crucial for performance. To address this issue, we present an adaptive and scalable cache partitioning solution (SCALE) for protection against cache side-channel attacks. The solution leverages randomness, and provides quantifiable and information theoretic security guarantees using differential privacy. The solution closes the performance gap to a state-of-the-art non-secure allocation policy for a mix of secure and non-secure applications.