3,857 research outputs found

    Enabling the Twin Transitions: Digital Technologies Support Environmental Sustainability through Lean Principles

    Manufacturing companies seek innovative approaches to achieve successful Green and Digital transitions, and adopting lean production is one alternative. However, further investigation is required to ground the approach in empirical inputs and to identify which digital technologies can be applied with which lean principles for environmental benefit. Therefore, this study first conducted a case study in three companies to collect empirical data. A complementary literature review was then carried out, investigating existing frameworks and complementary practices of digitalized lean implementations and their resulting environmental impact. Consequently, the Internet of Things and related connection-level technologies were identified as the key facilitators in lean implementations, specifically in visualization, communication, and poka-yoke, leading to environmental benefits. Furthermore, a framework of DIgitalization Supports Environmental sustainability through Lean principles (DISEL) was proposed to help manufacturing companies identify opportunities for digitalizing lean principles for environmental sustainability, thus enabling the twin transitions and improving resilience.

    Guided rewriting and constraint satisfaction for parallel GPU code generation

    Graphics Processing Units (GPUs) are notoriously hard to optimise manually due to their scheduling and memory hierarchies. What is needed are good automatic code generators and optimisers for such parallel hardware. Functional approaches such as Accelerate, Futhark and LIFT leverage a high-level algorithmic Intermediate Representation (IR) to expose parallelism and abstract the implementation details away from the user. However, producing efficient code for a given accelerator remains challenging. Existing code generators depend either on user input to choose among a set of hard-coded optimisations or on automated exploration of the implementation search space. The former lacks extensibility, while the latter is too costly due to the size of the search space. A hybrid approach is needed, where a space of valid implementations is built automatically and explored with the aid of human expertise. This thesis presents a solution combining user-guided rewriting and automatically generated constraints to produce high-performance code. The first contribution is an automatic tuning technique to find a balance between performance and memory consumption. Leveraging its functional patterns, the LIFT compiler is empowered to infer tuning constraints and limit the search to valid tuning combinations only. Next, the thesis reframes parallelisation as a constraint satisfaction problem. Parallelisation constraints are extracted automatically from the input expression, and a solver is used to identify valid rewritings. The constraints truncate the search space to valid parallel mappings only by capturing the scheduling restrictions of the GPU in the context of a given program. A synchronisation barrier insertion technique is proposed to prevent data races and improve the efficiency of the generated parallel mappings.
The final contribution of this thesis is the guided rewriting method, where the user encodes a design space of structural transformations using high-level IR nodes called rewrite points. These strongly typed pragmas express macro rewrites and expose design choices as explorable parameters. The thesis proposes a small set of reusable rewrite points to achieve tiling, cache locality, data reuse and memory optimisation. A comparison with handwritten kernels from the vendor-provided ARM Compute Library and with the TVM code generator demonstrates the effectiveness of this thesis' contributions. With convolution as a use case, LIFT-generated direct and GEMM-based convolution implementations are shown to perform on par with state-of-the-art solutions on a mobile GPU. Overall, this thesis demonstrates that a functional IR lends itself well to user-guided and automatic rewriting for high-performance code generation.
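The abstract's framing of parallelisation as a constraint satisfaction problem can be sketched in miniature. The snippet below is an illustrative toy, not LIFT's actual solver: every name in it is hypothetical, and the constraints (at most one loop per parallel level, reductions kept sequential, local parallelism nested inside a workgroup loop) merely stand in for the GPU scheduling restrictions the thesis extracts automatically.

```python
from itertools import product

# Candidate GPU mapping levels for each loop in a nest (illustrative).
LEVELS = ["workgroup", "local", "seq"]

def valid(assignment, reduction_loops):
    # A reduction loop carries a loop-carried dependence: keep it sequential.
    if any(assignment[i] != "seq" for i in reduction_loops):
        return False
    # Each parallel level may be used by at most one loop (no double mapping).
    for lvl in ("workgroup", "local"):
        if sum(1 for a in assignment if a == lvl) > 1:
            return False
    # "local" parallelism only makes sense nested inside a workgroup loop.
    if "local" in assignment and "workgroup" in assignment:
        if assignment.index("local") < assignment.index("workgroup"):
            return False
    return True

def enumerate_mappings(n_loops, reduction_loops=()):
    # Brute-force enumeration; the constraints prune invalid combinations.
    return [a for a in product(LEVELS, repeat=n_loops)
            if valid(a, reduction_loops)]

# Loop 2 is a reduction (e.g. the k-loop of a matrix multiply).
mappings = enumerate_mappings(3, reduction_loops=(2,))
```

Brute force suffices at this scale; the point is that constraints truncate the mapping space to valid candidates before any exploration begins.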

    Reflections on Philanthropy for Social Justice A New Era of Giving

    Philanthropy in India is growing steadily, with a surge in funds and advancements in practice. The question remains: how can this redistribution of wealth be effectively harnessed to achieve transformative social change and more inclusive development? In A New Era of Giving, thought leaders from India and abroad share their insights and perspectives on the challenges and issues to be addressed in shifting from a charitable model of support to an approach that prioritises social justice.

    Gendered Bodies, Engendered Lives: Bioarchaeological Exploration of the Intersectionality of Gender, Health, and Trauma at Turkey Creek Pueblo, Arizona (AD 1225-1286)

    This dissertation examines the relationships between sex, gender, and health at Turkey Creek Pueblo (AD 1225-1286), the earliest aggregated Pueblo community in the Point of Pines region of east central Arizona, to better understand their roles in producing differential health outcomes. To gain a view of these interactions, I use osteological, mortuary, and ethnohistoric data to explore how gender, as a social institution, informed divisions of labor and experiences with traumatic injury at Turkey Creek Pueblo, because this site was occupied during a socially dynamic and important period in the pre-contact American Southwest. Using these data, I explore how sex, age, life history, and social status/prestige, as bioarchaeologically recoverable axes of gender identity, intersected to structure experiences of disease, heavy workloads, and traumatic injury between individuals and groups at Turkey Creek Pueblo. Through this research, I identify which bioarchaeologically detectable axes of gender (e.g., sex, age, and social role/status) were significantly promoting or buffering against experiences of disease and physical trauma within this community. I show that, at Turkey Creek Pueblo, osteological sex is not the most significant axis of identity structuring differences in experiences of trauma, health, and social status or social power within this community. This challenges Euro-centric, binary assumptions and portrayals of Indigenous gender roles and inequalities in the past. Gender roles among pre-contact Puebloan communities were complex and not rigidly defined by sex, nor were labor activities, poor health, trauma, and social power/prestige expressly divided along binary dimensions, in contrast to how they have been portrayed by traditional ethnographic and ethnohistoric sources. 
This research is significant in that it provides another line of evidence that gender and gendered experience are relational social scripts informed by the intersection of multiple axes of individual identity and life history. These analyses shed light on the social consequences of early population aggregation in the Mogollon Highland region and its implications for health, disease, and traumatic injury in aggregating communities.

    ACiS: smart switches with application-level acceleration

    Network performance has contributed fundamentally to the growth of supercomputing over the past decades. In parallel, High Performance Computing (HPC) peak performance has depended, first, on ever faster/denser CPUs, and then on increasing density alone. As operating frequency, and now feature size, have levelled off, two new approaches are becoming central to achieving higher net performance: configurability and integration. Configurability enables hardware to map to the application, as well as vice versa. Integration enables system components that have generally been single-function (e.g., a network transports data) to take on additional functionality (e.g., the network also operates on that data). More generally, integration enables compute-everywhere: not just in CPU and accelerator, but also in the network and, more specifically, in the communication switches. In this thesis, we propose four novel methods of enhancing HPC performance through Advanced Computing in the Switch (ACiS). More specifically, we propose various flexible and application-aware accelerators that can be embedded into or attached to existing communication switches to improve the performance and scalability of HPC and Machine Learning (ML) applications. We follow a modular design discipline by introducing composable plugins that successively add ACiS capabilities. In the first work, we propose an inline accelerator for communication switches that supports user-definable collective operations. MPI collective operations can often be performance killers in HPC applications; we seek to remove this bottleneck by offloading them to reconfigurable hardware within the switch itself. We also introduce a novel mechanism that enables the hardware to support MPI communicators of arbitrary shape and that is scalable to very large systems. In the second work, we propose a look-aside accelerator for communication switches that is capable of processing packets at line rate.
Functions requiring loops and state are addressed in this method. The proposed in-switch accelerator is based on a RISC-V-compatible Coarse-Grained Reconfigurable Array (CGRA). To facilitate usability, we have developed a framework to compile user-provided C/C++ code into the back-end instructions that configure the accelerator. In the third work, we extend ACiS to support fused collectives and the combining of collectives with map operations. We observe that there is an opportunity to fuse communication (collectives) with computation. Since the computation varies across applications, ACiS support in this method is programmable. In the fourth work, we propose that switches with ACiS support can control and manage the execution of applications, i.e., that the switch be an active device with decision-making capabilities. Switches have a central view of the network; they can collect telemetry information, monitor application behavior, and then use this information for control, decision-making, and coordination of nodes. We evaluate the feasibility of ACiS through extensive RTL-based simulation as well as deployment in an open-access cloud infrastructure. Using this simulation framework, with a Graph Convolutional Network (GCN) application as a case study, an average speedup of 3.4x across five real-world datasets is achieved on 24 nodes compared to a CPU cluster without ACiS capabilities.
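The idea of offloading a user-definable collective into the switch, as in the first ACiS work, can be caricatured in a few lines. This is a hypothetical software model, not the thesis' hardware design, and the function name is illustrative: the switch folds contributions from all nodes with a user-supplied operator and multicasts the single result back, sparing the hosts a software reduction tree.

```python
from functools import reduce

def switch_allreduce(node_values, op):
    # The switch applies the user-definable operator to contributions
    # as they arrive (modelled here as a simple left fold)...
    result = reduce(op, node_values)
    # ...then multicasts the reduced value back to every participating node.
    return [result] * len(node_values)

# Five nodes contribute values; the "switch" sums them in one pass.
out = switch_allreduce([3, 1, 4, 1, 5], lambda a, b: a + b)
```

Because the operator is a parameter, the same in-switch datapath serves sum, max, or any application-specific combiner.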

    Enabling HW-based task scheduling in large multicore architectures

    Dynamic Task Scheduling is an enticing programming model that aims to ease the development of parallel programs with intrinsically irregular or data-dependent parallelism. The performance of such solutions relies on the ability of the Task Scheduling HW/SW stack to efficiently evaluate dependencies at runtime and schedule work to available cores. Traditional SW-only systems incur scheduling overheads of around 30K processor cycles per task, which severely limit the (core count, task granularity) combinations that they can adequately handle. Previous work on HW-accelerated Task Scheduling has shown that such systems can support high-performance scheduling on processors with up to eight cores, but questions remained about the viability of such solutions for the greater core counts now frequently found in high-end SMP systems. The present work presents an FPGA-proven, tightly integrated, Linux-capable, 30-core RISC-V system with hardware-accelerated Task Scheduling. We use this implementation to show that HW Task Scheduling can still offer competitive performance at such a high core count, and describe how this organization includes hardware and software optimizations that make it even more scalable than previous solutions. Finally, we outline ways in which this architecture could be augmented to overcome inter-core communication bottlenecks, mitigating the cache-degradation effects usually involved in the parallelization of highly optimized serial code. This work is supported by the TEXTAROSSA project G.A. n.956831, as part of the EuroHPC initiative, by the Spanish Government (grants PCI2021-121964, TEXTAROSSA; PDC2022-133323-I00, Multi-Ka; PID2019-107255GB-C21 MCIN/AEI/10.13039/501100011033; and CEX2021-001148-S), by Generalitat de Catalunya (2021 SGR 01007), and by FAPESP (grant 2019/26702-8).
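To make the scheduling problem concrete, here is a minimal software model of the bookkeeping a HW task scheduler performs: tracking dependencies, releasing a task once all its predecessors have finished, and dispatching ready tasks to the earliest-free core. All names and the unit-cost assumption are illustrative; this is not the paper's design, only the runtime work that its hardware accelerates.

```python
def schedule(n_tasks, deps, n_cores, cost=1):
    """Greedy dependence-aware list scheduler.

    deps maps a task id to the set of tasks that must finish first.
    Returns a dict mapping each task id to its finish time.
    """
    indeg = {t: len(deps.get(t, ())) for t in range(n_tasks)}
    succs = {t: [] for t in range(n_tasks)}
    for t, preds in deps.items():
        for p in preds:
            succs[p].append(t)
    cores = [0.0] * n_cores            # time at which each core becomes free
    finish = {}                        # task id -> finish time
    ready = [t for t in range(n_tasks) if indeg[t] == 0]
    while ready:
        t = ready.pop(0)
        # A task may start only after all its predecessors have finished.
        dep_ready = max((finish[p] for p in deps.get(t, ())), default=0.0)
        c = min(range(n_cores), key=lambda i: cores[i])
        start = max(cores[c], dep_ready)
        finish[t] = start + cost
        cores[c] = finish[t]
        # Releasing successors is exactly the work a HW scheduler offloads.
        for s in succs[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return finish
```

With ~30K cycles of software overhead per dispatch, this bookkeeping dominates for fine-grained tasks, which is why moving it into hardware pays off as core counts grow.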

    Kempsey, New South Wales : How social and political divisions in Kempsey’s early history impacted the town’s economic and environmental development to 1865, and its ongoing susceptibility to disaster

    This study addresses the question: how did social and political divisions influence the economic and environmental development of Kempsey during the colonial period up to 1865? Primary documents, including personal letters, journals, memoirs, political and governmental papers, along with a range of colonial newspapers, have been studied and interpreted to form a social-historical answer to the question. Given the range of sources available for this investigation, a variety of methodologies has been employed, with particular emphasis on empirical qualitative analysis. In addition to considering existing non-scholarly thematic histories of the Macleay Valley, this thesis draws existing scholarly investigations together and builds upon them, examining the interdependence between society and environment, politics and geographical developments, and culture and social movements to piece together the story of Kempsey and uncover the key events that have had long-lasting impacts on the town. No other scholarly study of this kind has been undertaken to bring the entire complex and multifaceted story of Kempsey's early years into one investigation. The implications of this study highlight the powerful influence that social and political divisions in a community exert when important decisions about town planning, environmental protection, and issues of social justice need to be addressed. These divisions can lead to catastrophic outcomes that impact generations to follow, as shown in the tumultuous history of Kempsey, New South Wales.

    Tiny Machine Learning Environment: Enabling Intelligence on Constrained Devices

    Running machine learning (ML) algorithms on constrained devices at the extreme edge of the network is problematic due to the computational overhead of ML algorithms, the resources available on the embedded platform, and the application budget (e.g., real-time requirements and power constraints). This has required the development of dedicated solutions and tools for what is now referred to as TinyML. In this dissertation, we focus on improving the deployment and performance of TinyML applications, taking into consideration the aforementioned challenges, especially memory requirements. This dissertation contributed the Edge Learning Machine (ELM) environment, a platform-independent open-source framework that provides three main TinyML services, namely shallow ML, self-supervised ML, and binary deep learning on constrained devices. In this context, the work comprises the following steps, which are reflected in the thesis structure. First, we present a performance analysis of state-of-the-art shallow ML algorithms, including dense neural networks, implemented on mainstream microcontrollers. The comprehensive analysis in terms of algorithms, hardware platforms, datasets, preprocessing techniques, and configurations shows performance similar to that of a desktop machine and highlights the impact of these factors on overall performance. Second, despite the common assumption that the scarcity of resources limits TinyML to model inference only, we went a step further and enabled self-supervised on-device training on microcontrollers and tiny IoT devices by developing the Autonomous Edge Pipeline (AEP) system. AEP achieves accuracy comparable to that of the typical TinyML paradigm, i.e., models trained on resource-abundant devices and then deployed on microcontrollers. Next, we present the development of a memory allocation strategy for convolutional neural network (CNN) layers that optimizes memory requirements.
This approach reduces the memory footprint without affecting accuracy or latency. Moreover, e-skin systems share the main requirements of the TinyML field: enabling intelligence with low memory, low power consumption, and low latency. We therefore designed an efficient Tiny CNN architecture for e-skin applications. The architecture leverages the memory allocation strategy presented earlier and provides better performance than existing solutions. A major contribution of the thesis is CBin-NN, a library of functions for implementing extremely efficient binary neural networks on constrained devices. The library outperforms state-of-the-art NN deployment solutions by drastically reducing memory footprint and inference latency. All the solutions proposed in this thesis have been implemented on representative devices and tested in relevant applications, whose results are reported and discussed. The ELM framework is open source, and this work is becoming a useful, versatile toolkit for the IoT and TinyML research and development community.
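The kind of arithmetic that lets a binary-NN library such as CBin-NN cut memory and latency can be illustrated with the classic XNOR-popcount trick. This sketch is hypothetical and independent of CBin-NN's actual code: with weights and activations constrained to {-1, +1} and packed into integers, a dot product collapses to an XNOR followed by a popcount.

```python
def pack(bits):
    # Encode +1 as bit 1 and -1 as bit 0 within a single integer word.
    word = 0
    for i, b in enumerate(bits):
        if b == 1:
            word |= 1 << i
    return word

def binary_dot(w_word, x_word, n):
    # XNOR marks the positions where the two signs agree...
    agree = bin(~(w_word ^ x_word) & ((1 << n) - 1)).count("1")
    # ...and dot = (#agreements) - (#disagreements) = 2 * agree - n.
    return 2 * agree - n

w = [1, -1, 1, 1, -1, -1, 1, -1]
x = [1, 1, -1, 1, -1, 1, -1, -1]
approx = binary_dot(pack(w), pack(x), len(w))
exact = sum(a * b for a, b in zip(w, x))   # reference full-precision result
```

On a microcontroller, 32 multiply-accumulates thus become one XOR, one NOT, and one popcount over a single word, which is where both the footprint and latency savings come from.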

    Towards Fast and Scalable Private Inference

    Privacy and security have rapidly emerged as first-order design constraints. Users now demand more protection over who can see their data (confidentiality) as well as how it is used (control). Here, existing cryptographic techniques for security fall short: they secure data when stored or communicated but must decrypt it for computation. Fortunately, a new paradigm of computing exists, which we refer to as privacy-preserving computation (PPC). Emerging PPC technologies can be leveraged for secure outsourced computation or to enable two parties to compute without revealing either user's secret data. Despite their phenomenal potential to revolutionize user protection in the digital age, their realization has been limited due to exorbitant computational, communication, and storage overheads. This paper reviews recent efforts to address various PPC overheads, using private inference (PI) in neural networks as a motivating application. First, the problem and the various technologies, including homomorphic encryption (HE), secret sharing (SS), garbled circuits (GCs), and oblivious transfer (OT), are introduced. Next, a characterization of their overheads when used to implement PI is covered. The characterization motivates the need for both GC and HE accelerators. Then two solutions are presented: HAAC for accelerating GCs and RPU for accelerating HE. To conclude, results and effects are shown, with a discussion of the future work needed to overcome the remaining overheads of PI.
    Comment: Appears in the 20th ACM International Conference on Computing Frontiers.
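Of the PPC building blocks listed above, secret sharing (SS) is the simplest to sketch. The following toy illustrates additive secret sharing over a ring, purely for intuition (it is not HAAC, RPU, or any system from the paper): each party holds one random-looking share, yet addition of shared values needs no communication and no decryption.

```python
import secrets

MOD = 2 ** 32  # ring size; shares live in Z_MOD

def share(x):
    # Split x into two shares that are individually uniformly random
    # but sum to x modulo MOD.
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD

def reconstruct(s0, s1):
    # Only a party holding both shares can recover the secret.
    return (s0 + s1) % MOD

a0, a1 = share(25)
b0, b1 = share(17)
# Addition of shared values is purely local: each party adds its own shares.
c0, c1 = (a0 + b0) % MOD, (a1 + b1) % MOD
```

Multiplication, by contrast, requires interaction (e.g., via preprocessed correlated randomness), which is one source of the communication overheads the paper characterizes.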