Automatic Sharing Classification and Timely Push for Cache-coherent Systems
This paper proposes and evaluates Sharing/Timing Adaptive Push (STAP), a dynamic scheme for preemptively sending data from producers to consumers to minimize critical-path communication latency. STAP uses small hardware buffers to dynamically detect sharing patterns and timing requirements. The scheme applies to both intra-node and inter-socket directory-based shared memory networks. We integrate STAP into a MOESI cache-coherence protocol using heuristics to detect different data-sharing patterns, including broadcast, producer/consumer, and migratory-data sharing. Using 12 benchmarks from the PARSEC and SPLASH-2 suites in three different configurations, we show that our scheme significantly reduces communication latency in NUMA systems and achieves an average of 10% performance improvement (up to 46%), with at most 2% on-chip storage overhead. When combined with existing prefetch schemes, STAP either outperforms prefetching or combines with prefetching for improved performance (up to 15% extra) in most cases.
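The sharing-pattern heuristics can be sketched in miniature. The function below is a hypothetical, software-level illustration of how a per-cache-line access history might be classified into the patterns the abstract names; the rules and thresholds are invented for illustration and are not taken from STAP's actual hardware design.

```python
def classify_line(history):
    """Classify a cache line's sharing pattern from its access history.

    history: list of (node_id, op) tuples, op in {'R', 'W'}.
    Thresholds are illustrative, not from the paper.
    """
    writers = {n for n, op in history if op == 'W'}
    readers = {n for n, op in history if op == 'R'}
    if len(writers) > 1 and writers == readers:
        return 'migratory'          # ownership moves from node to node
    if len(writers) == 1 and len(readers) >= 4:
        return 'broadcast'          # one producer, many consumers
    if len(writers) == 1 and 1 <= len(readers) < 4:
        return 'producer/consumer'  # one stable producer, few consumers
    return 'unclassified'
```

A real implementation would track this state in small hardware tables and use the classification to decide which consumers to push updated data to.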
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Vision Transformers (ViTs) have achieved state-of-the-art performance on
various vision tasks. However, ViTs' self-attention module is still arguably a
major bottleneck, limiting their achievable hardware efficiency. Meanwhile,
existing accelerators dedicated to NLP Transformers are not optimal for ViTs.
This is because there is a large difference between ViTs and NLP Transformers:
ViTs have a relatively fixed number of input tokens, whose attention maps can
be pruned by up to 90% even with fixed sparse patterns; while NLP Transformers
need to handle input sequences of varying numbers of tokens and rely on
on-the-fly predictions of dynamic sparse attention patterns for each input to
achieve a decent sparsity (e.g., >=50%). To this end, we propose a dedicated
algorithm and accelerator co-design framework dubbed ViTCoD for accelerating
ViTs. Specifically, on the algorithm level, ViTCoD prunes and polarizes the
attention maps to have either denser or sparser fixed patterns for regularizing
two levels of workloads without hurting the accuracy, largely reducing the
attention computations while leaving room for alleviating the remaining
dominant data movements; on top of that, we further integrate a lightweight and
learnable auto-encoder module to enable trading the dominant high-cost data
movements for lower-cost computations. On the hardware level, we develop a
dedicated accelerator to simultaneously coordinate the enforced denser/sparser
workloads and encoder/decoder engines for boosted hardware utilization.
Extensive experiments and ablation studies validate that ViTCoD largely reduces
the dominant data movement costs, achieving speedups of up to 235.3x, 142.9x,
86.0x, 10.1x, and 6.8x over general computing platforms CPUs, EdgeGPUs, GPUs,
and prior-art Transformer accelerators SpAtten and Sanger under an attention
sparsity of 90%, respectively.
Comment: Accepted to HPCA 202
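At its simplest, the prune-and-polarize step sorts attention-map rows into a denser and a sparser workload so each can be handled by a dedicated engine. The sketch below is an illustrative software analogue under an invented density threshold; ViTCoD's actual pruning is learned and accuracy-aware.

```python
def polarize(mask, dense_threshold=0.5):
    """Split fixed attention-mask rows into denser/sparser workloads.

    mask: list of rows, each a list of 0/1 keep-flags for one query token.
    Returns (dense_rows, sparse_rows) as lists of row indices.
    The threshold is illustrative, not from the paper.
    """
    dense, sparse = [], []
    for i, row in enumerate(mask):
        density = sum(row) / len(row)
        (dense if density >= dense_threshold else sparse).append(i)
    return dense, sparse
```

Regularizing the workload this way is what lets the accelerator keep both its denser and sparser engines busy instead of stalling on irregular sparsity.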
Automatically Securing Permission-Based Software by Reducing the Attack Surface: An Application to Android
A common security architecture, called the permission-based security model
(used e.g. in Android and Blackberry), entails intrinsic risks. For instance,
applications can be granted more permissions than they actually need, which we
call a "permission gap". Malware can leverage these unused permissions to
achieve its malicious goals, for instance through code injection. In this
paper, we present an approach to detecting permission gaps using static
analysis. Our prototype implementation in the context of Android shows that the
static analysis must take into account a significant amount of
platform-specific knowledge. Using our tool on two datasets of Android
applications, we found that a non-negligible fraction of them suffer from
permission gaps, i.e., they do not use all the permissions they declare.
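The core of the analysis reduces to a set difference between declared permissions and the permissions actually reachable from the app's API calls. The API-to-permission map below is a toy stand-in for the platform-specific knowledge the abstract mentions; the real Android mapping is far larger and was itself a major part of the work.

```python
# Toy API-to-permission map; the real Android mapping covers thousands of
# API entry points and is the "platform-specific knowledge" the paper needs.
API_PERMISSIONS = {
    'LocationManager.getLastKnownLocation': {'ACCESS_FINE_LOCATION'},
    'Camera.open': {'CAMERA'},
    'SmsManager.sendTextMessage': {'SEND_SMS'},
}

def permission_gap(declared, called_apis):
    """Return permissions that are declared but never required by any call."""
    used = set().union(*(API_PERMISSIONS.get(a, set()) for a in called_apis))
    return declared - used
```

In a real static analysis, `called_apis` would come from a reachability analysis over the app's bytecode rather than a literal list.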
Electronic Medical Record (EMR) Informatics Security
Medical records, once archived on paper and stored in filing cabinets, are now housed in electronic data repositories. While converting medical information to electronic form yields enormous benefits, it also raises new privacy and security concerns. Because medical information is now accessed, stored, processed, and transmitted across multiple organizations, medical informatics security has become essential. In this paper, we examine the evolution of electronic medical records, review relevant legislation, and discuss issues of privacy and security as they relate to medical information.
QoS-aware mechanisms for improving cost-efficiency of datacenters
Warehouse Scale Computers (WSCs) promise high cost-efficiency by amortizing power, cooling, and management overheads. WSCs today host a large variety of jobs falling into two broad categories of performance requirements: latency-critical (LC) and best-effort (BE). Ideally, to fully utilize all hardware resources, WSC operators could simply fill all the nodes with computing jobs. Unfortunately, because colocated jobs contend for shared resources, systems with high loads often experience performance degradation, which negatively impacts the Quality of Service (QoS) for LC jobs. In fact, service providers usually over-provision resources to avoid any interference with LC jobs, leading to significant resource inefficiencies. In this dissertation, I explore opportunities across different system-abstraction layers to improve the cost-efficiency of datacenters by increasing resource utilization of WSCs with little or no impact on the performance of LC jobs. The dissertation has three main components. First, I explore opportunities to improve the throughput of multicore systems by reducing the performance variation of LC jobs. The main insight is that by reshaping the latency distribution curve, the performance headroom of LC jobs can be effectively converted into improved BE throughput. I develop, implement, and evaluate a runtime system that achieves this goal with existing hardware, leveraging the cache-partitioning, per-core frequency-scaling, and thread-masking features of server processors. Evaluation results show that the proposed solution enables 30% higher system throughput compared to solutions proposed in prior work while maintaining at least as good QoS for LC jobs. Second, I study resource contention in near-future heterogeneous memory architectures (HMA). This study is motivated by recent developments in non-volatile memory (NVM) technologies, which enable higher storage density at the cost of lower performance.
To understand the performance and QoS impact of HMAs, I design and implement a performance emulator in the Linux kernel that runs unmodified workloads with high accuracy, low overhead, and complete transparency. I further propose and evaluate multiple data- and resource-management QoS mechanisms, such as locality-aware page admission, occupancy management, and write buffer jailing. Third, I focus on accelerated machine learning (ML) systems. By profiling the performance of production workloads and accelerators, I show that accelerated ML tasks are highly sensitive to main memory interference due to fine-grained interaction between CPU and accelerator tasks. As a result, memory resource contention can significantly decrease the performance and efficiency gains of accelerators. I propose a runtime system that leverages existing hardware capabilities and show 17% higher system efficiency compared to previous approaches. This study further exposes opportunities for future processor architectures.
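The first component's idea of converting LC latency headroom into BE throughput can be sketched as a small feedback loop: grow the best-effort resource share while the latency-critical tail latency stays under its QoS target, and back off on violations. The controller and its thresholds below are illustrative, not the dissertation's actual runtime system.

```python
def adjust_be_share(be_share, lc_p99, qos_target, step=0.05):
    """One control step: resize the BE share from measured LC tail latency.

    be_share: current fraction of resources given to best-effort jobs.
    lc_p99:   measured p99 latency of the latency-critical job.
    All thresholds are illustrative.
    """
    if lc_p99 > qos_target:
        return max(0.0, be_share - 2 * step)  # QoS violation: back off hard
    if lc_p99 < 0.8 * qos_target:
        return min(1.0, be_share + step)      # ample headroom: grow BE share
    return be_share                           # near the target: hold steady
```

In the dissertation's setting, the "share" being resized would be concrete knobs such as cache ways, core frequencies, and thread placement rather than a single scalar.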
Designing the Extended Zero Trust Maturity Model: A Holistic Approach to Assessing and Improving an Organization’s Maturity Within the Technology, Processes and People Domains of Information Security
Zero Trust is an approach to security where implicit trust is removed, forcing applications, workloads, servers and users to verify themselves every time a request is made. Furthermore, Zero Trust means assuming anything can be compromised, and designing networks, identities and systems with this in mind and following the principle of least privilege. This approach to information security has been coined as the solution to the weaknesses of traditional perimeter-based information security models, and adoption is starting to increase. However, the principles of Zero Trust are only applied within the technical domain to aspects such as networks, data and identities in past research. This indicates a knowledge gap, as the principles of Zero Trust could be applied to organizational domains such as people and processes to further strengthen information security, resulting in a holistic approach. To fill this gap, we employed design science research to develop a holistic maturity model for Zero Trust maturity based on these principles: The EZTMM. We performed two systematic literature reviews on Zero Trust and Maturity Model theory respectively and collaborated closely with experts and practitioners on the operational, tactical and strategic levels of six different organizations. The resulting maturity model was anchored in prior Zero Trust and maturity model literature, as well as practitioner and expert experiences and knowledge. The EZTMM was evaluated by our respondent organizations through two rounds of interviews before being used by one respondent organization to perform a maturity assessment of their own organization as a part of our case study evaluation. Each interview round resulted in ample feedback and learning, while the case study allowed us to evaluate and improve on the model in a real-world setting. 
Our contribution is twofold: A fully functional, holistic Zero Trust maturity model with an accompanying maturity assessment spreadsheet (the artifact), and our reflections and suggestions regarding further development of the EZTMM and research on the holistic application of Zero Trust principles for improved information security
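The "never trust, always verify" and least-privilege principles described above can be illustrated with a minimal per-request authorization check; the request attributes and policy shape below are invented for illustration and do not come from the EZTMM.

```python
def authorize(request, policy):
    """Re-verify identity, device posture, and least privilege on every call.

    request: dict with 'identity', 'identity_verified', 'device_compliant',
             and 'action' keys (illustrative attribute names).
    policy:  maps each identity to the minimal set of allowed actions.
    """
    if not request.get('identity_verified'):
        return False                      # never trust: verify the identity
    if not request.get('device_compliant'):
        return False                      # assume compromise: check posture
    allowed = policy.get(request.get('identity'), set())
    return request.get('action') in allowed  # least privilege
```

The thesis's point is that the same discipline of per-request verification and minimal grants can be applied to processes and people, not only to technical requests like this one.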
Measuring and Controlling Multicore Contention in a RISC-V System-on-Chip
Multicore processors started a revolution in modern computing when they were introduced into the commercial and consumer computing space. They delivered significant gains in power, efficiency, and performance at a time when increases in processor frequency and IPC seemed to be plateauing. In safety-critical systems, however, the introduction of multicore processors has exposed difficulties in the certification process. The main obstacle to characterizing real-time multicore systems is the use of shared resources, in particular shared buses.
In this work, we provide tools to ease the characterization of shared-bus timing interference in critical and mixed-criticality systems. Specifically, we combine shared-bus arbitration policies with rate-limiting policies based on the interference inflicted on the critical core, bounding the WCET of the critical task on a multicore system while making a best effort to let the secondary cores progress as much as possible. (Abstract also provided in Spanish and Catalan.)
Andreu Cerezo, P. (2021). Measuring and Controlling Multicore Contention in a RISC-V System-on-Chip. Universitat Politècnica de València. http://hdl.handle.net/10251/173563
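The interference-based rate limiting described here can be sketched as a windowed budget on secondary-core bus accesses: once the interference inflicted on the critical core within the current window exceeds its budget, secondary requests are stalled. The class, budget, and window values below are illustrative, not the thesis's actual mechanism.

```python
class RateLimiter:
    """Windowed budget on secondary-core bus accesses (illustrative)."""

    def __init__(self, budget, window):
        self.budget = budget        # max interfering accesses per window
        self.window = window        # window length in cycles
        self.count = 0
        self.window_start = 0

    def allow(self, cycle):
        """Decide whether a secondary-core access may proceed at `cycle`."""
        if cycle - self.window_start >= self.window:
            self.window_start = cycle
            self.count = 0          # new window: refill the budget
        if self.count < self.budget:
            self.count += 1
            return True             # secondary access proceeds
        return False                # throttle to bound critical-core WCET
```

Choosing the budget per window is exactly the characterization problem the thesis targets: a tight budget gives a tight critical-task WCET bound at the cost of secondary-core progress.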