
    Optimal and Error-Free Multi-Valued Byzantine Consensus Through Parallel Execution

    Multi-valued Byzantine Consensus (BC), in which n processes must reach agreement on a single L-bit value, is an essential primitive in the design of distributed cryptographic protocols and fault-tolerant distributed systems. One of the most desirable traits for a multi-valued BC protocol is to be error-free, i.e., to have zero probability of producing incorrect results. The most efficient error-free multi-valued BC protocols are built as extension protocols, which reduce agreement on large values to agreement on small sequences of bits whose lengths are independent of L. The best extension protocols achieve O(Ln) communication complexity, which is optimal when L is large relative to n. Unfortunately, all known error-free and communication-optimal BC extension protocols require each process to broadcast at least n bits with a binary Byzantine Broadcast (BB) protocol. This design limits the scalability of these protocols to many processes: when n is large, the binary broadcasts significantly inflate the overall number of bits communicated by the extension protocol. In this paper, we present Byzantine Consensus with Parallel Execution (BCPE), the first error-free and communication-optimal BC extension protocol in which each process only broadcasts a single bit with a binary BB protocol. BCPE is a synchronous and deterministic protocol, and tolerates f < n/3 faulty processes (the best resilience possible). Our evaluation shows that BCPE's design makes it significantly more scalable than the best existing protocol by Ganesh and Patra. For 1,000 processes to agree on 2 MB of data, BCPE communicates 10.92× fewer bits; for agreement on 10 MB of data, it communicates 6.97× fewer bits. BCPE also matches the best existing protocol in all other standard efficiency metrics.
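
    To make the scalability argument concrete, the following Python sketch compares total communication under a deliberately simplified cost model: the optimal O(Ln) dissemination term plus the overhead of the binary BB instances, assuming a quadratic-cost binary BB. The cost formula, the assumed BB cost, and the function names are illustrative assumptions, not the paper's analysis, and the sketch does not reproduce the reported 10.92× figure.

```python
# Simplified, illustrative cost model (not from the paper): compares the
# binary Byzantine Broadcast (BB) overhead of an extension protocol in which
# each of the n processes broadcasts n bits versus a single bit, on top of
# the optimal O(L*n) term for disseminating the L-bit value itself.

def extension_protocol_cost(L, n, bits_broadcast_per_process, bb_cost_per_bit):
    """Total bits communicated under this toy model.

    L                          -- value length in bits
    n                          -- number of processes
    bits_broadcast_per_process -- bits each process sends through binary BB
    bb_cost_per_bit            -- assumed total-bit cost of one binary BB
                                  instance (classic protocols are O(n^2))
    """
    dissemination = L * n                                 # communication-optimal term
    bb_overhead = n * bits_broadcast_per_process * bb_cost_per_bit
    return dissemination + bb_overhead

if __name__ == "__main__":
    n = 1000
    L = 2 * 8 * 10**6              # 2 MB expressed in bits
    bb_cost = n * n                # assume a quadratic-cost binary BB

    prior = extension_protocol_cost(L, n, bits_broadcast_per_process=n,
                                    bb_cost_per_bit=bb_cost)
    bcpe_like = extension_protocol_cost(L, n, bits_broadcast_per_process=1,
                                        bb_cost_per_bit=bb_cost)
    print(f"n bits per process : {prior:.3e} bits total")
    print(f"1 bit per process  : {bcpe_like:.3e} bits total")
```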

    Open Information Extraction: A Review of Baseline Techniques, Approaches, and Applications

    With the abundance of text data available online and offline, there is a crucial need to extract the relations between phrases and to summarize the main content of each document in a few words. For this purpose, there have been many recent studies in Open Information Extraction (OIE). OIE improves upon relation extraction techniques by analyzing relations across different domains and avoiding the need to hand-label pre-specified relations in sentences. This paper surveys recent OIE approaches and their applications to Knowledge Graphs (KG), text summarization, and Question Answering (QA). Moreover, the paper describes the baseline OIE methods used in relation extraction and briefly discusses the main approaches along with the pros and cons of each. Finally, it gives an overview of challenges, open issues, and future work opportunities for OIE, relation extraction, and OIE applications.
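
    As a concrete illustration of the OIE task itself (not of any surveyed system), the toy extractor below turns a sentence into (subject, relation, object) triples using a single hand-written pattern. Real OIE systems rely on syntactic parses or neural models; the pattern and example sentence here are assumptions for demonstration only.

```python
import re

# Toy illustration of the Open Information Extraction task: turn a sentence
# into (subject, relation, object) triples without pre-specified relation
# labels. The single regex pattern below only demonstrates the output format.

PATTERN = re.compile(
    r"^(?P<subj>[A-Z][\w ]+?)\s+"
    r"(?P<rel>was born in|is located in|works at|founded)\s+"
    r"(?P<obj>[\w ]+?)\.?$"
)

def extract_triples(sentence):
    match = PATTERN.match(sentence.strip())
    if not match:
        return []
    return [(match["subj"], match["rel"], match["obj"])]

print(extract_triples("Marie Curie was born in Warsaw."))
# [('Marie Curie', 'was born in', 'Warsaw')]
```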

    Hi-Rise: A high-radix switch for 3D integration with single-cycle arbitration

    This paper proposes a novel 3D switch, called 'Hi-Rise', that employs high-radix switches to efficiently route data across multiple stacked layers of dies. The proposed interconnect is hierarchical, composed of two switches per silicon layer and a set of dedicated layer-to-layer channels. However, a hierarchical 3D switch can lead to unfair arbitration across different layers. To address this, the paper proposes a unique class-based arbitration scheme that is fully integrated into the switching fabric and easy to implement. It makes the 3D hierarchical switch's fairness comparable to that of a flat 2D switch with least-recently-granted arbitration. The 3D switch is evaluated for different radices, numbers of stacked layers, and different 3D integration technologies. A 64-radix, 128-bit-wide, 4-layer Hi-Rise evaluated in a 32nm technology has a throughput of 10.65 Tbps for uniform random traffic. Compared to a 2D design, this corresponds to a 15% improvement in throughput, a 33% area reduction, a 20% latency reduction, and a 38% reduction in energy per transaction.
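
    The fairness baseline mentioned above, least-recently-granted arbitration over request classes (e.g., one class per source layer), can be sketched behaviorally as follows. This is an illustrative Python model of the arbitration idea only, not the Hi-Rise hardware logic; the class names and API are assumptions.

```python
from collections import deque

# Behavioral sketch of a least-recently-granted (LRG) arbiter over request
# classes, e.g. one class per source layer in a stacked 3D switch.

class ClassBasedArbiter:
    def __init__(self, classes):
        # Priority order: front of the deque = least recently granted.
        self.priority = deque(classes)

    def grant(self, requests):
        """Pick one requesting class; requests is a set of class ids."""
        for cls in list(self.priority):
            if cls in requests:
                # Move the winner to the back: it becomes most recently granted.
                self.priority.remove(cls)
                self.priority.append(cls)
                return cls
        return None

arb = ClassBasedArbiter(classes=["layer0", "layer1", "layer2", "layer3"])
print(arb.grant({"layer2", "layer3"}))  # layer2 (earlier in priority order)
print(arb.grant({"layer2", "layer3"}))  # layer3 (layer2 was just granted)
```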

    MTrainS: Improving DLRM training efficiency using heterogeneous memories

    Recommendation models are very large, requiring terabytes (TB) of memory during training. In pursuit of better quality, model size and complexity grow over time, which requires additional training data to avoid overfitting. This growth demands a large number of resources in data centers; hence, training efficiency is becoming considerably more important to keep data center power demand manageable. In Deep Learning Recommendation Models (DLRM), sparse features that capture categorical inputs through embedding tables are the major contributors to model size and require high memory bandwidth. In this paper, we study the bandwidth requirements and locality of embedding tables in real-world deployed models. We observe that the bandwidth requirement is not uniform across different tables and that embedding tables show high temporal locality. We then design MTrainS, which leverages heterogeneous memory, including byte- and block-addressable Storage Class Memory, hierarchically for DLRM. MTrainS allows for higher memory capacity per node and increases training efficiency by lowering the need to scale out to multiple hosts in memory-capacity-bound use cases. By optimizing the platform memory hierarchy, we reduce the number of nodes for training by 4-8×, saving the power and cost of training while meeting our target training performance.
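
    A minimal sketch of the kind of tiered placement such a design implies: rank embedding tables by access intensity and keep the hottest ones in fast memory until a capacity budget is exhausted, spilling the rest to Storage Class Memory. The heuristic, table names, and capacity figures below are illustrative assumptions, not MTrainS's actual placement policy.

```python
# Toy tiered-placement heuristic: put the most bandwidth-hungry embedding
# tables in fast memory (e.g. DRAM) until its capacity budget is spent,
# and spill the remaining, colder tables to Storage Class Memory (SCM).
# Table sizes, access counts, and capacities below are illustrative only.

def place_tables(tables, fast_capacity_gb):
    """tables: list of (name, size_gb, accesses_per_step)."""
    placement, used = {}, 0.0
    # Hottest-first: rank by accesses per GB so small, hot tables win ties.
    for name, size_gb, accesses in sorted(
            tables, key=lambda t: t[2] / t[1], reverse=True):
        if used + size_gb <= fast_capacity_gb:
            placement[name] = "DRAM"
            used += size_gb
        else:
            placement[name] = "SCM"
    return placement

tables = [
    ("user_id",   120.0, 9.0e6),   # large, hot
    ("item_id",    80.0, 7.5e6),
    ("zip_code",    2.0, 4.0e4),   # small, lukewarm
    ("device_id",  30.0, 1.0e5),   # colder
]
print(place_tables(tables, fast_capacity_gb=128.0))
```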

    Hardware Acceleration for Similarity Measurement in Natural Language Processing

    The continuation of Moore's law scaling, but in the absence of Dennard scaling, motivates an emphasis on energy-efficient, accelerator-based designs for future applications. In natural language processing, the conventional approach to automatically analyzing vast text collections, using scale-out processing, incurs high energy and hardware costs, since the central compute-intensive step of similarity measurement often entails pair-wise, all-to-all comparisons. We propose a custom hardware accelerator for similarity measures that leverages data streaming, memory latency hiding, and parallel computation across variable-length threads. We evaluate our design through a combination of architectural simulation and RTL synthesis. When executing the dominant kernel in a semantic indexing application for documents, we demonstrate throughput gains of up to 42× and 58× lower energy per similarity computation compared to an optimized software implementation, while requiring less than 1.3% of the area of a conventional core.
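
    For reference, the pair-wise, all-to-all similarity kernel described above can be expressed in a few lines of NumPy as an all-pairs cosine similarity over document vectors. This software baseline only shows the O(n^2) comparison structure the accelerator targets; the random vectors and function name are assumptions.

```python
import numpy as np

# Reference implementation of a pair-wise, all-to-all similarity kernel
# (here: cosine similarity over document vectors). This is only a software
# baseline illustrating the quadratic comparison structure that motivates
# hardware acceleration; the document vectors are random placeholders.

def all_pairs_cosine(docs):
    """docs: (n_docs, dim) array; returns an (n_docs, n_docs) similarity matrix."""
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    unit = docs / np.clip(norms, 1e-12, None)   # avoid division by zero
    return unit @ unit.T                        # every document vs. every other

rng = np.random.default_rng(0)
docs = rng.random((4, 8))        # 4 toy documents, 8-dimensional features
print(np.round(all_pairs_cosine(docs), 3))
```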