
15 research outputs found

Hardware Acceleration for Similarity Measurement in Natural Language Processing

Authors: Jichuan Chang, Parthasarathy Ranganathan, Prateek Tandon, Ronald G Dreslinski, Thomas F Wenisch, Vahed Qazvinian
Publication date: 10/04/2020

Abstract: The continuation of Moore's law scaling in the absence of Dennard scaling motivates an emphasis on energy-efficient, accelerator-based designs for future applications. In natural language processing, the conventional approach to automatically analyzing vast text collections, scale-out processing, incurs high energy and hardware costs, because the central compute-intensive step of similarity measurement often entails pairwise, all-to-all comparisons. We propose a custom hardware accelerator for similarity measures that leverages data streaming, memory latency hiding, and parallel computation across variable-length threads. We evaluate our design through a combination of architectural simulation and RTL synthesis. When executing the dominant kernel in a semantic indexing application for documents, we demonstrate throughput gains of up to 42× and 58× lower energy per similarity computation compared to an optimized software implementation, while requiring less than 1.3% of the area of a conventional core.
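To make the cost of all-to-all comparison concrete, here is a minimal software sketch. It assumes each document is a sparse term-weight vector and uses cosine similarity purely as a stand-in measure; the abstract does not say which similarity measure the accelerator implements, and the function names are hypothetical. With n documents the loop performs n(n-1)/2 comparisons, which is the quadratic kernel the accelerator targets.

```python
import math
from itertools import combinations

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors (illustrative measure)."""
    if len(u) > len(v):          # iterate over the shorter vector for the dot product
        u, v = v, u
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:   # an empty document is similar to nothing
        return 0.0
    return dot / (nu * nv)

def all_pairs_similarity(docs: list[dict[str, float]]) -> dict[tuple[int, int], float]:
    """Compare every document with every other one: n*(n-1)/2 calls to cosine()."""
    return {(i, j): cosine(docs[i], docs[j])
            for i, j in combinations(range(len(docs)), 2)}
```

For example, three documents yield three pairs, while 10,000 documents already yield about 50 million; each pair is independent, which is why the work parallelizes well but scales quadratically.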

CiteSeerX

Decoding billions of integers per second through vectorization

Authors: Aksyonoff A, Büttcher S, Jones DM, Witten IH
Publication venue: Wiley
Publication date: 01/01/2015

In many important applications, such as search engines and relational database systems, data is stored in the form of arrays of integers. Encoding and, most importantly, decoding these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce the costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions. Nevertheless, we introduce a novel vectorized scheme called SIMD-BP128 that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2 bits per integer. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while being twice as fast during decoding.

Comment: For software, see https://github.com/lemire/FastPFor. For data, see http://boytsov.info/datasets/clueweb09gap
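The core idea behind binary packing, on which SIMD-BP128 builds, can be sketched in scalar Python: sorted integers are first turned into gaps (as in the clueweb09gap data above), and a block of them is stored using one shared bit width b, just large enough for the block's largest gap. This is a simplified illustration under stated assumptions: non-negative gaps, a plain little-endian bit layout chosen here for readability, and hypothetical function names. It is not the vectorized, lane-interleaved layout of SIMD-BP128, nor the API of the FastPFor library.

```python
def deltas(values):
    """Gap-encode a sorted list: small gaps need fewer bits than raw values."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out

def prefix_sum(gaps):
    """Invert deltas() by accumulating the gaps."""
    total, out = 0, []
    for g in gaps:
        total += g
        out.append(total)
    return out

def pack(block):
    """Pack non-negative ints with one shared bit width b into 32-bit words."""
    b = max(block).bit_length() if block else 0
    words, acc, nbits = [], 0, 0    # acc holds nbits pending, not-yet-emitted bits
    for x in block:
        acc |= x << nbits
        nbits += b
        while nbits >= 32:          # flush each completed 32-bit word
            words.append(acc & 0xFFFFFFFF)
            acc >>= 32
            nbits -= 32
    if nbits:                       # flush the final partial word
        words.append(acc)
    return b, words

def unpack(b, words, n):
    """Recover n integers of width b from the packed 32-bit words."""
    acc = 0
    for i, w in enumerate(words):
        acc |= w << (32 * i)        # reassemble the bit stream, lowest word first
    mask = (1 << b) - 1
    return [(acc >> (i * b)) & mask for i in range(n)]
```

For example, 128 sorted values spaced 3 apart produce gaps no larger than 3, so b = 2 and the whole block fits in eight 32-bit words instead of 128. Because every value in a block has the same width, decoding is a fixed sequence of shifts and masks with no branches per integer, which is what makes the approach amenable to SIMD.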

arXiv.org e-Print Archive

R-libre

Crossref

Efficient and Scalable Listing of Four-Vertex Subgraph

Author: Xia Xiangzhou
Publication date: 18/01/2019

Identifying four-vertex subgraphs has long been recognized as a fundamental technique in bioinformatics and social networks. However, listing these structures is challenging, especially for graphs that do not fit in RAM. To address this problem, we build a set of algorithms, models, and implementations that can handle massive graphs on commodity hardware. Our technique achieves a speedup of 4–5 orders of magnitude over the best prior methods on graphs with billions of edges, and its external-memory operation is equally efficient.
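For intuition about what "listing" a four-vertex subgraph involves, the sketch below enumerates one such pattern, the 4-clique, in a graph that fits in memory. It uses a common orientation trick chosen here for illustration: each edge is directed toward the endpoint of higher (degree, id) rank, so every clique is reported exactly once, in rank order. This is not the authors' external-memory algorithm, it covers only one of the possible four-vertex patterns, and the function names are hypothetical; vertex ids are assumed to be mutually comparable (e.g. integers).

```python
from collections import defaultdict
from itertools import combinations

def orient(edges):
    """Direct each edge from lower to higher (degree, id) rank."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1

    def rank(x):
        return (deg[x], x)          # total order: ties on degree broken by id

    out = defaultdict(set)
    for u, v in edges:
        if u == v:                  # ignore self-loops
            continue
        lo, hi = (u, v) if rank(u) < rank(v) else (v, u)
        out[lo].add(hi)             # sets also absorb duplicate edges
    return out

def list_4_cliques(edges):
    """Yield each 4-clique once as (a, b, c, d) in increasing rank order."""
    out = orient(edges)
    for a, na in out.items():
        for b in na:
            nab = na & out.get(b, set())          # common higher-ranked neighbors of a, b
            for c in nab:
                for d in nab & out.get(c, set()):  # d must follow a, b, and c
                    yield (a, b, c, d)
```

On a complete graph with five vertices this yields C(5, 4) = 5 cliques, and on a plain 4-cycle it yields none. Orienting by degree keeps each vertex's out-neighbor set small on skewed real-world graphs, which bounds the cost of the set intersections.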

Texas A&M Repository
