From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field.
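
As a rough illustration of the first of these classes, the sketch below builds a
term-document frequency matrix over a toy corpus and scores document similarity
as the cosine between columns; the corpus, helper names, and raw-count weighting
(rather than, say, tf-idf) are illustrative assumptions, not details taken from
the survey.

```python
import numpy as np

# Toy corpus; rows of the matrix are terms, columns are documents.
docs = [
    "computers process text",
    "vector space models of semantics",
    "computers analyse text with vector models",
]

# Build the vocabulary and a term -> row index map.
vocab = sorted({word for doc in docs for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}

# Fill the term-document matrix with raw frequency counts.
matrix = np.zeros((len(vocab), len(docs)), dtype=int)
for j, doc in enumerate(docs):
    for word in doc.split():
        matrix[index[word], j] += 1

def cosine(a, b):
    """Cosine similarity between two column vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Documents 0 and 2 share "computers" and "text", so they score highest.
print(cosine(matrix[:, 0], matrix[:, 2]))
```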
FlashX: Massive Data Analysis Using Fast I/O
With the explosion of data and the increasing complexity of analysis tasks,
large-scale data analysis poses significant challenges for systems design. While current
research focuses on scaling out to large clusters, these scale-out solutions introduce
a significant amount of overhead. This thesis is motivated by the advance of new
I/O technologies such as flash memory. Instead of scaling out, we explore efficient
system designs in a single commodity machine with non-uniform memory architecture
(NUMA) and scale to large datasets by utilizing commodity solid-state drives
(SSDs). This thesis explores the impact of the new I/O technologies on large-scale
data analysis. Instead of implementing individual data analysis algorithms for SSDs,
we develop a data analysis ecosystem called FlashX to target a large range of data
analysis tasks. FlashX includes three subsystems: SAFS, FlashGraph and FlashMatrix.
SAFS is a user-space filesystem optimized for a large SSD array to deliver
maximal I/O throughput from SSDs. FlashGraph is a general-purpose graph analysis
framework that processes graphs in a semi-external memory fashion, i.e., keeping
vertex state in memory and edges on SSDs, and scales to graphs with billions of
vertices by utilizing SSDs through SAFS. FlashMatrix is a matrix-oriented programming
framework that supports both sparse matrices and dense matrices for general
data analysis. Similar to FlashGraph, it scales matrix operations beyond memory
capacity by utilizing SSDs. We demonstrate that, with current I/O technologies,
FlashGraph and FlashMatrix running in (semi-)external memory meet or even exceed
state-of-the-art in-memory data analysis frameworks while scaling to massive
datasets across a large variety of data analysis tasks.
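
A minimal sketch of the semi-external-memory pattern the abstract describes for
FlashGraph: per-vertex state (here, the visited set and frontier) is held in
memory, while adjacency lists are fetched from storage on demand. The
JSON-on-disk layout, offsets table, and function names are assumptions made for
illustration, not FlashGraph's actual on-SSD format or API.

```python
import json

def write_edges(path, adjacency):
    """Serialize adjacency lists; return a {vertex: (offset, length)} table."""
    offsets = {}
    with open(path, "wb") as f:
        for v, neighbors in adjacency.items():
            data = json.dumps(neighbors).encode()
            offsets[v] = (f.tell(), len(data))
            f.write(data)
    return offsets

def read_neighbors(edge_file, offsets, v):
    """Fetch one vertex's adjacency list from storage with a single seek."""
    start, length = offsets[v]
    edge_file.seek(start)
    return json.loads(edge_file.read(length))

def semi_external_bfs(edge_path, offsets, source):
    """BFS keeping vertex state in memory while edge lists stay on disk."""
    visited = {source}        # per-vertex state: small enough for memory
    frontier = [source]
    with open(edge_path, "rb") as edge_file:
        while frontier:
            next_frontier = []
            for v in frontier:
                # Edge lists are read from the "SSD" only when needed.
                for u in read_neighbors(edge_file, offsets, v):
                    if u not in visited:
                        visited.add(u)
                        next_frontier.append(u)
            frontier = next_frontier
    return visited

offsets = write_edges("graph.bin", {0: [1, 2], 1: [2], 2: [0, 3], 3: []})
print(semi_external_bfs("graph.bin", offsets, 0))  # {0, 1, 2, 3}
```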
IndexMAC: A Custom RISC-V Vector Instruction to Accelerate Structured-Sparse Matrix Multiplications
Structured sparsity has been proposed as an efficient way to prune the
complexity of modern Machine Learning (ML) applications and to simplify the
handling of sparse data in hardware. The acceleration of ML models - for both
training and inference - relies primarily on equivalent matrix multiplications
that can be executed efficiently on vector processors or custom matrix engines.
The goal of this work is to incorporate the simplicity of structured sparsity
into vector execution, thereby accelerating the corresponding matrix
multiplications. Toward this objective, a new vector index-multiply-accumulate
instruction is proposed, which enables the implementation of low-cost indirect
reads from the vector register file. This reduces unnecessary memory traffic
and increases data locality. The proposed new instruction was integrated into a
decoupled RISC-V vector processor with negligible hardware cost. Extensive
evaluation demonstrates significant speedups of 1.80x-2.14x, as compared to
state-of-the-art vectorized kernels, when executing layers of varying sparsity
from state-of-the-art Convolutional Neural Networks (CNNs).
Comment: DATE 202
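
A minimal sketch of the arithmetic such an index-multiply-accumulate
instruction would perform: the nonzeros of a structured-sparse row are stored
densely alongside their column indices, and the indices gather operands from an
in-register activation vector rather than from memory. The function name, the
2:4 sparsity pattern, and the operand shapes are illustrative assumptions, not
the paper's ISA definition.

```python
import numpy as np

def index_mac(acc, weights, indices, activations):
    """acc += sum_k weights[k] * activations[indices[k]].

    `indices` selects (gathers) operands from the activation vector,
    mimicking an indirect read from the vector register file.
    """
    return acc + weights @ activations[indices]

activations = np.array([1.0, 2.0, 3.0, 4.0])  # dense in-register activations
weights = np.array([0.5, -1.0])               # nonzeros of a 2:4 sparse row
indices = np.array([0, 3])                    # their column positions

print(index_mac(0.0, weights, indices, activations))  # 0.5*1.0 + (-1.0)*4.0 = -3.5
```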
- …