23 research outputs found
An Optimal Level-synchronous Shared-memory Parallel BFS Algorithm with Optimal Parallel Prefix-sum Algorithm and its Implications for Energy Consumption
We present a work-efficient parallel level-synchronous Breadth First Search
(BFS) algorithm for shared-memory architectures which achieves the theoretical
lower bound on parallel running time. The optimality holds regardless of the
shape of the graph. We also demonstrate empirically the implications of this
optimality for the program's energy consumption. The key idea is to never use
more processing cores than necessary to complete the work of any computation
step efficiently. We keep the remaining cores idle to save energy and to reduce
contention for other shared resources (e.g., bandwidth, shared caches). Our BFS
uses neither locks nor atomic instructions and is easily extensible to
shared-memory coprocessors.

Comment: 2 pages, brief announcement
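The core-capping idea above can be sketched as follows. This is a minimal, sequential illustration of the paper's level-synchronous scheme, not the authors' implementation: the function name, the `grain` work-unit parameter, and the per-level core count formula are all assumptions made for illustration.

```python
import math

def level_sync_bfs(adj, source, max_cores=8, grain=1):
    """Level-synchronous BFS that, at each level, caps the number of
    (simulated) cores at the work the frontier actually offers,
    mirroring the idea of idling surplus cores to save energy.
    `grain` is a hypothetical amount of frontier work per core."""
    dist = {source: 0}
    frontier = [source]
    cores_used = []          # record how many cores each level would use
    while frontier:
        # Use only as many cores as the frontier's size justifies;
        # the remaining max_cores - cores would stay idle.
        cores = min(max_cores, max(1, math.ceil(len(frontier) / grain)))
        cores_used.append(cores)
        next_frontier = []
        for u in frontier:   # each active core would take a chunk of this loop
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return dist, cores_used

# Tiny diamond graph: 0 -> {1, 2} -> 3
dist, cores = level_sync_bfs({0: [1, 2], 1: [3], 2: [3], 3: []}, 0)
```

In a real shared-memory implementation the chunking would be done with a parallel prefix sum over per-core output sizes, which is what lets the algorithm avoid locks and atomics when building the next frontier.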
Large Language Models Based Automatic Synthesis of Software Specifications
Software configurations play a crucial role in determining the behavior of
software systems. In order to ensure safe and error-free operation, it is
necessary to identify the correct configurations, along with their valid bounds
and rules, which are commonly referred to as software specifications. As
software systems grow in complexity and scale, the number of configurations and
associated specifications required to ensure the correct operation can become
large and prohibitively difficult to manage manually. Because of the fast pace
of software development, correct software specifications are often not
thoroughly checked or validated within the software
itself. Rather, they are frequently discussed and documented in a variety of
external sources, including software manuals, code comments, and online
discussion forums. As a result, it is hard for system administrators to know
the correct specifications of configurations, given the lack of clarity,
organization, and a single unified source to consult. To address this
challenge, we propose SpecSyn, a framework that leverages a state-of-the-art
large language model to automatically synthesize software specifications from
natural language sources. Our approach formulates software specification
synthesis as a sequence-to-sequence learning problem and investigates the
extraction of specifications from large contextual texts. This is the first
work that uses a large language model for end-to-end specification synthesis
from natural language texts. Empirical results demonstrate that our system
outperforms the prior state-of-the-art specification synthesis tool by 21% in
terms of F1 score and can find specifications from single as well as multiple
sentences.
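The sequence-to-sequence formulation described above can be illustrated with a toy training-pair builder. The serialization format, prompt prefix, and function name below are hypothetical, chosen only to show how a spec (a parameter with valid bounds) extracted from contextual text maps to a source/target string pair; the abstract does not give SpecSyn's actual format.

```python
def make_seq2seq_example(context, parameter, lower, upper):
    """Frame specification synthesis as sequence-to-sequence learning:
    the source is the natural-language context (e.g., a manual sentence),
    the target is a serialized specification. Both formats here are
    illustrative assumptions, not SpecSyn's own."""
    source = f"extract specification: {context}"
    target = f"{parameter} in [{lower}, {upper}]"
    return source, target

src, tgt = make_seq2seq_example(
    "The manual notes that max_connections must stay between 1 and 1024.",
    "max_connections", 1, 1024)
```

A fine-tuned large language model would then be trained on many such pairs and, at inference time, emit the target-style specification for unseen contextual text, including specs spread across multiple sentences.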
Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation
High-dimensional sparse data emerge in many critical application domains such
as cybersecurity, healthcare, anomaly detection, and trend analysis. To quickly
extract meaningful insights from massive volumes of these multi-dimensional
data, scientists employ unsupervised analysis tools based on tensor
decomposition (TD) methods. However, real-world sparse tensors exhibit highly
irregular shapes, data distributions, and sparsity, which pose significant
challenges for making efficient use of modern parallel architectures. This
study breaks the prevailing assumption that compressing sparse tensors into
coarse-grained structures (i.e., tensor slices or blocks) or along a particular
dimension/mode (i.e., mode-specific) is more efficient than keeping them in a
fine-grained, mode-agnostic form. Our novel sparse tensor representation,
Adaptive Linearized Tensor Order (ALTO), encodes tensors in a compact format
that can be easily streamed from memory and is amenable to both caching and
parallel execution. To demonstrate the efficacy of ALTO, we accelerate popular
TD methods that compute the Canonical Polyadic Decomposition (CPD) model across
a range of real-world sparse tensors. Additionally, we characterize the major
execution bottlenecks of TD methods on multiple generations of the latest Intel
Xeon Scalable processors, including Sapphire Rapids CPUs, and introduce dynamic
adaptation heuristics to automatically select the best algorithm based on the
sparse tensor characteristics. Across a diverse set of real-world data sets,
ALTO outperforms the state-of-the-art approaches, achieving more than an
order-of-magnitude speedup over the best mode-agnostic formats. Compared to the
best mode-specific formats, which require multiple tensor copies, ALTO achieves
more than 5.1x geometric mean speedup at a fraction (25%) of their storage.

Comment: We extend the results of our previous ICS paper to significantly
improve the parallel performance of the Canonical Polyadic Alternating Least
Squares (CP-ALS) algorithm for normally distributed data and the Canonical
Polyadic Alternating Poisson Regression (CP-APR) algorithm for non-negative
count data.
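The mode-agnostic linearization at the heart of ALTO can be illustrated with a fixed bit-interleaving (Morton-style) index, which packs each nonzero's multi-dimensional coordinate into a single compact integer. Note this fixed interleaving is a simplification for illustration: ALTO itself adapts the bit layout to the tensor's irregular shape, and the function below is not the paper's encoding.

```python
def interleave_index(coords, bits=4):
    """Pack a multi-dimensional coordinate into one linear index by
    interleaving the bits of each mode. Sorting nonzeros by this index
    yields a compact, mode-agnostic ordering that streams well from
    memory; ALTO adapts the bit allocation per mode, which this
    fixed Morton-style scheme does not."""
    linear = 0
    ndim = len(coords)
    for b in range(bits):                 # for each bit position...
        for d, c in enumerate(coords):    # ...take that bit from every mode
            linear |= ((c >> b) & 1) << (b * ndim + d)
    return linear

# Nonzeros of a small 2-D sparse tensor, stored in linearized order.
nonzeros = sorted([(3, 0), (0, 1), (1, 1)], key=interleave_index)
```

Because every mode's bits appear in the key, the same linearized stream serves the tensor-times-matrix computations along all modes of a CPD solve, avoiding the per-mode tensor copies that mode-specific formats require.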