Computational structural mechanics: A new activity at the NASA Langley Research Center
Complex structures considered for the late 1980's and early 1990's include composite primary aircraft structures and the space station. These structures are much more difficult to analyze than today's structures and necessitate a major upgrade in computerized structural analysis technology. A major research activity in computational structural mechanics (CSM) was initiated. The objective of the CSM activity is to develop advanced structural analysis technology that will exploit modern and emerging computers, such as computers with vector and/or parallel processing capabilities. The three main research activities underway in CSM are: (1) structural analysis methods development; (2) a software testbed for evaluating the methods; and (3) numerical techniques for parallel processing computers. The motivation and objectives of the CSM activity are presented and the CSM activity is described. The current CSM research thrusts, and the near- and long-term CSM research thrusts, are outlined.
Vector-Processing for Mobile Devices: Benchmark and Analysis
Vector processing has become commonplace in today's CPU microarchitectures.
Vector instructions improve performance and energy efficiency, which is crucial for
resource-constrained mobile devices. The research community currently lacks a
comprehensive benchmark suite to study the benefits of vector processing for
mobile devices. This paper presents Swan, an extensive vector processing
benchmark suite for mobile applications. Swan consists of a diverse set of
data-parallel workloads from four commonly used mobile applications: operating
system, web browser, audio/video messaging application, and PDF rendering
engine. Using the Swan benchmark suite, we conduct a detailed analysis of the
performance, power, and energy consumption of vectorized workloads, and show
that: (a) Vectorized kernels increase the pressure on the cache hierarchy due to
the higher rate of memory requests. (b) Vector processing is more beneficial
for workloads with lower precision operations and higher cache hit rates. (c)
Limited Instruction-Level Parallelism and strided memory accesses to
multi-dimensional data structures prevent vector processing benefits from
scaling with more SIMD functional units and wider registers. (d) Despite lower
computation throughput than domain-specific accelerators, such as GPUs, vector
processing outperforms these accelerators for kernels with lower operation
counts. Finally, we show five common computation patterns in mobile
data-parallel workloads that dominate the execution time.
Comment: 2023 IEEE International Symposium on Workload Characterization (IISWC)
Processing large raster and vector data in Apache Spark
Spatial data processing frameworks are in many cases limited to vector data only. However, an important type of spatial data is raster data, which is produced by sensors on satellites but also by high-resolution cameras taking pictures of nanostructures, such as chips on wafers. Raster data sets often become large and need to be processed in parallel in a cluster environment. In this paper we demonstrate our STARK framework with its support for raster data and its functionality to combine raster and vector data in filter and join operations. To save engineers from the burden of learning a programming language, queries can be formulated in SQL in a web interface. In the demonstration, users can use this web interface to inspect examples of raster data using our extended SQL queries on an Apache Spark cluster.
Qubit Data Structures for Analyzing Computing Systems
Qubit models and methods for improving the performance of software and
hardware for analyzing digital devices through increasing the dimension of the
data structures and memory are proposed. The basic concepts, terminology and
definitions necessary for the implementation of quantum computing when
analyzing virtual computers are introduced. The investigation results
concerning the design and modeling of computer systems in cyberspace, based on
the use of a two-component structure, are presented.
Comment: 9 pages, 4 figures, Proceedings of the Third International Conference
on Data Mining & Knowledge Management Process (CDKP 2014)
Probabilistic Graphical Models on Multi-Core CPUs using Java 8
In this paper, we discuss software design issues related to the development
of parallel computational intelligence algorithms on multi-core CPUs, using the
new Java 8 functional programming features. In particular, we focus on
probabilistic graphical models (PGMs) and present the parallelisation of a
collection of algorithms that deal with inference and learning of PGMs from
data, namely maximum likelihood estimation, importance sampling, and greedy
search for solving combinatorial optimisation problems. Through these concrete
examples, we tackle the problem of defining efficient data structures for PGMs
and parallel processing of same-size batches of data sets using Java 8
features. We also provide straightforward techniques to code parallel
algorithms that seamlessly exploit multi-core processors. The experimental
analysis, carried out using our open source AMIDST (Analysis of MassIve Data
STreams) Java toolbox, shows the merits of the proposed solutions.
Comment: Pre-print version of the paper presented in the special issue on
Computational Intelligence Software at IEEE Computational Intelligence
Magazine journal
Bit-level pipelined digit-serial array processors
A new architecture for a high-performance digit-serial vector inner product (VIP) that can be pipelined to the bit level is introduced. The design of the digit-serial vector inner product is based on a new systematic design methodology using radix-2^n arithmetic. The proposed architecture allows a high level of bit-level pipelining to increase the throughput rate with minimum initial delay and minimum area. This gives designers greater flexibility in finding the best tradeoff between hardware cost and throughput rate. It is shown that a sub-digit pipelined digit-serial structure can achieve a higher throughput rate with much less area consumption than an equivalent bit-parallel structure. A twin-pipe architecture that doubles the throughput rate of digit-serial multipliers, and consequently that of the digit-serial vector inner product, is also presented. The effects of the number of pipelining levels and of the twin-pipe architecture on the throughput rate and hardware cost are discussed. A two's complement digit-serial architecture that can operate on both negative and positive numbers is also presented.
Breadth First Search Vectorization on the Intel Xeon Phi
Breadth First Search (BFS) is a building block for graph algorithms and has
recently been used for large scale analysis of information in a variety of
applications including social networks, graph databases and web searching. Due
to its importance, a number of different parallel programming models and
architectures have been exploited to optimize the BFS. However, due to the
irregular memory access patterns and the unstructured nature of the large
graphs, its efficient parallelization is a challenge. The Xeon Phi is a
massively parallel architecture available as an off-the-shelf accelerator,
which includes a powerful 512-bit vector unit with optimized scatter and gather
functions. Given its potential benefits, work related to graph traversing on
this architecture is an active area of research.
We present a set of experiments in which we explore architectural features of
the Xeon Phi and how best to exploit them in a top-down BFS algorithm; the
techniques can also be applied to the current state-of-the-art hybrid (top-down
plus bottom-up) algorithms.
We focus on the exploitation of the vector unit by developing an improved
highly vectorized OpenMP parallel algorithm, using vector intrinsics, and
understanding the use of data alignment and prefetching. In addition, we
investigate the impact of hyperthreading and thread affinity on performance, a
topic that appears under-researched in the literature. As a result, we achieve
what we believe is the fastest published top-down BFS algorithm on the version
of Xeon Phi used in our experiments. The vectorized BFS top-down source code
presented in this paper is available on request as free-to-use software.