2,378 research outputs found
Gunrock: A High-Performance Graph Processing Library on the GPU
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs have been two
significant challenges for developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We evaluate Gunrock on five key graph
primitives and show that Gunrock has on average at least an order of magnitude
speedup over Boost and PowerGraph, comparable performance to the fastest GPU
hardwired primitives, and better performance than any other GPU high-level
graph library.Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the
previous version v5
Immunotronics - novel finite-state-machine architectures with built-in self-test using self-nonself differentiation
A novel approach to hardware fault tolerance is demonstrated that takes inspiration from the human immune system as a method of fault detection. The human immune system is a remarkable system of interacting cells and organs that protect the body from invasion and maintains reliable operation even in the presence of invading bacteria or viruses. This paper seeks to address the field of electronic hardware fault tolerance from an immunological perspective with the aim of showing how novel methods based upon the operation of the immune system can both complement and create new approaches to the development of fault detection mechanisms for reliable hardware systems. In particular, it is shown that by use of partial matching, as prevalent in biological systems, high fault coverage can be achieved with the added advantage of reducing memory requirements. The development of a generic finite-state-machine immunization procedure is discussed that allows any system that can be represented in such a manner to be "immunized" against the occurrence of faulty operation. This is demonstrated by the creation of an immunized decade counter that can detect the presence of faults in real tim
Gunrock: GPU Graph Analytics
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs, have presented two
significant challenges to developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We characterize the performance of
various optimization strategies and evaluate Gunrock's overall performance on
different GPU architectures on a wide range of graph primitives that span from
traversal-based algorithms and ranking algorithms, to triangle counting and
bipartite-graph-based algorithms. The results show that on a single GPU,
Gunrock has on average at least an order of magnitude speedup over Boost and
PowerGraph, comparable performance to the fastest GPU hardwired primitives and
CPU shared-memory graph libraries such as Ligra and Galois, and better
performance than any other GPU high-level graph library.Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing
(TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance
Graph Processing Library on the GPU
Emerging Technologies and Research Challenges for 5G Wireless Networks
As the take-up of Long Term Evolution (LTE)/4G cellular accelerates, there is
increasing interest in technologies that will define the next generation (5G)
telecommunication standard. This paper identifies several emerging technologies
which will change and define the future generations of telecommunication
standards. Some of these technologies are already making their way into
standards such as 3GPP LTE, while others are still in development.
Additionally, we will look at some of the research problems that these new
technologies pose.Comment: Accepted for publication in IEEE Wireless Communications April 201
A Scalable VLSI Architecture for Soft-Input Soft-Output Depth-First Sphere Decoding
Multiple-input multiple-output (MIMO) wireless transmission imposes huge
challenges on the design of efficient hardware architectures for iterative
receivers. A major challenge is soft-input soft-output (SISO) MIMO demapping,
often approached by sphere decoding (SD). In this paper, we introduce the - to
our best knowledge - first VLSI architecture for SISO SD applying a single
tree-search approach. Compared with a soft-output-only base architecture
similar to the one proposed by Studer et al. in IEEE J-SAC 2008, the
architectural modifications for soft input still allow a one-node-per-cycle
execution. For a 4x4 16-QAM system, the area increases by 57% and the operating
frequency degrades by 34% only.Comment: Accepted for IEEE Transactions on Circuits and Systems II Express
Briefs, May 2010. This draft from April 2010 will not be updated any more.
Please refer to IEEE Xplore for the final version. *) The final publication
will appear with the modified title "A Scalable VLSI Architecture for
Soft-Input Soft-Output Single Tree-Search Sphere Decoding
Building knowledge around complex objects using InforBright Data Warehousing technology
There are considerable challenges in analysing and reporting on word-based data. Infobright data warehousing technology was used to build knowledge around qualitative data that are subject to human interpretation. Infobright was chosen as a system for implementing the data set because its rough set based intelligence appears to be extensible with moderate effort to implement the data warehousing requirements for automatic interpretation of word based data. An example of social sciences research data was used for illustration
Reconfigurable Processing for Satellite On-Board Automatic Cloud Cover Assessment (ACCA)
Clouds have a critical role in many studies such as weather- and climate-related investigations. However, they represent a source of errors in many applications, and the presence of cloud contamination can hinder the use of satellite data. In addition, sending cloudy data to ground stations can result in an inefficient utilization of the communication bandwidth. This requires satellite on-board cloud detection capability to mask out cloudy pixels from further processing. Remote sensing satellite missions have always required smaller size, lower cost, more flexibility, and higher computational power. Reconfigurable Computers (RCs) combine the flexibility of traditional microprocessors with the power of Field Programmable Gate Arrays (FPGAs). Therefore, RCs are a promising candidate for on-board preprocessing. This paper presents the design and implementation of an RC-based real-time cloud detection system. We investigate the potential of using RCs for on-board preprocessing by prototyping the Landsat 7 ETM+ ACCA algorithm on one of the state-of-the-art reconfigurable platforms, SRC-6. It will be shown that our work provides higher detection accuracy and over one order of magnitude improvement in performance when compared to previously reported investigations
- …