The importance of a new product development (NPD) process: getting started.
A successful new product, and its successful implementation within a company, requires a structured and documented approach to new product development (NPD) that provides a clear roadmap for developing new products. This review covers the NPD process from concept to consumer and identifies the key success drivers, such as the quest for real product superiority and the need for cross-functional teams, that allow a company to succeed and to use new products as a source of competitive advantage.
GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
High-performance graph algorithms are difficult to implement on new parallel
hardware such as GPUs because of three challenges:
(1) the difficulty of coming up with graph building blocks, (2) load imbalance
on parallel hardware, and (3) graph problems having low arithmetic intensity.
To address some of these challenges, GraphBLAS is an innovative, on-going
effort by the graph analytics community to propose building blocks based on
sparse linear algebra, which will allow graph algorithms to be expressed in a
performant, succinct, composable and portable manner. In this paper, we examine
the performance challenges of a linear-algebra-based approach to building graph
frameworks and describe new design principles for overcoming these bottlenecks.
Among the new design principles is exploiting input sparsity, which allows
users to write graph algorithms without specifying push and pull direction.
Exploiting output sparsity allows users to tell the backend which values of the
output in a single vectorized computation they do not want computed.
Load-balancing is an important feature for balancing work amongst parallel
workers. We describe the important load-balancing features for handling graphs
with different characteristics. The design principles described in this paper
have been implemented in "GraphBLAST", the first high-performance linear
algebra-based graph framework on NVIDIA GPUs that is open-source. The results
show that on a single GPU, GraphBLAST has on average at least an order of
magnitude speedup over previous GraphBLAS implementations SuiteSparse and GBTL,
comparable performance to the fastest GPU hardwired primitives and
shared-memory graph frameworks Ligra and Gunrock, and better performance than
any other GPU graph framework, while offering a simpler and more concise
programming model.
Comment: 50 pages, 14 figures, 14 tables
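The linear-algebra view of graph traversal described in this abstract can be illustrated with a minimal sketch. It does not use the GraphBLAST API; it is plain Python, and the function name `bfs_masked` and the adjacency-dictionary layout are assumptions made for illustration. Each BFS level is one sparse vector-matrix product over a Boolean semiring, and the set of already-visited vertices acts as a mask that says which output entries need not be computed.

```python
def bfs_masked(adj, source):
    """Breadth-first search written as repeated masked sparse products.

    adj maps each vertex to a list of out-neighbors, i.e. the nonzero
    columns of that vertex's row in a sparse Boolean adjacency matrix.
    Returns a dict mapping each reachable vertex to its BFS depth.
    """
    depth = {source: 0}   # doubles as the mask of finished output entries
    frontier = {source}   # sparse input vector: only nonzeros are stored
    level = 0
    while frontier:
        level += 1
        next_frontier = set()
        # Only rows selected by the sparse frontier are touched
        # (input sparsity).
        for u in frontier:
            for v in adj[u]:
                # Masked-out outputs are skipped (output sparsity).
                if v not in depth:
                    next_frontier.add(v)
        for v in next_frontier:
            depth[v] = level
        frontier = next_frontier
    return depth


graph = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}
print(bfs_masked(graph, 0))
```

Vertex 4 has an edge into the source but is not reachable from it, so it does not appear in the result.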
Scalable Breadth-First Search on a GPU Cluster
On a GPU cluster, the ratio of high computing power to communication
bandwidth makes scaling breadth-first search (BFS) on a scale-free graph
extremely challenging. By separating high and low out-degree vertices, we
present an implementation with scalable computation and a model for scalable
communication for BFS and direction-optimized BFS. Our communication model uses
global reduction for high-degree vertices, and point-to-point transmission for
low-degree vertices. Leveraging the characteristics of degree separation, we
reduce the graph size to one third of the conventional edge list
representation. With several other optimizations, we observe linear weak
scaling as we increase the number of GPUs, and achieve 259.8 GTEPS on a
scale-33 Graph500 RMAT graph with 124 GPUs on the latest CORAL early access
system.
Comment: 12 pages, 13 figures. To appear at IPDPS 201
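The degree-separation idea in this abstract can be sketched as follows. This is a single-process illustration only: the function names, the `threshold` parameter, and the string labels are hypothetical, and the abstract does not say how the cutoff between high and low out-degree is chosen.

```python
def separate_by_degree(adj, threshold):
    """Split vertices into high and low out-degree groups.

    adj maps each vertex to a list of out-neighbors. The threshold is an
    assumed tuning parameter, not a value taken from the paper.
    """
    high = sorted(v for v, nbrs in adj.items() if len(nbrs) > threshold)
    low = sorted(v for v, nbrs in adj.items() if len(nbrs) <= threshold)
    return high, low


def communication_plan(adj, threshold):
    """Label each vertex with the communication style the abstract names."""
    high, low = separate_by_degree(adj, threshold)
    plan = {v: "global_reduction" for v in high}
    plan.update({v: "point_to_point" for v in low})
    return plan


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(communication_plan(star, threshold=2))
```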
Dynamic Graphs on the GPU
We present a fast dynamic graph data structure for the GPU. Our dynamic graph structure uses one hash table per vertex to store adjacency lists and achieves insertion rates 3.4–14.8x faster than the state of the art across a diverse set of large datasets, as well as deletion speedups of up to 7.8x. The data structure supports queries and dynamic updates through both edge and vertex insertion and deletion. In addition, we define a comprehensive evaluation strategy based on operations, workloads, and applications that we believe better characterizes and evaluates dynamic graph data structures.
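The one-hash-table-per-vertex layout can be sketched on the CPU as below. This is not the authors' GPU implementation: the class name `DynamicGraph` and its method names are hypothetical, and Python's built-in `set` stands in for the per-vertex hash table.

```python
class DynamicGraph:
    """Directed dynamic graph with one hash table (a set) per vertex."""

    def __init__(self):
        self.adj = {}  # vertex -> set of out-neighbors

    def insert_vertex(self, v):
        self.adj.setdefault(v, set())

    def insert_edge(self, u, v):
        self.insert_vertex(u)
        self.insert_vertex(v)
        self.adj[u].add(v)

    def delete_edge(self, u, v):
        if u in self.adj:
            self.adj[u].discard(v)

    def delete_vertex(self, v):
        # Drop the vertex's own table, then remove it from every other
        # table. This linear scan keeps the sketch simple.
        self.adj.pop(v, None)
        for nbrs in self.adj.values():
            nbrs.discard(v)

    def has_edge(self, u, v):
        return v in self.adj.get(u, ())


g = DynamicGraph()
g.insert_edge("a", "b")
g.insert_edge("b", "c")
g.delete_vertex("b")
print(g.has_edge("a", "b"))
```

Edge queries, insertions, and deletions each touch a single per-vertex table, which is the property the abstract's design relies on.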
Influence of water temperature on the efficacy of diquat and endothall versus curlyleaf pondweed
determine the impact of water temperature on the efficacy
of the contact herbicides diquat (6,7-dihydrodipyrido [1,2-α:2’,1’-c]
pyrazinediium ion) and endothall (7-oxabicyclo [2.2.1]
heptane-2,3-dicarboxylic acid) for control of the exotic
nuisance species curlyleaf pondweed (Potamogeton crispus L.)
across a range of water temperatures.
Multi-GPU Graph Analytics
We present a single-node, multi-GPU programmable graph processing library
that allows programmers to easily extend single-GPU graph algorithms to achieve
scalable performance on large graphs with billions of edges. Directly using the
single-GPU implementations, our design only requires programmers to specify a
few algorithm-dependent concerns, hiding most multi-GPU related implementation
details. We analyze the theoretical and practical limits to scalability in the
context of varying graph primitives and datasets. We describe several
optimizations, such as direction-optimizing traversal and a just-enough memory
allocation scheme, for better performance and smaller memory consumption.
Compared to previous work, we achieve best-of-class performance across
operations and datasets, including excellent strong and weak scalability on
most primitives as we increase the number of GPUs in the system.
Comment: 12 pages. Final version submitted to IPDPS 201
Finding Convex Hulls Using Quickhull on the GPU
We present a convex hull algorithm that is accelerated on commodity graphics
hardware. We analyze and identify the hurdles of writing a recursive
divide-and-conquer algorithm on the GPU and devise a framework for representing this class
of problems. Our framework transforms the recursive splitting step into a
permutation step that is well-suited for graphics hardware. Our convex hull
algorithm of choice is Quickhull. Our parallel Quickhull implementation (for
both 2D and 3D cases) achieves an order of magnitude speedup over standard
computational geometry libraries.
Comment: 11 pages
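For reference, the recursive divide-and-conquer structure that the abstract describes as hard to map onto the GPU looks like this in a sequential 2D sketch. It is the textbook recursive form of Quickhull, not the permutation-based GPU formulation from the paper, and the helper names are chosen for illustration.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 if b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_side(points, a, b):
    """Hull vertices strictly to the left of the directed line a->b."""
    left = [p for p in points if cross(a, b, p) > 0]
    if not left:
        return []
    # The point farthest from the line a->b must be on the hull.
    far = max(left, key=lambda p: cross(a, b, p))
    # Recursive splitting step: only points outside triangle (a, far, b)
    # survive into the two subproblems.
    return _hull_side(left, a, far) + [far] + _hull_side(left, far, b)


def quickhull(points):
    """Return the 2D convex hull vertices in clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    a, b = pts[0], pts[-1]  # extreme points, guaranteed to be on the hull
    return [a] + _hull_side(pts, a, b) + [b] + _hull_side(pts, b, a)


print(quickhull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

Points that lie exactly on a hull edge are dropped by the strict `> 0` test, so only corner vertices are returned.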
GPU LSM: A Dynamic Dictionary Data Structure for the GPU
We develop a dynamic dictionary data structure for the GPU, supporting fast
insertions and deletions, based on the Log Structured Merge tree (LSM). Our
implementation on an NVIDIA K40c GPU has an average update (insertion or
deletion) rate of 225 M elements/s, 13.5x faster than merging items into a
sorted array. The GPU LSM supports the retrieval operations of lookup, count,
and range query, with average rates of 75 M, 32 M, and 23 M
queries/s, respectively. The trade-off for the dynamic updates is that the
sorted array is almost twice as fast on retrievals. We believe that our GPU LSM
is the first dynamic general-purpose dictionary data structure for the GPU.
Comment: 11 pages, accepted to appear in the Proceedings of the IEEE International
Parallel and Distributed Processing Symposium (IPDPS'18)
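The general log-structured merge idea behind this abstract can be sketched sequentially as follows. This is not the GPU LSM implementation: the class name `TinyLSM`, single-element inserts (rather than batches), and the tombstone marker are assumptions made for illustration. Levels are sorted runs; an insert cascades merges upward like a binary counter, a delete inserts a tombstone, and a lookup searches the newest level first.

```python
import bisect

_TOMBSTONE = object()  # marks a deleted key (assumed representation)


def _merge(newer, older):
    """Merge two key-sorted runs; on equal keys the newer entry wins."""
    out, i, j = [], 0, 0
    while i < len(newer) and j < len(older):
        if newer[i][0] < older[j][0]:
            out.append(newer[i]); i += 1
        elif newer[i][0] > older[j][0]:
            out.append(older[j]); j += 1
        else:
            out.append(newer[i]); i += 1; j += 1
    out.extend(newer[i:])
    out.extend(older[j:])
    return out


class TinyLSM:
    def __init__(self):
        self.levels = []  # levels[0] is the newest run; [] means empty

    def insert(self, key, value):
        run, i = [(key, value)], 0
        # Cascade: merge with each occupied level until a free slot appears.
        while i < len(self.levels) and self.levels[i]:
            run = _merge(run, self.levels[i])
            self.levels[i] = []
            i += 1
        if i == len(self.levels):
            self.levels.append(run)
        else:
            self.levels[i] = run

    def delete(self, key):
        self.insert(key, _TOMBSTONE)

    def lookup(self, key):
        for run in self.levels:  # newest level first
            idx = bisect.bisect_left(run, (key,))
            if idx < len(run) and run[idx][0] == key:
                value = run[idx][1]
                return None if value is _TOMBSTONE else value
        return None


lsm = TinyLSM()
lsm.insert(1, "a")
lsm.insert(1, "c")
lsm.delete(1)
print(lsm.lookup(1))
```

Updates only ever merge sorted runs, which is what makes them cheap relative to inserting into one large sorted array, at the cost of lookups having to probe several levels.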