Dynamic Dependency Collapsing
In this dissertation, we explore the concept of dynamic dependency collapsing. When clock speed is fixed, performance increases in computer architecture come from exploiting additional parallelism. We show that further improvements are possible even when the available parallelism in programs is exhausted. The improvement comes from executing in parallel instructions that would ordinarily have been serialized, a concept we call dependency collapsing. We survey existing techniques that exploit parallelism and show which of them fall under the umbrella of dependency collapsing. We then introduce two dependency collapsing techniques of our own. The first collapses data dependencies by fusing two normally dependent instructions and executing them together. We show that exploiting the additional parallelism generated by collapsing these dependencies yields a performance increase. The second collapses resource dependencies to execute instructions that would normally have been serialized by resource constraints in the processor. We show that it is possible to take advantage of larger in-processor structures while avoiding the power and area penalty this often implies.
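As a concrete illustration of the data-dependency case, the Python sketch below scans an issue window and fuses a producer instruction with the consumer that reads its result, so the pair occupies a single issue slot. The instruction format and fusion rule are illustrative assumptions, not the dissertation's actual microarchitecture.

from dataclasses import dataclass

@dataclass
class Instr:
    op: str        # e.g. "mul", "add"
    dst: str       # destination register
    srcs: tuple    # source registers

def fuse_dependent_pairs(window):
    """Scan an issue window and collapse a producer with the consumer that
    reads its result, so both can issue in a single slot."""
    fused, consumed = [], set()
    for i, prod in enumerate(window):
        if i in consumed:
            continue
        for j in range(i + 1, len(window)):
            cons = window[j]
            # Fusable if the consumer reads the producer's result and no
            # intervening instruction touches that register.
            if prod.dst in cons.srcs and all(
                prod.dst not in w.srcs and prod.dst != w.dst
                for w in window[i + 1:j]
            ):
                fused.append(("FUSED", prod, cons))
                consumed.add(j)
                break
        else:
            fused.append(prod)
    return fused

window = [
    Instr("mul", "r1", ("r2", "r3")),
    Instr("add", "r4", ("r1", "r5")),  # depends on r1: collapses with mul
    Instr("sub", "r6", ("r7", "r8")),  # independent: issues normally
]
print(fuse_dependent_pairs(window))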
Parallelizing Optimal Multiple Sequence Alignment by Dynamic Programming
Optimal multiple sequence alignment by dynamic programming, like many highly
dimensional scientific computing problems, has failed to benefit from the
improvements in computing performance brought about by multi-processor systems,
due to the lack of a suitable scheme to manage partitioning and dependencies. A
scheme for parallel implementation of the dynamic programming multiple sequence
alignment is presented, based on a peer-to-peer design and a multidimensional
array indexing method. This design results in up to 5-fold improvement compared
to a previously described master/slave design, and scales favourably with the
number of processors used. This study demonstrates an approach for
parallelising multi-dimensional dynamic programming and similar algorithms
on multi-processor architectures.
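The partitioning idea can be sketched for the pairwise case: cells on the same anti-diagonal of the dynamic programming matrix depend only on earlier diagonals, so each diagonal's cells can be computed in parallel. This is a minimal wavefront sketch assuming the standard Needleman-Wunsch recurrence; the paper's peer-to-peer design and multidimensional array indexing are not reproduced here.

from concurrent.futures import ThreadPoolExecutor

def align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment; cells on anti-diagonal d = i + j depend only on
    diagonals d-1 and d-2, so each diagonal is one parallel step."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap

    def cell(ij):
        i, j = ij
        s = match if a[i - 1] == b[j - 1] else mismatch
        H[i][j] = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)

    with ThreadPoolExecutor() as pool:
        for d in range(2, n + m + 1):
            cells = [(i, d - i) for i in range(1, n + 1) if 1 <= d - i <= m]
            list(pool.map(cell, cells))   # cells within a diagonal are independent
    return H[n][m]

print(align("GATTACA", "GCATGCU"))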
An Architectural Approach to Ensuring Consistency in Hierarchical Execution
Hierarchical task decomposition is a method used in many agent systems to
organize agent knowledge. This work shows how the combination of a hierarchy
and persistent assertions of knowledge can lead to difficulty in maintaining
logical consistency in asserted knowledge. We explore the problematic
consequences of persistent assumptions in the reasoning process and introduce
novel potential solutions. We implement one of these solutions,
Dynamic Hierarchical Justification, and demonstrate its effectiveness with an
empirical analysis.
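The following sketch conveys only the general shape of hierarchy-scoped justification: each assertion records the subgoal that produced it, and retracting a subgoal retracts every assertion made at or below it. Names and data structures are illustrative assumptions, not the agent architecture's actual implementation.

class Hierarchy:
    """Each assertion records the subgoal that justified it; retracting a
    subgoal retracts everything asserted at or below it."""

    def __init__(self):
        self.parent = {}     # subgoal -> parent subgoal
        self.asserted = {}   # subgoal -> set of assertions

    def push(self, subgoal, parent=None):
        self.parent[subgoal] = parent
        self.asserted[subgoal] = set()

    def assert_fact(self, subgoal, fact):
        self.asserted[subgoal].add(fact)

    def retract(self, subgoal):
        # Recursively remove child subgoals first, then the subgoal itself,
        # so no assertion outlives the reasoning context that justified it.
        for child in [g for g, p in self.parent.items() if p == subgoal]:
            self.retract(child)
        self.asserted.pop(subgoal, None)
        self.parent.pop(subgoal, None)

h = Hierarchy()
h.push("root")
h.push("fetch-block", parent="root")
h.assert_fact("fetch-block", "holding(block1)")
h.retract("fetch-block")   # the stale assumption disappears with its subgoal
print(h.asserted)          # {'root': set()}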
Computer-aided programming for multiprocessing systems
As both the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and error-prone. This report discusses parallel models of computation and tools for computer-aided programming (CAP). Program development tools are necessary because programmers cannot efficiently develop complex parallel programs without them. In particular, a CAP tool named Hypertool is described here. It performs scheduling and inserts communication primitives automatically, eliminating many errors. It also generates performance estimates and other program quality measures to help programmers improve their algorithms and programs. Experiments have shown that up to a 300% performance improvement can be achieved by computer-aided programming.
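To illustrate the kind of work such a tool automates (not Hypertool's actual heuristics), the sketch below list-schedules a task DAG onto processors and records a communication primitive wherever a dependency edge crosses processors. The DAG and costs are made-up examples.

def schedule(dag, cost, nprocs):
    """Greedy list scheduler: place each ready task on the processor that is
    free soonest, starting no earlier than its predecessors finish."""
    preds = {t: [] for t in dag}
    for t in dag:
        for s in dag[t]:
            preds[s].append(t)
    finish, placed = {}, {}
    free_at = [0.0] * nprocs
    remaining = set(dag)
    while remaining:
        # pick any task whose predecessors have all finished
        t = next(x for x in remaining if all(p in finish for p in preds[x]))
        p = min(range(nprocs), key=lambda q: free_at[q])
        start = max([free_at[p]] + [finish[x] for x in preds[t]])
        placed[t], finish[t] = p, start + cost[t]
        free_at[p] = finish[t]
        remaining.remove(t)
    # A communication primitive is needed wherever an edge crosses processors.
    comms = [(t, "->", s) for t in dag for s in dag[t]
             if placed[t] != placed[s]]
    return placed, comms

dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
cost = {"a": 2, "b": 3, "c": 1, "d": 2}
placement, messages = schedule(dag, cost, nprocs=2)
print(placement)   # task -> processor
print(messages)    # cross-processor edges needing send/recv pairs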
Dependency parsing of Turkish
The suitability of different parsing methods for different languages is an important topic in
syntactic parsing. Lesser-studied languages in particular, typologically different from the languages
for which methods were originally developed, pose interesting challenges in this respect.
This article presents an investigation of data-driven dependency parsing of Turkish, an agglutinative
free constituent order language that can be seen as the representative of a wider class
of languages of similar type. Our investigations show that morphological structure plays an
essential role in finding syntactic relations in such a language. In particular, we show that
employing sublexical representations called inflectional groups, rather than word forms, as the
basic parsing units improves parsing accuracy. We compare two different parsing methods, one
based on a probabilistic model with beam search, the other based on discriminative classifiers and
a deterministic parsing strategy, and show that the usefulness of sublexical units holds regardless
of parsing method. We examine the impact of morphological and lexical information in detail and
show that, properly used, this kind of information can improve parsing accuracy substantially.
Applying the techniques presented in this article, we achieve the highest reported accuracy for
parsing the Turkish Treebank.
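The central representational idea can be shown in a few lines: a word's morphological analysis is split at derivation boundaries into inflectional groups, and each IG, rather than the whole word form, becomes a parsing unit. The analysis string below follows the treebank-style ^DB boundary convention, but the example itself is illustrative.

def split_igs(analysis):
    """Split a morphological analysis into inflectional groups (IGs) at
    derivation boundaries; each IG becomes one parsing unit."""
    return analysis.split("^DB+")

# "evdeki" (roughly "the one in the house"): a nominal IG in the locative,
# derived into an adjectival IG.
word = "ev+Noun+A3sg+Pnon+Loc^DB+Adj"
print(split_igs(word))   # ['ev+Noun+A3sg+Pnon+Loc', 'Adj']

# A dependency can now attach to a specific IG rather than the whole word,
# e.g. a head that governs "evdeki" links to its final, adjectival IG.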
SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks
Going deeper and wider in neural architectures improves accuracy, but
limited GPU DRAM places an undesired restriction on the network design
domain. Deep Learning (DL) practitioners must either switch to less desirable
network architectures or nontrivially dissect a network across multiple GPUs.
Both options distract DL practitioners from their original machine
learning tasks. We present SuperNeurons: a dynamic GPU memory scheduling
runtime that enables network training far beyond the GPU DRAM capacity.
SuperNeurons features three memory optimizations, Liveness Analysis,
Unified Tensor Pool, and Cost-Aware Recomputation; together they reduce
the network-wide peak memory usage down to the maximal memory usage among
layers. We also address the performance issues in these memory-saving
techniques. Given the limited GPU DRAM, SuperNeurons not only provisions the
necessary memory for training, but also dynamically allocates the memory for
convolution workspaces to achieve high performance. Evaluations against
Caffe, Torch, MXNet and TensorFlow demonstrate that SuperNeurons trains
networks at least 3.2432× deeper than current frameworks with leading
performance. In particular, SuperNeurons can train ResNet2500, which has
10^4 basic network layers, on a 12GB K40c.
Comment: PPoPP 2018: 23rd ACM SIGPLAN Symposium on Principles and Practice of
Parallel Programming.
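A minimal sketch of one of the three optimizations, cost-aware recomputation, follows: activations that are cheap to recompute are dropped after the forward pass and rebuilt on demand during the backward pass, trading computation for peak memory. The layer costs and the naive recompute-from-input policy are simplifying assumptions, not SuperNeurons' actual runtime.

class Layer:
    def __init__(self, name, recompute_cost):
        self.name, self.recompute_cost = name, recompute_cost
        self.output = None                 # cached activation, if kept

def forward(layers, x, cheap_threshold=1.0):
    for l in layers:
        x = f"{l.name}({x})"              # stand-in for the real computation
        # keep only activations that are expensive to recompute
        l.output = x if l.recompute_cost > cheap_threshold else None
    return x

def backward(layers, x0):
    for i in reversed(range(len(layers))):
        if layers[i].output is None:      # dropped in forward: rebuild it
            x = x0                        # (naively from the input here)
            for l in layers[:i + 1]:
                x = f"{l.name}({x})"
            layers[i].output = x
        print("grad through", layers[i].name, "using", layers[i].output)

net = [Layer("conv1", 2.0), Layer("relu1", 0.1), Layer("conv2", 2.0)]
forward(net, "x")
backward(net, "x")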
NeutronStream: A Dynamic GNN Training Framework with Sliding Window for Graph Streams
Existing Graph Neural Network (GNN) training frameworks have been designed to
help developers easily create performant GNN implementations. However, most
existing GNN frameworks assume that the input graphs are static, ignoring
the fact that most real-world graphs are constantly evolving. Though many dynamic GNN
models have emerged to learn from evolving graphs, the training process of
these dynamic GNNs is dramatically different from traditional GNNs in that it
captures both the spatial and temporal dependencies of graph updates. This
poses new challenges for designing dynamic GNN training frameworks. First, the
traditional batched training method fails to capture real-time structural
evolution information. Second, the time-dependent nature makes parallel
training hard to design. Third, existing frameworks lack system support for users to
efficiently implement dynamic GNNs. In this paper, we present NeutronStream, a
framework for training dynamic GNN models. NeutronStream abstracts the input
dynamic graph into a chronologically updated stream of events and processes the
stream with an optimized sliding window to incrementally capture the
spatial-temporal dependencies of events. Furthermore, NeutronStream provides a
parallel execution engine to tackle the sequential event processing challenge
to achieve high performance. NeutronStream also integrates a built-in graph
storage structure that supports dynamic updates and provides a set of
easy-to-use APIs that allow users to express their dynamic GNNs. Our
experimental results demonstrate that, compared to state-of-the-art dynamic GNN
implementations, NeutronStream achieves speedups ranging from 1.48X to 5.87X
and an average accuracy improvement of 3.97%.
Comment: 12 pages, 15 figures.
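The sliding-window abstraction can be sketched as follows, under two simplifying assumptions (a fixed window size and a node-disjointness rule for parallelism) that stand in for NeutronStream's actual adaptive policy: a window slides over the chronological event stream, and events within a window that touch disjoint node sets are grouped for parallel processing.

def windows(events, size, stride):
    """Yield fixed-size windows sliding over the chronological event stream."""
    for start in range(0, max(len(events) - size + 1, 1), stride):
        yield events[start:start + size]

def parallel_groups(window):
    """Greedily pack events that touch disjoint node sets into one group;
    events sharing a node stay in separate, time-ordered groups."""
    groups = []
    for ev in window:                       # ev = (timestamp, src, dst)
        nodes = {ev[1], ev[2]}
        for g in groups:
            if nodes.isdisjoint(g["nodes"]):
                g["events"].append(ev)
                g["nodes"] |= nodes
                break
        else:
            groups.append({"events": [ev], "nodes": set(nodes)})
    return [g["events"] for g in groups]

stream = [(0, "a", "b"), (1, "c", "d"), (2, "a", "c"), (3, "e", "f")]
for w in windows(stream, size=3, stride=1):
    print(parallel_groups(w))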