Densifying Assumed-sparse Tensors: Improving Memory Efficiency and MPI Collective Performance during Tensor Accumulation for Parallelized Training of Neural Machine Translation Models
Neural machine translation - using neural networks to translate human
language - is an area of active research exploring new neuron types and network
topologies with the goal of dramatically improving machine translation
performance. Current state-of-the-art approaches, such as the multi-head
attention-based transformer, require very large translation corpora and many
epochs to produce models of reasonable quality. Recent attempts to parallelize
the official TensorFlow "Transformer" model across multiple nodes have hit
roadblocks due to excessive memory use and resulting out-of-memory errors when
performing MPI collectives. This paper describes modifications made to the
Horovod MPI-based distributed training framework to reduce memory usage for
transformer models by converting assumed-sparse tensors to dense tensors, and
subsequently replacing sparse gradient gather with dense gradient reduction.
The result is a dramatic increase in scale-out capability, with CPU-only
scaling tests achieving 91% weak scaling efficiency up to 1200 MPI processes
(300 nodes), and up to 65% strong scaling efficiency up to 400 MPI processes
(200 nodes) using the Stampede2 supercomputer.

Comment: 18 pages, 10 figures, accepted at the 2019 International
Supercomputing Conference
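The memory problem and its fix can be sketched in miniature. The following is an illustrative model only, not the actual Horovod modification: a sparse allgather concatenates every worker's (index, value) pairs, so memory grows with worker count, while densifying first lets a fixed-size allreduce (an element-wise sum) do the accumulation.

```python
# Toy model of accumulating "assumed-sparse" gradients across workers.
# Gradients arrive as (index, value) pairs; densifying them lets a
# fixed-size sum (as an MPI allreduce would perform) replace a gather
# whose output grows with the number of workers.

def densify(sparse_grad, size):
    """Convert (index, value) pairs into a dense gradient vector."""
    dense = [0.0] * size
    for idx, val in sparse_grad:
        dense[idx] += val
    return dense

def dense_allreduce(worker_grads):
    """Element-wise sum across workers, as an MPI allreduce would do."""
    return [sum(vals) for vals in zip(*worker_grads)]

def accumulate(sparse_grads_per_worker, size):
    dense = [densify(g, size) for g in sparse_grads_per_worker]
    return dense_allreduce(dense)
```

The key property is that each worker's buffer stays at the model's parameter count regardless of how many processes participate.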
From Clarity to Efficiency for Distributed Algorithms
This article describes a very high-level language for clear description of
distributed algorithms and optimizations necessary for generating efficient
implementations. The language supports high-level control flows where complex
synchronization conditions can be expressed using high-level queries,
especially logic quantifications, over message history sequences.
Executed straightforwardly, however, such programs would be extremely
inefficient, even consuming unbounded memory.
We present new optimizations that automatically transform complex
synchronization conditions into incremental updates of necessary auxiliary
values as messages are sent and received. The core of the optimizations is the
first general method for efficient implementation of logic quantifications. We
have developed an operational semantics of the language, implemented a
prototype of the compiler and the optimizations, and successfully used the
language and implementation on a variety of important distributed algorithms.
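The optimization idea can be sketched as follows (a hedged illustration, not the paper's actual language or compiler): a synchronization condition written as a quantification over the message history ("each peer has acked round r") is transformed into an auxiliary set maintained incrementally as messages arrive, so no unbounded history need be stored.

```python
# Naive version: keep the whole message history and re-evaluate the
# quantified condition on every check.
class NaiveProcess:
    def __init__(self, peers):
        self.peers = set(peers)
        self.history = []          # unbounded message history

    def receive(self, msg):
        self.history.append(msg)

    def ready(self, r):
        # quantification: for each p in peers, ('ack', r, p) is in history
        return all(('ack', r, p) in self.history for p in self.peers)

# Transformed version: the quantification becomes an incrementally
# maintained auxiliary value, updated as each message is received.
class IncrementalProcess:
    def __init__(self, peers):
        self.peers = set(peers)
        self.acked = {}            # round -> set of peers that acked

    def receive(self, msg):
        kind, r, sender = msg
        if kind == 'ack':
            self.acked.setdefault(r, set()).add(sender)

    def ready(self, r):
        return self.acked.get(r, set()) >= self.peers
```

Both versions answer the same queries, but the incremental one does constant work per message and checks readiness without scanning any history.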
AutoGraph: Imperative-style Coding with Graph-based Performance
There is a perceived trade-off between machine learning code that is easy to
write, and machine learning code that is scalable or fast to execute. In
machine learning, imperative style libraries like Autograd and PyTorch are easy
to write, but suffer from high interpretive overhead and are not easily
deployable in production or mobile settings. Graph-based libraries like
TensorFlow and Theano benefit from whole-program optimization and can be
deployed broadly, but make expressing complex models more cumbersome. We
describe how the use of staged programming in Python, via source code
transformation, offers a midpoint between these two library design patterns,
capturing the benefits of both. A key insight is to delay all type-dependent
decisions until runtime, via dynamic dispatch. We instantiate these principles
in AutoGraph, a software system that improves the programming experience of the
TensorFlow library, and demonstrate usability improvements with no loss in
performance compared to native TensorFlow graphs. We also show that our system
is backend agnostic, and demonstrate targeting an alternate IR with
characteristics not found in TensorFlow graphs.
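The "delay type-dependent decisions until runtime" insight can be illustrated with a minimal sketch (invented for illustration, not AutoGraph's real implementation): the source transformation rewrites an `if` statement into a call to an overloaded operator, which dispatches at runtime so plain Python values execute eagerly while symbolic values record a graph node.

```python
class Staged:
    """A symbolic value; operations on it build an expression graph."""
    def __init__(self, expr):
        self.expr = expr

def as_expr(value):
    return value.expr if isinstance(value, Staged) else ('const', value)

def if_stmt(cond, true_fn, false_fn):
    """Overloaded conditional installed by the source transformation.

    E.g. `if x > 0: y = a` / `else: y = b` is rewritten to
    `y = if_stmt(x > 0, lambda: a, lambda: b)`.
    """
    if isinstance(cond, Staged):
        # staged path: capture both branches as a single graph node
        return Staged(('cond', cond.expr,
                       as_expr(true_fn()), as_expr(false_fn())))
    # eager path: ordinary Python control flow
    return true_fn() if cond else false_fn()
```

Because the dispatch happens per call, the same transformed source works for eager debugging and for graph construction without any static type analysis.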
Semantic Source Code Models Using Identifier Embeddings
The emergence of online open source repositories in the recent years has led
to an explosion in the volume of openly available source code, coupled with
metadata relating to a variety of software development activities. As a
result, in line with recent advances in machine learning research, software
maintenance activities are switching from symbolic formal methods to
data-driven methods. In this context, the rich semantics hidden in source code
identifiers provide opportunities for building semantic representations of code
which can assist tasks of code search and reuse. To this end, we deliver in the
form of pretrained vector space models, distributed code representations for
six popular programming languages, namely, Java, Python, PHP, C, C++, and C#.
The models are produced using fastText, a state-of-the-art library for learning
word representations. Each model is trained on data from a single programming
language; the code mined for producing all models amounts to over 13,000
repositories. We highlight dissimilarities between natural language and source
code, as well as variations in coding conventions between the different
programming languages we processed. We describe how these heterogeneities
guided the data preprocessing decisions we took and the selection of the
training parameters in the released models. Finally, we propose potential
applications of the models and discuss their limitations.

Comment: 16th International Conference on Mining Software Repositories (MSR
2019): Data Showcase Track
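One preprocessing decision the abstract alludes to is splitting identifiers into subtokens before training embeddings. The routine below is a hypothetical sketch of that step (the paper does not publish this exact code): it splits snake_case and camelCase names, keeping acronyms together.

```python
import re

def split_identifier(name):
    """Split snake_case and camelCase identifiers into lowercase subtokens.

    Hypothetical preprocessing sketch: identifiers are broken into word
    pieces so a fastText-style model sees natural-language-like tokens.
    """
    parts = name.replace('-', '_').split('_')
    tokens = []
    for part in parts:
        # split camelCase / PascalCase boundaries, keeping acronyms intact
        tokens.extend(re.findall(r'[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+', part))
    return [t.lower() for t in tokens if t]
```

For example, `parseHTTPResponse` yields `['parse', 'http', 'response']`, so the same concept surfaces identically whether written in Java's camelCase or Python's snake_case.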
DEMorphy, German Language Morphological Analyzer
DEMorphy is a morphological analyzer for German. It is built on large,
compacted lexicons from the German Morphological Dictionary. A guesser based
on German declension suffixes is also provided. DEMorphy thus offers a
state-of-the-art morphological analyzer for German. It is implemented in
Python with an emphasis on ease of use and accompanying documentation. The
package is suitable for both academic and commercial purposes with a
permissive licence.

Comment: 7 pages, 2 figures
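A suffix-based guesser of the kind the abstract mentions can be sketched as follows. The suffix table here is a toy example invented for illustration, not DEMorphy's actual data or API: unknown words are analyzed by matching their longest known suffix.

```python
# Toy suffix table: hypothetical suffix -> guessed analysis.
SUFFIX_GUESSES = {
    'ungen': {'pos': 'NOUN', 'number': 'plural', 'gender': 'fem'},
    'ung':   {'pos': 'NOUN', 'number': 'singular', 'gender': 'fem'},
    'en':    {'pos': 'VERB', 'form': 'infinitive'},
}

def guess(word):
    """Return the analysis of the longest matching suffix, if any."""
    for length in range(len(word), 0, -1):
        analysis = SUFFIX_GUESSES.get(word[-length:].lower())
        if analysis:
            return analysis
    return None
```

Trying longest suffixes first matters: for "Prüfungen" the 5-character suffix "ungen" (plural noun) must win over the shorter "en" (verb infinitive).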
Visual Analytics and Human Involvement in Machine Learning
Rapidly developing AI systems and applications still require human
involvement in practically all parts of the analytics process. Human decisions
are largely based on visualizations, which provide data scientists with
details of data properties and the results of analytical procedures. Different
visualizations are used in the different steps of the Machine Learning (ML)
process. The decision of which visualization to use depends on factors such as
the data domain, the data model and the step in the ML process. In this
chapter, we describe the seven steps in the ML process and review
visualization techniques relevant for the different steps, for different types
of data, models and purposes.
Building a Framework for Predictive Science
Key questions that scientists and engineers typically want to address can be
formulated in terms of predictive science. Questions such as: "How well does my
computational model represent reality?", "What are the most important
parameters in the problem?", and "What is the best next experiment to perform?"
are fundamental in solving scientific problems. Mystic is a framework for
massively-parallel optimization and rigorous sensitivity analysis that enables
these motivating questions to be addressed quantitatively as global
optimization problems. Often realistic physics, engineering, and materials
models may have hundreds of input parameters, hundreds of constraints, and may
require execution times of seconds or longer. In more extreme cases, realistic
models may be multi-scale, and require the use of high-performance computing
clusters for their evaluation. Predictive calculations, formulated as a global
optimization over a potential surface in design parameter space, may require an
already prohibitively large simulation to be performed hundreds, if not
thousands, of times. The need to prepare, schedule, and monitor thousands of
model evaluations, and dynamically explore and analyze results, is a
challenging problem that requires a software infrastructure capable of
distributing and managing computations on large-scale heterogeneous resources.
In this paper, we present the design behind an optimization framework, and also
a framework for heterogeneous computing, that when utilized together, can make
computationally intractable sensitivity and optimization problems much more
tractable.
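The formulation can be sketched in miniature (a hedged illustration of the idea, not mystic's actual API): a predictive question becomes a global optimization in which an expensive model is evaluated at many parameter points in parallel and the best design point is kept.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def optimize(model, bounds, n_samples=200, seed=0, workers=4):
    """Parallel random search: a simple stand-in for a rigorous
    global optimizer over a design parameter space."""
    rng = random.Random(seed)
    points = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
              for _ in range(n_samples)]
    # in the paper's setting this map would be distributed over
    # heterogeneous cluster resources; a thread pool stands in here
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(model, points))
    best = min(range(n_samples), key=scores.__getitem__)
    return points[best], scores[best]
```

The interesting engineering lives in the map step: when each `model` call is itself a large simulation, scheduling, monitoring, and dynamically analyzing those thousands of evaluations is exactly the infrastructure problem the paper addresses.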
One DSL to Rule Them All: IDE-Assisted Code Generation for Agile Data Analysis
Data analysis is at the core of scientific studies, a prominent task that
researchers and practitioners typically undertake by programming their own set
of automated scripts. While there is no shortage of tools and languages
available for designing data analysis pipelines, users spend substantial effort
in learning the specifics of such languages/tools and often design solutions
too project-specific to be reused in future studies. Furthermore, users need
to put additional effort into making their code scalable, as parallel
implementations are typically more complex.
We address these problems by proposing an advanced code recommendation tool
which facilitates developing data science scripts. Users formulate their
intentions in a human-readable Domain Specific Language (DSL) for dataframe
manipulation and analysis. The DSL statements can be converted into executable
Python code during editing. To avoid the need to learn the DSL and increase
user-friendliness, our tool supports code completion in mainstream IDEs and
editors. Moreover, DSL statements can generate executable code for different
data analysis frameworks (currently we support Pandas and PySpark). Overall,
our approach attempts to accelerate programming of common data analysis tasks
and to facilitate the conversion of the implementations between frameworks.
In a preliminary assessment based on a popular data processing tutorial, our
tool was able to fully cover 9 out of 14 processing steps for Pandas and 10 out
of 16 for PySpark, while partially covering 4 processing steps for each of the
frameworks.

Comment: 7 pages
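The translation step can be sketched as follows. The DSL grammar shown is invented for illustration (the paper's actual DSL is not reproduced in this abstract): one human-readable statement is parsed and emitted as code for either supported backend.

```python
import re

# Per-backend code templates for one hypothetical DSL statement form.
TEMPLATES = {
    'pandas':  "{df} = {df}[{df}['{col}'] {op} {val}]",
    'pyspark': "{df} = {df}.filter({df}['{col}'] {op} {val})",
}

def generate(stmt, backend):
    """Translate e.g. 'keep rows of df where age > 30' into code."""
    m = re.fullmatch(r"keep rows of (\w+) where (\w+) ([<>=!]+) (\S+)", stmt)
    if not m:
        raise ValueError("unrecognized DSL statement")
    df, col, op, val = m.groups()
    return TEMPLATES[backend].format(df=df, col=col, op=op, val=val)
```

Because the DSL statement is the single source of truth, switching the target framework is a one-argument change, which is the conversion-between-frameworks benefit the abstract describes.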
Learning to Generate Corrective Patches using Neural Machine Translation
Bug fixing is generally a labor-intensive task. However, recent work has
proposed the idea of automated program repair, which aims to repair (at least
a subset of) bugs in different ways, such as code mutation. Following in the
same line of work as automated bug repair, in this paper we aim to leverage
past fixes to propose fixes of current/future bugs. Specifically, we propose
Ratchet, a corrective patch generation system using neural machine translation.
By learning corresponding pre-correction and post-correction code in past fixes
with a neural sequence-to-sequence model, Ratchet is able to generate fix
code for a given bug-prone code query. We perform an empirical study with five
open source projects, namely Ambari, Camel, Hadoop, Jetty and Wicket, to
evaluate the effectiveness of Ratchet. Our findings show that Ratchet can
generate syntactically valid statements 98.7% of the time, and achieve an
F1-measure between 0.29 and 0.83 with respect to the actual fixes adopted in
the
code base. In addition, we perform a qualitative validation using 20
participants to see whether the generated statements can be helpful in
correcting bugs. Our survey showed that Ratchet's output was considered to be
helpful in fixing the bugs on many occasions, even if the fix was not 100%
correct.

Comment: 20 pages
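The "syntactically valid 98.7% of the time" measurement relies on a parser-based check of each generated statement. The paper's subjects are Java projects; as a hedged, language-shifted sketch, the same kind of check looks like this using Python's own parser.

```python
import ast

def is_syntactically_valid(statement):
    """True if a generated statement parses, regardless of whether it
    is semantically a correct fix."""
    try:
        ast.parse(statement)
        return True
    except SyntaxError:
        return False
```

Validity is a necessary but weak filter: a statement can parse cleanly yet still differ from the fix developers actually adopted, which is what the separate F1-measure captures.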
Towards Formula Translation using Recursive Neural Networks
While it has become common to perform automated translations on natural
language, performing translations between different representations of
mathematical formulae has thus far not been possible. We implemented the first
translator for mathematical formulae based on recursive neural networks. We
chose recursive neural networks because mathematical formulae inherently
include a structural encoding. In our implementation, we developed new
techniques and topologies for recursive tree-to-tree neural networks based on
multi-variate multi-valued Long Short-Term Memory cells. We propose a novel
approach for mini-batch training that utilizes clustering and tree traversal.
We evaluate our translator and analyze the behavior of our proposed topologies
and techniques based on a translation from generic LaTeX to the semantic LaTeX
notation. We use the semantic LaTeX notation from the Digital Library for
Mathematical Formulae and the Digital Repository for Mathematical Formulae at
the National Institute for Standards and Technology. We find that a simple
heuristics-based clustering algorithm outperforms conventional clustering
algorithms on the task of clustering binary trees of mathematical formulae
with respect to their topology. Furthermore, we find that a mask for the loss
function can prevent the neural network from getting stuck in a local minimum.
Given our preliminary results, a complete formula-to-formula translation is
not yet possible. However, we achieved a prediction accuracy of 47.05% for
predicting symbols at the correct position and an accuracy of 92.3% when
ignoring the predicted position. In conclusion, our work advances the field of
recursive neural networks by improving training speed and quality. In the
future, we will work towards a complete translation allowing machine
interpretation of LaTeX formulae.

Comment: 11 pages, Work-in-Progress paper in CICM-WS 2018 Workshop Papers at
the 11th Conference on Intelligent Computer Mathematics (CICM 2018)
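One ingredient of topology-based clustering can be sketched as follows (the abstract does not specify the paper's exact heuristic; this is an assumed, simplified version): formula trees are keyed by a canonical shape signature, so trees with identical structure, and hence identical unrolled network topology, land in the same mini-batch cluster.

```python
def shape(tree):
    """Canonical topology signature of a binary tree.

    A tree is None (empty) or a tuple (label, left, right); labels are
    ignored so only the structure matters.
    """
    if tree is None:
        return '.'
    _, left, right = tree
    return '(' + shape(left) + shape(right) + ')'

def cluster_by_topology(trees):
    """Group trees whose shapes match, e.g. for mini-batch training."""
    clusters = {}
    for t in trees:
        clusters.setdefault(shape(t), []).append(t)
    return list(clusters.values())
```

For instance, `x + y` and `a * b` share the signature `((..)(..))` and cluster together, while a lone leaf does not, which is the property a tree-to-tree network needs for batched training.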