31,800 research outputs found
Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks
Predicting the number of clock cycles a processor takes to execute a block of
assembly instructions in steady state (the throughput) is important for both
compiler designers and performance engineers. Building an analytical model to
do so is especially complicated in modern x86-64 Complex Instruction Set
Computer (CISC) machines with sophisticated processor microarchitectures in
that it is tedious, error prone, and must be performed from scratch for each
processor generation. In this paper we present Ithemal, the first tool which
learns to predict the throughput of a set of instructions. Ithemal uses a
hierarchical LSTM--based approach to predict throughput based on the opcodes
and operands of instructions in a basic block. We show that Ithemal is more
accurate than state-of-the-art hand-written tools currently used in compiler
backends and static machine code analyzers. In particular, our model has less
than half the error of state-of-the-art analytical models (LLVM's llvm-mca and
Intel's IACA). Ithemal is also able to predict these throughput values just as
fast as the aforementioned tools, and is easily ported across a variety of
processor microarchitectures with minimal developer effort.Comment: Published at 36th International Conference on Machine Learning (ICML)
201
User-Friendly Parallel Computations with Econometric Examples
This paper shows how a high level matrix programming language may be used to perform Monte Carlo simulation, bootstrapping, estimation by maximum likelihood and GMM, and kernel regression in parallel on symmetric multiprocessor computers or clusters of workstations. The implementation of parallelization is done in a way such that an investigator may use the programs without any knowledge of parallel programming. A bootable CD that allows rapid creation of a cluster for parallel computing is introduced. Examples show that parallelization can lead to important reductions in computational time. Detailed discussion of how the Monte Carlo problem was parallelized is included as an example for learning to write parallel programs for Octave.parallel computing, Monte Carlo, bootstrapping,maximum likelihood, GMM, kernel regression
API design for machine learning software: experiences from the scikit-learn project
Scikit-learn is an increasingly popular machine learning li- brary. Written
in Python, it is designed to be simple and efficient, accessible to
non-experts, and reusable in various contexts. In this paper, we present and
discuss our design choices for the application programming interface (API) of
the project. In particular, we describe the simple and elegant interface shared
by all learning and processing units in the library and then discuss its
advantages in terms of composition and reusability. The paper also comments on
implementation details specific to the Python ecosystem and analyzes obstacles
faced by users and developers of the library
Likelihood Inference In Parallel Systems Regression Models With Censored Data
The work in this thesis is concerned with the investigation of the finite
sample performance of asymptotic inference procedures based on the likelihood
function when applied to the regression model based on parallel systems with
censored data. The study includes investigating the adequacy of these inferential
procedures as well as investigating the relative performances of asymptotically
equivalent likelihood-based statistics in small samples.
The maximum likelihood estimator of the parameters of this model is not
available in closed form. Thus, its actual sampling distribution is intractable. A
simulation study is conducted to investigate the bias, the finite sample variance, the
asymptotic variance obtained from the inverse of the observed Fisher information
matrix, the adequacy of this approximate asymptotic variance, and the mean square
Efficient transfer entropy analysis of non-stationary neural time series
Information theory allows us to investigate information processing in neural
systems in terms of information transfer, storage and modification. Especially
the measure of information transfer, transfer entropy, has seen a dramatic
surge of interest in neuroscience. Estimating transfer entropy from two
processes requires the observation of multiple realizations of these processes
to estimate associated probability density functions. To obtain these
observations, available estimators assume stationarity of processes to allow
pooling of observations over time. This assumption however, is a major obstacle
to the application of these estimators in neuroscience as observed processes
are often non-stationary. As a solution, Gomez-Herrero and colleagues
theoretically showed that the stationarity assumption may be avoided by
estimating transfer entropy from an ensemble of realizations. Such an ensemble
is often readily available in neuroscience experiments in the form of
experimental trials. Thus, in this work we combine the ensemble method with a
recently proposed transfer entropy estimator to make transfer entropy
estimation applicable to non-stationary time series. We present an efficient
implementation of the approach that deals with the increased computational
demand of the ensemble method's practical application. In particular, we use a
massively parallel implementation for a graphics processing unit to handle the
computationally most heavy aspects of the ensemble method. We test the
performance and robustness of our implementation on data from simulated
stochastic processes and demonstrate the method's applicability to
magnetoencephalographic data. While we mainly evaluate the proposed method for
neuroscientific data, we expect it to be applicable in a variety of fields that
are concerned with the analysis of information transfer in complex biological,
social, and artificial systems.Comment: 27 pages, 7 figures, submitted to PLOS ON
Transputer control of a flexible robot link
The applicability of transputers in control systems is investigated. This is done by implementing a controller for a flexible robot arm with one degree of freedom on a system consisting of an IBM-AT and four transputers. It is found that a control system with transputers offers a great improvement compared with conventional digital control systems. Transputers can solve the common problem in control practice, i.e. having very sophisticted controllers but not being able to implement them because they need too much computing time. However, transputers are not an optimal solution for more sophisticated control systems because of shortcomings in the scheduling mechanism
Feedback Controlled Software Systems
Software systems generally suffer from a certain fragility in the face of disturbances such as bugs, unforeseen user input, unmodeled interactions with other software components, and so on. A single such disturbance can make the machine on which the software is executing hang or crash. We postulate that what is required to address this fragility is a general means of using feedback to stabilize these systems. In this paper we develop a preliminary dynamical systems model of an arbitrary iterative software process along with the conceptual framework for stabilizing it in the presence of disturbances. To keep the computational requirements of the controllers low, randomization and approximation are used. We describe our initial attempts to apply the model to a faulty list sorter, using feedback to improve its performance. Methods by which software robustness can be enhanced by distributing a task between nodes each of which are capable of selecting the best input to process are also examined, and the particular case of a sorting system consisting of a network of partial sorters, some of which may be buggy or even malicious, is examined
- …