Efficient and Reasonable Object-Oriented Concurrency
Making threaded programs safe and easy to reason about is one of the chief difficulties in modern programming. This work provides an efficient execution model for SCOOP, a concurrency approach that provides not only data-race freedom but also pre/postcondition reasoning guarantees between threads. The extensions we propose modify the underlying semantics to increase the amount of concurrent execution that is possible, exclude certain classes of deadlocks, and enable greater performance. These extensions serve as the basis of an efficient runtime and an optimization pass that together improve performance 15x over a baseline implementation. This new implementation of SCOOP is also 2x faster than other well-known safe concurrent languages. The measurements are based on both coordination-intensive and data-manipulation-intensive benchmarks designed to offer a mixture of workloads.

Comment: Proceedings of the 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE '15). ACM, 201
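The discipline behind SCOOP's data-race freedom can be illustrated with a minimal sketch (this is not SCOOP itself, and all names here are invented for illustration): every operation on a shared object is queued and applied by a single handler thread, so no two threads ever touch the object concurrently.

```python
# Illustrative sketch (not SCOOP): serialize all operations on a shared
# object through one handler thread, the actor-style discipline that
# underlies data-race freedom in SCOOP-like models.
import queue
import threading

class Handler:
    """Owns an object; all calls are queued and applied by one thread."""
    def __init__(self, obj):
        self._obj = obj
        self._calls = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._calls.get()
            if item is None:                 # shutdown sentinel
                break
            fn, args, done = item
            done.put(fn(self._obj, *args))   # apply the call, publish result

    def call(self, fn, *args):
        """Asynchronous call; returns a queue holding the eventual result."""
        done = queue.Queue(maxsize=1)
        self._calls.put((fn, args, done))
        return done

    def stop(self):
        self._calls.put(None)
        self._thread.join()

counter = Handler({"n": 0})

def increment(state):
    state["n"] += 1
    return state["n"]

results = [counter.call(increment) for _ in range(100)]
final = [r.get() for r in results][-1]
counter.stop()
print(final)  # 100: no lost updates, because every call is serialized
```

Because the handler applies calls in FIFO order, callers can reason about the object's state between calls much like pre/postconditions, without locks at the call sites.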
Asynchronous Evolution of Deep Neural Network Architectures
Many evolutionary algorithms (EAs) take advantage of parallel evaluation of candidates. However, if evaluation times vary significantly, many worker nodes (i.e., compute clients) are idle much of the time, waiting for the next generation to be created. Evolutionary neural architecture search (ENAS), a class of EAs that optimizes the architecture and hyperparameters of deep neural networks, is particularly vulnerable to this issue. This paper proposes a generic asynchronous evaluation strategy (AES) that is then adapted to work with ENAS. AES increases throughput by maintaining a queue of up to K individuals ready to be sent to the workers for evaluation and proceeding to the next generation as soon as M individuals have been evaluated by the workers. A suitable value for M is determined experimentally, balancing diversity and efficiency. To showcase the generality and power of AES, it was first evaluated on 11-bit multiplexer design (a single-population verifiable discovery task) and then scaled up to ENAS for image captioning (a multi-population open-ended-optimization task). In both problems, a multifold performance improvement was observed, suggesting that AES is a promising method for parallelizing the evolution of complex systems with long and variable evaluation times, such as those in ENAS.
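The queue-based idea can be sketched in a few lines (this is a toy reconstruction, not the paper's code: the values of K and M, the toy fitness function, and the mutation operator are all assumptions made for illustration): keep up to K candidates in flight on a worker pool, and begin the next generation as soon as M of them have returned, leaving the rest in flight.

```python
# Toy sketch of an asynchronous evaluation strategy (not the paper's code).
# K candidates stay in flight; a generation advances once M results arrive.
import concurrent.futures
import random

K, M = 8, 4                      # queue size / results per generation (assumed)

def fitness(x):                  # toy objective standing in for an expensive,
    return -(x - 3) ** 2         # variable-duration network evaluation

def evolve(parents):             # toy mutation: perturb a random parent
    return random.choice(parents) + random.uniform(-1, 1)

random.seed(0)
parents = [random.uniform(-10, 10) for _ in range(K)]
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    pending = {pool.submit(fitness, c): c for c in parents}
    for gen in range(20):
        # Proceed as soon as M evaluations finish; others stay in flight.
        done = []
        for fut in concurrent.futures.as_completed(list(pending)):
            done.append(fut)
            if len(done) >= M:
                break
        scored = sorted(((f.result(), pending.pop(f)) for f in done),
                        reverse=True)
        parents = [c for _, c in scored]
        # Refill the queue back up to K candidates in flight.
        while len(pending) < K:
            c = evolve(parents)
            pending[pool.submit(fitness, c)] = c

best = max(parents, key=fitness)
```

The key property is that workers are never idle waiting for a full generation: slow evaluations simply remain queued while faster ones drive the search forward.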
Image recognition with Deep Learning techniques and TensorFlow
Deep neural networks have gained popularity in recent years, obtaining outstanding results in a wide range of applications, most notably in computer vision and natural language processing tasks. Despite this newly found interest, research in neural networks spans many decades, and some of today's most used network architectures were invented many years ago. Nevertheless, the progress made during this period cannot be understood without taking into account the technological advancements seen in key contiguous domains such as massive data storage and computing systems, more specifically in the Graphics Processing Unit (GPU) domain. These two components are responsible for the enormous performance gains in neural networks that have made Deep Learning a common term in the Artificial Intelligence and Machine Learning community.
These kinds of networks need massive amounts of data to effectively train the millions of parameters they contain, and this training can take days or weeks depending on the computer architecture used. The size of newly published datasets keeps growing, and the tendency to create deeper networks that outperform shallower architectures means that, in the medium and long term, the hardware needed to undertake this kind of training can only be found in high-performance computing facilities, where enormous clusters of computers are available. However, using these machines is not straightforward, as both the framework and the code need to be appropriately tuned to take effective advantage of these distributed environments.
For this reason, we test TensorFlow, an open-source framework for Deep Learning from Google with built-in distributed support, on the GPU cluster MinoTauro at the Barcelona Supercomputing Center (BSC). We aim to implement a defined workload using the distributed features the framework offers, to speed up the training process, acquire knowledge of the inner workings of the framework, and understand the similarities and differences with respect to classic single-node training.
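The core pattern behind frameworks with distributed support is synchronous data-parallel training: each worker computes gradients on its shard of the data, and the averaged gradient updates one shared model. A minimal single-process sketch in plain Python (the shard layout, learning rate, and toy linear model are illustrative assumptions, not the thesis's workload):

```python
# Toy sketch of synchronous data-parallel SGD: workers compute gradients on
# their shards; an "all-reduce" averages them into one shared update.
# Fit y = 2x with data sharded across 4 simulated workers.
data = [(x, 2.0 * x) for x in range(12)]
shards = [data[i::4] for i in range(4)]        # one shard per worker
w, lr = 0.0, 0.01

def shard_grad(w, shard):
    # dL/dw for L = mean over the shard of (w*x - y)^2
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

for step in range(200):
    grads = [shard_grad(w, s) for s in shards]  # the "parallel" worker step
    w -= lr * sum(grads) / len(grads)           # all-reduce: average + apply

print(round(w, 3))  # converges to 2.0
```

A real distributed run replaces the in-process loop with gradient exchange across nodes or GPUs, which is exactly where framework and code tuning for the cluster becomes necessary.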
Performance Improvements for FDDI and CSMA/CD Protocols
The High-Performance Computing Initiative from the White House Office of Science and Technology Policy has defined 20 major challenges in science and engineering which are dependent on the solutions to a number of high-performance computing problems. One of the major areas of focus of this initiative is the development of gigabit rate networks to be used in environments such as the space station or a National Research and Educational Network (NREN).
The strategy here is to use existing network designs as building blocks for achieving higher rates, with the ultimate goal being a gigabit rate network. Two strategies which contribute to achieving this goal are examined in detail.
FDDI is a token ring network based on fiber optics capable of a 100 Mbps rate. Both media access control (MAC) and physical layer modifications are considered. A method is presented which allows one to determine maximum utilization based on the token-holding timer settings. Simulation results show that employing the second counter-rotating ring in combination with destination removal has a multiplicative effect greater than the effect which either of the factors has individually on performance. Two 100 Mbps rings can handle loads in the range of 400 to 500 Mbps for traffic with a uniform distribution and fixed packet size. Performance is dependent on the number of nodes, improving as the number increases. A wide range of environments is examined to illustrate robustness, and a method of implementation is discussed.
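The connection between token-holding timer settings and maximum utilization can be illustrated with a back-of-the-envelope cycle argument (this is a generic token-ring approximation, not the thesis's exact method, and the example numbers are invented): in one full token cycle, each of n stations transmits for at most the token-holding time THT, while the token itself takes the ring latency D to circulate.

```python
# Generic cycle-argument bound (not the thesis's exact method): per token
# cycle, useful transmission time is n*THT and overhead is the ring
# latency D, so U_max = n*THT / (n*THT + D).
def max_utilization(n, tht, ring_latency):
    return n * tht / (n * tht + ring_latency)

# Illustrative numbers: 50 stations, 1 ms holding time, 1 ms ring latency.
print(round(max_utilization(n=50, tht=1e-3, ring_latency=1e-3), 3))  # 0.98
```

Note that this bound grows with n, consistent with the observation above that performance improves as the number of nodes increases.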