Efficient and Reasonable Object-Oriented Concurrency
Making threaded programs safe and easy to reason about is one of the chief difficulties in modern programming. This work provides an efficient execution model for SCOOP, a concurrency approach that provides not only data-race freedom but also pre/postcondition reasoning guarantees between threads. The extensions we propose modify the underlying semantics to increase the amount of concurrent execution that is possible, exclude certain classes of deadlocks, and enable greater performance. These extensions serve as the basis of an efficient runtime and an optimization pass that together improve performance 15x over a baseline implementation. This new implementation of SCOOP is also 2x faster than other well-known safe concurrent languages. The measurements are based on both coordination-intensive and data-manipulation-intensive benchmarks designed to offer a mixture of workloads.

Comment: Proceedings of the 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE '15). ACM, 201
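The discipline behind SCOOP's data-race freedom can be illustrated with a minimal sketch (this is not SCOOP itself, and all names here are invented for illustration): every operation on a shared object is queued and applied by a single handler thread, so no two threads ever touch the object concurrently.

```python
# Illustrative sketch (not SCOOP): serialize all operations on a shared
# object through one handler thread, the actor-style discipline that
# underlies data-race freedom in SCOOP-like models.
import queue
import threading

class Handler:
    """Owns an object; all calls are queued and applied by one thread."""
    def __init__(self, obj):
        self._obj = obj
        self._calls = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._calls.get()
            if item is None:                 # shutdown sentinel
                break
            fn, args, done = item
            done.put(fn(self._obj, *args))   # apply the call, publish result

    def call(self, fn, *args):
        """Asynchronous call; returns a queue holding the eventual result."""
        done = queue.Queue(maxsize=1)
        self._calls.put((fn, args, done))
        return done

    def stop(self):
        self._calls.put(None)
        self._thread.join()

counter = Handler({"n": 0})

def increment(state):
    state["n"] += 1
    return state["n"]

results = [counter.call(increment) for _ in range(100)]
final = [r.get() for r in results][-1]
counter.stop()
print(final)  # 100: no lost updates, because every call is serialized
```

Because the handler applies calls in FIFO order, callers can reason about the object's state between calls much like pre/postconditions, without locks at the call sites.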
Asynchronous Evolution of Deep Neural Network Architectures
Many evolutionary algorithms (EAs) take advantage of parallel evaluation of candidates. However, if evaluation times vary significantly, many worker nodes (i.e., compute clients) are idle much of the time, waiting for the next generation to be created. Evolutionary neural architecture search (ENAS), a class of EAs that optimizes the architecture and hyperparameters of deep neural networks, is particularly vulnerable to this issue. This paper proposes a generic asynchronous evaluation strategy (AES) that is then adapted to work with ENAS. AES increases throughput by maintaining a queue of up to K individuals ready to be sent to the workers for evaluation and proceeding to the next generation as soon as M individuals have been evaluated by the workers. A suitable value for M is determined experimentally, balancing diversity and efficiency. To showcase the generality and power of AES, it was first evaluated on 11-bit multiplexer design (a single-population verifiable discovery task) and then scaled up to ENAS for image captioning (a multi-population open-ended-optimization task). In both problems, a multifold performance improvement was observed, suggesting that AES is a promising method for parallelizing the evolution of complex systems with long and variable evaluation times, such as those in ENAS.
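The queue-based idea can be sketched in a few lines (this is a toy reconstruction, not the paper's code: the values of K and M, the toy fitness function, and the mutation operator are all assumptions made for illustration): keep up to K candidates in flight on a worker pool, and begin the next generation as soon as M of them have returned, leaving the rest in flight.

```python
# Toy sketch of an asynchronous evaluation strategy (not the paper's code).
# K candidates stay in flight; a generation advances once M results arrive.
import concurrent.futures
import random

K, M = 8, 4                      # queue size / results per generation (assumed)

def fitness(x):                  # toy objective standing in for an expensive,
    return -(x - 3) ** 2         # variable-duration network evaluation

def evolve(parents):             # toy mutation: perturb a random parent
    return random.choice(parents) + random.uniform(-1, 1)

random.seed(0)
parents = [random.uniform(-10, 10) for _ in range(K)]
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    pending = {pool.submit(fitness, c): c for c in parents}
    for gen in range(20):
        # Proceed as soon as M evaluations finish; others stay in flight.
        done = []
        for fut in concurrent.futures.as_completed(list(pending)):
            done.append(fut)
            if len(done) >= M:
                break
        scored = sorted(((f.result(), pending.pop(f)) for f in done),
                        reverse=True)
        parents = [c for _, c in scored]
        # Refill the queue back up to K candidates in flight.
        while len(pending) < K:
            c = evolve(parents)
            pending[pool.submit(fitness, c)] = c

best = max(parents, key=fitness)
```

The key property is that workers are never idle waiting for a full generation: slow evaluations simply remain queued while faster ones drive the search forward.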
Image recognition with Deep Learning techniques and TensorFlow
Deep neural networks have gained popularity in recent years, obtaining outstanding results in a wide range of applications, most notably in computer vision and natural language processing tasks. Despite this newly found interest, research in neural networks spans many decades, and some of today's most used network architectures were invented many years ago. Nevertheless, the progress made during this period cannot be understood without taking into account the technological advancements seen in key contiguous domains such as massive data storage and computing systems, more specifically in the Graphics Processing Unit (GPU) domain. These two components are responsible for the enormous performance gains in neural networks that have made Deep Learning a common term in the Artificial Intelligence and Machine Learning community.
These kinds of networks need massive amounts of data to effectively train the millions of parameters they contain, and this training can take days or weeks depending on the computer architecture used. The size of newly published datasets keeps growing, and the tendency to create deeper networks that outperform shallower architectures means that, in the medium and long term, the hardware needed to undertake this kind of training can only be found in high-performance computing facilities, where enormous clusters of computers are available. However, using these machines is not straightforward, as both the framework and the code need to be appropriately tuned to take effective advantage of these distributed environments.
For this reason, we test TensorFlow, an open-source framework for Deep Learning from Google with built-in distributed support, on the GPU cluster MinoTauro at the Barcelona Supercomputing Center (BSC). We aim to implement a defined workload using the distributed features the framework offers, to speed up the training process, acquire knowledge of the inner workings of the framework, and understand the similarities and differences with respect to classic single-node training.
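The core pattern behind frameworks with distributed support is synchronous data-parallel training: each worker computes gradients on its shard of the data, and the averaged gradient updates one shared model. A minimal single-process sketch in plain Python (the shard layout, learning rate, and toy linear model are illustrative assumptions, not the thesis's workload):

```python
# Toy sketch of synchronous data-parallel SGD: workers compute gradients on
# their shards; an "all-reduce" averages them into one shared update.
# Fit y = 2x with data sharded across 4 simulated workers.
data = [(x, 2.0 * x) for x in range(12)]
shards = [data[i::4] for i in range(4)]        # one shard per worker
w, lr = 0.0, 0.01

def shard_grad(w, shard):
    # dL/dw for L = mean over the shard of (w*x - y)^2
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

for step in range(200):
    grads = [shard_grad(w, s) for s in shards]  # the "parallel" worker step
    w -= lr * sum(grads) / len(grads)           # all-reduce: average + apply

print(round(w, 3))  # converges to 2.0
```

A real distributed run replaces the in-process loop with gradient exchange across nodes or GPUs, which is exactly where framework and code tuning for the cluster becomes necessary.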
Performance Improvements for FDDI and CSMA/CD Protocols
The High-Performance Computing Initiative from the White House Office of Science and Technology Policy has defined 20 major challenges in science and engineering which are dependent on the solutions to a number of high-performance computing problems. One of the major areas of focus of this initiative is the development of gigabit rate networks to be used in environments such as the space station or a National Research and Educational Network (NREN).
The strategy here is to use existing network designs as building blocks for achieving higher rates, with the ultimate goal being a gigabit rate network. Two strategies which contribute to achieving this goal are examined in detail.
FDDI is a token ring network based on fiber optics capable of a 100 Mbps rate. Both media access control (MAC) and physical layer modifications are considered. A method is presented which allows one to determine maximum utilization based on the token-holding timer settings. Simulation results show that employing the second counter-rotating ring in combination with destination removal has a multiplicative effect greater than the effect which either of the factors has individually on performance. Two 100 Mbps rings can handle loads in the range of 400 to 500 Mbps for traffic with a uniform distribution and fixed packet size. Performance is dependent on the number of nodes, improving as the number increases. A wide range of environments is examined to illustrate robustness, and a method of implementation is discussed.
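The connection between token-holding timer settings and maximum utilization can be illustrated with a back-of-the-envelope cycle argument (this is a generic token-ring approximation, not the thesis's exact method, and the example numbers are invented): in one full token cycle, each of n stations transmits for at most the token-holding time THT, while the token itself takes the ring latency D to circulate.

```python
# Generic cycle-argument bound (not the thesis's exact method): per token
# cycle, useful transmission time is n*THT and overhead is the ring
# latency D, so U_max = n*THT / (n*THT + D).
def max_utilization(n, tht, ring_latency):
    return n * tht / (n * tht + ring_latency)

# Illustrative numbers: 50 stations, 1 ms holding time, 1 ms ring latency.
print(round(max_utilization(n=50, tht=1e-3, ring_latency=1e-3), 3))  # 0.98
```

Note that this bound grows with n, consistent with the observation above that performance improves as the number of nodes increases.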