3,090 research outputs found
A Research-Oriented Course on Advanced Multicore Architecture
©2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Multicore processors have become ubiquitous in everyday devices such as smartphones and tablets. In fact, they are present in almost all segments of the computing market, from supercomputers to embedded devices. Intense market competition has led industry and academia to develop rapid technological and architectural advances.
The fast evolution that current multicores are still undergoing makes it difficult for instructors to offer computer architecture courses with up-to-date contents, preferably ones that show industry and academic research trends. To deal with this shortcoming, the authors consider a research-oriented course to be the most appropriate solution. This paper presents an advanced computer architecture course called Advanced Multicore Architectures, offered in 2015. The course covers the basic topics of multicore architecture and is organized in four main modules: multicore basics, performance evaluation, advanced caching, and main memory organization.
The course follows a research-oriented approach: lectures cover theoretical concepts and analyze recent research papers to give students a wide view of current trends. Moreover, additional teaching methods such as lab sessions with a state-of-the-art multicore simulator and research-oriented exercises are used to introduce students to research in these topics. To achieve this fully research-oriented methodology, about 40% of the time is devoted to labs and exercises. This work was supported by the Spanish Ministerio de Economía y Competitividad (MINECO) and by FEDER funds under Grant TIN2012-38341-C04-01, and by the Intel Early Career Faculty Honor Program Award. Sahuquillo Borrás, J.; Petit Martí, SV.; Selfa Oliver, V.; Gómez Requena, ME. (2015). A Research-Oriented Course on Advanced Multicore Architecture. IEEE Computer Society. https://doi.org/10.1109/IPDPSW.2015.46
A research-oriented course on Advanced Multicore Architecture: Contents and active learning methodologies
The fast evolution of multicore processors makes it difficult for professors to offer computer architecture courses with up-to-date contents. To deal with this shortcoming, which could discourage students, the most appropriate solution is a research-oriented course based on current microprocessor industry trends. Additionally, we seek to improve the students' skills by applying active learning methodologies, where teachers act as guides and resource providers while students take responsibility for their learning. In this paper, we present the Advanced Multicore Architecture (AMA) course, which follows a research-oriented approach to introduce students to architectural breakthroughs and uses active learning methodologies to enable students to develop practical research skills such as critical analysis of research papers and communication abilities. To this end, five main activities are used: (i) lectures dealing with key theoretical concepts, (ii) paper review & discussion, (iii) research-oriented practical exercises, (iv) lab sessions with a state-of-the-art multicore simulator, and (v) paper presentation. An important part of all these activities is driven by active learning methodologies. Special emphasis is put on the practical side by allocating 40% of the time to labs and exercises. This work also includes an assessment study that analyzes both the course contents and the methodology used (both of them compared to other courses). This work was supported in part by the Spanish Ministerio de Economia y Competitividad (MINECO) and by Plan E funds under Grant TIN2014-62246-EXP and Grant TIN2015-66972-C5-1-R, and by Generalitat Valenciana under grant AICO/2016/059. The authors would also like to thank Onur Mutlu for making his valuable teaching material available online. Petit Martí, SV.; Sahuquillo Borrás, J.; Gómez Requena, ME.; Selfa-Oliver, V. (2017). A research-oriented course on Advanced Multicore Architecture: Contents and active learning methodologies. Journal of Parallel and Distributed Computing. 105:63-72. https://doi.org/10.1016/j.jpdc.2017.01.011
XinuPi3: Teaching Multicore Concepts Using Embedded Xinu
As computer platforms become more advanced, the need to teach advanced computing concepts grows accordingly. This paper addresses one such need by presenting XinuPi3, a port of the lightweight instructional operating system Embedded Xinu to the Raspberry Pi 3. The Raspberry Pi 3 improves upon previous generations of inexpensive, credit card-sized computers by including a quad-core, ARM-based processor, opening the door for educators to demonstrate essential aspects of modern computing like inter-core communication and genuine concurrency.
Embedded Xinu has proven to be an effective teaching tool for demonstrating low-level concepts on single-core platforms, and it is currently used to teach a range of systems courses at multiple universities. As of this writing, no other bare-metal educational operating system supports multicore computing. XinuPi3 provides a suitable learning environment for beginners on genuinely concurrent hardware. This paper provides an overview of the key features of the XinuPi3 system, as well as the novel embedded system education experiences it makes possible.
RELEASE: A High-level Paradigm for Reliable Large-scale Server Software
Erlang is a functional language with a much-emulated model for building reliable distributed systems. This paper outlines the RELEASE project and describes the progress made in its first six months. The project aims to scale Erlang's radical concurrency-oriented programming paradigm to build reliable general-purpose software, such as server-based systems, on massively parallel machines. Currently Erlang has inherently scalable computation and reliability models, but in practice scalability is constrained by aspects of the language and virtual machine. We are working at three levels to address these challenges: evolving the Erlang virtual machine so that it can work effectively on large-scale multicore systems; evolving the language to Scalable Distributed (SD) Erlang; and developing a scalable Erlang infrastructure to integrate multiple, heterogeneous clusters. We are also developing state-of-the-art tools that allow programmers to understand the behaviour of massively parallel SD Erlang programs. We will demonstrate the effectiveness of the RELEASE approach using demonstrators and two large case studies on a Blue Gene
Developing Efficient Discrete Simulations on Multicore and GPU Architectures
In this paper we show how to efficiently implement parallel discrete simulations on multicore and GPU architectures through a real example of an application: a cellular automata model of laser dynamics. We describe the techniques employed to build and optimize the implementations using the OpenMP and CUDA frameworks. We have evaluated the performance on two different hardware platforms that represent different target market segments: high-end platforms for scientific computing, using an Intel Xeon Platinum 8259CL server with 48 cores and an NVIDIA Tesla V100 GPU, both running on Amazon Web Server (AWS) Cloud; and a consumer-oriented platform, using an Intel Core i9 9900k CPU and an NVIDIA GeForce GTX 1050 TI GPU. Performance results were compared and analyzed in detail. We show that excellent performance and scalability can be obtained on both platforms, and we identify some important issues that cause performance degradation on them. We also found that current multicore CPUs with large core counts can deliver performance very close to that of GPUs, and even identical in some cases. Ministerio de Economía, Industria y Competitividad, Gobierno de España (MINECO), and the Agencia Estatal de Investigación (AEI) of Spain, cofinanced by FEDER funds (EU) TIN2017-89842
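The abstract above does not give the update rule or the code, so the following is only a minimal sketch of why a cellular automaton parallelizes well. It assumes a synchronous update with two buffers and uses Conway's Game of Life as a placeholder rule (not the laser dynamics model). The names `count_neighbors` and `step` are hypothetical.

```python
# Minimal sketch of one synchronous cellular automaton step.
# ASSUMPTION: the rule is Conway's Game of Life, used only as a placeholder;
# it is NOT the laser dynamics model from the paper.

def count_neighbors(grid, r, c):
    """Count live cells among the 8 neighbors, with periodic boundaries."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            total += grid[(r + dr) % rows][(c + dc) % cols]
    return total

def step(grid):
    """Compute the next generation into a new buffer.

    Every cell reads only the old grid and writes only its own slot in the
    new grid, so the iterations are independent. In OpenMP the outer loop is
    the natural place for a parallel-for; in CUDA each (r, c) pair would
    typically map to one thread.
    """
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):          # independent iterations
        for c in range(cols):
            n = count_neighbors(grid, r, c)
            alive = grid[r][c] == 1
            new_grid[r][c] = 1 if (n == 3 or (alive and n == 2)) else 0
    return new_grid

# A "blinker" oscillates between a vertical and a horizontal bar.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
after_one = step(blinker)
after_two = step(after_one)
```

Writing into a second buffer rather than updating in place is what removes the data dependence between cells. The cost is one extra grid of memory.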
Mainstream parallel array programming on cell
We present the E♯ compiler and runtime library for the ‘F’ subset of
the Fortran 95 programming language. ‘F’ provides first-class support for arrays,
allowing E♯ to implicitly evaluate array expressions in parallel using the SPU coprocessors
of the Cell Broadband Engine. We present performance results from
four benchmarks that all demonstrate absolute speedups over equivalent ‘C’ or
Fortran versions running on the PPU host processor. A significant benefit of this
straightforward approach is that a serial implementation of any code is always
available, providing code longevity, and a familiar development paradigm.
Improving the scalability of parallel N-body applications with an event driven constraint based execution model
The scalability and efficiency of graph applications are significantly
constrained by conventional systems and their supporting programming models.
Technology trends like multicore, manycore, and heterogeneous system
architectures are introducing further challenges and possibilities for emerging
application domains such as graph applications. This paper explores the space
of effective parallel execution of ephemeral graphs that are dynamically
generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The
workloads are expressed using the semantics of an Exascale computing execution
model called ParalleX. For comparison, results using conventional execution
model semantics are also presented. We find improved load balancing during
runtime and automatic parallelism discovery improving efficiency using the
advanced semantics for Exascale computing. Comment: 11 figures
Life of occam-Pi
This paper considers some questions prompted by a brief review of the history of computing. Why is programming so hard? Why is concurrency considered an “advanced” subject? What’s the matter with Objects? Where did all the Maths go? In searching for answers, the paper looks at some concerns over fundamental ideas within object orientation (as represented by modern programming languages), before focussing on the concurrency model of communicating processes and its particular expression in the occam family of languages. In that focus, it looks at the history of occam, its underlying philosophy (Ockham’s Razor), its semantic foundation on Hoare’s CSP, its principles of process oriented design and its development over almost three decades into occam-π (which blends in the concurrency dynamics of Milner’s π-calculus). Also presented will be an urgent need for rationalisation – occam-π is an experiment that has demonstrated significant results, but now needs time to be spent on careful review and implementing the conclusions of that review. Finally, the future is considered. In particular, is there a future
TensorFlow Enabled Genetic Programming
Genetic Programming, a kind of evolutionary computation and machine learning
algorithm, is shown to benefit significantly from the application of vectorized
data and the TensorFlow numerical computation library on both CPU and GPU
architectures. The open source, Python Karoo GP is employed for a series of 190
tests across 6 platforms, with real-world datasets ranging from 18 to 5.5M data
points. This body of tests demonstrates that datasets measured in tens and
hundreds of data points see 2-15x improvement when moving from the scalar/SymPy
configuration to the vector/TensorFlow configuration, with a single core
performing on par or better than multiple CPU cores and GPUs. A dataset
composed of 90,000 data points demonstrates a single vector/TensorFlow CPU core
performing 875x better than 40 scalar/SymPy CPU cores. And a dataset containing
5.5M data points sees GPU configurations out-performing CPU configurations on
average by 1.3x. Comment: 8 pages, 5 figures; presented at GECCO 2017, Berlin, Germany
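The scalar versus vector distinction in the abstract above can be illustrated with a minimal sketch. It does not use Karoo GP, SymPy, or TensorFlow, and it does not reproduce their APIs. Plain Python lists stand in for tensors, and the tuple-based tree format and the names `eval_scalar` and `eval_columns` are assumptions made for illustration only.

```python
import operator

# ASSUMPTION: a candidate expression is a nested tuple (op, left, right),
# with strings naming input features. This is an illustrative format only.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_scalar(tree, row):
    """Scalar style: walk the whole tree once per data point."""
    if isinstance(tree, str):
        return row[tree]
    op, left, right = tree
    return OPS[op](eval_scalar(left, row), eval_scalar(right, row))

def eval_columns(tree, columns):
    """Vector style: walk the tree once; each node processes a whole column.

    In a real vectorized backend the list comprehension below would be a
    single array operation executed in optimized native code.
    """
    if isinstance(tree, str):
        return columns[tree]
    op, left, right = tree
    lhs = eval_columns(left, columns)
    rhs = eval_columns(right, columns)
    return [OPS[op](x, y) for x, y in zip(lhs, rhs)]

# The expression (a + b) * c evaluated over three data points.
tree = ("*", ("+", "a", "b"), "c")
columns = {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "c": [2.0, 2.0, 2.0]}
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]

scalar_result = [eval_scalar(tree, row) for row in rows]
vector_result = eval_columns(tree, columns)
```

Both styles produce the same values. The difference is that the tree traversal overhead is paid once per data point in the scalar style and once per expression in the column style, which is why the gap can grow with dataset size.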