Introduction to Programming Using Mobile Phones and MIT App Inventor
At the beginning of each year, we ask our new
undergraduate students in Computer Engineering if they have
ever developed a computer program. Surprisingly, the most
frequent answer is no. The few students who have attended
a Computer Science training module usually have some basic
programming notions; however, most of our students coming
straight from high school have never programmed. This lack
of basic programming skills represents a major drawback when
taking programming-related courses. This is especially true for
the course on Computer Organization, taught during the first
semester of the first year, as one of its main objectives is to
explain the processor architecture, and therefore a great part of
it revolves around programming in assembly language.
To tackle this lack of basic programming skills, a workshop
on mobile application programming using MIT App Inventor
is offered to freshmen. This workshop has been very well received
by the students, and we believe that it has contributed to improving
their performance in programming-related courses and, in particular,
in the Computer Organization course.
Can I program my phone? But I've only just arrived
Surprisingly, every time we ask newly enrolled students in the Degree in Computer Engineering whether they have ever programmed, most of them answer no. The few who have completed a vocational training program in computer science usually have some notion of programming, but most of those coming from high school have none. This lack of basic programming skills is a disadvantage in programming-related courses. In our degree, this disadvantage is especially evident in the Computer Structure course, taught in the first semester of the first year. Although it is not a conventional programming course, it requires students to acquire skills related to computer architecture and, therefore, to programming in assembly language. To make up for this lack of background, a workshop on mobile programming using MIT App Inventor has been taught. This workshop has been widely accepted and very well rated by the students, and we believe that it has contributed to improving their results in the Computer Structure course.
Interactive animations for teaching and learning cache coherence protocols
Among the learning objectives of advanced computer architecture courses, there is usually one stating that students should be able to describe and analyze how cache coherence protocols work. Although these protocols are relatively simple, many different situations must be analyzed to understand how they address all the details of the problem they solve, which makes them hard to explain and to understand. A tool that graphically illustrates how these protocols operate should greatly facilitate teaching and learning them.
With the aim of improving the teaching of this subject, we have developed three interactive animations that show how three of the most frequently used cache coherence protocols work. For each protocol, a sequence of read and write operations illustrates all the situations that can arise. The tool is interactive in that students can step forward and backward to better understand and study the actions that take place at each step.
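The abstract does not name the three protocols the animations cover. As a minimal sketch, assuming the textbook MSI snooping protocol (Modified, Shared, Invalid) for a single cache line shared by several private caches, the toy model below reproduces the kind of per-step state changes such an animation steps through. The class and method names are hypothetical, and the model ignores bus arbitration, timing and memory contents.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    SHARED = "S"
    INVALID = "I"


class MSISystem:
    """Toy MSI model (assumption): one cache line, n private caches on a snooping bus."""

    def __init__(self, n_caches: int) -> None:
        self.states = [State.INVALID] * n_caches
        self.log: list[str] = []  # human-readable trace of each transition

    def read(self, cpu: int) -> None:
        if self.states[cpu] is State.INVALID:
            # Read miss: BusRd. A cache holding the line in M flushes it and drops to S.
            for other, st in enumerate(self.states):
                if other != cpu and st is State.MODIFIED:
                    self.states[other] = State.SHARED
                    self.log.append(f"cache {other}: flush, M->S")
            self.states[cpu] = State.SHARED
            self.log.append(f"cache {cpu}: BusRd, I->S")
        # Read hit in S or M: no bus transaction and no state change.

    def write(self, cpu: int) -> None:
        if self.states[cpu] is State.MODIFIED:
            return  # Write hit in M: no bus transaction.
        old = self.states[cpu]
        # Write miss (BusRdX) or write to a Shared copy (BusUpgr): invalidate all other copies.
        for other, st in enumerate(self.states):
            if other != cpu and st is not State.INVALID:
                if st is State.MODIFIED:
                    self.log.append(f"cache {other}: flush")
                self.states[other] = State.INVALID
                self.log.append(f"cache {other}: {st.value}->I")
        bus_op = "BusUpgr" if old is State.SHARED else "BusRdX"
        self.states[cpu] = State.MODIFIED
        self.log.append(f"cache {cpu}: {bus_op}, {old.value}->M")

    def snapshot(self) -> str:
        """States of all caches as a string, e.g. 'SIS'."""
        return "".join(s.value for s in self.states)
```

For example, with three caches, `read(0)`, `read(1)`, `write(2)` and `read(0)` move the line through the snapshots `SII`, `SSI`, `IIM` and `SIS`, which is the sequence of situations (read miss, shared read, invalidating write, downgrade on a remote read) that a step-by-step animation would let students replay forward and backward.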
Using ARMSim and QtARMSim for teaching Computer Architecture
Many of the learning objectives of introductory Computer Architecture courses focus on the aspects that make up the view an assembly language programmer has of a computer. As a rule, these objectives are defined in terms of a specific computer architecture, usually selected according to two criteria: it should be as simple as possible and, at the same time, motivate students.
The ARM architecture is an ideal candidate as the vehicle for teaching Computer Architecture. On the one hand, being based on the RISC (Reduced Instruction Set Computer) approach, it is relatively simple. On the other hand, it is a current and widely used architecture (especially in mobile devices such as smartphones and tablets), which motivates students.
To carry out ARM lab sessions, it is convenient to have either a simulator or a development tool running on an ARM machine. Since this subject is taught in the first years of the degree, the selected application should be easy to use and flexible enough. It should also be free software, so that it can be adapted if necessary, as well as cross-platform and free of charge, so that students who wish to can install it on their own computers.
After assessing several options, we decided to develop and release our own ARM simulator, ARMSim, and a graphical interface for it, QtARMSim.
The simulation engine, ARMSim, and its interface, QtARMSim, were used during the 2014/2015 academic year. The feedback received from both students and lab instructors has been very positive.
Using Arduino Due for teaching input/output
Input/output (I/O) and its management are usually part of introductory computer architecture courses. The very nature of the topic and its diversity mean that lab sessions usually take place either on simple, specific devices or on simulators, which keeps them far from real devices and makes them less appealing.
However, it is possible to use current, simple devices such as Arduino boards to give students a more realistic and attractive view of I/O, while keeping the environments and systems easy to use, which we consider a priority in the first years of a degree.
In our case, since we currently base our computer architecture teaching on the ARM architecture, we have opted for the Arduino Due, whose ATSAM3X8E microcontroller is based on an ARM Cortex-M3 core.
To carry out the I/O lab sessions, we have slightly modified the Arduino IDE so that it accepts assembly source code, and we have designed and built a small board with an RGB LED and a pushbutton, which has allowed us to propose simple but eye-catching exercises. The built-in peripherals of the Arduino Due's microcontroller have proved sufficient to cover other aspects of I/O and to offer more complex examples that motivate students.
The first experience with this environment has been satisfactory both for the teaching staff of the courses in which it has been used and for the students, in whom it has also fostered interest in continuing to work with Arduino boards in their own projects.
Using machine learning to model the training scalability of convolutional neural networks on clusters of GPUs
In this work, we build a general piece-wise model to analyze the data-parallel (DP) training costs of convolutional neural networks (CNNs) on clusters of GPUs. This general model is based on i) multi-layer perceptrons (MLPs) that model the NVIDIA cuDNN/cuBLAS library kernels involved in training some state-of-the-art CNNs; and ii) an analytical model of the NVIDIA NCCL Allreduce collective primitive using the Ring algorithm. The CNN training scalability study performed with this model, in combination with the Roofline technique, for varying batch sizes, node (floating-point) arithmetic performance, node memory bandwidth, network link bandwidth, and cluster size unveils some crucial bottlenecks at both the GPU and the cluster level. To support this analysis, we validate the accuracy of the proposed model against a Python library for distributed deep learning training.
Funding for open access charge: CRUE-Universitat Jaume I.
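The abstract does not give the equations of its analytical Allreduce model. As a hedged sketch of the two textbook ingredients it names, the functions below implement the standard alpha-beta cost of a ring Allreduce (a reduce-scatter followed by an allgather, each taking p - 1 steps in which every rank sends a chunk of n/p bytes) and the classic Roofline bound. The function and parameter names are hypothetical, and real NCCL behavior (pipelining, protocol selection, topology) is not captured by this formula.

```python
def ring_allreduce_time(n_bytes: float, p: int, latency_s: float, bandwidth_Bps: float) -> float:
    """Alpha-beta estimate (assumption) of a ring Allreduce of n_bytes over p ranks.

    T = 2 (p - 1) * (alpha + (n / p) / beta)
    """
    if p <= 1:
        return 0.0                      # nothing to exchange with a single rank
    steps = 2 * (p - 1)                 # reduce-scatter steps + allgather steps
    chunk = n_bytes / p                 # bytes each rank sends per step
    return steps * (latency_s + chunk / bandwidth_Bps)


def roofline_flops(arith_intensity: float, peak_flops: float, mem_bandwidth_Bps: float) -> float:
    """Roofline bound: attainable FLOP/s = min(peak compute, intensity [FLOP/byte] * bandwidth)."""
    return min(peak_flops, arith_intensity * mem_bandwidth_Bps)
```

For large gradient buffers the bandwidth term 2(p - 1)/p · n/β tends to 2n/β, so the data each rank moves is almost independent of the number of GPUs, whereas the latency term grows linearly with p; the Roofline function separates kernels limited by memory bandwidth (low arithmetic intensity) from those limited by peak arithmetic performance.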
Interactive animations for teaching and learning cache coherence protocols
This work was partially funded by the project «Activitats formatives per a assignatures de la matèria Arquitectura de Computadores» of the Unitat de Suport Educatiu of the Universitat Jaume I (10G136-16).
PyDTNN: A user-friendly and extensible framework for distributed deep learning
We introduce a framework for training deep neural networks on clusters of computers with the following appealing properties: (1) it is developed in Python, exposing a friendly interface that provides an accessible entry point for newcomers; (2) it is extensible, offering a customizable tool for more advanced users in deep learning; (3) it covers the main functionality appearing in convolutional neural networks; and (4) it delivers reasonable inter-node parallel performance by exploiting data parallelism, leveraging MPI via MPI4Py for communication and NumPy for the efficient execution of (multithreaded) numerical kernels.
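PyDTNN's own API is not described here. As a hedged illustration of the data-parallel scheme the abstract mentions, the sketch below simulates in a single process, using NumPy only, what each MPI rank would do: compute a gradient on its own shard of the batch and then average it with the other ranks. A linear least-squares model stands in for a CNN, and the function names are hypothetical.

```python
import numpy as np


def mse_grad(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Gradient w.r.t. w of the loss 0.5 * mean((X @ w - y) ** 2)."""
    return X.T @ (X @ w - y) / X.shape[0]


def data_parallel_grad(w: np.ndarray, X: np.ndarray, y: np.ndarray, n_ranks: int) -> np.ndarray:
    """Simulate data parallelism: each 'rank' gets an equal shard of the batch,
    computes a local gradient, and the local gradients are summed and divided
    by n_ranks (the job of an Allreduce with SUM followed by a division)."""
    if X.shape[0] % n_ranks:
        raise ValueError("batch size must be divisible by n_ranks for exact averaging")
    X_shards = np.array_split(X, n_ranks)
    y_shards = np.array_split(y, n_ranks)
    local_grads = [mse_grad(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    return np.sum(local_grads, axis=0) / n_ranks
```

With equal shard sizes, the averaged local gradients equal the full-batch gradient, so every rank applies the same weight update and the model replicas stay synchronized. In a real MPI4Py program the averaging step would typically be `comm.Allreduce(MPI.IN_PLACE, grad, op=MPI.SUM)` followed by `grad /= comm.Get_size()`.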
Performance–energy trade-offs of deep learning convolution algorithms on ARM processors
In this work, we assess the performance and energy efficiency of high-performance codes for the convolution operator, based on the direct, explicit/implicit lowering and Winograd algorithms used for deep learning (DL) inference on a series of ARM-based processor architectures. Specifically, we evaluate the NVIDIA Denver2 and Carmel processors, as well as the ARM Cortex-A57 and Cortex-A78AE CPUs as part of a recent set of NVIDIA Jetson platforms. The performance–energy evaluation is carried out using the ResNet-50 v1.5 convolutional neural network (CNN) on varying configurations of convolution algorithms, number of threads/cores, and operating frequencies on the tested processor cores. The results demonstrate that the best throughput is obtained on all platforms with the Winograd convolution operator running on all the cores at their highest frequency. However, if the goal is to reduce the energy footprint, there is no rule of thumb for the optimal configuration.
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research was funded by Project PID2020-113656RB-C21/C22 supported by MCIN/AEI/10.13039/501100011033. Manuel F. Dolz was also supported by the Plan Gen–T grant CDEIGENT/2018/014 of the Generalitat Valenciana. Héctor Martínez is a POSTDOC_21_00025 fellow supported by Junta de Andalucía. Adrián Castelló is a FJC2019-039222-I fellow supported by MCIN/AEI/10.13039/501100011033. Antonio Maciá is a PRE2021-099284 fellow supported by MCIN/AEI/10.13039/501100011033.
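The abstract compares three families of convolution algorithms. As a hedged and deliberately simplified illustration (1D, single channel, plain NumPy, none of the blocking or vectorization that the evaluated high-performance codes rely on), the sketch below computes the same "valid" correlation used in DL three ways: directly; via explicit lowering into a matrix of input windows (the 1D analogue of im2col) followed by a matrix-vector product; and with the Winograd F(2,3) minimal-filtering algorithm, which produces two outputs of a 3-tap filter with 4 instead of 6 multiplications once the filter transform is precomputed. The function names are hypothetical.

```python
import numpy as np


def conv1d_direct(x, g):
    """Direct 'valid' correlation: y[i] = sum_k x[i + k] * g[k]."""
    x, g = np.asarray(x, dtype=float), np.asarray(g, dtype=float)
    n_out = len(x) - len(g) + 1
    return np.array([np.dot(x[i:i + len(g)], g) for i in range(n_out)])


def conv1d_lowering(x, g):
    """Explicit lowering: materialize all input windows, then a single matrix-vector product."""
    x, g = np.asarray(x, dtype=float), np.asarray(g, dtype=float)
    n_out = len(x) - len(g) + 1
    cols = np.stack([x[i:i + len(g)] for i in range(n_out)])  # shape (n_out, len(g))
    return cols @ g


def conv1d_winograd_f23(x, g):
    """Winograd F(2,3): each tile of 4 inputs yields 2 outputs using 4 multiplications."""
    x, g = np.asarray(x, dtype=float), np.asarray(g, dtype=float)
    if len(g) != 3:
        raise ValueError("F(2,3) requires a 3-tap filter")
    n_out = len(x) - 2
    if n_out % 2:
        x = np.append(x, 0.0)  # pad so tiles of 2 outputs cover everything; extra output is dropped
    # Filter transform, computed once per filter.
    u = np.array([g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]])
    y = np.empty(len(x) - 2)
    for i in range(0, len(x) - 2, 2):
        d0, d1, d2, d3 = x[i:i + 4]
        # Input transform, then the 4 element-wise multiplications.
        m = np.array([d0 - d2, d1 + d2, d2 - d1, d1 - d3]) * u
        # Output transform.
        y[i] = m[0] + m[1] + m[2]
        y[i + 1] = m[1] - m[2] - m[3]
    return y[:n_out]
```

All three functions return the same values up to rounding. The explicit-lowering variant stores every input element up to `len(g)` times in the window matrix, which implicit lowering avoids by never materializing it; in the 2D case used by CNNs, Winograd F(2×2, 3×3) reduces the 36 multiplications of a direct 2×2 output tile to 16.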