847 research outputs found
NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors
© 2016 Cheung, Schultz and Luk.NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
Simulation and implementation of novel deep learning hardware architectures for resource constrained devices
Corey Lammie designed mixed signal memristive-complementary metal–oxide–semiconductor (CMOS) and field programmable gate arrays (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems; both during inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems
On microelectronic self-learning cognitive chip systems
After a brief review of machine learning techniques and applications, this Ph.D. thesis examines several approaches for implementing machine learning architectures and algorithms into hardware within our laboratory.
From this interdisciplinary background support, we have motivations for novel approaches that we intend to follow as an objective of innovative hardware implementations of dynamically self-reconfigurable logic for enhanced self-adaptive, self-(re)organizing and eventually self-assembling machine learning systems, while developing this new particular area of research.
And after reviewing some relevant background of robotic control methods followed by most recent advanced cognitive controllers, this Ph.D. thesis suggests that amongst many well-known ways of designing operational technologies, the design methodologies of those leading-edge high-tech devices such as cognitive chips that may well lead to intelligent machines exhibiting
conscious phenomena should crucially be restricted to extremely well defined constraints.
Roboticists also need those as specifications to help decide upfront on otherwise infinitely free hardware/software design details.
In addition and most importantly, we propose these specifications as methodological guidelines tightly related to ethics and the nowadays well-identified workings of the human body and of its psyche
Simulation Intelligence: Towards a New Generation of Scientific Methods
The original "Seven Motifs" set forth a roadmap of essential methods for the
field of scientific computing, where a motif is an algorithmic method that
captures a pattern of computation and data movement. We present the "Nine
Motifs of Simulation Intelligence", a roadmap for the development and
integration of the essential algorithms necessary for a merger of scientific
computing, scientific simulation, and artificial intelligence. We call this
merger simulation intelligence (SI), for short. We argue the motifs of
simulation intelligence are interconnected and interdependent, much like the
components within the layers of an operating system. Using this metaphor, we
explore the nature of each layer of the simulation intelligence operating
system stack (SI-stack) and the motifs therein: (1) Multi-physics and
multi-scale modeling; (2) Surrogate modeling and emulation; (3)
Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based
modeling; (6) Probabilistic programming; (7) Differentiable programming; (8)
Open-ended optimization; (9) Machine programming. We believe coordinated
efforts between motifs offers immense opportunity to accelerate scientific
discovery, from solving inverse problems in synthetic biology and climate
science, to directing nuclear energy experiments and predicting emergent
behavior in socioeconomic settings. We elaborate on each layer of the SI-stack,
detailing the state-of-art methods, presenting examples to highlight challenges
and opportunities, and advocating for specific ways to advance the motifs and
the synergies from their combinations. Advancing and integrating these
technologies can enable a robust and efficient hypothesis-simulation-analysis
type of scientific method, which we introduce with several use-cases for
human-machine teaming and automated science
Novel high performance techniques for high definition computer aided tomography
Mención Internacional en el título de doctorMedical image processing is an interdisciplinary field in which multiple research areas are involved:
image acquisition, scanner design, image reconstruction algorithms, visualization, etc.
X-Ray Computed Tomography (CT) is a medical imaging modality based on the attenuation
suffered by the X-rays as they pass through the body. Intrinsic differences in attenuation properties
of bone, air, and soft tissue result in high-contrast images of anatomical structures. The
main objective of CT is to obtain tomographic images from radiographs acquired using X-Ray
scanners. The process of building a 3D image or volume from the 2D radiographs is known as
reconstruction. One of the latest trends in CT is the reduction of the radiation dose delivered
to patients through the decrease of the amount of acquired data. This reduction results in artefacts
in the final images if conventional reconstruction methods are used, making it advisable to
employ iterative reconstruction algorithms.
There are numerous reconstruction algorithms available, from which we can highlight two
specific types: traditional algorithms, which are fast but do not enable the obtaining of high
quality images in situations of limited data; and iterative algorithms, slower but more reliable
when traditional methods do not reach the quality standard requirements. One of the priorities
of reconstruction is the obtaining of the final images in near real time, in order to reduce the
time spent in diagnosis. To accomplish this objective, new high performance techniques and methods
for accelerating these types of algorithms are needed. This thesis addresses the challenges
of both traditional and iterative reconstruction algorithms, regarding acceleration and image
quality. One common approach for accelerating these algorithms is the usage of shared-memory
and heterogeneous architectures. In this thesis, we propose a novel simulation/reconstruction
framework, namely FUX-Sim. This framework follows the hypothesis that the development of
new flexible X-ray systems can benefit from computer simulations, which may also enable performance
to be checked before expensive real systems are implemented. Its modular design
abstracts the complexities of programming for accelerated devices to facilitate the development
and evaluation of the different configurations and geometries available. In order to obtain near
real execution times, low-level optimizations for the main components of the framework are
provided for Graphics Processing Unit (GPU) architectures.
Other alternative tackled in this thesis is the acceleration of iterative reconstruction algorithms
by using distributed memory architectures. We present a novel architecture that unifies
the two most important computing paradigms for scientific computing nowadays: High Performance
Computing (HPC). The proposed architecture combines Big Data frameworks with the
advantages of accelerated computing.
The proposed methods presented in this thesis provide more flexible scanner configurations
as they offer an accelerated solution. Regarding performance, our approach is as competitive as
the solutions found in the literature. Additionally, we demonstrate that our solution scales with
the size of the problem, enabling the reconstruction of high resolution images.El procesamiento de imágenes médicas es un campo interdisciplinario en el que participan múltiples
áreas de investigación como la adquisición de imágenes, diseño de escáneres, algoritmos de
reconstrucción de imágenes, visualización, etc. La tomografía computarizada (TC) de rayos X es
una modalidad de imágen médica basada en el cálculo de la atenuación sufrida por los rayos X a
medida que pasan por el cuerpo a escanear. Las diferencias intrínsecas en la atenuación de hueso,
aire y tejido blando dan como resultado imágenes de alto contraste de estas estructuras anatómicas.
El objetivo principal de la TC es obtener imágenes tomográficas a partir estas radiografías
obtenidas mediante escáneres de rayos X. El proceso de construir una imagen o volumen en 3D a
partir de las radiografías 2D se conoce como reconstrucción. Una de las últimas tendencias en la
tomografía computarizada es la reducción de la dosis de radiación administrada a los pacientes
a través de la reducción de la cantidad de datos adquiridos. Esta reducción da como resultado
artefactos en las imágenes finales si se utilizan métodos de reconstrucción convencionales, por
lo que es aconsejable emplear algoritmos de reconstrucción iterativos.
Existen numerosos algoritmos de reconstrucción disponibles a partir de los cuales podemos
destacar dos categorías: algoritmos tradicionales, rápidos pero no permiten obtener imágenes de
alta calidad en situaciones en las que los datos son limitados; y algoritmos iterativos, más lentos
pero más estables en situaciones donde los métodos tradicionales no alcanzan los requisitos en
cuanto a la calidad de la imagen. Una de las prioridades de la reconstrucción es la obtención
de las imágenes finales en tiempo casi real, con el fin de reducir el tiempo de diagnóstico. Para
lograr este objetivo, se necesitan nuevas técnicas y métodos de alto rendimiento para acelerar
estos algoritmos.
Esta tesis aborda los desafíos de los algoritmos de reconstrucción tradicionales e iterativos,
con respecto a la aceleración y la calidad de imagen. Un enfoque común para acelerar estos
algoritmos es el uso de arquitecturas de memoria compartida y heterogéneas. En esta tesis,
proponemos un nuevo sistema de simulación/reconstrucción, llamado FUX-Sim. Este sistema se
construye alrededor de la hipótesis de que el desarrollo de nuevos sistemas de rayos X flexibles
puede beneficiarse de las simulaciones por computador, en los que también se puede realizar
un control del rendimiento de los nuevos sistemas a desarrollar antes de su implementación
física. Su diseño modular abstrae las complejidades de la programación para aceleradores con el
objetivo de facilitar el desarrollo y la evaluación de las diferentes configuraciones y geometrías
disponibles. Para obtener ejecuciones en casi tiempo real, se proporcionan optimizaciones de
bajo nivel para los componentes principales del sistema en las arquitecturas GPU.
Otra alternativa abordada en esta tesis es la aceleración de los algoritmos de reconstrucción
iterativa mediante el uso de arquitecturas de memoria distribuidas. Presentamos una arquitectura
novedosa que unifica los dos paradigmas informáticos más importantes en la actualidad:
computación de alto rendimiento (HPC) y Big Data. La arquitectura propuesta combina sistemas
Big Data con las ventajas de los dispositivos aceleradores.
Los métodos propuestos presentados en esta tesis proporcionan configuraciones de escáner
más flexibles y ofrecen una solución acelerada. En cuanto al rendimiento, nuestro enfoque es tan
competitivo como las soluciones encontradas en la literatura. Además, demostramos que nuestra
solución escala con el tamaño del problema, lo que permite la reconstrucción de imágenes de
alta resolución.This work has been mainly funded thanks to a FPU fellowship (FPU14/03875) from the Spanish
Ministry of Education.
It has also been partially supported by other grants:
• DPI2016-79075-R. “Nuevos escenarios de tomografía por rayos X”, from the Spanish Ministry
of Economy and Competitiveness.
• TIN2016-79637-P Towards unification of HPC and Big Data Paradigms from the Spanish
Ministry of Economy and Competitiveness.
• Short-term scientific missions (STSM) grant from NESUS COST Action IC1305.
• TIN2013-41350-P, Scalable Data Management Techniques for High-End Computing Systems
from the Spanish Ministry of Economy and Competitiveness.
• RTC-2014-3028-1 NECRA Nuevos escenarios clinicos con radiología avanzada from the
Spanish Ministry of Economy and Competitiveness.Programa Oficial de Doctorado en Ciencia y Tecnología InformáticaPresidente: José Daniel García Sánchez.- Secretario: Katzlin Olcoz Herrero.- Vocal: Domenico Tali
Toward a formal theory for computing machines made out of whatever physics offers: extended version
Approaching limitations of digital computing technologies have spurred
research in neuromorphic and other unconventional approaches to computing. Here
we argue that if we want to systematically engineer computing systems that are
based on unconventional physical effects, we need guidance from a formal theory
that is different from the symbolic-algorithmic theory of today's computer
science textbooks. We propose a general strategy for developing such a theory,
and within that general view, a specific approach that we call "fluent
computing". In contrast to Turing, who modeled computing processes from a
top-down perspective as symbolic reasoning, we adopt the scientific paradigm of
physics and model physical computing systems bottom-up by formalizing what can
ultimately be measured in any physical substrate. This leads to an
understanding of computing as the structuring of processes, while classical
models of computing systems describe the processing of structures.Comment: 76 pages. This is an extended version of a perspective article with
the same title that will appear in Nature Communications soon after this
manuscript goes public on arxi
Enabling aggressive compiler optimization for the mobile environment
Aggressive code optimization on the mobile environment is a difficult endeavor. Billions of users rely on mobile devices for their daily computing tasks. Yet, they mostly run poorly optimized code, under-utilizing their already limited processing and energy resources. Existing optimization approaches, like iterative compilation, perform well in other domains but fall short on the mobile environment. They either rely on representative inputs that are hard to reconstruct, or expose users to slowdowns and crashes.
An ideal solution must be able to perform an optimization search by repeatedly evaluating different optimization decisions on the same input. That input should be representative of actual user usage without requiring developers to artificially create it. Finally, users should never be exposed to slow or crashing evaluations, a quite common side-effect of iterative compilation. This thesis presents a novel approach with all above in mind, bringing aggressive code optimization to the mobile environment.
With a transparent capture mechanism, real user inputs can be stored. This mechanism is infrequently invoked and remains unnoticeable from the users. A single capture is enough to enable offline, input-driven code optimization. It supports C functions as well as code regions of interactive Android applications. It allows controlling the timing and frequency of captures, it bails out on imminent high-impact runtime events, and excludes from captures some immutable data.
A replay-based evaluation mechanism is able to repeatedly restore a captured input while changing the underlying code. For C programs, it employs compile and link-time strategies to consistently work despite code transformations. For Android apps, a novel mechanism was developed, able to replay using different code types. These are the original Android-compiled code, interpretation, and LLVM-generated code. Additionally, it works well even in the presence of memory-shuffling security mechanisms.
Capture and replay is fused into an iterative compilation system that uses offline, replay-based evaluations. Initially, real inputs are captured online, without noticeably affecting the users. For C and interactive apps, captures required on average 2ms and 15ms respectively. Then, an optimization search is performed by repeatedly replaying the inputs using different code transformations. As this happens offline, any crashing or erroneous executions are not affecting the users. C programs became 29% faster using a random search, while interactive apps became 44% faster using a genetic algorithm and a novel Android backend based on LLVM. Finally, with crowd-sourcing, the offline evaluation effort was significantly accelerated. Specifically, for the user with the highest workload the search accelerated by 7 times
GPU Computing for Cognitive Robotics
This thesis presents the first investigation of the impact of GPU
computing on cognitive robotics by providing a series of novel experiments in
the area of action and language acquisition in humanoid robots and computer
vision. Cognitive robotics is concerned with endowing robots with high-level
cognitive capabilities to enable the achievement of complex goals in complex
environments. Reaching the ultimate goal of developing cognitive robots will
require tremendous amounts of computational power, which was until
recently provided mostly by standard CPU processors. CPU cores are
optimised for serial code execution at the expense of parallel execution, which
renders them relatively inefficient when it comes to high-performance
computing applications. The ever-increasing market demand for
high-performance, real-time 3D graphics has evolved the GPU into a highly
parallel, multithreaded, many-core processor extraordinary computational
power and very high memory bandwidth. These vast computational resources
of modern GPUs can now be used by the most of the cognitive robotics models
as they tend to be inherently parallel. Various interesting and insightful
cognitive models were developed and addressed important scientific questions
concerning action-language acquisition and computer vision. While they have
provided us with important scientific insights, their complexity and
application has not improved much over the last years. The experimental
tasks as well as the scale of these models are often minimised to avoid
excessive training times that grow exponentially with the number of neurons
and the training data. This impedes further progress and development of
complex neurocontrollers that would be able to take the cognitive robotics
research a step closer to reaching the ultimate goal of creating intelligent
machines. This thesis presents several cases where the application of the GPU
computing on cognitive robotics algorithms resulted in the development of
large-scale neurocontrollers of previously unseen complexity enabling the
conducting of the novel experiments described herein.European Commission Seventh Framework
Programm
- …