Pulse-coupled neural network performance for real-time identification of vegetation during forced landing by Hayward, Ross F. et al.
Introduction
PCNN Implementation
Performance
Conclusion
Pulse Coupled Neural Network Performance for
Real-Time Identification of Vegetation During
Forced Landing
R. F. Hayward1, D. J. Warne1,2, N. A. Kelson2, J. E. Banks1
and L. Mejias3
1School of Electrical Engineering and Computer Science
2High Performance Computing and Research Support
3Australian Research Center for Aerospace Automation
December 3, 2013
R. F. Hayward et al. PCNN Performance for Real-Time Identification
Outline
1 Introduction
Forced Landing of UAVs
Pulse Coupled Neural Networks
Heterogeneous Computing and OpenCL
2 PCNN Implementation
Overview
FPGA Neuron Architecture
PCNN Work Group
Logic Utilisation
3 Performance
Compute Performance
Power Efficiency
Power Consumption
4 Conclusion
UAV Emergency Landing
It is inevitable that UAVs (Unmanned Aerial
Vehicles) will encounter emergency
situations [6, 5].
Hardware/software failure.
Lost communication link.
Bad weather.
Emergency situations may require an autonomous forced landing.
Safe navigation of UAV to landing site [6].
Identification of safe landing site [5].
Landing Site Selection
A suitable landing site should be selected such that the following
are minimised [5]:
Personal injury,
Infrastructure and property damage,
UAV damage.
Properties of a good landing site would be [5]:
Region clear of trees and dense vegetation,
Region clear of buildings, roads, and people,
Region with smooth flat terrain.
Requirements
A method for landing site identification should:
not rely on a communication link.
be reliable and accurate.
be computationally efficient.
be feasibly implemented in an embedded environment.
Our current approach is based on work by Li et al. [4].
Texture classification of aerial images,
focusing on vegetation identification,
using pulse coupled neural networks.
Pulse Coupled Neural Networks (PCNN)
A PCNN is a mathematical model inspired by the visual cortex [1]:
a 2D array of neurons,
laterally connected integrate-and-fire neurons,
one-to-one mapping from image pixel to neuron.
Our approach is based on the simplified unit-linking PCNN
model [3, 2].
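Unit-Linking PCNN
Definition
Given an image $I$ with $K$ components, a unit-linking pulse coupled neural network is given by:
Input Feed: $F^{t}_{i,j} = \sum_{k=0}^{K} w_k I^{t}_{i,j,k}$
Link Feed: $L^{t}_{i,j} = \begin{cases} 1, & \text{if } \sum_{(x,y) \in N(i,j)} Y^{t-1}_{x,y} > 0 \\ 0, & \text{otherwise} \end{cases}$
Input Modulation: $U^{t}_{i,j} = \left(1 + \beta L^{t}_{i,j}\right) F^{t}_{i,j}$
Pulse Generator: $Y^{t}_{i,j} = \begin{cases} 1, & \text{if } U^{t}_{i,j} > \theta^{t}_{i,j} \\ 0, & \text{otherwise} \end{cases}$
Dynamic Threshold: $\theta^{t}_{i,j} = \theta^{t-1}_{i,j} - \alpha + V Y^{t-1}_{i,j}$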
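One time step of the unit-linking PCNN can be sketched in plain Python as follows. This is a minimal illustration, not the FPGA/OpenCL implementation discussed in this talk. The function name `pcnn_step`, the 8-connected neighbourhood for N(i,j), the treatment of out-of-image neighbours as non-firing, and the default parameter values are all assumptions made for the example; the input feed F is assumed to be precomputed.

```python
def pcnn_step(feed, fired_prev, theta_prev, beta=0.2, alpha=0.25, v=1.0):
    """Advance a unit-linking PCNN by one time step.

    feed       -- 2D list of input feed values F (assumed precomputed)
    fired_prev -- 2D list of pulses Y from the previous step (0 or 1)
    theta_prev -- 2D list of thresholds from the previous step
    Returns (fired, theta) for the current step.
    """
    rows, cols = len(feed), len(feed[0])
    fired = [[0] * cols for _ in range(rows)]
    theta = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Dynamic threshold: decay by alpha, jump by v if the neuron fired.
            theta[i][j] = theta_prev[i][j] - alpha + v * fired_prev[i][j]
            # Link feed: 1 if any neighbour fired in the previous step.
            # Assumption: N(i,j) is the 8-connected neighbourhood.
            link = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di == 0 and dj == 0:
                        continue
                    x, y = i + di, j + dj
                    if 0 <= x < rows and 0 <= y < cols and fired_prev[x][y]:
                        link = 1
            # Input modulation followed by the pulse generator.
            u = (1.0 + beta * link) * feed[i][j]
            fired[i][j] = 1 if u > theta[i][j] else 0
    return fired, theta
```

Each neuron depends only on the previous step's pulses in its neighbourhood, so all neurons in a step can be updated independently, which is what makes the model a natural fit for data-parallel devices.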
Texture Classification with PCNN
Texture signatures can be constructed using the pulse spectral frequency.
Definition
$\mathrm{PSF}^{t} = \dfrac{N^{t}}{N_{\max}}$
$N^{t} = \sum_{i,j} Y^{t}_{i,j}$
$N_{\max} = \max\left\{N^{0}, N^{1}, \ldots, N^{T}\right\}$
Unfortunately, computing many PCNN time-steps is
computationally expensive for embedded systems.
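As a rough sketch of the definition above, the signature can be computed from a sequence of pulse images. The function name `pulse_signature` and the list-of-lists image layout are assumptions for illustration; the all-zero case (no neuron ever fires) is handled by returning zeros, a choice the definition itself does not cover.

```python
def pulse_signature(pulse_frames):
    """Return the PSF signature [PSF^0, ..., PSF^T] for a list of pulse images.

    pulse_frames -- list of 2D lists Y^t with entries 0 or 1, one per time step
    """
    # N^t: number of neurons that fired at time step t.
    counts = [sum(sum(row) for row in frame) for frame in pulse_frames]
    n_max = max(counts)
    if n_max == 0:
        # Assumption: with no pulses at all, report an all-zero signature.
        return [0.0] * len(counts)
    # PSF^t = N^t / N_max, so every entry lies in [0, 1].
    return [n / n_max for n in counts]
```

The signature has one entry per time step, so its length (and the cost of producing it) grows with the number of PCNN iterations T.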
Heterogeneous Computing
Computation using multiple devices, each with a unique
architecture.
Multi-core processors
Graphics Processing Units
Digital Signal Processors
Many-core co-processors (Intel Xeon Phi)
Application Specific Integrated Circuits
Field Programmable Gate Arrays
Within an embedded environment, power efficiency (ops/Watt) and power consumption are of particular importance.
CPU Cores vs GPU Cores
The most common compute devices used in heterogeneous computing are CPUs and GPUs.
Common statement: “CPUs have only 4-8 cores, but a GPU has 100s of cores”.
Important! CPU core ≠ GPU core.
The Fermi architecture has up to 16 streaming multi-processors, each executing 32 threads using a SIMD (Single Instruction Multiple Data) execution model (512 GFLOPS).
An Intel Sandy Bridge CPU core has two 256-bit wide vector units (SIMD), each of which can process four 64-bit floating-point operations per cycle (160 GFLOPS for the 8-core E5-2670).
A GPU streaming multi-processor ≈ CPU core.
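The CPU figure can be reproduced with simple peak-throughput arithmetic. The sketch below is illustrative only: the helper `peak_gflops` is hypothetical, and the 2.5 GHz clock is an assumption chosen because it reproduces the 160 GFLOPS quoted here, not a value taken from a datasheet.

```python
def peak_gflops(cores, vector_units_per_core, flops_per_unit_per_cycle, clock_ghz):
    """Theoretical peak throughput in GFLOPS under a simple multiplicative model."""
    return cores * vector_units_per_core * flops_per_unit_per_cycle * clock_ghz


# Sandy Bridge, 8 cores: two vector units per core, four 64-bit operations each.
# Assumption: 2.5 GHz clock (picked to match the quoted figure).
cpu_peak = peak_gflops(cores=8, vector_units_per_core=2,
                       flops_per_unit_per_cycle=4, clock_ghz=2.5)

# Fermi: 16 streaming multi-processors, each executing 32 threads in SIMD fashion.
gpu_simd_lanes = 16 * 32

print(cpu_peak, gpu_simd_lanes)
```

Counted this way, the fair comparison is 16 GPU multi-processors against 8 CPU cores, with the GPU's advantage coming from wider SIMD execution per unit rather than from hundreds of independent cores.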
Field-programmable Gate Arrays
FPGAs are reconfigurable digital circuits:
Lookup tables,
Block RAM,
Programmable interconnects.
A blank slate:
Reconfigurable at program run-time.
BYO architecture.
Heterogeneous computing with FPGAs
Traditionally, co-processing with FPGAs involved:
Designing a digital circuit for the compute-intensive task (using an HDL).
Defining a data transfer model between host and FPGA using low-level protocols (e.g., PCIe).
In some cases, writing a device driver.
The designer must deal with:
Register-transfer level (RTL) design,
clock-by-clock synchronisation,
timing closure.
However, Altera recently released an OpenCL software development kit (SDK) for their Stratix V series FPGAs.
Open Computing Language (OpenCL)
OpenCL is an open standard for program development on heterogeneous computing platforms. It comprises:
A device-side high-level language specification.
A host-side application programming interface (API).
The standard is designed to be portable across many co-processor architectures:
CPUs, GPUs, DSPs,
and now FPGAs.
FPGAs and OpenCL
OpenCL for FPGAs provides many new opportunities:
No need to deal with RTL design and timing closure.
Hardware/software co-design within a common high-level language.
Design portability to other devices.
Direct comparison of CPU, GPU, and FPGA solutions.
Parallelisation of the PCNN
A single iteration of the PCNN is embarrassingly parallel, but successive iterations are not independent:
Within an iteration, each neuron can determine whether it fires in isolation.
Each neuron must wait for its neighbours to complete before the next iteration starts.
Mapping to the OpenCL execution model
An OpenCL kernel function defines the execution of a single work item (a thread, for CUDA programmers).
A work group is a collection of work items that share the same local memory; these execute in a SIMD fashion.
The neuron grid is partitioned into manageable work groups whose work items cooperate to minimise global memory access.
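The mapping above can be sketched in plain Python as an emulation of the execution model. This is not the authors' OpenCL kernel: the update rule is a simplified PCNN (stimulus as feeding input, 8-neighbour linking, exponentially decaying threshold), and the names `neuron_kernel`, `pcnn_step`, `beta`, `decay` and `v_theta` are assumptions made for illustration. `neuron_kernel` plays the role of one work item; the double loop in `pcnn_step` stands in for the index space that a device would execute in parallel.

```python
import math


def neuron_kernel(r, c, stimulus, fired_prev, theta, beta, decay, v_theta):
    """Work of one work item: update the neuron at (r, c).

    Only state from the previous iteration is read, so every neuron in the
    grid can run this function independently within one iteration.
    """
    rows, cols = len(stimulus), len(stimulus[0])
    link = 0  # number of 8-neighbours that fired in the previous iteration
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols:
                link += fired_prev[rr][cc]
    activity = stimulus[r][c] * (1.0 + beta * link)
    fires = 1 if activity > theta[r][c] else 0
    # Threshold decays each iteration and jumps by v_theta when the neuron fires.
    new_theta = theta[r][c] * math.exp(-decay) + v_theta * fires
    return fires, new_theta


def pcnn_step(stimulus, fired_prev, theta, beta=0.2, decay=0.5, v_theta=20.0):
    """One PCNN iteration over the whole grid.

    Results go to fresh buffers; returning them is the synchronisation point
    that every neuron must reach before the next iteration begins.
    """
    rows, cols = len(stimulus), len(stimulus[0])
    fired_next = [[0] * cols for _ in range(rows)]
    theta_next = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            fired_next[r][c], theta_next[r][c] = neuron_kernel(
                r, c, stimulus, fired_prev, theta, beta, decay, v_theta)
    return fired_next, theta_next


if __name__ == "__main__":
    stimulus = [[0.9, 0.1], [0.1, 0.1]]
    fired = [[0, 0], [0, 0]]
    theta = [[0.5, 0.5], [0.5, 0.5]]
    for iteration in range(3):
        fired, theta = pcnn_step(stimulus, fired, theta)
        print(iteration, fired)
```

Because each call reads only `fired_prev` and `theta` and writes only its own output cell, the order of the two loops does not matter, which is the property that lets the grid be split across work groups.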
GPU vs FPGA kernel compilation
There is an important distinction between GPU and FPGA kernels:
GPU kernel code compiles to microcode which is loaded at runtime.
The compilation process is different for the FPGA:
The OpenCL kernel compiles to an HDL circuit design.
The HDL is synthesised into digital primitives such as adders, flip-flops, and multiplexers.
The result is mapped to hardware LUTs, BRAMs, and DSPs.
Layout area on chip is minimised.
The design is translated into a configuration image that is loaded at program runtime.
Neuron Architecture
[Figure: the resulting FPGA circuit for a single neuron.]
PCNN Work Group
Neuron work items are grouped into a 16 × 16 array to form a single work group.
A work group of neurons is processed concurrently on a hardware compute unit.
Inputs for all neurons in the group are loaded into common BRAM on execution to avoid SRAM latency.
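The benefit of loading a work group's inputs into shared on-chip memory can be illustrated with a small Python sketch. It assumes a 3 × 3 linking neighbourhood, which the slides do not state, so each 16 × 16 tile needs a one-neuron border (halo) from its neighbours. The helper names `locate`, `load_tile_with_halo` and `global_reads_per_group` are hypothetical.

```python
TILE = 16  # work-group edge length, as on the slide


def locate(global_r, global_c, tile=TILE):
    """Split a global neuron index into (group index, local index)."""
    group = (global_r // tile, global_c // tile)
    local = (global_r % tile, global_c % tile)
    return group, local


def load_tile_with_halo(grid, group_r, group_c, tile=TILE, pad=0):
    """Copy one work group's inputs plus a one-neuron border into a local buffer.

    Border cells that fall outside the grid are filled with `pad`.
    """
    rows, cols = len(grid), len(grid[0])
    r0, c0 = group_r * tile - 1, group_c * tile - 1
    return [[grid[r][c] if 0 <= r < rows and 0 <= c < cols else pad
             for c in range(c0, c0 + tile + 2)]
            for r in range(r0, r0 + tile + 2)]


def global_reads_per_group(tile=TILE):
    """Global-memory reads per work group for the two strategies.

    shared: the group loads its tile and halo once into local memory.
    naive:  every neuron reads its own 3 x 3 neighbourhood directly.
    """
    shared = (tile + 2) ** 2
    naive = tile * tile * 9
    return shared, naive


if __name__ == "__main__":
    shared, naive = global_reads_per_group()
    print(locate(17, 35))
    print(shared, naive, naive / shared)
```

Under these assumptions a shared load needs 18 × 18 = 324 global reads per work group, against 16 × 16 × 9 = 2,304 when every neuron fetches its own neighbourhood, roughly a sevenfold reduction.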
Global Architecture
The Stratix V can host six of our PCNN compute units:
16 × 16 × 6 = 1,536 hardware neuron processors.
Theoretically, ≈ 700 million neuron firings per second.
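To put these figures in context, the following Python sketch works through the arithmetic for a hypothetical 640 × 480 frame. The frame size is an assumption, not a value from the slides, and the theoretical peak is read here as neuron updates per second. The function names are illustrative only.

```python
import math

TILE = 16                   # work-group edge length (from the slides)
COMPUTE_UNITS = 6           # compute units hosted on the Stratix V (from the slides)
PEAK_UPDATES_PER_S = 700e6  # theoretical peak, read as neuron updates per second


def hardware_neurons(tile=TILE, units=COMPUTE_UNITS):
    """Number of neuron processors instantiated on the device."""
    return tile * tile * units


def work_groups(width, height, tile=TILE):
    """Work groups needed to cover a frame, rounding partial tiles up."""
    return math.ceil(width / tile) * math.ceil(height / tile)


def rounds_per_iteration(width, height, tile=TILE, units=COMPUTE_UNITS):
    """Batches of work groups needed for one PCNN iteration over the frame."""
    return math.ceil(work_groups(width, height, tile) / units)


def iterations_per_second(width, height, peak=PEAK_UPDATES_PER_S):
    """Upper bound on whole-frame PCNN iterations per second at the peak rate."""
    return peak / (width * height)


if __name__ == "__main__":
    w, h = 640, 480  # hypothetical frame size
    print(hardware_neurons())
    print(work_groups(w, h), rounds_per_iteration(w, h))
    print(int(iterations_per_second(w, h)))
```

For that frame size the grid splits into 40 × 30 = 1,200 work groups, processed in 200 rounds of six, and the theoretical peak corresponds to an upper bound of roughly 2,278 full-frame iterations per second. Measured throughput would be lower once memory transfers are included.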
Logic Utilisation
Resource             Utilisation
Logic fabric         92%
Dedicated registers  47%
Block RAM            100%
DSP blocks           8%
Execution Time
Efficiency (neuron firings per watt)
Power Consumption
Conclusion
Working toward fully automated UAV forced landings:
Selection of landing sites for UAV forced landing.
Vegetation classified using PCNN.
Traditional approaches to PCNN require desktop CPU performance:
Not feasible for low-powered embedded devices.
Through development of an OpenCL PCNN we have:
Implemented an FPGA PCNN co-processor using Altera’s OpenCL SDK.
Achieved raw performance comparable with optimised CPU code.
Achieved power efficiency comparable with the GPU (≈ 4× CPU).
Power requirement is 40% of the CPU’s and 25% of the GPU’s.
Using an FPGA co-processor, PCNN can feasibly be applied to the landing site identification problem.
Thank you!
[1] R. Eckhorn, H. J. Reitboeck, M. Arndt, and P. W. Dicke.
A neural network for feature linking via synchronous activity:
Results from a cat visual cortex and from simulations.
[2] X. Gu, Y. Fang, and Y. Wang.
Attention selection using global topological properties based
on pulse coupled neural network.
Computer Vision and Image Understanding, 117:1400–1411,
2013.
[3] X. Gu, L. Zhang, and D. Yu.
General design approach to unit-linking PCNN for image
processing.
In Proceedings of International Joint Conference on Neural
Networks, pages 1837–1841, 2005.
[4] Z. Li, R. F. Hayward, R. A. Walker, and Y. Liu.
A biologically inspired object spectral-texture descriptor and
its application to vegetation classification in power-line
corridors.
IEEE Geoscience and Remote Sensing Letters, 8(4):631–635,
2011.
[5] A. Lu, W. Ding, J. Wang, and H. Li.
Autonomous vision-based safe area selection algorithm for
UAV emergency forced landing.
In International Conference on Information and Computer
Applications 2012, pages 254–261, 2012.
[6] L. Mejias and P. Eng.
Controlled emergency landing of an unpowered unmanned
aerial system.
Journal of Intelligent and Robotic Systems, 70:421–435, 2013.
