Pedestrian detection with high-resolution event camera
Despite the dynamic development of computer vision algorithms, the
implementation of perception and control systems for autonomous vehicles such
as drones and self-driving cars still poses many challenges. A video stream
captured by traditional cameras is often prone to problems such as motion blur
or degraded image quality due to challenging lighting conditions. In addition,
the frame rate - typically 30 or 60 frames per second - can be a limiting
factor in certain scenarios. Event cameras (DVS -- Dynamic Vision Sensor) are a
potentially interesting technology to address the above-mentioned problems. In
this paper, we compare two methods of processing event data by means of deep
learning for the task of pedestrian detection: a frame-based representation
processed by convolutional neural networks, and asynchronous sparse
convolutional neural networks.
convolutional neural networks. The results obtained illustrate the potential of
event cameras and allow the evaluation of the accuracy and efficiency of the
methods used for high-resolution (1280 x 720 pixels) footage.Comment: Accepted for the PP-RAI'2023 - 4th Polish Conference on Artificial
Intelligenc
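As a rough software illustration of the frame-based representation used in
this comparison, the sketch below accumulates a batch of DVS events into a
dense event frame. The (x, y, timestamp, polarity) event layout, the signed
polarity convention, and the HD resolution default are assumptions for
illustration, not necessarily the paper's exact accumulation scheme.

```python
import numpy as np

def events_to_frame(events, height=720, width=1280):
    """Accumulate DVS events into a dense event frame.

    `events`: (N, 4) array of (x, y, timestamp, polarity) rows, with
    polarity in {-1, +1} (an assumed, though common, convention).
    """
    frame = np.zeros((height, width), dtype=np.int32)
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    p = events[:, 3].astype(np.int32)
    # np.add.at sums signed polarities even for repeated pixel indices.
    np.add.at(frame, (y, x), p)
    return frame
```

Such a frame can then be fed to an ordinary CNN detector, which is what makes
the representation attractive despite discarding the fine temporal structure
of the event stream.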
Traffic Sign Classification Using Deep and Quantum Neural Networks
Quantum Neural Networks (QNNs) are an emerging technology that can be used in
many applications, including computer vision. In this paper, we present a
traffic sign classification system implemented using a hybrid quantum-classical
convolutional neural network. Experiments on the German Traffic Sign
Recognition Benchmark dataset indicate that QNNs do not currently outperform
classical DCNNs (Deep Convolutional Neural Networks), yet they still provide an
accuracy of over 90% and are a promising solution for advanced computer vision.
Comment: Accepted for the ICCVG 2022 conference.
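The paper's exact architecture is not reproduced here; below is a minimal
sketch of a generic hybrid quantum-classical classifier built with PennyLane
and PyTorch, assuming a small convolutional feature extractor feeding a
4-qubit variational circuit. The layer sizes, circuit layout, and qubit count
are illustrative choices only.

```python
import pennylane as qml
import torch
import torch.nn as nn

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def qnode(inputs, weights):
    # Encode classical features as rotation angles, then entangle.
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

weight_shapes = {"weights": (2, n_qubits)}  # 2 entangling layers

class HybridNet(nn.Module):
    def __init__(self, n_classes=43):  # GTSRB has 43 sign classes
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc_in = nn.Linear(32, n_qubits)
        self.qlayer = qml.qnn.TorchLayer(qnode, weight_shapes)
        self.fc_out = nn.Linear(n_qubits, n_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)
        x = torch.tanh(self.fc_in(x))  # bound the embedding angles
        x = self.qlayer(x)
        return self.fc_out(x)
```

The quantum layer here acts as a drop-in nonlinear feature map between two
classical layers, which is the typical structure of such hybrid models.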
High-definition event frame generation using SoC FPGA devices
In this paper, we address the implementation of the accumulation and
projection of a high-resolution event data stream (HD, 1280 x 720 pixels) onto
the image plane in FPGA devices. The results confirm the feasibility of this
approach, but there are a number of challenges, limitations and trade-offs to
be considered. The required hardware resources of selected data
representations, such as binary frame, event frame, exponentially decaying time
surface and event frequency, were compared with those available on several
popular platforms from AMD Xilinx. The resulting event frames can be used for
typical vision algorithms, such as object classification and detection, using
both classical and deep neural network methods.
Comment: Paper accepted for the SPA 2023 conference.
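Of the representations listed above, the exponentially decaying time surface
is sketched below as a plain software reference; the decay constant,
microsecond timestamps, and exact formula are assumptions, and the paper's
contribution is computing such projections in FPGA logic rather than in
software.

```python
import numpy as np

def time_surface(events, t_ref, tau=50e3, height=720, width=1280):
    """Exponentially decaying time surface (software reference).

    Each pixel stores exp(-(t_ref - t_last) / tau), where t_last is the
    timestamp of the most recent event at that pixel.
    """
    t_last = np.full((height, width), -np.inf)
    for x, y, t, _p in events:  # events assumed ordered by timestamp
        t_last[int(y), int(x)] = t
    # exp(-inf) evaluates to 0 for pixels that never fired.
    return np.exp(-(t_ref - t_last) / tau)
```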
Comparative study of subset selection methods for rapid prototyping of 3D object detection algorithms
Object detection in 3D is a crucial aspect in the context of autonomous
vehicles and drones. However, prototyping detection algorithms is
time-consuming and costly in terms of energy and environmental impact. To
address these challenges, one can check the effectiveness of different models
by training on a subset of the original training set. In this paper, we present
a comparison of three algorithms for selecting such a subset - random sampling,
random per class sampling, and our proposed MONSPeC (Maximum Object Number
Sampling per Class). We provide empirical evidence for the superior
effectiveness of random per class sampling and MONSPeC over basic random
sampling. If random sampling is replaced with one of the more efficient
algorithms, the results obtained on the subset are more likely to transfer
to the entire dataset. The code is available at:
https://github.com/vision-agh/monspec.
Comment: Accepted for MMAR 2023 (27th International Conference on Methods and
Models in Automation and Robotics).
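For intuition, a simplified sketch of random per-class sampling follows; the
exact balancing rule, and MONSPeC itself, are defined in the paper and the
linked repository, so the `frames` mapping and `fraction` parameter here are
purely illustrative.

```python
import random
from collections import defaultdict

def random_per_class_sampling(frames, fraction, seed=0):
    """Keep roughly `fraction` of the frames containing each object class.

    `frames`: dict mapping frame_id -> set of class names in that frame.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for fid, classes in frames.items():
        for c in classes:
            by_class[c].append(fid)
    selected = set()
    for fids in by_class.values():
        k = max(1, int(len(fids) * fraction))
        selected.update(rng.sample(fids, k))
    return selected
```

Because rare classes are sampled independently of frequent ones, the subset
preserves class coverage better than plain random sampling, which is the
intuition behind the reported results.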
Pipeline Implementation of Peer Group Filtering in FPGA
In this paper, a parallel FPGA implementation of the Peer Group Filtering
(PGF) algorithm is described. Implementation details, results, the performance
of the design, and the FPGA logic resources used are discussed. The PGF
algorithm customised for the FPGA is compared with the original algorithm and
with Vector Median Filtering.
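As a point of reference, here is a plain software sketch of the peer-group
idea (each pixel replaced by the mean of its most similar neighbours in a
local window); the window size, peer-group size, and RGB distance metric are
assumed parameters, and the pipelined FPGA architecture described in the
paper is organised quite differently.

```python
import numpy as np

def peer_group_filter(img, window=3, peers=5):
    """Replace each pixel with the mean of its `peers` most similar
    neighbours (Euclidean distance in RGB) within a window x window area."""
    h, w, _ = img.shape
    r = window // 2
    padded = np.pad(img.astype(np.float64),
                    ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.empty_like(img, dtype=np.float64)
    for y in range(h):
        for x in range(w):
            block = padded[y:y + window, x:x + window].reshape(-1, 3)
            centre = padded[y + r, x + r]
            d = np.linalg.norm(block - centre, axis=1)
            group = block[np.argsort(d)[:peers]]  # peer group incl. centre
            out[y, x] = group.mean(axis=0)
    return out.astype(img.dtype)
```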
Real-time FPGA implementation of the Semi-Global Matching stereo vision algorithm for a 4K/UHD video stream
In this paper, we propose a real-time FPGA implementation of the Semi-Global
Matching (SGM) stereo vision algorithm. The designed module supports a 4K/Ultra
HD (3840 x 2160 pixels @ 30 frames per second) video stream in a 4 pixels per
clock (ppc) format and a 64-pixel disparity range. The baseline SGM
implementation had to be modified to process pixels in the 4 ppc format and to
meet the timing constraints; however, our version provides results comparable
to the
original design. The solution has been positively evaluated on the Xilinx VC707
development board with a Virtex-7 FPGA device.
Comment: Paper accepted for the DASIP 2023 workshop, held in conjunction with
HiPEAC 2023.
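For readers unfamiliar with SGM, the sketch below shows a single
left-to-right cost-aggregation pass over an (H, W, D) matching-cost volume;
the penalty values P1 and P2 are illustrative, and the FPGA design in the
paper restructures this recurrence so that four pixels are processed per
clock cycle.

```python
import numpy as np

def aggregate_path(cost, P1=10.0, P2=150.0):
    """One left-to-right SGM aggregation pass over a cost volume (H, W, D)."""
    H, W, D = cost.shape
    L = np.empty((H, W, D))
    L[:, 0] = cost[:, 0]
    for x in range(1, W):
        prev = L[:, x - 1]                          # aggregated costs at x-1
        prev_min = prev.min(axis=1, keepdims=True)  # best cost over disparities
        up = np.roll(prev, 1, axis=1)               # transition from d-1
        up[:, 0] = np.inf
        dn = np.roll(prev, -1, axis=1)              # transition from d+1
        dn[:, -1] = np.inf
        best = np.minimum(np.minimum(prev, prev_min + P2),
                          np.minimum(up, dn) + P1)
        L[:, x] = cost[:, x] + best - prev_min      # subtract to bound growth
    return L
```

A full SGM pipeline sums such passes over several path directions before
taking the winner-takes-all disparity, and the serial dependency along each
path is exactly what makes a multi-pixel-per-clock hardware design
non-trivial.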
Energy Efficient Hardware Acceleration of Neural Networks with Power-of-Two Quantisation
Deep neural networks virtually dominate the domain of most modern vision
systems, providing high performance at a cost of increased computational
complexity. Since such systems are often required to operate both in
real time and with minimal energy consumption (e.g., in wearable devices,
autonomous vehicles, edge Internet of Things (IoT) nodes, or sensor networks),
various
network optimisation techniques are used, e.g., quantisation, pruning, or
dedicated lightweight architectures. Since the weights in neural network
layers follow an approximately logarithmic distribution, Power-of-Two (PoT)
quantisation -- itself logarithmic -- provides high performance even at
significantly reduced computational precision (4-bit weights and below). It
also makes it possible to replace the Multiply and ACcumulate (MAC) units
typical for neural networks (performing, e.g., convolution operations) with
more energy-efficient Bitshift and ACcumulate (BAC) units. In this paper, we
show that a hardware neural network accelerator with PoT weights implemented
on the Zynq UltraScale+ MPSoC ZCU104 SoC FPGA can be more energy efficient
than its uniform-quantisation counterpart. To further reduce the actual power
requirement by omitting the computation for zero weights, we also propose a
new pruning method adapted to logarithmic quantisation.
Comment: Accepted for the ICCVG 2022 conference.
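A minimal sketch of the general PoT idea follows: weights are snapped to
signed powers of two, so that a multiplication by a weight reduces to a bit
shift. The exponent range, the assumption that weights are pre-scaled to
[-1, 1], and the rounding rule are illustrative; the paper's exact
quantisation and pruning schemes may differ.

```python
import numpy as np

def pot_quantise(w, n_bits=4):
    """Quantise weights to signed powers of two: w -> sign(w) * 2**e."""
    sign = np.sign(w)
    e = np.round(np.log2(np.abs(w) + 1e-12))  # nearest power-of-two exponent
    e = np.clip(e, -(2 ** (n_bits - 1)), 0)   # assumes |w| <= 1 after scaling
    return sign * (2.0 ** e), e.astype(int)

# In hardware, multiplying an activation a by 2**e with e <= 0 becomes an
# arithmetic right shift by -e (with the sign applied separately), so a
# Bitshift-and-ACcumulate (BAC) unit can replace the usual MAC unit.
```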
PointPillars Backbone Type Selection For Fast and Accurate LiDAR Object Detection
3D object detection from LiDAR sensor data is an important topic in the
context of autonomous cars and drones. In this paper, we present the results of
experiments on the impact of backbone selection of a deep convolutional neural
network on detection accuracy and computation speed. We chose the PointPillars
network, which is characterised by a simple architecture, high speed, and
modularity that allows for easy expansion. During the experiments, we paid
particular attention to the change in detection efficiency (measured by the mAP
metric) and the total number of multiply-addition operations needed to process
one point cloud. We tested 10 different convolutional neural network
architectures that are widely used in image-based detection problems. For a
backbone like MobilenetV1, we obtained an almost 4x speedup at the cost of a
1.13% decrease in mAP. On the other hand, for CSPDarknet we obtained a speedup
of more than 1.5x together with a 0.33% increase in mAP. We have thus demonstrated
that it is possible to significantly speed up a 3D object detector in LiDAR
point clouds with a small decrease in detection efficiency. This result can be
used when PointPillars or similar algorithms are implemented in embedded
systems, including SoC FPGAs. The code is available at
https://github.com/vision-agh/pointpillars_backbone.
Comment: Accepted for the ICCVG 2022 conference.
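To illustrate the operation-count metric, the sketch below estimates the
multiply-add count of a 2D convolutional backbone using PyTorch forward
hooks; the input shape and the convention of one MAC per multiply-add pair
are assumptions, and the paper may count operations differently.

```python
import torch
import torch.nn as nn

def conv_macs(model, input_shape):
    """Count multiply-adds in all Conv2d layers for one forward pass."""
    macs = 0
    hooks = []

    def hook(m, inp, out):
        nonlocal macs
        k = m.kernel_size[0] * m.kernel_size[1]
        macs += (k * (m.in_channels // m.groups) * m.out_channels
                 * out.shape[-2] * out.shape[-1])

    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            hooks.append(m.register_forward_hook(hook))
    model.eval()
    with torch.no_grad():
        model(torch.zeros(1, *input_shape))
    for h in hooks:
        h.remove()
    return macs
```

For a PointPillars-style backbone one would pass the pseudo-image shape,
e.g. `conv_macs(backbone, (64, 496, 432))` (channel count and BEV grid size
assumed here for illustration).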
Implementation of a perception system for autonomous vehicles using a detection-segmentation network in SoC FPGA
Perception and control systems for autonomous vehicles are an active area of
scientific and industrial research. These solutions should be characterised by
high efficiency in recognising obstacles and other environmental elements in
different road conditions, real-time capability, and energy efficiency.
Achieving such functionality requires an appropriate algorithm and a suitable
computing platform. In this paper, we have used the MultiTaskV3
detection-segmentation network as the basis for a perception system that can
perform both functionalities within a single architecture. It was appropriately
trained, quantised, and implemented on the AMD Xilinx Kria KV260 Vision AI
embedded platform. By using this device, it was possible to parallelise and
accelerate the computations. Furthermore, the whole system consumes relatively
little power compared to a CPU-based implementation (an average of 5 watts,
compared to a minimum of 55 watts for weaker CPUs), and the small size (119 mm
x 140 mm x 36 mm) of the platform allows it to be used in devices where the
amount of available space is limited. The system also achieves more than 97%
mAP (mean average precision) for object detection and more than 90% mIoU (mean
intersection over union) for image segmentation. The article also details the
design of the Mecanum-wheel vehicle that was used to test the proposed
solution in a mock-up city.
Comment: The paper was accepted for the 19th International Symposium on
Applied Reconfigurable Computing - ARC 2023, Cottbus, Germany.
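For context, a minimal sketch of running a quantised model on the KV260's
DPU through the Vitis AI Runtime (VART) Python API is given below; the
.xmodel file name, the int8 tensor type, and the zeroed input stand in for
the authors' actual pre- and post-processing code.

```python
import numpy as np
import vart
import xir

# Load the compiled model and pick the DPU subgraph (hypothetical file name).
graph = xir.Graph.deserialize("multitaskv3.xmodel")
subgraphs = graph.get_root_subgraph().toposort_child_subgraph()
dpu_sg = [s for s in subgraphs
          if s.has_attr("device") and s.get_attr("device").upper() == "DPU"][0]
runner = vart.Runner.create_runner(dpu_sg, "run")

in_t = runner.get_input_tensors()[0]
out_ts = runner.get_output_tensors()

# A preprocessed, quantised image would go here instead of zeros.
image = np.zeros(tuple(in_t.dims), dtype=np.int8)
outputs = [np.empty(tuple(t.dims), dtype=np.int8) for t in out_ts]
job_id = runner.execute_async([image], outputs)
runner.wait(job_id)
# `outputs` now holds the raw detection and segmentation head tensors.
```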