Search CORE

350 research outputs found

Profile driven dataflow optimisation of mean shift visual tracking

Author: Bhowmik Deepayan
Michaelson Greg
Qian Xinyuan
Stewart Robert
Wallace Andrew
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 31/12/2014
Field of study

Profile guided optimisation is a common technique used by compilers and runtime systems to shorten execution runtimes and to optimise locality aware scheduling and memory access on heterogeneous hardware platforms. Some profiling tools trace the execution of low level code, whilst others are designed for abstract models of computation to provide rich domain-specific context in profiling reports. We have implemented mean shift, a computer vision tracking algorithm, in the RVC-CAL dataflow language and use both dynamic runtime and static dataflow profiling mechanisms to identify and eliminate bottlenecks in our naive initial version. We use these profiling reports to tune the CPU scheduler reducing runtime by 88%, and to optimise our dataflow implementation that reduces runtime by a further 43% - an overall runtime reduction of 93%. We also assess the portability of our mean shift optimisations by trading off CPU runtime against resource utilisation on FPGAs. Applying all dataflow optimisations reduces FPGA design space significantly, requiring fewer slice LUTs and less block memory

CiteSeerX

Profile Guided Dataflow Transformation for FPGAs and CPUs

Author: Bhowmik Deepayan
Michaelson Greg
Stewart Robert
Wallace Andrew
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/10/2015
Field of study

This paper proposes a new high-level approach for optimising field programmable gate array (FPGA) designs. FPGA designs are commonly implemented in low-level hardware description languages (HDLs), which lack the abstractions necessary for identifying opportunities for significant performance improvements. Using a computer vision case study, we show that modelling computation with dataflow abstractions enables substantial restructuring of FPGA designs before lowering to the HDL level, and also improve CPU performance. Using the CPU transformations, runtime is reduced by 43 %. Using the FPGA transformations, clock frequency is increased from 67MHz to 110MHz. Our results outperform commercial low-level HDL optimisations, showcasing dataflow program abstraction as an amenable computation model for highly effective FPGA optimisation

Image Processing Using FPGAs

Author: Bailey Donald
Publication venue: 'MDPI AG'
Publication date: 01/01/2019
Field of study

This book presents a selection of papers representing current research on using field programmable gate arrays (FPGAs) for realising image processing algorithms. These papers are reprints of papers selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state-of-the-art on image processing using FPGAs

Low Latency Rendering with Dataflow Architectures

Author: Friston S
Publication venue: UCL (University College London)
Publication date: 28/03/2017
Field of study

The research presented in this thesis concerns latency in VR and synthetic environments. Latency is the end-to-end delay experienced by the user of an interactive computer system, between their physical actions and the perceived response to these actions. Latency is a product of the various processing, transport and buffering delays present in any current computer system. For many computer mediated applications, latency can be distracting, but it is not critical to the utility of the application. Synthetic environments on the other hand attempt to facilitate direct interaction with a digitised world. Direct interaction here implies the formation of a sensorimotor loop between the user and the digitised world - that is, the user makes predictions about how their actions affect the world, and see these predictions realised. By facilitating the formation of the this loop, the synthetic environment allows users to directly sense the digitised world, rather than the interface, and induce perceptions, such as that of the digital world existing as a distinct physical place. This has many applications for knowledge transfer and efficient interaction through the use of enhanced communication cues. The complication is, the formation of the sensorimotor loop that underpins this is highly dependent on the fidelity of the virtual stimuli, including latency. The main research questions we ask are how can the characteristics of dataflow computing be leveraged to improve the temporal fidelity of the visual stimuli, and what implications does this have on other aspects of the fidelity. Secondarily, we ask what effects latency itself has on user interaction. We test the effects of latency on physical interaction at levels previously hypothesized but unexplored. We also test for a previously unconsidered effect of latency on higher level cognitive functions. To do this, we create prototype image generators for interactive systems and virtual reality, using dataflow computing platforms. We integrate these into real interactive systems to gain practical experience of how the real perceptible benefits of alternative rendering approaches, but also what implications are when they are subject to the constraints of real systems. We quantify the differences of our systems compared with traditional systems using latency and objective image fidelity measures. We use our novel systems to perform user studies into the effects of latency. Our high performance apparatuses allow experimentation at latencies lower than previously tested in comparable studies. The low latency apparatuses are designed to minimise what is currently the largest delay in traditional rendering pipelines and we find that the approach is successful in this respect. Our 3D low latency apparatus achieves lower latencies and higher fidelities than traditional systems. The conditions under which it can do this are highly constrained however. We do not foresee dataflow computing shouldering the bulk of the rendering workload in the future but rather facilitating the augmentation of the traditional pipeline with a very high speed local loop. This may be an image distortion stage or otherwise. Our latency experiments revealed that many predictions about the effects of low latency should be re-evaluated and experimenting in this range requires great care

Algorithms and architectures for the multirate additive synthesis of musical tones

Author: Phillips Desmond Keith
Publication venue
Publication date: 01/01/1996
Field of study

In classical Additive Synthesis (AS), the output signal is the sum of a large number of independently controllable sinusoidal partials. The advantages of AS for music synthesis are well known as is the high computational cost. This thesis is concerned with the computational optimisation of AS by multirate DSP techniques. In note-based music synthesis, the expected bounds of the frequency trajectory of each partial in a finite lifecycle tone determine critical time-invariant partial-specific sample rates which are lower than the conventional rate (in excess of 40kHz) resulting in computational savings. Scheduling and interpolation (to suppress quantisation noise) for many sample rates is required, leading to the concept of Multirate Additive Synthesis (MAS) where these overheads are minimised by synthesis filterbanks which quantise the set of available sample rates. Alternative AS optimisations are also appraised. It is shown that a hierarchical interpretation of the QMF filterbank preserves AS generality and permits efficient context-specific adaptation of computation to required note dynamics. Practical QMF implementation and the modifications necessary for MAS are discussed. QMF transition widths can be logically excluded from the MAS paradigm, at a cost. Therefore a novel filterbank is evaluated where transition widths are physically excluded. Benchmarking of a hypothetical orchestral synthesis application provides a tentative quantitative analysis of the performance improvement of MAS over AS. The mapping of MAS into VLSI is opened by a review of sine computation techniques. Then the functional specification and high-level design of a conceptual MAS Coprocessor (MASC) is developed which functions with high autonomy in a loosely-coupled master- slave configuration with a Host CPU which executes filterbanks in software. Standard hardware optimisation techniques are used, such as pipelining, based upon the principle of an application-specific memory hierarchy which maximises MASC throughput

Durham e-Theses

Hardware acceleration of the trace transform for vision applications

Author: Fahmy Suhaib A.
Publication venue: Department of Electrical and Electronic Engineering, Imperial College London
Publication date: 01/01/2008
Field of study

Computer Vision is a rapidly developing field in which machines process visual data to extract meaningful information. Digitised images in their pixels and bits serve no purpose of their own. It is only by interpreting the data, and extracting higher level information that a scene can be understood. The algorithms that enable this process are often complex, and data-intensive, limiting the processing rate when implemented in software. Hardware-accelerated implementations provide a significant performance boost that can enable real- time processing. The Trace Transform is a newly proposed algorithm that has been proven effective in image categorisation and recognition tasks. It is flexibly defined allowing the mathematical details to be tailored to the target application. However, it is highly computationally intensive, which limits its applications. Modern heterogeneous FPGAs provide an ideal platform for accelerating the Trace transform for real-time performance, while also allowing an element of flexibility, which highly suits the generality of the Trace transform. This thesis details the implementation of an extensible Trace transform architecture for vision applications, before extending this architecture to a full flexible platform suited to the exploration of Trace transform applications. As part of the work presented, a general set of architectures for large-windowed median and weighted median filters are presented as required for a number of Trace transform implementations. Finally an acceleration of Pseudo 2-Dimensional Hidden Markov Model decoding, usable in a person detection system, is presented. Such a system can be used to extract frames of interest from a video sequence, to be subsequently processed by the Trace transform. All these architectures emphasise the need for considered, platform-driven design in achieving maximum performance through hardware acceleration

Spiral - Imperial College Digital Repository

Deep learning : enhancing the security of software-defined networks

Author: Dawoud Ahmed
Publication venue: 'American Psychological Association (APA)'
Publication date: 01/01/2019
Field of study

Software-defined networking (SDN) is a communication paradigm that promotes network flexibility and programmability by separating the control plane from the data plane. SDN consolidates the logic of network devices into a single entity known as the controller. SDN raises significant security challenges related to its architecture and associated characteristics such as programmability and centralisation. Notably, security flaws pose a risk to controller integrity, confidentiality and availability. The SDN model introduces separation of the forwarding and control planes. It detaches the control logic from switching and routing devices, forming a central plane or network controller that facilitates communications between applications and devices. The architecture enhances network resilience, simplifies management procedures and supports network policy enforcement. However, it is vulnerable to new attack vectors that can target the controller. Current security solutions rely on traditional measures such as firewalls or intrusion detection systems (IDS). An IDS can use two different approaches: signature-based or anomaly-based detection. The signature-based approach is incapable of detecting zero-day attacks, while anomaly-based detection has high false-positive and false-negative alarm rates. Inaccuracies related to false-positive attacks may have significant consequences, specifically from threats that target the controller. Thus, improving the accuracy of the IDS will enhance controller security and, subsequently, SDN security. A centralised network entity that controls the entire network is a primary target for intruders. The controller is located at a central point between the applications and the data plane and has two interfaces for plane communications, known as northbound and southbound, respectively. Communications between the controller, the application and data planes are prone to various types of attacks, such as eavesdropping and tampering. The controller software is vulnerable to attacks such as buffer and stack overflow, which enable remote code execution that can result in attackers taking control of the entire network. Additionally, traditional network attacks are more destructive. This thesis introduces a threat detection approach aimed at improving the accuracy and efficiency of the IDS, which is essential for controller security. To evaluate the effectiveness of the proposed framework, an empirical study of SDN controller security was conducted to identify, formalise and quantify security concerns related to SDN architecture. The study explored the threats related to SDN architecture, specifically threats originating from the existence of the control plane. The framework comprises two stages, involving the use of deep learning (DL) algorithms and clustering algorithms, respectively. DL algorithms were used to reduce the dimensionality of inputs, which were forwarded to clustering algorithms in the second stage. Features were compressed to a single value, simplifying and improving the performance of the clustering algorithm. Rather than using the output of the neural network, the framework presented a unique technique for dimensionality reduction that used a single value—reconstruction error—for the entire input record. The use of a DL algorithm in the pre-training stage contributed to solving the problem of dimensionality related to k-means clustering. Using unsupervised algorithms facilitated the discovery of new attacks. Further, this study compares generative energy-based models (restricted Boltzmann machines) with non-probabilistic models (autoencoders). The study implements TensorFlow in four scenarios. Simulation results were statistically analysed using a confusion matrix, which was evaluated and compared with similar related works. The proposed framework, which was adapted from existing similar approaches, resulted in promising outcomes and may provide a robust prospect for deployment in modern threat detection systems in SDN. The framework was implemented using TensorFlow and was benchmarked to the KDD99 dataset. Simulation results showed that the use of the DL algorithm to reduce dimensionality significantly improved detection accuracy and reduced false-positive and false-negative alarm rates. Extensive simulation studies on benchmark tasks demonstrated that the proposed framework consistently outperforms all competing approaches. This improvement is a further step towards the development of a reliable IDS to enhance the security of SDN controllers

An FPGA implementation of an investigative many-core processor, Fynbos : in support of a Fortran autoparallelising software pipeline

Author: Wyngaard Janet Ruth
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2014
Field of study

Includes bibliographical references.In light of the power, memory, ILP, and utilisation walls facing the computing industry, this work examines the hypothetical many-core approach to finding greater compute performance and efficiency. In order to achieve greater efficiency in an environment in which Moore’s law continues but TDP has been capped, a means of deriving performance from dark and dim silicon is needed. The many-core hypothesis is one approach to exploiting these available transistors efficiently. As understood in this work, it involves trading in hardware control complexity for hundreds to thousands of parallel simple processing elements, and operating at a clock speed sufficiently low as to allow the efficiency gains of near threshold voltage operation. Performance is there- fore dependant on exploiting a new degree of fine-grained parallelism such as is currently only found in GPGPUs, but in a manner that is not as restrictive in application domain range. While removing the complex control hardware of traditional CPUs provides space for more arithmetic hardware, a basic level of control is still required. For a number of reasons this work chooses to replace this control largely with static scheduling. This pushes the burden of control primarily to the software and specifically the compiler, rather not to the programmer or to an application specific means of control simplification. An existing legacy tool chain capable of autoparallelising sequential Fortran code to the degree of parallelism necessary for many-core exists. This work implements a many-core architecture to match it. Prototyping the design on an FPGA, it is possible to examine the real world performance of the compiler-architecture system to a greater degree than simulation only would allow. Comparing theoretical peak performance and real performance in a case study application, the system is found to be more efficient than any other reviewed, but to also significantly under perform relative to current competing architectures. This failing is apportioned to taking the need for simple hardware too far, and an inability to implement static scheduling mitigating tactics due to lack of support for such in the compiler

The University Defence Research Collaboration In Signal Processing

Author: Aparicio-Navarro Francisco J.
Barrington Stephen
Chambers Jonathon A.
Gong Yu
Kyriakopoulos Konstantinos
Rixson Matthew
Publication venue: Defence Science and Technology Laboratory (Dstl) publication, DSTL/PUB107185.
Publication date: 08/02/2018
Field of study

This chapter describes the development of algorithms for automatic detection of anomalies from multi-dimensional, undersampled and incomplete datasets. The challenge in this work is to identify and classify behaviours as normal or abnormal, safe or threatening, from an irregular and often heterogeneous sensor network. Many defence and civilian applications can be modelled as complex networks of interconnected nodes with unknown or uncertain spatio-temporal relations. The behavior of such heterogeneous networks can exhibit dynamic properties, reflecting evolution in both network structure (new nodes appearing and existing nodes disappearing), as well as inter-node relations. The UDRC work has addressed not only the detection of anomalies, but also the identification of their nature and their statistical characteristics. Normal patterns and changes in behavior have been incorporated to provide an acceptable balance between true positive rate, false positive rate, performance and computational cost. Data quality measures have been used to ensure the models of normality are not corrupted by unreliable and ambiguous data. The context for the activity of each node in complex networks offers an even more efficient anomaly detection mechanism. This has allowed the development of efficient approaches which not only detect anomalies but which also go on to classify their behaviour

De Montfort University Open Research Archive

Real-time synthetic primate vision

Author: Dankers Andrew
Publication venue
Publication date: 17/08/2018
Field of study

The Australian National University