
    Improving low latency applications for reconfigurable devices

    This thesis seeks to improve low latency application performance via architectural improvements in reconfigurable devices. This is achieved by improving resource utilisation and access, and by exploiting the different environments within which reconfigurable devices are deployed. Our first contribution leverages devices deployed at the network level to enable the low latency processing of financial market data feeds. Financial exchanges transmit messages via two identical data feeds to reduce the chance of message loss. We present an approach to arbitrate these redundant feeds at the network level using a Field-Programmable Gate Array (FPGA). With support for any messaging protocol, we evaluate our design using the NASDAQ TotalView-ITCH, OPRA, and ARCA data feed protocols, and provide two simultaneous outputs: one prioritising low latency, and one prioritising high reliability with three dynamically configurable windowing methods. Our second contribution is a new ring-based architecture for low latency, parallel access to FPGA memory. Traditional FPGA memory is formed by grouping block memories (BRAMs) together and accessing them as a single device. Our architecture accesses these BRAMs independently and in parallel. Targeting memory-based computing, which stores pre-computed function results in memory, our architecture benefits low latency applications that rely on highly complex functions, iterative computation, or many parallel accesses to a shared resource. We assess square root, power, trigonometric, and hyperbolic functions within the FPGA, and provide a tool to convert Python functions to our new architecture. Our third contribution extends the ring-based architecture to support any FPGA processing element. We unify E heterogeneous processing elements within compute pools, with each element implementing the same function, and the pool serving D parallel function calls. Our implementation-agnostic approach supports processing elements with different latencies, implementations, and pipeline lengths, as well as non-deterministic latencies. Compute pools evenly balance access to processing elements across the entire application, and are evaluated by implementing eight different neural network activation functions within an FPGA.
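    As a small illustration of the memory-based computing idea in this abstract, the Python sketch below pre-computes a function into a lookup table and interleaves its entries across several independent banks standing in for BRAMs accessed in parallel. The bank count, interleaving scheme, and function names are illustrative assumptions, not the thesis's actual tool or ring architecture.

```python
import math

def build_lut(fn, n_entries, x_min, x_max, n_banks):
    """Pre-compute fn over [x_min, x_max] and interleave the results
    across n_banks independent memories (stand-ins for BRAMs)."""
    step = (x_max - x_min) / (n_entries - 1)
    banks = [[] for _ in range(n_banks)]
    for i in range(n_entries):
        banks[i % n_banks].append(fn(x_min + i * step))
    return banks, step

def lookup(banks, step, x_min, xs):
    """Serve several 'parallel' lookups; interleaved indices mean
    neighbouring requests tend to land in different banks."""
    n_banks = len(banks)
    results = []
    for x in xs:
        i = round((x - x_min) / step)
        results.append(banks[i % n_banks][i // n_banks])
    return results

banks, step = build_lut(math.sqrt, n_entries=1024, x_min=0.0, x_max=4.0, n_banks=4)
print(lookup(banks, step, 0.0, [0.25, 1.0, 2.25, 4.0]))
```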

    Achievable information rates for nonlinear frequency division multiplexed fibre optic systems

    Fibre optic infrastructure is critical to meet the high data rate and long-distance communication requirements of modern networks. Recent developments in wireless communication technologies, such as 5G and 6G, offer the potential for ultra-high data rates and low-latency communication within a single cell. However, to extend this high performance to the backbone network, the data rate of the fibre optic connection between wireless base stations may become a bottleneck due to the capacity crunch phenomenon induced by the signal-dependent Kerr nonlinear effect. To address this, the nonlinear Fourier transform (NFT) is proposed as a solution to resolve the Kerr nonlinearity and linearise the nonlinear evolution of time domain pulses in the nonlinear frequency domain (NFD) for a lossless and noiseless fibre. Nonlinear frequency division multiplexing (NFDM), which encodes information on the NFD using the discrete and continuous spectra revealed by the NFT, is also proposed. However, implementing such signalling in a fibre perturbed by optical amplifier noise results in complicated, signal-dependent noise in the NFD, whose signal-dependent statistics and unknown model make estimating the capacity of such a system an open problem. In this thesis, the solitonic part of the NFD, the discrete spectrum, is first studied. Modulating the information in the amplitude of the soliton pulse, the maximum time-scaled mutual information is estimated. Such a definition allows us to directly incorporate the dependence of the soliton pulse width on its amplitude into the capacity formulation. The commonly used memoryless channel model based on the noncentral chi-squared distribution is initially considered. Applying a variance normalising transform, this channel is approximated by a unit-variance additive white Gaussian noise (AWGN) model. Based on a numerical capacity analysis of the approximated AWGN channel, a general form of capacity-approaching input distributions is determined. These optimal distributions are discrete, comprising a mass point at zero (off symbol) and a finite number of mass points almost uniformly distributed away from zero. Using this general form of input distributions, a novel closed-form approximation of the capacity is determined, showing a good match to numerical results. Mismatch capacity bounds are then developed based on split-step simulations of the nonlinear Schrödinger equation, considering both single-soliton and soliton-sequence transmissions. This relaxes the initial assumption of a memoryless channel to show the impact of both inter-soliton interaction and Gordon-Haus effects. Our results show that the inter-soliton interaction effect becomes increasingly significant at higher soliton amplitudes and would be the dominant impairment compared to the timing jitter induced by the Gordon-Haus effect. Next, the intrinsic soliton interaction, the Gordon-Haus effect, and their coupled perturbation of the soliton system are visualised. The feasibility of employing an artificial neural network to resolve the inter-soliton interaction, which is the dominant impairment in higher power regimes, is investigated. A method is suggested to improve the achievable information rate of an amplitude-modulated soliton communication system using a classification neural network against the inter-soliton interaction.
Significant gain is demonstrated not only over the eigenvalue estimation of the nonlinear Fourier transform, but also over the continuous-spectrum and eigenvalue-correlation assisted detection scheme in the literature. Lastly, the nonsolitonic part of the NFT, the continuous spectrum, is exploited. An approximate channel model is proposed for direct signalling on the continuous spectrum of an NFDM communication system, describing the effect of noise and nonlinearity at the receiver. The optimal input distribution that maximises the mutual information of the proposed approximated channel under a peak amplitude constraint is then studied. We show that, considering the input dependency of the noise, conventional amplitude-constrained constellation designs can be shaped geometrically to provide significant mutual information gains. However, it is observed that further probabilistic shaping and constellation size optimisation provide only limited additional gains beyond the best geometrically shaped counterpart, 64-ary amplitude phase shift keying. Then, an approximated channel model that neglects the correlation between subcarriers is proposed for the matched-filtered signalling system, based on which the input constellation is shaped geometrically. We demonstrate that, although the inter-subcarrier interference in the filtered system is not included in the channel model, shaping of the matched-filtered system can provide promising gains in mismatch capacity over the unfiltered scenario.
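    As a small illustration of the variance-normalising transform mentioned in this abstract, the sketch below draws samples from a memoryless noncentral chi-squared model and applies a square-root transform, whose empirical variance settles near one as the noncentrality grows. The degrees of freedom, noncentrality values, and sample counts are assumptions for illustration, not the thesis's actual channel parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 2          # degrees of freedom (assumed for illustration)
n = 200_000    # samples per amplitude level

# The noncentrality parameter stands in for the (scaled) transmitted soliton amplitude.
for lam in (5.0, 20.0, 80.0, 320.0):
    y = rng.noncentral_chisquare(k, lam, size=n)   # memoryless channel output
    z = np.sqrt(y)                                  # variance-normalising transform
    # Delta method: Var(sqrt(Y)) ~= Var(Y) / (4 E[Y]) = 2(k + 2*lam) / (4*(k + lam)) -> 1
    print(f"lambda={lam:6.1f}  empirical var={z.var():.3f}")
```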

    Passive IoT Device-Type Identification Using Few-Shot Learning

    The ever-growing number and diversity of connected devices have contributed to rising network security challenges. Vulnerable and unauthorized devices may pose a significant security risk with severe consequences. Device-type identification is instrumental in reducing risk and thwarting cyberattacks that may be caused by vulnerable devices. At present, IoT device identification methods use traditional machine learning or deep learning techniques, which require a large amount of labeled data to generate device fingerprints. Moreover, these techniques require building a new model whenever a new device is introduced. To address these limitations, we propose a few-shot learning approach based on Siamese neural networks that identifies the type of IoT device connected to a network by analyzing its network communications, and which can be effective under conditions of insufficient labeled data and/or resources. We evaluate our method on data obtained from real-world IoT devices. The experimental results show the effectiveness of the proposed method even with a small number of data samples. They also indicate that our approach outperforms IoT Sentinel, the state-of-the-art approach for IoT fingerprinting, by an additional 10% in accuracy.
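    A minimal sketch of the Siamese-network idea behind such an approach is given below, assuming PyTorch, a fixed-length flow-feature vector per device, and a standard contrastive loss; the feature dimension, layer sizes, and margin are illustrative assumptions and are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    """Shared encoder mapping a flow-feature vector to an embedding.
    The 64-dimensional input and layer sizes are illustrative assumptions."""
    def __init__(self, in_dim=64, emb_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )

    def forward(self, x):
        return self.net(x)

def contrastive_loss(z1, z2, same_type, margin=1.0):
    """Pull embeddings of the same device type together; push different
    types at least `margin` apart."""
    d = F.pairwise_distance(z1, z2)
    return (same_type * d.pow(2) +
            (1 - same_type) * F.relu(margin - d).pow(2)).mean()

# Toy forward/backward pass on random data.
enc = SiameseEncoder()
x1, x2 = torch.randn(8, 64), torch.randn(8, 64)
same_type = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(enc(x1), enc(x2), same_type)
loss.backward()
print(float(loss))
```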

    A variational autoencoder application for real-time anomaly detection at CMS

    Despite providing invaluable data in the field of High Energy Physics, the Large Hadron Collider (LHC) will, towards higher-luminosity runs, face challenges in discovering interesting results through the conventional methods used in previous run periods. Among the proposed approaches, the one we focus on in this thesis work, carried out in collaboration with CERN teams, involves the use of a joint variational autoencoder (JointVAE) machine learning model, trained on known physics processes to identify anomalous events that correspond to previously unidentified physics signatures. By doing so, this method does not rely on any specific new physics signature and can detect anomalous events in an unsupervised manner, complementing the traditional LHC search tactics that rely on model-dependent hypothesis testing. The algorithm produces a list of anomalous events, which experimental collaborations will examine and eventually confirm as new physics phenomena. Furthermore, repetitive event topologies in the dataset can inspire new physics model building and experimental searches. Implementing this algorithm in the trigger system of the LHC experiments makes it possible to detect previously unnoticed anomalous events, thus broadening the discovery potential of the LHC. This thesis presents a method for implementing the JointVAE model for real-time anomaly detection in the Compact Muon Solenoid (CMS) experiment. Among the challenges of implementing machine learning models in fast applications, such as the trigger system of the LHC experiments, low latency and reduced resource consumption are essential. Therefore, the JointVAE model has been studied for its implementation feasibility in Field-Programmable Gate Arrays (FPGAs), utilizing a tool based on High-Level Synthesis (HLS) named HLS4ML. The tool, combined with the quantization of neural networks, reduces the model size, latency, and energy consumption.
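    For illustration, the sketch below shows a minimal continuous-latent VAE and an ELBO-style anomaly score of the kind such models use to flag poorly reconstructed events. It assumes PyTorch and generic event-feature vectors, omits the discrete latent that distinguishes JointVAE, and does not model the HLS4ML/FPGA deployment path.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Minimal continuous-latent VAE; the real JointVAE adds a discrete latent
    and is trained on CMS trigger-level features, which are not modelled here."""
    def __init__(self, in_dim=64, latent_dim=8):
        super().__init__()
        self.enc = nn.Linear(in_dim, 2 * latent_dim)   # outputs mean and log-variance
        self.dec = nn.Linear(latent_dim, in_dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterisation
        return self.dec(z), mu, logvar

def anomaly_score(model, x):
    """Events the model reconstructs poorly (high ELBO-style loss) are flagged."""
    recon, mu, logvar = model(x)
    recon_err = F.mse_loss(recon, x, reduction="none").sum(dim=-1)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)
    return recon_err + kl

model = TinyVAE()
events = torch.randn(4, 64)     # stand-in for trigger-level event features
print(anomaly_score(model, events))
```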

    Radar-Based Multi-Target Classification Using Deep Learning

    Real-time, radar-based human activity and target recognition has several applications in various fields. Examples include hand gesture recognition, border and home surveillance, pedestrian recognition for automotive safety, and fall detection for assisted living. This dissertation sought to improve the speed and accuracy of a previously developed model classifying human activity and targets using radar data for outdoor surveillance purposes. An improvement in the accuracy and speed of classification helps surveillance systems provide reliable results on time. For example, the results can be used to intercept trespassers, poachers, or smugglers. To achieve these objectives, radar data was collected using a C-band pulse-Doppler radar and converted to spectrograms using the Short-time Fourier transform (STFT) algorithm. Spectrograms of the following classes were utilised in classification: one human walking, two humans walking, one human running, moving vehicles, a swinging sphere, and clutter/noise. A seven-layer residual network was proposed, which utilised batch normalisation (BN), global average pooling (GAP), and residual connections to achieve a classification accuracy of 92.90% and 87.72% on the validation and test data, respectively. Compared to the previously proposed model, this represented a 10% improvement in accuracy on the validation data and a 3% improvement on the test data. Applying model quantisation provided up to a 3.8 times speedup in inference, with a less than 0.4% accuracy drop on both the validation and test data. The quantised model could support a range of up to 89.91 kilometres in real time, allowing it to be used in radars that operate within this range.
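    As a sketch of the spectrogram front end described in this abstract, the following Python code applies scipy's STFT to a toy complex slow-time radar return; the pulse repetition frequency, window length, and simulated signal are illustrative assumptions, not the parameters of the C-band radar used in the dissertation.

```python
import numpy as np
from scipy.signal import stft

fs = 1000.0                      # pulse repetition frequency in Hz (assumed)
t = np.arange(0, 2.0, 1 / fs)

# Toy slow-time return: a constant body Doppler line plus a weak sinusoidal
# phase modulation standing in for micro-Doppler from limb or blade motion.
x = (np.exp(2j * np.pi * 60 * t)
     + 0.3 * np.exp(2j * np.pi * 25 * np.sin(2 * np.pi * 1.5 * t)))

f, tt, Zxx = stft(x, fs=fs, nperseg=128, noverlap=96, return_onesided=False)
spectrogram_db = 20 * np.log10(np.abs(Zxx) + 1e-12)
print(spectrogram_db.shape)      # (frequency bins, time frames) fed to the classifier
```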

    Evaluating cognitive load of text-to-speech synthesis

    This thesis addresses the vital topic of evaluating synthetic speech and its impact on the end-user, taking into consideration potential negative implications for cognitive load. While conventional methods like transcription tests and Mean Opinion Score (MOS) tests offer a valuable overall understanding of system performance, they fail to provide deeper insights into the reasons behind that performance. As text-to-speech (TTS) systems are increasingly used in real-world applications, it becomes crucial to explore whether synthetic speech imposes a greater cognitive load on listeners compared to human speech, as excessive cognitive effort could lead to fatigue over time. The study focuses on assessing the cognitive load of synthetic speech by presenting two methodologies: the dual-task paradigm and pupillometry. The dual-task paradigm initially seemed promising but was eventually deemed unreliable and unsuitable due to uncertainties in the experimental setup, which require further investigation. However, pupillometry emerged as a viable approach, demonstrating its efficacy in detecting differences in cognitive load among various speech synthesizers. Notably, the research confirmed that accurate measurement of listening difficulty requires imposing sufficient cognitive load on listeners. To achieve this, the most viable experimental setup involved measuring the pupil response while listening to speech in the presence of noise. Through these experiments, intriguing contrasts between human and synthetic speech were revealed. Human speech consistently demanded the least cognitive load. On the other hand, state-of-the-art TTS systems showed promising results, indicating a significant improvement in their cognitive load performance compared to the rule-based synthesizers of the past. Pupillometry offers a deeper understanding of the factors contributing to increased cognitive load in synthetic speech processing. In particular, one experiment highlighted that the separate modeling of spectral feature prediction and duration in TTS systems led to heightened cognitive load. Encouragingly, however, many modern end-to-end TTS systems have addressed these issues by predicting acoustic features within a unified framework, thus effectively reducing the overall cognitive load imposed by synthetic speech. As the gap between human and synthetic speech diminishes with advancements in TTS technology, continuous evaluation using pupillometry remains essential for optimizing TTS systems for low cognitive load. Although pupillometry demands advanced analysis techniques and is time-consuming, the meaningful insights it provides into the cognitive load of synthetic speech contribute to an enhanced user experience and better TTS system development. Overall, this work successfully establishes pupillometry as a viable and effective method for measuring the cognitive load of synthetic speech, propelling synthetic speech evaluation beyond traditional metrics. By gaining a deeper understanding of synthetic speech's interaction with the human cognitive processing system, researchers and developers can work towards creating TTS systems that offer improved user experiences with reduced cognitive load, ultimately enhancing the overall usability and acceptance of such technologies. Note: there was a two-year break in the work reported in this thesis; an initial pilot was performed in early 2020 and the work was then suspended due to the COVID-19 pandemic. Experiments were therefore rerun in 2022/23 with the most recent state-of-the-art models so that we could determine whether the increased cognitive load result still applies. The thesis thus concludes by answering whether the cognitive load methods developed here are still useful, practical, and/or relevant for current state-of-the-art text-to-speech systems.
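    A minimal sketch of one common pupillometry measure, baseline-corrected task-evoked pupil dilation, is shown below; the sampling rate, baseline window, and toy trace are assumptions for illustration and do not reproduce the analysis pipeline used in this thesis.

```python
import numpy as np

def evoked_dilation(pupil_trace, fs, baseline_s=1.0):
    """Baseline-corrected task-evoked pupil response for one trial:
    subtract the mean pupil size over the pre-stimulus baseline window,
    then report the mean and peak dilation during the stimulus."""
    n_base = int(baseline_s * fs)
    baseline = pupil_trace[:n_base].mean()
    response = pupil_trace[n_base:] - baseline
    return response.mean(), response.max()

fs = 60                                   # eye-tracker sampling rate in Hz (assumed)
trial = 4.5 + 0.2 * np.sin(np.linspace(0, np.pi, 6 * fs))   # toy 6-second trace in mm
print(evoked_dilation(trial, fs))
```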

    Ensembles of Pruned Deep Neural Networks for Accurate and Privacy Preservation in IoT Applications

    The emergence of the AIoT (Artificial Intelligence of Things) represents the powerful convergence of Artificial Intelligence (AI) with the expansive realm of the Internet of Things (IoT). By integrating AI algorithms with the vast network of interconnected IoT devices, we open new doors for intelligent decision-making and edge data analysis, transforming various domains from healthcare and transportation to agriculture and smart cities. However, this integration raises pivotal questions: How can we ensure deep learning models are aptly compressed and quantised to operate seamlessly on devices constrained by computational resources, without compromising accuracy? How can these models be effectively tailored to cope with the challenges of statistical heterogeneity and the uneven distribution of class labels inherent in IoT applications? Furthermore, in an age where data is a currency, how do we uphold the sanctity of privacy for the sensitive data that IoT devices incessantly generate while also ensuring the unhampered deployment of these advanced deep learning models? Addressing these intricate challenges forms the crux of this thesis, with its contributions delineated as follows. Ensyth: a novel approach designed to synthesise pruned ensembles of deep learning models, which not only makes optimal use of limited IoT resources but also ensures a notable boost in predictability. Experimental evidence gathered from the CIFAR-10, CIFAR-5, and MNIST-FASHION datasets solidifies its merit, especially given its capacity to achieve high predictability. MicroNets: venturing into the realm of efficiency, this is a multi-phase pruning pipeline that fuses the principles of weight pruning and channel pruning. Its objective is clear: foster efficient deep ensemble learning, specially crafted for IoT devices. Benchmark tests conducted on the CIFAR-10 and CIFAR-100 datasets demonstrate its prowess, highlighting a compression ratio of nearly 92%, with these pruned ensembles surpassing the accuracy metrics set by conventional models. FedNets: recognising the challenges of statistical heterogeneity in federated learning and the ever-growing concerns of data privacy, this innovative federated learning framework is introduced. It facilitates edge devices in their collaborative quest to train ensembles of pruned deep neural networks. More than just training, it ensures data privacy remains uncompromised. Evaluations conducted on the Federated CIFAR-100 dataset offer a testament to its efficacy. In this thesis, substantial contributions have been made to the AIoT application domain. Ensyth, MicroNets, and FedNets collaboratively tackle the challenges of efficiency, accuracy, statistical heterogeneity arising from distributed class labels, and privacy concerns inherent in deploying AI applications on IoT devices. The experimental results underscore the effectiveness of these approaches, paving the way for their practical implementation in real-world scenarios. By offering an integrated solution that satisfies multiple key requirements simultaneously, this research brings us closer to the realisation of effective and privacy-preserving AIoT systems.
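    As a sketch of the magnitude-based weight pruning underlying pipelines of this kind, the following Python code zeroes out the smallest-magnitude weights of a layer to a target sparsity; the 92% figure echoes the reported compression ratio, but the function and layer shape are illustrative assumptions rather than the Ensyth or MicroNets implementation.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning).
    Channel pruning, which removes whole filters, is not sketched here."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(128, 64))          # one dense layer's weight matrix
w_pruned = magnitude_prune(w, sparsity=0.92)
print(f"kept {np.count_nonzero(w_pruned) / w.size:.2%} of weights")
```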

    Intelligent Sensing and Learning for Advanced MIMO Communication Systems
