
    Recent Advances in Embedded Computing, Intelligence and Applications

    The recent proliferation of Internet of Things deployments and edge computing, combined with artificial intelligence, has led to exciting new application scenarios in which embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately foster the offloading of processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems.

    A system to predict the S&P 500 using a bio-inspired algorithm

    The goal of this research was to develop an algorithmic system capable of predicting the directional trend of the S&P 500 financial index. The approach I have taken was inspired by the biology of the human retina. Extensive research has been published attempting to predict different financial markets using historical data, testing on an in-sample and trend basis, with many employing sophisticated mathematical techniques. In reviewing and evaluating these in-sample methodologies, it became evident that this approach was unable to achieve sufficiently reliable prediction performance for commercial exploitation. For these reasons, I moved to an out-of-sample strategy and am able to predict tomorrow's (t+1) directional trend of the S&P 500 with 55.1% accuracy. The key elements that underpin my bio-inspired out-of-sample system are: (1) identification of 51 financial market data (FMD) inputs, including other indices, currency pairs and swap rates, that affect the 500 component companies of the S&P 500; (2) the use of an extensive historical data set comprising the actual daily closing prices of the chosen 51 FMD inputs and the S&P 500; and (3) the ability to compute this large data set in a time frame of less than 24 hours. The data set was fed into a linear regression algorithm to determine the predicted value of tomorrow's (t+1) S&P 500 closing price. This process was initially carried out in MATLAB, which proved the concept of my approach, but requirement (3) above was not met. In order to handle such a large data set and complete the prediction target on time, I adopted a novel graphics processing unit (GPU) based computational architecture. Through extensive optimisation of my GPU engine, I achieved a speed-up of 150x, sufficient to meet (3). In achieving my optimum directional trend of 55.1%, an extensive range of tests exploring a number of trade-offs was carried out using an 8-year data set. The results I have obtained will form the basis of a commercial investment fund. It should be noted that my algorithm uses financial data from the past 60 days, and as such would not be able to predict rapid market changes such as a stock market crash.
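    The abstract does not spell out the regression formulation, but the pipeline it describes, a rolling window of daily closes for 51 FMD inputs feeding a linear regression that predicts the t+1 S&P 500 close, can be sketched as follows. The 60-day window and the 51 inputs come from the abstract; the ordinary-least-squares setup, the synthetic data and all names are illustrative assumptions, not the author's implementation.

    # Minimal sketch, assuming one OLS fit per prediction day over a rolling
    # 60-day window of 51 FMD input series (synthetic data stands in for the
    # actual closing prices, which are not reproduced here).
    import numpy as np

    def predict_next_close(fmd_window: np.ndarray, spx_window: np.ndarray) -> float:
        """Fit OLS mapping FMD inputs -> next-day S&P 500 close over the past
        60 days, then predict tomorrow's (t+1) close from today's inputs."""
        X = fmd_window[:-1]                      # days t-59 .. t-1, shape (59, 51)
        y = spx_window[1:]                       # next-day closes aligned with X
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return float(fmd_window[-1] @ coef)      # prediction for day t+1

    rng = np.random.default_rng(0)
    fmd = rng.normal(size=(60, 51))              # 60 days x 51 FMD inputs (synthetic)
    spx = np.cumsum(rng.normal(size=60)) + 4000.0

    pred = predict_next_close(fmd, spx)
    print(f"predicted t+1 close: {pred:.2f} ({'up' if pred > spx[-1] else 'down'})")

    The directional trend is simply the sign of the predicted change relative to today's close; the 150x GPU speed-up reported above concerns scaling this computation to the full historical data set within the 24-hour budget.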

    Inference in supervised spectral classifiers for on-board hyperspectral imaging: An overview

    Machine learning techniques are widely used for pixel-wise classification of hyperspectral images. These methods can achieve high accuracy, but most of them are computationally intensive. This poses a problem for their implementation in low-power and embedded systems intended for on-board processing, in which energy consumption and model size are as important as accuracy. With a focus on embedded and on-board systems (in which only the inference step is performed after an off-line training process), in this paper we provide a comprehensive overview of the inference properties of the most relevant techniques for hyperspectral image classification. For this purpose, we compare the size of the trained models and the operations required during the inference step (which are directly related to the hardware and energy requirements). Our goal is to search for appropriate trade-offs between on-board implementation (such as model size and energy consumption) and classification accuracy.
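    As a back-of-the-envelope illustration of the kind of comparison the overview carries out, the sketch below estimates parameter counts and per-pixel multiply-accumulate (MAC) operations for two generic pixel-wise classifiers. The band count, class count and hidden-layer size are illustrative assumptions, not figures from the paper.

    # Rough inference-cost estimates for two generic per-pixel classifiers.
    def linear_classifier_cost(n_bands: int, n_classes: int):
        params = n_classes * (n_bands + 1)       # weights + biases
        macs = n_classes * n_bands               # one dot product per class
        return params, macs

    def mlp_cost(n_bands: int, hidden: int, n_classes: int):
        params = hidden * (n_bands + 1) + n_classes * (hidden + 1)
        macs = hidden * n_bands + n_classes * hidden
        return params, macs

    n_bands, n_classes = 200, 16                 # typical hyperspectral scale (assumed)
    costs = {
        "linear": linear_classifier_cost(n_bands, n_classes),
        "mlp-64": mlp_cost(n_bands, 64, n_classes),
    }
    for name, (p, m) in costs.items():
        print(f"{name:8s} params={p:6d}  MACs/pixel={m:6d}")

    Model size (parameters times bytes per weight) and MACs per pixel are exactly the hardware- and energy-related quantities that such an overview weighs against classification accuracy.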

    Automated optimization of reconfigurable designs

    Currently, the optimization of reconfigurable design parameters is typically done manually and often involves a substantial amount of effort. The main focus of this thesis is to reduce this effort: the designer can focus on the implementation and design correctness, leaving the tools to carry out optimization. To this end, this thesis makes three main contributions. First, we present an initial investigation of reconfigurable design optimization with the Machine Learning Optimizer (MLO) algorithm. The algorithm is based on surrogate model technology and particle swarm optimization. By using surrogate models, the long hardware generation time is mitigated and automatic optimization is possible. For the first time, to the best of our knowledge, we show how those models can predict both when hardware generation will fail and how well the design will perform. Second, we introduce a new algorithm called Automatic Reconfigurable Design Efficient Global Optimization (ARDEGO), which is based on the Efficient Global Optimization (EGO) algorithm. Compared to MLO, it supports parallelism and uses a simpler optimization loop. As the ARDEGO algorithm uses multiple optimization compute nodes, its optimization speed is greatly improved relative to MLO. Hardware generation time is random in nature; two similar configurations can take vastly different amounts of time to generate, which makes parallelization complicated. The novelty is the efficient use of the optimization compute nodes, achieved through an extension of the asynchronous parallel EGO algorithm to constrained problems. Third, we show how the results of design synthesis and benchmarking can be reused when a design is ported to a different platform or when its code is revised. This is achieved through the new Auto-Transfer algorithm. A methodology to make the best use of available synthesis and benchmarking results is a novel contribution to the design automation of reconfigurable systems.
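    For readers unfamiliar with EGO-style surrogate optimization, here is a minimal sequential sketch of the core loop: fit a Gaussian-process surrogate to costly design evaluations, then pick the next configuration by expected improvement. The stand-in cost function and all parameters are illustrative; ARDEGO's asynchronous parallelism and constraint handling are not reproduced here.

    # Sequential EGO-style loop with a Gaussian-process surrogate (sketch).
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    def expected_improvement(mu, sigma, best):
        sigma = np.maximum(sigma, 1e-9)          # guard against zero variance
        z = (best - mu) / sigma
        return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    def benchmark(x):                            # stand-in for synthesis + benchmarking
        return (x - 0.3) ** 2 + 0.1 * np.sin(20 * x)

    rng = np.random.default_rng(1)
    X = rng.uniform(0, 1, size=(4, 1))           # initial design configurations
    y = benchmark(X).ravel()

    for _ in range(10):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        cand = rng.uniform(0, 1, size=(256, 1))  # random candidate configurations
        mu, sigma = gp.predict(cand, return_std=True)
        nxt = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.vstack([X, nxt])
        y = np.append(y, benchmark(nxt[0]))

    print(f"best parameter: {X[np.argmin(y)][0]:.3f}, cost: {y.min():.4f}")

    Because each surrogate fit is cheap compared to hours of hardware generation, the model can propose promising configurations (and, as in MLO, flag likely generation failures) without exhaustively building every design.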

    AI/ML Algorithms and Applications in VLSI Design and Technology

    An evident challenge ahead for the integrated circuit (IC) industry in the nanometer regime is the investigation and development of methods that can reduce the design complexity ensuing from growing process variations and curtail the turnaround time of chip manufacturing. Conventional methodologies employed for such tasks are largely manual and thus time-consuming and resource-intensive. In contrast, the unique learning strategies of artificial intelligence (AI) provide numerous exciting automated approaches for handling complex and data-intensive tasks in very-large-scale integration (VLSI) design and testing. Employing AI and machine learning (ML) algorithms in VLSI design and manufacturing reduces the time and effort needed to understand and process the data within and across different abstraction levels via automated learning algorithms. This, in turn, improves IC yield and reduces manufacturing turnaround time. This paper thoroughly reviews the automated AI/ML approaches that have been applied to VLSI design and manufacturing. Moreover, we discuss the future scope of AI/ML applications at various abstraction levels to revolutionize the field of VLSI design, aiming for high-speed, highly intelligent, and efficient implementations.

    Simulation and implementation of novel deep learning hardware architectures for resource constrained devices

    Corey Lammie designed mixed-signal memristive-complementary metal–oxide–semiconductor (CMOS) and field-programmable gate array (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems, both during inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems.
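    As background to the memristive-CMOS architectures mentioned above, and not taken from the thesis itself, the sketch below shows the textbook idea behind a memristive crossbar: weights are mapped to discrete conductance states, and a matrix-vector product is realized in the analog domain as summed column currents. The conductance range, level count and differential-pair mapping are all illustrative assumptions.

    # Illustrative analog matrix-vector multiply on a memristive crossbar.
    import numpy as np

    def quantize_to_conductances(w, levels=16, g_min=1e-6, g_max=1e-4):
        """Map signed weights onto discrete conductance states, with positive
        and negative weights on separate (differential) device columns."""
        steps = np.round(np.abs(w) / np.abs(w).max() * (levels - 1)) / (levels - 1)
        g = g_min + steps * (g_max - g_min)      # zero weights land on g_min (off state)
        return np.where(w >= 0, g, 0.0), np.where(w < 0, g, 0.0)

    rng = np.random.default_rng(2)
    w = rng.normal(size=(4, 8))                  # layer weights: 4 outputs, 8 inputs
    g_pos, g_neg = quantize_to_conductances(w)
    v = rng.uniform(0.0, 0.2, size=8)            # input voltages, one per input line

    i_out = g_pos @ v - g_neg @ v                # differential output currents (A)
    print("crossbar output currents:", i_out)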