    Late-bound code generation

    Each time a function or method is invoked during the execution of a program, a stream of instructions is issued to some underlying hardware platform. But exactly what underlying hardware, and which instructions, is usually left implicit. However, in certain situations it becomes important to control these decisions. For example, particular problems can only be solved in real time when scheduled on specialised accelerators, such as graphics coprocessors or computing clusters. We introduce a novel operator for hygienically reifying the behaviour of a runtime function instance as a syntactic fragment, in a language which may, in general, differ from that of the source function definition. Translation and optimisation are performed by recursively invoked, dynamically dispatched code generators. Side-effecting operations are permitted, and their ordering is preserved. We compare our operator with other techniques for pragmatic control, observing that it supports lifting arbitrary mutable objects, and neither requires rewriting sections of the source program in a multi-level language nor interferes with the interface to individual software components. Because it does not interfere at the abstraction level at which software is composed, we believe that our approach poses a significantly lower barrier to practical adoption than current methods. The practical efficacy of our operator is demonstrated by using it to offload the user interface rendering of a smartphone application to an FPGA coprocessor, including both statically and procedurally defined user interface components. The generated pipeline is an application-specific, statically scheduled processor-per-primitive rendering pipeline, suitable for place-and-route style optimisation. To demonstrate the compatibility of our operator with existing languages, we show how it may be defined within the Python programming language. We introduce a transformation for weakening mutable to immutable named bindings, termed let-weakening, to solve the problem of propagating information pertaining to named variables between modular code-generating units.
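
    As the abstract contains no code, the following is only a rough, illustrative Python sketch of what reifying a runtime function instance as a syntactic fragment can look like; the `reify` helper, its use of the standard `ast` and `inspect` modules, and the lifting of captured values as constant bindings are assumptions made for illustration, not the operator defined in the thesis.

        # Hypothetical sketch (run as a script): recover a function's source, then
        # lift its captured closure and global values into explicit bindings placed
        # ahead of the re-parsed definition. A real operator must also handle
        # arbitrary mutable objects and hygiene, which this sketch does not.
        import ast
        import inspect
        import textwrap

        def reify(fn):
            """Return an AST module reifying `fn` together with its captured values."""
            tree = ast.parse(textwrap.dedent(inspect.getsource(fn)))
            captured = inspect.getclosurevars(fn)
            bindings = [
                ast.parse(f"{name} = {value!r}").body[0]
                for name, value in {**captured.globals, **captured.nonlocals}.items()
            ]
            tree.body = bindings + tree.body
            return ast.fix_missing_locations(tree)

        def make_scaler(k):
            def scale(x):
                return k * x
            return scale

        fragment = reify(make_scaler(3))
        print(ast.unparse(fragment))  # prints "k = 3" followed by the definition of scale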

    On Design and Optimization of Convolutional Neural Network for Embedded Systems

    This work presents research on optimizing neural networks and deploying them for real-time practical applications. We analyze different optimization methods, namely binarization, separable convolution and pruning. We implement each method for the application of vehicle classification and we empirically evaluate and analyze the results. The objective is to make large neural networks suitable for real-time applications by reducing their computation requirements through these optimization approaches. The data set consists of vehicles from 4 vehicle classes, and a convolutional model was initially used to solve the problem. Our results show that these optimization methods offer substantial performance benefits in this application in terms of reduced execution time (by up to 5x) and reduced model storage requirements, without largely impacting accuracy, making them suitable tools for streamlining heavy neural networks to run in resource-constrained environments. The platforms used in the research are a desktop platform and two embedded platforms.
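
    As a hedged illustration of one of the techniques named above, the sketch below contrasts a standard convolution with a depthwise-separable convolution in PyTorch; the channel counts, kernel size and parameter-count comparison are assumptions chosen for the example, not the thesis model.

        # Hypothetical example: a depthwise convolution followed by a 1x1 pointwise
        # convolution replaces one dense 3x3 convolution, cutting the parameter
        # count (and multiply-accumulate work) substantially.
        import torch.nn as nn

        class SeparableConv2d(nn.Module):
            def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
                super().__init__()
                self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                           padding=padding, groups=in_ch)
                self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

            def forward(self, x):
                return self.pointwise(self.depthwise(x))

        standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        separable = SeparableConv2d(64, 128)

        params = lambda m: sum(p.numel() for p in m.parameters())
        print(params(standard), params(separable))  # 73856 vs 8960 parameters here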

    Data-Driven Methods for Data Center Operations Support

    During the last decade, cloud technologies have been evolving at an impressive pace, such that we are now living in a cloud-native era where developers can leverage an unprecedented landscape of (possibly managed) services for orchestration, compute, storage, load balancing, monitoring, etc. The possibility of having on-demand access to a diverse set of configurable virtualized resources allows for building more elastic, flexible and highly resilient distributed applications. Behind the scenes, cloud providers sustain the heavy burden of maintaining the underlying infrastructures, consisting of large-scale distributed systems, partitioned and replicated among many geographically dispersed data centers to guarantee scalability, robustness to failures, high availability and low latency. The larger the scale, the more cloud providers have to deal with complex interactions among the various components, such that monitoring, diagnosing and troubleshooting issues become incredibly daunting tasks. To keep up with these challenges, development and operations practices have undergone significant transformations, especially in terms of improving the automation that makes releasing new software, and responding to unforeseen issues, faster and sustainable at scale. The resulting paradigm is nowadays referred to as DevOps. However, while such automation can be very sophisticated, traditional DevOps practices fundamentally rely on reactive mechanisms that typically require careful manual tuning and supervision from human experts. To minimize the risk of outages, and the related costs, it is crucial to provide DevOps teams with suitable tools that enable a proactive approach to data center operations. This work presents a comprehensive data-driven framework to address the most relevant problems that can be experienced in large-scale distributed cloud infrastructures. These environments are characterized by a very large availability of diverse data, collected at each level of the stack, such as: time series (e.g., physical host measurements, virtual machine or container metrics, networking component logs, application KPIs); graphs (e.g., network topologies, fault graphs reporting dependencies among hardware and software components, performance-issue propagation networks); and text (e.g., source code, system logs, version control history, code review feedback). Such data are also typically updated with relatively high frequency, and are subject to distribution drifts caused by continuous configuration changes to the underlying infrastructure. In such a highly dynamic scenario, traditional model-driven approaches alone may be inadequate for capturing the complexity of the interactions among system components. DevOps teams would certainly benefit from robust data-driven methods to support their decisions based on historical information. For instance, effective anomaly detection capabilities may also help in conducting more precise and efficient root-cause analysis. Likewise, leveraging accurate forecasting and intelligent control strategies would improve resource management. Given their ability to deal with high-dimensional, complex data, Deep Learning-based methods are the most straightforward option for realizing the aforementioned support tools. On the other hand, because of their complexity, these models often require huge processing power, and suitable hardware, to be operated effectively at scale. These aspects must be carefully addressed when applying such methods in the context of data center operations: automated operations approaches must be dependable and cost-efficient, so as not to degrade the services they are built to improve.
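
    As a hedged, self-contained illustration of the kind of data-driven support tool discussed above (not the framework developed in the thesis), the sketch below trains a tiny PyTorch autoencoder on windows of a synthetic host metric and flags windows with unusually high reconstruction error; the window size, threshold and synthetic signal are assumptions made purely for the example.

        # Hypothetical anomaly detector: windows that the autoencoder cannot
        # reconstruct well are reported as anomalous.
        import numpy as np
        import torch
        import torch.nn as nn

        WINDOW = 32

        class MetricAutoencoder(nn.Module):
            def __init__(self, window=WINDOW, hidden=8):
                super().__init__()
                self.encoder = nn.Sequential(nn.Linear(window, hidden), nn.ReLU())
                self.decoder = nn.Linear(hidden, window)

            def forward(self, x):
                return self.decoder(self.encoder(x))

        # Synthetic "CPU utilisation" trace with an injected spike.
        signal = 0.5 + 0.05 * np.sin(np.linspace(0, 60, 2048))
        signal[1500:1510] += 0.4
        windows = np.lib.stride_tricks.sliding_window_view(signal, WINDOW)[::WINDOW]
        x = torch.tensor(windows, dtype=torch.float32)

        model = MetricAutoencoder()
        opt = torch.optim.Adam(model.parameters(), lr=1e-2)
        for _ in range(200):                      # the data are mostly normal
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(x), x)
            loss.backward()
            opt.step()

        errors = ((model(x) - x) ** 2).mean(dim=1).detach().numpy()
        threshold = errors.mean() + 3 * errors.std()
        print("anomalous windows:", np.where(errors > threshold)[0])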

    Real-Time On-Site OpenGL-Based Object Speed Measuring Using Constant Sequential Image

    This thesis presents a method that can detect moving objects and measure their speed of movement, using a constant-rate series of sequential images, such as video recordings. It uses the industry-standard, vendor-neutral OpenGL ES, so it can be implemented on any platform with OpenGL ES support. It can run on low-end embedded systems as it rests on simple foundations and a few assumptions that lower the overall implementation complexity in OpenGL ES. It also does not require any special peripheral devices, so existing infrastructure can be used with minimal modification, which further lowers the cost of the system. The sequential images are streamed from an I/O device via the CPU into the GPU, where a custom shader is used to detect changing pixels between frames and so find potential moving objects. The GPU shader continues by measuring the pixel displacement of each object, and then maps this into a real-world distance. These results are then sent back to the CPU for further processing. The algorithm was tested on two real-world traffic videos (720p video at 10 FPS) and it successfully extracted the speed data of the road vehicles in view on a low-end embedded system (Raspberry Pi 4).
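
    The thesis implements this pipeline in OpenGL ES shaders; the following is a hedged, CPU-side NumPy analogue of the per-frame work described above (frame differencing, centroid displacement, conversion of pixel displacement to speed). The frame rate matches the 10 FPS test videos, while the metres-per-pixel calibration factor and the synthetic frames are assumptions for the example.

        # Hypothetical sketch of the measurement principle, not the shader itself.
        import numpy as np

        FPS = 10                   # constant frame rate of the input video
        METRES_PER_PIXEL = 0.05    # assumed camera calibration

        def motion_centroid(prev, curr, threshold=25):
            """Centroid (row, col) of pixels that changed between two frames."""
            diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > threshold
            if not diff.any():
                return None
            rows, cols = np.nonzero(diff)
            return rows.mean(), cols.mean()

        def speed_m_per_s(c0, c1):
            """Speed implied by the displacement of the motion centroid over one frame."""
            if c0 is None or c1 is None:
                return None
            displacement_px = np.hypot(c1[0] - c0[0], c1[1] - c0[1])
            return displacement_px * METRES_PER_PIXEL * FPS

        # Three synthetic 720p frames with a bright "vehicle" moving 8 px per frame.
        f0 = np.zeros((720, 1280), dtype=np.uint8); f0[300:320, 100:140] = 255
        f1 = np.zeros((720, 1280), dtype=np.uint8); f1[300:320, 108:148] = 255
        f2 = np.zeros((720, 1280), dtype=np.uint8); f2[300:320, 116:156] = 255

        print(speed_m_per_s(motion_centroid(f0, f1), motion_centroid(f1, f2)))  # ~4.0 m/s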

    Accelerating Halide on an FPGA by using CIRCT and Calyx as an intermediate step to go from high-level, software-centric IRs down to RTL

    Image processing and, more generally, array processing play an essential role in modern life: from applying filters to the images that we upload to social media to running object detection algorithms on self-driving cars. Optimizing these algorithms can be complex and often results in non-portable code. The Halide language provides a simple way to write image and array processing algorithms by separating the algorithm definition (what needs to be executed) from its execution schedule (how it is executed), delivering state-of-the-art performance that exceeds hand-tuned parallel and vectorized code. Due to the inherent parallel nature of these algorithms, FPGAs present an attractive acceleration platform. While previous work has added an RTL code generator to Halide, and utilized other heterogeneous computing languages as an intermediate step, these projects are no longer maintained. MLIR is an attractive solution, allowing the generation of code that can target multiple devices, such as parallelized and vectorized CPU code, OpenMP, and CUDA. CIRCT builds on top of MLIR to convert generic MLIR code to register transfer level (RTL) languages by using Calyx, a new intermediate language (IL) for compiling high-level programs into hardware designs. This thesis presents a novel flow that implements an MLIR code generator for Halide that generates RTL code, adding the necessary wrappers to execute that code on Xilinx FPGA devices. Additionally, it implements a Halide runtime using the Xilinx Runtime (XRT), enabling seamless execution of the generated Halide RTL kernels. While this thesis provides only initial support for running Halide kernels, with not all features and optimizations supported, it also details the future work needed to improve the performance of the generated RTL kernels. The proposed flow serves as a foundation for further research and development in the field of hardware acceleration for image and array processing applications using Halide.
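
    To make the algorithm/schedule separation mentioned above concrete, here is a hedged sketch using Halide's Python bindings: the algorithm defines a small horizontal blur, and the schedule then chooses how it runs. It targets an ordinary CPU and is unrelated to the FPGA flow built in the thesis; the buffer sizes and schedule are arbitrary choices for the example, and the exact bindings API may differ between Halide versions.

        # Illustrative only: what to compute (the algorithm) is stated once,
        # and how to compute it (the schedule) is chosen separately.
        import numpy as np
        import halide as hl

        data = hl.Buffer(np.random.rand(256, 258).astype(np.float32))  # extra columns for the 3-tap stencil
        x, y = hl.Var("x"), hl.Var("y")

        # Algorithm: a 3-tap horizontal blur.
        blur = hl.Func("blur")
        blur[x, y] = (data[x, y] + data[x + 1, y] + data[x + 2, y]) / 3.0

        # Schedule: vectorize along x and run rows in parallel.
        blur.vectorize(x, 8)
        blur.parallel(y)

        result = blur.realize([256, 256])
        print(np.asarray(result).shape)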

    Information Technologies and Systems 2022 (ITS 2022)

    This collection comprises the peer-reviewed papers of the international scientific conference "Information Technologies and Systems 2022" (ITS 2022). It is intended for university lecturers, researchers, undergraduate, master's and postgraduate students, as well as for industry specialists working in the field of IT. The materials were approved by the organizing committee and are printed as submitted by the authors.

    Using reconstructed visual reality in ant navigation research

    Insects have low-resolution eyes and a tiny brain, yet they continuously solve very complex navigational problems; an ability that underpins fundamental biological processes such as pollination and parental care. Understanding the methods they employ would have a profound impact on the fields of machine vision and robotics. As our knowledge of insect navigation grows, our physical, physiological and neural models get more complex and detailed. To test these models we need to perform increasingly sophisticated experiments. Evolution has optimised the animals to operate in their natural environment. To probe the fine details of the methods they utilise, we need to use natural visual scenery which, for experimental purposes, we must be able to manipulate arbitrarily. Performing physiological experiments on insects outside the laboratory is not practical, and our ability to modify the natural scenery for outdoor behavioural experiments is very limited. The solution is reconstructed visual reality: a projector that can present the visual aspect of the natural environment to the animal with high fidelity, taking the peculiarities of insect vision into account. While projectors have been used in insect research before, during my candidature I designed and built a projector specifically tuned to insect vision. To allow the ant to experience a full panoramic view, the projector completely surrounds her. The device (Antarium) is a polyhedral approximation of a sphere. It contains 20,000 pixels made of light-emitting diodes (LEDs) that match the spectral sensitivity of Myrmecia. Insects have a much higher fusion frequency limit than humans, therefore the device has a very high flicker frequency (9 kHz) and also a high frame rate (190 fps). In the Antarium the animal is placed in the centre of the projector on a trackball. To test the trackball and to collect reference data, outdoor experiments were performed in which ants were captured, tethered and placed on the trackball. The apparatus with the ant on it was then placed at certain locations relative to the nest and the foraging tree, and the movements of the animal on the ball were recorded and analysed. The outdoor experiments proved that the trackball was well suited for our ants, and also provided the baseline behaviour reference for the subsequent Antarium experiments. To assess the Antarium, the natural habitat of the experimental animals was recreated as a 3-dimensional model. That model was then projected for the ants and their movements on the trackball were recorded, just as in the outdoor experiments. Initial feasibility tests were performed by projecting a static image, which matched what the animals experienced during the outdoor experiments. To assess whether the ant was orienting herself relative to the scene, we rotated the projected scene around her and monitored her response. Statistical methods were used to compare the outdoor and in-Antarium behaviour. The results proved that the concept was solid, but they also uncovered several shortcomings of the Antarium. Nevertheless, even with its limitations, the Antarium was used to perform experiments that would be very hard to do in a real environment. In one experiment the foraging tree was repositioned in, or deleted from, the scene to see whether the animals go to where the tree is or to where, according to their knowledge, it should be. The results suggest the latter, but the absence or altered location of the foraging tree certainly had a significant effect on the animals.
In another experiment the scene, including the sky, was re-coloured to see whether colour plays a significant role in navigation. Results indicate that even a very small amount of UV information yields a statistically significant improvement in the animals' navigation. To rectify the device limitations discovered during the experiments, a new, improved projector was designed and is currently being built.