
    Enabling on-device domain adaptation of convolutional neural networks

    Convolutional Neural Networks (CNNs) are used ubiquitously in computer vision applications ranging from image classification to video-stream object detection. However, due to the large memory and compute costs of executing CNNs, specialised hardware such as GPUs or ASICs is required to perform both CNN inference and training within reasonable time and memory budgets. Consequently, most applications today perform both CNN inference and training on servers, with user data sent from an edge device back to a server for processing. This raises data privacy concerns and imposes a strict requirement for good edge-server communication links. Recently, with improvements in the specialised hardware (especially GPUs) available on edge devices, an increasing number of applications have moved the inference stage onto the edge, but few to none have considered performing training on an edge device. Focusing on CNNs used for image classification, the work in this PhD explores when it is useful to retrain networks on an edge device, what the gains of doing so are, and how such training can be performed even in resource-constrained settings.

    This exploration begins with the assumption that the classes observed by the model upon deployment are a subset of the classes present in the dataset used to train the model initially. This scenario is simulated by constructing semantically meaningful subsets of classes from existing large image classification datasets (e.g. ImageNet) and exploring the gains, in terms of classification accuracy and of the memory consumption and latency of the inference and training stages, that can be achieved by pruning (architecture modification) and retraining (weight adaptation) a deployed network to match the observed class distribution.

    The exploration is split into three stages. First, an oracle is constructed that predicts the gains achievable by pruning and retraining a network under the assumptions that the exact label of each image observed upon deployment is known and that there are no hardware resource constraints. This demonstrates the accuracy and performance gains that can theoretically be achieved for each network and subset combination; the significant gains demonstrated for certain subsets of data motivate the remainder of the work in this PhD. The works that follow explore ways to perform such adaptation on resource-constrained hardware, including when there is uncertainty in the labels of the observed data points used to perform the adaptation. Pruning is utilised to enable training on resource-constrained hardware by reducing the memory and latency footprints of the training process. In doing so, it was observed that, depending on the manner in which a network is pruned, a set of networks that all consume the same amount of memory for storing weights can have drastically different latencies and memory consumptions during training. Hence, the size of a stored model is not a useful predictor of which networks can feasibly be trained within edge hardware resource budgets. To cater for this, a novel, accurate, data-driven model is proposed for predicting the training memory consumption and latency of a network on a specific combination of target hardware and execution framework (PyTorch, TensorFlow, etc.).

    Doing so enables the selection of a pruned network whose training memory consumption and latency fit within the budgets dictated by the target hardware and application, which in turn allows the network to be adapted to the observed data distribution. An additional benefit of the proposed data-driven model is that new predictors specific to each network, hardware and execution framework combination can be created rapidly. Finally, the analysis is extended to account for uncertainty in the class labels of the observed data distribution, since this uncertainty can negatively impact any attempt to retrain the network. To combat this, a novel Variational Auto-Encoder (VAE) based retraining methodology is proposed that uses uncertain predictions of an image's label to adapt the weights of the network to the observed data distribution on-device. In doing so, the work in this PhD answers the questions of why we should aim to train a network on the edge, how we can select networks that fit within the available hardware resource constraints, and how we can account for the uncertainty in labels that arises when ground-truth labels are unavailable during training. We also propose future research directions that could extend and adapt the ideas of this thesis to other applications.
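    As a concrete illustration of the measure-then-fit idea behind the proposed data-driven model, the sketch below times one training step of a candidate pruned network in PyTorch on the target device and fits a simple least-squares predictor from per-network features to the measured latencies. This is a minimal sketch under assumed interfaces, not the thesis's actual model (which also predicts memory consumption); all function names and feature choices here are illustrative.

        # Minimal sketch, not the thesis's model: measure training-step latency
        # on the target device, then fit a data-driven predictor over assumed
        # per-network features (e.g. FLOPs, layer widths).
        import time
        import numpy as np
        import torch
        import torch.nn as nn

        def measure_training_step(model, num_classes=10,
                                  input_shape=(8, 3, 32, 32), steps=10):
            """Mean wall-clock latency (s) of one forward+backward+update step."""
            device = "cuda" if torch.cuda.is_available() else "cpu"
            model = model.to(device).train()
            opt = torch.optim.SGD(model.parameters(), lr=0.01)
            loss_fn = nn.CrossEntropyLoss()
            x = torch.randn(*input_shape, device=device)
            y = torch.randint(0, num_classes, (input_shape[0],), device=device)

            def step():
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

            step()  # warm-up: triggers lazy allocations and kernel selection
            if device == "cuda":
                torch.cuda.synchronize()
            t0 = time.perf_counter()
            for _ in range(steps):
                step()
            if device == "cuda":
                torch.cuda.synchronize()
            return (time.perf_counter() - t0) / steps

        def fit_latency_predictor(feature_rows, latencies):
            """Least-squares map from per-network feature vectors to latency."""
            X = np.hstack([np.asarray(feature_rows, float),
                           np.ones((len(feature_rows), 1))])  # bias column
            w, *_ = np.linalg.lstsq(X, np.asarray(latencies, float), rcond=None)
            return lambda f: float(np.append(np.asarray(f, float), 1.0) @ w)

    Measuring each candidate pruned variant once on the target hardware and framework combination and then querying the fitted predictor mirrors the selection step described above; a peak-memory predictor could follow the same pattern, e.g. using torch.cuda.max_memory_allocated as the measured quantity.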

    Microfabricated Tools and Engineering Methods for Sensing Bioanalytes

    There is a convergence between the needs of the medical community and the capabilities of the engineering community. For example, the scale of biomedical devices and sensors allows for finer, more cost-effective quantification of biological and chemical targets. Using microfabrication techniques, we design and demonstrate a variety of microfluidic sensors and actuators that allow us to interact with a biochemical environment. We demonstrate the performance of microfluidic blood-filtration chips, immunodiagnostic assays, and evaporative coolers. Furthermore, we show how microfabricated platinum filaments can be used for highly localized heating and temperature measurement, and demonstrate that these filaments can be used as miniature IR spectroscopic sources. Finally, we describe and demonstrate novel combinatorial coding methods for increasing the information extracted from biochemical reactions. We show proof of principle of these techniques in the context of TaqMan PCR as well as persistence-length PCR.

    Microscaled and nanoscaled platinum sensors

    We present small, robust platinum resistive heaters and thermometers defined by microlithography on silicon substrates. These devices can be used for a wide range of applications, including thermal sensor arrays, programmable thermal sources, and even incandescent light emitters. To explore the miniaturization of such devices, we have developed microscaled and nanoscaled platinum resistor arrays with wire widths as small as 75 nm, fabricated lithographically to provide highly localized heating and accurate resistance (and hence temperature) measurements. We present several potential applications of microfabricated platinum resistors in sensing and spectroscopy.
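    For context, a platinum element can serve as both heater and thermometer because its resistance rises nearly linearly with temperature. The sketch below assumes the standard linear approximation R(T) = R0 * (1 + alpha * (T - T0)) with a typical platinum coefficient; the fabricated devices above may rely on calibrated, higher-order fits, so this is an illustration rather than the authors' method.

        # Minimal sketch, assuming the linear platinum resistance model; the
        # coefficient below is typical for platinum, not device-specific.
        PT_ALPHA = 3.85e-3  # 1/degC, typical temperature coefficient of platinum

        def temperature_from_resistance(r_measured, r0, t0=0.0, alpha=PT_ALPHA):
            """Infer element temperature (degC) from measured resistance (ohm)."""
            return t0 + (r_measured / r0 - 1.0) / alpha

        def joule_power(current_amps, r_measured):
            """Heating power (W) dissipated when driving the element."""
            return current_amps ** 2 * r_measured

        # Example: a 100-ohm (at 0 degC) element reading 138.5 ohm is near 100 degC.
        print(temperature_from_resistance(138.5, r0=100.0))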

    Supercolor Coding Methods for Large-Scale Multiplexing of Biochemical Assays

    We present a novel method for the encoding and decoding of multiplexed biochemical assays. The method enables a theoretically unlimited number of independent targets to be detected and uniquely identified, in any combination, in the same sample. For example, the method offers easy access to 12-plex and larger PCR assays, in contrast to current 4-plex assays. This advancement would allow large panels of tests to be run simultaneously in the same sample, saving reagents, time, consumables, and manual labor, while also avoiding the traditional loss of sensitivity due to sample aliquoting. Thus, the presented method is a major technological breakthrough with far-reaching impact on biotechnology, biomedical science, and clinical diagnostics. Herein, we present the mathematical theory behind the method as well as its experimental proof of principle using TaqMan PCR on sequences specific to infectious diseases.
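    One simple combinatorial reading of the encoding idea, assuming each target is assigned a unique non-empty subset of distinguishable fluorophore colors, is that n colors yield 2^n - 1 codes rather than n; four colors then give 15 codes, enough for the 12-plex assays mentioned above. The sketch below illustrates this reading only; the paper's actual supercolor scheme may differ, and the target assignments shown are hypothetical.

        # Minimal sketch of combinatorial color coding: each target gets a unique
        # non-empty subset of dye colors, so n dyes give 2**n - 1 codes.
        from itertools import combinations

        def color_codes(colors):
            """Enumerate all non-empty color subsets, usable as target codes."""
            codes = []
            for k in range(1, len(colors) + 1):
                codes.extend(combinations(colors, k))
            return codes

        dyes = ["FAM", "HEX", "ROX", "Cy5"]  # four common qPCR reporter dyes
        codes = color_codes(dyes)
        print(len(codes))  # 15 codes from 4 dyes
        for target, code in zip(["targetA", "targetB", "targetC"], codes):
            print(target, "->", "+".join(code))  # hypothetical assignments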

    GSA to HDL: Towards principled generation of dynamically scheduled circuits

    High-level synthesis (HLS) refers to the automatic translation of a software program written in a high-level language into a hardware design. Modern HLS tools have moved away from the traditional approach of static (compile-time) scheduling of operations towards generating dynamic circuits that schedule operations at run time. Such circuits trade off area utilisation for increased dynamism and throughput. However, existing lowering flows in dynamically scheduled HLS tools rely on conservative assumptions about their input program, due both to the intermediate representations (IRs) utilised and to the lack of formal specifications for the translation into hardware. These assumptions cause suboptimal hardware performance. In this work, we lift these assumptions by proposing a new and efficient abstraction for hardware mapping, namely h-GSA, an extension of the Gated Static Single Assignment (GSA) IR. Using this abstraction, we propose a lowering flow that transforms GSA into h-GSA and maps h-GSA into dynamically scheduled hardware circuits. We compare the schedules generated by our approach to those produced by the state-of-the-art dynamic-scheduling HLS tool Dynamatic, and illustrate the potential performance improvement from hardware mapping using the proposed abstraction.
    Presented at the 19th International Summer School on Advanced Computer Architecture and Compilation for High-performance Embedded Systems (ACACES 2023).
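    For readers unfamiliar with GSA: where SSA's phi-functions only record that values merge at a join point, GSA's gating functions (such as gamma) also record the predicate that selects each value, which is what makes a direct mapping to hardware multiplexers natural. The sketch below illustrates that idea only; the paper's h-GSA extension and lowering flow go well beyond it.

        # Illustrative only: how SSA's phi becomes GSA's gated gamma. Not the
        # paper's h-GSA abstraction, just the underlying GSA concept.
        from dataclasses import dataclass

        @dataclass
        class Gamma:
            """GSA gating function: yields true_val if pred holds, else false_val."""
            pred: str
            true_val: str
            false_val: str

        # Source:          SSA form:              GSA form:
        #   if c: x = a      x1 = a                 x1 = a
        #   else: x = b      x2 = b                 x2 = b
        #   use(x)           x3 = phi(x1, x2)       x3 = gamma(c, x1, x2)
        x3 = Gamma(pred="c", true_val="x1", false_val="x2")
        print(x3)  # the explicit predicate maps directly to a multiplexer select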

    A challenging case of fenofibrate induced neutropenia

    Fenofibrate-induced neutropenia is a rare condition. We report a 59-year-old male who developed neutropenia after taking fenofibrate for 20 days. He presented with fever and upper respiratory tract infection (URTI) symptoms, and at presentation his total count was 800. He was evaluated thoroughly for other causes of neutropenia and was treated with antibiotics, steroids, and G-CSF. Once fenofibrate was stopped, the patient's counts improved and his symptoms subsided. Hence, it is very important to always keep in mind the possibility of drug side effects, however rare they may be.