Enabling on-device domain adaptation of convolutional neural networks
Convolutional Neural Networks (CNNs) are used ubiquitously in computer vision applications ranging from image classification to video-stream object detection. However, due to the large memory and compute costs of executing CNNs, specialised hardware such as GPUs or ASICs is required to perform both CNN inference and training within reasonable time and memory budgets. Consequently, most applications today perform both CNN inference and training on servers, with user data sent from an edge device back to a server for processing. This raises data privacy concerns and makes a reliable edge-server communication link a strict necessity. Recently, with improvements in the specialised hardware (especially GPUs) available on edge devices, an increasing number of applications have moved the inference stage onto the edge, but few, if any, have considered performing training on an edge device. With a focus on CNNs used for image classification, the work in this PhD explores when it would be useful to retrain networks on an edge device, what the gains of doing so would be, and how one can perform such training even in resource-constrained settings.
This exploration begins with the assumption that the classes observed by the model upon deployment are a subset of the classes present in the dataset used to train the model initially. This scenario is simulated by constructing semantically meaningful subsets of classes from existing large image classification datasets (e.g. ImageNet) and exploring the gains, in terms of classification accuracy and the memory consumption and latency of the inference and training stages, that can be achieved by pruning (architecture modification) and retraining (weight adaptation) a deployed network to match the observed class distribution.
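As a minimal sketch of how such a deployment-time subset might be simulated (the class names and grouping below are purely illustrative, not the thesis's actual subsets), one can filter a labelled dataset down to a semantically related group of classes and remap the labels for the smaller classifier:

```python
# Hypothetical sketch: simulate a deployment-time class distribution by
# restricting a labelled dataset to a semantically related subset of classes.
# The class names and grouping below are illustrative, not from the thesis.

# (label, class_name) pairs standing in for a large dataset such as ImageNet
dataset = [
    (0, "tabby_cat"), (1, "siamese_cat"), (2, "golden_retriever"),
    (3, "sports_car"), (4, "pickup_truck"), (5, "minivan"),
]

# A semantically meaningful subset: only vehicle classes are seen on-device
vehicle_classes = {"sports_car", "pickup_truck", "minivan"}

def restrict_to_subset(samples, allowed_names):
    """Keep only samples whose class is in the deployed subset, and remap
    labels to a dense 0..K-1 range for the pruned classifier head."""
    kept = [(lbl, name) for lbl, name in samples if name in allowed_names]
    old_labels = sorted({lbl for lbl, _ in kept})
    remap = {old: new for new, old in enumerate(old_labels)}
    return [(remap[lbl], name) for lbl, name in kept]

subset = restrict_to_subset(dataset, vehicle_classes)
print(subset)  # [(0, 'sports_car'), (1, 'pickup_truck'), (2, 'minivan')]
```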
The exploration is split into three stages. First, an oracle is constructed that predicts the gains that can be achieved by pruning and retraining a network under the assumptions that we know the exact label of each image observed upon deployment and that there are no hardware resource constraints. This demonstrates the accuracy and performance gains that can theoretically be achieved per network and subset combination. The significant gains demonstrated here for certain subsets of data motivate the remainder of the work in this PhD. The works that follow explore ways to perform such adaptation on resource-constrained hardware, and when there is uncertainty in the labels of the observed data points used to perform this adaptation.
Pruning was utilised as a method to enable training on resource-constrained hardware by reducing the memory and latency footprints of the training process. In doing so, it was observed that, depending on the manner in which a network is pruned, a set of networks that all consume the same amount of memory for storing weights can each have drastically different latencies and memory consumptions during training. Hence, the size of a stored model is not a useful predictor of which networks can feasibly be trained within edge hardware resource budgets. To address this, a novel, accurate and data-driven model is proposed for predicting the training memory consumption and latency of a network on a specific combination of target hardware and execution framework (PyTorch, TensorFlow, etc.). This enables the selection of a pruned network whose training memory consumption and latency fit within the budgets dictated by the target hardware and application, which in turn allows the network to be adapted to the observed data distribution. An additional benefit of the proposed data-driven approach is that new predictors specific to each network, hardware and execution framework combination can be created rapidly.
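The abstract does not specify which features or model family the predictor uses, so the following is only a toy illustration of the data-driven idea: fit a model to profiled (network descriptor, training latency) measurements, then use it to screen pruned candidates against a latency budget. The linear form, the parameter-count feature and all numbers are assumptions for illustration.

```python
# Toy sketch of a data-driven training-latency predictor. The linear model,
# the single parameter-count feature and all numbers are assumptions; the
# thesis's actual predictor is not specified in this abstract.
# Each measurement pairs a network descriptor (parameter count, millions)
# with profiled training-step latency (ms) on one hardware/framework combo.
measurements = [(1.0, 12.0), (2.0, 19.5), (4.0, 35.0), (8.0, 66.0)]

def fit_linear(points):
    """Ordinary least-squares fit of latency = a * feature + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    a = cov / var
    return a, mean_y - a * mean_x

a, b = fit_linear(measurements)

def predict(params_m):
    return a * params_m + b

# Screen candidate pruned networks against the deployment latency budget
budget_ms = 40.0
candidates = {"prune_50pct": 3.0, "prune_25pct": 6.0}  # name -> params (M)
feasible = {n: predict(p) for n, p in candidates.items()
            if predict(p) <= budget_ms}
```

A lookup of this kind is cheap to rebuild per hardware/framework pair, which mirrors the abstract's point that per-combination models can be created rapidly.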
Finally, the analysis is extended to account for uncertainty in the class labels of the observed data distribution. This uncertainty in the label distribution can negatively impact any attempt to retrain the network. To combat this, a novel Variational Auto-Encoder (VAE) based retraining methodology is proposed that uses uncertain predictions of an image's label to adapt the weights of the network to the observed data distribution on-device.
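The VAE-based methodology itself is not detailed in this abstract; the sketch below only illustrates the general ingredient it relies on, namely turning the network's own uncertain predictions into pseudo-labels whose influence on retraining is down-weighted when the prediction is ambiguous. The entropy-based weighting here is a generic choice, not the thesis's actual scheme.

```python
# Generic sketch of confidence-weighted pseudo-labelling (NOT the thesis's
# VAE-based method, which this abstract does not detail): ambiguous
# predictions contribute less to any retraining loss.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_label(logits):
    """Return (predicted class, confidence weight) for an unlabelled image.
    Weight is 1 minus normalised entropy: 1.0 for a one-hot prediction,
    0.0 for a maximally uncertain (uniform) one."""
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    weight = 1.0 - entropy / math.log(len(probs))
    return probs.index(max(probs)), weight

label, w_sharp = pseudo_label([5.0, 0.1, 0.2])  # confident prediction
_, w_flat = pseudo_label([1.0, 1.0, 1.0])       # maximally uncertain
```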
In doing so, the work in this PhD answers the questions of why we should aim to train a network on the edge, how we can select networks that fit within the available hardware resource constraints, and how we can account for the uncertainty in labels that arises when ground-truth labels are not available during training. We also propose future research directions that could extend and adapt the ideas of this thesis to other applications.
Microfabricated Tools and Engineering Methods for Sensing Bioanalytes
There is a convergence between the needs of the medical community and the capabilities of the engineering community. For example, the scale of biomedical devices and sensors allows for finer, more cost-effective quantification of biological and chemical targets. Using micro-fabrication techniques, we design and demonstrate a variety of microfluidic sensors and actuators that allow us to interact with a biochemical environment. We demonstrate the performance of microfluidic blood-filtration chips, immunodiagnostic assays, and evaporative coolers. Furthermore, we show how micro-fabricated platinum filaments can be used for highly localized heating and temperature measurement, and demonstrate that these filaments can serve as miniature IR spectroscopic sources. Finally, we describe and demonstrate novel combinatorial coding methods for increasing the information extracted from biochemical reactions. We show proof of principle of these techniques in the context of Taqman PCR as well as persistence-length PCR.
Microscaled and nanoscaled platinum sensors
We show small and robust platinum resistive heaters and thermometers that are defined by microlithography on silicon substrates. These devices can be used for a wide range of applications, including thermal sensor arrays, programmable thermal sources, and even incandescent light emitters. To explore the miniaturization of such devices, we have developed microscaled and nanoscaled platinum resistor arrays with wire widths as small as 75 nm, fabricated lithographically to provide highly localized heating and accurate resistance (and hence temperature) measurements. We present some potential applications of microfabricated platinum resistors in sensing and spectroscopy.
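The resistance of platinum rises nearly linearly with temperature, which is what lets the same filament act as both heater and thermometer. A minimal sketch of the conversion (the 0.00385 /°C coefficient is the standard IEC 60751 value for industrial platinum RTDs, assumed here rather than taken from the abstract):

```python
# Sketch: inferring temperature from a platinum resistor's measured resistance
# using the linear RTD approximation R(T) = R0 * (1 + alpha * (T - T0)).
# alpha = 0.00385 /degC is the standard IEC 60751 value for industrial
# platinum RTDs -- an assumption here, not a figure from the abstract.

ALPHA = 0.00385  # temperature coefficient of resistance, 1/degC

def resistance(t_c, r0=100.0, t0=0.0):
    """Resistance (ohm) of a Pt element with reference r0 ohm at t0 degC."""
    return r0 * (1.0 + ALPHA * (t_c - t0))

def temperature(r_ohm, r0=100.0, t0=0.0):
    """Invert the linear model to recover temperature from resistance."""
    return t0 + (r_ohm / r0 - 1.0) / ALPHA

print(round(temperature(138.5), 1))  # a Pt100 reading 138.5 ohm is at 100.0 degC
```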
Supercolor Coding Methods for Large-Scale Multiplexing of Biochemical Assays
We present a novel method for the encoding and decoding of multiplexed biochemical assays. The method enables a theoretically unlimited number of independent targets to be detected and uniquely identified in any combination in the same sample. For example, the method offers easy access to 12-plex and larger PCR assays, in contrast to the current 4-plex assays. This advancement would allow large panels of tests to be run simultaneously in the same sample, saving reagents, time, consumables, and manual labor, while also avoiding the traditional loss of sensitivity due to sample aliquoting. Thus, the presented method is a major technological breakthrough with far-reaching impact on biotechnology, biomedical science, and clinical diagnostics. Herein, we present the mathematical theory behind the method as well as its experimental proof of principle using Taqman PCR on sequences specific to infectious diseases.
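The contrast between 4-plex (one dye per target) and 12-plex assays can be made concrete with a toy subset code: assigning each target a unique combination of dyes rather than a single dye multiplies the number of distinguishable targets. The enumeration below is a generic illustration, not necessarily the paper's actual encoding scheme, and the fluorophore names are merely typical qPCR dyes.

```python
# Toy illustration of combinatorial dye coding (a generic subset code; not
# necessarily the paper's actual scheme). With one dye per target a panel is
# capped at len(dyes) targets; unique non-empty *combinations* of dyes give
# 2^n - 1 distinct codes.
from itertools import combinations

dyes = ["FAM", "VIC", "ROX", "Cy5"]  # typical qPCR fluorophores (illustrative)

codes = [frozenset(c)
         for r in range(1, len(dyes) + 1)
         for c in combinations(dyes, r)]

print(len(codes))  # 15 distinct codes from 4 dyes -- enough for a 12-plex panel
```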
GSA to HDL: Towards principled generation of dynamically scheduled circuits
High-level synthesis (HLS) refers to the automatic translation of a software program written in a high-level language into a hardware design. Modern HLS tools have moved away from the traditional approach of static (compile-time) scheduling of operations to generating dynamic circuits that schedule operations at run time. Such circuits trade off area utilisation for increased dynamism and throughput. However, existing lowering flows in dynamically scheduled HLS tools rely on conservative assumptions about their input program, due both to the intermediate representations (IRs) utilised and to the lack of formal specifications of the translation into hardware. These assumptions cause suboptimal hardware performance. In this work, we lift these assumptions by proposing a new and efficient abstraction for hardware mapping, namely h-GSA, an extension of the Gated Single Assignment (GSA) IR. Using this abstraction, we propose a lowering flow that transforms GSA into h-GSA and maps h-GSA into dynamically scheduled hardware circuits. We compare the schedules generated by our approach to those of the state-of-the-art dynamic-scheduling HLS tool, Dynamatic, and illustrate the potential performance improvement from hardware mapping using the proposed abstraction.
Comment: Presented at the 19th International Summer School on Advanced Computer Architecture and Compilation for High-performance Embedded Systems (ACACES 2023).
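The h-GSA extension itself is not specified in this abstract, but the base GSA idea it builds on can be sketched: where an SSA phi node selects a value implicitly, based on which control-flow edge was taken, GSA's gamma node names the selection predicate explicitly, which maps naturally onto a hardware multiplexer. A minimal illustration (in executable form for clarity; real IRs are not Python):

```python
# Illustration of the Gated Single Assignment idea the abstract builds on
# (h-GSA itself is not specified here). In SSA, "x3 = phi(x1, x2)" picks a
# value based on the incoming control-flow edge; GSA's gamma node makes the
# predicate explicit -- gamma(p, x1, x2) -- which maps directly to a 2-input mux.

def gamma(p, val_true, val_false):
    """Gated selection: the GSA analogue of a hardware multiplexer."""
    return val_true if p else val_false

# The branch "if (a > b) x = a + 1 else x = b - 1" in gated form:
def gated_example(a, b):
    p = a > b       # explicit gating predicate
    x1 = a + 1      # value from the taken branch
    x2 = b - 1      # value from the not-taken branch
    return gamma(p, x1, x2)
```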
A challenging case of fenofibrate induced neutropenia
Fenofibrate-induced neutropenia is a rare condition. We report a 59-year-old male who developed neutropenia after taking fenofibrate for 20 days. He presented with complaints of fever and upper respiratory tract infection (URTI) symptoms. At presentation, his total count was 800. He was thoroughly evaluated for other causes of neutropenia and was treated with antibiotics, steroids, and G-CSF. Once the fenofibrate was stopped, the patient's counts improved and his symptoms subsided. Hence, it is very important to always keep in mind the possibility of drug side effects, however rare they may be.