Machine Learning Meets Internet of Things: From Theory to Practice
Standalone execution of problem-solving Artificial Intelligence (AI) on IoT devices provides a higher level of autonomy and privacy, because the sensitive user data collected by the devices need not be transmitted to the cloud for inference. However, the chipsets used in IoT devices are resource-constrained, with limited memory footprints, few computation cores, and low clock speeds. These limitations make it difficult to deploy and execute complex problem-solving AI (usually an ML model) on IoT devices. Given the high potential for building intelligent IoT devices, this tutorial teaches researchers and developers: (i) how to deeply compress CNNs and deploy them efficiently on resource-constrained devices; (ii) how to efficiently port and execute ML classifiers that solve ranking, regression, and classification problems on IoT devices; and (iii) how to create ML-based self-learning devices that can locally re-train themselves on the fly using unseen real-world data.
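To make part (i) concrete, the following is a minimal sketch of a typical deep-compression step: post-training quantization of a small Keras CNN with TensorFlow Lite. The toy architecture, input shape, and file name are illustrative assumptions, not the tutorial's exact material.

```python
# Illustrative sketch only: post-training quantization of a small Keras CNN
# with TensorFlow Lite. The architecture, input shape, and file name are
# assumptions standing in for the model to be compressed.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Convert with default optimizations (weight quantization) so the model is
# small enough to be embedded on an MCU as a C byte array.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
print("compressed model size:", len(tflite_model), "bytes")
```

On the device side, the resulting .tflite bytes would typically be embedded as a char array (for example via xxd -i) and executed with an embedded inference runtime such as TensorFlow Lite Micro.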
Smart speaker design and implementation with biometric authentication and advanced voice interaction capability
Advancements in semiconductor technology have reduced the dimensions and cost of chipsets while improving their performance and capacity. In addition, advances in AI frameworks and libraries make it possible to accommodate more AI at the resource-constrained edge of consumer IoT devices. Sensors are nowadays an integral part of our environment, providing continuous data streams to build intelligent applications; a smart home with multiple interconnected devices is one example. In such smart environments, for convenience and quick access to web-based services and personal information such as calendars, notes, emails, reminders, and banking, users link third-party skills or skills from the Amazon store to their smart speakers. Also, in current smart home scenarios, several smart home products such as smart security cameras, video doorbells, smart plugs, smart carbon monoxide monitors, and smart door locks are interlinked to a modern smart speaker through custom skill addition. Since smart speakers are linked to such services and devices through the smart speaker user's account, they can be used by anyone with physical access to the smart speaker via voice commands, compromising the user's data privacy, home security, and other aspects of their safety. Recently launched camera-enabled smart speakers with AI functionalities, such as Tensor Cam's AI Camera, Toshiba's Symbio, and Facebook's Portal, still do not have an authentication scheme beyond calling out the wake-word. This paper provides an overview of the cybersecurity risks faced by smart speaker users due to the lack of an authentication scheme and discusses the development of a state-of-the-art camera-enabled, microphone array-based modern Alexa smart speaker prototype to address these risks.
Imbal-OL: Online Machine Learning from Imbalanced Data Streams in Real-world IoT
Typically, a Neural Network (NN) is trained in data centers using historic datasets, then a C source file of the trained model (the model as a char array) is generated and flashed onto IoT devices. This standard process limits the flexibility of billions of deployed ML-powered devices: they cannot learn unseen/fresh data patterns (static intelligence) and cannot adapt to dynamic scenarios. To address this issue, Online Machine Learning (OL) algorithms are deployed on IoT devices, giving them the ability to locally re-train themselves by continuously updating the last few NN layers using unseen data patterns encountered after deployment.

In OL, catastrophic forgetting is common when NNs are trained on non-stationary data distributions. The majority of recent work in the OL domain embraces the implicit assumption that the distribution of local training data is balanced. In fact, the sensor data streams in real-world IoT are severely imbalanced and temporally correlated. This paper introduces Imbal-OL, a resource-friendly technique that can be used as an OL plugin to balance the size of classes in a range of data streams. When the Imbal-OL-processed stream is used for OL, models can adapt faster to changes in the stream while simultaneously preventing catastrophic forgetting. Experimental evaluation of Imbal-OL on the CIFAR datasets with ResNet-18 demonstrates its ability to deal with imperfect data streams, as it manages to produce high-quality models even under challenging learning settings.
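As a rough illustration of the general idea (balancing class sizes in an online stream before re-training), the sketch below keeps a small per-class replay buffer filled by reservoir sampling and draws class-balanced mini-batches from it. The buffer capacity, sampling policy, and class handling are assumptions for illustration and are not the Imbal-OL algorithm itself.

```python
# Hedged sketch: a per-class bounded replay buffer that turns a severely
# imbalanced stream into (approximately) class-balanced re-training batches.
# Capacities and sampling policy are illustrative assumptions only.
import random
from collections import defaultdict

class BalancedReplayBuffer:
    def __init__(self, per_class_capacity=20):
        self.per_class_capacity = per_class_capacity
        self.buffers = defaultdict(list)   # class label -> stored samples
        self.seen = defaultdict(int)       # class label -> samples seen so far

    def add(self, x, y):
        """Reservoir-sample each class into its own bounded buffer."""
        self.seen[y] += 1
        buf = self.buffers[y]
        if len(buf) < self.per_class_capacity:
            buf.append(x)
        else:
            j = random.randrange(self.seen[y])
            if j < self.per_class_capacity:
                buf[j] = x

    def sample_batch(self, per_class=4):
        """Draw a class-balanced mini-batch for updating the last NN layers."""
        batch = []
        for y, buf in self.buffers.items():
            k = min(per_class, len(buf))
            batch.extend((x, y) for x in random.sample(buf, k))
        random.shuffle(batch)
        return batch

# Usage: feed the imbalanced sensor stream through add(), then re-train on
# sample_batch() instead of the raw, temporally correlated stream.
```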
Enabling Machine Learning on the Edge using SRAM Conserving Efficient Neural Networks Execution Approach
Edge analytics refers to the application of data analytics and Machine Learning (ML) algorithms on IoT devices. The concept of edge analytics is gaining popularity due to its ability to perform AI-based analytics at the device level, enabling autonomous decision-making without depending on the cloud. However, the majority of Internet of Things (IoT) devices are embedded systems with a low-cost microcontroller unit (MCU) or a small CPU as their brain, which is often incapable of handling complex ML algorithms.
In this paper, we propose an approach for the efficient execution of already deeply compressed, large neural networks (NNs) on tiny IoT devices. After optimizing NNs using state-of-the-art deep model compression methods, when the resultant models are executed by MCUs or small CPUs using the model execution sequence produced by our approach, higher levels of SRAM can be conserved. In an evaluation on nine popular models, comparing the default NN execution sequence with the sequence produced by our approach, we found that 1.61-38.06% less SRAM was used to produce inference results, inference time was reduced by 0.28-4.9 ms, and energy consumption was reduced by 4-84 mJ. Despite conserving such high levels of SRAM, our method fully preserves model performance (accuracy, F1 score, etc.).
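The sketch below illustrates, with made-up toy numbers, why the execution sequence matters: in a branched network, the order in which branches are executed changes how many intermediate buffers are live at once and therefore the peak SRAM. The graph, buffer sizes, and planner here are assumptions for illustration and are not the paper's method.

```python
# Hedged sketch: peak SRAM of two execution orders for a tiny branched graph.
# Each op produces one output buffer (bytes) and consumes its inputs' outputs.
# All numbers are illustrative assumptions.
OPS = {
    "conv":  {"inputs": [],          "out_bytes": 40_000},
    "a1":    {"inputs": ["conv"],    "out_bytes": 35_000},
    "a2":    {"inputs": ["a1"],      "out_bytes": 35_000},
    "a3":    {"inputs": ["a2"],      "out_bytes": 5_000},
    "b":     {"inputs": ["conv"],    "out_bytes": 5_000},
    "merge": {"inputs": ["a3", "b"], "out_bytes": 10_000},
}

def peak_sram(order):
    """Peak bytes of live intermediate buffers for one execution order."""
    remaining = {name: 0 for name in OPS}        # pending consumers per buffer
    for op in OPS.values():
        for src in op["inputs"]:
            remaining[src] += 1
    live, peak = {}, 0
    for name in order:
        live[name] = OPS[name]["out_bytes"]      # allocate this op's output
        peak = max(peak, sum(live.values()))
        for src in OPS[name]["inputs"]:          # free buffers no longer needed
            remaining[src] -= 1
            if remaining[src] == 0:
                del live[src]
    return peak

# Running the deep branch first keeps "conv" alive longer, raising peak SRAM.
print(peak_sram(["conv", "a1", "a2", "a3", "b", "merge"]))  # 110000 bytes
print(peak_sram(["conv", "b", "a1", "a2", "a3", "merge"]))  # 80000 bytes
```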
Demo Abstract: Porting and Execution of Anomalies Detection Models on Embedded Systems in IoT
In the Industry 4.0 era, Microcontroller (MCU)-based tiny embedded sensor systems have become the sensing paradigm to interact with the physical world. In 2020, 25.6 billion MCUs were shipped, and over 250 billion MCUs are already operating in the wild. Such low-power, low-cost MCUs are being used as the brain to control diverse applications and will soon become the global digital nervous system. In an Industrial IoT setup, such tiny MCU-based embedded systems are equipped with anomaly detection models and mounted on production plant machines for monitoring the machine's health/condition. These models process the machine's health data (from temperature, RPM, and vibration sensors) and raise timely alerts when they detect data patterns that deviate from the normal operating state.
In this demo, we train One-Class Support Vector Machine (OCSVM) based anomaly detection models and port the trained models to their MCU-executable versions. We then deploy and execute the ported models on 4 popular MCUs and report their on-board inference performance along with their memory (Flash and SRAM) consumption. The procedure we show in the demo is generic, and viewers can use it to efficiently port a wide variety of dataset-trained classifiers and execute them on different resource-constrained MCU- and small CPU-based devices.
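The first step of such a demo could look like the hedged sketch below: fitting a scikit-learn One-Class SVM on synthetic "healthy machine" readings and flagging deviating samples. The sensor statistics and hyperparameters are assumptions for illustration; porting the fitted model to an MCU-executable form is a separate step not shown here.

```python
# Hedged sketch: fit an OCSVM on normal machine-health readings and flag
# deviations. Synthetic data and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic "healthy machine" readings: [temperature (C), RPM, vibration (g)]
normal = np.column_stack([
    rng.normal(60, 2, 500),
    rng.normal(1500, 30, 500),
    rng.normal(0.2, 0.02, 500),
])

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal)

# A reading with abnormal vibration should be flagged (-1 = anomaly).
test = np.array([[61.0, 1510.0, 0.21],    # normal-looking sample
                 [75.0, 1800.0, 0.90]])   # strongly deviating sample
print(ocsvm.predict(test))                # typically prints [ 1 -1 ]
```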
OWSNet: Towards Real-time Offensive Words Spotting Network for Consumer IoT Devices
Every modern household owns at least a dozen IoT devices such as smart speakers, video doorbells, and smartwatches, most of which are equipped with a keyword spotting (KWS) system-based digital voice assistant like Alexa. State-of-the-art KWS systems require a large number of operations and high computation and memory resources to achieve top performance. In this paper, in contrast to existing resource-demanding KWS systems, we propose a lightweight temporal convolution based KWS system named OWSNet, which can comfortably execute on a variety of IoT devices around us and can accurately spot multiple keywords in real-time without disturbing the device's routine functionalities.
When OWSNet is deployed on consumer IoT devices placed in the workplace, home, etc., in addition to spotting wake/trigger words like 'Hey Siri' and 'Alexa', it can also accurately spot offensive words in real-time. If regular wake words are spotted, it activates the voice assistant; if offensive words are spotted, it starts to capture and stream audio data to speech analytics APIs for autonomous detection of threats and insecurities in the scene. The evaluation results show that OWSNet is faster than state-of-the-art models, producing ~1-74 times faster inference on Raspberry Pi 4 and ~1-12 times faster inference on NVIDIA Jetson Nano. To optimize IoT use-case models like OWSNet, we also present a generic multi-component ML model optimization sequence that can reduce the memory and computation demands of a wide range of ML models, thus enabling their execution on low-resource, low-cost, low-power IoT devices.
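For illustration only, a temporal-convolution keyword spotter in the spirit of OWSNet can be sketched in Keras as below; the layer sizes, the 49x10 MFCC input, and the 12-keyword output are assumptions and do not reproduce the published OWSNet architecture.

```python
# Hedged sketch of a small temporal-convolution keyword spotter. All shapes
# and layer choices are illustrative assumptions, not the OWSNet design.
import tensorflow as tf

NUM_KEYWORDS = 12          # e.g. wake words plus a set of offensive words
FRAMES, MFCC = 49, 10      # roughly 1 s of audio as 49 frames of 10 MFCCs

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(FRAMES, MFCC)),
    # Causal dilated 1-D convolutions capture temporal context cheaply.
    tf.keras.layers.Conv1D(32, 5, padding="causal", dilation_rate=1, activation="relu"),
    tf.keras.layers.Conv1D(32, 5, padding="causal", dilation_rate=2, activation="relu"),
    tf.keras.layers.Conv1D(32, 5, padding="causal", dilation_rate=4, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(NUM_KEYWORDS, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()   # on the order of 10k parameters, small enough for edge devices
```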
OTA-TinyML: Over the air deployment of TinyML models and execution on IoT devices
This article presents a novel over-the-air (OTA) technique to remotely deploy tiny ML models on Internet of Things (IoT) devices and perform tasks such as machine learning (ML) model updates, firmware reflashing, reconfiguration, or repurposing. We discuss relevant challenges for OTA ML deployment over IoT at both the scientific and engineering levels. We propose OTA-TinyML to enable resource-constrained IoT devices to perform end-to-end fetching, storage, and execution of many TinyML models. OTA-TinyML loads the C source file of ML models from a web server onto embedded IoT devices via HTTPS. OTA-TinyML is tested by performing remote fetching of six types of ML models, storing them on four types of memory units, then loading and executing them on seven popular MCU boards.
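OTA-TinyML itself runs as C/C++ on MCU-class hardware; purely as an illustration of the same fetch-store-execute flow, the Python sketch below pulls a TFLite flatbuffer from a hypothetical HTTPS URL, stores it locally, and runs one inference on a Pi-class device. The URL, file name, and runtime choice are assumptions, not the paper's implementation.

```python
# Hedged sketch mirroring the fetch / store / execute steps of OTA deployment
# on a Pi-class device. MODEL_URL is a hypothetical placeholder.
import urllib.request
import numpy as np
import tensorflow as tf

MODEL_URL = "https://example.com/models/model.tflite"   # hypothetical server
LOCAL_PATH = "model.tflite"

# 1. Fetch the model over HTTPS and 2. store it on the local filesystem.
urllib.request.urlretrieve(MODEL_URL, LOCAL_PATH)

# 3. Load the fetched model and execute one inference.
interpreter = tf.lite.Interpreter(model_path=LOCAL_PATH)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

dummy = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()
print(interpreter.get_tensor(out["index"]))
```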
Toward distributed, global, deep learning using IoT devices
Deep learning (DL) using large-scale, high-quality IoT datasets can be computationally expensive. Utilizing such datasets to produce a problem-solving model within a reasonable time frame requires a scalable distributed training platform/system. We present a novel approach that trains one DL model on the hardware of thousands of mid-sized IoT devices across the world, rather than on a GPU cluster within a data center. We analyze the scalability and model convergence of the subsequently generated model and identify three bottlenecks: high computational operations, time-consuming dataset loading I/O, and the slow exchange of model gradients. To highlight research challenges for globally distributed DL training and classification, we consider a case study from the video data processing domain. We also outline the need for a two-step deep compression method that increases the speed and scalability of the DL training process. Our initial experimental validation shows that the proposed method improves the tolerance of the distributed training process to varying internet bandwidth, latency, and Quality of Service metrics.
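One generic way to shrink the gradient-exchange bottleneck identified above is top-k sparsification, sketched below: each device transmits only the k largest-magnitude gradient entries (indices plus values). This is a standard illustration under assumed sizes, not the paper's two-step deep compression method.

```python
# Hedged sketch: top-k gradient sparsification to reduce the bytes exchanged
# between devices during distributed training. Sizes are illustrative.
import numpy as np

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries; return (indices, values)."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def densify(idx, vals, shape):
    """Rebuild a dense gradient from the sparse message on the receiver."""
    flat = np.zeros(int(np.prod(shape)), dtype=vals.dtype)
    flat[idx] = vals
    return flat.reshape(shape)

grad = np.random.randn(256, 128).astype(np.float32)   # one layer's gradient
idx, vals = topk_sparsify(grad, k=1024)               # roughly 3% of entries
restored = densify(idx, vals, grad.shape)

sent_bytes = idx.nbytes + vals.nbytes
print(f"sent {sent_bytes} bytes instead of {grad.nbytes}")
```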