33 research outputs found

    Deploying Deep Neural Networks in Edge with Distribution

    The widespread applicability of deep neural networks (DNNs) has made edge computing a growing trend, extending their reach to domains such as robotics, autonomous technologies, and Internet-of-Things devices. Because individual edge devices have tight resource constraints, computing accurate predictions quickly is a key challenge. Moreover, modern DNNs demand increasingly more computation than their predecessors. As a result, the current approach is to offload DNN inference to compute resources in the cloud. This approach not only raises privacy concerns but also relies on network infrastructure and data centers that are not scalable and do not guarantee fast execution. My key insight is that edge devices can break their individual resource constraints by distributing the computation of DNNs across collaborating peer edge devices. In my approach, edge devices cooperate to conduct single-batch inferences in real time while exploiting several model-parallelism methods. Nonetheless, since communication is costly and current DNN models follow a single-chain dependency pattern, distributing and parallelizing the computations of current DNNs may not be an effective solution for edge domains. Therefore, to efficiently benefit from computing resources with low communication overhead, I propose new handcrafted edge-tailored models that consist of several independent and narrow DNNs. Additionally, I explore an automated neural architecture search methodology and propose custom DNN architectures with low communication overheads and high parallelization opportunities. Finally, to increase reliability and decrease susceptibility to brief disconnections or the loss of a device, I propose a coded distributed computing recovery method that enables distributed DNN models on edge devices to tolerate failures without losing time-sensitive, real-time information. Ph.D.
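
    The abstract above does not say which code the recovery method uses. As a minimal sketch of the general idea only, assume one fully connected layer is split row-wise across two devices and a third, hypothetical parity device holds the sum of the two weight blocks. Because the layer is linear before its activation, the output of any single failed device can be rebuilt from the other two. All names and values below are illustrative.

```python
# Hypothetical sketch of coded distributed computing for one linear layer.
# This is NOT the thesis's scheme; it assumes a simple sum-parity code.

def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[p + q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def recover(out1, out2, parity):
    """Return (out1, out2), rebuilding at most one missing (None) output."""
    if [out1, out2, parity].count(None) > 1:
        raise ValueError("a single parity device can mask only one failure")
    if out1 is None:
        # (W1 + W2) x - W2 x = W1 x
        out1 = [p - b for p, b in zip(parity, out2)]
    elif out2 is None:
        # (W1 + W2) x - W1 x = W2 x
        out2 = [p - a for p, a in zip(parity, out1)]
    return out1, out2

# Illustrative weights: device 1 holds W1, device 2 holds W2,
# and the parity device holds W1 + W2.
W1 = [[1, 2], [3, 4]]
W2 = [[5, 6], [7, 8]]
W_PARITY = add_matrices(W1, W2)
x = [2, 1]

# Simulate device 1 dropping out: only device 2 and the parity device answer.
rebuilt1, kept2 = recover(None, matvec(W2, x), matvec(W_PARITY, x))
assert rebuilt1 == matvec(W1, x)
```

    The parity device costs one extra device's worth of computation, which is the price paid for not having to wait for or re-run a disconnected peer.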

    Domain-aware Genetic Algorithms for Hardware and Mapping Optimization for Efficient DNN Acceleration

    The proliferation of AI across a variety of domains (vision, language, speech, recommendations, games) has led to the rise of domain-specific accelerators for deep learning. At design time, these accelerators carefully architect the on-chip dataflow to maximize data reuse (over space and time) and size the hardware resources (PEs and buffers) to maximize performance and energy efficiency while meeting the chip’s area and power targets. At compile time, the target Deep Neural Network (DNN) model is mapped onto the accelerator. Mapping refers to tiling the computation and data (i.e., tensors) and scheduling them over the PEs and scratchpad buffers, respectively, while honoring the microarchitectural constraints (number of PEs, buffer sizes, and dataflow). The design space of valid hardware resource assignments for a given dataflow, and of valid mappings for a given hardware configuration, is extremely large (~O(10^24) per layer) for today's state-of-the-art DNN models. This makes exhaustive search infeasible. Unfortunately, there can be orders-of-magnitude differences in performance and energy efficiency between an optimal and a sub-optimal choice, making these decisions a crucial part of the entire design process. Moreover, manual tuning by domain experts has become increasingly difficult because of the growing irregularity (due to neural architecture search) and sparsity of DNN models. This necessitates Map Space Exploration (MSE). In this thesis, our goal is to deliver a deep analysis of MSE for DNN accelerators, propose different techniques to improve MSE, and generalize the MSE framework to a wider landscape (from mapping to HW-mapping co-exploration, and from single-accelerator to multi-accelerator scheduling). As part of this, we discuss the correlation between hardware flexibility and the resulting map space, and formalize the map space representation with four mapping axes: tile, order, parallelism, and shape. 
Next, we develop dedicated exploration operators for these axes and use a genetic algorithm framework to converge on a solution. We then develop a "sparsity-aware" technique that brings sparsity into MSE and a "warm-start" technique that addresses the search-speed challenge commonly seen across learning-based search algorithms. Finally, we extend our MSE to support hardware and map space co-exploration and multi-accelerator scheduling. Ph.D.
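
    To make the idea of per-axis exploration operators inside a genetic algorithm concrete, here is a minimal sketch. It assumes a toy search space and a made-up cost function standing in for a real accelerator cost model; the axis values, operator names, and `toy_cost` are all hypothetical and are not taken from the thesis.

```python
import random

# Hypothetical, illustrative choices for each of the four mapping axes.
AXES = {
    "tile": [1, 2, 4, 8, 16, 32],            # tile size
    "order": ["KCYX", "CKYX", "YXKC"],       # loop order
    "parallel": ["K", "C", "Y", "X"],        # dimension spread across PEs
    "shape": [(4, 16), (8, 8), (16, 4)],     # PE array shape (rows, cols)
}

def toy_cost(mapping):
    """Made-up stand-in for a latency/energy model; lower is better."""
    cost = abs(mapping["tile"] - 8)
    cost += 0 if mapping["order"] == "KCYX" else 3
    cost += 0 if mapping["parallel"] == "K" else 2
    cost += 0 if mapping["shape"] == (8, 8) else 1
    return cost

def random_mapping(rng):
    return {axis: rng.choice(values) for axis, values in AXES.items()}

def mutate_axis(mapping, axis, rng):
    """Axis-specific operator: resample one axis and keep the others."""
    child = dict(mapping)
    child[axis] = rng.choice(AXES[axis])
    return child

def crossover(a, b, rng):
    """Pick each axis from one of the two parents."""
    return {axis: rng.choice([a[axis], b[axis]]) for axis in AXES}

def search(generations=30, population_size=16, seed=0):
    rng = random.Random(seed)
    population = [random_mapping(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=toy_cost)
        elite = population[: population_size // 2]  # keep the best half
        children = []
        while len(elite) + len(children) < population_size:
            parent_a, parent_b = rng.sample(elite, 2)
            child = crossover(parent_a, parent_b, rng)
            child = mutate_axis(child, rng.choice(list(AXES)), rng)
            children.append(child)
        population = elite + children
    return min(population, key=toy_cost)

best = search()
print(best, toy_cost(best))
```

    Keeping the best half of each generation means the best cost never gets worse from one generation to the next; a real explorer would replace `toy_cost` with a call into an analytical or simulated cost model.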

    Sensing and Signal Processing in Smart Healthcare

    In the last decade, we have witnessed the rapid development of electronic technologies that are transforming our daily lives. Such technologies are often integrated with various sensors that facilitate the collection of human motion and physiological data and are equipped with wireless communication modules such as Bluetooth, radio frequency identification, and near-field communication. In smart healthcare applications, designing ergonomic and intuitive human–computer interfaces is crucial because a system that is not easy to use will create a huge obstacle to adoption and may significantly reduce the efficacy of the solution. Signal and data processing is another important consideration in smart healthcare applications because it must ensure high accuracy with a high level of confidence in order for the applications to be useful to clinicians in making diagnosis and treatment decisions. This Special Issue is a collection of 10 articles selected from a total of 26 contributions. These contributions span the areas of signal processing and smart healthcare systems and come mostly from authors in Europe, including Italy, Spain, France, Portugal, Romania, Sweden, and the Netherlands. Authors from China, Korea, Taiwan, Indonesia, and Ecuador are also included.

    Machine Learning in Digital Signal Processing for Optical Transmission Systems

    The future demand for digital information will exceed the capabilities of current optical communication systems, which are approaching their limits due to component and fiber-intrinsic non-linear effects. Machine learning methods promise new ways of leveraging the available resources and exploring new solutions. Although some machine learning methods, such as adaptive non-linear filtering and probabilistic modeling, are not novel in the field of telecommunication, more powerful architecture designs together with increasing computing power make it possible to tackle more complex problems today. The methods presented in this work apply machine learning to optical communication systems, with two main contributions. First, an unsupervised learning algorithm with an embedded additive white Gaussian noise (AWGN) channel and an appropriate power constraint is trained end-to-end, learning a geometric constellation shape for the lowest bit-error rates over amplified and unamplified links. Second, supervised machine learning methods, especially deep neural networks with and without internal cyclical connections, are investigated to combat linear and non-linear inter-symbol interference (ISI) as well as colored noise effects introduced by the components and the fiber. Their performance and complexity are experimentally evaluated on high-bandwidth coherent optical transmission setups and benchmarked against conventional digital signal processing (DSP) approaches. This thesis shows how machine learning can be applied to optical communication systems. In particular, it demonstrates that machine learning is a viable design and DSP tool for increasing the capabilities of optical communication systems.

    Deep Learning-Based Intrusion Detection Methods for Computer Networks and Privacy-Preserving Authentication Method for Vehicular Ad Hoc Networks

    The incidence of computer network intrusions has significantly increased over the last decade, partially attributed to a thriving underground cyber-crime economy and the widespread availability of advanced tools for launching such attacks. To counter these attacks, researchers in both academia and industry have turned to machine learning (ML) techniques to develop Intrusion Detection Systems (IDSes) for computer networks. However, many of the datasets used to train ML classifiers for detecting intrusions are not balanced, with some classes having fewer samples than others. This can result in ML classifiers producing suboptimal results. In this dissertation, we address this issue and present better ML-based solutions for intrusion detection. Our contributions in this direction can be summarized as follows: Balancing Data Using Synthetic Data to detect intrusions in Computer Networks: In the past, researchers addressed the issue of imbalanced data in datasets by using over-sampling and under-sampling techniques. In this study, we go beyond such traditional methods and use a synthetic data generation method called Conditional Tabular Generative Adversarial Network (CTGAN) to balance the datasets and investigate its impact on the performance of widely used ML classifiers. To the best of our knowledge, no one else has used CTGAN to generate synthetic samples for balancing intrusion detection datasets. We use two widely used, publicly available datasets, conduct extensive experiments, and show that ML classifiers trained on these datasets balanced with synthetic samples generated by CTGAN have higher prediction accuracy and Matthews Correlation Coefficient (MCC) scores than those trained on imbalanced datasets, by 8% and 13%, respectively. 
Deep Learning approach for intrusion detection using focal loss function: To overcome the data imbalance problem for intrusion detection, we leverage a specialized loss function, called focal loss, that automatically down-weights easy examples and focuses on the hard negatives by facilitating dynamically scaled gradient updates for training ML models effectively. We implement our approach using two well-known Deep Learning (DL) neural network architectures. Compared to training DL models with the cross-entropy loss function, our approach (training DL models with the focal loss function) improved accuracy, precision, F1 score, and MCC score by 24%, 39%, 39%, and 60%, respectively. Efficient Deep Learning approach to detect Intrusions using Few-shot Learning: To address the issue of imbalanced datasets and develop a highly effective IDS, we use the concept of few-shot learning. We present a Few-Shot and Self-Supervised learning framework, called FS3, for detecting intrusions in IoT networks. FS3 works in three phases. Our approach involves first pretraining an encoder on a large-scale external dataset in a self-supervised manner. We then employ few-shot learning (FSL), which seeks to replicate the encoder’s ability to learn new patterns from only a few training examples. When training the encoder on a small number of samples, we train it contrastively, using the triplet loss function. The third phase introduces a novel K-nearest-neighbor algorithm that subsamples the majority-class instances to further reduce imbalance and improve overall performance. Our proposed framework FS3, using only 20% of the labeled data, outperforms fully supervised state-of-the-art models by up to 42.39% and 43.95% in precision and F1 score, respectively. The rapid evolution of the automotive industry and advancements in wireless communication technologies will result in the widespread deployment of Vehicular ad hoc networks (VANETs). 
However, despite the network’s potential to enable intelligent and autonomous driving, it also introduces various attack vectors that can jeopardize its security. In this dissertation, we present an efficient privacy-preserving authenticated message dissemination scheme for VANETs. Conditional Privacy-preserving Authentication and Message Dissemination Scheme using Timestamp-based Pseudonyms: To authenticate a message sent by a vehicle using its pseudonym, a certificate of the pseudonym signed by the central authority is generally used. If a vehicle is found to be malicious, the certificates associated with all the pseudonyms assigned to it must be revoked. Certificate revocation lists (CRLs) should be shared with all entities that will be communicating with the vehicle. As each vehicle has a large pool of pseudonyms allocated to it, the CRL can quickly grow in size as the number of revoked vehicles increases. This results in high storage overheads for the CRL and significant authentication overheads, as receivers must check their CRL for each message received to verify its pseudonym. To address this issue, we present a timestamp-based pseudonym allocation scheme that reduces the storage and authentication overheads by streamlining the CRL management process.
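
    The focal loss mentioned in the abstract above can be illustrated with a short sketch. It assumes the standard binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), with the commonly used defaults gamma = 2 and alpha = 0.25. The dissertation's actual architectures, class setup, and hyperparameters are not given in the abstract, so everything below is illustrative.

```python
import math

def cross_entropy(prob, label):
    """Binary cross-entropy for one example.

    prob is the predicted probability of the positive class; label is 0 or 1.
    """
    p_t = prob if label == 1 else 1.0 - prob
    p_t = min(max(p_t, 1e-12), 1.0)  # avoid log(0)
    return -math.log(p_t)

def binary_focal_loss(prob, label, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example (assumed standard formulation)."""
    p_t = prob if label == 1 else 1.0 - prob
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # avoid log(0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy positive (p = 0.9) keeps a far smaller share of its
# cross-entropy loss than a hard positive (p = 0.1).
easy_share = binary_focal_loss(0.9, 1) / cross_entropy(0.9, 1)
hard_share = binary_focal_loss(0.1, 1) / cross_entropy(0.1, 1)
assert easy_share < hard_share
```

    With gamma set to 0 the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy, which is a convenient sanity check when wiring the loss into a training loop.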