268 research outputs found

    Design and Implementation of a Domain Specific Language for Deep Learning

    Deep Learning (DL) has recently found great success in well-diversified areas such as machine vision, speech recognition, big data analysis, and multimedia understanding. However, the existing state-of-the-art DL frameworks, e.g. Caffe2, Theano, TensorFlow, MXNet, Torch7, and CNTK, are programming libraries with fixed user interfaces, internal representations, and execution environments. Modifying the code of DL layers or data structures is very challenging without an in-depth understanding of the underlying implementation. Code and execution optimization in these tools is often limited and relies on framework-specific computation-graph manipulation and scheduling that lack systematic and universal strategies. Furthermore, most of these tools require many dependencies besides the tool itself and must be built for specific platforms for DL training or inference.

    This dissertation presents DeepDSL, a domain-specific language (DSL) embedded in Scala that compiles DL networks encoded with DeepDSL to efficient, compact, and portable Java source programs for DL training and inference. DeepDSL represents DL networks as abstract tensor functions, performs symbolic gradient derivation to generate the Intermediate Representation (IR), optimizes the IR expressions, and compiles the optimized IR expressions to cross-platform Java code that is easily modifiable and debuggable. The generated code runs directly on the GPU without additional dependencies except a small set of JNI (Java Native Interface) wrappers for invoking the underlying GPU libraries. Moreover, DeepDSL provides static analysis for memory consumption and error detection.

    DeepDSL (our previous results are reported in [zhao2017]; design and implementation details are summarized in [Zhao2018]) has been evaluated with many current state-of-the-art DL networks (e.g. AlexNet, GoogLeNet, VGG, OverFeat, and Deep Residual Network). While the DSL code is highly compact, with fewer than 100 lines for each network, the Java source code generated by the DeepDSL compiler is highly efficient. Our experiments show that the generated Java source has very competitive runtime performance and memory efficiency compared to existing DL frameworks.
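
    The pipeline above (symbolic gradients over an IR, an optimization pass, then code generation) can be illustrated with a toy sketch. The two-operator IR, the simplification pass, and the emitted Java-style strings below are illustrative assumptions, not DeepDSL's actual representation or output.

```python
# Toy sketch of a compile-time pipeline: symbolic gradient over a tiny IR,
# an optimization pass, and emission of a Java-style expression string.
# This is NOT DeepDSL's IR; it only illustrates the idea.
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: object
    right: object

@dataclass(frozen=True)
class Mul:
    left: object
    right: object

ZERO, ONE = Var("0"), Var("1")

def grad(e, wrt):
    """Symbolic derivative of IR expression e with respect to variable wrt."""
    if isinstance(e, Var):
        return ONE if e.name == wrt else ZERO
    if isinstance(e, Add):                         # sum rule
        return Add(grad(e.left, wrt), grad(e.right, wrt))
    return Add(Mul(grad(e.left, wrt), e.right),    # product rule
               Mul(e.left, grad(e.right, wrt)))

def simplify(e):
    """One IR optimization pass: fold multiplications by 0/1, drop + 0."""
    if isinstance(e, Var):
        return e
    l, r = simplify(e.left), simplify(e.right)
    if isinstance(e, Mul):
        if ZERO in (l, r):
            return ZERO
        if l == ONE:
            return r
        if r == ONE:
            return l
        return Mul(l, r)
    if l == ZERO:
        return r
    if r == ZERO:
        return l
    return Add(l, r)

def emit(e):
    """Emit the optimized IR as a Java-style expression string."""
    if isinstance(e, Var):
        return e.name
    op = "+" if isinstance(e, Add) else "*"
    return f"({emit(e.left)} {op} {emit(e.right)})"

loss = Mul(Var("w"), Var("x"))                     # a toy linear "layer"
print(emit(simplify(grad(loss, "w"))))             # -> x
```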

    Autonomous Probabilistic Coprocessing with Petaflips per Second

    In this paper we present a concrete design for a probabilistic (p-) computer based on a network of p-bits: robust classical entities fluctuating between -1 and +1, with probabilities that are controlled through an input constructed from the outputs of other p-bits. The architecture of this probabilistic computer is similar to a stochastic neural network with the p-bit playing the role of a binary stochastic neuron, but with one key difference: there is no sequencer used to enforce an ordering of p-bit updates, as is typically required. Instead, we explore sequencerless designs where all p-bits are allowed to flip autonomously, and demonstrate that such designs can allow ultrafast operation unconstrained by available clock speeds without compromising the solution's fidelity. Based on experimental results from a hardware benchmark of the autonomous design and benchmarked device models, we project that a nanomagnetic implementation can scale to achieve petaflips per second with millions of neurons. A key contribution of this paper is the focus on a hardware metric, flips per second, as a problem- and substrate-independent figure of merit for an emerging class of hardware annealers known as Ising machines. Much like the shrinking feature sizes of transistors that have continually driven Moore's Law, we believe that flips per second can be continually improved in later technology generations of a wide class of probabilistic, domain-specific hardware.
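
    As a rough software illustration of these dynamics, the sketch below simulates a small p-bit network with the standard update m_i = sgn(tanh(I_i) - r), picking p-bits in random order to mimic autonomous (sequencerless) flipping and counting flips. The couplings, network size, and temperature are illustrative assumptions, not the paper's benchmark.

```python
# Minimal p-bit network simulation with autonomous-style updates:
#   I_i = sum_j J_ij * m_j + h_i
#   m_i = +1 if tanh(beta * I_i) > r else -1,  r ~ Uniform(-1, 1)
# J and h are random illustrations, not a real Ising problem instance.
import numpy as np

rng = np.random.default_rng(0)
n = 8
J = rng.normal(size=(n, n))
J = (J + J.T) / 2                          # symmetric couplings
np.fill_diagonal(J, 0.0)
h = np.zeros(n)
m = rng.choice([-1.0, 1.0], size=n)
beta = 1.0                                 # inverse "temperature"
flips = 0
steps = 10_000

for _ in range(steps):
    i = rng.integers(n)                    # no sequencer: random p-bit fires
    I = J[i] @ m + h[i]                    # input built from other p-bits
    new = 1.0 if np.tanh(beta * I) > rng.uniform(-1, 1) else -1.0
    flips += int(new != m[i])
    m[i] = new

print(f"flips per sweep: {flips / (steps / n):.2f}")
```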

    Reactive Probabilistic Programming for Scalable Bayesian Inference


    PERSONALIZED POINT OF INTEREST RECOMMENDATIONS WITH PRIVACY-PRESERVING TECHNIQUES

    Location-based services (LBS) have become increasingly popular, with millions of people using mobile devices to access information about nearby points of interest (POIs). Personalized POI recommender systems have been developed to assist users in discovering and navigating these POIs. However, these systems typically require large amounts of user data, including location history and preferences, to provide personalized recommendations. The collection and use of such data can pose significant privacy concerns. This dissertation proposes a privacy-preserving approach to POI recommendations that addresses these privacy concerns. The proposed approach uses clustering, tabular generative adversarial networks, and differential privacy to generate synthetic user data, allowing for personalized recommendations without revealing individual user data. Specifically, the approach clusters users based on their fuzzy locations, generates synthetic user data using a tabular generative adversarial network, and perturbs user data with differential privacy before it is used for recommendation. The proposed approaches achieve well-balanced trade-offs between accuracy and privacy preservation and can be applied to different recommender systems. The approach is evaluated through extensive experiments on real-world POI datasets, demonstrating that it is effective in providing personalized recommendations while preserving user privacy. The results show that the proposed approach achieves accuracy comparable to traditional POI recommender systems that do not consider privacy, while providing significant privacy guarantees for users. The research's contribution is twofold: it compares different methods for synthesizing user data specifically for POI recommender systems, and it offers a general privacy-preserving framework for different recommender systems. The proposed approach provides a novel solution to the privacy concerns of POI recommender systems, contributes to the development of more trustworthy and user-friendly LBS applications, and can enhance the trust of users in these systems.
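
    A minimal sketch of the differential-privacy step described above, using the Laplace mechanism to perturb user locations before they reach the recommender. The epsilon, sensitivity, and coordinates are illustrative assumptions, not values from the dissertation.

```python
# Laplace-mechanism perturbation of user check-in coordinates.
# epsilon and sensitivity below are assumed values for illustration.
import numpy as np

def laplace_perturb(coords, epsilon=1.0, sensitivity=0.01):
    """Add Laplace(0, sensitivity/epsilon) noise to each coordinate."""
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0.0, scale=scale, size=np.shape(coords))
    return np.asarray(coords) + noise

user_checkins = np.array([[40.7128, -74.0060],   # hypothetical lat/lon pairs
                          [40.7306, -73.9866]])
print(laplace_perturb(user_checkins, epsilon=0.5))
```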

    PLC & SCADA based substation automation

    Electrical power systems are a technical wonder. Electricity and its accessibility are among the greatest engineering achievements of the 20th century; a modern society cannot exist without electricity. Generating stations, transmission lines, and distribution systems are the main components of a power system. Smaller power systems (called regional grids) are interconnected to form a larger network called the national grid, in which power is exchanged between different areas depending on surplus and deficiency. This requires a knowledge of load flows, which is impossible without meticulous planning and monitoring. The system also needs to operate in such a way that the losses, and in turn the cost of production, are minimal.

    The major factors that influence the operation of a power system are changes in load and stability. As is easily understood from the different load curves and the load duration curve, the connected load varies widely throughout the day. These changes have an impact on the stability of the power system, as a severe change in a short span can even lead to loss of synchronism. Stability is also affected by the occurrence of faults; faults need to be intercepted at an early stage and corrective measures, such as isolating the faulty line, must be taken.

    As power consumption increases globally, unprecedented challenges are being faced, which require modern, sophisticated methods to counter them. This calls for the use of automation in the power system. Supervisory Control and Data Acquisition (SCADA) and Programmable Logic Controllers (PLC) are an answer to this.

    SCADA refers to a system that enables an electricity utility to remotely monitor, coordinate, control, and operate transmission and distribution components and equipment in real time, with data acquisition for analysis and planning from one control location. The PLC, on the other hand, is like the brain of the system. With the joint operation of SCADA and the PLC, it is possible to control and operate the power system remotely; tasks like opening circuit breakers, changing transformer taps, and managing load demand can be carried out efficiently.

    This type of automated network can manage load, maintain quality, and detect theft of electricity and tampering of meters. It gives the operator an overall view of the entire network. The flow of power can also be closely scrutinized and pilferage points located. Human errors leading to tripping can be eliminated, which directly increases reliability and lowers operating cost.

    In short, our project is an integration of network monitoring functions with geographical mapping, fault location, load management, and intelligent metering.
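
    As a purely illustrative sketch (not the project's implementation), the snippet below shows the joint SCADA/PLC idea in miniature: a supervisory loop polls remote measurements and issues a breaker-trip command when an overcurrent fault is detected. All tag names and thresholds are assumptions.

```python
# Illustrative supervisory loop: poll measurements (SCADA side) and trip a
# breaker on overcurrent (PLC side). Tag names and limits are hypothetical.
FAULT_CURRENT_A = 1200.0        # assumed overcurrent threshold, in amperes

def poll_measurements():
    # Stand-in for SCADA data acquisition from remote terminal units.
    return {"line_7_current_A": 1350.0, "transformer_2_tap": 5}

def trip_breaker(line):
    # Stand-in for a PLC output that opens the line's circuit breaker.
    print(f"TRIP command sent: isolating {line}")

readings = poll_measurements()
if readings["line_7_current_A"] > FAULT_CURRENT_A:
    trip_breaker("line_7")      # corrective action: isolate the faulty line
```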

    Towards Developing Computer Vision Algorithms and Architectures for Real-world Applications

    Computer vision technology automatically extracts high-level, meaningful information from visual data such as images or videos, and object recognition and detection algorithms are essential in most computer vision applications. In this dissertation, we focus on developing algorithms for real-life computer vision applications, presenting innovative algorithms for object segmentation and feature extraction for object and action recognition in video data, sparse feature selection algorithms for medical image analysis, and automated feature extraction using convolutional neural networks for blood cancer grading.

    To detect and classify objects in video, the objects have to be separated from the background, and discriminant features are then extracted from the region of interest before being fed to a classifier. Effective object segmentation and feature extraction are often application-specific and pose major challenges for object detection and classification tasks. In this dissertation, we present an effective object-flow based ROI generation algorithm for segmenting moving objects in video data, which can be applied in surveillance and self-driving vehicles. Optical flow can also be used as a feature for human action recognition, and we present the use of optical flow features in a pre-trained convolutional neural network to improve the performance of human action recognition algorithms. Both algorithms outperformed the state of the art at the time.

    Medical images and videos pose unique challenges for image understanding, mainly because tissues and cells are often irregularly shaped, colored, and textured, and hand-selecting the most discriminant features is often difficult; an automated feature selection method is therefore desired. Sparse learning is a technique for extracting the most discriminant and representative features from raw visual data. However, sparse learning with L1 regularization only takes sparsity in the feature dimension into consideration; we improve the algorithm so that it also selects the type of features, entirely removing less important or noisy feature types from the feature set. We demonstrate this algorithm on endoscopy images to detect unhealthy abnormalities of the esophagus and stomach, such as ulcers and cancer. Besides the sparsity constraint, other application-specific constraints and prior knowledge may also need to be incorporated into the sparse-learning loss function to obtain the desired results. We demonstrate how to incorporate a similar-inhibition constraint and gaze and attention priors in sparse dictionary selection for gastroscopic video summarization, enabling intelligent key frame extraction from gastroscopic video data. With recent advances in multi-layer neural networks, automatic end-to-end feature learning has become feasible. Convolutional neural networks mimic the mammalian visual cortex and can extract the most discriminant features automatically from training samples. We present the use of a convolutional neural network with a hierarchical classifier to grade the severity of follicular lymphoma, a type of blood cancer; it reaches 91% accuracy, on par with analysis by expert pathologists.
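
    A toy sketch of the feature-type selection idea discussed above: an L2,1-style (group-lasso) penalty zeroes out whole groups of features, here solved with proximal gradient descent. The groups, penalty weight, and synthetic data are illustrative assumptions, not the dissertation's formulation.

```python
# Group-sparse feature-type selection via an L2,1 (group-lasso) penalty,
# solved with proximal gradient descent on a least-squares loss.
import numpy as np

def group_lasso(X, y, groups, lam=0.5, step=0.5, iters=500):
    """Minimize ||Xw - y||^2 / (2n) + lam * sum_g ||w_g||_2."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        w -= step * (X.T @ (X @ w - y) / n)       # gradient step on the loss
        for idx in groups.values():               # block soft-threshold per type
            norm = np.linalg.norm(w[idx])
            shrink = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            w[idx] *= shrink
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5])         # only the "color" type matters
groups = {"color": [0, 1, 2], "texture": [3, 4, 5]}
w = group_lasso(X, y, groups)
# The "texture" group norm collapses to ~0; the whole feature type is dropped.
print({g: round(float(np.linalg.norm(w[idx])), 3) for g, idx in groups.items()})
```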
    Developing real-world computer vision applications is more than just developing core vision algorithms to extract and understand information from visual data; it is also subject to many practical requirements and constraints, such as hardware and computing infrastructure, cost, robustness to lighting changes and deformation, and ease of use and deployment. The processing pipelines and system architectures of computer-vision-based applications share many design principles. We developed common processing components and a generic framework for computer vision applications, and a versatile scale-adaptive template matching algorithm for object detection. We demonstrate the design principles and best practices by developing and deploying a complete computer vision application in real life, a multi-channel water level monitoring system, whose techniques and design methodology can be generalized to other real-life applications. General software engineering principles, such as modularity, abstraction, robustness to requirement changes, and generality, are all demonstrated in this research.
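
    A hedged sketch of a scale-adaptive template matching loop in the spirit described above: the template is swept across a range of scales and the best normalized-correlation response wins. This is a generic OpenCV illustration, not the dissertation's algorithm.

```python
# Scale-adaptive template matching: resize the template over a scale range
# and keep the strongest normalized cross-correlation response.
import cv2
import numpy as np

def match_over_scales(image_gray, template_gray,
                      scales=np.linspace(0.5, 2.0, 16)):
    best = (-1.0, None, None)                    # (score, top-left, scale)
    for s in scales:
        t = cv2.resize(template_gray, None, fx=s, fy=s)
        if t.shape[0] > image_gray.shape[0] or t.shape[1] > image_gray.shape[1]:
            continue                             # skip scales larger than image
        res = cv2.matchTemplate(image_gray, t, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(res)
        if max_val > best[0]:
            best = (max_val, max_loc, s)
    return best

# Usage with hypothetical files:
# score, loc, scale = match_over_scales(cv2.imread("scene.png", 0),
#                                       cv2.imread("gauge.png", 0))
```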

    ContrÎle et supervision du procédé d'électrolyse de l'aluminium par systÚme expert = [Control and supervision of the aluminium electrolysis process with expert system]

    The electrolysis cell is the central element in aluminium reduction. Despite the automatic control systems applied to cell operation, a significant amount of information about cell states is not yet used in the decision-making process, and decision quality often depends on the operator in charge. This two-level rule-based expert system is built from the available expertise of operators and engineers of the electrolysis process. The system is designed to diagnose both general-type cells and cells of particular types. In addition, it can operate in standalone mode, using input data local to the workstation, or in network mode, using data from the real process as input. In the network architecture, the real process can be replaced by a cell simulator (a mathematical model) using information-transfer mechanisms similar to the real-time process data-acquisition system. This makes it possible to explicitly test the expert system's process-monitoring and alarm tasks. The arrangement aims to help operators produce detailed analyses of cell state, detect faults in the process, perform trend analysis, and propose long-term target assignments. It can also serve the control engineer as a reference for adjusting set points on relevant regulators. The knowledge-base architecture is designed to allow the application to be distributed across the various cell types, so as to simplify future system updates; for this reason, the structure of the knowledge base and the reasoning strategy are designed with unique characteristics. This thesis provides the whole body of knowledge captured about the aluminium electrolysis process and related areas, including general domain knowledge for knowledge engineering as well as special knowledge for particular cell types. It also describes the construction of the expert system and presents several examples accompanied by detailed discussions of different diagnostic cases.
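
    As an illustrative sketch of the two-level rule base described above (general rules for all cells, additional rules for particular cell types), consider the following; all thresholds, fields, and rules are assumptions for illustration, not the captured plant expertise.

```python
# Two-level rule-based diagnosis sketch: level 1 applies to every cell,
# level 2 adds rules for a particular cell type. Values are hypothetical.
def diagnose(cell):
    findings = []
    # Level 1: general rules for all cells
    if cell["bath_temperature"] > 975:            # degrees C, assumed threshold
        findings.append("overheating: check energy input / anode effect")
    if cell["voltage"] > 4.5:
        findings.append("high voltage: possible anode problem")
    # Level 2: rules specific to a particular cell type
    if cell["type"] == "prototype" and cell["noise"] > 0.1:
        findings.append("prototype cell unstable: tighten alumina feed control")
    return findings or ["cell operating normally"]

print(diagnose({"type": "prototype", "bath_temperature": 980,
                "voltage": 4.2, "noise": 0.15}))
```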

    Leveraging NFV heterogeneity at the network edge

    With network function virtualisation (NFV) and network programmability, network functions (NFs) such as firewalls, traffic load balancers, content filters, and intrusion detection systems (IDS) are virtualized and instantiated either on user-space hosts, using virtual machines (VMs) or lightweight containers, or in the network data plane, using programmable switching technology such as P4 or offloading onto smart network interface cards (SmartNICs). These functions are often chained together to create a service function chain (SFC) based on a defined service-level agreement (SLA). The need to leverage heterogeneous programmable platforms to support the in-network acceleration of functions keeps growing as emerging use cases come with peculiar requirements.

    This thesis identifies various heterogeneous frameworks for deploying virtual network functions (vNFs) that network operators can leverage in service provider networks. A novel taxonomy that provides network operators and the wider research community with valuable insights is proposed. The thesis presents the performance gains obtained from using heterogeneous frameworks for deploying virtual network functions on real testbeds. In addition, this thesis investigates the optimal placement of vNFs over the distributed edge network while considering the heterogeneity of packet processing elements. In particular, the work questions the status quo of how vNFs are currently deployed, i.e., the lack of frameworks to support the seamless deployment of vNFs implemented on diverse packet processing platforms that leverage the capability of the programmable network data plane. In response, the thesis presents a novel integer linear programming (ILP) model for the hybrid placement of diverse network functions that leverages the heterogeneity of the network data plane and the abundant processing capability of user-space hosts, with the objective of minimizing end-to-end latency of the vNF placement. A novel hybrid placement heuristic algorithm, HYPHA, is also proposed to find a quick, efficient solution to the hybrid vNF placement problem. Using optimal stopping theory (OST) principles, an optimal placement scheduling model is presented to handle dynamic edge placement scenarios.

    The results in this work demonstrate that employing a hybrid deployment scheme that leverages the processing capability of the network data plane yields minimal user-to-vNF latency and overall end-to-end latency while fulfilling the placement of a diverse set of user requests from emerging use cases, speeding up service delivery by network operators. The results also show that network operators can leverage the high-speed, low-latency data plane packet processing elements to host delay-sensitive applications and improve service delivery for subscribed users. The proposed hybrid heuristic algorithm is shown to obtain near-optimal vNF mappings while incurring fewer violations of the latency thresholds set by network operators. Furthermore, in addition to emerging edge use cases, the placement solution presented in this thesis can be adapted to place network functions efficiently in core network infrastructure while leveraging the heterogeneity of servers. The dynamic placement scheduler also minimises the number of latency violations and vNF migrations between heterogeneous hosts, based on SLAs set by network operators.
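
    A toy-scale sketch of the hybrid-placement ILP idea: binary variables assign each vNF to a user-space host or a data-plane device, minimizing total placement latency subject to capacity. The host names, latencies, and this PuLP formulation are illustrative assumptions, not the thesis's model.

```python
# Toy hybrid vNF placement ILP with PuLP: place each vNF on exactly one host,
# respect host capacity, minimize total placement latency.
import pulp

vnfs = ["firewall", "ids", "load_balancer"]
hosts = {"server_vm": {"latency": 5.0, "capacity": 2},   # user-space host
         "p4_switch": {"latency": 0.5, "capacity": 1}}   # data-plane device

x = pulp.LpVariable.dicts("place", (vnfs, hosts), cat="Binary")
prob = pulp.LpProblem("hybrid_vnf_placement", pulp.LpMinimize)
prob += pulp.lpSum(x[v][h] * hosts[h]["latency"] for v in vnfs for h in hosts)
for v in vnfs:                                  # each vNF placed exactly once
    prob += pulp.lpSum(x[v][h] for h in hosts) == 1
for h in hosts:                                 # respect host capacity
    prob += pulp.lpSum(x[v][h] for v in vnfs) <= hosts[h]["capacity"]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for v in vnfs:
    for h in hosts:
        if pulp.value(x[v][h]) == 1:
            print(f"{v} -> {h}")
```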

    Scheduling and Tuning Kernels for High-performance on Heterogeneous Processor Systems

    Accelerated parallel computing techniques using devices such as GPUs and Xeon Phis (along with CPUs) offer promising solutions for extending the cutting edge of high-performance computer systems. A significant performance improvement can be achieved when suitable workloads are handled by the accelerator, while traditional CPUs handle the workloads not well suited for accelerators. The combination of multiple types of processors in a single computer system is referred to as a heterogeneous system. This dissertation addresses tuning and scheduling issues in heterogeneous systems.

    The first section presents work on tuning scientific workloads on three different types of processors: multi-core CPUs, the Xeon Phi massively parallel processor, and NVIDIA GPUs; common tuning methods and platform-specific tuning techniques are presented. Analysis is then done to demonstrate the performance characteristics of the heterogeneous system on different input data. This section of the dissertation is part of the GeauxDock project, which prototyped several state-of-the-art bioinformatics algorithms and delivered a fast molecular docking program.

    The second section of this work studies the performance model of the GeauxDock computing kernel. Specifically, the work extracts features from the input data set and the target systems, and then uses various regression models to predict the expected computation time. This helps explain why a certain processor is faster for certain sets of tasks, and it provides the essential information for scheduling on heterogeneous systems.

    In addition, this dissertation investigates a high-level task scheduling framework for heterogeneous processor systems in which the pros and cons of different heterogeneous processors can complement each other, so that higher performance can be achieved on heterogeneous computing systems. A new scheduling algorithm with four innovations is presented: Ranked Opportunistic Balancing (ROB), Multi-subject Ranking (MR), Multi-subject Relative Ranking (MRR), and Automatic Small Tasks Rearranging (ASTR). The new algorithm consistently outperforms previously proposed algorithms, with better scheduling results, lower computational complexity, and more consistent results over a range of performance prediction errors.

    Finally, this work extends the heterogeneous task scheduling algorithm to handle power capping. It demonstrates that a power-aware scheduler significantly improves power efficiency and reduces energy consumption. This suggests that, in addition to performance benefits, heterogeneous systems may have certain advantages in overall power efficiency.
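
    A small sketch of the performance-modelling idea described above: fit a regression from task features to measured kernel runtime per processor, then schedule each task to the device with the lowest predicted time. The features and timings below are synthetic illustrations, not GeauxDock measurements.

```python
# Regression-based runtime prediction guiding a CPU/GPU scheduling decision.
# The feature set and timing model are synthetic, for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
feats = rng.uniform(1, 100, size=(64, 2))        # e.g. [atoms, conformations]
cpu_time = 0.8 * feats[:, 0] + 0.1 * feats[:, 1] + rng.normal(0, 1, 64)
gpu_time = 0.1 * feats[:, 0] + 0.3 * feats[:, 1] + rng.normal(0, 1, 64)

cpu_model = LinearRegression().fit(feats, cpu_time)   # one model per device
gpu_model = LinearRegression().fit(feats, gpu_time)

task = np.array([[50.0, 20.0]])                  # features of a new task
device = "GPU" if gpu_model.predict(task) < cpu_model.predict(task) else "CPU"
print(f"schedule on {device}")
```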