268 research outputs found

    Design and Implementation of a Domain Specific Language for Deep Learning

    Deep Learning (DL) has recently found great success in well-diversified areas such as machine vision, speech recognition, big data analysis, and multimedia understanding. However, the existing state-of-the-art DL frameworks, e.g. Caffe2, Theano, TensorFlow, MXNet, Torch7, and CNTK, are programming libraries with fixed user interfaces, internal representations, and execution environments. Modifying the code of DL layers or data structures is very challenging without an in-depth understanding of the underlying implementation. Code and execution optimization in these tools is often limited and relies on framework-specific computation-graph manipulation and scheduling that lack systematic and universal strategies. Furthermore, most of these tools require many dependencies besides the tool itself and must be built for specific platforms for DL training or inference.

    This dissertation presents DeepDSL, a domain-specific language (DSL) embedded in Scala that compiles DL networks encoded with DeepDSL to efficient, compact, and portable Java source programs for DL training and inference. DeepDSL represents DL networks as abstract tensor functions, performs symbolic gradient derivation to generate the Intermediate Representation (IR), optimizes the IR expressions, and compiles the optimized IR expressions to cross-platform Java code that is easily modifiable and debuggable. The generated code runs directly on the GPU without additional dependencies except a small set of JNI (Java Native Interface) wrappers for invoking the underlying GPU libraries. Moreover, DeepDSL provides static analysis for memory consumption and error detection.

    DeepDSL (our previous results are reported in [zhao2017]; design and implementation details are summarized in [Zhao2018]) has been evaluated with many current state-of-the-art DL networks (e.g. AlexNet, GoogLeNet, VGG, OverFeat, and Deep Residual Network). While the DSL code is highly compact, with fewer than 100 lines for each network, the Java source code generated by the DeepDSL compiler is highly efficient. Our experiments show that the generated Java source has very competitive runtime performance and memory efficiency compared to existing DL frameworks.
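
    The pipeline above (symbolic gradients over an IR, an optimization pass, then code generation) can be illustrated with a toy sketch. The two-operator IR, the simplification pass, and the emitted Java-style strings below are illustrative assumptions, not DeepDSL's actual representation or output.

```python
# Toy sketch of a compile-time pipeline: symbolic gradient over a tiny IR,
# an optimization pass, and emission of a Java-style expression string.
# This is NOT DeepDSL's IR; it only illustrates the idea.
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: object
    right: object

@dataclass(frozen=True)
class Mul:
    left: object
    right: object

ZERO, ONE = Var("0"), Var("1")

def grad(e, wrt):
    """Symbolic derivative of IR expression e with respect to variable wrt."""
    if isinstance(e, Var):
        return ONE if e.name == wrt else ZERO
    if isinstance(e, Add):                         # sum rule
        return Add(grad(e.left, wrt), grad(e.right, wrt))
    return Add(Mul(grad(e.left, wrt), e.right),    # product rule
               Mul(e.left, grad(e.right, wrt)))

def simplify(e):
    """One IR optimization pass: fold multiplications by 0/1, drop + 0."""
    if isinstance(e, Var):
        return e
    l, r = simplify(e.left), simplify(e.right)
    if isinstance(e, Mul):
        if ZERO in (l, r):
            return ZERO
        if l == ONE:
            return r
        if r == ONE:
            return l
        return Mul(l, r)
    if l == ZERO:
        return r
    if r == ZERO:
        return l
    return Add(l, r)

def emit(e):
    """Emit the optimized IR as a Java-style expression string."""
    if isinstance(e, Var):
        return e.name
    op = "+" if isinstance(e, Add) else "*"
    return f"({emit(e.left)} {op} {emit(e.right)})"

loss = Mul(Var("w"), Var("x"))                     # a toy linear "layer"
print(emit(simplify(grad(loss, "w"))))             # -> x
```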

    Autonomous Probabilistic Coprocessing with Petaflips per Second

    In this paper we present a concrete design for a probabilistic (p-) computer based on a network of p-bits: robust classical entities fluctuating between -1 and +1, with probabilities that are controlled through an input constructed from the outputs of other p-bits. The architecture of this probabilistic computer is similar to a stochastic neural network with the p-bit playing the role of a binary stochastic neuron, but with one key difference: there is no sequencer used to enforce an ordering of p-bit updates, as is typically required. Instead, we explore sequencerless designs where all p-bits are allowed to flip autonomously, and demonstrate that such designs can allow ultrafast operation unconstrained by available clock speeds without compromising the solution's fidelity. Based on experimental results from a hardware benchmark of the autonomous design and benchmarked device models, we project that a nanomagnetic implementation can scale to achieve petaflips per second with millions of neurons. A key contribution of this paper is the focus on a hardware metric, flips per second, as a problem- and substrate-independent figure of merit for an emerging class of hardware annealers known as Ising machines. Much like the shrinking feature sizes of transistors that have continually driven Moore's Law, we believe that flips per second can be continually improved in later technology generations of a wide class of probabilistic, domain-specific hardware.
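
    As a rough software illustration of these dynamics, the sketch below simulates a small p-bit network with the standard update m_i = sgn(tanh(I_i) - r), picking p-bits in random order to mimic autonomous (sequencerless) flipping and counting flips. The couplings, network size, and temperature are illustrative assumptions, not the paper's benchmark.

```python
# Minimal p-bit network simulation with autonomous-style updates:
#   I_i = sum_j J_ij * m_j + h_i
#   m_i = +1 if tanh(beta * I_i) > r else -1,  r ~ Uniform(-1, 1)
# J and h are random illustrations, not a real Ising problem instance.
import numpy as np

rng = np.random.default_rng(0)
n = 8
J = rng.normal(size=(n, n))
J = (J + J.T) / 2                          # symmetric couplings
np.fill_diagonal(J, 0.0)
h = np.zeros(n)
m = rng.choice([-1.0, 1.0], size=n)
beta = 1.0                                 # inverse "temperature"
flips = 0
steps = 10_000

for _ in range(steps):
    i = rng.integers(n)                    # no sequencer: random p-bit fires
    I = J[i] @ m + h[i]                    # input built from other p-bits
    new = 1.0 if np.tanh(beta * I) > rng.uniform(-1, 1) else -1.0
    flips += int(new != m[i])
    m[i] = new

print(f"flips per sweep: {flips / (steps / n):.2f}")
```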

    Reactive Probabilistic Programming for Scalable Bayesian Inference


    PERSONALIZED POINT OF INTEREST RECOMMENDATIONS WITH PRIVACY-PRESERVING TECHNIQUES

    Location-based services (LBS) have become increasingly popular, with millions of people using mobile devices to access information about nearby points of interest (POIs). Personalized POI recommender systems have been developed to assist users in discovering and navigating these POIs. However, these systems typically require large amounts of user data, including location history and preferences, to provide personalized recommendations. The collection and use of such data can pose significant privacy concerns. This dissertation proposes a privacy-preserving approach to POI recommendations that addresses these privacy concerns. The proposed approach uses clustering, tabular generative adversarial networks, and differential privacy to generate synthetic user data, allowing for personalized recommendations without revealing individual user data. Specifically, the approach clusters users based on their fuzzy locations, generates synthetic user data using a tabular generative adversarial network, and perturbs user data with differential privacy before it is used for recommendation. The proposed approaches achieve well-balanced trade-offs between accuracy and privacy preservation and can be applied to different recommender systems. The approach is evaluated through extensive experiments on real-world POI datasets, demonstrating that it is effective in providing personalized recommendations while preserving user privacy. The results show that the proposed approach achieves accuracy comparable to traditional POI recommender systems that do not consider privacy, while providing significant privacy guarantees for users. The research's contribution is twofold: it compares different methods for synthesizing user data specifically for POI recommender systems, and it offers a general privacy-preserving framework for different recommender systems. The proposed approach provides a novel solution to the privacy concerns of POI recommender systems, contributes to the development of more trustworthy and user-friendly LBS applications, and can enhance the trust of users in these systems.
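
    A minimal sketch of the differential-privacy step described above, using the Laplace mechanism to perturb user locations before they reach the recommender. The epsilon, sensitivity, and coordinates are illustrative assumptions, not values from the dissertation.

```python
# Laplace-mechanism perturbation of user check-in coordinates.
# epsilon and sensitivity below are assumed values for illustration.
import numpy as np

def laplace_perturb(coords, epsilon=1.0, sensitivity=0.01):
    """Add Laplace(0, sensitivity/epsilon) noise to each coordinate."""
    scale = sensitivity / epsilon
    noise = np.random.laplace(loc=0.0, scale=scale, size=np.shape(coords))
    return np.asarray(coords) + noise

user_checkins = np.array([[40.7128, -74.0060],   # hypothetical lat/lon pairs
                          [40.7306, -73.9866]])
print(laplace_perturb(user_checkins, epsilon=0.5))
```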

    PLC & SCADA based substation automation

    Electrical power systems are a technical wonder. Electricity and its accessibility are among the greatest engineering achievements of the 20th century; a modern society cannot exist without electricity. Generating stations, transmission lines, and distribution systems are the main components of a power system. Smaller power systems (called regional grids) are interconnected to form a larger network called the national grid, in which power is exchanged between different areas depending on surplus and deficiency. This requires a knowledge of load flows, which is impossible without meticulous planning and monitoring. The system also needs to operate in such a way that the losses, and in turn the cost of production, are minimal.

    The major factors that influence the operation of a power system are changes in load and stability. As is easily understood from the different load curves and the load duration curve, the connected load varies widely throughout the day. These changes have an impact on the stability of the power system, as a severe change in a short span can even lead to loss of synchronism. Stability is also affected by the occurrence of faults; faults need to be intercepted at an early stage and corrective measures, such as isolating the faulty line, must be taken.

    As power consumption increases globally, unprecedented challenges are being faced, which require modern, sophisticated methods to counter them. This calls for the use of automation in the power system. Supervisory Control and Data Acquisition (SCADA) and Programmable Logic Controllers (PLC) are an answer to this.

    SCADA refers to a system that enables an electricity utility to remotely monitor, coordinate, control, and operate transmission and distribution components and equipment in real time, with data acquisition for analysis and planning from one control location. The PLC, on the other hand, is like the brain of the system. With the joint operation of SCADA and the PLC, it is possible to control and operate the power system remotely; tasks like opening circuit breakers, changing transformer taps, and managing load demand can be carried out efficiently.

    This type of automated network can manage load, maintain quality, and detect theft of electricity and tampering of meters. It gives the operator an overall view of the entire network. The flow of power can also be closely scrutinized and pilferage points located. Human errors leading to tripping can be eliminated, which directly increases reliability and lowers operating cost.

    In short, our project is an integration of network monitoring functions with geographical mapping, fault location, load management, and intelligent metering.
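
    As a purely illustrative sketch (not the project's implementation), the snippet below shows the joint SCADA/PLC idea in miniature: a supervisory loop polls remote measurements and issues a breaker-trip command when an overcurrent fault is detected. All tag names and thresholds are assumptions.

```python
# Illustrative supervisory loop: poll measurements (SCADA side) and trip a
# breaker on overcurrent (PLC side). Tag names and limits are hypothetical.
FAULT_CURRENT_A = 1200.0        # assumed overcurrent threshold, in amperes

def poll_measurements():
    # Stand-in for SCADA data acquisition from remote terminal units.
    return {"line_7_current_A": 1350.0, "transformer_2_tap": 5}

def trip_breaker(line):
    # Stand-in for a PLC output that opens the line's circuit breaker.
    print(f"TRIP command sent: isolating {line}")

readings = poll_measurements()
if readings["line_7_current_A"] > FAULT_CURRENT_A:
    trip_breaker("line_7")      # corrective action: isolate the faulty line
```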

    Towards Developing Computer Vision Algorithms and Architectures for Real-world Applications

    Computer vision technology automatically extracts high-level, meaningful information from visual data such as images or videos, and object recognition and detection algorithms are essential in most computer vision applications. In this dissertation, we focus on developing algorithms for real-life computer vision applications, presenting innovative algorithms for object segmentation and feature extraction for object and action recognition in video data, sparse feature selection algorithms for medical image analysis, and automated feature extraction using convolutional neural networks for blood cancer grading.

    To detect and classify objects in video, the objects have to be separated from the background, and discriminant features are then extracted from the region of interest before being fed to a classifier. Effective object segmentation and feature extraction are often application-specific and pose major challenges for object detection and classification tasks. In this dissertation, we present an effective object-flow based ROI generation algorithm for segmenting moving objects in video data, which can be applied in surveillance and self-driving vehicles. Optical flow can also be used as a feature for human action recognition, and we present the use of optical flow features in a pre-trained convolutional neural network to improve the performance of human action recognition algorithms. Both algorithms outperformed the state of the art at the time.

    Medical images and videos pose unique challenges for image understanding, mainly because tissues and cells are often irregularly shaped, colored, and textured, and hand-selecting the most discriminant features is often difficult; an automated feature selection method is therefore desired. Sparse learning is a technique for extracting the most discriminant and representative features from raw visual data. However, sparse learning with L1 regularization only takes sparsity in the feature dimension into consideration; we improve the algorithm so that it also selects the type of features, entirely removing less important or noisy feature types from the feature set. We demonstrate this algorithm on endoscopy images to detect unhealthy abnormalities of the esophagus and stomach, such as ulcers and cancer. Besides the sparsity constraint, other application-specific constraints and prior knowledge may also need to be incorporated into the sparse-learning loss function to obtain the desired results. We demonstrate how to incorporate a similar-inhibition constraint and gaze and attention priors in sparse dictionary selection for gastroscopic video summarization, enabling intelligent key frame extraction from gastroscopic video data. With recent advances in multi-layer neural networks, automatic end-to-end feature learning has become feasible. Convolutional neural networks mimic the mammalian visual cortex and can extract the most discriminant features automatically from training samples. We present the use of a convolutional neural network with a hierarchical classifier to grade the severity of follicular lymphoma, a type of blood cancer; it reaches 91% accuracy, on par with analysis by expert pathologists.
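
    A toy sketch of the feature-type selection idea discussed above: an L2,1-style (group-lasso) penalty zeroes out whole groups of features, here solved with proximal gradient descent. The groups, penalty weight, and synthetic data are illustrative assumptions, not the dissertation's formulation.

```python
# Group-sparse feature-type selection via an L2,1 (group-lasso) penalty,
# solved with proximal gradient descent on a least-squares loss.
import numpy as np

def group_lasso(X, y, groups, lam=0.5, step=0.5, iters=500):
    """Minimize ||Xw - y||^2 / (2n) + lam * sum_g ||w_g||_2."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        w -= step * (X.T @ (X @ w - y) / n)       # gradient step on the loss
        for idx in groups.values():               # block soft-threshold per type
            norm = np.linalg.norm(w[idx])
            shrink = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            w[idx] *= shrink
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5])         # only the "color" type matters
groups = {"color": [0, 1, 2], "texture": [3, 4, 5]}
w = group_lasso(X, y, groups)
# The "texture" group norm collapses to ~0; the whole feature type is dropped.
print({g: round(float(np.linalg.norm(w[idx])), 3) for g, idx in groups.items()})
```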
    Developing real-world computer vision applications is more than just developing core vision algorithms to extract and understand information from visual data; it is also subject to many practical requirements and constraints, such as hardware and computing infrastructure, cost, robustness to lighting changes and deformation, and ease of use and deployment. The processing pipelines and system architectures of computer-vision-based applications share many design principles. We developed common processing components and a generic framework for computer vision applications, and a versatile scale-adaptive template matching algorithm for object detection. We demonstrate the design principles and best practices by developing and deploying a complete computer vision application in real life, a multi-channel water level monitoring system, whose techniques and design methodology can be generalized to other real-life applications. General software engineering principles, such as modularity, abstraction, robustness to requirement changes, and generality, are all demonstrated in this research.
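
    A hedged sketch of a scale-adaptive template matching loop in the spirit described above: the template is swept across a range of scales and the best normalized-correlation response wins. This is a generic OpenCV illustration, not the dissertation's algorithm.

```python
# Scale-adaptive template matching: resize the template over a scale range
# and keep the strongest normalized cross-correlation response.
import cv2
import numpy as np

def match_over_scales(image_gray, template_gray,
                      scales=np.linspace(0.5, 2.0, 16)):
    best = (-1.0, None, None)                    # (score, top-left, scale)
    for s in scales:
        t = cv2.resize(template_gray, None, fx=s, fy=s)
        if t.shape[0] > image_gray.shape[0] or t.shape[1] > image_gray.shape[1]:
            continue                             # skip scales larger than image
        res = cv2.matchTemplate(image_gray, t, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(res)
        if max_val > best[0]:
            best = (max_val, max_loc, s)
    return best

# Usage with hypothetical files:
# score, loc, scale = match_over_scales(cv2.imread("scene.png", 0),
#                                       cv2.imread("gauge.png", 0))
```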

    ContrÎle et supervision du procédé d'électrolyse de l'aluminium par systÚme expert = [Control and supervision of the aluminium electrolysis process with expert system]

    The electrolysis cell is the central element in aluminium reduction. Despite the automatic control systems applied to cell operation, a significant amount of information about cell states is not yet used in the decision-making process, and decision quality often depends on the operator in charge. This two-level rule-based expert system is built from the available expertise of operators and engineers of the electrolysis process. The system is designed to diagnose both general-type cells and cells of particular types. In addition, it can operate in standalone mode, using input data local to the workstation, or in network mode, using data from the real process as input. In the network architecture, the real process can be replaced by a cell simulator (a mathematical model) using information-transfer mechanisms similar to the real-time process data-acquisition system. This makes it possible to explicitly test the expert system's process-monitoring and alarm tasks. The arrangement aims to help operators produce detailed analyses of cell state, detect faults in the process, perform trend analysis, and propose long-term target assignments. It can also serve the control engineer as a reference for adjusting set points on relevant regulators. The knowledge-base architecture is designed to allow the application to be distributed across the various cell types, so as to simplify future system updates; for this reason, the structure of the knowledge base and the reasoning strategy are designed with unique characteristics. This thesis provides the whole body of knowledge captured about the aluminium electrolysis process and related areas, including general domain knowledge for knowledge engineering as well as special knowledge for particular cell types. It also describes the construction of the expert system and presents several examples accompanied by detailed discussions of different diagnostic cases.
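
    As an illustrative sketch of the two-level rule base described above (general rules for all cells, additional rules for particular cell types), consider the following; all thresholds, fields, and rules are assumptions for illustration, not the captured plant expertise.

```python
# Two-level rule-based diagnosis sketch: level 1 applies to every cell,
# level 2 adds rules for a particular cell type. Values are hypothetical.
def diagnose(cell):
    findings = []
    # Level 1: general rules for all cells
    if cell["bath_temperature"] > 975:            # degrees C, assumed threshold
        findings.append("overheating: check energy input / anode effect")
    if cell["voltage"] > 4.5:
        findings.append("high voltage: possible anode problem")
    # Level 2: rules specific to a particular cell type
    if cell["type"] == "prototype" and cell["noise"] > 0.1:
        findings.append("prototype cell unstable: tighten alumina feed control")
    return findings or ["cell operating normally"]

print(diagnose({"type": "prototype", "bath_temperature": 980,
                "voltage": 4.2, "noise": 0.15}))
```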

    Leveraging NFV heterogeneity at the network edge

    With network function virtualisation (NFV) and network programmability, network functions (NFs) such as firewalls, traffic load balancers, content filters, and intrusion detection systems (IDS) are virtualized and instantiated either on user-space hosts, using virtual machines (VMs) or lightweight containers, or in the network data plane, using programmable switching technology such as P4 or offloading onto smart network interface cards (SmartNICs). These functions are often chained together to create a service function chain (SFC) based on a defined service-level agreement (SLA). The need to leverage heterogeneous programmable platforms to support the in-network acceleration of functions keeps growing as emerging use cases come with peculiar requirements.

    This thesis identifies various heterogeneous frameworks for deploying virtual network functions (vNFs) that network operators can leverage in service provider networks. A novel taxonomy that provides network operators and the wider research community with valuable insights is proposed. The thesis presents the performance gains obtained from using heterogeneous frameworks for deploying virtual network functions on real testbeds. In addition, this thesis investigates the optimal placement of vNFs over the distributed edge network while considering the heterogeneity of packet processing elements. In particular, the work questions the status quo of how vNFs are currently deployed, i.e., the lack of frameworks to support the seamless deployment of vNFs implemented on diverse packet processing platforms that leverage the capability of the programmable network data plane. In response, the thesis presents a novel integer linear programming (ILP) model for the hybrid placement of diverse network functions that leverages the heterogeneity of the network data plane and the abundant processing capability of user-space hosts, with the objective of minimizing end-to-end latency of the vNF placement. A novel hybrid placement heuristic algorithm, HYPHA, is also proposed to find a quick, efficient solution to the hybrid vNF placement problem. Using optimal stopping theory (OST) principles, an optimal placement scheduling model is presented to handle dynamic edge placement scenarios.

    The results in this work demonstrate that employing a hybrid deployment scheme that leverages the processing capability of the network data plane yields minimal user-to-vNF latency and overall end-to-end latency while fulfilling the placement of a diverse set of user requests from emerging use cases, speeding up service delivery by network operators. The results also show that network operators can leverage the high-speed, low-latency data plane packet processing elements to host delay-sensitive applications and improve service delivery for subscribed users. The proposed hybrid heuristic algorithm is shown to obtain near-optimal vNF mappings while incurring fewer violations of the latency thresholds set by network operators. Furthermore, in addition to emerging edge use cases, the placement solution presented in this thesis can be adapted to place network functions efficiently in core network infrastructure while leveraging the heterogeneity of servers. The dynamic placement scheduler also minimises the number of latency violations and vNF migrations between heterogeneous hosts, based on SLAs set by network operators.
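
    A toy-scale sketch of the hybrid-placement ILP idea: binary variables assign each vNF to a user-space host or a data-plane device, minimizing total placement latency subject to capacity. The host names, latencies, and this PuLP formulation are illustrative assumptions, not the thesis's model.

```python
# Toy hybrid vNF placement ILP with PuLP: place each vNF on exactly one host,
# respect host capacity, minimize total placement latency.
import pulp

vnfs = ["firewall", "ids", "load_balancer"]
hosts = {"server_vm": {"latency": 5.0, "capacity": 2},   # user-space host
         "p4_switch": {"latency": 0.5, "capacity": 1}}   # data-plane device

x = pulp.LpVariable.dicts("place", (vnfs, hosts), cat="Binary")
prob = pulp.LpProblem("hybrid_vnf_placement", pulp.LpMinimize)
prob += pulp.lpSum(x[v][h] * hosts[h]["latency"] for v in vnfs for h in hosts)
for v in vnfs:                                  # each vNF placed exactly once
    prob += pulp.lpSum(x[v][h] for h in hosts) == 1
for h in hosts:                                 # respect host capacity
    prob += pulp.lpSum(x[v][h] for v in vnfs) <= hosts[h]["capacity"]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for v in vnfs:
    for h in hosts:
        if pulp.value(x[v][h]) == 1:
            print(f"{v} -> {h}")
```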

    Scheduling and Tuning Kernels for High-performance on Heterogeneous Processor Systems

    Accelerated parallel computing techniques using devices such as GPUs and Xeon Phis (along with CPUs) offer promising solutions for extending the cutting edge of high-performance computer systems. A significant performance improvement can be achieved when suitable workloads are handled by the accelerator, while traditional CPUs handle the workloads not well suited for accelerators. The combination of multiple types of processors in a single computer system is referred to as a heterogeneous system. This dissertation addresses tuning and scheduling issues in heterogeneous systems.

    The first section presents work on tuning scientific workloads on three different types of processors: multi-core CPUs, the Xeon Phi massively parallel processor, and NVIDIA GPUs; common tuning methods and platform-specific tuning techniques are presented. Analysis is then done to demonstrate the performance characteristics of the heterogeneous system on different input data. This section of the dissertation is part of the GeauxDock project, which prototyped several state-of-the-art bioinformatics algorithms and delivered a fast molecular docking program.

    The second section of this work studies the performance model of the GeauxDock computing kernel. Specifically, the work extracts features from the input data set and the target systems, and then uses various regression models to predict the expected computation time. This helps explain why a certain processor is faster for certain sets of tasks, and it provides the essential information for scheduling on heterogeneous systems.

    In addition, this dissertation investigates a high-level task scheduling framework for heterogeneous processor systems in which the pros and cons of different heterogeneous processors can complement each other, so that higher performance can be achieved on heterogeneous computing systems. A new scheduling algorithm with four innovations is presented: Ranked Opportunistic Balancing (ROB), Multi-subject Ranking (MR), Multi-subject Relative Ranking (MRR), and Automatic Small Tasks Rearranging (ASTR). The new algorithm consistently outperforms previously proposed algorithms, with better scheduling results, lower computational complexity, and more consistent results over a range of performance prediction errors.

    Finally, this work extends the heterogeneous task scheduling algorithm to handle power capping. It demonstrates that a power-aware scheduler significantly improves power efficiency and reduces energy consumption. This suggests that, in addition to performance benefits, heterogeneous systems may have certain advantages in overall power efficiency.
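
    A small sketch of the performance-modelling idea described above: fit a regression from task features to measured kernel runtime per processor, then schedule each task to the device with the lowest predicted time. The features and timings below are synthetic illustrations, not GeauxDock measurements.

```python
# Regression-based runtime prediction guiding a CPU/GPU scheduling decision.
# The feature set and timing model are synthetic, for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
feats = rng.uniform(1, 100, size=(64, 2))        # e.g. [atoms, conformations]
cpu_time = 0.8 * feats[:, 0] + 0.1 * feats[:, 1] + rng.normal(0, 1, 64)
gpu_time = 0.1 * feats[:, 0] + 0.3 * feats[:, 1] + rng.normal(0, 1, 64)

cpu_model = LinearRegression().fit(feats, cpu_time)   # one model per device
gpu_model = LinearRegression().fit(feats, gpu_time)

task = np.array([[50.0, 20.0]])                  # features of a new task
device = "GPU" if gpu_model.predict(task) < cpu_model.predict(task) else "CPU"
print(f"schedule on {device}")
```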