193 research outputs found

    Algorithm and Hardware Co-design for Learning On-a-chip

    abstract: Machine learning technology has achieved remarkable results in recent years. It has rivalled or exceeded human performance in many intellectual tasks, including image recognition, face detection and the game of Go. Many machine learning algorithms require a huge amount of computation, such as the multiplication of large matrices. As silicon technology has scaled to the sub-14nm regime, simply scaling down the device can no longer provide enough speed-up. New device technologies and system architectures are needed to improve computing capacity, and hardware designed specifically for machine learning is in high demand. This calls for joint design and optimization of both hardware and algorithm. For machine learning acceleration, traditional SRAM- and DRAM-based systems suffer from low capacity, high latency, and high standby power. Emerging memories, such as Phase Change Random Access Memory (PRAM), Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), and Resistive Random Access Memory (RRAM), are instead promising candidates that provide low standby power, high data density, fast access and excellent scalability. This dissertation proposes a hierarchical memory modeling framework and models PRAM and STT-MRAM at four different levels of abstraction. With the proposed models, various simulations are conducted to investigate performance, optimization, variability, reliability, and scalability. Emerging memory devices such as RRAM can work as a 2-D crosspoint array to speed up the multiplication and accumulation in machine learning algorithms. This dissertation proposes a new parallel programming scheme to achieve in-memory learning with an RRAM crosspoint array. The programming circuitry is designed and simulated in TSMC 65nm technology, showing a 900X speedup for the dictionary learning task compared to CPU performance.
From the algorithm perspective, inspired by the high accuracy and low power of the brain, this dissertation proposes a bio-plausible feedforward inhibition spiking neural network with a Spike-Rate-Dependent-Plasticity (SRDP) learning rule. It achieves more than 95% accuracy on the MNIST dataset, which is comparable to the sparse coding algorithm, but requires far fewer computations. The role of inhibition in this network is systematically studied and shown to improve the hardware efficiency in learning. Dissertation/Thesis; Doctoral Dissertation, Electrical Engineering, 201
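
    The abstract above notes that a 2-D crosspoint array can speed up multiplication and accumulation. A minimal sketch of that idea follows, assuming an idealized array (no wire resistance, no sneak paths, linear cells): each cell contributes a current equal to its conductance times the row voltage, and each column wire sums those currents. The function name and the example values are hypothetical and are not taken from the dissertation.

```python
import numpy as np

def crosspoint_column_currents(conductance, row_voltages):
    """Ideal crosspoint read-out (an illustrative model, not the dissertation's circuit).

    conductance:  (rows, cols) cell conductances in siemens.
    row_voltages: (rows,) voltages applied to the row wires in volts.
    Returns the (cols,) column currents in amperes, I[j] = sum_i V[i] * G[i, j].
    """
    g = np.asarray(conductance, dtype=float)
    v = np.asarray(row_voltages, dtype=float)
    # Ohm's law per cell plus Kirchhoff's current law per column
    # is exactly a vector-matrix product.
    return v @ g

# Hypothetical 2x2 array with microsiemens-range conductances.
G = [[1e-6, 2e-6],
     [3e-6, 4e-6]]
V = [0.1, 0.2]
currents = crosspoint_column_currents(G, V)
```

    The whole multiply-accumulate happens in one read step per input vector, which is the source of the parallelism; a real array would also need handling for signed weights and device non-idealities.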

    Volumetric cloud generation using a Chinese brush calligraphy style

    Includes bibliographical references. Clouds are an important feature of any real or simulated environment in which the sky is visible. Their amorphous, ever-changing and illuminated features make the sky vivid and beautiful. However, these features increase the complexity of both real-time rendering and modelling. It is difficult to design and build volumetric clouds in an easy and intuitive way, particularly if the interface is intended for artists rather than programmers. We propose a novel modelling system motivated by an ancient painting style, Chinese Landscape Painting, to address this problem. With the use of only one brush and one colour, an artist can paint a vivid and detailed landscape efficiently. In this research, we develop three emulations of a Chinese brush: a skeleton-based brush, a 2D texture footprint and a dynamic 3D footprint, all driven by the motion and pressure of a stylus pen. We propose a hybrid mapping to generate both the body and surface of volumetric clouds from the brush footprints. Our interface integrates these components along with 3D canvas control and GPU-based volumetric rendering into an interactive cloud modelling system. Our cloud modelling system is able to create various types of clouds occurring in nature. User tests indicate that our brush calligraphy approach is preferred to conventional volumetric cloud modelling and that it produces convincing 3D cloud formations in an intuitive and interactive fashion. While traditional modelling systems focus on surface generation of 3D objects, our brush calligraphy technique constructs the interior structure. This forms the basis of a new modelling style for objects with amorphous shape.

    Exploring Decomposition for Solving Pattern Mining Problems

    This article introduces a highly efficient pattern mining technique called Clustering-based Pattern Mining (CBPM). This technique discovers relevant patterns by studying the correlation between transactions in the transaction database based on clustering techniques. The set of transactions is first clustered, such that highly correlated transactions are grouped together. Next, we derive the relevant patterns by applying a pattern mining algorithm to each cluster. We present two different pattern mining algorithms, one applying an approximation-based strategy and another based on an exact strategy. The approximation-based strategy takes into account only the clusters, whereas the exact strategy takes into account both clusters and shared items between clusters. To boost the performance of CBPM, a GPU-based implementation is investigated. To evaluate the CBPM framework, we perform extensive experiments on several pattern mining problems. The results of the experimental evaluation show that CBPM reduces both runtime and memory usage. Also, CBPM based on the approximate strategy provides good accuracy, demonstrating its effectiveness and feasibility. Our GPU implementation achieves a speedup of up to 552× on a single GPU on large transaction databases.
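
    The cluster-then-mine idea in the abstract above can be sketched as follows. This is only an illustration of the approximation-based strategy (each cluster is mined on its own, and items shared between clusters are ignored). The clustering step here is an assumed stand-in, a greedy single-pass grouping by Jaccard similarity, and the miner is a brute-force frequent-itemset search; neither is claimed to be the article's algorithm, and all function names and thresholds are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster_transactions(transactions, threshold=0.3):
    """Greedy single-pass clustering (assumed stand-in for the article's method).

    A transaction joins the first cluster whose first member is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []
    for t in transactions:
        for c in clusters:
            if jaccard(t, c[0]) >= threshold:
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force frequent itemset mining on one (small) cluster."""
    items = sorted(set().union(*transactions))
    found = {}
    for k in range(1, max_size + 1):
        for cand in combinations(items, k):
            needed = set(cand)
            count = sum(1 for t in transactions if needed <= t)
            if count >= min_support:
                found[cand] = count
    return found

def mine_by_cluster(transactions, min_support, threshold=0.3):
    """Approximate strategy: mine each cluster separately and merge the counts."""
    clusters = cluster_transactions(transactions, threshold)
    patterns = {}
    for c in clusters:
        for itemset, count in frequent_itemsets(c, min_support).items():
            patterns[itemset] = patterns.get(itemset, 0) + count
    return clusters, patterns

# Hypothetical toy database with two obvious groups of transactions.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"x", "y"}, {"x", "y", "z"}]
clusters, patterns = mine_by_cluster(db, min_support=2)
```

    Because every cluster is much smaller than the full database, the candidate search in each one is cheap, and the clusters can be mined independently of each other, which is what makes the scheme a natural fit for parallel hardware.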

    Towards Scalable Nanomanufacturing: Modeling the Interaction of Charged Droplets from Electrospray Using GPU

    Electrospray is an atomization method that has been studied intensely in recent years because of its monodispersity and the wide size range of droplets it can produce, from nanometers to hundreds of micrometers. This thesis focuses on the numerical and theoretical modeling of the interaction of charged droplets from single and multiplexed electrospray. We studied two typical scenarios: large-area film deposition using multiplexed electrospray, and fine pattern printing assisted by linear electrostatic quadrupole focusing. Because the unsteady n-body problem demands high computational power, a graphics processing unit (GPU) delivering 10 teraflops is used to speed up the numerical simulation dramatically and at low cost. For large-area film deposition, both the spray profile and the deposition number density are studied for different arrangements of electrospray sources and electrodes. Multiplexed electrospray with a hexagonal nozzle configuration cannot give uniform deposition, even though it has the highest packing density. Uniform film deposition with a thickness variation < 5% was observed with the linear nozzle configuration combined with relative motion between the ES source and the deposition substrate. For fine pattern printing, a linear quadrupole is used to focus the droplets in the radial direction while maintaining a constant driving field in the axial direction. Simulation shows that the linear quadrupole can quickly focus the droplets to a resolution of a few nanometers when the inter-droplet separation is larger than a certain value. Resolution deteriorates drastically when the inter-droplet separation is smaller than that value. This study will shed light on using electrospray as a scalable nanomanufacturing approach.
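
    The n-body cost mentioned in the abstract above comes from every charged droplet repelling every other one. The sketch below shows a direct pairwise Coulomb force sum in NumPy on the CPU. It assumes point charges in vacuum and leaves out drag, external fields and time integration; it is an illustration of the pairwise computation, not the thesis's GPU code, and the function name and example values are hypothetical.

```python
import numpy as np

K_COULOMB = 8.9875517923e9  # Coulomb constant in N m^2 / C^2

def coulomb_forces(positions, charges):
    """Direct O(n^2) Coulomb force on each droplet from all the others.

    positions: (n, 3) droplet positions in meters.
    charges:   (n,) droplet charges in coulombs.
    Returns an (n, 3) array of forces in newtons.
    """
    pos = np.asarray(positions, dtype=float)
    q = np.asarray(charges, dtype=float)
    diff = pos[:, None, :] - pos[None, :, :]   # r_i - r_j for every pair
    dist2 = np.sum(diff ** 2, axis=-1)         # squared pair distances
    np.fill_diagonal(dist2, np.inf)            # exclude self-interaction
    inv_d3 = dist2 ** -1.5                     # 1 / |r_i - r_j|^3
    qq = q[:, None] * q[None, :]               # q_i * q_j for every pair
    # F_i = k * sum_j q_i q_j (r_i - r_j) / |r_i - r_j|^3
    return K_COULOMB * np.sum((qq * inv_d3)[:, :, None] * diff, axis=1)

# Hypothetical example: two equally charged droplets one meter apart.
forces = coulomb_forces([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1e-6, 1e-6])
```

    The work grows with the square of the droplet count, and each droplet's sum is independent of the others, which is why this kind of computation maps well onto a GPU.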

    Searching surveillance video contents using convolutional neural network

    Manual video inspection, searching, and analysis are exhausting and inefficient. This paper presents an intelligent system to search surveillance video contents using deep learning. The proposed system reduced the amount of work needed to perform video searching and improved speed and accuracy. A pre-trained VGG-16 CNN model is used for training on the dataset. In addition, key frames of videos were extracted in order to save space, reduce the amount of work, and reduce the execution time. The extracted key frames were processed using the Sobel edge detector and max-pooling in order to eliminate redundancy. This increases compaction and avoids similarities between extracted frames. A text file that contains the key frame index, the time of occurrence, and the classification of the VGG-16 model is produced. The text file enables humans to easily search for objects of interest. The VIRAT and IVY LAB datasets were used in the experiments. In addition, 128 different classes were identified in the datasets. The classes represent important objects for surveillance systems. However, users can identify other classes and utilize the proposed methodology. Experiments and evaluation showed that the proposed system outperformed existing methods by an order of magnitude. The system achieved the best results in speed while providing high accuracy in classification.
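
    The abstract above says key frames are processed with the Sobel edge detector and max-pooling to eliminate redundancy, without giving details. One way this could work is sketched below, assuming grayscale frames as 2-D arrays: each frame is reduced to a small max-pooled edge map, and a frame is kept only when that map differs enough from the last kept one. The comparison rule, the threshold, the pool size and all function names are assumptions for illustration, not the paper's procedure.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """Valid-mode 3x3 cross-correlation built from shifted slices."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out

def edge_signature(frame, pool=4):
    """Sobel gradient magnitude followed by non-overlapping max-pooling."""
    img = np.asarray(frame, dtype=float)
    mag = np.hypot(filter3x3(img, SOBEL_X), filter3x3(img, SOBEL_Y))
    h, w = mag.shape
    h, w = h - h % pool, w - w % pool          # crop to a multiple of the pool size
    blocks = mag[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3))

def select_key_frames(frames, threshold=1.0, pool=4):
    """Keep a frame only if its edge signature differs from the last kept one."""
    kept, last = [], None
    for i, frame in enumerate(frames):
        sig = edge_signature(frame, pool)
        if last is None or np.mean(np.abs(sig - last)) > threshold:
            kept.append(i)
            last = sig
    return kept

# Hypothetical 18x18 frames: a vertical edge, an exact repeat, a horizontal edge.
a = np.zeros((18, 18)); a[:, 9:] = 10.0
b = a.copy()
c = np.zeros((18, 18)); c[9:, :] = 10.0
kept = select_key_frames([a, b, c])
```

    Comparing pooled edge maps instead of raw pixels makes the test cheaper and less sensitive to small shifts, at the cost of missing changes that do not alter the edge structure.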

    A family of well-balanced WENO and TENO schemes for atmospheric flows

    We herein present a novel methodology to construct very high order well-balanced schemes for the computation of the Euler equations with gravitational source term, with application to numerical weather prediction (NWP). The proposed method is based on augmented Riemann solvers, which allow preserving the exact equilibrium between fluxes and source terms at cell interfaces. In particular, the augmented HLL solver (HLLS) is considered. Different spatial reconstruction methods can be used to ensure a high order of accuracy in space (e.g. WENO, TENO, linear reconstruction), with TENO reconstruction being the preferred method in this work. To the knowledge of the authors, the TENO method has not been applied to NWP before, although it has been extensively used by the computational fluid dynamics community in recent years. Therefore, we offer a thorough assessment of the TENO method to demonstrate its suitability for NWP, considering benchmark cases that involve inertia and gravity waves as well as convective processes. The TENO method offers an enhanced behavior when dealing with turbulent flows and underresolved solutions, where the traditional WENO scheme proves to be more diffusive. The proposed methodology, based on the HLLS solver in combination with a very high-order discretization, allows carrying out the simulation of meso- and micro-scale atmospheric flows in an implicit Large Eddy Simulation manner. Due to the HLLS solver, the isothermal, adiabatic and constant Brunt-Väisälä frequency hydrostatic equilibrium states are preserved with machine accuracy.

    Doctor of Philosophy

    Dissertation. Confocal microscopy has become a popular imaging technique in biology research in recent years. It is often used to study three-dimensional (3D) structures of biological samples. Confocal data are commonly multichannel, with each channel resulting from a different fluorescent staining. This technique also results in finely detailed structures in 3D, such as neuron fibers. Despite the plethora of volume rendering techniques that have been available for many years, there is a demand from biologists for a flexible tool that allows interactive visualization and analysis of multichannel confocal data. Together with biologists, we have designed and developed FluoRender. It incorporates volume rendering techniques such as a two-dimensional (2D) transfer function and multichannel intermixing. Rendering results can be enhanced through tone-mappings and overlays. To facilitate analyses of confocal data, FluoRender provides interactive operations for extracting complex structures. Furthermore, we developed the Synthetic Brainbow technique, which takes advantage of the asynchronous behavior in Graphics Processing Unit (GPU) framebuffer loops and generates random colorizations for different structures in single-channel confocal data. The results from our Synthetic Brainbows, when applied to a sequence of developing cells, can then be used for tracking the movements of these cells. Finally, we present an application of FluoRender in the workflow of constructing anatomical atlases.