
    ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing

    We address the problem of finding realistic geometric corrections to a foreground object such that it appears natural when composited into a background image. To achieve this, we propose a novel Generative Adversarial Network (GAN) architecture that utilizes Spatial Transformer Networks (STNs) as the generator, which we call Spatial Transformer GANs (ST-GANs). ST-GANs seek image realism by operating in the geometric warp parameter space. In particular, we exploit an iterative STN warping scheme and propose a sequential training strategy that achieves better results than naively training a single generator. A key advantage of ST-GAN is that it applies indirectly to high-resolution images, since the predicted warp parameters are transferable between reference frames. We demonstrate our approach in two applications: (1) visualizing how indoor furniture (e.g. from product images) might be perceived in a room, and (2) hallucinating how accessories like glasses would look when matched with real portraits. (Comment: Accepted to CVPR 2018; website & code: https://chenhsuanlin.bitbucket.io/spatial-transformer-GAN/)
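    A minimal sketch of the core mechanism may help here: a generator that predicts warp parameters and applies them with a differentiable spatial transformer, so realism is pursued in warp-parameter space rather than pixel space. This is not the authors' released code (linked above); it uses a single affine warp where the paper uses an iterative scheme, and all module and argument names are illustrative assumptions.

```python
# Hedged sketch of an STN-style generator: predict affine warp parameters
# for a foreground object and apply them with a differentiable spatial
# transformer. Names (WarpGenerator, in_channels, ...) are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WarpGenerator(nn.Module):
    def __init__(self, in_channels=4):  # e.g. RGB composite + foreground mask
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, 6)  # 6 affine warp parameters
        # Initialize to the identity warp so training starts from "no correction".
        nn.init.zeros_(self.fc.weight)
        self.fc.bias.data = torch.tensor([1., 0., 0., 0., 1., 0.])

    def forward(self, composite, foreground):
        theta = self.fc(self.features(composite).flatten(1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, foreground.size(), align_corners=False)
        return F.grid_sample(foreground, grid, align_corners=False)
```

    Because the generator outputs warp parameters rather than pixels, the same parameters can be rescaled and applied to a higher-resolution copy of the foreground, which is what makes the result transferable between reference frames as the abstract claims.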

    Efficient machine learning: models and accelerations

    One of the key enablers of the recent unprecedented success of machine learning is the adoption of very large models. Modern machine learning models typically consist of multiple cascaded layers, such as deep neural networks, and contain at least millions to hundreds of millions of parameters (i.e., weights). Larger-scale models tend to enable the extraction of more complex high-level features and therefore lead to significant improvements in overall accuracy. On the other hand, the layered deep structure and large model sizes also demand increased computational capability and memory. To achieve higher scalability, performance, and energy efficiency for deep learning systems, two orthogonal research and development trends have attracted enormous interest: acceleration and model compression. The underlying goal of both is to preserve model quality so that predictions remain accurate. In this thesis, we address these two problems and utilize different computing paradigms to solve real-life deep learning problems.

    To explore these two domains, this thesis first presents the cogent confabulation network for the sentence-completion problem, using Chinese as a case study for cogent-confabulation-based text recognition models. These models were explored and optimized through various comparisons, and the optimized network offered better sentence-completion accuracy. To accelerate sentence completion on a multi-processing system, we propose a parallel framework for the confabulation recall algorithm. The parallel implementation reduces runtime, improves recall accuracy by breaking the fixed evaluation order and introducing more generalization, and maintains balanced progress in status updates among all neurons. A lexicon scheduling algorithm is presented to further improve model performance.

    As deep neural networks have proven effective for many real-life applications and are increasingly deployed on low-power devices, we then investigated accelerating neural network inference with a hardware-friendly computing paradigm, stochastic computing: an approximate computing paradigm that requires a small hardware footprint and achieves high energy efficiency. Applying stochastic computing to deep convolutional neural networks, we design the functional hardware blocks and optimize them jointly to minimize the accuracy loss due to the approximation. Synthesis results show that the proposed design achieves remarkably low hardware cost and power/energy consumption.

    Modern neural networks usually contain a huge number of parameters, which cannot fit into embedded devices, so we turn to compressing deep learning models together with accelerating them. We introduce structured-matrix-based neural networks to address this problem. The circulant matrix is one such structure: the whole matrix can be represented by a single vector, so the matrix is compressed. We further investigate a more flexible structure based on the circulant matrix, the block-circulant matrix, which partitions a matrix into several smaller blocks and makes each submatrix circulant; the compression ratio is thus controllable. With the help of Fourier-transform-based equivalent computation, inference of the deep neural network can be accelerated energy-efficiently on FPGAs. We also optimize the training algorithm for block-circulant-matrix-based neural networks to obtain high accuracy after compression.
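    The Fourier-transform-based equivalent computation can be made concrete in a few lines: a circulant matrix is fully defined by one vector and multiplies a vector by circular convolution, so each k-by-k block product costs O(k log k) via the FFT instead of O(k^2). The NumPy sketch below illustrates the arithmetic only; the shapes, block size, and function name are assumptions, not the thesis's FPGA implementation.

```python
# Sketch of FFT-based inference with a block-circulant weight matrix.
# Each k-by-k circulant block is stored as a single length-k vector
# (its first column), giving a k-fold compression of that block.
import numpy as np

def block_circulant_matvec(blocks, x, k):
    """blocks: (p, q, k) array; block (i, j) is circulant with first
    column blocks[i, j]. x: input of length q*k. Returns length p*k."""
    p, q, _ = blocks.shape
    x = x.reshape(q, k)
    fx = np.fft.fft(x, axis=1)       # FFT of each input sub-vector
    fb = np.fft.fft(blocks, axis=2)  # FFT of each block's defining vector
    # Circulant matvec is circular convolution: C @ x = ifft(fft(c) * fft(x)).
    # Sum the q column-blocks in the frequency domain, then invert once.
    out = np.fft.ifft((fb * fx[None, :, :]).sum(axis=1), axis=1)
    return out.real.reshape(p * k)
```

    A dense (p*k)-by-(q*k) layer needs p*q*k*k stored weights; the block-circulant form needs only p*q*k, so k directly controls the compression ratio mentioned above.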

    NASA JSC neural network survey results

    A survey of artificial neural systems in support of NASA Johnson Space Center's Automatic Perception for Mission Planning and Flight Control research program was conducted. Several of the world's leading researchers contributed papers containing their most recent results on artificial neural systems. These papers were grouped into categories, and descriptive accounts of the results make up a large part of this report. Also included is material on sources of information on artificial neural systems, such as books, technical reports, and software tools.

    Worker’s physical fatigue classification using neural networks

    Physical fatigue is not only an indication of the user's physical condition and/or need for sleep or rest, but can also be a significant symptom of various diseases. Fatigue affects the performance of workers in jobs that involve continuous physical activity and causes a large proportion of workplace accidents. Physical fatigue is commonly measured by the rating of perceived exertion (RPE). Many previous studies have attempted to continuously monitor workers in order to detect fatigue levels and prevent these accidents, but most have used invasive sensors that are difficult to place and prevent the worker from performing their tasks correctly. Other works use activity-measurement sensors such as accelerometers, but the large amount of information obtained is difficult to analyse in order to extract the characteristics of each fatigue state. In this work, we use a dataset containing inertial-sensor data from several workers performing various activities during their working day, labelled every 10 min by fatigue level using questionnaires and the Borg fatigue scale. Applying machine learning techniques, we design, develop, and test a system based on a neural network capable of classifying the variation in fatigue caused by the physical activity collected every 10 min; for this purpose, features are extracted after a time decomposition performed with the Discrete Wavelet Transform (DWT). The results show that the proposed system has an accuracy higher than 92% in all cases, making it viable for the proposed scenario.

    Funding: European Commission (EC); European Regional Development Fund (FEDER); Consejería de Economía, Conocimiento, Empresas y Universidad (Junta de Andalucía), US-126371.
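    A hedged sketch of the pipeline the abstract describes follows; the wavelet family, decomposition level, feature set, and network size are not given above, so the values below are assumptions chosen only to make the shape of the approach concrete.

```python
# Illustrative pipeline (parameters are assumptions, not the paper's exact
# configuration): decompose each 10-minute inertial window with the DWT,
# extract simple statistics per sub-band, and classify fatigue variation
# with a small neural network.
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

def dwt_features(window, wavelet="db4", level=4):
    """window: 1-D acceleration signal for one 10-minute segment."""
    coeffs = pywt.wavedec(window, wavelet, level=level)
    feats = []
    for c in coeffs:  # approximation + detail coefficients at each level
        feats += [c.mean(), c.std(), np.abs(c).max(), (c ** 2).sum()]
    return np.array(feats)

def train(windows, labels):
    """windows: (n_segments, n_samples); labels: fatigue class per segment."""
    X = np.stack([dwt_features(w) for w in windows])
    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
    return clf.fit(X, labels)
```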

    Joint Path Planning and Power Allocation of a Cellular-Connected UAV using Apprenticeship Learning via Deep Inverse Reinforcement Learning

    This paper investigates an interference-aware joint path planning and power allocation mechanism for a cellular-connected unmanned aerial vehicle (UAV) in a sparse suburban environment. The UAV's goal is to fly from an initial point to a destination point, moving along the cells, while guaranteeing the required quality of service (QoS). In particular, the UAV aims to maximize its uplink throughput and minimize the interference it causes to the ground user equipment (UE) connected to neighboring cellular base stations (BSs), subject to shortest-path and flight-resource constraints. Expert knowledge is used to experience the scenario and define the desired behavior for training the agent (i.e., the UAV). To solve the problem, an apprenticeship learning method is utilized via inverse reinforcement learning (IRL), based on both Q-learning and deep reinforcement learning (DRL). Its performance is compared to a learning-from-demonstration technique, behavioral cloning (BC), implemented with supervised learning. Simulation and numerical results show that the proposed approach can achieve expert-level performance, and that, unlike BC, its performance does not degrade in unseen situations.
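    For illustration only, here is a toy sketch of the Q-learning component on a small grid of cells. In the paper the reward is recovered from expert demonstrations via IRL; the hand-written reward below (a goal bonus minus an interference penalty and a step cost) is an assumed stand-in, as are the grid size and hyperparameters.

```python
# Toy Q-learning on a grid of cells: the agent (UAV) flies from (0, 0)
# to a goal cell while a hand-crafted reward penalizes interference near
# a protected cell. Purely illustrative; not the paper's environment.
import numpy as np

GRID, ACTIONS = 5, [(0, 1), (0, -1), (1, 0), (-1, 0)]  # four moves
GOAL = (4, 4)

def reward(state):
    if state == GOAL:
        return 100.0
    # assumed stand-in: interference penalty grows near cell (2, 2)
    return -1.0 - 5.0 / (1 + abs(state[0] - 2) + abs(state[1] - 2))

def q_learning(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    Q = np.zeros((GRID, GRID, len(ACTIONS)))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s = (0, 0)
        while s != GOAL:
            # epsilon-greedy action selection
            a = rng.integers(len(ACTIONS)) if rng.random() < eps \
                else int(Q[s].argmax())
            dy, dx = ACTIONS[a]
            s2 = (min(max(s[0] + dy, 0), GRID - 1),
                  min(max(s[1] + dx, 0), GRID - 1))
            # standard temporal-difference update
            Q[s][a] += alpha * (reward(s2) + gamma * Q[s2].max() - Q[s][a])
            s = s2
    return Q
```

    In the IRL setting, the `reward` function above would be replaced by one learned so that the expert's demonstrated trajectories become (near-)optimal under it.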

    Unsupervised Feature Extraction Techniques for Plasma Semiconductor Etch Processes

    As feature sizes on semiconductor chips continue to shrink, plasma etching is becoming an ever more critical process in achieving low-cost, high-volume manufacturing. Due to the highly complex physics of plasma and the chemical reactions between plasma species, control of plasma etch processes is one of the most difficult challenges facing the integrated-circuit industry, largely because plasmas are difficult to monitor. Optical Emission Spectroscopy (OES) can produce rich plasma chemical information in real time and is increasingly being considered in semiconductor manufacturing for process monitoring and control of plasma etch processes. However, OES data is complex and inherently highly redundant, necessitating the development of advanced algorithms for effective feature extraction. In this thesis, three new unsupervised feature extraction algorithms are proposed for OES data analysis, and their properties are explored with the aid of both artificial and industrial benchmark data sets. The first algorithm, AWSPCA (Adaptive Weighting Sparse Principal Component Analysis), is developed for dimension reduction with respect to variations in the analysed variables; it generates sparse principal components while retaining orthogonality and grouping correlated variables together. The second algorithm, MSC (Max Separation Clustering), is developed for clustering variables with distinctive patterns and providing effective pattern representation by a small number of representative variables. The third algorithm, SLHC (Single Linkage Hierarchical Clustering), is developed to achieve a complete and detailed visualisation of the correlation between variables and across clusters in an OES data set. The developed algorithms open up opportunities for using OES data in accurate process control applications. For example, MSC enables the selection of relevant OES variables for better modelling and control of plasma etching processes, while SLHC makes it possible to understand and interpret patterns in OES spectra and how they relate to the plasma chemistry. This in turn can help engineers achieve an in-depth understanding of the underlying plasma processes.
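    As a rough illustration of sparse dimension reduction on OES spectra, the sketch below uses scikit-learn's off-the-shelf SparsePCA. It is emphatically not the thesis's AWSPCA algorithm, which additionally retains orthogonality and groups correlated variables; the data shapes are also assumptions.

```python
# Illustration of sparse dimension reduction on OES data with standard
# SparsePCA (a stand-in, not the thesis's AWSPCA). Rows are time samples,
# columns are OES wavelength channels.
import numpy as np
from sklearn.decomposition import SparsePCA

def reduce_oes(spectra, n_components=5):
    """spectra: (n_samples, n_wavelengths) OES measurements."""
    X = spectra - spectra.mean(axis=0)  # center each wavelength channel
    spca = SparsePCA(n_components=n_components, alpha=1.0, random_state=0)
    scores = spca.fit_transform(X)      # low-dimensional trace per sample
    # Each sparse component loads on only a few wavelengths, so the
    # non-zero loadings identify the channels driving that component.
    active = [np.flatnonzero(c) for c in spca.components_]
    return scores, active
```

    The sparsity is what makes such decompositions interpretable for process control: an engineer can map each retained component back to a handful of emission lines rather than the whole spectrum.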

    Improving the Efficacy of Context-Aware Applications

    In this dissertation, we explore methods for enhancing the context-awareness capabilities of modern computers, including mobile devices, tablets, wearables, and traditional computers. The advancements include proposed methods for fusing information from multiple logical sensors, localizing nearby objects using depth sensors, and building models to better understand the content of 2D images. First, we propose a system called Unagi, designed to incorporate multiple logical sensors into a single framework that allows context-aware application developers to easily test new ideas and create novel experiences. Unagi is responsible for collecting data, extracting features, and building personalized models for each individual user. We demonstrate the utility of the system with two applications, adaptive notification filtering and a network content prefetcher, and thoroughly evaluate it with respect to predictive accuracy, temporal delay, and power consumption. Next, we discuss a set of techniques for accurately determining the location of objects near a user in 3D space using a mobile device equipped with both depth and inertial sensors. Using a novel chaining approach, we are able to locate objects farther away than the standard range of the depth sensor without compromising localization accuracy; empirical testing shows the method can localize objects 30 m from the user with an error of less than 10 cm. Finally, we demonstrate a set of techniques that allow a multi-layer perceptron (MLP) to learn resolution-invariant representations of 2D images, including an MCMC-based technique for improving the selection of pixels for the mini-batches used in training. We also show that a deep convolutional encoder can be trained to output a resolution-independent representation in constant time, and we discuss several potential applications of this research, including image resampling, image compression, and security.
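    One way to realize a resolution-invariant image representation with an MLP is a coordinate network: the model maps normalized (x, y) positions to pixel values, so the fitted image can be re-sampled on a grid of any resolution. The sketch below is an assumed interpretation of that idea, not the dissertation's method, and it uses plain uniform pixel sampling for mini-batches where the dissertation proposes an MCMC-based selector.

```python
# Hedged sketch: fit an MLP that maps normalized (x, y) coordinates to RGB
# values. Once fitted, evaluating the network on a denser coordinate grid
# resamples the image at a higher resolution. Sizes are assumptions.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 128), nn.ReLU(),
                    nn.Linear(128, 128), nn.ReLU(),
                    nn.Linear(128, 3))  # RGB output
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def fit(image, steps=5000, batch=1024):
    """image: (H, W, 3) float tensor with values in [0, 1]."""
    H, W, _ = image.shape
    for _ in range(steps):
        # uniform mini-batch pixel selection (simplification; the
        # dissertation's MCMC-based selection is not reproduced here)
        ys = torch.randint(H, (batch,))
        xs = torch.randint(W, (batch,))
        coords = torch.stack([ys / (H - 1), xs / (W - 1)], dim=1)
        loss = ((net(coords) - image[ys, xs]) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
```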