ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing
We address the problem of finding realistic geometric corrections to a
foreground object such that it appears natural when composited into a
background image. To achieve this, we propose a novel Generative Adversarial
Network (GAN) architecture that utilizes Spatial Transformer Networks (STNs) as
the generator, which we call Spatial Transformer GANs (ST-GANs). ST-GANs seek
image realism by operating in the geometric warp parameter space. In
particular, we exploit an iterative STN warping scheme and propose a sequential
training strategy that achieves better results compared to naive training of a
single generator. One of the key advantages of ST-GAN is its applicability to
high-resolution images indirectly since the predicted warp parameters are
transferable between reference frames. We demonstrate our approach in two
applications: (1) visualizing how indoor furniture (e.g. from product images)
might be perceived in a room, (2) hallucinating how accessories like glasses
would look when matched with real portraits.
Comment: Accepted to CVPR 2018 (website & code:
https://chenhsuanlin.bitbucket.io/spatial-transformer-GAN/)
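As a minimal illustration of operating in the geometric warp parameter space, the sketch below (a toy NumPy script, not the authors' code; `to_mat` and `compose` are made-up helper names, and the parameterization is simplified to 6-dim affine warps) composes a sequence of small warp updates, mirroring the iterative STN scheme; because the composed parameters live in normalized coordinates, the same warp can be re-applied to a higher-resolution image.

```python
import numpy as np

def to_mat(p):
    """Lift a 6-dim affine warp parameter vector into a 3x3 homogeneous matrix."""
    return np.vstack([p.reshape(2, 3), [0.0, 0.0, 1.0]])

def compose(p, dp):
    """Compose a warp update dp onto the current warp p (both 6-dim vectors)."""
    return (to_mat(dp) @ to_mat(p))[:2].ravel()

# Iterative warping: start from the identity and accumulate small corrections,
# as in the iterative STN scheme described in the abstract.
identity = np.array([1, 0, 0, 0, 1, 0], float)
p = identity.copy()
for dp in [np.array([1, 0, 0.1, 0, 1, 0.0]),   # translate x by 0.1 (normalized coords)
           np.array([1, 0, 0.0, 0, 1, 0.2])]:  # translate y by 0.2
    p = compose(p, dp)

# The warp is resolution-independent, so the same p applies at any image size.
print(p)  # translation components accumulate: tx=0.1, ty=0.2
```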
Efficient machine learning: models and accelerations
One of the key enablers of the recent unprecedented success of machine learning is the adoption of very large models. Modern machine learning models typically consist of multiple cascaded layers, such as deep neural networks, with millions to hundreds of millions of parameters (i.e., weights) for the entire model. Larger-scale models tend to enable the extraction of more complex high-level features and therefore lead to a significant improvement in overall accuracy. On the other hand, the layered deep structure and large model sizes also demand increased computational capability and memory. In order to achieve higher scalability, performance, and energy efficiency for deep learning systems, two orthogonal research and development trends have attracted enormous interest. The first trend is acceleration, while the second is model compression. The underlying goal of both trends is to preserve high model quality so that the models provide accurate predictions. In this thesis, we address these two problems and utilize different computing paradigms to solve real-life deep learning problems.
To explore these two domains, this thesis first presents the cogent confabulation network for the sentence completion problem. We use the Chinese language as a case study to describe our exploration of cogent confabulation based text recognition models. The exploration and optimization of these models have been conducted through various comparisons, and the optimized network offers better accuracy for sentence completion. To accelerate sentence completion in a multi-processing system, we propose a parallel framework for the confabulation recall algorithm. The parallel implementation reduces runtime, improves recall accuracy by breaking the fixed evaluation order and introducing more generalization, and maintains balanced progress in status updates among all neurons. A lexicon scheduling algorithm is presented to further improve model performance.
As deep neural networks have proven effective for many real-life applications and are increasingly deployed on low-power devices, we then investigated accelerating neural network inference using a hardware-friendly computing paradigm: stochastic computing. It is an approximate computing paradigm that requires a small hardware footprint and achieves high energy efficiency. Applying stochastic computing to deep convolutional neural networks, we design the functional hardware blocks and optimize them jointly to minimize the accuracy loss due to the approximation. Synthesis results show that the proposed design achieves remarkably low hardware cost and power/energy consumption.
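For intuition, unipolar stochastic computing represents a value p in [0, 1] as the ones-density of a random bitstream, so multiplication reduces to a single AND gate per bit pair. The toy sketch below (plain Python, not the thesis' hardware design) illustrates this trade-off of accuracy for hardware simplicity:

```python
import random

def to_bitstream(p, n, rng):
    """Encode probability p as an n-bit unipolar stochastic bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(xs, ys):
    """Multiply two stochastic numbers: one AND gate per bit pair."""
    return [a & b for a, b in zip(xs, ys)]

def value(bits):
    """Decode a bitstream back to a probability estimate."""
    return sum(bits) / len(bits)

rng = random.Random(0)
n = 100_000
a, b = to_bitstream(0.6, n, rng), to_bitstream(0.5, n, rng)
prod = value(sc_multiply(a, b))
print(round(prod, 2))  # approximates 0.6 * 0.5 = 0.30
```

Longer bitstreams reduce the approximation error, which is the accuracy/latency trade-off the hardware design must balance.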
Modern neural networks usually contain a huge number of parameters, which cannot fit into embedded devices. Compressing deep learning models while also accelerating them therefore attracts our attention. We introduce structured-matrix based neural networks to address this problem. The circulant matrix is one such structure: the whole matrix can be represented by a single vector, so the matrix is compressed. We further investigate a more flexible structure based on the circulant matrix, called the block-circulant matrix. It partitions a matrix into several smaller blocks and makes each submatrix circulant, so the compression ratio is controllable. With the help of Fourier-transform based equivalent computation, inference of the deep neural network can be accelerated energy-efficiently on FPGAs. We also optimize the training algorithm for block-circulant matrix based neural networks to obtain high accuracy after compression.
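The FFT shortcut referred to above can be sketched as follows: a circulant matrix is fully determined by its first column c, and its matrix-vector product equals a circular convolution, computable via FFTs in O(n log n). This is a minimal NumPy illustration (function names are ours, and the block partitioning of the full block-circulant scheme is omitted):

```python
import numpy as np

def circulant(c):
    """Dense n x n circulant matrix whose first column is c (reference only)."""
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

def circ_matvec_fft(c, x):
    """Circulant matrix-vector product in O(n log n): C x = IFFT(FFT(c) * FFT(x)).
    Only the defining vector c is stored, giving O(n) instead of O(n^2) weights."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

rng = np.random.default_rng(0)
c, x = rng.standard_normal(8), rng.standard_normal(8)
print(np.allclose(circulant(c) @ x, circ_matvec_fft(c, x)))  # → True
```

A block-circulant layer applies the same identity per block, so the compression ratio is tuned through the block size.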
NASA JSC neural network survey results
A survey of Artificial Neural Systems in support of NASA's (Johnson Space Center) Automatic Perception for Mission Planning and Flight Control Research Program was conducted. Several of the world's leading researchers contributed papers containing their most recent results on artificial neural systems. These papers were broken into categories, and descriptive accounts of the results make up a large part of this report. Also included is material on sources of information on artificial neural systems, such as books, technical reports, and software tools.
Worker’s physical fatigue classification using neural networks
Physical fatigue is not only an indication of the user's physical condition and/or need for sleep or rest, but can also be a significant symptom of various diseases. This fatigue affects the performance of workers in jobs that involve continuous physical activity, and is the cause of a large proportion of accidents at work. Physical fatigue is commonly measured by the rating of perceived exertion (RPE). Many previous studies have attempted to continuously monitor workers in order to detect the level of fatigue and prevent these accidents, but most have used invasive sensors that are difficult to place and prevent the worker from performing their tasks correctly. Other works use activity measurement sensors such as accelerometers, but the large amount of information obtained is difficult to analyse in order to extract the characteristics of each fatigue state. In this work, we use a dataset that contains data from inertial sensors of several workers performing various activities during their working day, labelled every 10 min based on their level of fatigue using questionnaires and the Borg fatigue scale. Applying machine learning techniques, we design, develop and test a system based on a neural network capable of classifying the variation of fatigue caused by the physical activity collected every 10 min; for this purpose, feature extraction is performed after the time decomposition done with the Discrete Wavelet Transform (DWT). The results show that the proposed system has an accuracy higher than 92% in all cases, making it viable for application in the proposed scenario.
Funding: European Commission (EC); Fondo Europeo de Desarrollo Regional (FEDER); Consejería de Economía, Conocimiento, Empresas y Universidad (Junta de Andalucía), US-126371.
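As an illustration of the DWT-based feature extraction step, the sketch below computes a plain Haar wavelet decomposition in pure Python and uses the energy of each band as a feature; the paper's actual wavelet choice, window length, and feature set are not given here, so all concrete values are assumptions.

```python
import math

def haar_step(signal):
    """One level of the Haar DWT: approximation and detail coefficients."""
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_energy_features(signal, levels=3):
    """Decompose a window of inertial samples and return the energy in each
    detail band plus the final approximation band as a feature vector."""
    feats = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        feats.append(sum(d * d for d in detail))
    feats.append(sum(a * a for a in approx))
    return feats

# Stand-in for one labelled window of accelerometer data (64 samples).
window = [math.sin(0.3 * t) for t in range(64)]
print(wavelet_energy_features(window))
```

Because the Haar transform is orthogonal, the band energies sum to the energy of the input window, so the features partition the signal's energy across time scales.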
Joint Path planning and Power Allocation of a Cellular-Connected UAV using Apprenticeship Learning via Deep Inverse Reinforcement Learning
This paper investigates an interference-aware joint path planning and power
allocation mechanism for a cellular-connected unmanned aerial vehicle (UAV) in
a sparse suburban environment. The UAV's goal is to fly from an initial point
and reach a destination point by moving along the cells to guarantee the
required quality of service (QoS). In particular, the UAV aims to maximize its
uplink throughput and minimize the level of interference to the ground user
equipment (UEs) connected to the neighboring cellular base stations (BSs), considering the
shortest path and flight resource limitations. Expert knowledge is used to
experience the scenario and define the desired behavior for training the
agent (i.e., the UAV). To solve the problem, an apprenticeship learning
method is utilized via inverse reinforcement learning (IRL) based on both
Q-learning and deep reinforcement learning (DRL). The performance of this
method is compared to a learning-from-demonstration technique called behavioral
cloning (BC) using a supervised learning approach. Simulation and numerical
results show that the proposed approach can achieve expert-level performance.
We also demonstrate that, unlike the BC technique, the performance of our
proposed approach does not degrade in unseen situations.
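The Q-learning component underlying the IRL scheme can be illustrated on a toy problem. The sketch below is our own minimal example, with the reward function given explicitly rather than recovered by IRL: a tabular Q-learning agent learns to move from a start cell to a destination cell along a one-dimensional corridor, a stand-in for the UAV's cell-by-cell path.

```python
import random

# Tiny corridor MDP: start at cell 0, goal at cell 4; each step costs -0.1,
# reaching the goal pays +1 (an assumed toy reward, not the paper's).
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # move left / right

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else -0.1), s2 == GOAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]
rng = random.Random(0)
alpha, gamma, eps = 0.5, 0.9, 0.2
for _ in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda i: Q[s][i])
        s2, r, done = step(s, ACTIONS[a])
        # Q-learning update: bootstrap from the best next-state value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) * (not done) - Q[s][a])
        s = s2

policy = [max((0, 1), key=lambda i: Q[s][i]) for s in range(N_STATES)]
print(policy)  # greedy policy moves right (action index 1) toward the goal
```

In the IRL setting of the paper, the reward in `step` would instead be inferred from expert trajectories before running this inner learning loop.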
Unsupervised Feature Extraction Techniques for Plasma Semiconductor Etch Processes
As feature sizes on semiconductor chips continue to shrink, plasma etching is becoming
an increasingly critical process in achieving low-cost, high-volume manufacturing.
Due to the highly complex physics of plasma and the chemical reactions between plasma
species, control of plasma etch processes is one of the most difficult challenges facing the
integrated circuit industry. This is largely due to the difficulty of monitoring plasmas.
Optical Emission Spectroscopy (OES) technology can be used to produce rich plasma
chemical information in real time and is increasingly being considered in semiconductor
manufacturing for process monitoring and control of plasma etch processes. However,
OES data is complex and inherently highly redundant, necessitating the development
of advanced algorithms for effective feature extraction.
In this thesis, three new unsupervised feature extraction algorithms are proposed
for OES data analysis, and their properties are explored with the aid of both
artificial and industrial benchmark data sets. The first algorithm, AWSPCA
(Adaptive Weighting Sparse Principal Component Analysis), is developed for dimension
reduction with respect to variations in the analysed variables. The algorithm generates
sparse principal components while retaining orthogonality and grouping correlated
variables together. The second algorithm, MSC (Max Separation Clustering), is developed
for clustering variables with distinctive patterns and providing effective pattern
representation by a small number of representative variables. The third algorithm,
SLHC (Single Linkage Hierarchical Clustering), is developed to achieve a complete and
detailed visualisation of the correlation between variables and across clusters in an OES
data set.
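For concreteness, single linkage merges, at every step, the two clusters whose closest members are nearest. Below is a minimal pure-Python sketch on an assumed correlation-derived distance matrix (toy values, not real OES data; the function name is ours):

```python
# Single-linkage agglomerative clustering: repeatedly merge the pair of
# clusters with the smallest minimum pairwise distance between members.
def single_linkage(dist, n_clusters):
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters.pop(j)
    return clusters

# Distances derived from 1 - |correlation|: variables 0,1 are strongly
# correlated, as are 2,3; correlation across the two groups is weak.
D = [[0.0, 0.1, 0.9, 0.8],
     [0.1, 0.0, 0.85, 0.9],
     [0.9, 0.85, 0.0, 0.05],
     [0.8, 0.9, 0.05, 0.0]]
print(single_linkage(D, 2))  # → [{0, 1}, {2, 3}]
```

Recording the merge distances as well would yield the dendrogram used for the visualisation described above.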
The developed algorithms open up opportunities for using OES data in accurate
process control applications. For example, MSC enables the selection of relevant OES
variables for better modeling and control of plasma etching processes. SLHC makes it
possible to understand and interpret patterns in OES spectra and how they relate to
the plasma chemistry. This in turn can help engineers achieve an in-depth
understanding of the underlying plasma processes.
Improving the Efficacy of Context-Aware Applications
In this dissertation, we explore methods for enhancing the context-awareness capabilities of modern computers, including mobile devices, tablets, wearables, and traditional computers. Advancements include proposed methods for fusing information from multiple logical sensors, localizing nearby objects using depth sensors, and building models to better understand the content of 2D images.
First, we propose a system called Unagi, designed to incorporate multiple logical sensors into a single framework that allows context-aware application developers to easily test new ideas and create novel experiences. Unagi is responsible for collecting data, extracting features, and building personalized models for each individual user. We demonstrate the utility of the system with two applications: adaptive notification filtering and a network content prefetcher. We also thoroughly evaluate the system with respect to predictive accuracy, temporal delay, and power consumption.
Next, we discuss a set of techniques that can be used to accurately determine the location of objects near a user in 3D space using a mobile device equipped with both depth and inertial sensors. Using a novel chaining approach, we are able to locate objects farther away than the standard range of the depth sensor without compromising localization accuracy. Empirical testing shows our method is capable of localizing objects 30m from the user with an error of less than 10cm.
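The chaining idea can be illustrated in 2D: each hop is a relative observation made within sensor range, and composing the hops localizes an object well beyond that range. This is a toy sketch (planar poses only, with assumed hop values; the actual system fuses depth and inertial measurements in 3D):

```python
import math

def compose(pose, rel):
    """Chain a relative observation (dx, dy, dtheta), expressed in the current
    frame, onto a global pose (x, y, theta)."""
    x, y, th = pose
    dx, dy, dth = rel
    return (x + dx * math.cos(th) - dy * math.sin(th),
            y + dx * math.sin(th) + dy * math.cos(th),
            th + dth)

# The depth sensor only sees a few metres at a time, so a far object is
# localized through a chain of intermediate landmark observations.
pose = (0.0, 0.0, 0.0)
hops = [(3.0, 0.0, math.pi / 2),  # 3 m ahead, then turn left 90 degrees
        (3.0, 0.0, 0.0),          # 3 m ahead in the new heading
        (2.0, 0.0, 0.0)]          # final hop to the object
for rel in hops:
    pose = compose(pose, rel)
print(pose)  # object ends up at (3.0, 5.0) relative to the start
```

Each hop's measurement error compounds along the chain, which is why keeping per-hop localization accuracy high matters for the 10 cm error bound reported above.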
Finally, we demonstrate a set of techniques that allow a multi-layer perceptron (MLP) to learn resolution-invariant representations of 2D images, including the proposal of an MCMC-based technique to improve the selection of pixels for the mini-batches used in training. We also show that a deep convolutional encoder can be trained to output a resolution-independent representation in constant time, and we discuss several potential applications of this research, including image resampling, image compression, and security.