r-BTN: Cross-domain Face Composite and Synthesis from Limited Facial Patches
We start by asking an interesting yet challenging question: "If an eyewitness
can only recall the eye features of the suspect, such that the forensic artist
can only produce a sketch of the eyes (e.g., the top-left sketch shown in Fig.
1), can advanced computer vision techniques help generate the whole face
image?" A more general question is whether, when a large proportion (e.g., more
than 50%) of the face/sketch is missing, a realistic whole face sketch/image
can still be estimated. Existing face completion and generation methods either
do not conduct domain transfer learning or cannot handle large missing areas.
For example, inpainting approaches tend to blur the generated region when the
missing area is large (i.e., more than 50%). In this paper, we exploit the
potential of deep learning networks for filling large missing regions (e.g., up
to 95% missing) and generating realistic, high-fidelity faces across domains.
We propose recursive generation by bidirectional transformation networks
(r-BTN), which recursively generates a whole face/sketch from a small
sketch/face patch. The large missing area and the cross-domain challenge make
it difficult to generate satisfactory results with a unidirectional
cross-domain learning structure. In contrast, forward and backward
bidirectional learning between the face and sketch domains enables recursive
estimation of the missing region in an incremental manner (Fig. 1) and yields
appealing results. r-BTN also adopts an adversarial constraint to encourage the
generation of realistic faces/sketches. Extensive experiments demonstrate the
superior performance of r-BTN compared to existing potential solutions.
Comment: Accepted by AAAI 201
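The recursive, incremental filling described above can be sketched in a few lines. The two domain-transfer networks are replaced here by placeholder functions, and the mask-growing schedule is an assumption for illustration, not the procedure from the paper:

```python
import numpy as np

def grow_mask(mask, r):
    """Dilate a boolean mask by r pixels with a simple box expansion."""
    out = mask.copy()
    for _ in range(r):
        out = (out | np.roll(out, 1, 0) | np.roll(out, -1, 0) |
               np.roll(out, 1, 1) | np.roll(out, -1, 1))
    return out

def recursive_bidirectional_fill(patch, mask, f_to_s, s_to_f, steps=4, grow=8):
    """Sketch of r-BTN-style recursive generation (illustrative only).

    patch  : 2-D array holding the known region (zeros elsewhere)
    mask   : boolean array, True where pixels are known
    f_to_s : placeholder face-to-sketch transform
    s_to_f : placeholder sketch-to-face transform
    Each iteration transfers to the other domain and back, then enlarges
    the trusted region by `grow` pixels, so the missing area is estimated
    incrementally rather than in one pass.
    """
    face = patch.copy()
    known = mask.copy()
    for _ in range(steps):
        sketch = f_to_s(face)       # forward transform (face -> sketch)
        face_est = s_to_f(sketch)   # backward transform (sketch -> face)
        grown = grow_mask(known, grow)
        band = grown & ~known       # newly trusted ring around the known area
        face[band] = face_est[band] # accept estimates only inside the ring
        known = grown
    return face
```

The key structural point is that each pass only commits to a thin band around the already-trusted region, which is what lets a tiny patch (e.g., the eyes) seed the whole face.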
DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices
Deploying deep neural networks on mobile devices is a challenging task.
Current model compression methods such as matrix decomposition effectively
reduce the deployed model size but still cannot satisfy real-time processing
requirements. This paper first shows that the major obstacle is the excessive
execution time of non-tensor layers, such as pooling and normalization, which
have no tensor-like trainable parameters. This motivates us to design a novel
acceleration framework, DeepRebirth, which "slims" existing consecutive and
parallel non-tensor and tensor layers. The layer slimming is executed at
different substructures: (a) streamline slimming, which merges consecutive
non-tensor and tensor layers vertically; (b) branch slimming, which merges
non-tensor and tensor branches horizontally. The proposed optimization
operations significantly accelerate model execution and also greatly reduce
the run-time memory cost, since the slimmed model architecture contains fewer
hidden layers. To minimize accuracy loss, the parameters in the newly
generated layers are learned with layer-wise fine-tuning based on both
theoretical analysis and empirical verification. In our experiments,
DeepRebirth achieves more than 3x speed-up and 2.5x run-time memory saving on
GoogLeNet, with only a 0.4% drop in top-5 accuracy on ImageNet. Furthermore,
combined with other model compression techniques, DeepRebirth offers an
average inference time of 65 ms on the CPU of a Samsung Galaxy S6 with 86.5%
top-5 accuracy, 14% faster than SqueezeNet, which only has a top-5 accuracy of 80.5%.
Comment: AAAI 201
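Streamline slimming merges a consecutive non-tensor and tensor layer into a single layer that behaves identically at inference time. A familiar concrete instance of this kind of vertical merge (used here only as an illustration, not DeepRebirth's exact procedure, which re-learns the merged parameters by layer-wise fine-tuning) is folding a batch-normalization layer into the preceding convolution:

```python
import numpy as np

def fold_bn_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a batch-norm layer into the preceding convolution.

    W : conv weights, shape (out_ch, in_ch, kh, kw)
    b : conv bias, shape (out_ch,)
    gamma, beta, mean, var : per-channel BN parameters, shape (out_ch,)
    Returns (W', b') such that conv(x, W') + b' == BN(conv(x, W) + b)
    for any input x, so the BN layer can be dropped at inference time.
    """
    scale = gamma / np.sqrt(var + eps)       # per-channel multiplier
    W_f = W * scale[:, None, None, None]     # scale each output filter
    b_f = (b - mean) * scale + beta          # shift the bias accordingly
    return W_f, b_f
```

After folding, the normalization layer disappears from the inference graph entirely, which is the kind of non-tensor-layer removal that the reported run-time savings come from.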
MECHANICAL PROPERTIES AND MICROSTRUCTURES OF REGENERATED CEMENT FROM WASTE CONCRETE
Humans have long used waste materials in engineering applications. This approach not only reduces the amount of waste and the cost of disposal but also limits the cost of new materials. In the field of construction, the reuse of waste concrete has been an active research topic in recent years. However, processing the waste normally involves complicated procedures and lab equipment. In this report we crush and dehydrate waste concrete with ordinary lab facilities and re-make the cement composites. The waste concrete was crushed and dehydrated at two temperatures, 1280 and 1400 °C. To balance the concentrations of silica and lime, extra lime at 28.5% and 16% was added to the waste concrete. The resultant materials were evaluated with respect to chemical composition, mechanical properties, and microstructures. It is concluded that the material dehydrated at 1400 °C and containing 28.5% lime presents the best mechanical performance. This report presents a simple and inexpensive method to reuse waste concrete in applications such as pavements.
Cross domain Image Transformation and Generation by Deep Learning
Compared with single-domain learning, cross-domain learning is more challenging due to the large domain variation. In addition, cross-domain image synthesis is more difficult than other cross-domain learning problems, including, for example, correlation analysis, indexing, and retrieval, because it needs to learn a complex function that captures image details for photo-realism. This work investigates cross-domain image synthesis in two common and challenging tasks, i.e., image-to-image and non-image-to-image transfer/synthesis.
The image-to-image transfer is investigated in Chapter 2, where we develop a method for transformation between face images and sketch images while preserving the identity. Different from existing works that conduct domain transfer in a one-pass manner, we design a recurrent bidirectional transformation network (r-BTN), which allows bidirectional domain transfer in an integrated framework. More importantly, it can perceptually compose partial inputs from the two domains to simultaneously synthesize face and sketch images with consistent identity. Most existing works can synthesize images well only from patches that cover at least 70% of the original image. The proposed r-BTN yields appealing results from patches that cover less than 10% because of its recursive estimation of the missing region in an incremental manner. Extensive experiments demonstrate the superior performance of r-BTN compared to existing solutions.
Chapter 3 targets image transformation/synthesis from non-image sources, i.e., generating a talking face from audio input. Existing works either do not consider temporal dependency, thus yielding abrupt facial/lip movement, or are limited to generation for a specific person, thus lacking generalization capacity. We propose a novel conditional recurrent generation network that incorporates image and audio features in the recurrent unit to model temporal dependency, such that smooth transitions can be achieved for lip and facial movements. To achieve image- and video-realism, we adopt a pair of spatial-temporal discriminators. Accurate lip synchronization is essential to the success of talking face video generation, so we construct a lip-reading discriminator to boost the accuracy of lip synchronization. Extensive experiments demonstrate the superiority of our framework over the state of the art in terms of visual quality, lip sync accuracy, and smooth transition of lip and facial movement.
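The core idea of the Chapter 3 generator, audio and image features feeding a recurrent unit, can be caricatured as follows. The weight matrices, the tanh update, and the feature sizes are illustrative assumptions, not the network from the thesis (which also adds spatial-temporal and lip-reading discriminators):

```python
import numpy as np

def generate_frame_features(audio_feats, identity_feat, Wh, Wa, Wi):
    """Toy conditional recurrent generator.

    Each step mixes the previous hidden state (temporal dependency), the
    current audio feature, and a fixed identity feature, so consecutive
    frame features vary smoothly with the audio instead of jumping from
    frame to frame as a per-frame generator would.
    """
    h = np.zeros(Wh.shape[0])
    frames = []
    for a in audio_feats:
        h = np.tanh(Wh @ h + Wa @ a + Wi @ identity_feat)
        frames.append(h.copy())
    return frames
```

The design choice being illustrated is only that the audio stream enters *inside* the recurrence rather than conditioning each frame independently.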
COMPUTER VISION AND DEEP LEARNING WITH APPLICATIONS TO OBJECT DETECTION, SEGMENTATION, AND DOCUMENT ANALYSIS
We present three works on signature matching for document analysis. In the first work,
we propose a large-scale signature matching method based on locality-sensitive hashing
(LSH). Shape Context features are used to describe the structure of signatures. Two stages
of hashing are performed to find the nearest neighbors for query signatures. We show
that our algorithm can achieve high accuracy even when few signatures are collected
from the same person, and performs fast matching when dealing with a large dataset. In
the second work, we present a novel signature matching method based on supervised
topic models. Shape Context features are extracted from signature shape contours, which
capture the local variations in signature properties. We then use the concept of topic
models to learn the shape context features that correspond to individual authors. We
demonstrate considerable improvement over state-of-the-art methods. In the third work,
we present a partial signature matching method using graphical models. In addition
to the second work, modified Shape Context features are extracted from the contour of
signatures to describe both full and partial signatures. Hierarchical Dirichlet processes
are implemented to infer the number of salient regions needed. The results show the
effectiveness of the approach for both partial and full signature matching.
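As a rough sketch of the retrieval machinery in the first work, here is a single-stage random-hyperplane LSH index. The thesis performs two hashing stages over Shape Context features; this stand-in hashes generic feature vectors and is only meant to show why hashing makes large-scale matching fast:

```python
import numpy as np

class HyperplaneLSH:
    """Minimal locality-sensitive hashing via random hyperplanes.

    Each hyperplane contributes one sign bit; vectors with small cosine
    distance tend to land in the same bucket, so a query only has to be
    compared against its bucket instead of the whole dataset.
    """

    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, dim))
        self.buckets = {}

    def _key(self, v):
        bits = (self.planes @ v) > 0      # one sign bit per hyperplane
        return bits.tobytes()

    def index(self, vectors):
        for i, v in enumerate(vectors):
            self.buckets.setdefault(self._key(v), []).append(i)

    def query(self, v):
        """Return indices of stored vectors sharing the query's bucket."""
        return self.buckets.get(self._key(v), [])
```

A second hashing stage, as in the text, would re-hash within each coarse bucket to narrow the candidate set further before exact comparison.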
We also present three works on deep learning for object detection and segmentation. In
the first work, we propose a deep neural network fusion architecture for fast and robust
pedestrian detection. The proposed network fusion architecture allows for parallel
processing of multiple networks for speed. A single-shot deep convolutional network is
trained as an object detector to generate all possible pedestrian candidates of different
sizes and occlusions. Next, multiple deep neural networks are used in parallel for further
refinement of these pedestrian candidates. We introduce a soft-rejection-based network
fusion method that fuses the soft metrics from all networks into final confidence
scores. Our method outperforms the existing state of the art, especially when detecting
small and occluded pedestrians. Furthermore, we propose a method for integrating a
pixel-wise semantic segmentation network into the network fusion architecture as a
reinforcement to the pedestrian detector. In the second work, in addition to the first
work, a fusion network is trained to fuse the multiple classification networks.
Furthermore, a novel soft-label method is devised to assign floating-point labels to the
pedestrian candidates. The metric for each candidate detection is derived from the
percentage of overlap of its bounding box with those of the ground-truth classes. In the
third work, we propose a boundary-sensitive deep neural network architecture for portrait
segmentation. A residual-network and atrous-convolution based framework is trained as
the base portrait segmentation network. To better handle boundary segmentation, three
techniques are introduced. First, an individual boundary-sensitive kernel is introduced by
labeling the boundary pixels as a separate class and using the soft-label strategy to assign
floating-point label vectors to pixels in the boundary class; each pixel then contributes to
multiple classes when the loss is updated, based on its relative position to the contour.
Second, a global boundary-sensitive kernel is used when updating the loss function to
assign different weights to pixel locations in an image, constraining the global shape of
the resulting segmentation map. Third, we add multiple binary classifiers to classify
boundary-sensitive portrait attributes, so as to refine the learning process of our model.
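The soft-label idea appears twice above: overlap-based labels for pedestrian candidates in the second work, and distance-based label vectors for boundary pixels in the third. Both can be sketched in a few lines; the background-mass rule, the class layout, and the linear distance falloff are assumptions for illustration, and the exact normalization in the thesis may differ:

```python
def box_overlap_fraction(cand, gt):
    """Fraction of the candidate box's area covered by a ground-truth box.
    Boxes are (x1, y1, x2, y2)."""
    ix = max(0.0, min(cand[2], gt[2]) - max(cand[0], gt[0]))
    iy = max(0.0, min(cand[3], gt[3]) - max(cand[1], gt[1]))
    area = (cand[2] - cand[0]) * (cand[3] - cand[1])
    return (ix * iy) / area if area > 0 else 0.0

def candidate_soft_label(cand, gt_boxes_by_class, n_classes):
    """Floating-point label vector for a detection candidate: per-class
    best overlap fraction, remaining mass assigned to background (class 0)."""
    labels = [0.0] * n_classes
    for cls, boxes in gt_boxes_by_class.items():
        labels[cls] = max(box_overlap_fraction(cand, b) for b in boxes)
    labels[0] = max(0.0, 1.0 - sum(labels[1:]))
    return labels

def boundary_soft_label(dist_to_contour, radius=3.0, inside=True):
    """Label vector (background, boundary, portrait) for a pixel given its
    distance to the nearest contour point: all mass on the boundary class
    on the contour, shifting linearly to portrait or background by `radius`."""
    w = max(0.0, 1.0 - dist_to_contour / radius)
    return [0.0, w, 1.0 - w] if inside else [1.0 - w, w, 0.0]
```

In both cases the point is the same: a candidate or pixel contributes fractionally to several classes in the loss, rather than being forced into a single hard label.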