Intelligent Multi-channel Meta-imagers for Accelerating Machine Vision
Rapid developments in machine vision have led to advances in a variety of
industries, from medical image analysis to autonomous systems. These
achievements, however, typically necessitate digital neural networks with heavy
computational requirements, which are limited by high energy consumption and
further hinder real-time decision-making when computation resources are not
accessible. Here, we demonstrate an intelligent meta-imager that is designed to
work in concert with a digital back-end to off-load computationally expensive
convolution operations into high-speed and low-power optics. In this
architecture, metasurfaces enable both angle and polarization multiplexing to
create multiple information channels that perform positively and negatively
valued convolution operations in a single shot. The meta-imager is employed for
object classification, experimentally achieving 98.6% accurate classification
of handwritten digits and 88.8% accuracy in classifying fashion images. With
compactness, high speed, and low power consumption, this approach could find a
wide range of applications in artificial intelligence and machine vision.
Comment: 15 pages, 5 figures
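The idea of pairing positively and negatively valued convolution channels can be illustrated numerically. The sketch below is not the authors' implementation: it assumes each channel can only apply non-negative weights and that a digital back-end subtracts one channel from the other. The function names and the Sobel-like kernel are hypothetical, chosen only for illustration. It relies on the linearity of convolution: a signed kernel k equals k_pos - k_neg, so the signed response equals the difference of the two non-negative responses.

```python
def correlate2d_valid(image, kernel):
    """Valid-mode 2D cross-correlation on nested lists (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def split_signed_kernel(kernel):
    """Split a signed kernel into two non-negative kernels (pos, neg)
    such that kernel == pos - neg element-wise."""
    pos = [[max(v, 0.0) for v in row] for row in kernel]
    neg = [[max(-v, 0.0) for v in row] for row in kernel]
    return pos, neg


def signed_response(image, kernel):
    """Emulate two non-negative channels followed by digital subtraction."""
    pos, neg = split_signed_kernel(kernel)
    pos_channel = correlate2d_valid(image, pos)  # hypothetical "positive" channel
    neg_channel = correlate2d_valid(image, neg)  # hypothetical "negative" channel
    return [
        [p - n for p, n in zip(p_row, n_row)]
        for p_row, n_row in zip(pos_channel, neg_channel)
    ]


# Toy input: a vertical edge, filtered with a Sobel-like signed kernel.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1.0, 0.0, 1.0],
    [-2.0, 0.0, 2.0],
    [-1.0, 0.0, 1.0],
]

if __name__ == "__main__":
    print(signed_response(image, kernel))
```

Subtracting the two channel outputs reproduces the result of applying the signed kernel directly, which is why two non-negative channels are enough to represent a signed convolution in this simplified model.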
Learning Face Age Progression: A Pyramid Architecture of GANs
The two underlying requirements of face age progression, i.e. aging accuracy
and identity permanence, are not well studied in the literature. In this paper,
we present a novel generative adversarial network based approach. It separately
models the constraints for the intrinsic subject-specific characteristics and
the age-specific facial changes with respect to the elapsed time, ensuring that
the generated faces present desired aging effects while simultaneously keeping
personalized properties stable. Further, to generate more lifelike facial
details, high-level age-specific features conveyed by the synthesized face are
estimated by a pyramidal adversarial discriminator at multiple scales, which
simulates the aging effects in a finer manner. The proposed method is
applicable to diverse face samples in the presence of variations in pose,
expression, makeup, etc., and remarkably vivid aging effects are achieved. Both
visual fidelity and quantitative evaluations show that the approach advances
the state of the art.
Comment: CVPR 2018. V4 and V2 are the same, i.e. the conference version; V3 is
a related but different work, which was mistakenly submitted and will be
submitted as a new arXiv paper
Deep Learning for Technical Document Classification
In large technology companies, the requirements for managing and organizing
technical documents created by engineers and managers have increased
dramatically in recent years, which has led to a higher demand for more
scalable, accurate, and automated document classification. Prior studies have
only focused on processing text for classification, whereas technical documents
often contain multimodal information. To leverage multimodal information for
document classification to improve the model performance, this paper presents a
novel multimodal deep learning architecture, TechDoc, which utilizes three
types of information, including natural language texts and descriptive images
within documents and the associations among the documents. The architecture
synthesizes the convolutional neural network, recurrent neural network, and
graph neural network through an integrated training process. We applied the
architecture to a large multimodal technical document database and trained the
model for classifying documents based on the hierarchical International Patent
Classification system. Our results show that TechDoc presents a greater
classification accuracy than the unimodal methods and other state-of-the-art
benchmarks. The trained model can potentially be scaled to millions of
real-world multimodal technical documents, which is useful for data and
knowledge management in large technology companies and organizations.
Comment: 16 pages, 8 figures, 9 tables