Intelligent Multi-channel Meta-imagers for Accelerating Machine Vision

Authors: Huo Yuankai; Kravchenko Ivan I.; Liu Quan; Valentine Jason G.; Zhang Xiaomeng; Zheng Hanyu
Publication date: 12/06/2023

Rapid developments in machine vision have led to advances in a variety of industries, from medical image analysis to autonomous systems. These achievements, however, typically require digital neural networks with heavy computational requirements, whose high energy consumption limits their use and hinders real-time decision-making when computational resources are not accessible. Here, we demonstrate an intelligent meta-imager designed to work in concert with a digital back-end, off-loading computationally expensive convolution operations into high-speed, low-power optics. In this architecture, metasurfaces enable both angle and polarization multiplexing to create multiple information channels that perform positively and negatively valued convolution operations in a single shot. The meta-imager is employed for object classification, experimentally achieving 98.6% accuracy in classifying handwritten digits and 88.8% accuracy in classifying fashion images. With its compactness, high speed, and low power consumption, this approach could find a wide range of applications in artificial intelligence and machine vision.
Comment: 15 pages, 5 figures
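The abstract's mention of channels that carry positively and negatively valued convolutions can be illustrated with a small numerical sketch. It assumes, as an illustration only and not as the authors' method, that each optical channel can apply only non-negative weights, so a signed kernel is split into two non-negative kernels whose outputs are subtracted in the digital back-end. The names `conv2d_valid` and `split_signed_kernel`, the 8x8 random image, and the Sobel-like kernel are all hypothetical choices made for this example.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation (the convention most CNN layers use)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def split_signed_kernel(kernel):
    """Split a signed kernel into two non-negative kernels with kernel = k_pos - k_neg."""
    k_pos = np.maximum(kernel, 0.0)   # keeps the positive weights, zero elsewhere
    k_neg = np.maximum(-kernel, 0.0)  # magnitudes of the negative weights
    return k_pos, k_neg


# Hypothetical input: a random non-negative "intensity" image.
rng = np.random.default_rng(0)
image = rng.random((8, 8))

# Hypothetical signed kernel (Sobel-like horizontal gradient).
kernel = np.array([[1.0, 0.0, -1.0],
                   [2.0, 0.0, -2.0],
                   [1.0, 0.0, -1.0]])

k_pos, k_neg = split_signed_kernel(kernel)

# Two non-negative channels, standing in for the two optical channels.
channel_pos = conv2d_valid(image, k_pos)
channel_neg = conv2d_valid(image, k_neg)

# Digital subtraction recovers the signed convolution.
signed = channel_pos - channel_neg
reference = conv2d_valid(image, kernel)
```

Because convolution is linear in the kernel, `signed` matches `reference` up to floating-point error. How the actual meta-imager maps kernels onto angle and polarization channels is not described in the abstract and is not modeled here.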

arXiv.org e-Print Archive

Learning Face Age Progression: A Pyramid Architecture of GANs

Authors: Huang Di; Jain Anil K.; Wang Yunhong; Yang Hongyu
Publication date: 10/01/2019

The two underlying requirements of face age progression, i.e., aging accuracy and identity permanence, are not well studied in the literature. In this paper, we present a novel approach based on generative adversarial networks. It separately models the constraints for the intrinsic subject-specific characteristics and the age-specific facial changes with respect to the elapsed time, ensuring that the generated faces present the desired aging effects while keeping personalized properties stable. Further, to generate more lifelike facial details, high-level age-specific features conveyed by the synthesized face are estimated by a pyramidal adversarial discriminator at multiple scales, which simulates the aging effects in a finer manner. The proposed method is applicable to diverse face samples in the presence of variations in pose, expression, makeup, etc., and achieves remarkably vivid aging effects. Both visual fidelity and quantitative evaluations show that the approach advances the state of the art.
Comment: CVPR 2018. V4 and V2 are the same, i.e., the conference version; V3 is a related but different work, which was mistakenly submitted and will be submitted as a new arXiv paper

arXiv.org e-Print Archive

Crossref

Deep Learning for Technical Document Classification

Authors: Hu Jie; Jiang Shuo; Luo Jianxi; Magee Christopher L.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 19/02/2022

In large technology companies, the requirements for managing and organizing technical documents created by engineers and managers have increased dramatically in recent years, leading to a higher demand for more scalable, accurate, and automated document classification. Prior studies have focused only on processing text for classification, whereas technical documents often contain multimodal information. To leverage multimodal information and improve classification performance, this paper presents a novel multimodal deep learning architecture, TechDoc, which utilizes three types of information: natural language texts within documents, descriptive images within documents, and the associations among the documents. The architecture combines a convolutional neural network, a recurrent neural network, and a graph neural network through an integrated training process. We applied the architecture to a large multimodal technical document database and trained the model to classify documents based on the hierarchical International Patent Classification system. Our results show that TechDoc achieves higher classification accuracy than the unimodal methods and other state-of-the-art benchmarks. The trained model can potentially be scaled to millions of real-world multimodal technical documents, which is useful for data and knowledge management in large technology companies and organizations.
Comment: 16 pages, 8 figures, 9 tables

arXiv.org e-Print Archive