
    Plasmonic photoconductive terahertz focal-plane array with pixel super-resolution

    Imaging systems operating in the terahertz part of the electromagnetic spectrum are in great demand because terahertz waves can penetrate many optically opaque materials and provide unique spectral signatures of various chemicals. However, the use of terahertz imagers in real-world applications has been limited by the slow speed, large size, high cost, and complexity of existing imaging systems. These limitations stem mainly from the lack of terahertz focal-plane arrays (THz-FPAs) that can directly provide frequency-resolved and/or time-resolved spatial information about the imaged objects. Here, we report the first THz-FPA that directly provides the spatial amplitude and phase distributions of an imaged object, along with its ultrafast temporal and spectral information. It consists of a two-dimensional array of ~0.3 million plasmonic photoconductive nanoantennas optimized to rapidly detect broadband terahertz radiation with a high signal-to-noise ratio. As a first proof of concept, we utilized the multispectral nature of the amplitude and phase data captured by these plasmonic nanoantennas to realize pixel super-resolution imaging of objects. We successfully imaged and super-resolved etched patterns in a silicon substrate, reconstructing both the shape and the depth of these structures with an effective pixel count exceeding 1 kilopixel. By eliminating the need for raster scanning and spatial terahertz modulation, our THz-FPA offers more than a 1000-fold increase in imaging speed over the state of the art. Beyond this proof-of-concept super-resolution demonstration, the unique capabilities enabled by our plasmonic photoconductive THz-FPA offer transformative advances in a broad range of applications that use hyperspectral and three-dimensional terahertz images of objects. Comment: 62 pages
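
    The pixel super-resolution step above fuses many diversified low-resolution measurements into one finer image. Below is a minimal numpy sketch of that general idea, using hypothetical sub-pixel shifts as the source of diversity in place of the paper's multispectral amplitude/phase data; it illustrates generic multi-frame super-resolution, not the authors' reconstruction algorithm.

        import numpy as np

        # Minimal sketch of multi-frame pixel super-resolution: recover a
        # high-resolution object from several low-resolution views, each taken
        # under a different (hypothetical) sub-pixel shift, standing in for the
        # per-frequency diversity of the THz-FPA data.
        HR, FACTOR, N_FRAMES = 32, 4, 8
        rng = np.random.default_rng(0)

        def shift(img, dy, dx):
            # Integer circular shift as a toy stand-in for measurement diversity.
            return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

        def downsample(img, f):
            # Average-pool by factor f, modelling the coarse detector pixels.
            h, w = img.shape
            return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

        # Forward model: each frame is a shifted, downsampled view of the object.
        obj = rng.random((HR, HR))
        shifts = [(rng.integers(FACTOR), rng.integers(FACTOR)) for _ in range(N_FRAMES)]
        frames = [downsample(shift(obj, dy, dx), FACTOR) for dy, dx in shifts]

        # Reconstruct by gradient descent on the least-squares data fidelity.
        x = np.zeros((HR, HR))
        for _ in range(300):
            grad = np.zeros_like(x)
            for (dy, dx), y in zip(shifts, frames):
                resid = downsample(shift(x, dy, dx), FACTOR) - y
                up = np.kron(resid, np.ones((FACTOR, FACTOR))) / FACTOR**2  # adjoint
                grad += shift(up, -dy, -dx)
            x -= 2.0 * grad  # step size tuned by hand for this toy problem

        print("relative error:", np.linalg.norm(x - obj) / np.linalg.norm(obj))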

    Learning Compact Compositional Embeddings via Regularized Pruning for Recommendation

    Latent factor models are the dominant backbone of contemporary recommender systems (RSs) owing to their performance advantages: a unique vector embedding with a fixed dimensionality (e.g., 128) is required to represent each entity (commonly a user or an item). Given the large number of users and items on e-commerce sites, the embedding table is arguably the least memory-efficient component of RSs. For any lightweight recommender that aims to scale efficiently with the growing number of users/items or to remain applicable in resource-constrained settings, existing solutions either reduce the number of embeddings needed via hashing, or sparsify the full embedding table to switch off selected embedding dimensions. However, as hash collisions arise or embeddings become overly sparse, especially when adapting to a tighter memory budget, those lightweight recommenders inevitably compromise their accuracy. To this end, we propose a novel compact embedding framework for RSs, namely Compositional Embedding with Regularized Pruning (CERP). Specifically, CERP represents each entity by combining a pair of embeddings from two independent, substantially smaller meta-embedding tables, which are then jointly pruned via a learnable element-wise threshold. In addition, we design a regularized pruning mechanism in CERP such that the two sparsified meta-embedding tables are encouraged to encode mutually complementary information. Given its compatibility with arbitrary latent factor models, we pair CERP with two popular recommendation models for extensive experiments, where results on two real-world datasets under different memory budgets demonstrate its superiority over state-of-the-art baselines. The codebase of CERP is available at https://github.com/xurong-liang/CERP. Comment: Accepted by ICDM'23
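
    For concreteness, here is a small PyTorch sketch of the compositional-embedding-with-pruning idea described above: each entity id is mapped to a pair of rows in two much smaller meta-embedding tables, each table is sparsified by a learnable element-wise threshold, and the pair is composed into the final embedding. The quotient/remainder indexing, the sum composition, and the soft-thresholding form are assumptions made for illustration, not CERP's exact design.

        import torch
        import torch.nn as nn

        class CompositionalEmbedding(nn.Module):
            # Two small meta-embedding tables composed per entity and sparsified
            # by learnable element-wise thresholds (illustrative design choices).
            def __init__(self, num_entities, bucket_size, dim):
                super().__init__()
                assert num_entities <= bucket_size ** 2
                self.bucket_size = bucket_size
                self.table_q = nn.Embedding(bucket_size, dim)  # "quotient" table
                self.table_r = nn.Embedding(bucket_size, dim)  # "remainder" table
                self.thr_q = nn.Parameter(torch.zeros(bucket_size, dim))
                self.thr_r = nn.Parameter(torch.zeros(bucket_size, dim))

            @staticmethod
            def _prune(weight, thr):
                # Soft thresholding: magnitudes below the learned (sigmoid-mapped)
                # threshold are pushed to exactly zero.
                return torch.sign(weight) * torch.relu(weight.abs() - torch.sigmoid(thr))

            def forward(self, entity_ids):
                q = entity_ids // self.bucket_size
                r = entity_ids % self.bucket_size
                wq = self._prune(self.table_q.weight, self.thr_q)[q]
                wr = self._prune(self.table_r.weight, self.thr_r)[r]
                return wq + wr  # compose the pair into the entity embedding

        emb = CompositionalEmbedding(num_entities=1_000_000, bucket_size=1000, dim=128)
        print(emb(torch.tensor([0, 42, 999_999])).shape)  # torch.Size([3, 128])

    With two tables of 1000 rows each, one million entities are covered by 2000 embedding rows before pruning even begins, which is where the memory saving comes from.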

    Near-threshold photoproduction of J/ψ in the two-gluon exchange model

    The near-threshold photoproduction of J/ψ is regarded as a golden process for unveiling the nucleon mass structure, pentaquark states involving charm quarks, and the poorly constrained gluon distribution of the nucleon at large x (> 0.1). In this paper, we present an analysis of the current experimental data under a two-gluon exchange model, which shows good consistency with the data. Using a parameterized functional form with three free parameters, we determine the gluon distribution of the nucleon at the J/ψ mass scale. Moreover, as a practical application of the model and the obtained gluon distribution, we predict the differential cross section of J/ψ electroproduction at EicC as a function of the invariant mass W of the final hadrons. According to our estimate, thousands of near-threshold J/ψ events could be detected per year at EicC; we therefore suggest that the relevant experimental measurements be carried out there. Comment: 6 pages, 7 figures
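
    For orientation, a commonly used three-parameter shape for the gluon density at this scale is shown below; the specific functional form is an illustrative assumption, not quoted from the paper, which only states that three free parameters are fitted.

        % Illustrative three-parameter ansatz for the gluon density at the
        % J/psi mass scale; A, B and C are the free parameters of the fit.
        \[
          x\,g(x,\mu^2) = A\,x^{B}\,(1-x)^{C},
          \qquad \mu^2 \simeq m_{J/\psi}^2 .
        \]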

    Rapid Sensing of Hidden Objects and Defects using a Single-Pixel Diffractive Terahertz Processor

    Terahertz waves offer numerous advantages for the nondestructive detection of hidden objects and defects in materials, as they can penetrate most optically opaque materials. However, existing terahertz inspection systems are restricted in their throughput and accuracy (especially for detecting small features) due to their limited speed and resolution. Furthermore, machine-vision-based continuous sensing systems that use large-pixel-count imaging are generally bottlenecked by their digital storage, data transmission, and image processing requirements. Here, we report a diffractive processor that rapidly detects hidden defects/objects within a target sample using a single-pixel spectroscopic terahertz detector, without scanning the sample or forming/processing its image. This terahertz processor consists of passive diffractive layers that are optimized using deep learning to modify the spectrum of the terahertz radiation according to the absence/presence of hidden structures or defects. After its fabrication, the resulting diffractive processor all-optically probes the structural information of the sample volume and outputs a spectrum that directly indicates the presence or absence of hidden structures not visible from outside. As a proof of concept, we trained a diffractive terahertz processor to sense hidden defects (including subwavelength features) inside test samples, and evaluated its performance by analyzing the detection sensitivity as a function of the size and position of the unknown defects. We validated its feasibility using a single-pixel terahertz time-domain spectroscopy setup and 3D-printed diffractive layers, successfully detecting hidden defects under pulsed terahertz illumination. This technique will be valuable for various applications, e.g., security screening, biomedical sensing, quality control, anti-counterfeiting measures, and cultural heritage protection. Comment: 23 pages, 5 figures
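
    Once the diffractive layers are fabricated, the electronic readout can be extremely simple: a decision made on the single-pixel spectrum alone. The sketch below shows one plausible decision rule; the probe bands and threshold are hypothetical, since the actual spectral signatures are shaped by the deep-learning-optimized layers and the paper's decision criterion may differ.

        import numpy as np

        # Illustrative single-pixel readout: the optimized diffractive layers
        # reshape the transmitted spectrum so that "defect" and "no defect"
        # separate. The probe bands and threshold below are hypothetical.
        FREQS = np.linspace(0.1, 2.0, 256)         # THz frequency axis
        BAND_SIG = (FREQS > 0.4) & (FREQS < 0.6)   # assumed signature band
        BAND_REF = (FREQS > 1.0) & (FREQS < 1.2)   # assumed reference band

        def has_hidden_defect(spectrum, ratio_thresh=1.5):
            # Flag a sample when the signature band carries excess power.
            p_sig = spectrum[BAND_SIG].mean()
            p_ref = spectrum[BAND_REF].mean()
            return p_sig / (p_ref + 1e-12) > ratio_thresh

        rng = np.random.default_rng(1)
        clean = 1.0 + 0.05 * rng.random(FREQS.size)   # toy flat spectrum
        defective = clean.copy()
        defective[BAND_SIG] += 1.0                    # toy defect signature
        print(has_hidden_defect(clean), has_hidden_defect(defective))  # False True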

    Spectrally-Encoded Single-Pixel Machine Vision Using Diffractive Networks

    3D engineering of matter has opened up new avenues for designing systems that perform computational tasks through light-matter interaction. Here, we demonstrate the design of optical networks in the form of multiple diffractive layers that are trained using deep learning to transform and encode the spatial information of objects into the power spectrum of the diffracted light, which is then used to perform optical classification of objects with a single-pixel spectroscopic detector. Using a time-domain spectroscopy setup with a plasmonic nanoantenna-based detector, we experimentally validated this machine vision framework in the terahertz part of the spectrum to optically classify images of handwritten digits by detecting the spectral power of the diffracted light at ten distinct wavelengths, each representing one class/digit. We also report the coupling of this spectral encoding, achieved through a diffractive optical network, with a shallow electronic neural network, separately trained to reconstruct the images of handwritten digits based solely on the spectral information encoded in these ten distinct wavelengths within the diffracted light. These reconstructed images demonstrate task-specific image decompression and can also be cycled back as new inputs to the same diffractive network to improve its optical object classification. This unique machine vision framework merges the power of deep learning with the spatial and spectral processing capabilities of diffractive networks, and can be extended to other spectral-domain measurement systems to enable new 3D imaging and sensing modalities integrated with spectrally encoded classification tasks performed through diffractive optical networks. Comment: 21 pages, 5 figures, 1 table
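
    Since each of the ten wavelengths stands for one digit class, the electronic side of the classification reduces to picking the strongest of ten single-pixel spectral power readings. A minimal sketch follows; the wavelength assignment is hypothetical.

        import numpy as np

        # Ten assumed spectral channels, one per handwritten-digit class.
        CLASS_WAVELENGTHS_MM = np.linspace(0.6, 1.5, 10)

        def classify(spectral_power):
            # Predicted digit = index of the strongest of the ten channels.
            assert spectral_power.shape == (10,)
            return int(np.argmax(spectral_power))

        reading = np.full(10, 0.1)
        reading[7] = 0.9              # channel 7 dominates after diffraction
        print(classify(reading))      # -> 7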

    Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

    Automatic recognition of disordered speech remains highly challenging to date. The underlying neuro-motor conditions, often compounded with co-occurring physical disabilities, make it difficult to collect the large quantities of impaired speech required for ASR system development. This paper presents novel variational auto-encoder generative adversarial network (VAE-GAN) based personalized disordered speech augmentation approaches that simultaneously learn to encode, generate, and discriminate synthesized impaired speech. Separate latent features are derived to learn dysarthric speech characteristics and phoneme context representations. Self-supervised pre-trained Wav2vec 2.0 embedding features are also incorporated. Experiments conducted on the UASpeech corpus suggest that the proposed adversarial data augmentation approach consistently outperformed the baseline speed-perturbation and non-VAE GAN augmentation methods when training hybrid TDNN and end-to-end Conformer systems. After LHUC speaker adaptation, the best system using VAE-GAN based augmentation produced an overall WER of 27.78% on the UASpeech test set of 16 dysarthric speakers, and the lowest published WER of 57.31% on the subset of speakers with "Very Low" intelligibility. Comment: Submitted to ICASSP 2023
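
    Architecturally, the augmenter pairs a variational auto-encoder with an adversarial critic. The PyTorch skeleton below reflects that description, with one latent split for dysarthric speaker characteristics and one for phoneme context; all layer sizes and the feature dimensionality are illustrative placeholders rather than the paper's configuration.

        import torch
        import torch.nn as nn

        FEAT, Z_SPK, Z_PHN = 80, 16, 16   # e.g. 80-dim filterbank frames

        class Encoder(nn.Module):
            # Encodes a frame into two latents: speaker characteristics and
            # phoneme context (sizes are illustrative placeholders).
            def __init__(self):
                super().__init__()
                self.body = nn.Sequential(nn.Linear(FEAT, 128), nn.ReLU())
                self.mu = nn.Linear(128, Z_SPK + Z_PHN)
                self.logvar = nn.Linear(128, Z_SPK + Z_PHN)

            def forward(self, x):
                h = self.body(x)
                mu, logvar = self.mu(h), self.logvar(h)
                z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparam.
                return z[:, :Z_SPK], z[:, Z_SPK:], mu, logvar

        decoder = nn.Sequential(nn.Linear(Z_SPK + Z_PHN, 128), nn.ReLU(),
                                nn.Linear(128, FEAT))        # generator
        discriminator = nn.Sequential(nn.Linear(FEAT, 128), nn.LeakyReLU(0.2),
                                      nn.Linear(128, 1))     # real/fake logit

        enc, x = Encoder(), torch.randn(4, FEAT)             # toy batch
        z_spk, z_phn, mu, logvar = enc(x)
        x_hat = decoder(torch.cat([z_spk, z_phn], dim=1))
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        recon = nn.functional.mse_loss(x_hat, x)
        adv = nn.functional.binary_cross_entropy_with_logits(
            discriminator(x_hat), torch.ones(4, 1))          # fool the critic
        print(float(recon + kl + adv))                       # generator-side loss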

    Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

    Speaker adaptation techniques provide a powerful solution for customising automatic speech recognition (ASR) systems for individual users. Practical application of unsupervised model-based speaker adaptation techniques to data-intensive end-to-end ASR systems is hindered by the scarcity of speaker-level data and performance sensitivity to transcription errors. To address these issues, a set of compact and data-efficient speaker-dependent (SD) parameter representations is used to facilitate both speaker adaptive training and test-time unsupervised speaker adaptation of state-of-the-art Conformer ASR systems. The sensitivity to supervision quality is reduced using a confidence score-based selection of the less erroneous subset of speaker-level adaptation data. Two lightweight confidence score estimation modules are proposed to produce more reliable confidence scores. The data sparsity issue, which is exacerbated by data selection, is addressed by modelling the SD parameter uncertainty using Bayesian learning. Experiments on the benchmark 300-hour Switchboard and 233-hour AMI datasets suggest that the proposed confidence score-based adaptation schemes consistently outperformed the baseline speaker-independent (SI) Conformer model and conventional non-Bayesian, point-estimate-based adaptation without speaker data selection. Similar consistent performance improvements were retained after external Transformer and LSTM language model rescoring. In particular, on the 300-hour Switchboard corpus, statistically significant WER reductions of 1.0%, 1.3%, and 1.4% absolute (9.5%, 10.9%, and 11.3% relative) were obtained over the baseline SI Conformer on the NIST Hub5'00, RT02, and RT03 evaluation sets respectively. Similar WER reductions of 2.7% and 3.3% absolute (8.9% and 10.2% relative) were also obtained on the AMI development and evaluation sets. Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processing
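
    The data-selection step itself is easy to picture: rank a speaker's recognised utterances by confidence and adapt only on those above a threshold, so that transcription errors in low-confidence hypotheses do not corrupt the SD parameters. A minimal sketch follows; the field names and threshold are illustrative placeholders.

        # Field names and threshold are illustrative placeholders.
        def select_adaptation_data(utterances, conf_thresh=0.8):
            # Keep only hypotheses confident enough to serve as pseudo-labels
            # for unsupervised speaker adaptation.
            return [u for u in utterances if u["confidence"] >= conf_thresh]

        speaker_utts = [
            {"id": "utt1", "confidence": 0.93, "hyp": "hello there"},
            {"id": "utt2", "confidence": 0.41, "hyp": "uh hm"},   # likely erroneous
            {"id": "utt3", "confidence": 0.86, "hyp": "good morning"},
        ]
        print([u["id"] for u in select_adaptation_data(speaker_utts)])  # ['utt1', 'utt3']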