
    Optical parametric generation in periodically poled KTiOPO4 via extended phase matching

    We report an experimental demonstration of optical parametric generation in a periodically poled KTiOPO4 crystal based on the principle of mirrorless optical parametric oscillation. A femtosecond pump pulse centered at 792 nm from a Ti:sapphire amplifier is prechirped to minimize Kerr effects. The pump pulse is then injected into the nonlinear crystal and downconverted into signal and idler pulses, both centered near 1584 nm, via amplified spontaneous parametric downconversion in a copropagating type-II quasi-phase-matching configuration. The maximum internal downconversion efficiency is 43%, the highest reported to date for optical parametric generators based on KTiOPO4 crystals. Such a device may find applications in optical signal processing and biological imaging.
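    The wavelengths quoted above follow from photon-energy conservation, 1/λp = 1/λs + 1/λi: a 792 nm pump split into two equal-energy photons gives signal and idler near 2 × 792 = 1584 nm. The sketch below computes the idler wavelength from that relation and, for a copropagating first-order quasi-phase-matched process, the poling period Λ from n_p/λ_p − n_s/λ_s − n_i/λ_i = 1/Λ. The function names are hypothetical, and the refractive indices are inputs that must come from a KTiOPO4 Sellmeier model (not included here); for type-II phase matching the signal and idler indices belong to orthogonal polarizations. This illustrates the standard relations only, not the authors' design calculation.

```python
import math


def idler_wavelength_nm(pump_nm: float, signal_nm: float) -> float:
    """Energy conservation: 1/lambda_p = 1/lambda_s + 1/lambda_i."""
    inverse_idler = 1.0 / pump_nm - 1.0 / signal_nm
    if inverse_idler <= 0:
        raise ValueError("signal wavelength must be longer than the pump wavelength")
    return 1.0 / inverse_idler


def qpm_period_um(n_p: float, lam_p_um: float,
                  n_s: float, lam_s_um: float,
                  n_i: float, lam_i_um: float) -> float:
    """First-order QPM period for copropagating waves.

    Phase matching requires k_p - k_s - k_i = 2*pi/Lambda, i.e.
    n_p/lam_p - n_s/lam_s - n_i/lam_i = 1/Lambda (all lengths in micrometres).
    The indices are caller-supplied (hypothetically from a Sellmeier model).
    """
    mismatch = n_p / lam_p_um - n_s / lam_s_um - n_i / lam_i_um  # units: 1/um
    if math.isclose(mismatch, 0.0, abs_tol=1e-15):
        raise ValueError("birefringently phase matched already; no grating needed")
    return 1.0 / abs(mismatch)  # the grating period is a magnitude


# Degenerate case from the abstract: a 792 nm pump gives a 1584 nm idler
# when the signal is also at 1584 nm.
print(idler_wavelength_nm(792.0, 1584.0))
```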

    Multi-stage Large Language Model Correction for Speech Recognition

    In this paper, we investigate the use of large language models (LLMs) to improve the performance of competitive speech recognition systems. Unlike traditional language models, which focus on a single data domain, LLMs offer an opportunity to push the limits of state-of-the-art ASR performance while achieving higher robustness and generalizing effectively across multiple domains. Motivated by this, we propose a novel multi-stage approach that combines traditional language model rescoring with LLM prompting. Specifically, the proposed method has two stages: the first stage uses a language model to rescore an N-best list of ASR hypotheses and runs a confidence check; the second stage prompts an LLM to perform ASR error correction on the less confident results from the first stage. Our experimental results demonstrate the effectiveness of the proposed method, showing a 10% to 20% relative improvement in word error rate (WER) over a competitive ASR system across multiple test domains.
    Comment: Submitted to ICASSP 202
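    The two-stage control flow described above can be sketched as follows. The abstract does not specify the rescoring weights, the confidence measure, or the prompt, so this sketch assumes a log-linear combination of ASR and LM scores, uses the softmax posterior of the top hypothesis as the confidence, and treats the threshold (0.7), the prompt wording, and the `lm_score` and `llm` callables as hypothetical placeholders rather than the paper's components.

```python
import math
from typing import Callable, List, Tuple

# (hypothesis text, ASR log-score); higher scores are better.
Hypothesis = Tuple[str, float]


def rescore_nbest(nbest: List[Hypothesis],
                  lm_score: Callable[[str], float],
                  lm_weight: float = 0.5) -> List[Hypothesis]:
    """Stage 1a: add a weighted LM log-score to each ASR score and re-rank."""
    rescored = [(text, asr + lm_weight * lm_score(text)) for text, asr in nbest]
    return sorted(rescored, key=lambda h: h[1], reverse=True)


def top1_confidence(rescored: List[Hypothesis]) -> float:
    """Stage 1b: softmax posterior of the best hypothesis over the N-best list."""
    best = rescored[0][1]  # the list is sorted, so this is the maximum score
    return 1.0 / sum(math.exp(score - best) for _, score in rescored)


def build_prompt(rescored: List[Hypothesis]) -> str:
    """Hypothetical prompt listing the re-ranked hypotheses for the LLM."""
    options = "\n".join(f"{i + 1}. {text}" for i, (text, _) in enumerate(rescored))
    return ("These are candidate transcriptions of one utterance, best first. "
            "Reply with the corrected transcription only.\n" + options)


def two_stage_correct(nbest: List[Hypothesis],
                      lm_score: Callable[[str], float],
                      llm: Callable[[str], str],
                      threshold: float = 0.7,
                      lm_weight: float = 0.5) -> Tuple[str, str]:
    """Return (transcript, stage); only low-confidence utterances reach the LLM."""
    rescored = rescore_nbest(nbest, lm_score, lm_weight)
    if top1_confidence(rescored) >= threshold:
        return rescored[0][0], "stage1"
    return llm(build_prompt(rescored)).strip(), "stage2"
```

    Gating on confidence means the comparatively expensive LLM call is made only when the rescored N-best list is not already decisive; in this sketch, the threshold controls that trade-off.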

    Audio-visual object localization and separation using low-rank and sparsity

    The ability to localize visual objects associated with an audio source while simultaneously separating the audio signal is a cornerstone of several audio-visual signal processing applications. Past efforts usually focused on localizing only the visual objects, without audio separation capabilities. Moreover, they often rely on computationally expensive pre-processing steps to segment image pixels into object regions before applying localization. We aim to address audio-visual source localization and separation in an unsupervised manner. The proposed approach uses low-rank modeling to represent the background visual and audio information and sparsity to extract the sparsely correlated components between the audio and visual modalities. In particular, the model decomposes each dataset into a sum of two terms: low-rank matrices capturing the uncorrelated background information, and sparse correlated components modeling the sound source in the visual modality and the associated sound in the audio modality. To this end, a novel optimization problem involving the minimization of nuclear norms and matrix ℓ1-norms is solved. We evaluated the proposed method on two tasks: (1) visual localization and audio separation, and (2) visually assisted audio denoising. The experimental results demonstrate the effectiveness of the proposed method.
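    The abstract does not give the coupled audio-visual objective, so the sketch below illustrates only the generic building block it names: splitting a data matrix D into a low-rank part L and a sparse part S by minimizing ‖L‖* + λ‖S‖₁ subject to L + S = D (robust PCA, also called principal component pursuit), solved with an augmented-Lagrangian (ADMM) loop. The nuclear norm is handled by singular value thresholding and the ℓ1-norm by elementwise soft thresholding. This single-matrix problem is a simpler stand-in, not the authors' formulation; the defaults λ = 1/√max(m, n) and μ = mn/(4‖D‖₁) are common choices from the robust PCA literature, and the function names are hypothetical.

```python
import numpy as np


def soft_threshold(x: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * l1-norm: shrink every entry toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def singular_value_threshold(x: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * nuclear norm: shrink the singular values."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def robust_pca(d, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Solve min ||L||_* + lam * ||S||_1 s.t. L + S = D with ADMM (fixed mu)."""
    d = np.asarray(d, dtype=float)
    m, n = d.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = m * n / (4.0 * np.abs(d).sum()) if mu is None else mu
    low_rank = np.zeros_like(d)
    sparse = np.zeros_like(d)
    dual = np.zeros_like(d)  # Lagrange multiplier for the constraint L + S = D
    norm_d = np.linalg.norm(d)
    for _ in range(max_iter):
        # Minimize over L with S fixed, then over S with the new L fixed.
        low_rank = singular_value_threshold(d - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(d - low_rank + dual / mu, lam / mu)
        residual = d - low_rank - sparse
        dual += mu * residual  # dual ascent on the equality constraint
        if np.linalg.norm(residual) <= tol * norm_d:
            break
    return low_rank, sparse
```

    As a hypothetical usage, each column of D could hold one vectorized video frame, so a static background lands in L and localized changes land in S; how the paper couples such a decomposition with the audio matrix is not described in the abstract.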