949 research outputs found

    An IDL-based analysis package for COBE and other skycube-formatted astronomical data

    Get PDF
    UIMAGE is a data analysis package written in IDL for the Cosmic Background Explorer (COBE) project. COBE has extraordinarily stringent accuracy requirements: 1 percent mid-infrared absolute photometry, 0.01 percent submillimeter absolute spectrometry, and 0.0001 percent submillimeter relative photometry. Thus, many of the transformations and image enhancements common to analysis of large data sets must be done with special care. UIMAGE is unusual in this sense in that it performs as many of its operations as possible on the data in its native format and projection, which in the case of COBE is the quadrilateralized sphereical cube ('skycube'). That is, after reprojecting the data, e.g., onto an Aitoff map, the user who performs an operation such as taking a crosscut or extracting data from a pixel is transparently acting upon the skycube data from which the projection was made, thereby preserving the accuracy of the result. Current plans call for formatting external data bases such as CO maps into the skycube format with a high-accuracy transformation, thereby allowing Guest Investigators to use UIMAGE for direct comparison of the COBE maps with those at other wavelengths from other instruments. It is completely menu-driven so that its use requires no knowledge of IDL. Its functionality includes I/O from the COBE archives, FITS files, and IDL save sets as well as standard analysis operations such as smoothing, reprojection, zooming, statistics of areas, spectral analysis, etc. One of UIMAGE's more advanced and attractive features is its terminal independence. Most of the operations (e.g., menu-item selection or pixel selection) that are driven by the mouse on an X-windows terminal are also available using arrow keys and keyboard entry (e.g., pixel coordinates) on VT200 and Tektronix-class terminals. Even limited grey scales of images are available this way. Obviously, image processing is very limited on this type of terminal, but it is nonetheless surprising how much analysis can be done on that medium. Such flexibility has the virtue of expanding the user community to those who must work remotely on non-image terminals, e.g., via modem

    Language Model Combination and Adaptation Using Weighted Finite State Transducers

    Get PDF
    In speech recognition systems language model (LMs) are often constructed by training and combining multiple n-gram models. They can be either used to represent different genres or tasks found in diverse text sources, or capture stochastic properties of different linguistic symbol sequences, for example, syllables and words. Unsupervised LM adaption may also be used to further improve robustness to varying styles or tasks. When using these techniques, extensive software changes are often required. In this paper an alternative and more general approach based on weighted finite state transducers (WFSTs) is investigated for LM combination and adaptation. As it is entirely based on well-defined WFST operations, minimum change to decoding tools is needed. A wide range of LM combination configurations can be flexibly supported. An efficient on-the-fly WFST decoding algorithm is also proposed. Significant error rate gains of 7.3% relative were obtained on a state-of-the-art broadcast audio recognition task using a history dependently adapted multi-level LM modelling both syllable and word sequence

    Adapting an ASR Foundation Model for Spoken Language Assessment

    Full text link
    A crucial part of an accurate and reliable spoken language assessment system is the underlying ASR model. Recently, large-scale pre-trained ASR foundation models such as Whisper have been made available. As the output of these models is designed to be human readable, punctuation is added, numbers are presented in Arabic numeric form and abbreviations are included. Additionally, these models have a tendency to skip disfluencies and hesitations in the output. Though useful for readability, these attributes are not helpful for assessing the ability of a candidate and providing feedback. Here a precise transcription of what a candidate said is needed. In this paper, we give a detailed analysis of Whisper outputs and propose two solutions: fine-tuning and soft prompt tuning. Experiments are conducted on both public speech corpora and an English learner dataset. Results show that we can effectively alter the decoding behaviour of Whisper to generate the exact words spoken in the response.Comment: Proceedings of SLaT

    Adapting an Unadaptable ASR System

    Full text link
    As speech recognition model sizes and training data requirements grow, it is increasingly common for systems to only be available via APIs from online service providers rather than having direct access to models themselves. In this scenario it is challenging to adapt systems to a specific target domain. To address this problem we consider the recently released OpenAI Whisper ASR as an example of a large-scale ASR system to assess adaptation methods. An error correction based approach is adopted, as this does not require access to the model, but can be trained from either 1-best or N-best outputs that are normally available via the ASR API. LibriSpeech is used as the primary target domain for adaptation. The generalization ability of the system in two distinct dimensions are then evaluated. First, whether the form of correction model is portable to other speech recognition domains, and secondly whether it can be used for ASR models having a different architecture.Comment: submitted to INTERSPEEC

    N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space

    Full text link
    Error correction models form an important part of Automatic Speech Recognition (ASR) post-processing to improve the readability and quality of transcriptions. Most prior works use the 1-best ASR hypothesis as input and therefore can only perform correction by leveraging the context within one sentence. In this work, we propose a novel N-best T5 model for this task, which is fine-tuned from a T5 model and utilizes ASR N-best lists as model input. By transferring knowledge from the pre-trained language model and obtaining richer information from the ASR decoding space, the proposed approach outperforms a strong Conformer-Transducer baseline. Another issue with standard error correction is that the generation process is not well-guided. To address this a constrained decoding process, either based on the N-best list or an ASR lattice, is used which allows additional information to be propagated.Comment: submitted to INTERSPEEC

    Blind Normalization of Speech From Different Channels

    Full text link
    We show how to construct a channel-independent representation of speech that has propagated through a noisy reverberant channel. This is done by blindly rescaling the cepstral time series by a non-linear function, with the form of this scale function being determined by previously encountered cepstra from that channel. The rescaled form of the time series is an invariant property of it in the following sense: it is unaffected if the time series is transformed by any time-independent invertible distortion. Because a linear channel with stationary noise and impulse response transforms cepstra in this way, the new technique can be used to remove the channel dependence of a cepstral time series. In experiments, the method achieved greater channel-independence than cepstral mean normalization, and it was comparable to the combination of cepstral mean normalization and spectral subtraction, despite the fact that no measurements of channel noise or reverberations were required (unlike spectral subtraction).Comment: 25 pages, 7 figure
    • …
    corecore