
    Advances in unlimited-vocabulary speech recognition for morphologically rich languages

    Automatic speech recognition systems are devices or computer programs that convert human speech into text or take actions based on what is said to the system. Typical applications include, for example, dictation, automatic transcription of large audio or video databases, speech-controlled user interfaces, and automated telephone services. If the recognition system is not limited to a certain topic and vocabulary, covering the words of the target language as well as possible while maintaining high recognition accuracy becomes an issue. The conventional way to model the target language, especially in English recognition systems, is to limit recognition to the most common words of the language. A vocabulary of 60 000 words is usually enough to cover the language adequately for arbitrary topics. In morphologically rich languages such as Finnish, Estonian and Turkish, however, long words can be formed by inflection and compounding, which makes it difficult to cover the language adequately with vocabulary-based approaches. This thesis deals with methods for building efficient speech recognition systems for morphologically rich languages. Before the statistical n-gram language models are trained on a large text corpus, the words in the corpus are automatically segmented into smaller fragments, referred to as morphs. The morphs are then used as the modelling units of the n-gram models instead of whole words. This makes it possible to train the model on the whole text corpus without limiting the vocabulary, and it enables the model to produce words never seen in training by joining morphs together. Since the segmentation algorithm is unsupervised and data-driven, it can readily be applied to many languages. Speech recognition experiments are conducted on various Finnish recognition tasks, and some of the experiments are also repeated on an Estonian task. It is shown that morph-based language models reduce recognition errors compared with word-based models.
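    The idea of modelling morphs instead of whole words can be illustrated with a toy bigram model. This is a minimal sketch under stated assumptions: the segmentations below are hand-written and hypothetical (the thesis obtains morphs with an unsupervised segmentation algorithm), the boundary token `<w>` and all function names are illustrative, and add-one smoothing is used only for brevity, not because the thesis uses it. The point it shows is that a word never seen whole in training ("autossa", "in the car") still receives nonzero probability because its morphs are known.

```python
from collections import Counter

BOUNDARY = "<w>"  # hypothetical word-boundary token

def to_morph_stream(segmented_words):
    """Flatten morph-segmented words into one token stream,
    inserting a boundary token before the first word and after each word."""
    stream = [BOUNDARY]
    for morphs in segmented_words:
        stream.extend(morphs)
        stream.append(BOUNDARY)
    return stream

def train_bigram(stream):
    """Count histories and morph bigrams; return them with the morph vocabulary."""
    unigrams = Counter(stream[:-1])                 # tokens that act as histories
    bigrams = Counter(zip(stream[:-1], stream[1:]))
    vocab = set(stream)
    return unigrams, bigrams, vocab

def prob(model, prev, cur):
    """Add-one smoothed P(cur | prev); the smoothing choice is illustrative only."""
    unigrams, bigrams, vocab = model
    return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab))

def word_prob(model, morphs):
    """Probability of one word, scored as a boundary-delimited morph sequence."""
    tokens = [BOUNDARY] + list(morphs) + [BOUNDARY]
    p = 1.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        p *= prob(model, prev, cur)
    return p

# Hypothetical segmented training words: talo+ssa, auto, talo, auto+lla
training_words = [["talo", "ssa"], ["auto"], ["talo"], ["auto", "lla"]]
model = train_bigram(to_morph_stream(training_words))
word_vocab = {"".join(m) for m in training_words}

print("autossa" in word_vocab)            # False: never seen as a whole word
print(word_prob(model, ["auto", "ssa"]))  # about 0.0159 (= 1/63 for this toy corpus)
```

A word-based model with the vocabulary `word_vocab` would treat "autossa" as out-of-vocabulary, whereas the morph model scores it from the known units `auto` and `ssa`.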
It appears to be important, however, that the n-gram models are allowed to use long morph contexts, especially when the morphs used by the model are short. This can be achieved by training variable-length n-gram models with growing and pruning algorithms. The thesis also presents data structures for representing variable-length n-gram models efficiently in recognition systems. An analysis of the recognition errors made by Finnish recognition systems shows that speaker adaptive training and discriminative training methods help to reduce errors in different situations. The errors are also analysed according to word frequencies and manually defined error classes.
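    A variable-length n-gram model stores n-grams of different orders side by side and, at lookup time, uses the longest stored context. The sketch below assumes a plain ARPA-style backoff representation held in a Python dictionary; it is not the compact data structure presented in the thesis, and the toy model entries and function name are hypothetical.

```python
# Hypothetical toy model in ARPA-style backoff form:
# n-gram tuple -> (log10 probability, log10 backoff weight).
# N-grams of different orders coexist, as in a variable-length model.
MODEL = {
    ("<w>",): (-0.5, -0.2),
    ("talo",): (-0.8, -0.3),
    ("ssa",): (-1.0, -0.1),
    ("<w>", "talo"): (-0.4, -0.25),
    ("<w>", "talo", "ssa"): (-0.2, 0.0),
}

def log10_prob(model, history, word):
    """Backoff lookup: use the longest stored n-gram ending in `word`,
    adding the backoff weights of the histories that had to be shortened."""
    history = tuple(history)
    penalty = 0.0
    while True:
        ngram = history + (word,)
        if ngram in model:
            return penalty + model[ngram][0]
        if not history:
            raise KeyError(f"unknown token: {word}")
        # A history missing from the model has backoff weight 1 (log10 = 0).
        penalty += model.get(history, (0.0, 0.0))[1]
        history = history[1:]  # drop the oldest token of the context

print(log10_prob(MODEL, ["<w>", "talo"], "ssa"))  # full 3-gram found: -0.2
print(log10_prob(MODEL, ["talo"], "ssa"))         # backs off to the unigram "ssa"
```

Storing only the n-grams that survive growing and pruning is what lets such a model keep long morph contexts where they help without storing every possible long context.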

    Exploration of a Polarized Surface Bidirectional Reflectance Model Using the Ground-Based Multiangle Spectropolarimetric Imager

    Accurate characterization of surface reflection is essential for retrieving aerosols with downward-looking remote sensors. In this paper, observations from the Ground-based Multiangle SpectroPolarimetric Imager (GroundMSPI) are used to evaluate a surface polarized bidirectional reflectance distribution function (PBRDF) model. GroundMSPI is an eight-band spectropolarimetric camera mounted on a rotating gimbal to acquire pushbroom imagery of outdoor landscapes. The camera uses a highly accurate polarimetric imaging technique based on a photoelastic modulator to acquire Stokes vector measurements in three of the instrument's bands (470, 660, and 865 nm). A description of the instrument is presented, and observations of selected targets within a scene acquired on 6 January 2010 are analyzed. Data collected over the course of the day as the Sun moved across the sky provided a range of illumination geometries for evaluating the surface model, which consists of a volumetric reflection term represented by the modified Rahman-Pinty-Verstraete function plus a specular reflection term generated by a randomly oriented array of Fresnel-reflecting microfacets. While the model is fairly successful at predicting the polarized reflection from two grass targets in the scene, it performs worse for two manmade targets (a parking lot and a truck roof), possibly because of their greater degree of geometric organization. Several empirical adjustments to the model are explored and lead to improved fits to the data. For all targets, the data support the notion of spectral invariance in the angular shape of the unpolarized and polarized surface reflection. As noted by others, this behavior provides valuable constraints on the aerosol retrieval problem and highlights the importance of multiangle observations.
    NASA; JPL; Center for Space Researc…
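    The polarized specular term rests on Fresnel reflection from each microfacet. The sketch below shows only that building block: the Fresnel intensity reflectances for s- and p-polarized light at a single smooth facet, and the degree of linear polarization (DoLP) computed from Stokes parameters I, Q, U. It assumes a real refractive index (1.5 here, an illustrative value not taken from the paper) and light passing from the less dense medium, and it omits both the averaging over randomly oriented facets and the modified Rahman-Pinty-Verstraete volumetric term, so it is not the paper's full PBRDF model. Function names are hypothetical.

```python
import math

def fresnel_reflectance(theta_i, n1=1.0, n2=1.5):
    """Intensity reflectances (R_s, R_p) of a smooth facet for incidence angle
    theta_i (radians), assuming real refractive indices n1 -> n2 (illustrative)."""
    cos_i = math.cos(theta_i)
    sin_t = n1 / n2 * math.sin(theta_i)  # Snell's law
    if sin_t > 1.0:
        raise ValueError("total internal reflection is not handled in this sketch")
    cos_t = math.sqrt(1.0 - sin_t ** 2)
    r_s = (n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t)
    r_p = (n2 * cos_i - n1 * cos_t) / (n2 * cos_i + n1 * cos_t)
    return r_s ** 2, r_p ** 2

def dolp(I, Q, U):
    """Degree of linear polarization from Stokes parameters I, Q, U."""
    return math.hypot(Q, U) / I

# Unpolarized sunlight reflected by one facet, with Q referenced to the plane
# of incidence (sign conventions for Q differ between authors).
theta = math.atan(1.5)                 # Brewster angle for n2/n1 = 1.5
R_s, R_p = fresnel_reflectance(theta)
I, Q, U = (R_s + R_p) / 2, (R_s - R_p) / 2, 0.0
print(dolp(I, Q, U))                   # close to 1: fully polarized at Brewster
```

At normal incidence the two reflectances coincide (0.04 for n2 = 1.5) and the reflected light stays unpolarized, while near the Brewster angle the p-component vanishes; this angular dependence is one reason multiangle polarimetric observations constrain the surface model.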

    The 1st Conference of PhD Students in Computer Science
