104 research outputs found

    Spoken content retrieval: A survey of techniques and technologies

    Get PDF
    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

    Stochastic Language Generation in Dialogue using Recurrent Neural Networks with Convolutional Sentence Reranking

    Full text link
    The natural language generation (NLG) component of a spoken dialogue system (SDS) usually needs a substantial amount of handcrafting or a well-labeled dataset to be trained on. These limitations add significantly to development costs and make cross-domain, multi-lingual dialogue systems intractable. Moreover, human languages are context-aware. The most natural response should be directly learned from data rather than depending on predefined syntaxes or rules. This paper presents a statistical language generator based on a joint recurrent and convolutional neural network structure which can be trained on dialogue act-utterance pairs without any semantic alignments or predefined grammar trees. Objective metrics suggest that this new model outperforms previous methods under the same experimental conditions. Results of an evaluation by human judges indicate that it produces not only high quality but linguistically varied utterances which are preferred compared to n-gram and rule-based systems.Comment: To be appear in SigDial 201

    Searching Spontaneous Conversational Speech:Proceedings of ACM SIGIR Workshop (SSCS2008)

    Get PDF

    Keskusteluavustimen kehittÀminen kuulovammaisia varten automaattista puheentunnistusta kÀyttÀen

    Get PDF
    Understanding and participating in conversations has been reported as one of the biggest challenges hearing impaired people face in their daily lives. These communication problems have been shown to have wide-ranging negative consequences, affecting their quality of life and the opportunities available to them in education and employment. A conversational assistance application was investigated to alleviate these problems. The application uses automatic speech recognition technology to provide real-time speech-to-text transcriptions to the user, with the goal of helping deaf and hard of hearing persons in conversational situations. To validate the method and investigate its usefulness, a prototype application was developed for testing purposes using open-source software. A user test was designed and performed with test participants representing the target user group. The results indicate that the Conversation Assistant method is valid, meaning it can help the hearing impaired to follow and participate in conversational situations. Speech recognition accuracy, especially in noisy environments, was identified as the primary target for further development for increased usefulness of the application. Conversely, recognition speed was deemed to be sufficient and already surpass the transcription speed of human transcribers.Keskustelupuheen ymmÀrtÀminen ja keskusteluihin osallistuminen on raportoitu yhdeksi suurimmista haasteista, joita kuulovammaiset kohtaavat jokapÀivÀisessÀ elÀmÀssÀÀn. NÀillÀ viestintÀongelmilla on osoitettu olevan laaja-alaisia negatiivisia vaikutuksia, jotka heijastuvat elÀmÀnlaatuun ja heikentÀvÀt kuulovammaisten yhdenvertaisia osallistumismahdollisuuksia opiskeluun ja työelÀmÀÀn. TyössÀ kehitettiin ja arvioitiin apusovellusta keskustelupuheen ymmÀrtÀmisen ja keskusteluihin osallistumisen helpottamiseksi. Sovellus kÀyttÀÀ automaattista puheentunnistusta reaaliaikaiseen puheen tekstittÀmiseen kuuroja ja huonokuuloisia varten. MenetelmÀn toimivuuden vahvistamiseksi ja sen hyödyllisyyden tutkimiseksi siitÀ kehitettiin prototyyppisovellus kÀyttÀjÀtestausta varten avointa lÀhdekoodia hyödyntÀen. Testaamista varten suunniteltiin ja toteutettiin kÀyttÀjÀkoe sovelluksen kohderyhmÀÀ edustavilla koekÀyttÀjillÀ. Saadut tulokset viittaavat siihen, ettÀ työssÀ esitetty Keskusteluavustin on toimiva ja hyödyllinen apuvÀline huonokuuloisille ja kuuroille. Puheentunnistustarkkuus erityisesti meluisissa olosuhteissa osoittautui ensisijaiseksi kehityskohteeksi apusovelluksen hyödyllisyyden lisÀÀmiseksi. Puheentunnistuksen nopeus arvioitiin puolestaan jo riittÀvÀn nopeaksi, ylittÀen selkeÀsti kirjoitustulkkien kirjoitusnopeuden

    Detection of Cattle Using Drones and Convolutional Neural Networks

    Get PDF
    [EN] Multirotor drones have been one of the most important technological advances of the last decade. Their mechanics are simple compared to other types of drones and their possibilities in flight are greater. For example, they can take-off vertically. Their capabilities have therefore brought progress to many professional activities. Moreover, advances in computing and telecommunications have also broadened the range of activities in which drones may be used. Currently, artificial intelligence and information analysis are the main areas of research in the field of computing. The case study presented in this article employed artificial intelligence techniques in the analysis of information captured by drones. More specifically, the camera installed in the drone took images which were later analyzed using Convolutional Neural Networks (CNNs) to identify the objects captured in the images. In this research, a CNN was trained to detect cattle, however the same training process could be followed to develop a CNN for the detection of any other object. This article describes the design of the platform for real-time analysis of information and its performance in the detection of cattle

    Incorporating Weak Statistics for Low-Resource Language Modeling

    Get PDF
    Automatic speech recognition (ASR) requires a strong language model to guide the acoustic model and favor likely utterances. While many tasks enjoy billions of language model training tokens, many domains which require ASR do not have readily available electronic corpora.The only source of useful language modeling data is expensive and time-consuming human transcription of in-domain audio. This dissertation seeks to quickly and inexpensively improve low-resource language modeling for use in automatic speech recognition. This dissertation first considers efficient use of non-professional human labor to best improve system performance, and demonstrate that it is better to collect more data, despite higher transcription error, than to redundantly transcribe data to improve quality. In the process of developing procedures to collect such data, this work also presents an efficient rating scheme to detect poor transcribers without gold standard data. As an alternative to this process, automatic transcripts are generated with an ASR system and explore efficiently combining these low-quality transcripts with a small amount of high quality transcripts. Standard n-gram language models are sensitive to the quality of the highest order n-gram and are unable to exploit accurate weaker statistics. Instead, a log-linear language model is introduced, which elegantly incorporates a variety of background models through MAP adaptation. This work introduces marginal class constraints which effectively capture knowledge of transcriber error and improve performance over n-gram features. Finally, this work constrains the language modeling task to keyword search of words unseen in the training text. While overall system performance is good, these words suffer the most due to a low probability in the language model. Semi-supervised learning effectively extracts likely n-grams containing these new keywords from a large corpus of audio. By using a search metric that favors recall over precision, this method captures over 80% of the potential gain
    • 

    corecore