
    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
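A core step in combining ASR with IR, as the survey describes, is indexing recognized transcripts so spoken documents can be searched by text queries. The following is a minimal illustrative sketch, not taken from the survey: the transcripts, document IDs, and boolean-AND retrieval model are all hypothetical simplifications of what a real SCR system would use.

```python
from collections import defaultdict

def build_index(transcripts):
    """Map each word to the set of document IDs whose ASR transcript contains it."""
    index = defaultdict(set)
    for doc_id, text in transcripts.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return documents containing every query word (boolean AND retrieval)."""
    postings = [index.get(w, set()) for w in query.lower().split()]
    if not postings:
        return set()
    result = postings[0]
    for p in postings[1:]:
        result = result & p
    return result

# Hypothetical 1-best ASR transcripts keyed by spoken-document ID.
transcripts = {
    "talk01": "the survey covers spoken content retrieval techniques",
    "talk02": "speech recognition errors affect retrieval quality",
}
index = build_index(transcripts)
print(search(index, "retrieval"))  # both documents mention "retrieval"
```

In practice, SCR systems must also cope with recognition errors and out-of-vocabulary terms, e.g. by indexing lattices or subword units rather than a single 1-best transcript.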

    Proceedings of the ACM SIGIR Workshop ''Searching Spontaneous Conversational Speech''


    Machine translation and post-editing in wildlife documentaries: challenges and possible solutions

    This article presents some of the challenges that may arise when introducing machine translation (MT) into the process of translating wildlife documentaries. Until now, MT has been used to translate general and specialized written texts. However, in recent years, EU-funded projects have begun to work in the field of audiovisual translation with the aim of using MT to translate subtitles, and it has already been shown that post-edited subtitles can reach adequate quality levels. But documentaries are not translated only through subtitles: in countries where subtitling is not the main mode of audiovisual transfer, voice-over and off-screen dubbing are used instead. For this reason, we believe it is necessary to investigate the introduction of MT for translating wildlife documentaries through voice-over and off-screen dubbing. This article describes the challenges involved in machine-translating documentary scripts, presenting a preliminary analysis of the translations produced by different machine translation engines. First, we provide an overview of the characteristics of voice-over and off-screen dubbing, as well as a brief summary of previous research attempting to introduce MT into audiovisual translation. We then present the methodology used to analyze a corpus of documentary scripts, on the one hand, and a corpus of machine translations of those same scripts, on the other. Finally, before outlining possible future research arising from this article, we clarify the challenges we might face in obtaining quality MT translations of documentary scripts, present the results of the analyses, and suggest possible solutions to these challenges.

    Towards Affordable Disclosure of Spoken Word Archives

    This paper presents and discusses ongoing work aimed at the affordable disclosure of real-world spoken word archives in general, and in particular of a collection of recorded interviews with Dutch survivors of the World War II concentration camp Buchenwald. Given such collections, the least we want to be able to provide is search at different levels and a flexible way of presenting results. Strategies for automatic annotation based on speech recognition, supporting, e.g., within-document search, are outlined and discussed with respect to the Buchenwald interview collection. In addition, usability aspects of spoken word search are discussed on the basis of our experiences with the online Buchenwald web portal. We conclude that, although user feedback is generally fairly positive, automatic annotation performance is still far from satisfactory and requires additional research.

    Affect-LM: A Neural Language Model for Customizable Affective Text Generation

    Human verbal communication includes affective messages which are conveyed through the use of emotionally colored words. There has been much research in this direction, but the problem of integrating state-of-the-art neural language models with affective information remains an area ripe for exploration. In this paper, we propose an extension to an LSTM (Long Short-Term Memory) language model for generating conversational text conditioned on affect categories. Our proposed model, Affect-LM, enables us to customize the degree of emotional content in generated sentences through an additional design parameter. Perception studies conducted using Amazon Mechanical Turk show that Affect-LM generates natural-looking emotional sentences without sacrificing grammatical correctness. Affect-LM also learns affect-discriminative word representations, and perplexity experiments show that additional affective information in conversational text can improve language model prediction.
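The idea of conditioning generation on an affect category with a tunable strength parameter can be sketched as biasing the model's next-word logits by an affect-dependent term scaled by a parameter beta. This is a toy illustration of the general mechanism, not the paper's actual architecture: the vocabulary, the base logits (standing in for the LSTM's context-dependent scores), and the affect bias values are all made up.

```python
import numpy as np

vocab = ["the", "day", "was", "wonderful", "terrible"]
base_logits = np.array([2.0, 1.0, 1.0, 0.5, 0.5])   # hypothetical LSTM context scores
affect_bias = np.array([0.0, 0.0, 0.0, 3.0, -3.0])  # hypothetical "positive" affect term

def next_word_probs(beta):
    """Softmax over base logits plus a beta-scaled affect bias."""
    logits = base_logits + beta * affect_bias
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

neutral = next_word_probs(beta=0.0)
positive = next_word_probs(beta=1.0)
# Increasing beta raises the probability of affect-colored words.
print(vocab[int(np.argmax(positive))])  # → "wonderful"
```

Setting beta to zero recovers the unconditioned distribution, which matches the abstract's description of controlling the degree of emotional content through a single design parameter.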

    Searching Spontaneous Conversational Speech: Proceedings of the ACM SIGIR Workshop (SSCS2008)


    Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

    The electronic version of this article is the complete one and can be found online at http://dx.doi.org/10.1186/s13636-015-0063-8. Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. It is currently receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR aims to recognize all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and presents an in-depth analysis based on several search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms). This work has been partly supported by project CMC-V2 (TEC2012-37585-C02-01) from the Spanish Ministry of Economy and Competitiveness. This research was also funded by the European Regional Development Fund and the Galician Regional Government (GRC2014/024, "Consolidation of Research Units: AtlantTIC Project" CN2012/160).
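The STD output format described above (speech file, start time, end time, confidence score) can be illustrated with a small sketch. This is not any evaluated system: the time-aligned ASR word hypotheses and the weakest-word scoring rule for multi-word terms are hypothetical simplifications.

```python
def detect_term(hyps, term):
    """Find occurrences of a (possibly multi-word) search term in time-aligned
    ASR hypotheses; return (file, start, end, confidence) tuples."""
    words = term.lower().split()
    detections = []
    for fname, aligned in hyps.items():
        for i in range(len(aligned) - len(words) + 1):
            window = aligned[i:i + len(words)]
            if [w for w, _, _, _ in window] == words:
                start = window[0][1]
                end = window[-1][2]
                # score a multi-word detection by its least confident word
                score = min(c for _, _, _, c in window)
                detections.append((fname, start, end, score))
    return detections

# Hypothetical word hypotheses: (word, start_sec, end_sec, confidence).
hyps = {
    "talk.wav": [
        ("spoken", 0.00, 0.45, 0.92),
        ("term", 0.45, 0.80, 0.88),
        ("detection", 0.80, 1.40, 0.95),
    ],
}
print(detect_term(hyps, "term detection"))
# one detection spanning 0.45-1.40 s with score 0.88
```

Exact matching against a 1-best transcript cannot find out-of-vocabulary terms, which is why the evaluation's out-of-vocabulary analysis matters; real systems typically fall back to subword or phonetic search for such terms.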