
    Prosody-Based Automatic Segmentation of Speech into Sentences and Topics

    A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging because the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models, for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task- and corpus-dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration, and word-based cues dominate for natural conversation.
    Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2), Special Issue on Accessing Information in Spoken Audio, September 200
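    The abstract does not spell out how the prosodic and lexical models are combined. One simple scheme consistent with a "probabilistic combination" is to linearly interpolate, at every inter-word boundary, the boundary posterior from a prosodic classifier (such as a decision tree) with the posterior from a word-based model, and then threshold the result. The sketch below assumes both posteriors are already computed; the function names, the interpolation weight, the 0.5 threshold, and the example scores are hypothetical, and the paper's HMM-based modeling is not reproduced here.

```python
from __future__ import annotations

from typing import Sequence


def interpolate_posteriors(
    p_prosodic: Sequence[float],
    p_lexical: Sequence[float],
    weight: float = 0.5,
) -> list[float]:
    """Mix per-boundary P(boundary) from a prosodic and a lexical model.

    weight is the (hypothetical) share given to the prosodic model.
    """
    if len(p_prosodic) != len(p_lexical):
        raise ValueError("both models must score the same word boundaries")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * pp + (1.0 - weight) * pl for pp, pl in zip(p_prosodic, p_lexical)]


def segment(words: Sequence[str], posteriors: Sequence[float], threshold: float = 0.5) -> list[list[str]]:
    """Close a segment after word i whenever the boundary posterior after it exceeds threshold."""
    segments: list[list[str]] = []
    current: list[str] = []
    for word, p in zip(words, posteriors):
        current.append(word)
        if p > threshold:
            segments.append(current)
            current = []
    if current:  # words after the last detected boundary form a final segment
        segments.append(current)
    return segments


# Hypothetical scores for the boundary that follows each word.
words = ["so", "that", "was", "it", "now", "the", "weather"]
p_prosodic = [0.1, 0.0, 0.2, 0.9, 0.1, 0.0, 0.8]
p_lexical = [0.1, 0.0, 0.0, 0.7, 0.3, 0.0, 0.6]
combined = interpolate_posteriors(p_prosodic, p_lexical, weight=0.5)
print(segment(words, combined))
# → [['so', 'that', 'was', 'it'], ['now', 'the', 'weather']]
```

    Interpolating posteriors keeps the two models independent, so each can be trained on its own data; the weight would normally be tuned on held-out material rather than fixed at 0.5.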

    Prosodic Predictors of Upcoming Positive or Negative Content in Spoken Messages

    This article examines potential prosodic predictors of emotional speech in utterances perceived as signaling that good or bad news is about to be delivered. Speakers were asked to call an experimental confederate to tell her whether or not she had been given a job she had applied for. A perception study was then performed in which initial fragments of the recorded utterances, containing no explicit lexical cues to emotional content, were presented to listeners who rated whether good or bad news would follow. The utterances were then examined to identify acoustic and prosodic features that distinguished good news from bad news. Speakers in the production study were not simply reflecting their own positive or negative mood during the experiment; rather, they appeared to be influenced by the valence of the message they were preparing to deliver. Positive and negative utterances were judged differently on a number of perceived attributes of the speakers' voices (such as sounding hesitant or nervous), and these attributes correlated with a number of automatically obtained acoustic features.

    The Prosody of Metadiscourse: An Analysis Based on Data from NURC Digital Recife

    This work investigated the prosody of metadiscursivity. The aim was to analyze the prosodic characteristics of metadiscursive utterances flanked by two non-metadiscursive ones. The guiding hypothesis was that f0 and duration patterns would characterize the prosodic realization of metadiscursive utterances in this context. To test it, the study analyzed pitch range, pitch reset, intonational distribution, boundary tones, speech rate, and the occurrence and duration of pauses. The analyses were grounded in prosodic phonology and in the autosegmental-metrical theory of intonation. Seven interviews were selected from the NURC Digital Project portal, and from them excerpts containing three utterances were extracted (the pre-metadiscursive, the metadiscursive, and the post-metadiscursive utterance). The statistical analyses used a linear mixed model and binomial logistic regression. The findings show that metadiscursive utterances are produced prosodically as structures independent of the utterances adjacent to them. This independence is evidenced by a higher speech rate and by non-low boundary tones at their edges. In addition, in about half of the analyzed contexts, silent pauses co-occurred with non-low boundary tones at the boundaries of the metadiscursive utterance. No pitch range or intonational distribution patterns were found to be associated with this type of utterance. Although pitch reset was observed between utterances, its effect was not significant. These results contribute to the description of the prosody of metadiscursivity in Brazilian Portuguese, as well as to the teaching of the constitution and functioning of oral discourse genres.

    Impressions of Media Speakers Based on the Professionalism of Their Delivery and the Content of Their Discourse

    This article presents an experimental study of how professionalism in delivery, speech content, and audiovisual perception influence the formation of impressions of media speakers in the Spanish communicative context. After perceiving each speaker, participants answered a questionnaire containing semantic differential items. The results of the statistical tests provide useful information for understanding the perception of audiovisual speech, the formation of impressions, and the factors that influence how audiences interpret non-improvised audiovisual messages in mediated communication contexts (e.g., radio and television). The experimental data are interpreted in relation to language expectancy theory, which explains and predicts receivers' attitudes toward speech acts.

    Order and Disorder: International Conference Proceedings

    Thirty-six national and international contributors attended the conference Order and Disorder, held at the University of Jendouba, Tunisia, on 6-7 November 2015. These proceedings collect a selection of the lectures and seminars presented, examining how order and disorder figure in fields such as literature and linguistics.

    Intonation Models for the Automatic Segmentation of News Programs into News Items

    2nd Prize in the 23rd CAC Awards for Research on Audiovisual Communication (XXIII Premis CAC a la Investigació sobre Comunicació Audiovisual).
    This thesis seeks to identify the prosodic forms typical of a news item at its beginning and end. Within an applied research project aimed at developing an automatic news segmentation application, we studied pitch, rhythm, and intensity in the delivery of news in television news programs in three languages, as indicators of a change of news item. To this end, we applied an instrumental methodology that considers, on the one hand, the practice of news production and, on the other, methods of prosodic analysis and representation. First, we identified the discourse levels of the news item (phonology, lexicon, syntax, semantics, and pragmatics) and then examined how they are used in the "modes of news production", from the newsworthy event itself to its oral delivery on camera. Second, drawing on theories and analyses of spoken discourse processing, we studied how television viewers process news as discourse carrying new, thematic, and discursive information. This theoretical framework yielded the structural and discursive (pragmatic) keys of the news item, summarized in a Structural Model of the news. The model represents the acoustic basis of spoken television news as the result of a whole process of message configuration involving three actors (gatekeeper, writer, and presenter), which in turn produces a three-level structure (information, text or structure, and superstructure).
    Next, the study variables were operationalized, since their analysis and representation are closely tied to the object of study. For the pragmatic level of discourse, we reviewed linguistic approaches to the macro-intonation of utterances, work on rhythm by psychologists and communication scholars, and the complex treatment of intensity, an acoustic parameter rarely studied in communication processes. For intensity, we generated experimental curves according to the functions of language and the phases of the news item, comparing delivery styles and assigning structural functions (affective, semantic, and pragmatic) to intensity. The result was a set of prosodic forms that bundle patterns of variation in the three parameters; these patterns are the variables expected to mark a change of news item, depending on whether they occur at its beginning or its end. Before implementing them automatically and testing their effectiveness on a continuous stream of news, we carried out a qualitative, manual study of 90 cases in which the three parameters were analyzed and the prosodic forms measured. This study established their correlation with the presence or absence of a news change and led to the definition of a Roadmap of the Algorithm.
    Finally, the algorithm was implemented ad hoc in the LabVIEW virtual environment: pauses are located as drops in intensity, and pitch data and intonation parabolas are obtained from the cepstrum (the "transform of the transform") of the segments preceding and following each detected pause. The platform was tested on a sample of 29 real news programs in three languages (Spanish, Portuguese, and Catalan), with programs from two channels per language. Overall performance was poor, as the algorithm proved highly dependent on the language and on the overall format of the news program. The prosodic forms appear typical of news discourse but too vague to generalize across formats and languages. Notably, the segmentation algorithm does work for Catalan news programs, the sample used in the Qualitative Study from which the Roadmap was defined. Future studies should define prosodic forms that take languages and television formats or genres into account, and automatic implementations should analyze prosodic forms throughout the entire news item, not only at the cuts.
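    The final stage of the thesis (pauses located as intensity drops, pitch estimated from the cepstrum of the segments on either side) can be sketched in Python with NumPy. This is not the author's LabVIEW implementation: the frame sizes, the 30 dB drop below the loudest frame, the 0.2 s minimum pause, the 60-400 Hz search range, the guard gap, and the helper names are all assumptions, and the sketch estimates one f0 value per side rather than fitting intonation parabolas.

```python
import numpy as np


def intensity_db(x, sr, frame_s=0.025, hop_s=0.010):
    """Short-time RMS intensity in dB, one value per analysis frame (assumed 25 ms / 10 ms)."""
    frame_len, hop = int(sr * frame_s), int(sr * hop_s)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds samples [i*hop, i*hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    rms = np.sqrt(np.mean(x[idx] ** 2, axis=1))
    return 20.0 * np.log10(rms + 1e-10), hop


def find_pauses(x, sr, drop_db=30.0, min_pause_s=0.2):
    """Pauses = runs of frames more than drop_db below the loudest frame.

    Returns (start_sample, end_sample) pairs at frame resolution.
    """
    db, hop = intensity_db(x, sr)
    silent = np.append(db < db.max() - drop_db, False)  # sentinel closes a trailing run
    pauses, start = [], None
    for i, is_silent in enumerate(silent):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            if (i - start) * hop / sr >= min_pause_s:
                pauses.append((start * hop, i * hop))
            start = None
    return pauses


def cepstral_f0(segment, sr, fmin=60.0, fmax=400.0):
    """f0 from the highest real-cepstrum peak between quefrencies 1/fmax and 1/fmin."""
    spectrum = np.abs(np.fft.rfft(segment * np.hanning(len(segment)))) + 1e-10
    cepstrum = np.fft.irfft(np.log(spectrum), n=len(segment))
    q_lo, q_hi = int(sr / fmax), int(sr / fmin)
    return sr / (q_lo + int(np.argmax(cepstrum[q_lo:q_hi])))


def f0_around_pauses(x, sr, analysis_s=0.1, guard_s=0.02):
    """Hypothetical helper: f0 just before and just after each detected pause."""
    n, guard = int(sr * analysis_s), int(sr * guard_s)
    results = []
    for start, end in find_pauses(x, sr):
        # Skip pauses too close to the signal edges for a full analysis window.
        if start - guard - n < 0 or end + guard + n > len(x):
            continue
        before = x[start - guard - n : start - guard]
        after = x[end + guard : end + guard + n]
        results.append((start, end, cepstral_f0(before, sr), cepstral_f0(after, sr)))
    return results
```

    On a synthetic signal made of a 150 Hz harmonic stretch, 0.4 s of silence, and a 200 Hz harmonic stretch, `find_pauses` reports a single pause and `f0_around_pauses` returns roughly 150 Hz before it and 200 Hz after it. Thresholding relative to the loudest frame, rather than at an absolute level, keeps the pause detector independent of recording gain.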