4,115 research outputs found

    Extracting speech text from comics

    Overall, it has been challenging to find solutions able to correctly extract distinct types of text balloons from any sort of comics, particularly from complex comic books. The challenge comes from the fact that there is no general extraction algorithm in the literature capable of handling any text balloon without making assumptions about the color depth of the image or the orientation or language of the text. Even worse, comics art evolves over time, so there is some degree of unpredictability associated with comics. This means that an algorithm may work well for comic books released twenty years ago but not so well for current comic books, even if they belong to the same category or series. This dissertation presents a possible solution to this problem by introducing an algorithm capable of extracting text balloons from comic book pages. The presented algorithm, here called CCD (components and corners detection), relies on the concept of corner detection to identify text snippets inside balloon candidates. After discarding a significant number of regions that are not considered tentative text balloons for one reason or another, we look at the shape of the holes of the remaining regions to check whether they still hold a significant number of corners, enough for a candidate to be classified as a text balloon.

    The Unsupervised Acquisition of a Lexicon from Continuous Speech

    We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency. Comment: 27 page technical report

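    KETERAMPILAN MENULIS STRUKTUR DAN ISI TEKS PIDATO PERSUASIF SISWA KELAS X TB SMK PGRI 4 DENPASAR (Skills in Writing the Structure and Content of Persuasive Speech Texts among Class X TB Students of SMK PGRI 4 Denpasar)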

    The purpose of this study was to determine students' skills in writing the structure and content of persuasive speech texts, the difficulties they experience, and the factors that cause those difficulties. The population was all class X TB students of SMK PGRI 4 Denpasar in the 2020/2021 academic year, totaling 165 students in 4 classes. The sample was 62 students of class X TB SMK PGRI 4 Denpasar, drawn from the 4 classes. Data were collected with a test method consisting of test preparation, test administration, and test scoring. The data were processed using descriptive statistics. Based on the results, it can be concluded that: (1) the students' skills in writing the structure and content of persuasive speech texts are good; (2) the difficulties students face in writing the structure and content of persuasive speech texts concern the title of the speech, the structure of the persuasive speech, the use of effective sentences, the choice of words, and the use of EYD; (3) the factors that cause these difficulties can be seen from the results of interviews with 3 students, who each pointed to the use of effective sentences, good and standard word choices, and the use of EYD during the study.

    An Analysis of Conjunctions Found in Barack Obama’s Farewell Speech Text

    In communication, people use conjunctions in their speech, and so did Barack Obama in his farewell speech. His speech contains many different conjunctions, which are worth studying to understand the connections between the sentences he delivered. The purpose of this study is to identify the types of conjunction, especially external conjunctions, internal conjunctions, and continuatives, in Barack Obama’s farewell speech text based on semantic naming. The study used the descriptive qualitative method. The instruments were the researchers themselves, a table, and Barack Obama’s speech text. The data were taken from the internet: the text of the farewell speech delivered in Chicago by Barack Obama, President of the United States of America (2009-2017). To analyze the data, this study used the theory of conjunction by Martin and Rose (2007). The types of conjunction in the data were identified, then classified, displayed, and described. The analysis found 18 (eighteen) types of external conjunctions, 8 (eight) types of internal conjunctions, and 3 (three) continuatives.

    Ideology and Power Relations in Abubakar Shekau’s Speech Texts

    This paper explores the phenomenon of power in Abubakar Shekau’s speech texts. The texts were analyzed drawing on Halliday’s Systemic Functional Grammar and Norman Fairclough’s perspective on language and ideology. The analysis uses the information structure model of Theme and Rheme to explicate the ideologies embedded in the texts. Results showed that Shekau’s ideologies can be grouped into two, namely divinity ideologies and political ideologies. The power relations are three-dimensional: between Shekau and his God, between Shekau and his followers, and between Shekau and the Nigerian government. The ideologies and power relations are embedded in Shekau’s use of personal pronouns.

    Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech

    This paper proposes Virtuoso, a massively multilingual speech-text joint semi-supervised learning framework for text-to-speech synthesis (TTS) models. Existing multilingual TTS typically supports tens of languages, which are a small fraction of the thousands of languages in the world. One difficulty in scaling multilingual TTS to hundreds of languages is collecting high-quality speech-text paired data in low-resource languages. This study extends Maestro, a speech-text joint pretraining framework for automatic speech recognition (ASR), to speech generation tasks. To train a TTS model from various types of speech and text data, different training schemes are designed to handle supervised (paired TTS and ASR data) and unsupervised (untranscribed speech and unspoken text) datasets. Experimental evaluation shows that 1) multilingual TTS models trained on Virtuoso can achieve significantly better naturalness and intelligibility than baseline ones in seen languages, and 2) they can synthesize reasonably intelligible and natural-sounding speech for unseen languages where no high-quality paired TTS data is available. Comment: Submitted to ICASSP 202

    Analysis of Joint Speech-Text Embeddings for Semantic Matching

    Embeddings play an important role in many recent end-to-end solutions for language processing problems involving more than one data modality. Although there has been some effort to understand the properties of single-modality embedding spaces, particularly that of text, their cross-modal counterparts are less understood. In this work, we study a joint speech-text embedding space trained for semantic matching by minimizing the distance between paired utterance and transcription inputs. This was done through dual encoders in a teacher-student model setup, with a pretrained language model acting as the teacher and a transformer-based speech encoder as the student. We extend our method to incorporate automatic speech recognition through both pretraining and multitask scenarios and find that both approaches improve semantic matching. Multiple techniques were used to analyze and evaluate cross-modal semantic alignment of the embeddings: a quantitative retrieval accuracy metric, zero-shot classification to investigate generalizability, and probing of the encoders to observe the extent of knowledge transfer from one modality to another. Comment: Submitted to INTERSPEECH 2022 for review