Extracting speech text from comics
Overall, it has been challenging to find solutions that correctly extract distinct types
of text balloons from any sort of comics, particularly from complex comic books.
The challenge stems from the fact that no general extraction algorithm in
the literature can handle arbitrary text balloons without making assumptions
about the color depth of the image or the orientation or language of the text. Worse
still, comics art evolves over time, so there is some degree of
unpredictability associated with comics. This means that an algorithm may work well for
comic books released twenty years ago but not so well for current ones, even
when they belong to the same category or series.
This dissertation presents a possible solution to this problem by
introducing an algorithm capable of extracting text balloons from comic book pages.
The presented algorithm, here called CCD (components and corners detection), relies on
the concept of corner detection to identify text snippets inside balloon candidates.
After discarding a significant number of regions that are not considered tentative text
balloons for one reason or another, we look at the shape of the holes of the remaining
regions to check whether they still hold enough corners for a
candidate to be classified as a text balloon.
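The corner-counting idea in the abstract above can be illustrated with a small sketch. This is not the dissertation's CCD implementation: it assumes a candidate region is already available as a binary mask, it swaps in a simple 2x2-window corner count in place of whatever corner detector the author uses, and the names `count_corners`, `looks_like_text` and the `min_corners` threshold are hypothetical.

```python
def count_corners(mask):
    """Count corner-like 2x2 windows in a binary mask (list of rows of 0/1).

    A 2x2 window holding exactly 1 or 3 foreground pixels marks a convex
    or concave corner of the shape. This is a stand-in detector, assumed
    for illustration only.
    """
    rows, cols = len(mask), len(mask[0])
    # Pad with a background border so corners on the mask edge are seen.
    padded = (
        [[0] * (cols + 2)]
        + [[0] + list(row) + [0] for row in mask]
        + [[0] * (cols + 2)]
    )
    corners = 0
    for y in range(rows + 1):
        for x in range(cols + 1):
            filled = (
                padded[y][x] + padded[y][x + 1]
                + padded[y + 1][x] + padded[y + 1][x + 1]
            )
            if filled in (1, 3):
                corners += 1
    return corners


def looks_like_text(mask, min_corners=8):
    """Hypothetical rule: keep a candidate only if it has enough corners."""
    return count_corners(mask) >= min_corners


# A solid block has few corners; a glyph-like cross has many.
block = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
cross = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
print(count_corners(block), looks_like_text(block))
print(count_corners(cross), looks_like_text(cross))
```

The intuition is that lettering produces many direction changes in a small area, while empty or smoothly shaped regions produce few, so a corner count can help separate the two.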
The Unsupervised Acquisition of a Lexicon from Continuous Speech
We present an unsupervised learning algorithm that acquires a
natural-language lexicon from raw speech. The algorithm is based on the optimal
encoding of symbol sequences in an MDL framework, and uses a hierarchical
representation of language that overcomes many of the problems that have
stymied previous grammar-induction procedures. The forward mapping from symbol
sequences to the speech stream is modeled using features based on articulatory
gestures. We present results on the acquisition of lexicons and language models
from raw speech, text, and phonetic transcripts, and demonstrate that our
algorithm compares very favorably to other reported results with respect to
segmentation performance and statistical efficiency.
Comment: 27 page technical report
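Skills in Writing the Structure and Content of Persuasive Speech Texts among Class X TB Students of SMK PGRI 4 Denpasar (KETERAMPILAN MENULIS STRUKTUR DAN ISI TEKS PIDATO PERSUASIF SISWA KELAS X TB SMK PGRI 4 DENPASAR)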
The purpose of this study was to determine students' skills in writing the structure and content of persuasive speech texts, the difficulties they experienced, and the factors causing those difficulties. The population comprised all class X TB students of SMK PGRI 4 Denpasar in the 2020/2021 academic year, totaling 165 students from 4 classes. The sample consisted of 62 students drawn from those 4 classes. Data were collected using the test method, which comprised test preparation, test administration, and scoring. The data were processed using descriptive statistics. The results support three conclusions. (1) The students' skills in writing the structure and content of persuasive speech texts are good. (2) The difficulties students faced concerned the title of the speech, the structure of the persuasive speech, the use of effective sentences, word choice, and the use of EYD. (3) The factors causing these difficulties emerged from interviews with 3 students, each of whom cited the use of effective sentences, good and standard word choice, and the use of EYD.
An Analysis of Conjunctions Found in Barack Obama’s Farewell Speech Text
People use conjunctions when they communicate, and Barack Obama did so in his farewell speech. The speech contains many kinds of conjunctions, which are worth studying to understand the connections between the sentences he delivered. The purpose of this study is to identify the types of conjunction, specifically external conjunctions, internal conjunctions, and continuatives, in the text of Barack Obama’s farewell speech, based on semantic naming. The study used a descriptive qualitative method. The instruments were the researchers themselves, a table, and the text of the speech. The data were taken from the internet: the text of the farewell speech delivered in Chicago by Barack Obama, President of the United States of America (2009-2017). To analyze the data, the study used the theory of conjunction by Martin and Rose (2007). The conjunctions in the data were identified by type, then classified, displayed, and described. The analysis found 18 types of external conjunctions, 8 types of internal conjunctions, and 3 continuatives.
Ideology and Power Relations in Abubakar Shekau’s Speech Texts
This paper explores the phenomenon of power in Abubakar Shekau’s speech texts. The texts were analyzed drawing on Halliday’s systemic functional grammar and Norman Fairclough’s perspective on language and ideology. The analysis uses the information structure model of Theme and Rheme to explicate the ideologies embedded in the texts. Results showed that Shekau’s ideologies can be grouped into two kinds, namely divinity ideologies and political ideologies. The power relations are three-dimensional: between Shekau and his God, between Shekau and his followers, and between Shekau and the Nigerian government. The ideologies and power relations are embedded in Shekau’s use of personal pronouns.
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
This paper proposes Virtuoso, a massively multilingual speech-text joint
semi-supervised learning framework for text-to-speech synthesis (TTS) models.
Existing multilingual TTS typically supports tens of languages, which are a
small fraction of the thousands of languages in the world. One difficulty in
scaling multilingual TTS to hundreds of languages is collecting high-quality
speech-text paired data in low-resource languages. This study extends Maestro,
a speech-text joint pretraining framework for automatic speech recognition
(ASR), to speech generation tasks. To train a TTS model from various types of
speech and text data, different training schemes are designed to handle
supervised (paired TTS and ASR data) and unsupervised (untranscribed speech and
unspoken text) datasets. Experimental evaluation shows that 1) multilingual TTS
models trained on Virtuoso can achieve significantly better naturalness and
intelligibility than baseline ones in seen languages, and 2) they can
synthesize reasonably intelligible and naturally sounding speech for unseen
languages where no high-quality paired TTS data is available.
Comment: Submitted to ICASSP 202
Analysis of Joint Speech-Text Embeddings for Semantic Matching
Embeddings play an important role in many recent end-to-end solutions for
language processing problems involving more than one data modality. Although
there has been some effort to understand the properties of single-modality
embedding spaces, particularly that of text, their cross-modal counterparts are
less understood. In this work, we study a joint speech-text embedding space
trained for semantic matching by minimizing the distance between paired
utterance and transcription inputs. This was done through dual encoders in a
teacher-student model setup, with a pretrained language model acting as the
teacher and a transformer-based speech encoder as the student. We extend our
method to incorporate automatic speech recognition through both pretraining and
multitask scenarios and found that both approaches improve semantic matching.
Multiple techniques were utilized to analyze and evaluate cross-modal semantic
alignment of the embeddings: a quantitative retrieval accuracy metric,
zero-shot classification to investigate generalizability, and probing of the
encoders to observe the extent of knowledge transfer from one modality to
another.
Comment: Submitted to INTERSPEECH 2022 for review
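The retrieval accuracy metric mentioned in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the paper's evaluation code: embeddings are plain lists of floats, similarity is assumed to be cosine similarity (the abstract does not name the distance), the i-th utterance is assumed to be paired with the i-th transcription, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top1_retrieval_accuracy(speech_embs, text_embs):
    """Fraction of utterances whose nearest transcription is their own pair.

    Assumes speech_embs[i] and text_embs[i] come from the same pair.
    """
    hits = 0
    for i, speech in enumerate(speech_embs):
        best = max(
            range(len(text_embs)),
            key=lambda j: cosine(speech, text_embs[j]),
        )
        if best == i:
            hits += 1
    return hits / len(speech_embs)


# Toy embeddings: each utterance lies closest to its own transcription.
speech = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.9]]
print(top1_retrieval_accuracy(speech, text))
```

In a well-aligned joint space this score approaches 1.0; shuffling the transcriptions so the pairs no longer line up should drive it down.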