
    Segmentation and indexation of complex objects in comic book

    Born in the 19th century, comics is a visual medium used to express ideas through images, often combined with text or other visual information. It is an art form that deploys images in sequence for graphic storytelling (sequential art), initially spread worldwide through newspapers, books and magazines. Nowadays, the development of new technologies and the World Wide Web is giving birth to a new form of paperless comics that takes advantage of the freedom of the virtual world. However, traditional comics still represent an important cultural heritage in many countries. Regarding their adaptation to the digital format, they have not yet received the same level of attention as music, cinema or literature. Using information technologies with digitized comic books would facilitate the exploration of digital libraries, accelerate their translation, and allow augmented reading and speech playback for the visually impaired. Heritage museums such as the CIBDI (French acronym for the International City of Comic Books and Images), the Kyoto International Manga Museum and the Digital Comic Museum have already digitized several thousand comic albums, some of which are now in the public domain. Despite the growing marketplace for digital comics, little research has been carried out to take advantage of the added value provided by these new media. A particularity of documents is that they often require processing specific to their type; the challenge for document analysis systems is to propose generic solutions to specific problems. The design process of comics is so specific that their automated analysis may be seen as a niche research field within document analysis, at the intersection of complex-background, semi-structured and mixed-content documents. Being at the intersection of several fields combines their difficulties. In this thesis, we review, highlight and illustrate the challenges related to comic book image analysis in order to provide a good overview of the latest research progress in this field and the current issues. To cover the widest possible scope of study, we propose three different approaches for comic book image analysis, all aiming to provide an automatic description of the image content. Different levels of description are discussed, from spatial positions (low level) to semantic information (high level). The first approach describes the image in an intuitive way, from simple to complex elements, using previously extracted elements to guide further processing. Simple elements such as panel, text and balloon regions are extracted first, followed by balloon tails and then comic character positions inferred from the direction indicated by the tails. The second approach addresses independent information extraction to overcome the main drawback of the first approach: error propagation. This second method is composed of several extractors, one per type of content, independent of each other. These extractors can be run in parallel without requiring prior information, which eliminates the error propagation effect. Extra processing such as balloon type classification and text recognition is also covered. The third approach introduces a knowledge-driven system that combines low- and high-level processing to build a scalable system for comics image understanding; it is intended to improve the overall precision of the content extraction methods. We built an expert system composed of an inference engine and two models stored in an ontology, one embedding knowledge about the comics domain and the other modelling the image processing part. These two models allow consistency analysis of the extracted information and inference of the relationships between the extracted elements, such as the reading order, the type of text (e.g. spoken, onomatopoeic, illustrative) and the relations between speech balloons and speaking characters. The expert system combines the benefits of the first two approaches and enables high-level semantic description such as the reading order, the semantics of balloon shapes, the relations between speech balloons and their speakers, and the interactions between comic characters. Apart from that, in this thesis we have provided the first public comic book image dataset and ground truth to the community, along with an overall experimental comparison of all the proposed methods and some state-of-the-art methods.
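    The contrast between the first two approaches (a sequential pipeline whose errors propagate versus independent extractors) can be sketched with hypothetical stub extractors; the function names and toy data below are illustrative, not from the thesis.

```python
# Toy sketch of the two extraction strategies described above (illustrative
# stubs only; real extractors would operate on pixel data).

def sequential_pipeline(image, extractors):
    """First approach: each stage consumes the previous stage's output,
    so an error in an early extractor propagates downstream."""
    result, outputs = image, []
    for extract in extractors:
        result = extract(result)
        outputs.append(result)
    return outputs

def independent_pipeline(image, extractors):
    """Second approach: every extractor sees the original image,
    so failures stay isolated and stages can run in parallel."""
    return [extract(image) for extract in extractors]

# Hypothetical stages: "detect panels", then "detect balloons".
stages = [lambda x: x + ["panels"], lambda x: x + ["balloons"]]
chained = sequential_pipeline(["page"], stages)
isolated = independent_pipeline(["page"], stages)
```

    In the chained variant the balloon stage inherits whatever the panel stage produced, while the independent variant keeps each stage's output isolated from the others.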

    Information theoretic feature extraction for audio-visual speech recognition

    The problem of feature selection has been thoroughly analyzed in the context of pattern classification, with the purpose of avoiding the curse of dimensionality. In the context of multimodal signal processing, however, this problem has received less attention. Our approach to feature extraction is based on information theory, with an application to multimodal classification, in particular audio-visual speech recognition. Contrary to previous work on information theoretic feature selection for multimodal signals, our proposed methods penalize features for their redundancy, achieving more compact feature sets and better performance. We propose two greedy selection algorithms for selecting visual features for audio-visual speech recognition: one penalizes a proportion of feature redundancy, while the other uses conditional mutual information as its evaluation measure. Our features outperform linear discriminant analysis, the most common transform for dimensionality reduction in the field, across a wide range of dimensionality values and when combined with audio at different quality levels.
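    A minimal sketch of the redundancy-penalized greedy variant, estimating mutual information from empirical counts over discrete feature columns (the scoring rule and data below are an illustrative assumption, not the paper's exact formulation):

```python
# Greedy feature selection penalizing redundancy (illustrative sketch).
from collections import Counter
import math

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits between two
    equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def greedy_select(features, labels, k, beta=0.5):
    """Greedily pick k feature columns, scoring each candidate by its
    relevance I(feature; labels) minus beta times its average redundancy
    with the features already selected."""
    remaining, selected = set(range(len(features))), []
    while remaining and len(selected) < k:
        def score(i):
            redundancy = (sum(mutual_information(features[i], features[j])
                              for j in selected) / len(selected)) if selected else 0.0
            return mutual_information(features[i], labels) - beta * redundancy
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.discard(best)
    return selected
```

    With a strong enough penalty, a duplicate of an already-selected feature scores below an independent but individually uninformative one, which is exactly the compactness effect the abstract describes.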

    Arabic nested noun compound extraction based on linguistic features and statistical measures

    The extraction of Arabic nested noun compounds is significant for several research areas such as sentiment analysis, text summarization, word categorization, grammar checking, and machine translation. Much research has studied the extraction of Arabic noun compounds using linguistic approaches, statistical methods, or a hybrid of both. A wide range of the existing approaches concentrate on extracting bi-gram or tri-gram noun compounds. Nonetheless, extracting a 4-gram or 5-gram nested noun compound is a challenging task due to morphological, orthographic, syntactic and semantic variations. Many features have an important effect on the efficiency of noun compound extraction, such as unit-hood, contextual information, and term-hood. Hence, there is a need to improve the effectiveness of Arabic nested noun compound extraction. Thus, this paper proposes a hybrid of a linguistic approach and a statistical method to enhance the extraction of Arabic nested noun compounds. A number of pre-processing phases are presented, including transformation, tokenization, and normalisation. The linguistic approaches used in this study consist of part-of-speech tagging and named-entity patterns, whereas the statistical methods consist of the NC-value, NTC-value, NLC-value, and a combination of these association measures. The experiments demonstrate that the combined association measures outperform the NLC-value, NTC-value, and NC-value in terms of nested noun compound extraction, achieving 90%, 88%, 87%, and 81% for bigram, trigram, 4-gram, and 5-gram compounds, respectively.
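    The NC-value, NTC-value and NLC-value measures named above all build on the classic C-value termhood score; a minimal sketch of C-value over candidate n-grams follows (the candidate terms and frequencies are toy examples, not the paper's data):

```python
# Minimal sketch of the C-value termhood measure that the NC-value family
# extends (candidates and counts below are invented for illustration).
import math

def c_value(candidates):
    """candidates: dict mapping an n-gram (tuple of tokens, n >= 2) to its
    corpus frequency. A term nested inside longer candidates is discounted
    by the mean frequency of those containing candidates."""
    scores = {}
    for term, freq in candidates.items():
        container_freqs = [
            f for t, f in candidates.items()
            if len(t) > len(term)
            and any(t[i:i + len(term)] == term
                    for i in range(len(t) - len(term) + 1))
        ]
        if container_freqs:
            freq = freq - sum(container_freqs) / len(container_freqs)
        scores[term] = math.log2(len(term)) * freq
    return scores

cands = {("noun", "compound"): 3, ("nested", "noun", "compound"): 1}
scores = c_value(cands)
```

    The NC-value then reweights these scores with contextual information (words that tend to accompany terms), which is where the unit-hood and context features mentioned above come in.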

    Deriving and Exploiting Situational Information in Speech: Investigations in a Simulated Search and Rescue Scenario

    The need for automatic recognition and understanding of speech is emerging in tasks involving the processing of large volumes of natural conversations. In application domains such as Search and Rescue, exploiting automated systems for extracting mission-critical information from speech communications has the potential to make a real difference. Spoken language understanding has commonly been approached by identifying units of meaning (such as sentences, named entities, and dialogue acts) to provide a basis for further discourse analysis. However, this fine-grained identification of fundamental units of meaning is sensitive to high error rates in the automatic transcription of noisy speech. This thesis demonstrates that topic segmentation and identification techniques can be employed for information extraction from spoken conversations while remaining robust to such errors. Two novel topic-based approaches are presented for extracting situational information within the search and rescue context. The first approach shows that identifying the changes in the context and content of first responders' reports over time can provide an estimate of their location. The second approach presents a speech-based topological map estimation technique that is inspired, in part, by automatic mapping algorithms commonly used in robotics. The proposed approaches are evaluated on a goal-oriented conversational speech corpus, which was designed and collected based on an abstract communication model between a first responder and a task leader during a search process. Results have confirmed that a highly imperfect transcription of noisy speech has limited impact on the information extraction performance compared with that obtained on the transcription of clean speech data. This thesis also shows that speech recognition accuracy can benefit from rescoring its initial transcription hypotheses based on the derived high-level location information. A new two-pass speech decoding architecture is presented. In this architecture, the location estimate from the first decoding pass is used to dynamically adapt a general language model, which is then used to rescore the initial recognition hypotheses. This decoding strategy has resulted in a statistically significant gain in the recognition accuracy of the spoken conversations in high background noise. It is concluded that the techniques developed in this thesis can be extended to other application domains that deal with large volumes of natural spoken conversations.
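    The rescoring step of such a two-pass architecture can be sketched in miniature: the language-model adaptation is reduced here to unigram interpolation, and the vocabulary and probabilities are invented for illustration, not drawn from the thesis.

```python
# Toy sketch of second-pass rescoring with a location-adapted language model.
import math

def interpolate(general_lm, adapted_lm, lam):
    """Linear interpolation of two unigram models (dicts of word -> prob);
    lam controls how far the model leans toward the adapted one."""
    vocab = set(general_lm) | set(adapted_lm)
    return {w: (1 - lam) * general_lm.get(w, 0.0) + lam * adapted_lm.get(w, 0.0)
            for w in vocab}

def rescore(hypotheses, lm, floor=1e-8):
    """Sort first-pass hypotheses (lists of words) by unigram
    log-probability under lm, best first."""
    def logprob(words):
        return sum(math.log(lm.get(w, floor)) for w in words)
    return sorted(hypotheses, key=logprob, reverse=True)

# Hypothetical adaptation toward a "stairwell" location estimate.
lm = interpolate({"corridor": 0.1, "stairs": 0.1, "over": 0.8},
                 {"stairs": 0.7, "over": 0.3}, lam=0.5)
ranked = rescore([["corridor", "over"], ["stairs", "over"]], lm)
```

    After interpolation the location-consistent hypothesis outranks the acoustically similar alternative, which is the effect the architecture exploits.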

    A Machine Learning Approach For Opinion Holder Extraction In Arabic Language

    Opinion mining aims at extracting useful subjective information from large amounts of text. Opinion holder recognition is a task that has not yet been considered for the Arabic language. This task essentially requires a deep understanding of clause structures. Unfortunately, the lack of a robust, publicly available Arabic parser further complicates the research. This paper presents pioneering research on opinion holder extraction in Arabic news that is independent of any lexical parser. We investigate constructing a comprehensive feature set to compensate for the lack of parsing structural outcomes. The proposed feature set is adapted from previous work on English, coupled with our proposed semantic field and named-entity features. Our feature analysis is based on Conditional Random Fields (CRF) and semi-supervised pattern recognition techniques. Different research models are evaluated via cross-validation experiments, achieving an F-measure of 54.03. We publicly release our research outcome corpus and lexicon to the opinion mining community to encourage further research.
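    The kind of parser-free token features described above can be illustrated as feature dictionaries of the sort fed to a CRF tagger; the lexicon contents, tag names and example sentence below are hypothetical, not the paper's resources.

```python
# Hypothetical parser-free features for opinion-holder tagging; a real
# system would emit one such dict per token as CRF input.

def token_features(tokens, ne_tags, opinion_lexicon, i):
    """Features for token i: surface word, a named-entity tag,
    a semantic-field cue (opinion-lexicon membership) and local context."""
    return {
        "word": tokens[i],
        "ne_tag": ne_tags[i],                          # named-entity feature
        "in_opinion_lexicon": tokens[i] in opinion_lexicon,  # semantic field
        "prev_word": tokens[i - 1] if i > 0 else "<s>",
        "next_word": tokens[i + 1] if i + 1 < len(tokens) else "</s>",
    }

# Toy sentence: "Ali said the plan was excellent"
tokens = ["Ali", "said", "the", "plan", "was", "excellent"]
ne_tags = ["PERSON", "O", "O", "O", "O", "O"]
lexicon = {"said", "excellent"}  # invented opinion/report-word lexicon
features = [token_features(tokens, ne_tags, lexicon, i)
            for i in range(len(tokens))]
```

    The combination of a PERSON tag on a token adjacent to a report word is the sort of local evidence that lets a sequence model flag "Ali" as the holder without a syntactic parse.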

    Emotion Recognition from Acted and Spontaneous Speech

    This doctoral thesis deals with the recognition of speakers' emotional states from speech signals. The thesis is divided into two main parts. The first part describes proposed approaches for emotion recognition using two different multilingual databases of acted emotional speech. The main contributions of this part are a detailed analysis of a large set of acoustic features, new classification schemes for vocal emotion recognition such as "emotion coupling", and a new method for mapping discrete emotions into a two-dimensional space. The second part is devoted to emotion recognition using multilingual databases of spontaneous emotional speech, based on telephone records obtained from real call centers. The knowledge gained from the experiments on emotion recognition from acted speech was exploited to design a new approach for classifying seven spontaneous emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The thesis also examines the influence of a speaker's emotional state on gender recognition performance and proposes a system for the automatic identification of successful phone calls in call centers by means of features of the dialogue between the call participants.
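    The fusion-of-systems idea at the core of the spontaneous-emotion classifier can be sketched as a weighted average of per-system class posteriors; the class names and scores below are invented for illustration.

```python
# Minimal late-fusion sketch: average class-probability vectors from
# several recognizers and pick the top class (toy numbers only).

def fuse(prob_vectors, weights=None):
    """Weighted average of per-system probability lists
    (one list of class probabilities per system)."""
    n = len(prob_vectors)
    weights = weights if weights is not None else [1.0 / n] * n
    num_classes = len(prob_vectors[0])
    return [sum(w * p[c] for w, p in zip(weights, prob_vectors))
            for c in range(num_classes)]

def predict(prob_vectors, labels, weights=None):
    """Return the label of the highest-scoring fused class."""
    fused = fuse(prob_vectors, weights)
    return labels[max(range(len(fused)), key=fused.__getitem__)]

labels = ["anger", "joy", "neutral"]
system_a = [0.6, 0.1, 0.3]   # e.g. an acoustic-feature classifier
system_b = [0.2, 0.1, 0.7]   # e.g. a prosody-based classifier
winner = predict([system_a, system_b], labels)
```

    Weighting lets stronger subsystems dominate; with equal weights the fused decision can differ from either individual system's top choice, which is the point of combining them.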