7 research outputs found

    Ontology-Based Indexing and Contextualization of Multimedia Documents for Personal Knowledge Management

    Personal multimedia document management benefits from Semantic Web technologies and the application of ontologies. However, an ontology-based document management system has to meet a number of challenges regarding the flexibility, soundness, and controllability of the semantic data model. In particular, mechanisms are required that minimize the user's basic annotation and maintenance effort while ensuring sufficient data quality and consistency. The first part of the dissertation proposes the necessary mechanisms for the semi-automatic modeling and maintenance of semantic document descriptions. The second part introduces a component-based, application-independent architecture which forms the basis for the development of innovative, semantics-driven solutions for personal document and information management.

    Segmentation, Diarization and Speech Transcription: Surprise Data Unraveled

    In this thesis, research on large vocabulary continuous speech recognition for unknown audio conditions is presented. For automatic speech recognition systems based on statistical methods, it is important that the conditions of the audio used for training the statistical models match the conditions of the audio to be processed. Any mismatch will decrease recognition accuracy. If it is unpredictable what kind of data can be expected, or in other words if the conditions of the audio to be processed are unknown, it is impossible to tune the models, and if the material consists of `surprise data', the output of the system is likely to be poor. In this thesis, methods are presented that require no external training data. These novel methods have been implemented in a large vocabulary continuous speech recognition system called SHoUT, which consists of three subsystems: speech/non-speech classification, speaker diarization and automatic speech recognition. The speech/non-speech classification subsystem separates speech from silence and from unknown audible non-speech events. The type of non-speech present in audio recordings can vary from paper shuffling in recordings of meetings to sound effects in television shows. Because it is unknown in advance what types of non-speech need to be detected, it is not possible to train high-quality statistical models for each type of non-speech sound. The speech/non-speech classification subsystem, also called the speech activity detection subsystem, therefore does not attempt to classify all audible non-speech in a single run. Instead, a bootstrap speech/silence classification is first obtained using a standard speech activity component. Next, the models for speech, silence and audible non-speech are trained on the target audio using this bootstrap classification.
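The bootstrap scheme described above can be sketched in a few lines. This is an illustrative simplification, not the SHoUT implementation: it assumes column 0 of the feature matrix holds frame log-energy, models each class with a single diagonal-covariance Gaussian instead of trained HMMs, and omits the separate audible non-speech class; `energy_percentile` is a hypothetical tuning parameter.

```python
import numpy as np

def diag_gaussian_logpdf(x, mean, var):
    """Row-wise log-density under a diagonal-covariance Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var).sum(axis=1)

def bootstrap_sad(features, n_iter=3, energy_percentile=30):
    """Toy bootstrap speech/silence classification.

    Returns 0 for silence-like frames and 1 for speech-like frames.
    """
    energy = features[:, 0]
    # Bootstrap labels: the lowest-energy frames are taken as silence.
    labels = (energy > np.percentile(energy, energy_percentile)).astype(int)
    for _ in range(n_iter):
        models = []
        for cls in (0, 1):
            cls_frames = features[labels == cls]
            # Train the class model on the target audio itself.
            models.append((cls_frames.mean(axis=0),
                           cls_frames.var(axis=0) + 1e-6))
        # Realign: give each frame to the better-scoring class model.
        scores = np.stack([diag_gaussian_logpdf(features, m, v)
                           for m, v in models], axis=1)
        labels = scores.argmax(axis=1)
    return labels
```

Because the class models are (re)trained on the recording being processed, no external training data is needed, mirroring the idea in the abstract.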
    This approach makes it possible to classify speech and non-speech with high accuracy, without the need to know in advance what kinds of sound are present in the audio recording.
    Once all non-speech is filtered out of the audio, it is the task of the speaker diarization subsystem to determine how many speakers occur in the recording and exactly when they are speaking. The speaker diarization subsystem applies agglomerative clustering to create clusters of speech fragments, one for each speaker in the recording. First, statistical speaker models are created on random chunks of the recording; then, by iteratively realigning the data, retraining the models and merging models that represent the same speaker, accurate speaker models are obtained for speaker clustering. This method does not require any statistical models developed on a training set, which makes the diarization subsystem insensitive to variation in audio conditions. Unfortunately, because the algorithm is of complexity O(n^3), this clustering method is slow for long recordings. Two variations of the subsystem are presented that reduce the required computational effort, so that the subsystem is applicable to long audio recordings as well.
    The automatic speech recognition subsystem developed for this research is based on Viterbi decoding on a fixed pronunciation prefix tree. Using the fixed tree, a flexible modular decoder could be developed, but it was not straightforward to apply full language model look-ahead efficiently. In this thesis, a novel method is discussed that makes it possible to apply language model look-ahead effectively on the fixed tree. In addition, to obtain higher speech recognition accuracy on audio with unknown acoustic conditions, a selection of the numerous known methods for robust automatic speech recognition is applied and evaluated. The three individual subsystems as well as the entire system have been successfully evaluated on three international benchmarks.
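The agglomerative merging loop that drives the diarization can be illustrated with a toy sketch. It is a stand-in under stated assumptions, not the thesis method: each cluster is modelled by a single full-covariance Gaussian and merges are scored with a ΔBIC-style criterion; `penalty` is a hypothetical tuning parameter, and the realignment/retraining steps are omitted. The repeated all-pairs search is also what gives the approach its O(n^3)-like cost.

```python
import numpy as np

def bic_merge_gain(a, b, penalty=2.0):
    """Delta-BIC-style merge score for two frame sets, each modelled as a
    single full-covariance Gaussian. Positive gain suggests one speaker."""
    def ll(x):
        # Log-likelihood of x under a Gaussian fit to x itself; constant
        # terms cancel in the difference below, so only the logdet matters.
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(x.shape[1])
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * len(x) * logdet
    d = a.shape[1]
    n_params = d + d * (d + 1) / 2          # mean + covariance parameters
    merged = np.vstack([a, b])
    return (ll(merged) - ll(a) - ll(b)
            + 0.5 * penalty * n_params * np.log(len(merged)))

def diarize(chunks):
    """Greedy agglomerative clustering: repeatedly merge the pair of
    clusters with the highest positive merge gain, then stop."""
    clusters = [np.asarray(c) for c in chunks]
    while len(clusters) > 1:
        best, pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                g = bic_merge_gain(clusters[i], clusters[j])
                if g > best:
                    best, pair = g, (i, j)
        if pair is None:                    # no merge improves the model
            break
        i, j = pair
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        del clusters[j]
    return clusters
```

The number of clusters left when no merge is beneficial is the estimated number of speakers, which is how agglomerative diarization avoids fixing that number in advance.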
    The diarization subsystem has been evaluated at the NIST RT06s benchmark and the speech activity detection subsystem has been tested at RT07s. The entire system was evaluated at N-Best, the first automatic speech recognition benchmark for Dutch.
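The language model look-ahead idea mentioned above can be illustrated generically. This sketch is not the novel method from the thesis: it decorates a pronunciation prefix tree with unigram look-ahead scores (the best LM log-probability of any word below each node), whereas full language model look-ahead also conditions on the decoding history; the lexicon and phone names are invented for the example.

```python
import math

def build_prefix_tree(lexicon, lm_logprob):
    """Build a pronunciation prefix tree; every node stores a look-ahead
    score: the best LM log-probability of any word in its subtree."""
    root = {"children": {}, "lookahead": -math.inf, "word": None}
    for word, phones in lexicon.items():
        lp = lm_logprob[word]
        node = root
        node["lookahead"] = max(node["lookahead"], lp)
        for ph in phones:
            node = node["children"].setdefault(
                ph, {"children": {}, "lookahead": -math.inf, "word": None})
            node["lookahead"] = max(node["lookahead"], lp)
        node["word"] = word  # leaf: the word identity is known here
    return root

def lookahead_increment(parent, child):
    """LM score added when a hypothesis crosses the arc parent -> child.
    Summed along a path, the increments spread the word's LM score over
    the tree instead of applying it only at the leaf, so weak hypotheses
    can be pruned early."""
    return child["lookahead"] - parent["lookahead"]
```

Because several words share tree prefixes, the word identity is only known at a leaf; applying the look-ahead increments along the shared arcs lets the decoder prune unlikely paths long before the leaf is reached.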

    Framing opposition to surveillance - Political communication strategies of privacy activists in the aftermath of the Snowden leaks

    When in the summer of 2013 whistleblower Edward Snowden revealed the scope of the mass surveillance programs conducted by the National Security Agency and its international partners, privacy activists launched several global online and offline campaigns to protect privacy and resist surveillance. Applying methods of social movement frame and discourse analysis, the dissertation seeks to analyze the various ways activists have tried to shape the privacy discourse in a post-9/11 ‘Surveillance Society.’ A close reading of activist materials and texts over the course of four campaigns – “Restore the Fourth,” “Stop Watching Us,” “The Day We Fight Back,” and “Reset the Net” – reveals a set of frame packages, which are juxtaposed with the media coverage the campaigns have generated. In subsequent semi-structured interviews with 21 activists from 14 countries, participants involved in the protest events were asked to critically reflect on framing choices, media dynamics and the degree of transnational cooperation among various privacy advocacy groups. The dissertation contributes to the field of grassroots political communication research by discussing the potentials and limits of anti-surveillance frames as well as providing a cultural and oral history of organized resistance against surveillance in the post-Snowden world.

    Mediacampaign — A multimodal semantic analysis system for advertisement campaign detection

    MediaCampaign's scope is the discovery and inter-relation of advertisements and campaigns, i.e. relating advertisements that semantically belong together, across different countries and different media. The project's main goal is to automate, to a large degree, the detection and tracking of advertisement campaigns on television, on the Internet and in the press. For this purpose we introduce a first prototype of a fully integrated, ontology-based semantic analysis system which automatically detects new creatives and campaigns by utilizing a multimodal analysis system and a framework for the resolution of semantic identity.