409 research outputs found

    Accessing spoken interaction through dialogue processing [online]

    Get PDF
    Zusammenfassung Unser Leben, unsere Leistungen und unsere Umgebung, alles wird derzeit durch Schriftsprache dokumentiert. Die rasante Fortentwicklung der technischen Möglichkeiten Audio, Bilder und Video aufzunehmen, abzuspeichern und wiederzugeben kann genutzt werden um die schriftliche Dokumentation von menschlicher Kommunikation, zum Beispiel Meetings, zu unterstützen, zu ergänzen oder gar zu ersetzen. Diese neuen Technologien können uns in die Lage versetzen Information aufzunehmen, die anderweitig verloren gehen, die Kosten der Dokumentation zu senken und hochwertige Dokumente mit audiovisuellem Material anzureichern. Die Indizierung solcher Aufnahmen stellt die Kerntechnologie dar um dieses Potential auszuschöpfen. Diese Arbeit stellt effektive Alternativen zu schlüsselwortbasierten Indizes vor, die Suchraumeinschränkungen bewirken und teilweise mit einfachen Mitteln zu berechnen sind. Die Indizierung von Sprachdokumenten kann auf verschiedenen Ebenen erfolgen: Ein Dokument gehört stilistisch einer bestimmten Datenbasis an, welche durch sehr einfache Merkmale bei hoher Genauigkeit automatisch bestimmt werden kann. Durch diese Art von Klassifikation kann eine Reduktion des Suchraumes um einen Faktor der Größenordnung 4­10 erfolgen. Die Anwendung von thematischen Merkmalen zur Textklassifikation bei einer Nachrichtendatenbank resultiert in einer Reduktion um einen Faktor 18. Da Sprachdokumente sehr lang sein können müssen sie in thematische Segmente unterteilt werden. Ein neuer probabilistischer Ansatz sowie neue Merkmale (Sprecherinitia­ tive und Stil) liefern vergleichbare oder bessere Resultate als traditionelle schlüsselwortbasierte Ansätze. Diese thematische Segmente können durch die vorherrschende Aktivität charakterisiert werden (erzählen, diskutieren, planen, ...), die durch ein neuronales Netz detektiert werden kann. Die Detektionsraten sind allerdings begrenzt da auch Menschen diese Aktivitäten nur ungenau bestimmen. Eine maximale Reduktion des Suchraumes um den Faktor 6 ist bei den verwendeten Daten theoretisch möglich. Eine thematische Klassifikation dieser Segmente wurde ebenfalls auf einer Datenbasis durchgeführt, die Detektionsraten für diesen Index sind jedoch gering. Auf der Ebene der einzelnen Äußerungen können Dialogakte wie Aussagen, Fragen, Rückmeldungen (aha, ach ja, echt?, ...) usw. mit einem diskriminativ trainierten Hidden Markov Model erkannt werden. Dieses Verfahren kann um die Erkennung von kurzen Folgen wie Frage/Antwort­Spielen erweitert werden (Dialogspiele). Dialogakte und ­spiele können eingesetzt werden um Klassifikatoren für globale Sprechstile zu bauen. Ebenso könnte ein Benutzer sich an eine bestimmte Dialogaktsequenz erinnern und versuchen, diese in einer grafischen Repräsentation wiederzufinden. In einer Studie mit sehr pessimistischen Annahmen konnten Benutzer eines aus vier ähnlichen und gleichwahrscheinlichen Gesprächen mit einer Genauigkeit von ~ 43% durch eine graphische Repräsentation von Aktivität bestimmt. Dialogakte könnte in diesem Szenario ebenso nützlich sein, die Benutzerstudie konnte aufgrund der geringen Datenmenge darüber keinen endgültigen Aufschluß geben. Die Studie konnte allerdings für detailierte Basismerkmale wie Formalität und Sprecheridentität keinen Effekt zeigen. Abstract Written language is one of our primary means for documenting our lives, achievements, and environment. Our capabilities to record, store and retrieve audio, still pictures, and video are undergoing a revolution and may support, supplement or even replace written documentation. This technology enables us to record information that would otherwise be lost, lower the cost of documentation and enhance high­quality documents with original audiovisual material. The indexing of the audio material is the key technology to realize those benefits. This work presents effective alternatives to keyword based indices which restrict the search space and may in part be calculated with very limited resources. Indexing speech documents can be done at a various levels: Stylistically a document belongs to a certain database which can be determined automatically with high accuracy using very simple features. The resulting factor in search space reduction is in the order of 4­10 while topic classification yielded a factor of 18 in a news domain. Since documents can be very long they need to be segmented into topical regions. A new probabilistic segmentation framework as well as new features (speaker initiative and style) prove to be very effective compared to traditional keyword based methods. At the topical segment level activities (storytelling, discussing, planning, ...) can be detected using a machine learning approach with limited accuracy; however even human annotators do not annotate them very reliably. A maximum search space reduction factor of 6 is theoretically possible on the databases used. A topical classification of these regions has been attempted on one database, the detection accuracy for that index, however, was very low. At the utterance level dialogue acts such as statements, questions, backchannels (aha, yeah, ...), etc. are being recognized using a novel discriminatively trained HMM procedure. The procedure can be extended to recognize short sequences such as question/answer pairs, so called dialogue games. Dialog acts and games are useful for building classifiers for speaking style. Similarily a user may remember a certain dialog act sequence and may search for it in a graphical representation. In a study with very pessimistic assumptions users are able to pick one out of four similar and equiprobable meetings correctly with an accuracy ~ 43% using graphical activity information. Dialogue acts may be useful in this situation as well but the sample size did not allow to draw final conclusions. However the user study fails to show any effect for detailed basic features such as formality or speaker identity

    Analysis of Data Augmentation Methods for Low-Resource Maltese ASR

    Full text link
    Recent years have seen an increased interest in the computational speech processing of Maltese, but resources remain sparse. In this paper, we consider data augmentation techniques for improving speech recognition for low-resource languages, focusing on Maltese as a test case. We consider three different types of data augmentation: unsupervised training, multilingual training and the use of synthesized speech as training data. The goal is to determine which of these techniques, or combination of them, is the most effective to improve speech recognition for languages where the starting point is a small corpus of approximately 7 hours of transcribed speech. Our results show that combining the data augmentation techniques studied here lead us to an absolute WER improvement of 15% without the use of a language model.Comment: 12 page

    Visual Propaganda and Extremism in the Online Environment

    Get PDF
    Visual images have been a central component of propaganda for as long as propaganda has been produced. But recent developments in communication and information technologies have given terrorist and extremist groups options and abilities they never would have been able to come close to even 5 or 10 years ago. There are terrorist groups who, with very little initial investment, are making videos that are coming so close to the quality of BBC or CNN broadcasts that the difference is meaningless, and with access to the web they have instantaneous access to a global audience. Given the broad social science consensus on the power of visual images relative to that of words, the strategic implications of these groups’ sophistication in the use of images in the online environment is carefully considered in a variety of contexts by the authors in this collection.https://press.armywarcollege.edu/monographs/1942/thumbnail.jp

    Subword-based approaches for spoken document retrieval

    Get PDF
    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000.Includes bibliographical references (p. 181-187).This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.This thesis explores approaches to the problem of spoken document retrieval (SDR), which is the task of automatically indexing and then retrieving relevant items from a large collection of recorded speech messages in response to a user specified natural language text query. We investigate the use of subword unit representations for SDR as an alternative to words generated by either keyword spotting or continuous speech recognition. Our investigation is motivated by the observation that word-based retrieval approaches face the problem of either having to know the keywords to search for [\em a priori], or requiring a very large recognition vocabulary in order to cover the contents of growing and diverse message collections. The use of subword units in the recognizer constrains the size of the vocabulary needed to cover the language; and the use of subword units as indexing terms allows for the detection of new user-specified query terms during retrieval. Four research issues are addressed. First, what are suitable subword units and how well can they perform? Second, how can these units be reliably extracted from the speech signal? Third, what is the behavior of the subword units when there are speech recognition errors and how well do they perform? And fourth, how can the indexing and retrieval methods be modified to take into account the fact that the speech recognition output will be errorful?(cont.) We first explore a range of subword units ofvarying complexity derived from error-free phonetic transcriptions and measure their ability to effectively index and retrieve speech messages. We find that many subword units capture enough information to perform effective retrieval and that it is possible to achieve performance comparable to that of text-based word units. Next, we develop a phonetic speech recognizer and process the spoken document collection to generate phonetic transcriptions. We then measure the ability of subword units derived from these transcriptions to perform spoken document retrieval and examine the effects of recognition errors on retrieval performance. Retrieval performance degrades for all subword units (to 60% of the clean reference), but remains reasonable for some subword units even without the use of any error compensation techniques. We then investigate a number of robust methods that take into account the characteristics of the recognition errors and try to compensate for them in an effort to improve spoken document retrieval performance when there are speech recognition errors. We study the methods individually and explore the effects of combining them. Using these robust methods improves retrieval performance by 23%. We also propose a novel approach to SDR where the speech recognition and information retrieval components are more tightly integrated.(cont.) This is accomplished by developing new recognizer and retrieval models where the interface between the two components is better matched and the goals of the two components are consistent with each other and with the overall goal of the combined system. Using this new integrated approach improves retrieval performance by 28%. ...by Kenney Ng.Ph.D

    Recent Advances in Signal Processing

    Get PDF
    The signal processing task is a very critical issue in the majority of new technological inventions and challenges in a variety of applications in both science and engineering fields. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward both students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories are ordered to address image processing, speech processing, communication systems, time-series analysis, and educational packages respectively. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity

    A Proposed Strategy for a Seventh-day Adventist Ministry on Community Radio in Norway

    Get PDF
    Problem. In December of 1981 the Norwegian government suddenly opened the possibility of Community Radio in Norway. This was a rather drastic decision since the country had had a government-sponsored radio monopoly for almost fifty years. All at once, Christian denominations and other organizations in selected areas all over the country were able to present their message on radio broadcasts. The Seventh-day Adventist Church in Norway was taken by surprise when the government radio monopoly was suddenly dissolved. It had not applied for permission to start broadcasting and had done little preparation for such a media development. Several other Christian denominations had been preparing for this eventuality for a number of years and were able to start broadcasting shortly after government permissions were granted. Later, a number of Seventh-day Adventist congregations were able to buy air-time from organizations which had already received permission. In September of 1984, a number of Seventh-day Adventist congregations received official permissions to start broadcasting on Community Radio in Norway. The purpose of this study, therefore, was to develop a strategy for a local Seventh-day Adventist ministry on Community Radio in Norway, and to suggest formats of radio broadcasts that might be used in such a ministry. Furthermore, the writer investigated the enormous problem of secularism, which is challenging the Christian community in Western Europe. Guidelines were suggested which might help the Christian broadcaster reach the increasingly larger secular part of the European population. The study suggests the following conclusions: 1. It is important for the Christian broadcaster to start with his target audience, to do research and develop a strategy to reach his target audience. 2. The Christian broadcaster should be aware of the advantages and disadvantages of radio. The medium of radio is particularly useful in the early stages of evangelism. It is an effective medium for building awareness and influencing change. 3. The wide-spread secularism in Norway poses a real challenge to the Christian community, but study shows that secular people can be more easily reached on the basis of felt needs. 4. One cannot conclude that one format of religious radio broadcasting is better than another. It is all a matter of what target audience one is trying to reach. The format should be regarded as a vehicle that can help the Christian broadcaster reach his target audience

    Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme

    Get PDF
    Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologie
    corecore