15 research outputs found

    Do we need bigram alignment models? On the effect of alignment quality on transduction accuracy in G2P

    We investigate the need for bigram alignment models and the benefit of supervised alignment techniques in grapheme-to-phoneme (G2P) conversion. Moreover, we quantitatively estimate the relationship between alignment quality and overall G2P system performance. We find that, in English, bigram alignment models do perform better than unigram alignment models on the G2P task. Moreover, we find that supervised alignment techniques may perform considerably better than their unsupervised brethren and that few manually aligned training pairs suffice for them to do so. Finally, we estimate a highly significant impact of alignment quality on overall G2P transcription performance and that this relationship is linear in nature

    Keys to Play: Music as a Ludic Medium from Apollo to Nintendo

    How do keyboards make music playable? Drawing on theories of media, systems, and cultural techniques, Keys to Play spans Greek myth and contemporary Japanese digital games to chart a genealogy of musical play and its animation via improvisation, performance, and recreation. As a paradigmatic digital interface, the keyboard forms a field of play on which the book’s diverse objects of inquiry—from clavichords to PCs and eighteenth-century musical dice games to the latest rhythm-action titles—enter into analogical relations. Remapping the keyboard’s topography by way of Mozart and Super Mario, who head an expansive cast of historical and virtual actors, Keys to Play invites readers to unlock ludic dimensions of music that are at once old and new

    Out-of-vocabulary spoken term detection

    Spoken term detection (STD) is a fundamental task for multimedia information retrieval. A major challenge faced by an STD system is the serious performance reduction when detecting out-of-vocabulary (OOV) terms. The difficulties arise not only from the absence of pronunciations for such terms in the system dictionaries, but also from intrinsic uncertainty in pronunciations, significant diversity in term properties and a high degree of weakness in acoustic and language modelling. To tackle the OOV issue, we first applied the joint-multigram model to predict pronunciations for OOV terms in a stochastic way. Based on this, we propose a stochastic pronunciation model that considers all possible pronunciations for OOV terms so that the high pronunciation uncertainty is compensated for. Furthermore, to deal with the diversity in term properties, we propose a term-dependent discriminative decision strategy, which employs discriminative models to integrate multiple informative factors and confidence measures into a classification probability, which gives rise to minimum decision cost. In addition, to address the weakness in acoustic and language modelling, we propose a direct posterior confidence measure which replaces the generative models with a discriminative model, such as a multi-layer perceptron (MLP), to obtain a robust confidence for OOV term detection. With these novel techniques, the STD performance on OOV terms was improved substantially and significantly in our experiments set on meeting speech data

    THE ART OF NOISE: LITERATURE AND DISTURBANCE 1900-1940

    The Art of Noise: Literature and Disturbance 1900-1940 is a study of noise’s role in prose literature in the U.S., Britain, and Ireland in the first half of the twentieth century. The Art of Noise focuses on what I call modernist noise, a way of leveraging noise— understood both as an auditory phenomenon (unwanted sound) and cybernetic interference (additional or garbled information that distorts information transmission)—to draw attention to, and in some cases to patch, a communicative or epistemological gap. I examine how authors leverage noise’s ability to confuse, to dismay, to pull a reader out of the flow of a text, and even to alienate her in order to create sticking points in their work that demand attention. In tracing noise’s disruptive qualities through modernist and modernist-era novels, I am particularly interested in how the defamiliarizing action of modernist noise coalesces around limit cases of social and political belongingness— narratives of extremity ranging from total war to economic and racial otherness. Scholarship on literary sound has tended to focus on musicality, or on the impact of sound technology on modernist culture. This focus has led to a general neglect of noise in se. The authors I consider—chief among them Mary Borden, James Joyce, Upton Sinclair, and Richard Wright—suggest that writing noise carries with it the possibility of intercourse between otherwise unbridgeable domains of experience. Instead of resolving modernist noise into a symptom of the twentieth century’s mechanized war and industry or of modernism’s own inclination toward an aesthetic of difficulty, I read its irruption into the novel as a productive disturbance

    Beckett and media

    Featuring twelve original essays by leading Beckett scholars and media theorists, this book provides the first sustained examination of the relationship between Beckett and media technologies. The chapters analyse the rich variety of technical objects, semiotic arrangements, communication processes and forms of data processing that Beckett’s work so uniquely engages with, as well as those that – in historically changing configurations – determine the continuing performance, the audience reception, and the scholarly study of this work. Greatly enlarging the scope of earlier discussions, the book draws on a variety of innovative theoretical approaches, such as media archaeology, in order to discuss Beckett’s intermedial oeuvre. As such it engages with Beckett as a media artist and examines the way his engagement with media technologies continues to speak to our cultural situation

    Ecotonality, or Adapting Soundscape Ecology to Creative Practice: Ecological Sound Art Responses to Four South Australian Ecosystems

    Vol. 1 Exegesis -- Vol. 2 Creative Artefacts DVD. Ecotonality, or Adapting Soundscape Ecology to Creative Practice: Ecological Sound Art Responses to Four South Australian Ecosystems presents a practice-led research project, introducing Ecotonality, a creative framework which connects and adapts the principles, frameworks and methods of the ecological discipline ‘soundscape ecology’ to ecological sound art practice. It consists of a portfolio of creative works and a 30,000-word exegesis. Drawing on the growth of research in soundscape ecology (and by extension ecoacoustics, bioacoustics and acoustic ecology) in the past decade, the Ecotonal Creative Framework considers the adaptation of soundscape ecology research, fieldwork and analysis as it relates to creative concerns of project conception, data collation, creative material preparation, compositional assemblage, artistic realisation and post-project reflection. Additionally, the framework appraises roles of human and non-human agency (via Karen Barad and Timothy Morton), and the inherent role and implications of technological mediation, as related to soundscape ecology and creative practice. Ecotonality allows a reconsideration of the macro- and micromorphological relationships of ecosystems in creative works, which engages the ethical concerns of site-specific practice and the impact of creative work on ecosystems and soundscapes. Four creative site-specific responses are subsequently discussed, each in response to a different South Australian site - Mobilong Swamp (swamp ecosystem), Long Island (riparian ecosystem), Featherstone Place (urban ecosystem) and Farina (desert ecosystem) - and each employing multichannel surround sound setups and acoustic instrumentation. These creative projects act as case studies of the implementation of the Ecotonal Creative Framework, creatively expressing ideas related to place, ecosystem, soundscape and identity.
    Through the recording, manipulation and utilisation of extant material circumstances of particular places (i.e. their contemporary soundscape and ecosystem), the resultant creative responses provide commentary on ecological, sociocultural, political and spiritual circumstances, histories and identities. Thesis (Ph.D.) -- University of Adelaide, Elder Conservatorium of Music, 201

    The Message is Murder

    The Message is Murder analyses the violence bound up in the everyday functions of digital media. At its core is the concept of 'computational capital' - the idea that capitalism itself is a computer, turning qualities into quantities, and that the rise of digital culture and technologies under capitalism should be seen as an extension of capitalism's bloody logic. Engaging with Borges, Turing, Claude Shannon, Hitchcock and Marx, this book tracks computational capital to reveal the lineages of capitalised power as it has restructured representation, consciousness and survival in the twentieth and twenty-first centuries. Ultimately The Message is Murder makes the case for recognising media communications across all platforms - books, films, videos, photographs and even language itself - as technologies of political economy, entangled with the social contexts of a capitalism that is inherently racial, gendered and genocidal

    Structure out of sound

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1993. Vita. Includes bibliographical references (p. 155-170). Michael Jerome Hawley. Ph.D

    Typotecture: Histories, Theories and Digital Futures of Typographic Elements in Architectural Design

    Written language constitutes an integral part of every urban landscape. However, in many cases there is no logical design and semantic relationship between the typographic elements and the architectural structures to which they are applied, resulting in visual pollution, a cacophony of words within the built environment. Taking this fact into account we can propose the concept of typotecture, a new form of architecture that integrates the graphic with the architectural field, an architectural practice that, in its role as a medium of communication, incorporates typography into its substance and expression. The research initially focuses on a systematic, chronologically structured historical analysis of existing examples of typotecture, along with their underlying theory, ranging from primitive pre-modern achievements to more coherent contemporary manifestations. This process helps us to identify an existing yet ill-defined cross-disciplinary design practice. Subsequently it creates a backdrop for its further development through the proposal of new innovative typotectural examples by experimenting with current digital design tools. The research aims to demonstrate that several building typologies where communication processes are involved (commercial, educational, religious, among others) have the capacity to transmit the required typographic information inherently, either two-dimensionally or three-dimensionally. These can offer fixed and mutable messages either explicitly or implicitly, depending on the function typotecture intends to serve (identification, navigation, promotion, education, recreation or mystification). The overall goal of the study is to prove that typotecture is capable of enhancing the value of architecture as a medium of communication, and contribute to contemporary meaningful and effective urban environments

    Corpus-based unit selection for natural-sounding speech synthesis

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (p. 179-196). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Speech synthesis is an automatic encoding process carried out by machine through which symbols conveying linguistic information are converted into an acoustic waveform. In the past decade or so, a recent trend toward a non-parametric, corpus-based approach has focused on using real human speech as source material for producing novel natural-sounding speech. This work proposes a communication-theoretic formulation in which unit selection is a noisy channel through which an input sequence of symbols passes and an output sequence, possibly corrupted due to the coverage limits of the corpus, emerges. The penalty of approximation is quantified by substitution and concatenation costs which grade what unit contexts are interchangeable and where concatenations are not perceivable. These costs are semi-automatically derived from data and are found to agree with acoustic-phonetic knowledge. The implementation is based on a finite-state transducer (FST) representation that has been successfully used in speech and language processing applications including speech recognition. A proposed constraint kernel topology connects all units in the corpus with associated substitution and concatenation costs and enables an efficient Viterbi search that operates with low latency and scales to large corpora. An A* search can be applied in a second, rescoring pass to incorporate finer acoustic modelling. Extensions to this FST-based search include hierarchical and paralinguistic modelling. The search can also be used in an iterative feedback loop to record new utterances to enhance corpus coverage.
    This speech synthesis framework has been deployed across various domains and languages in many voices, a testament to its flexibility and rapid prototyping capability. Experimental subjects completing tasks in a given air travel planning scenario by interacting in real time with a spoken dialogue system over the telephone have found the system "easiest to understand" out of eight competing systems. In more detailed listening evaluations, subjective opinions garnered from human participants are found to be correlated with objective measures calculable by machine. by Jon Rong-Wei Yi. Ph.D
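The abstract above describes unit selection as a Viterbi search over substitution and concatenation costs. A minimal sketch of that idea follows; it is plain dynamic programming, not the thesis's FST-based constraint kernel. The cost functions and the names `substitution_cost`, `concatenation_cost` and `select_units` are hypothetical, chosen only for illustration: a mismatched unit costs 1.0, and joining two units that are not adjacent in the corpus costs 0.5.

```python
from typing import List, Tuple


def substitution_cost(target: str, unit: str) -> float:
    # Hypothetical cost: free if the corpus unit matches the target symbol.
    return 0.0 if target == unit else 1.0


def concatenation_cost(prev_idx: int, next_idx: int) -> float:
    # Hypothetical cost: free if the two units are contiguous in the corpus.
    return 0.0 if next_idx == prev_idx + 1 else 0.5


def select_units(targets: List[str], corpus: List[str]) -> Tuple[float, List[int]]:
    """Return (total cost, corpus indices) of the cheapest unit sequence."""
    if not targets or not corpus:
        raise ValueError("targets and corpus must be non-empty")
    n = len(corpus)
    # best[j] = cheapest cost of any path that ends at corpus unit j
    best = [substitution_cost(targets[0], corpus[j]) for j in range(n)]
    back: List[List[int]] = []  # back[t][j] = predecessor of unit j at step t + 1
    for target in targets[1:]:
        new_best, pointers = [], []
        for j in range(n):
            # Pick the predecessor that minimises path cost plus join cost.
            i = min(range(n), key=lambda i: best[i] + concatenation_cost(i, j))
            join = best[i] + concatenation_cost(i, j)
            new_best.append(join + substitution_cost(target, corpus[j]))
            pointers.append(i)
        best = new_best
        back.append(pointers)
    # Trace back from the cheapest final unit.
    j = min(range(n), key=lambda j: best[j])
    total, path = best[j], [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, path


if __name__ == "__main__":
    # The contiguous run c-a-b at corpus positions 2, 3, 4 needs no joins.
    print(select_units(list("cab"), list("abcab")))
```

This sketch runs in time quadratic in corpus size per target symbol. The abstract states that the FST topology enables a search that scales to large corpora, which this naive version does not attempt to reproduce.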