10 research outputs found

    The development of speech coding and the first standard coder for public mobile telephony

    This thesis describes in its core chapter (Chapter 4) the original algorithmic and design features of the first coder for public mobile telephony, the GSM full-rate speech coder, as standardized in 1988. It has never been described in so much detail as presented here. The coder is put in a historical perspective by two preceding chapters on the history of speech production models and the development of speech coding techniques until the mid 1980s, respectively. In the epilogue a brief review is given of later developments in speech coding. The introductory Chapter 1 starts with some preliminaries. It is defined what speech coding is and the reader is introduced to speech coding standards and the standardization institutes which set them. Then, the attributes of a speech coder playing a role in standardization are explained. Subsequently, several applications of speech coders - including mobile telephony - will be discussed and the state of the art in speech coding will be illustrated on the basis of some worldwide recognized standards. Chapter 2 starts with a summary of the features of speech signals and their source, the human speech organ. Then, historical models of speech production which form the basis of different kinds of modern speech coders are discussed. Starting with a review of ancient mechanical models, we will arrive at the electrical source-filter model of the 1930s. Subsequently, the acoustic-tube models as they arose in the 1950s and 1960s are discussed. Finally the 1970s are reviewed which brought the discrete-time filter model on the basis of linear prediction. In a unique way the logical sequencing of these models is exposed, and the links are discussed. Whereas the historical models are discussed in a narrative style, the acoustic tube models and the linear prediction technique as applied to speech, are subject to more mathematical analysis in order to create a sound basis for the treatise of Chapter 4.
This trend continues in Chapter 3, whenever instrumental in completing that basis. In Chapter 3 the reader is taken by the hand on a guided tour through time during which successive speech coding methods pass in review. In an original way special attention is paid to the evolutionary aspect. Specifically, for each newly proposed method it is discussed what it added to the known techniques of the time. After presenting the relevant predecessors starting with Pulse Code Modulation (PCM) and the early vocoders of the 1930s, we will arrive at Residual-Excited Linear Predictive (RELP) coders, Analysis-by-Synthesis systems and Regular-Pulse Excitation in 1984. The latter forms the basis of the GSM full-rate coder. In Chapter 4, which constitutes the core of this thesis, explicit forms of Multi-Pulse Excited (MPE) and Regular-Pulse Excited (RPE) analysis-by-synthesis coding systems are developed. Starting from current pulse-amplitude computation methods in 1984, which included solving sets of equations (typically of order 10-16) two hundred times a second, several explicit-form designs are considered by which solving sets of equations in real time is avoided. Then, the design of a specific explicit-form RPE coder and an associated efficient architecture are described. The explicit forms and the resulting architectural features have never been published in so much detail as presented here. Implementation of such a codec enabled real-time operation on a state-of-the-art single-chip digital signal processor of the time. This coder, at a bit rate of 13 kbit/s, has been selected as the Full-Rate GSM standard in 1988. Its performance is recapitulated. Chapter 5 is an epilogue briefly reviewing the major developments in speech coding technology after 1988. Many speech coding standards have been set, for mobile telephony as well as for other applications, since then. The chapter is concluded by an outlook

    A Full Frequency Masking Vocoder for Legal Eavesdropping Conversation Recording

    Abstract: This paper presents a new approach for a vocoder design based on full frequency masking by octaves in addition to a technique for spectral filling via beta probability distribution. Some psycho-acoustic characteristics of human hearing- inaudibility masking in frequency and phase- are used as a basis for the proposed vocoder. The results confirm that this vocoder may be useful to save bandwidth in applications requiring intelligibility. It is recommended for the legal eavesdropping of long voice conversations. Introduction. The purpose of the voice compression is to obtain a concise representation of the signal, which allows efficient storage and transmission of voice data [1]. With proper processing, a voice signal can be analyzed and encoded at low data rates and then resynthesized. In many applications, the digital coding of voice is needed to introduce encryption algorithms (for security) or error correction techniques (to mitigate the noise of th

    Change blindness: eradication of gestalt strategies

    Arrays of eight, texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference seen in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored and retrieved from a pre-attentional store during this task

    Virtual vocal ensembles and the mediation of performance on YouTube

    Musicians produce virtual performance videos of themselves and others on websites like YouTube. In a society with ubiquitous Internet and prominent social media interactions, music education can benefit by exploring the practices of musicians who produce music online, such as the creators of virtual vocal ensembles. A virtual vocal ensemble is a video containing multiple audio-visual tracks layered together through a technique called multitracking. In this performance practice, a virtual vocal ensemble creator records and combines multiple tracks to make a choir of clones or works with others in collaborative or collective ways. The purpose of this study was to explore the implications of virtual vocal ensembles and the medium that emerged from the development and distribution of those videos. This study situates the creators of virtual vocal ensembles within a sound recording medium, based on a theoretical framework developed by Sterne (2003) that defines a medium as a contingent network of relations made up of people, practices, institutions, and technologies. Guiding questions focus on the musical and social implications of creating virtual vocal ensembles, the entities listed above, and the relations between them. Traditional research methods and Internet inquiry were combined to create a multiple case study that examined three YouTube channels, each produced by a video creator. Data included the observation of the videos on the YouTube channels, text comments, and website analytics as well as interviews with video creators and others pertinent to the cases. A cross-case analysis was conducted to produce assertions that attended to the guiding questions. Creators of virtual vocal ensembles developed methods to construct and publish their videos, which were limited by their musical and technological abilities and the resources available.
As musicians produced virtual vocal ensembles, online communities containing elements of fandoms, learning communities of practice, and music making spaces developed. Implications of the performance practice have affected the way the medium is situated within society as well as the way creators perform choral music and sing. For example, when performers create virtual vocal ensembles, they develop identities as virtual performers and express themselves musically and theatrically. Musical arrangement, voice range expansion, and autonomous exploration of musical concepts were also results of creators' performance practices. Creating virtual vocal ensembles requires not only musical skills, but also technological and production abilities that can be applied to music education practices and expand conceptions of ensemble, performance, and medium. As producers of virtual vocal ensembles, video creators use social media to expand their reach and develop a community that has aspects of a fandom as well as learning and music making communities. Music educators can incorporate the practices of virtual vocal ensemble creators into their instruction and help students learn skills that may allow them to make music outside of the choral ensemble classroom in virtual contexts.

    The Race of Sound

    In The Race of Sound Nina Sun Eidsheim traces the ways in which sonic attributes that might seem natural, such as the voice and its qualities, are socially produced. Eidsheim illustrates how listeners measure race through sound and locate racial subjectivities in vocal timbre, the color or tone of a voice. Eidsheim examines singers Marian Anderson, Billie Holiday, and Jimmy Scott as well as the vocal synthesis technology Vocaloid to show how listeners carry a series of assumptions about the nature of the voice and to whom it belongs. Outlining how the voice is linked to ideas of racial essentialism and authenticity, Eidsheim untangles the relationship between race, gender, vocal technique, and timbre while addressing an undertheorized space of racial and ethnic performance. In so doing, she advances our knowledge of the cultural-historical formation of the timbral politics of difference and the ways that comprehending voice remains central to understanding human experience, all the while advocating for a form of listening that would allow us to hear singers in a self-reflexive, denaturalized way

    Voicing Kinship with Machines: Diffractive Empathetic Listening to Synthetic Voices in Performance.

    This thesis contributes to the field of voice studies by analyzing the design and production of synthetic voices in performance. The work explores six case studies, consisting of different performative experiences of the last decade (2010-2020) that featured synthetic voice design. It focusses on the political and social impact of synthetic voices, starting from yet challenging the concepts of voice in the machine and voice of the machine. The synthetic voices explored are often playing the role of simulated artificial intelligences, therefore this thesis expands its questions towards technology at large. The analysis of the case studies follows new materialist and posthumanist premises, yet it tries to confute the patriarchal and neoliberal approach towards technological development through feminist and de-colonial approaches, developing a taxonomy for synthetic voices in performance. Chapter 1 introduces terms and explains the taxonomy. Chapter 2 looks at familiar representations of fictional AI. Chapter 3 introduces headphone theatre exploring immersive practices. Chapters 4 and 5 engage with chatbots. Chapter 6 goes in depth exploring Human and Artificial Intelligence interaction, whereas chapter 7 moves slightly towards music production and live art. The body of the thesis includes the work of Pipeline Theatre, Rimini Protokoll, Annie Dorsen, Begüm Erciyas, and Holly Herndon. The analysis is informed by posthumanism, feminism, and performance studies, starting from my own practice as sound designer and singer, looking at aesthetics of reproduction, audience engagement, and voice composition. This thesis has been designed to inspire and provoke practitioners and scholars to explore synthetic voices further, question predominant biases of binarism and acknowledge their importance in redefining technology

    Abstracts on Radio Direction Finding (1899 - 1995)

    The files on this record represent the various databases that originally composed the CD-ROM issue of "Abstracts on Radio Direction Finding" database, which is now part of the Dudley Knox Library's Abstracts and Selected Full Text Documents on Radio Direction Finding (1899 - 1995) Collection. (See Calhoun record https://calhoun.nps.edu/handle/10945/57364 for further information on this collection and the bibliography). Due to issues of technological obsolescence preventing current and future audiences from accessing the bibliography, DKL exported and converted into the three files on this record the various databases contained in the CD-ROM. The contents of these files are: 1) RDFA_CompleteBibliography_xls.zip [RDFA_CompleteBibliography.xls: Metadata for the complete bibliography, in Excel 97-2003 Workbook format; RDFA_Glossary.xls: Glossary of terms, in Excel 97-2003 Workbook format; RDFA_Biographies.xls: Biographies of leading figures, in Excel 97-2003 Workbook format]; 2) RDFA_CompleteBibliography_csv.zip [RDFA_CompleteBibliography.TXT: Metadata for the complete bibliography, in CSV format; RDFA_Glossary.TXT: Glossary of terms, in CSV format; RDFA_Biographies.TXT: Biographies of leading figures, in CSV format]; 3) RDFA_CompleteBibliography.pdf: A human readable display of the bibliographic data, as a means of double-checking any possible deviations due to conversion