12 research outputs found

    Distant Speech Recognition of Natural Spontaneous Multi-party Conversations

    Distant speech recognition (DSR) has gained wide interest recently. While deep networks keep improving ASR overall, a performance gap remains between close-talking recordings and distant recordings. The work in this thesis therefore aims to provide insights for further improvement of DSR performance. The investigation starts with collecting the first multi-microphone and multi-media corpus of natural spontaneous multi-party conversations in native English with the speaker location tracked, i.e. the Sheffield Wargame Corpus (SWC). State-of-the-art recognition systems, with acoustic models either trained standalone or adapted, show word error rates (WERs) above 40% on headset recordings and above 70% on distant recordings. A comparison between the SWC and the AMI corpus suggests a few unique properties of real natural spontaneous conversations, e.g. very short utterances and emotional speech. Further experimental analysis based on simulated and real data quantifies the impact of such influence factors on DSR performance, and illustrates the complex interaction among multiple factors, which makes the treatment of each factor much more difficult. The reverberation factor is studied further. It is shown that the reverberation effect on speech features can be accurately modelled with a temporal convolution in the complex spectrogram domain. Based on that, a polynomial reverberation score is proposed to measure the distortion level of short utterances. Compared to existing reverberation metrics like C50, it avoids a rigid early-late-reverberation partition without compromising the performance on ranking the reverberation level of recording environments and channels. Furthermore, the existing reverberation measurement is signal-independent and thus unable to accurately estimate the reverberation distortion level in short recordings.
Inspired by the phonetic analysis on the reverberation distortion via self-masking and overlap-masking, a novel partition of reverberation distortion into the intra-phone smearing and the inter-phone smearing is proposed, so that the reverberation distortion level is first estimated on each part and then combined

    A Player’s Sense of Place: Computer Games as Anatopistic Medium

    This project works to understand how open-world computer games help generate a sense of place in the player. Since their development over half a century ago, computer games have primarily been discussed in terms of space. Yet the way we think about space today is much different from how those scientists calculated space as a construction of time, mass, and location. As computer games have evolved, the language has failed to accommodate the more nuanced qualities of game spaces. This project aims to articulate the nuances of place through phenomenological methods to objectively analyze the player experience as performed through various behaviors. Using a conceptual model that partially illustrates sense of place, I demonstrate how players create out-of-place, or anatopistic, places through play. After a historical survey of play as it is manifested through interaction with miniaturized environments, I turn to computer games as they have helped embody their creators’ sense of place. The third and fourth chapters offer a pair of case studies that reflect upon the experiences of the individual player and player groups. First, I compare virtual photography with tourism to reveal an array of sensibilities suggestive of the pursuit of place. This is followed by a look at Niantic’s Pokémon Go and how player groups use the game to act out ritualistic forms of play. Positioning the player as a “ludopilgrim,” I demonstrate how players perform individual or intersubjectively meaningful places as a form of transgressive placemaking

    Simulating realistic multiparty speech data: for the development of distant microphone ASR systems

    Automatic speech recognition has become a ubiquitous technology integrated into our daily lives. However, the problem remains challenging when the speaker is far away from the microphone. In such scenarios, the speech is degraded both by reverberation and by the presence of additive noise. This situation is particularly challenging when there are competing speakers present (i.e. multi-party scenarios). Acoustic scene simulation has been a major tool for training and developing distant microphone speech recognition systems, and is now being used to develop solutions for multi-party scenarios. It has been used both in training, as it allows cheap generation of limitless amounts of data, and for evaluation, because it can provide easy access to a ground truth (i.e. a noise-free target signal). However, whilst much work has been conducted to produce realistic artificial scene simulators, the signals produced from such simulators are only as good as the 'metadata' being used to define the setups, i.e., the data describing, for example, the number of speakers and their distribution relative to the microphones. This thesis looks at how realistic metadata can be derived by analysing how speakers behave in real domestic environments. In particular, it examines how to produce scenes that provide a realistic distribution for various factors that are known to influence the 'difficulty' of the scene, including the separation angle between speakers, the absolute and relative distances of speakers to microphones, and the pattern of temporal overlap of speech. Using an existing audio-visual multi-party conversational dataset, CHiME-5, each of these aspects has been studied in turn. First, producing a realistic angular separation between speakers allows algorithms which enhance signals based on the direction of arrival to be fairly evaluated, reducing the mismatch between real and simulated data.
This was estimated using automatic people detection techniques in video recordings from CHiME-5. Results show that commonly used datasets of simulated signals do not follow a realistic distribution, and when a realistic distribution is enforced, a significant drop in performance is observed. Second, by using multiple cameras it has been possible to estimate the 2-D positions of people inside each scene. This has allowed the estimation of realistic distributions for the absolute distance to the microphone and relative distance to the competing speaker. The results show grouping behaviour among participants when located in a room, and the impact this has on performance depends on the room size considered. Finally, the amount of overlap and the points in the mixture which contain overlap were explored using finite-state models. These models allowed mixtures to be generated that approached the overlap patterns observed in the real data. Features derived from these models were also shown to be a predictor of the difficulty of the mixture. At each stage of the project, simulated datasets derived using the realistic metadata distributions have been compared to existing standard datasets that use naive or uninformed metadata distributions, and implications for speech recognition performance are observed and discussed. This work has demonstrated how unrealistic approaches can produce over-promising results, and can bias research towards techniques that might not work well in practice. Results will also be valuable in informing the design of future simulated datasets

    Using Deep Neural Networks for Speaker Diarisation

    Speaker diarisation answers the question “who spoke when?” in an audio recording. The input may vary, but a system is required to output speaker labelled segments in time. Typical stages are Speech Activity Detection (SAD), speaker segmentation and speaker clustering. Early research focussed on Conversational Telephone Speech (CTS) and Broadcast News (BN) domains before the direction shifted to meetings and, more recently, broadcast media. The British Broadcasting Corporation (BBC) supplied data through the Multi-Genre Broadcast (MGB) Challenge in 2015 which showed the difficulties speaker diarisation systems have on broadcast media data. Diarisation is typically an unsupervised task which does not use auxiliary data or information to enhance a system. However, methods which do involve supplementary data have shown promise. Five semi-supervised methods are investigated which use a combination of inputs: different channel types and transcripts. The methods involve Deep Neural Networks (DNNs) for SAD, DNNs trained for channel detection, transcript alignment, and combinations of these approaches. However, the methods are only applicable when datasets contain the required inputs. Therefore, a method involving a pretrained Speaker Separation Deep Neural Network (ssDNN) is investigated which is applicable to every dataset. This technique performs speaker clustering and speaker segmentation using DNNs successfully for meeting data and with mixed results for broadcast media. The task of diarisation focuses on two aspects: accurate segments and speaker labels. The Diarisation Error Rate (DER) does not evaluate the segmentation quality as it does not measure the number of correctly detected segments. Other metrics exist, such as boundary and purity measures, but these also mask the segmentation quality. An alternative metric is presented based on the F-measure which considers the number of hypothesis segments correctly matched to reference segments. 
A deeper insight into the segment quality is shown through this metric

    The discipline and morale of the British Expeditionary Force in France and Flanders 1914-18, with particular reference to Irish units

    A thesis submitted for the degree of Doctor of Philosophy of the University of Luton. During the Great War many European armies (most notably the Russian) collapsed due to major disciplinary problems. However, the British Expeditionary Force avoided these problems up until the Armistice of November 1918. This thesis examines how the discipline and morale of the B.E.F. survived the war, by using a case-study of the Irish regiments. In 1914, with Ireland on the brink of a civil war, serious questions had been raised relating to the loyalty of the Irish regiments, particularly in the aftermath of the Curragh Incident. Indeed, intelligence reports prepared for Irish Command suggested that some reserve units would defect en masse to the U.V.F. if hostilities broke out in Ireland. As the Great War progressed, the rise of Sinn Fein produced further concern about the loyalty of Irish troops, seen most vividly in the decisions not to reform the 16th. (Irish) Division following the German Spring Offensive of 1918 and to remove Irish reserve units from Ireland in 1917-18. Nevertheless, a detailed study of courts martial (studied comprehensively in a database project) recently released by the P.R.O. demonstrates that many of the fears relating to Irish troops were groundless. Certainly Irish courts martial rates tended to be high; however, these figures were inflated by cases of drunkenness and absence, not disobedience. Likewise, while a number of mutinies did occur in Irish regiments during the war, this study has revealed that mutinies were much more common in the B.E.F. as a whole than has been previously believed. This study has also considered the discipline and morale problems caused by the rapid expansion of the British army in 1914 and the appointment of many officers, especially in the 36th. (Ulster) Division, on the basis of their political allegiances rather than professional knowledge.
Nevertheless, in general it appears that the discipline and morale of the Irish units in the B.E.F. were very good. Incidents of indiscipline appear to have been caused by the practical problems facing units during training and on active service rather than by the growth of the Sinn Fein movement in Ireland

    A critical geopolitics of RAF recruitment

    PhD Thesis. This PhD thesis investigates the geopolitics of Royal Air Force (RAF) recruitment practices. Set at the interface between military and civilian life, RAF recruitment represents an important site from which particular imaginations of the military are consumed, enacted and performed. Drawing primarily on critical geopolitical theory and military geography, along with more-than-representational approaches to popular culture, the thesis uncovers how RAF recruitment necessitates an understanding of, and participation within, certain military-political narratives and imaginaries. It shows that these imaginaries – variously associated with the role, utility and legitimacy of state-sanctioned military violence – are powerful in their ability to affect popular understandings of the military, and to affect certain bodily and material engagements within the immediate spaces of recruitment. Furthermore, with a specific focus on the RAF, it demonstrates how certain ideas around the role and utility of military airpower are represented, enacted and performed. The thesis approaches the geopolitics of RAF recruitment in three ways. Firstly, focussing on the representative tenets of recruitment, the thesis examines both the historical and contemporary design of recruiting texts, images and documents. Using a socio-historical analysis of recruiting images, and drawing upon interviews with the military and corporate producers of recruitment, it demonstrates how recruitment emerges from particular structures, knowledges and experiences. Secondly, focussing on the visualities of military public-relations, the thesis demonstrates how large-scale public and private events, such as military airshows, provide spaces in which military-political narratives and imaginaries are enacted in and through regimes of seeing and sighting.
Based on ethnographic research at military airshows, the thesis works to uncover the ways in which techniques of vision at spectacular events tie the potential recruit into particular imaginations of military legitimacy, efficacy, heritage and power. Thirdly, the thesis examines how the more mundane, quotidian sites of RAF recruitment are powerful in their ability to affect bodily predispositions and material engagements. Focussing on RAF recruiting games, military fitness regimes and the material, ephemeral nature of the airshow in particular, the thesis provides an insight into why the material and bodily cultures of militarism matter, and how they work persuasively to entrain particular imaginations of military life and culture. The thesis raises important questions about the presence of military narratives and imaginaries in the public, civilian sphere, and in popular culture in particular. Set at the interface between military and civilian life, RAF recruitment demonstrates how popular geopolitical discourses of the military sometimes work not only to script imaginations of military violence, but to affect, mark and alter civilian lives and futures. ESR

    Social structures in the regular combat arms units of the British Army : a model

    An original model is presented for describing, analysing, and predicting soldiers’ behaviour in current regular combat arms units in the British Army. It was derived, using social anthropological techniques, during participant observation by a serving British Army officer, and provides more coherent insights than other models of unit life. Its central principle, created for this study, is a plurality of ‘social structures’. These ‘social structures’ are separate bodies of ideas, rules and conventions of behaviour which inform groups of people or individuals how to organise and conduct themselves vis-à-vis each other. One ‘social structure’ operates at any single moment, according to context. Such an approach has not previously been applied to British Soldiers. The model’s top level (low resolution), comprises: the formal command structure, consisting in the unit organisation, the apparatus of rank and discipline, and the framework of official accountability; the informal structure, comprising the conventions of behaviour in the absence of formal constraints; the functional structure, concerning ‘soldierly’ activity, attitudes, and expectations; and the loyalty/identity structure, encompassing the conventions involved in embracing and expressing membership of the formal hierarchy of groups within and above the unit. Lower levels provide higher resolution, including a typology of informal relationships which encompasses different degrees of closeness and differences or equality in rank. The model’s rigour is established by testing its sensitivity at high resolution to the different conditions of life in historical British armies. The top level, however, and the typology of informal relationships, are found potentially to provide a unifying framework for historical analysis of unit life in the British Army throughout its history.
The model’s ability to illuminate current issues in the Army is demonstrated by its application to leadership training for officer cadets and the integration of women into regular combat arms units. EThOS - Electronic Theses Online Service

    CREATING A COHERENT SCORE: THE MUSIC OF SINGLE-PLAYER FANTASY COMPUTER ROLE-PLAYING GAMES

    This thesis provides a comprehensive exploration into the music of the ludic genre (Hourigan, 2005) known as a Computer Role-Playing Game (CRPG) and its two main sub-divisions: Japanese and Western Role-Playing Games (JRPGs & WRPGs). It focuses on the narrative category known as genre fiction, concentrating on fantasy fiction (Turco, 1999) and seeks to address one overall question: How do fantasy CRPG composers incorporate the variety of musical material needed to create a coherent score across the JRPG and WRPG divide? Seven main chapters form the thesis text. Chapter One provides an introduction to the thesis, detailing the research contributions in addition to outlining a variety of key terms that must be understood to continue with the rest of the text. A database accompanying this thesis showcases the vast range of CRPGs available; a literature review tackles relevant existing materials. Chapters Two and Three seek to provide the first canonical history of soundtracks used in CRPGs by dissecting typical narrative structures for games so as to provide context to their musical scores. Through analysis of existing game composer interviews, cultural influences are revealed. Chapters Four and Five mirror one another with detailed discussion respectively regarding JRPG and WRPG music including the influence that anime and Hollywood cinema have had upon them. In Chapter Six, the use of CRPG music outside of video games is explored, particularly the popularity of JRPG soundtracks in the concert hall. Chapter Seven concludes the thesis, summarising research contributions achieved and areas for future work. Throughout these chapters, the core task is to explain how the two primary sub-genres of CRPGs parted ways and why the music used to accompany these games differs so drastically

    Bowdoin Orient v.110, no.1-25 (1980-1981)

    https://digitalcommons.bowdoin.edu/bowdoinorient-1980s/1001/thumbnail.jp

    The Knowing: A Fantasy an epistemological enquiry into creative process, form, and genre

    This creative writing PhD thesis consists of a novel and a critical reflective essay. Both articulate a distinctive approach to the challenges of writing genre fiction in the 21st Century that I define as ‘Goldendark’ – one that actively engages with the ethical and political implications of the field via the specific aesthetic choices made about methodology, content, and form. The Knowing: A Fantasy is a novel written in the High Mimetic style that, through the story of Janey McEttrick, a Scottish-Cherokee musician descended from the Reverend Robert Kirk, a 17th Century Episcopalian minister from Aberfoyle (author of the 1691 monograph, The Secret Commonwealth of Elves, Fauns and Fairies), fictionalises the diasporic translocation of song- and tale-cultures between the Scottish Lowlands and the Southern Appalachians, and is a dramatisation of the creative process. In the accompanying critical reflective essay, ‘An Epistemological Enquiry into Creative Process, Form and Genre’, I chart the development of my novel: its initial inspiration, my practice-based research, its composition and completion, all informed both by my practice as a storyteller/poet and by my archival discoveries. In the section ‘Walking Between Worlds’ I articulate my methodology and seek to defend experiential research as a multi-modal approach – one that included long-distance walking, illustration, spoken word performance, ballad-singing and learning an instrument. In ‘Framing the Narrative’ I discuss matters of form – how I engaged with hyperfictionality and digital technology in destabilising traditional conventions of linear narrative and generic expectation. Finally, in ‘Defining Goldendark’ I articulate in detail my approach to a new ethical aesthetics of the fantasy genre