
    Learning disentangled speech representations

    A variety of informational factors are contained within the speech signal and a single short recording of speech reveals much more than the spoken words. The best method to extract and represent informational factors from the speech signal ultimately depends on which informational factors are desired and how they will be used. In addition, sometimes methods will capture more than one informational factor at the same time such as speaker identity, spoken content, and speaker prosody. The goal of this dissertation is to explore different ways to deconstruct the speech signal into abstract representations that can be learned and later reused in various speech technology tasks. This task of deconstructing, also known as disentanglement, is a form of distributed representation learning. As a general approach to disentanglement, there are some guiding principles that elaborate what a learned representation should contain as well as how it should function. In particular, learned representations should contain all of the requisite information in a more compact manner, be interpretable, remove nuisance factors of irrelevant information, be useful in downstream tasks, and be independent of the task at hand. The learned representations should also be able to answer counter-factual questions. In some cases, learned speech representations can be re-assembled in different ways according to the requirements of downstream applications. For example, in a voice conversion task, the speech content is retained while the speaker identity is changed. And in a content-privacy task, some targeted content may be concealed without affecting how surrounding words sound. While there is no single best method to disentangle all types of factors, some end-to-end approaches demonstrate a promising degree of generalization to diverse speech tasks. 
This thesis explores a variety of use-cases for disentangled representations including phone recognition, speaker diarization, linguistic code-switching, voice conversion, and content-based privacy masking. Speech representations can also be utilised for automatically assessing the quality and authenticity of speech, such as automatic MOS ratings or detecting deep fakes. The meaning of the term "disentanglement" is not well defined in previous work, and it has acquired several meanings depending on the domain (e.g. image vs. speech). Sometimes the term "disentanglement" is used interchangeably with the term "factorization". This thesis proposes that disentanglement of speech is distinct, and offers a viewpoint of disentanglement that can be considered both theoretically and practically.

    Non-Thermal Optical Engineering of Strongly-Correlated Quantum Materials

    This thesis develops multiple optical engineering mechanisms to modulate the electronic, magnetic, and optical properties of strongly-correlated quantum materials, including polar metals, transition metal trichalcogenides, and copper oxides. We established the mechanisms of Floquet engineering and magnon bath engineering, and used optical probes, especially optical nonlinearity, to study the dynamics of these quantum systems. Strongly-correlated quantum materials host complex interactions between different degrees of freedom, offering a rich phase diagram to explore both in and out of equilibrium. While static tuning methods of the phases have witnessed great success, the emerging optical engineering methods have provided a more versatile platform. For optical engineering, the key to success lies in achieving the desired tuning while suppressing other unwanted effects, such as laser heating. We used sub-gap optical driving in order to avoid electronic excitation. Therefore, we managed to directly couple to low-energy excitation, or to induce coherent light-matter interactions. In order to elucidate the exact microscopic mechanisms of the optical engineering effects, we performed photon energy-dependent measurements and thorough theoretical analysis. To experimentally access the engineered quantum states, we leveraged various probe techniques, including the symmetry-sensitive optical second harmonic generation (SHG), and performed pump-probe type experiments to study the dynamics of quantum materials. I will first introduce the background and the motivation of this thesis, with an emphasis on the principles of optical engineering within the big picture of achieving quantum material properties on demand (Chapter I). I will then continue to introduce the main probe technique used in this thesis: SHG. I will also introduce the experimental setups which we developed and where we conducted the works contained in this thesis (Chapter II). 
In Chapter III, I will introduce an often overlooked aspect of SHG studies -- using SHG to study short-range structural correlations. Chapter IV will contain the theoretical analysis and experimental realizations of using sub-gap and resonant optical driving to tune electronic and optical properties of MnPS₃. The main tuning mechanism used in this chapter is Floquet engineering, where light modulates material properties without being absorbed. In Chapter V, I will turn to another useful material property: magnetism. First I will describe the extension of the Floquet mechanism to the renormalization of spin exchange interaction. Then I will switch gears and describe the demagnetization in Sr₂Cu₃O₄Cl₂ by resonant coupling between photons and magnons. I will end the thesis with a brief closing remark (Chapter VI).

    Supernatural crossing in Republican Chinese fiction, 1920s–1940s

    This dissertation studies supernatural narratives in Chinese fiction from the mid-1920s to the 1940s. The literary works present phenomena or elements that are or appear to be supernatural, many of which remain marginal or overlooked in Sinophone and Anglophone academia. These sources are situated in the May Fourth/New Culture ideological context, where supernatural narratives had to make way for the progressive intellectuals’ literary realism and their allegorical application of supernatural motifs. In the face of realism, supernatural narratives paled, dismissed as impractical fantasies that distract one from facing and tackling real life. Nevertheless, I argue that the supernatural narratives do not probe into another mystical dimension that might co-exist alongside the empirical world. Rather, they imagine various cases of the characters’ crossing to voice their discontent with contemporary society or to reflect on the notion of reality. “Crossing” relates to characters’ acts or processes of trespassing the boundary that separates the supernatural from the conventional natural world, thus entailing encounters and interaction between the natural and the supernatural. The dissertation examines how crossing, as a narrative device, disturbs accustomed and mundane situations, releases hidden tensions, and discloses repressed truths in Republican fiction. There are five types of crossing in the supernatural narratives. Type 1 is the crossing into “haunted” houses. This includes (intangible) human agency crossing into domestic spaces and revealing secrets and truths concealed by the scary, feigned ‘haunting’, thus exposing the hidden evil and the other house occupiers’ silenced, suffocated state. Type 2 is men crossing into female ghosts’ apparitional residences. The female ghosts allude to heart-breaking, traumatic experiences in socio-historical reality, evoking sympathetic concern for suffering individuals who are caught in social upheavals. 
Type 3 is the crossing from reality into the characters’ delusional/hallucinatory realities. While they physically remain in the empirical world, the characters’ abnormal perceptions lead them to exclusive, delirious, and quasi-supernatural experiences of reality. Their crossings blur the concrete boundaries between the real and the unreal on the mental level: their abnormal perceptions construct a significant, meaningful reality for them, which may be as real as the commonly regarded objective reality. Type 4 is the crossing into the netherworld, which is modelled on the real world as the authors observed it and bears a spectrum of satirised objects of Republican society. The last type is immortal visitors crossing into the human world. This type satirises humanity’s vices and destructive potential. The primary sources demonstrate their writers’ witty passion to play with supernatural notions and imagery (such as ghosts, demons, and immortals) and stitch them into vivid, engaging scenes using techniques such as the gothic, the grotesque, and the satirical, in order to evoke sentiments such as terror, horror, disgust, disorientation, or awe, all in service of their insights into realist issues. The works also creatively tailor traditional Chinese modes and motifs, which exemplifies the revival of Republican interest in traditional cultural heritage. The supernatural narratives may amaze or disturb the reader at first, but what is more shocking, unpleasantly nudging, or thought-provoking is the problematic society and people’s lives that the supernatural (misunderstandings) eventually reveals. They present a more comprehensive treatment of reality than Republican literature with its revolutionary consciousness surrounding class struggle. The critical perspectives of the supernatural narratives include domestic space, unacknowledged history and marginal individuals, abnormal mentality, and pervasive weaknesses in humanity. 
The crossing and supernatural narratives function as a means of better understanding the lived reality. This study gathers diverse primary sources written by Republican writers from various educational and political backgrounds and interprets them from a rare perspective, thus filling a research gap. It promotes a fuller view of supernatural narratives in twentieth-century Chinese literature. In terms of reflecting the social and personal reality of the Republican era, the supernatural narratives supplement the realist fiction of the time.

    'Exarcheia doesn't exist': Authenticity, Resistance and Archival Politics in Athens

    Get PDF
    My thesis investigates the ways people, materialities and urban spaces interact to form affective ecologies and produce historicity. It focuses on the neighbourhood of Exarcheia, Athens’ contested political topography par excellence, known for its production of radical politics of discontent and resistance to state oppression and neoliberal capitalism. Embracing Exarcheia’s controversial status within Greek vernacular, media and state discourses, this thesis aims to unpick the neighbourhood’s socio-spatial assemblage imbued with affect and formed through the numerous (mis)understandings and (mis)interpretations rooted in its turbulent political history. Drawing on theory on urban spaces, affect, hauntology and archival politics, I argue for Exarcheia as an unwavering archival space composed of affective chronotopes – (in)tangible loci that defy space and temporality. I posit that the interwoven narratives and materialities emerging in my fieldwork are persistently – and perhaps obsessively – reiterating themselves and remaining imprinted on the neighbourhood’s landscape as an incessant reminder of violent histories that the state often seeks to erase and forget. Through this analysis, I contribute to understandings of place as a primary ethnographic ‘object’ and the ways in which place forms complex interactions and relationships with social actors, shapes their subjectivities, retains and bestows their memories and senses of historicity.

    Growth trends and site productivity in boreal forests under management and environmental change: insights from long-term surveys and experiments in Sweden

    Under a changing climate, current tree and stand growth information is indispensable for assessing the carbon sink strength of boreal forests. Important questions regarding tree growth are to what extent management and environmental change have influenced it, and how it might respond in the future. In this thesis, results from five studies (Papers I-V) covering growth trends, site productivity, heterogeneity in managed forests and potentials for carbon storage in forests and harvested wood products via differing management strategies are presented. The studies were based on observations from national forest inventories and long-term experiments in Sweden. The annual height growth of Scots pine (Pinus sylvestris) and Norway spruce (Picea abies) has increased, especially after the millennium shift, while the basal area growth has remained stable over the last 40 years (Papers I-II). A positive response of height growth to increasing temperature was observed. The results generally imply changing growing conditions and stand composition. In Paper III, yield capacity of conifers was analysed and compared with existing functions. The results showed that there is a bias in site productivity estimates and that the new functions give better predictions of the yield capacity in Sweden. In Paper IV, the variability in stand composition was modelled as indices of heterogeneity to calibrate the relationship between basal area and leaf area index in managed stands of Norway spruce and Scots pine. The results obtained show that the stand structural heterogeneity effects here are of such a magnitude that they cannot be neglected in the implementation of hybrid growth models, especially those based on light interception and light-use efficiency. 
In the long term, the net climate benefits in Swedish forests may be maximized through active forest management with high harvest levels and efficient product utilization, compared to increasing carbon storage in standing forests through land set-asides for nature conservation (Paper V). In conclusion, this thesis offers support for the development of evidence-based policy recommendations for site-adapted and sustainable management of Swedish forests in a changing climate.

    Socio-endocrinology revisited: New tools to tackle old questions

    Animals’ social environments impact their health and survival, but the proximate links between sociality and fitness are still not fully understood. In this thesis, I develop and apply new approaches to address an outstanding question within this sociality-fitness link: does grooming (a widely studied, positive social interaction) directly affect glucocorticoid concentrations (GCs; a group of steroid hormones indicating physiological stress) in a wild primate? To date, negative, long-term correlations between grooming and GCs have been found, but the logistical difficulties of studying proximate mechanisms in the wild leave knowledge gaps regarding the short-term, causal mechanisms that underpin this relationship. New technologies, such as collar-mounted tri-axial accelerometers, can provide the continuous behavioural data required to match grooming to non-invasive GC measures (Chapter 1). Using Chacma baboons (Papio ursinus) living on the Cape Peninsula, South Africa as a model system, I identify giving and receiving grooming using tri-axial accelerometers and supervised machine learning methods, with high overall accuracy (~80%) (Chapter 2). I then test what socio-ecological variables predict variation in faecal and urinary GCs (fGCs and uGCs) (Chapter 3). Shorter and rainy days are associated with higher fGCs and uGCs, respectively, suggesting that environmental conditions may impose stressors in the form of temporal bottlenecks. Indeed, I find that short days and days with more rain-hours are associated with reduced giving grooming (Chapter 4), and that this reduction is characterised by fewer and shorter grooming bouts. Finally, I test whether grooming predicts GCs, and find that while there is a long-term negative correlation between grooming and GCs, grooming in the short-term, in particular giving grooming, is associated with higher fGCs and uGCs (Chapter 5). 
I end with a discussion on how the new tools I applied have enabled me to advance our understanding of sociality and stress in primate social systems (Chapter 6).

    Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning

    Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have witnessed impressive progress on both of these problems, giving rise to a new family of approaches. In particular, the advances in deep learning over the past couple of years have led to neural approaches to natural language generation (NLG). These methods combine generative language learning techniques with neural-network-based frameworks. With a wide range of applications in natural language processing, neural NLG (NNLG) is a new and fast-growing field of research. In this state-of-the-art report, we investigate the recent developments and applications of NNLG in its full extent from a multidimensional view, covering critical perspectives such as multimodality, multilinguality, controllability and learning strategies. We summarize the fundamental building blocks of NNLG approaches from these aspects and provide detailed reviews of commonly used preprocessing steps and basic neural architectures. This report also focuses on the seminal applications of these NNLG models such as machine translation, description generation, automatic speech recognition, abstractive summarization, text simplification, question answering and generation, and dialogue generation. Finally, we conclude with a thorough discussion of the described frameworks by pointing out some open research directions. This work has been partially supported by the European Commission ICT COST Action “Multi-task, Multilingual, Multi-modal Language Generation” (CA18231). AE was supported by BAGEP 2021 Award of the Science Academy. EE was supported in part by TUBA GEBIP 2018 Award. BP is in part funded by Independent Research Fund Denmark (DFF) grant 9063-00077B. IC has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 838188. 
EL is partly funded by Generalitat Valenciana and the Spanish Government through projects PROMETEU/2018/089 and RTI2018-094649-B-I00, respectively. SMI is partly funded by UNIRI project uniri-drustv-18-20. GB is partly supported by the Ministry of Innovation and the National Research, Development and Innovation Office within the framework of the Hungarian Artificial Intelligence National Laboratory Programme. COT is partially funded by the Romanian Ministry of European Investments and Projects through the Competitiveness Operational Program (POC) project “HOLOTRAIN” (grant no. 29/221 ap2/07.04.2020, SMIS code: 129077) and by the German Academic Exchange Service (DAAD) through the project “AWAKEN: content-Aware and netWork-Aware faKE News mitigation” (grant no. 91809005). ESA is partially funded by the German Academic Exchange Service (DAAD) through the project “Deep-Learning Anomaly Detection for Human and Automated Users Behavior” (grant no. 91809358).

    Linguistic- and Acoustic-based Automatic Dementia Detection using Deep Learning Methods

    Dementia can affect a person's speech and language abilities, even in the early stages. Dementia is incurable, but early detection can enable treatment that can slow its progression and help maintain mental function. Therefore, early diagnosis of dementia is of great importance. However, current dementia detection procedures in clinical practice are expensive, invasive, and sometimes inaccurate. In comparison, computational tools based on the automatic analysis of spoken language have the potential to be applied as a cheap, easy-to-use, and objective clinical assistance tool for dementia detection. In recent years, several studies have shown promise in this area. However, most studies focus heavily on the machine learning aspects and, as a consequence, often lack sufficient incorporation of clinical knowledge. Many studies also concentrate on clinically less relevant tasks, such as the distinction between HC and people with AD, which is relatively easy and therefore less interesting in terms of both the machine learning and the clinical application. The studies in this thesis concentrate on automatically identifying signs of neurodegenerative dementia in the early stages and distinguishing them from other clinical, diagnostic categories related to memory problems (FMD, MCI, and HC). A key focus when designing the proposed systems has been to better consider (and incorporate) currently used clinical knowledge and also to bear in mind how these machine-learning-based systems could be translated for use in real clinical settings. Firstly, a state-of-the-art end-to-end system is constructed for extracting linguistic information from automatically transcribed spontaneous speech. The system's architecture is based on hierarchical principles, thereby mimicking those used in clinical practice, where information at word, sentence and paragraph level is used when extracting information to be used for diagnosis. 
Secondly, hand-crafted features are designed that are based on clinical knowledge of the importance of pausing and rhythm. These are successfully joined with features extracted from the end-to-end system. Thirdly, different classification tasks are explored, each set up so as to represent the types of diagnostic decision-making that are relevant in clinical practice. Finally, experiments are conducted to explore how to better deal with the known problem of confounding and overlapping symptoms on speech and language from age and cognitive decline. A multi-task system is constructed that takes age into account while predicting cognitive decline. The studies use the publicly available DementiaBank dataset as well as the IVA dataset, which has been collected by our collaborators at the Royal Hallamshire Hospital, UK. In conclusion, this thesis proposes multiple methods of using speech and language information for dementia detection with state-of-the-art deep learning technologies, confirming the automatic system's potential for dementia detection.

    Computational analysis of single-cell dynamics: protein localisation, cell cycle, and metabolic adaptation

    Cells need to be able to adapt quickly to changes in nutrient availability in their environment in order to survive. Budding yeasts constitute a convenient model to study how eukaryotic cells respond to sudden environmental change because of their fast growth and relative simplicity. Many of the intracellular changes needed for adaptation are spatial and transient; they can be captured experimentally using fluorescence time-lapse microscopy. These data are limited when only used for observation, and become most powerful when they can be used to extract quantitative, dynamic, single-cell information. In this thesis we describe an analysis framework heavily based on deep learning methods that allows us to quantitatively describe different aspects of cells’ response to a new environment from microscopy data. Chapter 2 describes a start-to-finish pipeline for data access and preprocessing, cell segmentation, volume and growth rate estimation, and lineage extraction. We provide benchmarks of run time and describe how to speed up analysis using parallelisation. We then show how this pipeline can be extended with custom processing functions, and how it can be used for real-time analysis of microscopy experiments. In chapter 3 we develop a method for predicting the location of the vacuole and nucleus from bright field images. We combine this method with cell segmentation to quantify the timing of three aspects of the cells’ response to a sudden nutrient shift: a transient change in transcription factor nuclear localisation, a change in instantaneous growth rate, and the reorganisation of the plasma membrane through the endocytosis of certain membrane proteins. In particular, we quantify the relative timing of these processes and show that there is a consistent lag between the perception of the stress at the level of gene expression and the reorganisation of the cell membrane. 
In chapter 4 we evaluate several methods to obtain cell cycle phase information in a label-free manner. We begin by using the outputs of cell segmentation to predict cytokinesis with high accuracy. We then predict cell cycle phase at a higher granularity directly from bright field images. We show that bright field images contain information about the cell cycle which is not visible by eye. We use these methods to quantify the relationship between cell cycle phase length and growth rate. Finally, in chapter 5 we look beyond microscopy to the bigger picture. We sketch an abstract description of how, at a genome-scale, cells might choose a strategy for adapting to a nutrient shift based on limited, noisy, and local information. Starting from a constraint-based model of metabolism, we propose an algorithm to navigate through metabolic space using only a lossy encoding of the full metabolic network. We show how this navigation can be used to adapt to a changing environment, and how its results differ from the global optimisation usually applied to metabolic models.