Learning disentangled speech representations
A variety of informational factors are contained within the speech signal, and a single short recording of speech reveals much more than the spoken words. The best method to extract and represent informational factors from the speech signal ultimately depends on which factors are desired and how they will be used. In addition, some methods capture more than one informational factor at once, such as speaker identity, spoken content, and speaker prosody.
The goal of this dissertation is to explore different ways to deconstruct the speech signal into abstract representations that can be learned and later reused in various speech technology tasks. This deconstruction, also known as disentanglement, is a form of distributed representation learning. As a general approach to disentanglement, some guiding principles describe what a learned representation should contain and how it should function. In particular, learned representations should contain all of the requisite information in a more compact form, be interpretable, remove nuisance factors carrying irrelevant information, be useful in downstream tasks, and be independent of the task at hand. The learned representations should also be able to answer counterfactual questions.
In some cases, learned speech representations can be reassembled in different ways according to the requirements of downstream applications. For example, in a voice conversion task, the speech content is retained while the speaker identity is changed. In a content-privacy task, targeted content may be concealed without affecting how the surrounding words sound. While there is no single best method to disentangle all types of factors, some end-to-end approaches demonstrate a promising degree of generalization to diverse speech tasks.
This thesis explores a variety of use cases for disentangled representations, including phone recognition, speaker diarization, linguistic code-switching, voice conversion, and content-based privacy masking. Speech representations can also be used to automatically assess the quality and authenticity of speech, for example through automatic MOS ratings or deepfake detection. The meaning of the term "disentanglement" is not well defined in previous work, and it has acquired several meanings depending on the domain (e.g. image vs. speech). Sometimes the term "disentanglement" is used interchangeably with the term "factorization". This thesis proposes that disentanglement of speech is distinct, and offers a viewpoint of disentanglement that can be considered both theoretically and practically.
Annals [...].
Pedometrics: innovation in tropics; Legacy data: how turn it useful?; Advances in soil sensing; Pedometric guidelines to systematic soil surveys. Online event. Coordinated by: Waldir de Carvalho Junior, Helena Saraiva Koenow Pinheiro, Ricardo Simão Diniz Dalmolin
Management controls, government regulations, customer involvement: Evidence from a Chinese family-owned business
This research reports on a case study of a family-owned elevator manufacturing company in China, where management control was sandwiched between state policies and global customers' production requirements. By analysing the roles of government and customers, this thesis aimed to illustrate how management control operated in a family-owned business and to explain how and why the company practises management control differently. In particular, it focused on how international production standards and existing Chinese industry policies translated into a set of management control practices through a local network within the family-owned business I studied.
Based on an ethnographic approach to research, I spent six months in the field, conducted over 30 interviews and several conversations, and reviewed relevant internal documents to understand how management control (MC) techniques and human actors interacted in the company. I also examined how two layers of pressure have shaped company behaviour, and how a company located in a developing country connects with global networks. I also found considerable tension among key actors and investigated how the company responded to and managed it.
Drawing on Actor Network Theory (ANT), I analysed the interviews with key actors and examined the roles of government regulations and customer requirements to see how management control is managed under two layers of pressure: government regulations (e.g., labour, tax, environmental control) and customer requirements (e.g., quality and production control). Management controls were an obligatory passage point (OPP): Western production requirements and government requirements were transformed as they arrived at the local Chinese factory, where they influenced management control and budgeting.
The findings suggest that management control systems are not only a set of technical procedures but also a means of managing tensions. Viewing MC only as technical procedures reflects a linear rather than a social perspective on MC practices. However, when we use ANT as a theoretical perspective, we see actors who are obliged, sandwiched, and controlled by external forces that they must follow. Consequently, human actors must work within an unavoidable OPP. This tension constructs the mundane practices of MC; hence, MCs are about managing such tensions. This study contributes to management control research by analysing management controls in terms of OPP, by illustrating the roles of government and customers, and by extending our understanding of family-owned businesses in a developing country from a management controls perspective.
The Role of Transient Vibration of the Skull on Concussion
Concussion is a traumatic brain injury, usually caused by a direct or indirect blow to the head, that affects brain function. The maximum mechanical impedance of brain tissue occurs at 450±50 Hz and may be affected by the skull's resonant frequencies. After an impact to the head, resonant vibration of the skull damages the underlying cortex: the skull deforms and vibrates like a bell for 3 to 5 milliseconds, bruising the cortex. Furthermore, deceleration forces the frontal and temporal cortex against the skull, eliminating the layer of cerebrospinal fluid between them. When the skull vibrates, the force therefore spreads directly to the cortex, with no layer of cerebrospinal fluid to reflect the wave or cushion its force. To date, few studies have investigated the effect of transient vibration of the skull. The overall goal of the proposed research is therefore to better understand the role of transient skull vibration in concussion. This goal will be achieved by addressing three research objectives. First, an automatic technique for skull and brain segmentation in MRI is developed. Because bone produces a weak magnetic resonance signal, MRI scans struggle to differentiate bone from other structures. One of the most important components of successful segmentation is high-quality ground truth labels. We therefore introduce a deep learning framework for skull segmentation in which the ground truth labels are created from CT imaging using the standard tessellation language (STL). Because the brain region will be important for future work, we also explore a new initialization concept for the convolutional neural network (CNN), based on orthogonal moments, to improve brain segmentation in MRI. Second, we introduce a novel 2D and 3D Automatic Method to Align the Facial Skeleton. An important requirement for further impact analysis is the ability to precisely simulate the same point of impact on multiple bone models.
To perform this task, the skull must be precisely aligned in all anatomical planes. We therefore introduce a 2D/3D technique to align the facial skeleton that was initially developed to automatically calculate the craniofacial symmetry midline. The 2D version introduced the concept of using cephalometric landmarks and manual image-grid alignment to construct the training dataset. This concept was then extended to a 3D version in which the coronal and transverse planes are aligned using a CNN approach. Because alignment in the sagittal plane is still undefined, a new alignment based on these techniques will be created to align the sagittal plane using the Frankfort plane as a reference. Third, the resonant frequencies of multiple skulls are assessed to determine how skull vibrations at resonant frequencies propagate into the brain tissue. After material properties and a mesh are applied to the skull, modal analysis is performed to assess the skull's natural frequencies. Finally, hypotheses will be proposed about the relationship between skull geometry (such as shape and thickness), skull vibration, and brain tissue injury that may result in concussion.
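Modal analysis amounts to solving the generalized eigenvalue problem K φ = ω² M φ for the assembled stiffness matrix K and mass matrix M; each eigenvalue ω² gives a natural frequency f = ω / 2π. The sketch below is a toy stand-in under stated assumptions: a hypothetical three-mass spring chain with made-up stiffness and mass values replaces a meshed skull, and `natural_frequencies` and `chain_stiffness` are illustrative names, not part of the thesis.

```python
import numpy as np


def natural_frequencies(K: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Undamped natural frequencies (Hz) from K @ phi = omega**2 * M @ phi.

    Assumes a diagonal (lumped) mass matrix M and a symmetric positive-definite K.
    """
    # Symmetric reduction: M^{-1/2} K M^{-1/2} has the same eigenvalues omega^2.
    m_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(M)))
    A = m_inv_sqrt @ K @ m_inv_sqrt
    omega_sq = np.linalg.eigvalsh(A)          # ascending, real for symmetric A
    return np.sqrt(omega_sq) / (2.0 * np.pi)  # rad/s -> Hz


def chain_stiffness(k: float, n: int) -> np.ndarray:
    """Stiffness matrix of n masses joined by equal springs, first mass tied to ground."""
    K = np.zeros((n, n))
    for i in range(n):
        K[i, i] += k              # spring linking mass i to mass i-1 (or to ground)
        if i + 1 < n:
            K[i, i] += k          # spring linking mass i to mass i+1
            K[i, i + 1] -= k
            K[i + 1, i] -= k
    return K


# Illustrative values only; a skull model would come from an FE mesh with tissue properties.
K = chain_stiffness(k=1.0e6, n=3)   # N/m
M = np.diag([0.5, 0.5, 0.5])        # kg
freqs = natural_frequencies(K, M)   # three modes, lowest first
print(freqs)
```

For a real finite-element model with a consistent (non-diagonal) mass matrix, one would instead pass K and M to a generalized eigensolver such as `scipy.linalg.eigh(K, M)`, or `scipy.sparse.linalg.eigsh` for large sparse meshes; the symmetric reduction above only works because M is diagonal.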
Mixture Models in Machine Learning
Modeling with mixtures is a powerful method in the statistical toolkit for representing the presence of sub-populations within an overall population. In many applications, ranging from financial models to genetics, a mixture model is used to fit the data. The primary difficulty in learning mixture models is that the observed data set does not identify the sub-population to which each individual observation belongs. Despite being studied for more than a century, theoretical guarantees for mixture models remain unknown in several important settings.
In this thesis, we look at three groups of problems. The first part is aimed at estimating the parameters of a mixture of simple distributions. We ask the following question: how many samples are necessary and sufficient to learn the latent parameters? We propose several approaches to this problem, including complex-analytic tools that connect statistical distances between pairs of mixtures to their characteristic functions. We establish sufficient sample complexity guarantees for mixtures of popular distributions (including Gaussian, Poisson and Geometric). For many distributions, our results provide the first sample complexity guarantees for parameter estimation in the corresponding mixture. Using these techniques, we also provide improved lower bounds on the Total Variation distance between two-component Gaussian mixtures and demonstrate new results in some sequence reconstruction problems.
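The characteristic function of a one-dimensional Gaussian mixture with weights w_k, means μ_k and standard deviations σ_k has the closed form φ(t) = Σ_k w_k exp(i μ_k t − σ_k² t² / 2), and it can be estimated from unlabeled samples as the sample mean of exp(i t x). A minimal sketch comparing the two, assuming an illustrative two-component mixture; the function names are hypothetical, and this only shows the quantity the thesis relates to statistical distances, not its estimation algorithm.

```python
import numpy as np


def mixture_cf(t, weights, means, stds):
    """Closed-form characteristic function of a 1-D Gaussian mixture at points t."""
    t = np.asarray(t, dtype=float)[:, None]          # (T, 1), broadcasts over components
    w, mu, s = (np.asarray(a, dtype=float) for a in (weights, means, stds))
    return np.sum(w * np.exp(1j * mu * t - 0.5 * (s * t) ** 2), axis=1)


def empirical_cf(t, samples):
    """Empirical characteristic function: the sample mean of exp(i t x)."""
    t = np.asarray(t, dtype=float)[:, None]
    x = np.asarray(samples, dtype=float)[None, :]
    return np.mean(np.exp(1j * t * x), axis=1)


def sample_mixture(n, weights, means, stds, rng):
    """Draw n samples; component labels are drawn internally and never returned."""
    labels = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(np.asarray(means)[labels], np.asarray(stds)[labels])


rng = np.random.default_rng(0)
w, mu, s = [0.3, 0.7], [-1.0, 2.0], [0.5, 1.0]   # illustrative parameters
x = sample_mixture(20_000, w, mu, s, rng)
t = np.linspace(-3.0, 3.0, 7)
gap = np.max(np.abs(mixture_cf(t, w, mu, s) - empirical_cf(t, x)))
print(f"max |phi(t) - phi_hat(t)| on the grid: {gap:.4f}")
```

The empirical estimate uses only the unlabeled samples, which is what makes characteristic functions a natural bridge between observed data and the latent parameters.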
In the second part, we study Mixtures of Sparse Linear Regressions, where the goal is to learn the best set of linear relationships between the scalar responses (i.e., labels) and the explanatory variables (i.e., features). We focus on a scenario where the learner can choose the features for which it obtains labels. To tackle the high dimensionality of the data, we further assume that the linear maps are sparse, i.e., have only a few prominent features among many. For this setting, we devise algorithms with sample complexity guarantees that are sub-linear in the dimension and robust to noise.
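A minimal sketch of the query setting described above, assuming a hypothetical oracle that, for each queried feature vector x, picks one of L hidden k-sparse parameter vectors β_j uniformly at random and returns ⟨β_j, x⟩ plus Gaussian noise. `MixtureRegressionOracle` and `make_sparse_betas` are illustrative names, not the thesis's code. With the noise switched off, repeating the same query exposes the set of values {⟨β_j, x⟩} without revealing which component produced each one.

```python
import numpy as np


class MixtureRegressionOracle:
    """Hypothetical label oracle for a mixture of sparse linear regressions."""

    def __init__(self, betas: np.ndarray, noise_std: float, rng: np.random.Generator):
        self.betas = betas            # shape (L, d), each row assumed k-sparse
        self.noise_std = noise_std
        self.rng = rng

    def query(self, x: np.ndarray) -> float:
        j = self.rng.integers(len(self.betas))   # hidden component, never revealed
        return float(self.betas[j] @ x + self.rng.normal(0.0, self.noise_std))


def make_sparse_betas(L: int, d: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """L random parameter vectors in R^d with exactly k nonzero entries each."""
    betas = np.zeros((L, d))
    for j in range(L):
        support = rng.choice(d, size=k, replace=False)
        betas[j, support] = rng.normal(size=k)
    return betas


rng = np.random.default_rng(0)
betas = make_sparse_betas(L=2, d=50, k=3, rng=rng)
oracle = MixtureRegressionOracle(betas, noise_std=0.0, rng=rng)
x = rng.normal(size=50)                          # the learner's chosen query
responses = {round(oracle.query(x), 9) for _ in range(200)}
print(sorted(responses))
```

With noise present, the same repeated-query idea still applies, but the learner must separate clusters of noisy responses rather than read off exact values.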
In the final part, we study Mixtures of Sparse Linear Classifiers in the same setting. Given a set of features and binary labels, the objective is to find a set of hyperplanes in the feature space such that, for any (feature, label) pair, some hyperplane in the set justifies the mapping. We devise efficient algorithms with sub-linear sample complexity guarantees for learning the unknown hyperplanes under sparsity assumptions similar to those above. To that end, we propose several novel techniques, including tensor decomposition methods and combinatorial designs.
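The notion of a hyperplane "justifying" a (feature, label) pair can be made concrete as follows, assuming hyperplanes through the origin, labels in {−1, +1}, and a data-generating process in which each label comes from one hidden hyperplane chosen uniformly at random. `justifies` and `sample_labels` are hypothetical helpers for illustration; they check the goal of the task, not how the thesis's algorithms find the hyperplanes.

```python
import numpy as np


def justifies(W: np.ndarray, X: np.ndarray, y: np.ndarray) -> bool:
    """True if every (x, y) pair agrees in sign with at least one hyperplane.

    W: (L, d) hyperplane normals (assumed to pass through the origin)
    X: (n, d) feature vectors; y: (n,) labels in {-1, +1}
    """
    signs = np.sign(X @ W.T)                       # (n, L) predicted label per hyperplane
    explained = (signs == y[:, None]).any(axis=1)  # does some hyperplane match this label?
    return bool(explained.all())


def sample_labels(W: np.ndarray, X: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Label each x by the sign of <w_j, x> for a uniformly chosen hidden hyperplane j."""
    j = rng.integers(len(W), size=len(X))          # hidden component per sample
    return np.sign(X @ W.T)[np.arange(len(X)), j]


rng = np.random.default_rng(0)
W = np.zeros((3, 20))                              # illustrative 2-sparse hyperplanes in R^20
W[0, [1, 4]] = [1.0, -2.0]
W[1, [0, 7]] = [0.5, 1.5]
W[2, [3, 9]] = [-1.0, 1.0]
X = rng.normal(size=(200, 20))
y = sample_labels(W, X, rng)
print(justifies(W, X, y))
```

A single hyperplane usually cannot justify such labels on its own; the learning problem is to recover a small, sparse set that does.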
Path integral based convolution and pooling for graph neural networks
Graph neural networks (GNNs) extend the functionality of traditional neural networks to graph-structured data. As with CNNs, a well-designed graph convolution and pooling is key to success. Borrowing ideas from physics, we propose a path integral based graph neural network (PAN) for classification and regression tasks on graphs. Specifically, we consider a convolution operation that involves every path linking the message sender and receiver, with learnable weights depending on the path length, which corresponds to the maximal entropy random walk. It generalizes the graph Laplacian to a new transition matrix, which we call the maximal entropy transition (MET) matrix, derived from a path integral formalism. Importantly, the diagonal entries of the MET matrix are directly related to subgraph centrality, thus leading to a natural and adaptive pooling mechanism. PAN provides a versatile framework that can be tailored to graph data of varying sizes and structures, and most existing GNN architectures can be viewed as special cases of PAN. Experimental results show that PAN achieves state-of-the-art performance on various graph classification and regression tasks, including a new benchmark dataset from statistical mechanics that we propose to boost applications of GNNs in the physical sciences.
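A minimal sketch of the two ideas in this paragraph, assuming the convolution matrix is a sum of adjacency-matrix powers A^n weighted by path length n (up to a cutoff L) and then symmetrically normalized by its row sums. The weights here are fixed numbers standing in for the learnable ones, and all function names are hypothetical. With weights 1/n!, the diagonal of the unnormalized sum is a truncated form of subgraph centrality, Σ_n (A^n)_ii / n!, which counts closed walks at node i.

```python
from math import factorial

import numpy as np


def path_weighted_sum(A: np.ndarray, weights) -> np.ndarray:
    """Sum over n of weights[n] * A^n; (A^n)_ij counts walks of length n from i to j."""
    out = np.zeros_like(A, dtype=float)
    power = np.eye(A.shape[0])
    for w in weights:
        out += w * power
        power = power @ A
    return out


def met_like_matrix(A: np.ndarray, weights) -> np.ndarray:
    """Symmetric row-sum normalization Z^{-1/2} S Z^{-1/2} of the weighted walk sum S."""
    S = path_weighted_sum(A, weights)
    z_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    return z_inv_sqrt[:, None] * S * z_inv_sqrt[None, :]


def pooling_scores(A: np.ndarray, L: int) -> np.ndarray:
    """Diagonal of the walk sum with weights 1/n!: truncated subgraph centrality."""
    S = path_weighted_sum(A, [1.0 / factorial(n) for n in range(L + 1)])
    return np.diag(S)


A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph 0-1-2
M = met_like_matrix(A, [1.0, 0.5, 0.25])   # stand-in weights for n = 0, 1, 2
print(M)
print(pooling_scores(A, L=2))              # the middle node scores highest
```

This sketch only shows the centrality connection; it does not reproduce PAN's full pooling layer or how its weights are trained.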
Brain signal recognition using deep learning
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
Brain Computer Interface (BCI) has the potential to offer a new generation of applications that are independent of muscular activity and controlled by the human brain. Brain imaging technologies are used to translate cognitive tasks into control commands for a BCI system. Electroencephalography (EEG) serves as the best available non-invasive solution for extracting signals from the brain. Speech is the primary means of communication, but patients suffering from locked-in syndrome have no easy way to communicate. An ideal communication system for locked-in patients is therefore a thought-to-speech BCI system.
This research aims to investigate methods for recognizing imagined speech from EEG signals using deep learning techniques. To design an optimal imagined speech recognition BCI, a variety of issues were addressed. These include 1) proposing a new feature extraction and classification framework for recognizing imagined speech from EEG signals, 2) recognizing the grammatical class of imagined words from EEG signals, and 3) discriminating between cognitive tasks associated with speech in the brain, such as overt speech, covert speech, and visual imagery. In this work, machine learning and deep learning methods were used to analyze EEG signals.
For recognition of imagined speech from EEG signals, a new EEG database was collected while participants mentally spoke the presented words (imagined speech). Along with imagined speech, EEG data was recorded for visual imagery (imagining a scene or an image) and overt speech (verbal speech). Spectro-temporal and spatio-temporal domain features were investigated for classifying imagined words from EEG signals. Further, a deep learning framework using a convolutional network and an attention mechanism was implemented to learn features in the spatial, temporal, and spectral domains. The method achieved a recognition rate of 76.6% for three binary word pairs. These experiments show that deep learning algorithms are ideal for imagined speech recognition from EEG signals due to their ability to extract features from non-linear and non-stationary signals. Grammatical classes of imagined words were also recognized from EEG signals using a multi-channel convolutional network framework. This method was extended to a multi-level recognition system for multi-class classification of imagined words, which achieved an accuracy of 52.9% for 10 words, a substantial improvement over previous work.
To investigate how imagined speech differs from overt speech and visual imagery in EEG signals, we used multivariate pattern analysis (MVPA). MVPA identified the time segments in which the neural oscillations for the different cognitive tasks were linearly separable. The frequencies that best discriminate between the cognitive tasks were also explored. A framework was proposed to discriminate two cognitive tasks based on the spatio-temporal patterns in EEG signals. The proposed method used the K-means clustering algorithm to find the best electrode combination and a convolutional-attention network for feature extraction and classification, achieving recognition rates of 82.9% and 77.7%.
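Time-resolved MVPA can be sketched as training and cross-validating a linear classifier separately at each time point; time points where accuracy rises well above chance mark segments in which the tasks are linearly separable. This sketch assumes epochs shaped (trials, channels, times), substitutes a simple nearest-class-mean classifier for whatever linear classifier the thesis used, and runs on synthetic data with an injected effect; all names are hypothetical.

```python
import numpy as np


def nearest_mean_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a nearest-class-mean classifier (linear decision boundaries)."""
    classes = np.unique(y_train)
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])  # (C, channels)
    dists = ((X_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)      # (n_test, C)
    pred = classes[np.argmin(dists, axis=1)]
    return float(np.mean(pred == y_test))


def time_resolved_decoding(X, y, n_folds=5, seed=0):
    """Cross-validated accuracy at each time point.

    X: (trials, channels, times) EEG epochs; y: (trials,) task labels.
    Returns one accuracy per time point.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    acc = np.zeros(X.shape[2])
    for t in range(X.shape[2]):
        scores = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[i] for i in range(n_folds) if i != k])
            scores.append(nearest_mean_accuracy(X[train, :, t], y[train],
                                                X[test, :, t], y[test]))
        acc[t] = np.mean(scores)
    return acc


# Synthetic data: 100 trials, 8 channels, 20 time points; the two "tasks"
# differ only from time point 10 onward (an illustrative injected effect).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8, 20))
y = np.repeat([0, 1], 50)
X[y == 1, :, 10:] += 1.5
acc = time_resolved_decoding(X, y)
print(np.round(acc, 2))
```

Frequency-resolved variants apply the same loop to band-filtered or spectral features instead of raw samples.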
The results of this research suggest that a communication-based BCI system can be designed using deep learning methods. Further, this work adds to the existing knowledge in the field of communication-based BCI systems.
Chinese Benteng Women’s Participation in Local Development Affairs in Indonesia: Appropriate means for struggle and a pathway to claim citizen’ right?
More than two decades have passed since the devastating Asian Financial Crisis of 1997, which was followed by Suharto’s resignation from the presidency he had held for more than three decades. The financial turmoil turned into a political disaster and led to massive looting that severely affected Indonesians of Chinese descent, as well as the still-unresolved atrocities of sexual violence against women and the covert killings of students and democracy activists in the country. Since then, and specifically since May 1998, a period publicly known as “Reformasi”1, Indonesia has undergone political reform that eventually corresponded positively with its macroeconomic growth. Twenty years later, in 2018, Indonesia captured worldwide attention by successfully hosting two internationally renowned events: the 2018 Asian Games, the most prestigious sporting event in Asia, held in Jakarta and Palembang; and the 2018 IMF/World Bank Annual Meeting in Bali. The IMF/World Bank Annual Meeting in particular significantly elevated Indonesia’s credibility and international prestige in the global economic power play as a nation with promising growth and openness. However, narratives about poverty and inequality, including increasing racial tension, religious conservatism, and sexual violence against women, have been superseded by a friendly climate for foreign investment and, eventually, by excessive glorification of the nation’s economic growth. By portraying itself as a promising new economic power, as President Joko Widodo rhetorically promised during his presidential terms, Indonesia has swept the growing inequality of this highly stratified society, historically compounded by religious and racial tension, under the carpet of the digital economy.