Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
Automatic speech recognition (ASR) has achieved remarkable success thanks to recent advances in deep learning, but it usually degrades significantly under real-world noisy conditions. Recent works introduce speech enhancement (SE) as a front-end to improve speech quality, which has proved effective but may not be optimal for downstream ASR due to the speech distortion problem. Building on this, the latest works combine SE with currently popular self-supervised learning (SSL) to alleviate distortion and improve noise robustness. Despite their effectiveness, the speech distortion caused by conventional SE still cannot be completely eliminated. In this paper, we propose a self-supervised framework named Wav2code to implement generalized SE without distortion for noise-robust ASR. First, in the pre-training stage, clean speech representations from an SSL model are used to look up a discrete codebook via nearest-neighbor feature matching; the resulting code sequence is then exploited to reconstruct the original clean representations, storing them in the codebook as a prior. Second, during fine-tuning we propose a Transformer-based code predictor to accurately predict clean codes by modeling the global dependency of input noisy representations, which enables the discovery and restoration of high-quality clean representations without distortion. Furthermore, we propose an interactive feature fusion network to combine the original noisy and the restored clean representations, accounting for both fidelity and quality and yielding even more informative features for downstream ASR. Finally, experiments on both synthetic and real noisy datasets demonstrate that Wav2code alleviates speech distortion and improves ASR performance under various noisy conditions, resulting in stronger robustness.
Comment: 12 pages, 7 figures, Submitted to IEEE/ACM TASL
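The nearest-neighbor codebook lookup described in the abstract is essentially vector quantization. The sketch below illustrates that mechanism in isolation; shapes, names, and random data are invented for illustration and are not taken from the paper.

```python
import numpy as np

def codebook_lookup(features, codebook):
    """Map each feature vector to its nearest codebook entry.

    features: (T, D) array of frame-level representations
    codebook: (K, D) array of learned code vectors
    Returns (codes, quantized): (T,) nearest-code indices and the
    (T, D) representations reconstructed from the codebook.
    """
    # Squared Euclidean distance between every feature and every code
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = dists.argmin(axis=1)    # nearest-neighbor feature matching
    quantized = codebook[codes]     # reconstruct from the codebook prior
    return codes, quantized

rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 8))    # toy SSL features
book = rng.normal(size=(16, 8))     # toy discrete codebook
codes, quant = codebook_lookup(feats, book)
print(codes.shape, quant.shape)     # (50,) (50, 8)
```

In the paper's setting the codebook itself is learned during pre-training so that the quantized vectors can reconstruct clean representations; the lookup step shown here is only the inference-time matching.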
Grasping nothing: a study of minimal ontologies and the sense of music
If music were to have a proper sense – one in which it is truly given – one might reasonably place this in sound and aurality. I contend, however, that no such sense exists; rather, the sense of music takes place, and it does so with the impossible. To this end, this thesis – which is a work of philosophy and music – advances an ontology of the impossible (i.e., it thinks the being of what, properly speaking, can have no being) and considers its implications for music, articulating how ontological aporias – of the event, of thinking the absolute, and of sovereignty's dismemberment – imply senses of music that are anterior to sound. John Cage's Silent Prayer, a nonwork he never composed, compels a rethinking of silence on the basis of its contradictory status of existence; Florian Hecker et al.'s Speculative Solution offers a basis for thinking absolute music anew to the precise extent that it is a discourse of meaninglessness; and Manfred Werder's [yearn] pieces exhibit exemplarily that music's sense depends on the possibility of its counterfeiting. Insomuch as these accounts produce musical senses that take the place of sound, they are also understood to be performances of these pieces. Here, then, thought is music's organon and its instrument.
Exploring the Training Factors that Influence the Role of Teaching Assistants to Teach to Students With SEND in a Mainstream Classroom in England
With the implementation of inclusive education having become increasingly valued over the years, the training of Teaching Assistants (TAs) is now more important than ever, given that they work alongside pupils with special educational needs and disabilities (hereinafter SEND) in mainstream education classrooms. The current study explored the training factors that influence the role of TAs when it comes to teaching SEND students in mainstream classrooms in England during their one-year training period. This work aimed to increase understanding of how the training of TAs is seen to influence the development of their personal knowledge and professional skills. The study has significance for our comprehension of the connection between the TAs’ training and the quality of education in the classroom. In addition, this work investigated whether there existed a correlation between the teaching experience of TAs and their background information, such as their gender, age, grade level taught, years of teaching experience, and qualification level.
A critical realist theoretical approach was adopted for this two-phased study, which involved the mixing of adaptive and grounded theories respectively. The multi-method project featured 13 case studies, each of which involved a trainee TA, his/her college tutor, and the classroom teacher who was supervising the trainee TA. The analysis was based on using semi-structured interviews, various questionnaires, and non-participant observation methods for each of these case studies during the TA's one-year training period. The primary analysis of the research was completed by comparing the various kinds of data collected from the participants in the first and second data collection stages of each case. Further analysis involved cross-case analysis using a grounded theory approach, which made it possible to draw conclusions and put forth several core propositions. Compared with previous research, the findings of the current study reveal many implications for the training and deployment conditions of TAs, while they also challenge the prevailing approaches in many aspects, in addition to offering more diversified, enriched, and comprehensive explanations of the critical pedagogical issues.
Statistical phase estimation and error mitigation on a superconducting quantum processor
Quantum phase estimation (QPE) is a key quantum algorithm, which has been
widely studied as a method to perform chemistry and solid-state calculations on
future fault-tolerant quantum computers. Recently, several authors have
proposed statistical alternatives to QPE that have benefits on early
fault-tolerant devices, including shorter circuits and better suitability for
error mitigation techniques. However, practical implementations of the
algorithm on real quantum processors are lacking. In this paper we practically
implement statistical phase estimation on Rigetti's superconducting processors.
We specifically use the method of Lin and Tong [PRX Quantum 3, 010318 (2022)]
using the improved Fourier approximation of Wan et al. [PRL 129, 030503
(2022)], and applying a variational compilation technique to reduce circuit
depth. We then incorporate error mitigation strategies including zero-noise
extrapolation and readout error mitigation with bit-flip averaging. We propose
a simple method to estimate energies from the statistical phase estimation
data, which is found to improve the accuracy in final energy estimates by one
to two orders of magnitude with respect to prior theoretical bounds, reducing
the cost to perform accurate phase estimation calculations. We apply these
methods to chemistry problems for active spaces up to 4 electrons in 4
orbitals, including the application of a quantum embedding method, and use them
to correctly estimate energies within chemical precision. Our work demonstrates
that statistical phase estimation has a natural resilience to noise,
particularly after mitigating coherent errors, and can achieve far higher
accuracy than suggested by previous analysis, demonstrating its potential as a
valuable quantum algorithm for early fault-tolerant devices.
Comment: 24 pages, 13 figures
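Zero-noise extrapolation, one of the mitigation strategies mentioned above, measures an expectation value at several deliberately amplified noise levels and extrapolates a fit back to zero noise. The sketch below is a minimal illustration with a synthetic noise model invented here; it is not the paper's implementation.

```python
import numpy as np

def zero_noise_extrapolate(noise_scales, noisy_values, degree=1):
    """Fit a polynomial to (noise scale, measured value) pairs and
    evaluate the fit at scale 0 to estimate the noiseless value."""
    coeffs = np.polyfit(noise_scales, noisy_values, degree)
    return np.polyval(coeffs, 0.0)

# Toy example: true expectation value 1.0, decaying linearly with noise.
# In practice the values come from circuits run at stretched noise levels.
scales = np.array([1.0, 2.0, 3.0])
values = 1.0 - 0.1 * scales          # stand-in for measured expectations
estimate = zero_noise_extrapolate(scales, values, degree=1)
print(round(estimate, 6))            # → 1.0
```

Real devices produce noisy samples rather than an exact linear decay, so the fit degree and the set of noise scales become accuracy/variance trade-offs.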
Unified Multi-Modal Image Synthesis for Missing Modality Imputation
Multi-modal medical images provide complementary soft-tissue characteristics
that aid in the screening and diagnosis of diseases. However, limited scanning
time, image corruption and various imaging protocols often result in incomplete
multi-modal images, thus limiting the usage of multi-modal data for clinical
purposes. To address this issue, in this paper, we propose a novel unified
multi-modal image synthesis method for missing modality imputation. Our method
overall takes a generative adversarial architecture, which aims to synthesize
missing modalities from any combination of available ones with a single model.
To this end, we specifically design a Commonality- and Discrepancy-Sensitive
Encoder for the generator to exploit both modality-invariant and specific
information contained in input modalities. The incorporation of both types of
information facilitates the generation of images with consistent anatomy and
realistic details of the desired distribution. Besides, we propose a Dynamic
Feature Unification Module to integrate information from a varying number of
available modalities, which enables the network to be robust to random missing
modalities. The module performs both hard integration and soft integration,
ensuring the effectiveness of feature combination while avoiding information
loss. Verified on two public multi-modal magnetic resonance datasets, the
proposed method is effective in handling various synthesis tasks and shows
superior performance compared to previous methods.
Comment: 10 pages, 9 figures
Projected Multi-Agent Consensus Equilibrium (PMACE) for Distributed Reconstruction with Application to Ptychography
Multi-Agent Consensus Equilibrium (MACE) formulates an inverse imaging
problem as a balance among multiple update agents such as data-fitting terms
and denoisers. However, each such agent operates on a separate copy of the full
image, leading to redundant memory use and slow convergence when each agent
affects only a small subset of the full image. In this paper, we extend MACE to
Projected Multi-Agent Consensus Equilibrium (PMACE), in which each agent
updates only a projected component of the full image, thus greatly reducing
memory use for some applications. We describe PMACE in terms of an equilibrium
problem and an equivalent fixed point problem and show that in most cases the
PMACE equilibrium is not the solution of an optimization problem. To
demonstrate the value of PMACE, we apply it to the problem of ptychography, in
which a sample is reconstructed from the diffraction patterns resulting from
coherent X-ray illumination at multiple overlapping spots. In our PMACE
formulation, each spot corresponds to a separate data-fitting agent, with the
final solution found as an equilibrium among all the agents. Our results
demonstrate that the PMACE reconstruction algorithm generates more accurate
reconstructions at a lower computational cost than existing ptychography
algorithms when the spots are sparsely sampled.
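The "balance among multiple update agents" in MACE can be pictured with a toy scalar fixed-point iteration: each agent proposes an update to the state, and a consensus step averages the proposals until they agree. Everything below (agent definition, data, names) is invented for illustration and is a simplification of the actual MACE/PMACE iterations, which operate on stacked copies of the state.

```python
def agent(y, x):
    """Proximal map of the data-fit term 0.5*(v - y)^2:
    pulls the current state x halfway toward the data y."""
    return 0.5 * (x + y)

def consensus_fixed_point(ys, x0=0.0, iters=100):
    """Iterate agent proposals followed by an averaging consensus step."""
    x = x0
    for _ in range(iters):
        proposals = [agent(y, x) for y in ys]   # each agent's update
        x = sum(proposals) / len(proposals)     # consensus: average them
    return x

# Two quadratic agents with data 1.0 and 3.0: the equilibrium balances
# both pulls, landing at their mean.
result = consensus_fixed_point([1.0, 3.0])
print(round(result, 4))  # → 2.0
```

PMACE's contribution is that each agent touches only a projected component of the full image (one diffraction spot in ptychography) rather than a full copy, which is what cuts the memory cost.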
Identifying and responding to people with mild learning disabilities in the probation service
It has long been recognised that, like many other individuals, people with learning disabilities find their way into the criminal justice system. This fact is not disputed. What has been disputed, however, is the extent to which those with learning disabilities are represented within the various agencies of the criminal justice system and the ways in which the criminal justice system (and society) should address this. Recently, social and legislative confusion over the best way to deal with offenders with learning disabilities and mental health problems has meant that the waters have become even more muddied. Despite current government uncertainty concerning the best way to support offenders with learning disabilities, the probation service is likely to continue to play a key role in the supervision of such offenders. The three studies contained herein aim to clarify the extent to which those with learning disabilities are represented in the probation service, to examine the effectiveness of probation for them and to explore some of the ways in which probation could be adapted to fit their needs.
Study 1 and study 2 showed that around 10% of offenders on probation in Kent appeared to have an IQ below 75, putting them in the bottom 5% of the general population. Study 3 was designed to assess some of the support needs of those with learning disabilities in the probation service, finding that many of the materials used by the probation service are likely to be too complex for those with learning disabilities to use effectively. To address this, a model for service provision is tentatively suggested. This is based on the findings of the three studies and a pragmatic assessment of what the probation service is likely to be capable of achieving in the near future.
Learning disentangled speech representations
A variety of informational factors are contained within the speech signal and a single short recording of speech reveals much more than the spoken words. The best method to extract and represent informational factors from the speech signal ultimately depends on which informational factors are desired and how they will be used. In addition, sometimes methods will capture more than one informational factor at the same time such as speaker identity, spoken content, and speaker prosody.
The goal of this dissertation is to explore different ways to deconstruct the speech signal into abstract representations that can be learned and later reused in various speech technology tasks. This task of deconstructing, also known as disentanglement, is a form of distributed representation learning. As a general approach to disentanglement, there are some guiding principles that elaborate what a learned representation should contain as well as how it should function. In particular, learned representations should contain all of the requisite information in a more compact manner, be interpretable, remove nuisance factors of irrelevant information, be useful in downstream tasks, and independent of the task at hand. The learned representations should also be able to answer counter-factual questions.
In some cases, learned speech representations can be re-assembled in different ways according to the requirements of downstream applications. For example, in a voice conversion task, the speech content is retained while the speaker identity is changed. And in a content-privacy task, some targeted content may be concealed without affecting how surrounding words sound. While there is no single-best method to disentangle all types of factors, some end-to-end approaches demonstrate a promising degree of generalization to diverse speech tasks.
This thesis explores a variety of use-cases for disentangled representations including phone recognition, speaker diarization, linguistic code-switching, voice conversion, and content-based privacy masking. Speech representations can also be utilised for automatically assessing the quality and authenticity of speech, such as automatic MOS ratings or detecting deepfakes. The meaning of the term "disentanglement" is not well defined in previous work, and it has acquired several meanings depending on the domain (e.g. image vs. speech). Sometimes the term "disentanglement" is used interchangeably with the term "factorization". This thesis proposes that disentanglement of speech is distinct, and offers a viewpoint of disentanglement that can be considered both theoretically and practically.
The Professional Identity of Doctors who Provide Abortions: A Sociological Investigation
Abortion is a medicalised problem in England and Wales, where the law places doctors at the centre of legal provision and puts doctors in control of who has an abortion. However, the sex-selection abortion scandal of 2012 presented a very real threat to 'abortion doctors', when the medical profession's values and practices were questioned in the media, society and by Members of Parliament. Doctors found themselves at the centre of a series of claims that stated doctors were acting both illegally and unethically, driven by profit rather than patient needs. Yet, the perspectives of those doctors who provide abortions has been under-researched; this thesis aims to fill that gap by examining the beliefs and values of this group of doctors. Early chapters highlight the ambiguous position of the abortion provider in Britain, where doctors are seen as a collective group of professionals motivated by medical dominance and medical autonomy. They outline how this position is then questioned and contested, with doctors being presented as unethical. By studying abortion at the macro-, meso- and micro-levels, this thesis seeks to better understand the values of the 'abortion doctor', and how these levels shape the work and experiences of abortion providers in England and Wales. This thesis thus addresses the question: 'What do abortion doctors' accounts of their professional work suggest about the contemporary dynamics of the medicalisation of abortion in Britain?'. It investigates the research question using a qualitative methodological approach: face-to-face and telephone interviews were conducted with 47 doctors who provide abortions in England and Wales. The findings from this empirical study show how doctors' values are linked to how they view the 'normalisation of abortion'. 
At the macro-level, doctors openly resisted the medicalisation of abortion through the position ascribed to them by the legal framework, yet at the meso-level they constructed an identity in which normalising abortion was based on further medicalising services. Finally, at the micro-level, the ambiguous position of the abortion provider is further identified in terms of being both a proud provider and a stigmatised individual. This thesis shows that while the existing medicalisation literature has some utility, it has limited explanatory power when investigating the problem of abortion. The thesis thus provides some innovative insights into the relevance and value of medicalisation through a comprehensive study of doctors' values, beliefs and practices.