Word boundary agreement to combine multi-microphone hypotheses in distant speech recognition
In this paper we propose a technique for combining hypotheses generated in a multi-microphone setting, which exploits complementarity and collective agreement among ASR outputs of different channels. The technique draws upon the information encoded in the available set of word lattices. As a first step, we identify word boundaries at which a comprehensive inter-channel agreement is found; then, these boundaries are used to reduce the global hypothesis search space. Global word posterior probabilities are estimated for the candidate words associated with each of the bounded segments. As a result, a single combined confusion network is generated from the multiple lattices. This approach offers a novel perspective on state-of-the-art solutions based on confusion network combination. Promising results were obtained from an experimental evaluation in a simulated domestic environment equipped with a distributed microphone network. The development and test sets were simulated using real impulse responses estimated for a large set of microphone-speaker position pairs.
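As a rough illustration of the posterior-estimation step described in the abstract above, the sketch below pools word posteriors from several channels for a single bounded segment. The uniform channel weights, the function name `combine_segment_posteriors`, and the example words are assumptions made for illustration only; the abstract does not state how the global posteriors are actually estimated.

```python
from collections import defaultdict


def combine_segment_posteriors(channel_posteriors, weights=None):
    """Pool per-channel word posteriors for one bounded segment.

    channel_posteriors: list of dicts mapping word -> posterior, one per channel.
    weights: optional per-channel weights (assumed uniform when omitted).
    Returns a dict mapping word -> pooled posterior, normalised to sum to 1.
    """
    if not channel_posteriors:
        raise ValueError("need at least one channel")
    if weights is None:
        weights = [1.0] * len(channel_posteriors)

    pooled = defaultdict(float)
    for posteriors, weight in zip(channel_posteriors, weights):
        for word, prob in posteriors.items():
            pooled[word] += weight * prob

    total = sum(pooled.values())
    if total <= 0:
        raise ValueError("posteriors must carry positive mass")
    return {word: score / total for word, score in pooled.items()}


# Hypothetical candidates for one segment, as seen by three microphones.
channels = [
    {"light": 0.7, "right": 0.3},
    {"light": 0.6, "night": 0.4},
    {"light": 0.5, "right": 0.5},
]
combined = combine_segment_posteriors(channels)
best_word = max(combined, key=combined.get)
```

Each pooled dictionary plays the role of one slot in a confusion network: the candidate words for a segment together with their combined scores.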
Dysarthric speech analysis and automatic recognition using phase based representations
Dysarthria is a neurological speech impairment which usually results in the loss of motor speech control due to muscular atrophy and poor coordination of articulators. Dysarthric speech is more difficult to model with machine learning algorithms, due to inconsistencies in the acoustic signal and to limited amounts of training data. This study reports a new approach for the analysis and representation of dysarthric speech, and applies it to improve ASR performance.
The Zeros of Z-Transform (ZZT) are investigated for dysarthric vowel segments. The analysis shows evidence of a phase-based acoustic phenomenon that is responsible for the way the distribution of zero patterns relates to speech intelligibility. It is investigated whether such phase-based artefacts can be systematically exploited to understand their association with intelligibility.
A metric is introduced based on the phase slope deviations (PSD) that are observed in the unwrapped phase spectrum of dysarthric vowel segments. The metric compares the slopes of dysarthric vowels with those of typical vowels. The PSD shows a strong and nearly linear correspondence with the intelligibility of the speaker, and this is shown to hold for two separate databases of dysarthric speakers. A systematic procedure for correcting the underlying phase deviations results in a significant improvement in ASR performance for speakers with severe and moderate dysarthria.
In addition, information encoded in the phase component of the Fourier transform of dysarthric speech is exploited in the group delay spectrum. Its properties are found to represent disordered speech more effectively than the magnitude spectrum. Dysarthric ASR performance was significantly improved using phase-based cepstral features in comparison to conventional MFCCs. A combined approach utilising the benefits of PSD corrections and phase-based features was found to surpass all previous performance on the UASPEECH database of dysarthric speech.
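For readers unfamiliar with the group delay spectrum mentioned above, the following is a minimal sketch of how it can be computed without explicit phase unwrapping. It uses the standard identity tau(w) = (X_R Y_R + X_I Y_I) / |X|^2, where X is the transform of x[n] and Y is the transform of n·x[n]. The function names, the naive DFT, and the test signal are assumptions for illustration; the thesis's own feature pipeline (including its cepstral processing) is not specified in the abstract and is not reproduced here.

```python
import cmath


def dft(signal, n_bins):
    """Naive DFT of `signal` evaluated at n_bins equally spaced frequencies."""
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * n / n_bins)
            for n, x in enumerate(signal))
        for k in range(n_bins)
    ]


def group_delay(frame, n_bins=64, eps=1e-12):
    """Group delay (in samples) per frequency bin.

    n_bins should be at least len(frame). eps guards against division
    by zero at spectral nulls.
    """
    X = dft(frame, n_bins)
    Y = dft([n * x for n, x in enumerate(frame)], n_bins)
    delays = []
    for xk, yk in zip(X, Y):
        power = xk.real ** 2 + xk.imag ** 2
        delays.append((xk.real * yk.real + xk.imag * yk.imag) / max(power, eps))
    return delays


# Sanity check: an impulse delayed by 5 samples has a flat group delay of 5.
impulse = [0.0] * 16
impulse[5] = 1.0
tau = group_delay(impulse)
```

In practice an FFT routine would replace the naive DFT; the quadratic-time version is used here only to keep the sketch free of dependencies.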
Third International Conference on Technologies for Music Notation and Representation TENOR 2017
The third International Conference on Technologies for Music Notation and Representation seeks to focus on a set of specific research issues associated with music notation that were elaborated at the first two editions of TENOR in Paris and Cambridge. The theme of the conference is vocal music, whereas the pre-conference workshops focus on innovative technological approaches to music notation.
Person-based Prominence in Ojibwe
This dissertation develops a formal and psycholinguistic theory of person-based prominence effects, the finding that certain categories of person such as first and second (the local persons) are privileged by the grammar. The thesis takes on three questions: (i) What are the possible categories related to person? (ii) What are the possible prominence relationships between these categories? And (iii) how is prominence information used to parse and interpret linguistic input in real time?
The empirical through-line is understanding obviation, a “spotlighting” system found most prominently in the Algonquian family of languages that splits the (animate) third persons into two categories: proximate, the person who is in the spotlight, and obviative, the persons who are introduced into the discourse, but are not in the spotlight. I provide a semantics for the feature [proximate], and detail a lattice-based theory of feature composition to derive the categories related to obviation in Border Lakes Ojibwe and beyond. This leads to insights about the syntactic and semantic relationships between person, animacy-based noun classification, number, and obviation.
The novel contribution to the theory of person-based prominence effects is to decompose person features into sets of primitives. This proposal allows the stipulated entailment relationships between categories and features, as encoded in prominence hierarchies and feature geometries, to be derived from the first principles of set theory. I further motivate the account by showing that it has increased empirical coverage, and apply it to capture patterns of agreement and word order in Border Lakes Ojibwe.
Finally, I present a psycholinguistic study on how obviation is used to process filler-gap dependencies in Border Lakes Ojibwe. I show that obviation, and by extension, prominence information more generally, is used immediately to predictively encode movement chains, prior to bottom-up information from voice marking about the argument structure of the clause. I argue for a modular and syntax-first model of parsing, revising the Active Filler Strategy to be guided by pressures to minimize syntactic distance and maximize the expected well-formedness of each link in the chain. These pressures compete, accounting for effects of prediction, integration, and reanalysis in long-distance dependency formation.
Algorithmic business and EU law on fair trading
This thesis studies how commercial practice is developing with artificial intelligence (AI) technologies and discusses some normative concepts in EU consumer law. The author analyses the phenomenon of 'algorithmic business', which refers to the increasing use of data-driven AI in marketing organisations for the optimisation of a range of consumer-related tasks. The phenomenon is orienting business-consumer relations towards some general trends that influence the power and behaviour of consumers. These developments are not taking place in a legal vacuum, but against the background of a normative system aimed at maintaining fairness and balance in market transactions. The author assesses current developments in commercial practices in the context of EU consumer law, which is specifically aimed at regulating commercial practices. The analysis is critical by design and, without neglecting concrete practices, tries to look at the big picture.
The thesis consists of nine chapters divided into three thematic parts. The first part discusses the deployment of AI in marketing organisations: a brief history, the technical foundations, and their modes of integration in business organisations. In the second part, a selected number of socio-technical developments in commercial practice are analysed. The following are addressed: the monitoring and analysis of consumers’ behaviour based on data; the personalisation of commercial offers and customer experience; the use of information on consumers’ psychology and emotions; and the mediation through marketing conversational applications. The third part assesses these developments in the context of EU consumer law and of the broader policy debate concerning consumer protection in the algorithmic society. In particular, two normative concepts underlying the EU fairness standard are analysed: manipulation, as a substantive regulatory standard that limits commercial behaviours in order to protect consumers’ informed and free choices; and vulnerability, as a concept of social policy that portrays people who are more exposed to marketing practices.
The 9th International Conference on Sustainable Development
The International Conference on Sustainable Development (ICSD) was held virtually on September 20-21, 2021, with the conference theme “Research for Impact: A Sustainable and Inclusive Planet.” ICSD provides a forum for academia, government, civil society, UN agencies, and the private sector to come together to share practical solutions to achieve the Sustainable Development Goals (SDGs). The two-day conference hosted 49 different sessions across multiple time zones to accommodate the global audience, with 204 oral presenters, 239 poster presenters, and 977 total authors.