Improvements of Silent Speech Interface Algorithms

Author: Honarmandi Shandiz Amin

SZTE Doktori Értekezések Repozitórium (SZTE Repository of Dissertations)

Using fMRI and Behavioural Measures to Investigate Rehabilitation in Post-Stroke Aphasic Deficits

Author: Brownsett Sonia
Publication venue: Medicine, Imperial College London
Publication date: 01/01/2014

In this thesis I investigated whether an intensive computerised, home-based therapy programme could improve phonological discrimination ability in 19 patients with chronic post-stroke aphasia. One skill specifically targeted by the treatment demonstrated an improvement due to the therapy. However, this improvement did not generalise to untreated items, and was only effective for participants without a lesion involving the frontal lobe, indicating a potentially important role for this region in determining outcome of aphasia therapy. Complementary functional imaging studies investigated activity in domain-general and domain-specific networks in both patients and healthy volunteers during listening and repeating simple sentences. One important consideration when comparing a patient group with a healthy population is the difference in task difficulty encountered by the two groups. Increased cognitive effort can be expected to increase activity in domain-general networks. I minimised the effect of this confound by manipulating task difficulty for the healthy volunteers to reduce their behavioural performance so that it was comparable to that of the patients. By this means I demonstrated that the activation patterns in domain-general regions were very similar in the two groups. Region-of-interest analysis demonstrated that activity within a domain-general network, the salience network, predicted residual language function in the patients with aphasia, even after accounting for lesion volume and their chronological age. I drew two broad conclusions from these studies. First, that computer-based rehabilitation can improve disordered phonological discrimination in chronic aphasia, but that lesion distribution may influence the response to this training. Second, that the ability to activate domain-general cognitive control regions influences outcome in aphasia. 
This allows me to propose that in future work, therapeutic strategies, pharmacological or behavioural, targeting domain-general brain systems, may benefit aphasic stroke rehabilitation.

Open Access

Spiral - Imperial College Digital Repository

Branching Boogaloo: Botanical Adventures in Multi-Mediated Morphologies

Author: Ruggiero Diana Marie
Publication venue: Bard Digital Commons
Publication date: 01/01/2016

FormaLeaf is a software interface for exploring leaf morphology using parallel string rewriting grammars called L-systems. Scanned images of dicotyledonous angiosperm leaves, collected from plants around Bard’s campus, are displayed on the left and analyzed using the computer vision library OpenCV. Morphometric information and terminological labels are reported in a side panel. “Slider mode” allows the user to control the structural template and growth parameters of the generated L-system leaf displayed on the right. “Vision mode” shows the input and generated leaves as the computer ‘sees’ them. “Search mode” attempts to automatically produce a formally defined graphical representation of the input by evaluating the visual similarity of a generated pool of candidate leaves. The system seeks to derive a possible internal structural configuration for venation based purely on a visual analysis of external shape. Viewed in succession, the iterations of the generated L-system leaves appear as a hypothetical developmental sequence. FormaLeaf was written in Processing.
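
For readers unfamiliar with L-systems, the parallel rewriting the abstract refers to can be sketched in a few lines of Python. This is an illustration only: FormaLeaf itself was written in Processing, and the function name `expand` and the rule set `RULES` below are hypothetical, taken from a generic textbook-style branching grammar rather than from the thesis.

```python
def expand(axiom: str, rules: dict[str, str], iterations: int) -> list[str]:
    """Return the axiom followed by each successive rewriting of it."""
    generations = [axiom]
    for _ in range(iterations):
        current = generations[-1]
        # Parallel rewriting: every symbol is replaced in the same pass.
        # Symbols without a rule (brackets, turns) are copied unchanged.
        generations.append("".join(rules.get(s, s) for s in current))
    return generations


# Hypothetical branching rules, not FormaLeaf's actual grammar.
# F = grow a segment, X = growth point, [ ] = start/end a branch,
# + / - = turn left/right when the string is drawn.
RULES = {"X": "F[+X][-X]FX", "F": "FF"}

if __name__ == "__main__":
    for step, word in enumerate(expand("X", RULES, 3)):
        print(step, word)
```

Reading the returned generations in order gives the kind of step-by-step growth sequence the abstract describes. Drawing each string with turtle-style graphics is left out of this sketch.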

Bard College

Apprentissage automatique pour le codage cognitif de la parole

Author: Lotfidereshgi Reza
Publication venue: Université de Sherbrooke
Publication date: 01/01/2022

Since the 1980s, speech codecs have relied on short-term coding strategies that operate at the subframe or frame level (typically 5 to 20 ms). Researchers have essentially adjusted and combined a limited number of available technologies (transform, linear prediction, quantization) and strategies (waveform matching, noise shaping) to build increasingly complex coding architectures. In this thesis, rather than relying on short-term coding strategies, we develop an alternative framework for speech compression by encoding speech attributes that are perceptually important characteristics of speech signals. To achieve this objective, we solve three problems of increasing complexity, namely classification, prediction and representation learning. Classification is a common element in modern codec designs. In a first step, we design a classifier to identify emotions, which are among the most complex long-term speech attributes. In a second step, we design a speech sample predictor, which is another common element in modern codec designs, to highlight the benefits of long-term and non-linear speech signal processing. Then, we explore latent variables, a space of speech representations, to encode both short-term and long-term speech attributes. Lastly, we propose a decoder network to synthesize speech signals from these representations, which constitutes our final step towards building a complete, end-to-end machine-learning-based speech compression method.
Although each development step proposed in this thesis could be part of a codec on its own, each step also provides insights and a foundation for the next step, until a fully machine-learning-based codec is reached. The first two steps, classification and prediction, provide new tools that could replace and improve elements of existing codecs. In the first step, we use a combination of a source-filter model and a liquid state machine (LSM) to demonstrate that features related to emotions can be easily extracted and classified using a simple classifier. In the second step, a single end-to-end network using long short-term memory (LSTM) is shown to produce speech frames with high subjective quality for packet loss concealment (PLC) applications. In the last steps, we build upon the results of the previous steps to design a fully machine-learning-based codec. An encoder network, formulated using a deep neural network (DNN) and trained on multiple public databases, extracts and encodes speech representations using prediction in a latent space. An unsupervised learning approach based on several principles of cognition is proposed to extract representations from both short and long frames of speech using mutual information and contrastive loss. The ability of these learned representations to capture various short- and long-term speech attributes is demonstrated. Finally, a decoder structure is proposed to synthesize speech signals from these representations. Adversarial training is used as an approximation to subjective speech quality measures in order to synthesize natural-sounding speech samples. The high perceptual quality of synthesized speech thus achieved proves that the extracted representations are efficient at preserving all sorts of speech attributes and therefore that a complete compression method is demonstrated with the proposed approach.
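
As a rough illustration of the contrastive loss mentioned in the abstract, the sketch below implements the widely used InfoNCE formulation in plain Python. The abstract does not specify the loss used in the thesis, so everything here is an assumption made for illustration: the names `cosine` and `info_nce`, the cosine similarity measure, and the temperature value.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: -log softmax score of the positive among all candidates.

    The temperature is an assumed hyperparameter, not a value from the thesis.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    # Low loss: the positive matches the anchor, the negative does not.
    print(info_nce(anchor, [1.0, 0.0], [[0.0, 1.0]]))
    # High loss: the roles are swapped.
    print(info_nce(anchor, [0.0, 1.0], [[1.0, 0.0]]))
```

Minimising a loss of this kind pulls representations of related frames together and pushes unrelated ones apart, which is one common way to learn representations without labels.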

Savoirs UdeS