184 research outputs found

    Speech wave-form driven motion synthesis for embodied agents

    The main objective of this thesis is to synthesise motion from speech, especially in conversation. Previous research has investigated various acoustic features, alone or in combination, but no one has investigated estimating head motion directly from the waveform, which is the source from which all such features are derived. Thus, we study the direct use of the speech waveform to generate head motion. We claim that creating a task-specific feature from the waveform leads to better overall performance than using standard acoustic features. At the same time, we completely abandon the handcrafted feature extraction process, which makes the approach more effective. However, using the speech waveform raises two problems: 1) high dimensionality, since the dimension of the waveform data is much higher than that of common acoustic features, which makes the model more difficult to train, and 2) irrelevant information, since the raw waveform retains the full signal, much of which may hinder neural network training. To resolve these problems, we applied a deep canonical-correlation-constrained auto-encoder (DCCCAE) to compress the waveform into low-dimensional embedded features that are highly correlated with head motion. The estimated head motion was evaluated both objectively and subjectively. The objective evaluation confirmed that the DCCCAE creates a feature more correlated with head motion than a standard AE and other popular spectral features such as MFCC and FBank, and that it can be used to achieve state-of-the-art results for predicting natural head motion. Besides investigating representation learning for the feature, we also explored an LSTM-based regression model for the proposed feature. The LSTM-based models boosted overall performance in the objective evaluation and adapted better to the proposed feature than to MFCC. 
MUSHRA-like subjective evaluation results suggest that participants rated the animations generated by models using the proposed feature as better than those of the other models. An A/B test further indicated that the LSTM-based regression model adapts better to the proposed feature. Furthermore, we extended the architecture to estimate upper-body motion as well. We submitted our results to GENEA2020, where our model achieved a higher score than BA in both aspects (human-likeness and appropriateness) according to participants' preferences, suggesting that the highly correlated feature pair and the sequential estimation helped improve the model's generalisation.
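
The notion of a feature being "correlated with head motion" can be illustrated with classical linear canonical correlation analysis (CCA). The sketch below is not the DCCCAE from the thesis; it only computes the canonical correlations between two sets of frame-level vectors, which is the kind of quantity a correlation-constrained encoder would try to increase. The function name, the regularisation constant `reg` and the array shapes are assumptions made for illustration.

```python
import numpy as np

def canonical_correlations(X, Y, reg=1e-6):
    """Canonical correlations between two views (hypothetical helper).

    X: (n_frames, dx) array, e.g. embedded speech features.
    Y: (n_frames, dy) array, e.g. head-motion parameters.
    reg: small ridge term (an assumption) that keeps the covariances invertible.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariance and cross-covariance matrices.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    # Singular values of the whitened cross-covariance are the canonical correlations.
    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(T, compute_uv=False)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal((500, 3))                  # toy "speech" features
    motion = speech @ rng.standard_normal((3, 2))           # motion linearly tied to speech
    print(canonical_correlations(speech, motion))           # values close to 1
    print(canonical_correlations(speech, rng.standard_normal((500, 2))))  # much smaller values
```

A feature that yields larger canonical correlations with the motion targets carries more linearly recoverable information about them, which is one plausible reading of the objective comparison against MFCC and FBank described above.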

    Internet and Biometric Web Based Business Management Decision Support

    MICROBE MOOC material prepared under IO1/A5, Development of the MICROBE personalized MOOCs content and teaching materials. Prepared by: A. Kaklauskas, A. Banaitis, I. Ubarte, Vilnius Gediminas Technical University, Lithuania. Project No: 2020-1-LT01-KA203-07810

    Whose Emotion Matters? Speaking Activity Localisation without Prior Knowledge

    The task of emotion recognition in conversations (ERC) benefits from the availability of multiple modalities, as provided, for example, in the video-based Multimodal EmotionLines Dataset (MELD). However, only a few research approaches use both acoustic and visual information from the MELD videos. There are two reasons for this. First, label-to-video alignments in MELD are noisy, making those videos an unreliable source of emotional speech data. Second, conversations can involve several people in the same scene, which requires the localisation of the utterance source. In this paper, we introduce MELD with Fixed Audiovisual Information via Realignment (MELD-FAIR). Using recent active speaker detection and automatic speech recognition models, we are able to realign the videos of MELD and capture the facial expressions from speakers in 96.92% of the utterances provided in MELD. Experiments with a self-supervised voice recognition model indicate that the realigned MELD-FAIR videos more closely match the transcribed utterances given in the MELD dataset. Finally, we devise a model for emotion recognition in conversations trained on the realigned MELD-FAIR videos, which outperforms state-of-the-art models for ERC based on vision alone. This indicates that localising the source of speaking activities is indeed effective for extracting facial expressions from the uttering speakers and that faces provide more informative visual cues than the visual features state-of-the-art models have been using so far. The MELD-FAIR realignment data, and the code of the realignment procedure and of the emotion recognition, are available at https://github.com/knowledgetechnologyuhh/MELD-FAIR. Comment: 17 pages, 8 figures, 7 tables, Published in Neurocomputin

    Facial Micro- and Macro-Expression Spotting and Generation Methods

    Facial micro-expression (ME) recognition requires the face movement interval as input, but computational methods for spotting MEs still underperform. This is due to the lack of large-scale long-video datasets, and ME generation methods are still in their infancy. This thesis presents methods to address these data deficiency issues and introduces a new method for spotting macro- and micro-expressions simultaneously. It introduces SAMM Long Videos (SAMM-LV), which contains 147 annotated long videos, and develops a baseline method to facilitate ME Grand Challenge 2020. Further, reference-guided style transfer with StarGANv2 is applied to SAMM-LV to generate a synthetic dataset, namely SAMM-SYNTH. The quality of SAMM-SYNTH is evaluated using facial action units detected by OpenFace. Quantitative measurement shows high correlations between the original and synthetic data on two Action Units (AU12 and AU6). For facial expression spotting, a two-stream 3D-Convolutional Neural Network with temporally oriented frame skips that can spot micro- and macro-expressions simultaneously is proposed. This method achieves state-of-the-art performance on SAMM-LV and is competitive on CAS(ME)2; it was used as the baseline result of ME Grand Challenge 2021. The F1-score improves to 0.1036 when the network is trained with composite data consisting of SAMM-LV and SAMM-SYNTH. On the unseen ME Grand Challenge 2022 evaluation dataset, it achieves an F1-score of 0.1531. Finally, a new sequence generation method is proposed to explore the capability of deep learning networks. It generates spontaneous facial expressions using only two input sequences without any labels. SSIM and NIQE were used for image quality analysis, and the generated data achieved 0.87 and 23.14, respectively. By visualising the movements using optical flow values and absolute frame differences, this method demonstrates its potential for generating subtle MEs. For realism evaluation, the generated videos were rated using two facial expression recognition networks.
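
To make the reported F1-scores easier to interpret, the sketch below shows one common way to score interval spotting: a predicted interval counts as a true positive when its intersection-over-union (IoU) with an unmatched ground-truth interval reaches a threshold. The abstract does not state the matching rule, so the IoU criterion, the 0.5 threshold, the inclusive frame indexing and all function names here are assumptions for illustration only.

```python
def interval_iou(a, b):
    """IoU of two inclusive frame intervals given as (onset, offset)."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - intersection
    return intersection / union

def spotting_f1(predicted, ground_truth, threshold=0.5):
    """F1-score for spotted intervals under an assumed IoU matching rule."""
    matched = set()  # indices of ground-truth intervals already claimed
    true_positives = 0
    for p in predicted:
        for i, g in enumerate(ground_truth):
            if i not in matched and interval_iou(p, g) >= threshold:
                matched.add(i)
                true_positives += 1
                break
    false_positives = len(predicted) - true_positives
    false_negatives = len(ground_truth) - true_positives
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    predicted = [(10, 20), (50, 60), (100, 110)]
    ground_truth = [(12, 22), (200, 210)]
    print(spotting_f1(predicted, ground_truth))
```

Under such a rule, many false positives in long videos pull precision down sharply, which is consistent with F1-scores of this magnitude still being competitive.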

    Time- and value-continuous explainable affect estimation in-the-wild

    Today, the relevance of Affective Computing, i.e., of making computers recognise and simulate human emotions, cannot be overstated. All technology giants (from manufacturers of laptops to mobile phones to smart speakers) are in fierce competition to make their devices understand not only what is being said, but also how it is being said, so as to recognise users' emotions. The goals have evolved from predicting the basic emotions (e.g., happy, sad) to predicting more nuanced affective states (e.g., relaxed, bored) in real time. The databases used in such research have evolved too, from featuring acted behaviours to featuring spontaneous ones. A more powerful shift has occurred lately, called in-the-wild affect recognition, i.e., taking the research out of the laboratory and into the uncontrolled real world. This thesis discusses, for the very first time, affect recognition for two unique in-the-wild audiovisual databases, GRAS2 and SEWA. GRAS2 is the only database to date with time- and value-continuous affect annotations for Labov effect-free affective behaviours, i.e., behaviours recorded without the participant's awareness of being recorded (which is otherwise known to affect the naturalness of one's affective behaviour). SEWA features participants from six different cultural backgrounds, conversing using a video-calling platform. Thus, SEWA features in-the-wild recordings further corrupted by unpredictable artifacts, such as network-induced delays, frame-freezing and echoes. The two databases present a unique opportunity to study time- and value-continuous affect estimation that is truly in-the-wild. A novel ‘Evaluator Weighted Estimation’ formulation is proposed to generate a gold standard sequence from several annotations. 
An illustration is presented demonstrating that the moving bag-of-words (BoW) representation better preserves the temporal context of the features, while remaining more robust against outliers than other statistical summaries, e.g., the moving average. A novel, data-independent randomised codebook is proposed for the BoW representation; it is especially useful for cross-corpus model generalisation testing when the feature spaces of the databases differ drastically. Various deep learning models and support vector regressors are used to predict affect dimensions time- and value-continuously. The better generalisability of the models trained on GRAS2, despite the smaller training size, makes a strong case for the collection and use of Labov effect-free data. A further foundational contribution is the discovery of the missing many-to-many mapping between the mean square error (MSE) and the concordance correlation coefficient (CCC), i.e., between two of the most popular utility functions to date. The newly invented cost function |MSE_{XY}/σ_{XY}| has been evaluated in experiments aimed at demystifying the inner workings of a well-performing, simple, low-cost neural network that effectively utilises the BoW text features. Also proposed herein is the shallowest-possible convolutional neural network (CNN) that uses the facial action unit (FAU) features. The CNN exploits sequential context but, unlike RNNs, also inherently allows data- and process-parallelism. Interestingly, for the most part, these white-box AI models have been shown to utilise the provided features in a manner consistent with the human perception of emotion expression.
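
The link between MSE and CCC mentioned above can be made concrete with a standard identity. With population (biased) statistics, MSE = σ_X² + σ_Y² − 2σ_XY + (μ_X − μ_Y)², so CCC = 2σ_XY / (MSE + 2σ_XY), which depends on the data only through the ratio MSE/σ_XY. The sketch below checks this numerically; it is a textbook relation and not necessarily the derivation used in the thesis, and the function names and toy values are assumptions.

```python
import numpy as np

def mse(x, y):
    """Mean square error between two sequences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean((x - y) ** 2))

def covariance(x, y):
    """Population (biased) covariance sigma_XY."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean((x - x.mean()) * (y - y.mean())))

def ccc(x, y):
    """Concordance correlation coefficient from its usual definition."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return 2 * covariance(x, y) / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

def ccc_from_mse(x, y):
    """CCC rewritten in terms of MSE and sigma_XY (standard identity)."""
    s_xy = covariance(x, y)
    return 2 * s_xy / (mse(x, y) + 2 * s_xy)

if __name__ == "__main__":
    gold = [0.1, 0.4, 0.35, 0.8]        # toy gold-standard annotation
    prediction = [0.2, 0.3, 0.5, 0.7]   # toy model output
    print(ccc(gold, prediction), ccc_from_mse(gold, prediction))
    print(abs(mse(gold, prediction) / covariance(gold, prediction)))  # |MSE_XY / sigma_XY|
```

Because a given MSE can pair with many covariance values and vice versa, the same MSE can correspond to many CCC values, which is one way to read the many-to-many mapping described in the abstract.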

    Principled methods for mixtures processing

    This document is my thesis for the habilitation à diriger des recherches, the French diploma required to fully supervise Ph.D. students. It summarizes the research I did in the last 15 years and also presents the short-term research directions and applications I want to investigate. Regarding my past research, I first describe my work on probabilistic audio modeling, including the separation of Gaussian and α-stable stochastic processes. Then, I mention my work on deep learning applied to audio, which rapidly turned into a large effort for community service. Finally, I present my contributions in machine learning, with some works on hardware compressed sensing and probabilistic generative models. My research programme involves a theoretical part that revolves around probabilistic machine learning, and an applied part that concerns the processing of time series arising in both audio and life sciences.

    Comprendre et mesurer les asymétries dans les dispositifs en ligne pour la délibération publique. Revue de la littérature internationale et analyse exploratoire d’un dispositif au sujet de l’innovation en santé

    Over the past thirty years, many public and academic organizations have created numerous spaces for public participation by adopting deliberative methods. These methods seek to bring diverse perspectives into an open dialogue between members of the public (citizens) and institutional actors (promoters of the deliberation), in order to reflect collectively on what may be a reasonable course of action. The areas of application are as varied as public health policy, organizational planning, clinical practice guidelines, patient decision-support tools, and health innovation and research. Following the interactive approach to public engagement based on the principles of deliberative democracy theories, these devices aspire to respect three fundamental normative dimensions: the expression of justified opinions, equality and engagement. The literature has frequently discussed the strengths and problems associated with these mechanisms, but has paid less attention to how to operationalize them in deliberative practice and how to measure them. Indeed, asymmetries among members of social groups can lead to exclusion from these devices by creating barriers to accessing the required competences, including the ability to mobilize time or other resources for deliberation. In turn, exclusions can lead to an undervaluing of the views and experiences of certain segments of the population, resulting in a drift towards domination of the communication space by a few individuals. Moreover, the dynamics of domination in debates prevent some participants from mobilizing a set of relevant reasons, which can bias the conclusions reached in deliberations. Equal opportunity to express oneself increases, in principle, the probability that a diversity of points of view will be heard. The rules that guide deliberative processes can also influence the effectiveness of deliberation and, by extension, its legitimacy and relevance. Online deliberative methods, which are increasingly used, appear promising for achieving democratic ideals because they reduce certain spatio-temporal requirements. Yet few studies have described the range of approaches currently used to measure inequalities in the processes and outcomes of online deliberation. To help fill these gaps in the scientific literature and in deliberative practice, we adopt a broad understanding of communication in deliberative processes, focusing on online deliberation. 
It is important to understand how individuals engage in a variety of interactions in order to produce information collectively, without dominating the debate. To do so, we mobilize the construct of deliberative inequalities and develop the notion of the asymmetries that underlie these inequalities. These include asymmetries of power, social position, communication and political participation, which both pre-exist and emerge from deliberative mechanisms. The main objective of this thesis is thus to examine how deliberative inequalities are conceptualized and measured, especially when deliberation takes place in an online environment. Our purpose is to identify potential mechanisms that could modulate asymmetries arising from the individual, the group communication process and the process design, in order to inform and facilitate the conduct of higher-quality deliberative processes in public health. Our research is based on an exploratory design following a sequential approach with several methodologies. The first two articles use complementary strategies for systematically reviewing the international literature. The first article is a scoping review that synthesizes empirical studies of deliberative processes and presents the constructs used to define and measure deliberative asymmetries. This approach clarifies key concepts and identifies gaps in the definition of the asymmetries associated with these processes. When designing studies and analyzing deliberative processes, researchers rely on assumptions and normative presumptions that underlie these measures. To analyze the results, we mobilize a theoretical framework grounded in social movement research that proposes considering participants' biographical resources as citizens. The second article is based on a systematic review that inventories the measures used to evaluate asymmetries in deliberation studies that opted for an online device. The results address their definition and use by considering the methodological quality of the studies, the properties of the measures, and their applicability throughout the stages of the deliberative process, from recruitment to evaluation of effects. 
The third article is a secondary analysis of data from an empirical study of an online deliberative device for health technology design in Quebec, « Dessine-moi un futur ! » ("Draw me a future!"). We examine the occupation of the communication space as well as variations according to the sociodemographic, behavioural and attitudinal characteristics of the participants. We found that sharing the communicative space leads to salient participation and interaction, with salience understood as both a factor in and an outcome of participants' engagement in deliberation. The way in which power is exercised is neither unilateral nor homogeneous over the course of online deliberation. This thesis contributes to strengthening interdisciplinary research across fields such as political science, digital humanities and public health, and provides key theoretical elements for evaluating asymmetries in deliberation. Among its methodological contributions, our results can inform the design of public deliberation devices and suggest an analytical approach to ensure that asymmetries are considered in online deliberation processes. In conclusion, making public judgments explicit is a major challenge for the promoters of deliberative devices in the health field. Deliberative mechanisms can take asymmetries into account throughout the process to balance the diversity of resources and capacities available to members of the public. The joint action of public health and deliberative democracy should aim to reduce these anomalies in public deliberation.

    Méthode par gabarit à ordre variable pour la prédiction de séries chronologiques financières

    Predicting time series that exhibit changes in behaviour over time is a fundamental problem in signal processing and pattern recognition. In most financial time series prediction applications, properly tuning the parameterization of a model or an ensemble model is known to be difficult. When regime changes occur, i.e., unexpected changes in the statistical properties of these series over time, current models are unable to adapt their parameterization and the quality of their predictions degrades. This thesis proposes a formal approach to address these behavioural changes by automating the capacity of existing models to vary their graphical structures dynamically and to model several graphical structures simultaneously. When this approach is applied at scale, models that can change their graphical structures dynamically tend to be more robust and reduce the computation time needed to produce ensemble models without compromising their accuracy.

    Villages et quartiers à risque d’abandon

    The issue of villages and neighborhoods at risk of abandonment is common to many Mediterranean regions and is considered a strategic point of the new European policies. The progressive abandonment of inland areas, with phenomena of emigration and fragmentation of cultural heritage, is a common trend in countries characterized by economic underdevelopment. This leads to the decay of architectural artifacts and buildings and to problems with land management. Some aspects of this issue are also found in several urban areas. The goal of this research work is to collect international debates, discussions, opinions and comparisons concerning the analysis, study, surveys, diagnoses and graphical rendering of architectural heritage and landscape, as well as demo-ethno-anthropological witnesses, typological-constructive stratifications, and the materials and technologies of traditional and vernacular constructions of historic buildings.