
    Multidisciplinary perspectives on Artificial Intelligence and the law

    This open access book presents an interdisciplinary, multi-authored, edited collection of chapters on Artificial Intelligence (‘AI’) and the Law. AI technology has come to play a central role in the modern data economy. Through a combination of increased computing power, the growing availability of data and the advancement of algorithms, AI has now become an umbrella term for some of the most transformational technological breakthroughs of this age. The importance of AI stems from both the opportunities that it offers and the challenges that it entails. While AI applications hold the promise of economic growth and efficiency gains, they also create significant risks and uncertainty. The potential and perils of AI have thus come to dominate modern discussions of technology and ethics – and although AI was initially allowed to largely develop without guidelines or rules, few would deny that the law is set to play a fundamental role in shaping the future of AI. As the debate over AI is far from over, the need for rigorous analysis has never been greater. This book thus brings together contributors from different fields and backgrounds to explore how the law might provide answers to some of the most pressing questions raised by AI. An outcome of the Católica Research Centre for the Future of Law and its interdisciplinary working group on Law and Artificial Intelligence, it includes contributions by leading scholars in the fields of technology, ethics and the law.

    FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion

    Speech-driven 3D facial animation synthesis has been a challenging task in both industry and research. Recent methods have mostly focused on deterministic deep learning approaches, meaning that given a speech input, the output is always the same. However, in reality, the non-verbal facial cues that reside throughout the face are non-deterministic in nature. In addition, the majority of approaches focus on 3D vertex-based datasets, and methods compatible with existing facial animation pipelines using rigged characters are scarce. To address these issues, we present FaceDiffuser, a non-deterministic deep learning model for generating speech-driven facial animations that is trained on both 3D vertex- and blendshape-based datasets. Our method is based on the diffusion technique and uses the pre-trained large speech representation model HuBERT to encode the audio input. To the best of our knowledge, we are the first to employ the diffusion method for the task of speech-driven 3D facial animation synthesis. We have run extensive objective and subjective analyses and show that our approach achieves better or comparable results in comparison to the state-of-the-art methods. We also introduce a new in-house dataset based on a blendshape-based rigged character. We recommend watching the accompanying supplementary video. The code and the dataset will be publicly available.
    Comment: Pre-print of the paper accepted at ACM SIGGRAPH MIG 2023
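    As a rough illustration of the technique the abstract describes, the sketch below pairs a DDPM-style noising step with a denoiser conditioned on pre-computed audio features. All names, shapes, and module choices are illustrative assumptions, not the FaceDiffuser implementation; the 768-dimensional audio input merely mirrors the size of HuBERT-base embeddings.

```python
# A minimal sketch, assuming a DDPM-style objective: the model receives a
# noisy motion sequence, a diffusion timestep, and audio features, and is
# trained to predict the added noise. Illustrative only, not FaceDiffuser.
import torch
import torch.nn as nn

class AudioConditionedDenoiser(nn.Module):
    def __init__(self, motion_dim=64, audio_dim=768, hidden=256):
        super().__init__()
        self.time_embed = nn.Sequential(
            nn.Linear(1, hidden), nn.SiLU(), nn.Linear(hidden, hidden))
        self.audio_proj = nn.Linear(audio_dim, hidden)   # e.g. HuBERT-sized input
        self.motion_proj = nn.Linear(motion_dim, hidden)
        self.backbone = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, motion_dim)         # predicted noise

    def forward(self, noisy_motion, audio_feats, t):
        # noisy_motion: (B, T, motion_dim); audio_feats: (B, T, audio_dim); t: (B,)
        h = self.motion_proj(noisy_motion) + self.audio_proj(audio_feats)
        h = h + self.time_embed(t.float().view(-1, 1)).unsqueeze(1)
        h, _ = self.backbone(h)
        return self.out(h)

def training_step(model, motion, audio_feats, alphas_cumprod):
    # Sample a random timestep, noise the clean motion, and regress the noise.
    t = torch.randint(0, len(alphas_cumprod), (motion.size(0),))
    a = alphas_cumprod[t].view(-1, 1, 1)
    noise = torch.randn_like(motion)
    noisy = a.sqrt() * motion + (1 - a).sqrt() * noise
    return nn.functional.mse_loss(model(noisy, audio_feats, t), noise)
```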

    Moving Through Experience: Disruption, Emergence, and the Aesthetic of Repose

    Contemporary aesthetic philosophy engages the notion of aesthetic experience through two conflicting lenses: on one hand are those who support a connection between the aesthetic and the political, while the other favors a more pragmatic position. An as-yet-unexplored area of aesthetic engagement inhabits an intermediary space between these opposing poles, a modality of aesthetic experience I term the aesthetic of repose. This dissertation traces the evolution of ideas regarding aesthetic experience through a survey of several philosophers whose varied perspectives form the foundation for my inquiry. Beginning with an exploration of Immanuel Kant’s Critique of Judgement, proceeding through Friedrich Nietzsche’s Birth of Tragedy, and progressing to John Dewey’s Art as Experience, my aim is, first, to situate their individual aesthetic philosophies within the context of 21st-century aesthetic experience. Despite their differing viewpoints, these thinkers share two things in common: 1) the importance of sense and sensation to valuable aesthetic experience, and 2) a desire to find value and meaning in aesthetic experience for overcoming the ills of humanity and advancing culture. Secondly, this dissertation examines a polarity of ideas that challenge the notion of authentic aesthetic experience in our times. Like their predecessors, contemporary aesthetic philosophers desire to make aesthetic experience a portal for humanity’s recuperation. There are thinkers, such as Jacques Rancière and Santiago Zabala, who advance an aesthetics of action; others, like Richard Shusterman and Hans Ulrich Gumbrecht, advocate for an aesthetics of presence. The aesthetic of repose rests uniquely between action and presence, as an area of slumber where neither action nor presence is necessary. Rather, the idea is to remain in repose, linger there, where repositioning occurs naturally, as though without perception. One emerges from this seemingly imperceptible experience having done nothing save moving through it, yet being forever changed by it.

    Multimodal Assessment of Cognitive Decline: Applications in Alzheimer’s Disease and Depression

    The initial diagnosis and assessment of cognitive decline are generally based on the judgement of clinicians and on commonly used semi-structured interviews, guided by pre-determined sets of topics, in a clinical setting. Publicly available multimodal datasets have provided an opportunity to explore a range of experiments in the automatic detection of cognitive decline. Drawing on the latest developments in representation learning, machine learning, and natural language processing, we seek to develop models capable of identifying cognitive decline, with an eye to discovering the differences and commonalities that should be considered in the computational treatment of mental health disorders. We present models that learn the indicators of cognitive decline from audio and visual modalities as well as lexical, syntactic, disfluency, and pause information. Our study is carried out in two parts: moderation analysis and predictive modelling. We experiment with different fusion techniques, motivated by recent efforts in multimodal fusion for classifying cognitive states, to capture the interaction between modalities and maximise the use and combination of each modality. We create tools for detecting cognitive decline and use them to analyse three major datasets containing speech produced by people with and without cognitive decline. These findings are being used to develop multimodal models for the detection of depression and Alzheimer’s dementia.
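    For readers unfamiliar with multimodal fusion, the toy sketch below shows feature-level (early) fusion: per-modality feature vectors are concatenated and fed to a linear classifier. The feature names, dimensions, and random data are purely hypothetical stand-ins, not the thesis's actual pipeline.

```python
# A toy feature-level fusion baseline on synthetic data. Every feature name,
# dimension, and label below is made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 120                                  # hypothetical number of speakers
audio = rng.normal(size=(n, 40))         # e.g. acoustic/prosodic features
visual = rng.normal(size=(n, 30))        # e.g. facial action units
lexical = rng.normal(size=(n, 50))       # e.g. syntactic/disfluency/pause counts
y = rng.integers(0, 2, size=n)           # decline vs. control labels

# Feature-level fusion: concatenate modalities, then classify.
X = np.concatenate([audio, visual, lexical], axis=1)
clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())
```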

    Towards Video Transformers for Automatic Human Analysis

    With the aim of creating artificial systems capable of mirroring the nuanced understanding and interpretative powers inherent to human cognition, this thesis embarks on an exploration of the intersection between human analysis and Video Transformers. The objective is to harness the potential of Transformers, a promising architectural paradigm, to comprehend the intricacies of human interaction, thus paving the way for the development of empathetic and context-aware intelligent systems. In order to do so, we explore the whole Computer Vision pipeline, from data gathering to deep analysis of recent developments, through model design and experimentation. Central to this study is the creation of UDIVA, an expansive multi-modal, multi-view dataset capturing dyadic face-to-face human interactions. Comprising 147 participants across 188 sessions, UDIVA integrates audio-visual recordings, heart-rate measurements, personality assessments, socio-demographic metadata, and conversational transcripts, establishing itself as the largest dataset for dyadic human interaction analysis to date. This dataset provides a rich context for probing the capabilities of Transformers within complex environments. In order to validate its utility, as well as to elucidate Transformers' ability to assimilate diverse contextual cues, we focus on addressing the challenge of personality regression within interaction scenarios. We first adapt an existing Video Transformer to handle multiple contextual sources and conduct rigorous experimentation. We empirically observe a progressive enhancement in model performance as more context is added, reinforcing the potential of Transformers to decode intricate human dynamics. Building upon these findings, the Dyadformer emerges as a novel architecture, adept at long-range modeling of dyadic interactions. By jointly modeling both participants in the interaction, as well as embedding multi-modal integration into the model itself, the Dyadformer surpasses the baseline and other concurrent approaches, underscoring Transformers' aptitude in deciphering multifaceted, noisy, and challenging tasks such as the analysis of human personality in interaction. Nonetheless, these experiments unveil the ubiquitous challenges of training Transformers, particularly in managing overfitting due to their demand for extensive datasets. Consequently, we conclude this thesis with a comprehensive investigation into Video Transformers, analyzing topics ranging from architectural designs and training strategies to input embedding and tokenization, traversing multi-modality and specific applications. Across these, we highlight trends that optimally harness spatio-temporal representations to handle video redundancy and high dimensionality. A culminating performance comparison is conducted in the realm of video action classification, spotlighting strategies that exhibit superior efficacy, even compared to traditional CNN-based methods.
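    A minimal sketch of the joint dyadic-modeling idea follows, assuming cross-attention between the two participants' clip embeddings; shapes and module choices are guesses for illustration, not the published Dyadformer architecture.

```python
# Illustrative joint modeling of a dyad: each participant's token stream
# attends to the other's, and the pooled result feeds a regression head
# (e.g. Big Five personality traits). Not the published Dyadformer.
import torch
import torch.nn as nn

class DyadicCrossAttention(nn.Module):
    def __init__(self, dim=256, heads=4, n_traits=5):
        super().__init__()
        self.attn_ab = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, n_traits)

    def forward(self, tokens_a, tokens_b):
        # tokens_a, tokens_b: (B, T, dim) per-participant clip embeddings
        a2b, _ = self.attn_ab(tokens_a, tokens_b, tokens_b)  # A attends to B
        b2a, _ = self.attn_ba(tokens_b, tokens_a, tokens_a)  # B attends to A
        joint = torch.cat([a2b, b2a], dim=1).mean(dim=1)     # pool over time
        return self.head(joint)

model = DyadicCrossAttention()
scores = model(torch.randn(2, 16, 256), torch.randn(2, 16, 256))
print(scores.shape)  # torch.Size([2, 5])
```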

    International Academic Symposium of Social Science 2022

    These conference proceedings gather work and research presented at the International Academic Symposium of Social Science 2022 (IASSC2022), held on July 3, 2022, in Kota Bharu, Kelantan, Malaysia. The conference was jointly organized by the Faculty of Information Management of Universiti Teknologi MARA Kelantan Branch, Malaysia; University of Malaya, Malaysia; Universitas Pembangunan Nasional Veteran Jakarta, Indonesia; Universitas Ngudi Waluyo, Indonesia; Camarines Sur Polytechnic Colleges, Philippines; and UCSI University, Malaysia. Featuring experienced keynote speakers from Malaysia, Australia, and England, these proceedings provide an opportunity for researchers, postgraduate students, and industry practitioners to gain knowledge and understanding of advanced topics concerning digital transformation from the perspective of the social sciences and information systems, focusing on issues, challenges, impacts, and theoretical foundations. These proceedings will assist in shaping the future of the academy and industry by compiling state-of-the-art works and future trends in the digital transformation of the social sciences and the field of information systems. The conference also served as an interactive platform enabling academics, practitioners, and students from various institutions and industries to collaborate.

    Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation

    Audio-driven talking-head synthesis is a popular research topic for virtual human-related applications. However, the inflexibility and inefficiency of existing methods, which necessitate expensive end-to-end training to transfer emotions from guidance videos to talking-head predictions, are significant limitations. In this work, we propose the Emotional Adaptation for Audio-driven Talking-head (EAT) method, which transforms emotion-agnostic talking-head models into emotion-controllable ones in a cost-effective and efficient manner through parameter-efficient adaptations. Our approach utilizes a pretrained emotion-agnostic talking-head transformer and introduces three lightweight adaptations (the Deep Emotional Prompts, Emotional Deformation Network, and Emotional Adaptation Module) from different perspectives to enable precise and realistic emotion control. Our experiments demonstrate that our approach achieves state-of-the-art performance on widely used benchmarks, including LRW and MEAD. Additionally, our parameter-efficient adaptations exhibit remarkable generalization ability, even in scenarios where emotional training videos are scarce or nonexistent. Project website: https://yuangan.github.io/eat/
    Comment: Accepted to ICCV 2023.
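    The parameter-efficient adaptation idea can be sketched as follows: freeze a pretrained backbone and train only a handful of new parameters, here learned prompt tokens prepended to the input sequence. Module names and sizes are assumptions for illustration, not the EAT implementation.

```python
# Parameter-efficient adaptation in miniature: the pretrained backbone is
# frozen and only a small set of learned prompt tokens is trained. Names
# and sizes are illustrative, not the EAT modules.
import torch
import torch.nn as nn

class PromptAdaptedBackbone(nn.Module):
    def __init__(self, backbone, dim=256, n_prompts=8):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # keep pretrained weights frozen
        self.prompts = nn.Parameter(torch.zeros(n_prompts, dim))  # trainable

    def forward(self, tokens):
        # tokens: (B, T, dim); prepend the learned prompts to the sequence
        prompts = self.prompts.unsqueeze(0).expand(tokens.size(0), -1, -1)
        return self.backbone(torch.cat([prompts, tokens], dim=1))

layer = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
model = PromptAdaptedBackbone(nn.TransformerEncoder(layer, num_layers=2))
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # only the 8 prompt tokens (8 * 256 parameters) are updated
```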

    A Review of Deep Learning Techniques for Speech Processing

    The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable of extracting intricate features from speech data. This development has paved the way for unparalleled advances in automatic speech recognition, text-to-speech synthesis, and emotion recognition, propelling the performance of these tasks to unprecedented heights. The power of deep learning techniques has opened up new avenues for research and innovation in the field of speech processing, with far-reaching implications for a range of industries and applications. This review paper provides a comprehensive overview of the key deep learning models and their applications in speech-processing tasks. We begin by tracing the evolution of speech processing research, from early approaches, such as MFCCs and HMMs, to more recent advances in deep learning architectures, such as CNNs, RNNs, transformers, conformers, and diffusion models. We categorize the approaches and compare their strengths and weaknesses for solving speech-processing tasks. Furthermore, we extensively cover the various speech-processing tasks, datasets, and benchmarks used in the literature, and describe how different deep-learning networks have been utilized to tackle these tasks. Additionally, we discuss the challenges and future directions of deep learning in speech processing, including the need for more parameter-efficient, interpretable models and the potential of deep learning for multimodal speech processing. By examining the field's evolution, comparing and contrasting different approaches, and highlighting future directions and challenges, we hope to inspire further research in this exciting and rapidly advancing field.
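    As a concrete example of the classical front end the review mentions, the snippet below extracts MFCC features from a waveform with librosa, the kind of hand-crafted representation that HMM-era systems consumed and that modern deep networks largely learn to replace.

```python
# MFCC extraction with librosa, using its bundled example clip.
import librosa

y, sr = librosa.load(librosa.example("trumpet"))    # waveform, sample rate
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # array of shape (13, n_frames)
print(mfcc.shape)
```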

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)

    Creating an innovative, online resource to support teachers in integrating digital literacy skills into the Junior Cycle English Curriculum

    The proliferation of digital tools into all facets of society in recent years has had no small impact on the field of post-primary education. The Irish government’s Digital Strategy for Schools (DSS) (2022, 2015) emphasises the importance of seamlessly integrating digital literacy skills into subject curricula. However, this is a new frontier for Irish post-primary teachers. As a practising teacher, I identified a need for teachers to be supported in embedding digital literacy skills into their teaching practice through the provision of relevant, practical, high-quality resources that met the learning objectives of the subject, and through access to continuous professional development (CPD) in this area. Through this Ph.D. research I created an innovative, online resource to assist post-primary English teachers in integrating digital literacy skills into the Junior Cycle curriculum. Taking a methodologically inventive approach (Dadds and Hart, 2001), I used an Educational Entrepreneurial Approach to Action Research (Crotty, 2014) to explore my passions, skills, values, and work culture, and the literature around digital literacy, digital inequality, and digital natives. This exploratory process led me to a greater understanding of issues of inequality in my own work practice as a teacher in a DEIS school: namely, that students weren’t as digitally literate as we might assume, and that teachers, whose digital literacy also exists on a spectrum, may not have the time, money, or motivation to upskill. I worked with students to create an animated, digital documentary and drew on these experiences to create an innovative, online curriculum for Junior Cycle English teachers with an accompanying online, asynchronous CPD course. This dissertation presents a detailed explanation of this collaborative creative process and draws out the variety of media used to create online digital resources as a means of creating a pluralistic representation of the process. In line with Crotty’s (2014) EEA, the creation of these digital resources proved transformative to me personally and to my school’s digital culture, and continues to impact the wider education sphere.