9 research outputs found

    En studie om hur framtida landskapsarkitektur kan pÄverkas av Artificiell Intelligens

    Get PDF
    Det hÀr arbetet tar upp hur Artifciell Intelligens med en större sannolikhet kommer att pÄverka arkitektyrkena. Arbetet tar upp Artifciell Intelligens i form av Machine Learning, Neural Networks och GANs tillsammans med omrÄdena regression, klassifcering och kreativ AI. Genom litterÀra studier och kodning har teorier, statistik och praktisk kunskap sammanfogats för att hantera frÄgan. Slutsatsen lyder att Artifciell Intelligens kommer till största delen att pÄverka hur stÀder planeras, men kommer Àven pÄverka arkitektens arbete genom att ge möjligheten till att generera bÄde bilder, stadsplanering och arkitektur.Tis work addresses how Artifcial Intelligence, with a greater probability, will affect the architectural professions. Te work addresses Artifcial Intelligence in the form of Machine Learning, Neural Networks and GANs divided into the areas of regression, classifcation and creative AI. Trough literary studies and coding, theories, statistics, and practical knowledge have been combined to deal with the issue. Te conclusion is that Artifcial Intelligence will largely infuence how cities are planned, but will also affect the architect's work by providing the opportunity to generate images, urban planning, and architecture

    Emotional body language synthesis for humanoid robots

    Get PDF
    Some of the chapters of this thesis are based on research published by the author. Chapter 4 is based on Marmpena M., Lim, A., and Dahl, T. S. (2018). How does the robot feel? Perception of valence and arousal in emotional body language. Paladyn, Journal of Behavioral Robotics, 9(1), 168-182. DOI: https://doi.org/10.1515/pjbr-2018-0012. Chapter 6 is based on Marmpena M., Lim, A., Dahl, T. S., and Hemion, N. (2019). Generating robotic emotional body language with Variational Autoencoders. In Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction (ACII), pages 545–551. DOI:10.1109/ACII.2019.8925459. Chapter 7 extends Marmpena M., Garcia, F., and Lim, A. (2020). Generating robotic emotional body language of targeted valence and arousal with Conditional Variational Autoencoders. In Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, HRI ’20, page 357–359. DOI: https://doi.org/10.1145/3371382.3378360. The designed or generated robotic emotional body language expressions data presented in this thesis are publicly available: https://github.com/minamar/rebl-pepper-dataIn the next decade, societies will witness a rise in service robots deployed in social environments, such as schools, homes, or shops, where they will operate as assistants, public relation agents, or companions. People are expected to willingly engage and collaborate with these robots to accomplish positive outcomes. To facilitate collaboration, robots need to comply with the behavioural and social norms used by humans in their daily interactions. One such behavioural norm is the expression of emotion through body language. Previous work on emotional body language synthesis for humanoid robots has been mainly focused on hand-coded design methods, often employing features extracted from human body language. However, the hand-coded design is cumbersome and results in a limited number of expressions with low variability. This limitation can be at the expense of user engagement since the robotic behaviours will appear repetitive and predictable, especially in long-term interaction. Furthermore, design approaches strictly based on human emotional body language might not transfer effectively on robots because of their simpler morphology. Finally, most previous work is using six or fewer basic emotion categories in the design and the evaluation phase of emotional expressions. This approach might result in lossy compression of the granularity in emotion expression. The current thesis presents a methodology for developing a complete framework of emotional body language generation for a humanoid robot, intending to address these three limitations. Our starting point is a small set of animations designed by professional animators with the robot morphology in mind. We conducted an initial user study to acquire reliable dimensional labels of valence and arousal for each animation. In the next step, we used the motion sequences from these animations to train a Variational Autoencoder, a deep learning model, to generate numerous new animations in an unsupervised setting. Finally, we extended the model to condition the generative process with valence and arousal attributes, and we conducted a user study to evaluate the interpretability of the animations in terms of valence, arousal, and dominance. The results indicate moderate to strong interpretability

    Real-time generation and adaptation of social companion robot behaviors

    Get PDF
    Social robots will be part of our future homes. They will assist us in everyday tasks, entertain us, and provide helpful advice. However, the technology still faces challenges that must be overcome to equip the machine with social competencies and make it a socially intelligent and accepted housemate. An essential skill of every social robot is verbal and non-verbal communication. In contrast to voice assistants, smartphones, and smart home technology, which are already part of many people's lives today, social robots have an embodiment that raises expectations towards the machine. Their anthropomorphic or zoomorphic appearance suggests they can communicate naturally with speech, gestures, or facial expressions and understand corresponding human behaviors. In addition, robots also need to consider individual users' preferences: everybody is shaped by their culture, social norms, and life experiences, resulting in different expectations towards communication with a robot. However, robots do not have human intuition - they must be equipped with the corresponding algorithmic solutions to these problems. This thesis investigates the use of reinforcement learning to adapt the robot's verbal and non-verbal communication to the user's needs and preferences. Such non-functional adaptation of the robot's behaviors primarily aims to improve the user experience and the robot's perceived social intelligence. The literature has not yet provided a holistic view of the overall challenge: real-time adaptation requires control over the robot's multimodal behavior generation, an understanding of human feedback, and an algorithmic basis for machine learning. Thus, this thesis develops a conceptual framework for designing real-time non-functional social robot behavior adaptation with reinforcement learning. It provides a higher-level view from the system designer's perspective and guidance from the start to the end. It illustrates the process of modeling, simulating, and evaluating such adaptation processes. Specifically, it guides the integration of human feedback and social signals to equip the machine with social awareness. The conceptual framework is put into practice for several use cases, resulting in technical proofs of concept and research prototypes. They are evaluated in the lab and in in-situ studies. These approaches address typical activities in domestic environments, focussing on the robot's expression of personality, persona, politeness, and humor. Within this scope, the robot adapts its spoken utterances, prosody, and animations based on human explicit or implicit feedback.Soziale Roboter werden Teil unseres zukĂŒnftigen Zuhauses sein. Sie werden uns bei alltĂ€glichen Aufgaben unterstĂŒtzen, uns unterhalten und uns mit hilfreichen RatschlĂ€gen versorgen. Noch gibt es allerdings technische Herausforderungen, die zunĂ€chst ĂŒberwunden werden mĂŒssen, um die Maschine mit sozialen Kompetenzen auszustatten und zu einem sozial intelligenten und akzeptierten Mitbewohner zu machen. Eine wesentliche FĂ€higkeit eines jeden sozialen Roboters ist die verbale und nonverbale Kommunikation. Im Gegensatz zu Sprachassistenten, Smartphones und Smart-Home-Technologien, die bereits heute Teil des Lebens vieler Menschen sind, haben soziale Roboter eine Verkörperung, die Erwartungen an die Maschine weckt. Ihr anthropomorphes oder zoomorphes Aussehen legt nahe, dass sie in der Lage sind, auf natĂŒrliche Weise mit Sprache, Gestik oder Mimik zu kommunizieren, aber auch entsprechende menschliche Kommunikation zu verstehen. DarĂŒber hinaus mĂŒssen Roboter auch die individuellen Vorlieben der Benutzer berĂŒcksichtigen. So ist jeder Mensch von seiner Kultur, sozialen Normen und eigenen Lebenserfahrungen geprĂ€gt, was zu unterschiedlichen Erwartungen an die Kommunikation mit einem Roboter fĂŒhrt. Roboter haben jedoch keine menschliche Intuition - sie mĂŒssen mit entsprechenden Algorithmen fĂŒr diese Probleme ausgestattet werden. In dieser Arbeit wird der Einsatz von bestĂ€rkendem Lernen untersucht, um die verbale und nonverbale Kommunikation des Roboters an die BedĂŒrfnisse und Vorlieben des Benutzers anzupassen. Eine solche nicht-funktionale Anpassung des Roboterverhaltens zielt in erster Linie darauf ab, das Benutzererlebnis und die wahrgenommene soziale Intelligenz des Roboters zu verbessern. Die Literatur bietet bisher keine ganzheitliche Sicht auf diese Herausforderung: Echtzeitanpassung erfordert die Kontrolle ĂŒber die multimodale Verhaltenserzeugung des Roboters, ein VerstĂ€ndnis des menschlichen Feedbacks und eine algorithmische Basis fĂŒr maschinelles Lernen. Daher wird in dieser Arbeit ein konzeptioneller Rahmen fĂŒr die Gestaltung von nicht-funktionaler Anpassung der Kommunikation sozialer Roboter mit bestĂ€rkendem Lernen entwickelt. Er bietet eine ĂŒbergeordnete Sichtweise aus der Perspektive des Systemdesigners und eine Anleitung vom Anfang bis zum Ende. Er veranschaulicht den Prozess der Modellierung, Simulation und Evaluierung solcher Anpassungsprozesse. Insbesondere wird auf die Integration von menschlichem Feedback und sozialen Signalen eingegangen, um die Maschine mit sozialem Bewusstsein auszustatten. Der konzeptionelle Rahmen wird fĂŒr mehrere AnwendungsfĂ€lle in die Praxis umgesetzt, was zu technischen Konzeptnachweisen und Forschungsprototypen fĂŒhrt, die in Labor- und In-situ-Studien evaluiert werden. Diese AnsĂ€tze befassen sich mit typischen AktivitĂ€ten in hĂ€uslichen Umgebungen, wobei der Schwerpunkt auf dem Ausdruck der Persönlichkeit, dem Persona, der Höflichkeit und dem Humor des Roboters liegt. In diesem Rahmen passt der Roboter seine Sprache, Prosodie, und Animationen auf Basis expliziten oder impliziten menschlichen Feedbacks an

    An HCI-Centric Survey and Taxonomy of Human-Generative-AI Interactions

    Full text link
    Generative AI (GenAI) has shown remarkable capabilities in generating diverse and realistic content across different formats like images, videos, and text. In Generative AI, human involvement is essential, thus HCI literature has investigated how to effectively create collaborations between humans and GenAI systems. However, the current literature lacks a comprehensive framework to better understand Human-GenAI Interactions, as the holistic aspects of human-centered GenAI systems are rarely analyzed systematically. In this paper, we present a survey of 291 papers, providing a novel taxonomy and analysis of Human-GenAI Interactions from both human and Gen-AI perspectives. The dimensions of design space include 1) Purposes of Using Generative AI, 2) Feedback from Models to Users, 3) Control from Users to Models, 4) Levels of Engagement, 5) Application Domains, and 6) Evaluation Strategies. Our work is also timely at the current development stage of GenAI, where the Human-GenAI interaction design is of paramount importance. We also highlight challenges and opportunities to guide the design of Gen-AI systems and interactions towards the future design of human-centered Generative AI applications

    Proceedings of the 9th Arab Society for Computer Aided Architectural Design (ASCAAD) international conference 2021 (ASCAAD 2021): architecture in the age of disruptive technologies: transformation and challenges.

    Get PDF
    The ASCAAD 2021 conference theme is Architecture in the age of disruptive technologies: transformation and challenges. The theme addresses the gradual shift in computational design from prototypical morphogenetic-centered associations in the architectural discourse. This imminent shift of focus is increasingly stirring a debate in the architectural community and is provoking a much needed critical questioning of the role of computation in architecture as a sole embodiment and enactment of technical dimensions, into one that rather deliberately pursues and embraces the humanities as an ultimate aspiration

    Constrained Affective Computing

    Get PDF

    Generation of realistic human behaviour

    Get PDF
    As the use of computers and robots in our everyday lives increases so does the need for better interaction with these devices. Human-computer interaction relies on the ability to understand and generate human behavioural signals such as speech, facial expressions and motion. This thesis deals with the synthesis and evaluation of such signals, focusing not only on their intelligibility but also on their realism. Since these signals are often correlated, it is common for methods to drive the generation of one signal using another. The thesis begins by tackling the problem of speech-driven facial animation and proposing models capable of producing realistic animations from a single image and an audio clip. The goal of these models is to produce a video of a target person, whose lips move in accordance with the driving audio. Particular focus is also placed on a) generating spontaneous expression such as blinks, b) achieving audio-visual synchrony and c) transferring or producing natural head motion. The second problem addressed in this thesis is that of video-driven speech reconstruction, which aims at converting a silent video into waveforms containing speech. The method proposed for solving this problem is capable of generating intelligible and accurate speech for both seen and unseen speakers. The spoken content is correctly captured thanks to a perceptual loss, which uses features from pre-trained speech-driven animation models. The ability of the video-to-speech model to run in real-time allows its use in hearing assistive devices and telecommunications. The final work proposed in this thesis is a generic domain translation system, that can be used for any translation problem including those mapping across different modalities. The framework is made up of two networks performing translations in opposite directions and can be successfully applied to solve diverse sets of translation problems, including speech-driven animation and video-driven speech reconstruction.Open Acces

    Spatio-Temporal Multimedia Big Data Analytics Using Deep Neural Networks

    Get PDF
    With the proliferation of online services and mobile technologies, the world has stepped into a multimedia big data era, where new opportunities and challenges appear with the high diversity multimedia data together with the huge amount of social data. Nowadays, multimedia data consisting of audio, text, image, and video has grown tremendously. With such an increase in the amount of multimedia data, the main question raised is how one can analyze this high volume and variety of data in an efficient and effective way. A vast amount of research work has been done in the multimedia area, targeting different aspects of big data analytics, such as the capture, storage, indexing, mining, and retrieval of multimedia big data. However, there is insufficient research that provides a comprehensive framework for multimedia big data analytics and management. To address the major challenges in this area, a new framework is proposed based on deep neural networks for multimedia semantic concept detection with a focus on spatio-temporal information analysis and rare event detection. The proposed framework is able to discover the pattern and knowledge of multimedia data using both static deep data representation and temporal semantics. Specifically, it is designed to handle data with skewed distributions. The proposed framework includes the following components: (1) a synthetic data generation component based on simulation and adversarial networks for data augmentation and deep learning training, (2) an automatic sampling model to overcome the imbalanced data issue in multimedia data, (3) a deep representation learning model leveraging novel deep learning techniques to generate the most discriminative static features from multimedia data, (4) an automatic hyper-parameter learning component for faster training and convergence of the learning models, (5) a spatio-temporal deep learning model to analyze dynamic features from multimedia data, and finally (6) a multimodal deep learning fusion model to integrate different data modalities. The whole framework has been evaluated using various large-scale multimedia datasets that include the newly collected disaster-events video dataset and other public datasets
    corecore