575 research outputs found

    Tracking Ensemble Performance on Touch-Screens with Gesture Classification and Transition Matrices

    We present and evaluate a novel interface for tracking ensemble performances on touch-screens. The system uses a Random Forest classifier to extract touch-screen gestures and transition matrix statistics. It analyses the resulting gesture-state sequences across an ensemble of performers. A series of specially designed iPad apps respond to this real-time analysis of free-form gestural performances with calculated modifications to their musical interfaces. We describe our system and evaluate it through cross-validation and profiling as well as concert experience.
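    As a rough illustration of the transition matrix statistics mentioned in this abstract, the sketch below estimates a row-normalised transition matrix from a sequence of gesture-state labels. The function name `transition_matrix`, the integer encoding of gesture states, and the handling of unvisited states are assumptions made for this example; the abstract does not specify how the authors compute their statistics.

```python
def transition_matrix(states, n_states):
    """Estimate a row-normalised transition matrix from a state sequence.

    `states` is a list of integer gesture labels in [0, n_states).
    Entry [i][j] is the fraction of transitions out of state i that
    went to state j. Rows for states that never occur stay all zero
    (an assumption of this sketch, not something the abstract states).
    """
    # Count each observed transition between consecutive labels.
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1

    # Normalise each row so that it sums to 1 where data exists.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


if __name__ == "__main__":
    # Hypothetical sequence with two gesture classes (0 and 1).
    print(transition_matrix([0, 1, 1, 0, 1], 2))
```

    A matrix concentrated on the diagonal would suggest performers holding a gesture steadily, while large off-diagonal entries would suggest frequent gesture changes.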

    Music of 18 Performances: Evaluating Apps and Agents with Free Improvisation

    We present a study where a small group of experienced iPad musicians evaluated a system of three musical touch-screen apps and two server-based agents over 18 controlled improvisations. The performers’ perspectives were recorded through surveys, interviews, and interaction data. Our agent classifies the touch gestures of the performers and identifies new sections in the improvisations while a control agent returns similar messages sourced from a statistical model. The three touch-screen apps respond according to design paradigms of reward, support, and disruption. In this study of an ongoing musical practice, significant effects were observed due to the apps’ interfaces and how they respond to agent interactions. The “reward” app received the highest ratings. The results were used to iterate the app designs for later performances

    Apps, Agents, and Improvisation: Ensemble Interaction with Touch-Screen Digital Musical Instruments

    This thesis concerns the making and performing of music with new digital musical instruments (DMIs) designed for ensemble performance. While computer music has advanced to the point where a huge variety of digital instruments are common in educational, recreational, and professional music-making, these instruments rarely seek to enhance the ensemble context in which they are used. Interaction models that map individual gestures to sound have been previously studied, but the interactions of ensembles within these models are not well understood. In this research, new ensemble-focussed instruments have been designed and deployed in an ongoing artistic practice. These instruments have also been evaluated to find out whether, and if so how, they affect the ensembles and music that is made with them. Throughout this thesis, six ensemble-focussed DMIs are introduced for mobile touch-screen computers. A series of improvised rehearsals and performances leads to the identification of a vocabulary of continuous performative touch-gestures and a system for tracking these collaborative performances in real time using tools from machine learning. The tracking system is posed as an intelligent agent that can continually analyse the gestural states of performers, and trigger a response in the performers' user interfaces at appropriate moments. The hypothesis is that the agent interaction and UI response can enhance improvised performances, allowing performers to better explore creative interactions with each other, produce better music, and have a more enjoyable experience. Two formal studies are described where participants rate their perceptions of improvised performances with a variety of designs for agent-app interaction. The first, with three expert performers, informed refinements for a set of apps. The most successful interface was redesigned and investigated further in a second study with 16 non-expert participants. 
In the final interface, each performer freely improvised with a limited number of notes; at moments of peak gestural change, the agent presented users with the opportunity to try different notes. This interface is shown to produce performances that are longer, as well as demonstrate improved perceptions of musical structure, group interaction, enjoyment and overall quality. Overall, this research examined ensemble DMI performance in unprecedented scope and detail, with more than 150 interaction sessions recorded. Informed by the results of lab and field studies using quantitative and qualitative methods, four generations of ensemble-focussed interface have been developed and refined. The results of the most recent studies assure us that the intelligent agent interaction does enhance improvised performances

    Data Driven Analysis of Tiny Touchscreen Performance with MicroJam

    The widespread adoption of mobile devices, such as smartphones and tablets, has made touchscreens a common interface for musical performance. New mobile musical instruments have been designed that embrace collaborative creation and that explore the affordances of mobile devices, as well as their constraints. While these have been investigated from design and user experience perspectives, there is little examination of the performers' musical outputs. In this work, we introduce a constrained touchscreen performance app, MicroJam, designed to enable collaboration between performers, and engage in a novel data-driven analysis of more than 1600 performances using the app. MicroJam constrains performances to five seconds, and emphasises frequent and casual music making through a social media-inspired interface. Performers collaborate by replying to performances, adding new musical layers that are played back at the same time. Our analysis shows that users tend to focus on the centre and diagonals of the touchscreen area, and tend to swirl or swipe rather than tap. We also observe that while long swipes dominate the visual appearance of performances, the majority of interactions are short with limited expressive possibilities. Our findings are summarised into a set of design recommendations for MicroJam and other touchscreen apps for social musical interaction

    Context-aware gestural interaction in the smart environments of the ubiquitous computing era

    A thesis submitted to the University of Bedfordshire in partial fulfilment of the requirements for the degree of Doctor of Philosophy. Technology is becoming pervasive, and current interfaces are not adequate for interaction with the smart environments of the ubiquitous computing era. Recently, researchers have started to address this issue by introducing the concept of the natural user interface, which is mainly based on gestural interactions. Many issues are still open in this emerging domain and, in particular, there is a lack of common guidelines for the coherent implementation of gestural interfaces. This research investigates gestural interactions between humans and smart environments. It proposes a novel framework for the high-level organization of context information. The framework is conceived to support a novel approach that uses functional gestures to reduce gesture ambiguity and the number of gestures in taxonomies, and to improve usability. In order to validate this framework, a proof-of-concept prototype has been developed by implementing a novel method for the view-invariant recognition of deictic and dynamic gestures. Tests have been conducted to assess the gesture recognition accuracy and the usability of the interfaces developed following the proposed framework. The results show that the method provides optimal gesture recognition from very different viewpoints, whilst the usability tests have yielded high scores. Context information has been investigated further by tackling the problem of user status. User status is understood here as human activity, and a technique based on an innovative application of electromyography is proposed. The tests show that the proposed technique achieves good activity recognition accuracy. The context is also treated as system status. In ubiquitous computing, the system can adopt different paradigms: wearable, environmental and pervasive. 
A novel paradigm, called synergistic paradigm, is presented combining the advantages of the wearable and environmental paradigms. Moreover, it augments the interaction possibilities of the user and ensures better gesture recognition accuracy than with the other paradigms

    A Survey of Applications and Human Motion Recognition with Microsoft Kinect

    Microsoft Kinect, a low-cost motion sensing device, enables users to interact with computers or game consoles naturally through gestures and spoken commands without any other peripheral equipment. As such, it has attracted intense interest in research and development on the Kinect technology. In this paper, we present a comprehensive survey of Kinect applications and of the latest research and development on motion recognition using data captured by the Kinect sensor. On the applications front, we review the applications of the Kinect technology in a variety of areas, including healthcare, education and performing arts, robotics, sign language recognition, retail services, workplace safety training, as well as 3D reconstruction. On the technology front, we provide an overview of the main features of both versions of the Kinect sensor together with the depth sensing technologies used, and review the literature on human motion recognition techniques used in Kinect applications. We provide a classification of motion recognition techniques to highlight the different approaches used in human motion recognition. Furthermore, we compile a list of publicly available Kinect datasets. These datasets are valuable resources for researchers to investigate better methods for human motion recognition and lower-level computer vision tasks such as segmentation, object detection and human pose estimation.

    To Draw or Not to Draw: Recognizing Stroke-Hover Intent in Gesture-Free Bare-Hand Mid-Air Drawing Tasks

    Over the past several decades, technological advancements have introduced new modes of communication with computers, prompting a shift away from traditional mouse and keyboard interfaces. While touch-based interactions are widely used today, the latest developments in computer vision, body-tracking stereo cameras, and augmented and virtual reality have now enabled communication with computers using spatial input in physical 3D space. These techniques are now being integrated into several design-critical tasks such as sketching and modeling through sophisticated methodologies and the use of specialized instrumented devices. One of the prime challenges in design research is to make this spatial interaction with the computer as intuitive as possible for users. Drawing curves in mid-air with fingers is a fundamental task with applications to 3D sketching, geometric modeling, handwriting recognition, and authentication. Sketching, in general, is a crucial mode for effective communication of ideas between designers. Mid-air curve input is typically accomplished through instrumented controllers, specific hand postures, or pre-defined hand gestures, in the presence of depth and motion sensing cameras. The user may use any of these modalities to express the intention to start or stop sketching. However, apart from suffering from issues such as lack of robustness, the use of such gestures, specific postures, or instrumented controllers for design-specific tasks further results in an additional cognitive load on the user. To address the problems associated with different mid-air curve input modalities, the presented research discusses the design, development, and evaluation of data-driven models for intent recognition in non-instrumented, gesture-free, bare-hand mid-air drawing tasks. 
The research is motivated by a behavioral study that demonstrates the need for such an approach due to the lack of robustness and intuitiveness when using hand postures and instrumented devices. The main objective is to study how users move during mid-air sketching, develop qualitative insights regarding such movements, and consequently implement a computational approach to determine when the user intends to draw in mid-air without the use of an explicit mechanism (such as an instrumented controller or a specified hand posture). By recording the user’s hand trajectory, the idea is simply to classify each recorded point as either hover or stroke. The resulting model allows for the classification of points on the user’s spatial trajectory. Drawing inspiration from the way users sketch in mid-air, this research first establishes the need for an alternative approach for processing bare-hand mid-air curves in a continuous fashion. Further, this research presents a novel drawing intent recognition workflow for every recorded drawing point, using three different approaches. We begin by recording mid-air drawing data and developing a classification model based on the extracted geometric properties of the recorded data. The main goal behind developing this model is to identify drawing intent from critical geometric and temporal features. In the second approach, we explore the variations in prediction quality of the model by improving the dimensionality of data used as mid-air curve input. In the third approach, we seek to understand the drawing intention from mid-air curves using sophisticated dimensionality reduction neural networks such as autoencoders. Finally, the broad implications of this research are discussed, with potential development areas in the design and research of mid-air interactions.

    Enabling Context-Awareness in Mobile Systems via Multi-Modal Sensing

    The inclusion of rich sensors on modern smartphones has changed mobile phones from simple communication devices to powerful human-centric sensing platforms. Similar trends are influencing other personal gadgets such as tablets, cameras, and wearable devices like Google Glass. Together, these sensors can provide a high-resolution view of the user's context, ranging from simple information like locations and activities, to high-level inferences about the users' intention, behavior, and social interactions. Understanding such context can help solve existing system-side challenges and eventually enable a new world of real-life applications. In this thesis, we propose to learn human behavior via multi-modal sensing. The intuition is that human behaviors leave footprints on different sensing dimensions - visual, acoustic, motion and in cyber space. By collaboratively analyzing these footprints, the system can obtain valuable insights about the user. We show that the analysis results can lead to a series of applications including capturing life-logging videos, tagging user-generated photos and enabling new ways for human-object interactions. Moreover, the same intuition may potentially be applied to enhance existing system-side functionalities - offloading, prefetching and compression.

    Sensing, interpreting, and anticipating human social behaviour in the real world

    Low-level nonverbal social signals like glances, utterances, facial expressions and body language are central to human communicative situations and have been shown to be connected to important high-level constructs, such as emotions, turn-taking, rapport, or leadership. A prerequisite for the creation of social machines that are able to support humans in e.g. education, psychotherapy, or human resources is the ability to automatically sense, interpret, and anticipate human nonverbal behaviour. While promising results have been shown in controlled settings, automatically analysing unconstrained situations, e.g. in daily-life settings, remains challenging. Furthermore, anticipation of nonverbal behaviour in social situations is still largely unexplored. The goal of this thesis is to move closer to the vision of social machines in the real world. It makes fundamental contributions along the three dimensions of sensing, interpreting and anticipating nonverbal behaviour in social interactions. First, robust recognition of low-level nonverbal behaviour lays the groundwork for all further analysis steps. Advancing human visual behaviour sensing is especially relevant as the current state of the art is still not satisfactory in many daily-life situations. While many social interactions take place in groups, current methods for unsupervised eye contact detection can only handle dyadic interactions. We propose a novel unsupervised method for multi-person eye contact detection by exploiting the connection between gaze and speaking turns. Furthermore, we make use of mobile device engagement to address the problem of calibration drift that occurs in daily-life usage of mobile eye trackers. Second, we improve the interpretation of social signals in terms of higher level social behaviours. In particular, we propose the first dataset and method for emotion recognition from bodily expressions of freely moving, unaugmented dyads. 
Furthermore, we are the first to study low rapport detection in group interactions, as well as investigating a cross-dataset evaluation setting for the emergent leadership detection task. Third, human visual behaviour is special because it functions as a social signal and also determines what a person is seeing at a given moment in time. Being able to anticipate human gaze opens up the possibility for machines to more seamlessly share attention with humans, or to intervene in a timely manner if humans are about to overlook important aspects of the environment. We are the first to propose methods for the anticipation of eye contact in dyadic conversations, as well as in the context of mobile device interactions during daily life, thereby paving the way for interfaces that are able to proactively intervene and support interacting humans. [Translated from the German abstract:] Gaze, facial expressions, body language, and prosody play a central role in human communication as nonverbal signals. Numerous studies have linked them to important concepts such as emotions, turn-taking, leadership, and the quality of the relationship between two people. For machines to support people effectively in their daily social lives, automatic methods for recognising, interpreting, and anticipating nonverbal behaviour are needed. Although previous research has produced encouraging results in controlled studies, the automatic analysis of nonverbal behaviour in less controlled situations remains a challenge. Moreover, there are hardly any studies on the anticipation of nonverbal behaviour in social situations. The goal of this thesis is to bring the vision of automatically understanding social situations a step closer to reality. This thesis makes important contributions to the automatic recognition of human gaze behaviour in everyday situations. 
Although many social interactions take place in groups, unsupervised methods for eye contact detection have so far existed only for dyadic interactions. We present a new approach to eye contact detection in groups that requires no manual annotations, because it exploits the statistical relationship between gaze and speaking behaviour. Daily activities are a challenge for mobile eye-tracking devices, since shifts of these devices can degrade their calibration. In this thesis, we use user behaviour on mobile devices to correct the effect of such shifts. Beyond recognition, this thesis also improves the interpretation of social signals. We publish the first dataset and the first method for emotion recognition in dyadic interactions without the use of specialised equipment. In addition, we present the first study on the automatic detection of low rapport in group interactions, and conduct the first cross-dataset evaluation for the detection of emergent leadership. To conclude the thesis, we present the first approaches to anticipating gaze behaviour in social interactions. Gaze behaviour has the special property that it serves both as a social signal and to direct visual perception. The ability to anticipate gaze behaviour therefore allows machines both to integrate more seamlessly into social interactions and to warn people when they are at risk of overlooking important aspects of the environment. We present methods for anticipating gaze behaviour in the context of interaction with mobile devices during daily activities, as well as during dyadic interactions via video telephony.