70 research outputs found

    ZATLAB : recognizing gestures for artistic performance interaction

    Most artistic performances rely on human gestures, ultimately resulting in an elaborate interaction between the performer and the audience. Humans, even without any formal background in music, dance or gesture analysis, are typically able to extract, almost unconsciously, a great amount of relevant information from a gesture. In fact, a gesture carries so much information that it is natural to ask why not use it to further enhance a performance. Gestures and expressive communication are intrinsically connected and, being intimately attached to our daily existence, both have a central position in today's technological society. However, the use of technology to understand gestures is still only vaguely explored: it has moved beyond its first steps, but the path towards systems fully capable of analyzing gestures is still long and difficult (Volpe, 2005). This is probably because, if on one hand the recognition of gestures is a somewhat trivial task for humans, on the other hand the endeavor of translating gestures to the virtual world with a digital encoding is a difficult and ill-defined task. It is necessary to bridge this gap, stimulating a constructive interaction between gestures and technology, culture and science, performance and communication, thus opening new and unexplored frontiers in the design of a novel generation of multimodal interactive systems. This work proposes an interactive, real-time gesture recognition framework called the Zatlab System (ZtS). The framework is flexible and extensible, and is therefore in permanent evolution, keeping up with the different technologies and algorithms that emerge at a fast pace nowadays. The basis of the proposed approach is to partition a temporal stream of captured movement into perceptually motivated descriptive features and transmit them for further processing by machine learning algorithms. The framework takes the view that perception primarily depends on previous knowledge or learning. Just like humans, the framework has to learn gestures and their main features so that it can later identify them; it is, however, planned to be flexible enough to allow learning gestures on the fly. This dissertation also presents a qualitative and quantitative experimental validation of the framework. The qualitative analysis provides results concerning users' acceptance of the framework, while the quantitative validation provides results for the gesture recognition algorithms. The use of machine learning algorithms in these tasks yields final results that compare with or outperform typical and state-of-the-art systems. In addition, two artistic implementations of the framework are presented, assessing its usability in the artistic performance domain. Although a specific implementation of the proposed framework is presented in this dissertation and made available as open-source software, the proposed approach is flexible enough to be used in other scenarios, paving the way to applications that can benefit not only the performative arts domain but also, probably in the near future, other types of communication, such as the gestural sign language used by the hearing impaired.
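As a rough illustration of the kind of processing the abstract describes (partitioning a captured movement stream into descriptive features and handing them to a machine learning algorithm), the sketch below uses hypothetical window sizes, features and a generic scikit-learn classifier; it is not taken from the ZtS source code:

```python
# Minimal illustrative sketch (not the ZtS implementation): segment a stream of
# captured joint positions into windows, compute simple descriptive features,
# and classify the windows with an off-the-shelf machine learning model.
import numpy as np
from sklearn.svm import SVC

def window_features(positions, fps=30.0):
    """positions: (T, 3) array of one joint's 3D positions for a window."""
    vel = np.diff(positions, axis=0) * fps          # frame-to-frame velocity
    acc = np.diff(vel, axis=0) * fps                # acceleration
    return np.array([
        np.linalg.norm(vel, axis=1).mean(),         # mean speed
        np.linalg.norm(vel, axis=1).max(),          # peak speed
        np.linalg.norm(acc, axis=1).mean(),         # mean acceleration magnitude
        np.ptp(positions, axis=0).max(),            # spatial extent of the movement
    ])

def stream_to_features(stream, win=60, hop=30):
    """Slice a (T, 3) movement stream into overlapping windows of features."""
    return np.array([window_features(stream[s:s + win])
                     for s in range(0, len(stream) - win + 1, hop)])

# Training: X_train would be built from labelled gesture recordings, y_train holds labels.
# clf = SVC().fit(X_train, y_train)
# At run time each incoming window is encoded the same way and classified:
# label = clf.predict(stream_to_features(live_stream)[-1:])
```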

    Toward Vision-based Control of Heavy-Duty and Long-Reach Robotic Manipulators

    Heavy-duty mobile machines are an important part of the industry, and they are used for various work tasks in mining, construction, forestry, and agriculture. Many of these machines have heavy-duty, long-reach (HDLR) manipulators attached to them, which are used for work tasks such as drilling, lifting, and grabbing. A robotic manipulator, by definition, is a device used for manipulating materials without direct physical contact by a human operator. HDLR manipulators differ from the manipulators of conventional industrial robots in that they are subject to much larger kinematic and non-kinematic errors, which hinder the overall accuracy and repeatability of the robot's tool center point (TCP). Kinematic errors result from modeling inaccuracies, while non-kinematic errors include structural flexibility and bending, thermal effects, backlash, and sensor resolution. Furthermore, conventional six degrees of freedom (DOF) industrial robots are more general-purpose systems, whereas HDLR manipulators are mostly designed for special (or single) purposes. HDLR manipulators are typically built as lightweight as possible while being able to handle significant load masses. Consequently, they have long reaches and high payload-to-own-weight ratios, which contribute to the increased errors compared to conventional industrial robots. For example, a joint angle measurement error of 0.5° associated with a 5-m-long rigid link results in an error of approximately 4.4 cm at the end of the link, with further errors resulting from flexibility and other non-kinematic aspects. The target TCP positioning accuracy for HDLR manipulators is in the sub-centimeter range, which is very difficult to achieve in practical systems. These challenges have somewhat delayed the automation of HDLR manipulators, while conventional industrial robots have long been commercially available. This is also attributed to the fact that machines with HDLR manipulators have much lower production volumes, and their work tasks are less repetitive than those of conventional industrial robots in factories. Sensors are a key requirement in order to achieve automated operations and eventually full autonomy. For example, humans mostly rely on their visual perception in work tasks, while the collected information is processed in the brain. Much like humans, autonomous machines also require both sensing and intelligent processing of the collected sensor data. This dissertation investigates new visual sensing solutions for HDLR manipulators, which are striving toward increased automation levels in various work tasks. The focus is on visual perception and generic 6 DOF TCP pose estimation of HDLR manipulators in unknown (or unstructured) environments. Methods for increasing the robustness and reliability of visual perception systems are examined by exploiting sensor redundancy and data fusion. Vision-aided control using targetless, motion-based local calibration between an HDLR manipulator and a visual sensor is also proposed to improve the absolute positioning accuracy of the TCP despite the kinematic and non-kinematic errors present in the system. It is experimentally shown that sub-centimeter TCP positioning accuracy was reliably achieved in the tested cases using the developed trajectory-matching-based method. Overall, this compendium thesis includes four publications and one unpublished manuscript related to these topics.
Two main research problems, inspired by the industry, are considered and investigated in the presented publications. The outcome of this thesis provides insight into possible applications and benefits of advanced visual perception systems for HDLR manipulators in dynamic, unstructured environments. The main contribution is related to achieving sub-centimeter TCP positioning accuracy for an HDLR manipulator using a low-cost camera. The numerous challenges and complexities related to HDLR manipulators and visual sensing are also highlighted and discussed.
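The 4.4 cm figure quoted above follows from basic trigonometry: the tip of a rigid link of length L displaced by a joint-angle error dtheta moves by roughly L * sin(dtheta). A quick illustrative check (not code from the thesis):

```python
# Quick check of the error figure quoted in the abstract: a joint-angle error of
# 0.5 deg at the base of a 5 m rigid link displaces the link tip by about L * sin(dtheta).
import math

link_length_m = 5.0
angle_error_deg = 0.5

tip_error_m = link_length_m * math.sin(math.radians(angle_error_deg))
print(f"{tip_error_m * 100:.1f} cm")   # ~4.4 cm, before flexibility and other non-kinematic effects
```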

    New directions in the analysis of movement patterns in space and time


    Intelligent Sensors for Human Motion Analysis

    The book, "Intelligent Sensors for Human Motion Analysis," contains 17 articles published in the Special Issue of the Sensors journal. These articles deal with many aspects related to the analysis of human movement. New techniques and methods for pose estimation, gait recognition, and fall detection have been proposed and verified. Some of them will trigger further research, and some may become the backbone of commercial systems.

    Content rendering and interaction technologies for digital heritage systems

    Existing digital heritage systems accommodate a huge amount of digital repository information; however, their content rendering and interaction components generally lack the more interesting functionality that allows better interaction with heritage contents. Many digital heritage libraries are simply collections of 2D images with associated metadata and textual content, i.e. little more than museum catalogues presented online. However, over the last few years, largely as a result of EU framework projects, some 3D representations of digital heritage objects are beginning to appear in a digital library context. In the cultural heritage domain, where researchers and museum visitors like to observe cultural objects as closely as possible and to feel their existence and use in the past, giving the user only 2D images along with textual descriptions significantly limits interaction and hence understanding of their heritage. The availability of powerful content rendering technologies, such as 3D authoring tools to create 3D objects and heritage scenes, grid tools for rendering complex 3D scenes, gaming engines to display 3D content interactively, and recent advances in motion capture technologies for embodied immersion, allows the development of unique solutions for enhancing user experience and interaction with digital heritage resources and objects, giving a higher level of understanding and greater benefit to the community. This thesis describes DISPLAYS (Digital Library Services for Playing with Shared Heritage Resources), a novel conceptual framework in which five unique services are proposed for digital content: creation, archival, exposition, presentation and interaction services. These services or tools are designed to allow the heritage community to create, interpret, use and explore digital heritage resources organised as an online exhibition (or virtual museum). This thesis presents innovative solutions for two of these services or tools: a content creation service, where a cost-effective render grid is proposed; and an interaction service, where a heritage scenario is presented online using a real-time motion capture and digital puppeteer solution that lets the user explore their digital heritage through embodied, immersive interaction.

    Expressive movement generation with machine learning

    Movement is an essential aspect of our lives. Not only do we move to interact with our physical environment, but we also express ourselves and communicate with others through our movements. In an increasingly computerized world where various technologies and devices surround us, our movements are essential parts of our interaction with, and consumption of, computational devices and artifacts. In this context, incorporating an understanding of our movements within the design of the technologies surrounding us can significantly improve our daily experiences. This need has given rise to the field of movement computing: developing computational models of movement that can perceive, manipulate, and generate movements. In this thesis, we contribute to the field of movement computing by building machine-learning-based solutions for automatic movement generation. In particular, we focus on using machine learning techniques and motion capture data to create controllable, generative movement models. We also contribute to the field through the datasets, tools, and libraries that we have developed during our research. We start by reviewing work on building automatic movement generation systems using machine learning techniques and motion capture data. Our review covers background topics such as high-level movement characterization, training data, feature representation, machine learning models, and evaluation methods. Building on our literature review, we present WalkNet, an interactive agent walking movement controller based on neural networks. The expressivity of virtual, animated agents plays an essential role in their believability. Therefore, WalkNet integrates control of the expressive qualities of movement with the goal-oriented behaviour of an animated virtual agent. It allows us to control the generation in real time based on the valence and arousal levels of affect, the movement's walking direction, and the mover's movement signature. Following WalkNet, we look at controlling movement generation using more complex stimuli such as music represented by audio signals (i.e., non-symbolic music). Music-driven dance generation involves a highly non-linear mapping between temporally dense stimuli (i.e., the audio signal) and movements, which makes the movement modelling problem more challenging. To this end, we present GrooveNet, a real-time machine learning model for music-driven dance generation.
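To make the idea of controllable generation more concrete, the following minimal sketch shows a pose predictor conditioned on affect (valence, arousal) and walking direction, in the spirit of what the abstract describes for WalkNet; the architecture, layer sizes and names are illustrative assumptions, not the model from the thesis:

```python
# Illustrative sketch only (not the WalkNet architecture): a next-pose predictor
# conditioned on affect (valence, arousal) and walking direction, as would be
# trained on motion capture frames. Hypothetical sizes and names throughout.
import torch
import torch.nn as nn

class ConditionalPosePredictor(nn.Module):
    def __init__(self, pose_dim=63, control_dim=3, hidden=256):
        super().__init__()
        # control vector = [valence, arousal, walking direction (radians)]
        self.net = nn.Sequential(
            nn.Linear(pose_dim + control_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, pose_dim),   # predicted next pose (joint coordinates)
        )

    def forward(self, pose, control):
        return self.net(torch.cat([pose, control], dim=-1))

model = ConditionalPosePredictor()
pose = torch.zeros(1, 63)                          # current pose (e.g. 21 joints x 3)
control = torch.tensor([[0.8, 0.3, 1.57]])         # high valence, low arousal, heading ~90 degrees
next_pose = model(pose, control)                   # one step of real-time generation
```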

    I-Light Symposium 2005 Proceedings

    I-Light was made possible by a special appropriation by the State of Indiana. The research described at the I-Light Symposium has been supported by numerous grants from several sources. Any opinions, findings and conclusions, or recommendations expressed in the 2005 I-Light Symposium Proceedings are those of the researchers and authors and do not necessarily reflect the views of the granting agencies. Indiana University Office of the Vice President for Research and Information Technology, Purdue University Office of the Vice President for Information Technology and CI

    Poetics of Artificial Intelligence in Art Practice: (Mis)apprehended Bodies Remixed as Language

    With a focus on the last five years, art employing artificial intelligence (AI) has been defined by a spectrum of activity, from the deep learning explorations of neural network researchers to artists critiquing the broader social implications of AI technology. New tools and techniques for repurposing and manipulating material in unprecedented ways are emerging and becoming increasingly accessible to art; at the same time, there is a dearth of language outside the scientific domain with which to discuss it. A combination of contextual review, comparison of artistic approaches, and practical projects explores the speculation that the conceptual repertoire of remix studies can open up to art enabled by AI and machine learning (ML). This research contributes a practical, conceptual and combinatorial approach for artists who do not necessarily have a grounding in engineering or computer science. A bricolage methodology, described by Annette Markham as combining serendipity, proximity and contingency, reveals the poetics of AI-enabled art in the form of an assemblage of techniques that understands poetics as active making (poiesis) as well as an approach to manipulating language. The poetic capacity of AI/ML is understood as an emergent form of remix technique, with the ML at its core functioning like a remix engine. This practice-based research presents several projects founded on an interrelation of body, text and predictive technology, enabled by a human-action-recognition algorithm combined with a natural language generator. A significant number of artistic works have been made around object and facial recognition, while very little (if any) artistic activity has focused on human action recognition. For this reason, I concentrated my research there.
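For readers unfamiliar with the underlying plumbing, the following hypothetical sketch shows one way an action label recognised from movement could seed a natural language generator; it is only an illustration of the general pipeline, not the artist's actual system, and the recogniser is stubbed out:

```python
# Hypothetical sketch of the kind of pipeline described above: an action label
# recognised from movement data is handed to a text generator as a prompt seed.
# The recogniser is a placeholder; the generator uses the Hugging Face pipeline API.
from transformers import pipeline

def recognise_action(pose_sequence):
    # Placeholder for a human-action-recognition model; it would return a label
    # such as "reaching", "turning" or "falling" for a window of captured poses.
    return "reaching"

generator = pipeline("text-generation", model="gpt2")

action = recognise_action(pose_sequence=None)
prompt = f"A body caught in the act of {action},"
print(generator(prompt, max_new_tokens=30)[0]["generated_text"])
```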

    Multimodal machine learning for intelligent mobility

    Scientific problems are solved by finding the optimal solution for a specific task. Some problems can be solved analytically, while other problems are solved using data-driven methods. The use of digital technologies to improve the transportation of people and goods, which is referred to as intelligent mobility, is one of the principal beneficiaries of data-driven solutions. Autonomous vehicles are at the heart of the developments that propel intelligent mobility. Due to the high dimensionality and complexity of real-world environments, data-driven solutions need to become commonplace in intelligent mobility, as it is near impossible to manually program decision-making logic for every eventuality. While recent developments in data-driven solutions such as deep learning enable machines to learn effectively from large datasets, the application of these techniques within safety-critical systems such as driverless cars remains scarce. Autonomous vehicles need to be able to make context-driven decisions autonomously in the different environments in which they operate. The recent literature on driverless vehicle research is heavily focused on road or highway environments and has discounted pedestrianized areas and indoor environments. These unstructured environments tend to have more clutter and change rapidly over time. Therefore, for intelligent mobility to make a significant impact on human life, it is vital to extend its application beyond structured environments. To further advance intelligent mobility, researchers need to take cues from multiple sensor streams and multiple machine learning algorithms so that decisions can be robust and reliable. Only then will machines indeed be able to operate safely in unstructured and dynamic environments. Towards addressing these limitations, this thesis investigates data-driven solutions for crucial building blocks of intelligent mobility. Specifically, the thesis investigates multimodal sensor data fusion, machine learning, multimodal deep representation learning and their application to intelligent mobility. This work demonstrates that mobile robots can use multimodal machine learning to derive a driving policy and therefore make autonomous decisions. To facilitate the autonomous decisions necessary for safe driving algorithms, we present algorithms for free-space detection and human activity recognition. Driving these decision-making algorithms are specific datasets collected throughout this study: the Loughborough London Autonomous Vehicle dataset and the Loughborough London Human Activity Recognition dataset. The datasets were collected using an autonomous platform designed and developed in-house as part of this research activity. The proposed framework for free-space detection is based on an active learning paradigm that leverages the relative uncertainty of multimodal sensor data streams (ultrasound and camera). It utilizes an online learning methodology to continuously update the learnt model whenever the vehicle experiences new environments. The proposed free-space detection algorithm enables an autonomous vehicle to self-learn, evolve and adapt to environments never encountered before. The results illustrate that the online learning mechanism is superior to one-off training of deep neural networks, which require large datasets to generalize to unfamiliar surroundings. The thesis takes the view that humans should be at the centre of any technological development related to artificial intelligence.
Within the spectrum of intelligent mobility, it is imperative that an autonomous vehicle be aware of what humans are doing in its vicinity. Towards improving the robustness of human activity recognition, this thesis proposes a novel algorithm that classifies point-cloud data originating from Light Detection and Ranging (LiDAR) sensors. The proposed algorithm leverages multimodality by using the camera data to identify humans and segment the region of interest in the point-cloud data. The corresponding 3-dimensional data are converted to a Fisher Vector representation before being classified by a deep Convolutional Neural Network. The proposed algorithm classifies the indoor activities performed by a human subject with an average precision of 90.3%. When compared to an alternative point-cloud classifier, PointNet [1], [2], the proposed framework outperformed it on all classes. The developed autonomous testbed for data collection and algorithm validation, as well as the multimodal data-driven solutions for driverless cars, are the major contributions of this thesis. It is anticipated that these results and the testbed will have significant implications for the future of intelligent mobility by amplifying the development of intelligent driverless vehicles.
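As an illustration of the camera-guided segmentation step described above (identifying humans in the image and using the detection to select the region of interest in the point cloud), the sketch below assumes a known camera-LiDAR calibration and an external person detector; names and details are illustrative, not the thesis implementation:

```python
# Illustrative sketch (not the thesis implementation): use a camera-based person
# detection to select the region of interest in a LiDAR point cloud before the
# points are encoded and classified. Calibration matrices and the detector are assumed.
import numpy as np

def project_points(points_xyz, K, T_cam_lidar):
    """Project Nx3 LiDAR points into image pixel coordinates."""
    homog = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])
    cam = (T_cam_lidar @ homog.T).T[:, :3]             # points in the camera frame
    in_front = cam[:, 2] > 0                           # keep points in front of the camera
    pix = (K @ cam.T).T
    pix = pix[:, :2] / pix[:, 2:3]                     # perspective divide
    return pix, in_front

def crop_to_detection(points_xyz, bbox, K, T_cam_lidar):
    """Keep only the points that project inside the detected person's bounding box."""
    (x0, y0, x1, y1) = bbox                            # from a camera person detector
    pix, in_front = project_points(points_xyz, K, T_cam_lidar)
    inside = (pix[:, 0] >= x0) & (pix[:, 0] <= x1) & \
             (pix[:, 1] >= y0) & (pix[:, 1] <= y1) & in_front
    return points_xyz[inside]

# roi_points = crop_to_detection(lidar_scan, detector_bbox, K, T_cam_lidar)
# The cropped points would then be encoded (e.g. as a Fisher vector) and passed
# to a CNN classifier to label the activity being performed.
```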