
    Automatic Sign Language Recognition from Image Data

    This thesis addresses several issues of automatic sign language recognition, namely the creation of a vision-based sign language recognition framework, sign language corpora creation, feature extraction, novel hand tracking with face-occlusion handling, data-driven creation of sub-units, and a "search by example" tool for searching sign language corpora using hand images as a query. The proposed recognition framework, based on a statistical approach incorporating hidden Markov models (HMMs), consists of video analysis, sign modeling and decoding modules. The framework is able to recognize both isolated signs and continuous utterances from video data. All experiments and evaluations were performed on two in-house corpora, UWB-06-SLR-A and UWB-07-SLR-P, the first containing 25 signs and the second 378. Low-level image features are used as baseline feature descriptors. It is shown that better performance is obtained with higher-level features that employ hand tracking and resolve occlusions between the hands and the face. As a side effect, the occlusion-handling method interpolates the face area in frames during an occlusion and thereby allows the use of face feature descriptors that would otherwise fail in such cases, for instance features extracted by an active appearance models (AAM) tracker. Several state-of-the-art appearance-based feature descriptors were compared for the tracked hands, such as local binary patterns (LBP), histograms of oriented gradients (HOG), high-level linguistic features, and the newly proposed hand shape radial distance function (hRDF), which enhances the description of hand-shape features such as concave regions. The concept of sub-units, which uses HMMs based on linguistic units smaller than whole signs and captures the inner structure of signs, was investigated through a proposed iterative method that is the first required step for data-driven construction of sub-units; the results show that this concept is suitable for sign modeling and recognition tasks. In addition to the recognition experiments, a "search by example" tool was created and evaluated. This tool is a search engine for sign language videos and can be incorporated into an online sign language dictionary, where searching the sign language data is currently difficult or impossible. The tool employs several methods examined in the recognition task and allows searching the video corpora with a user-given query consisting of one or more images of hands, for example captured by a webcam. The result is an ordered list of videos that contain the same or a similar hand configuration.
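
    To make the radial-distance idea behind such hand-shape descriptors concrete, the following is a minimal sketch of a radial-distance descriptor computed from a binary hand mask. It is not the thesis's exact hRDF formulation; the function name, bin count and normalisation are illustrative assumptions.

```python
import numpy as np

def radial_distance_descriptor(hand_mask: np.ndarray, num_bins: int = 36) -> np.ndarray:
    """Describe a binary hand mask by the farthest mask pixel from the
    centroid in each of `num_bins` angular directions (scale-normalised)."""
    ys, xs = np.nonzero(hand_mask)
    if ys.size == 0:
        return np.zeros(num_bins)
    cy, cx = ys.mean(), xs.mean()
    angles = np.arctan2(ys - cy, xs - cx)            # angle of each mask pixel
    dists = np.hypot(ys - cy, xs - cx)               # distance from the centroid
    bins = ((angles + np.pi) / (2 * np.pi) * num_bins).astype(int) % num_bins
    desc = np.zeros(num_bins)
    np.maximum.at(desc, bins, dists)                 # farthest pixel per angular bin
    return desc / (desc.max() + 1e-9)                # normalise for scale invariance

# Toy usage: a rectangular "hand" blob in a 64x64 frame.
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 25:45] = True
print(radial_distance_descriptor(mask).round(2))
```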

    Data-driven Communicative Behaviour Generation: A Survey

    The development of data-driven behaviour generating systems has recently become the focus of considerable attention in the fields of human–agent interaction and human–robot interaction. Although rule-based approaches were dominant for years, these proved inflexible and expensive to develop. The difficulty of developing production rules, as well as the need for manual configuration to generate artificial behaviours, places a limit on how complex and diverse rule-based behaviours can be. In contrast, actual human–human interaction data collected using tracking and recording devices makes humanlike multimodal co-speech behaviour generation possible using machine learning and specifically, in recent years, deep learning. This survey provides an overview of the state of the art of deep learning-based co-speech behaviour generation models and offers an outlook for future research in this area.
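
    As a hedged illustration of the kind of data-driven model such surveys cover, the sketch below maps a sequence of per-frame speech features to a sequence of motion parameters with a small recurrent network. The class, dimensions and feature choices are illustrative assumptions, not a model from the survey.

```python
import torch
import torch.nn as nn

class SpeechToGesture(nn.Module):
    """Toy speech-to-motion regressor: a GRU over per-frame audio features
    with a linear read-out to motion (e.g. joint-rotation) parameters."""
    def __init__(self, audio_dim: int = 26, hidden_dim: int = 128, motion_dim: int = 45):
        super().__init__()
        self.encoder = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        self.readout = nn.Linear(hidden_dim, motion_dim)

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # audio: (batch, frames, audio_dim) -> motion: (batch, frames, motion_dim)
        hidden, _ = self.encoder(audio)
        return self.readout(hidden)

# Toy usage: one 100-frame utterance with 26-dimensional audio features.
model = SpeechToGesture()
motion = model(torch.randn(1, 100, 26))
print(motion.shape)  # torch.Size([1, 100, 45])
```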

    Face Detection and Verification using Local Binary Patterns

    This thesis proposes a robust Automatic Face Verification (AFV) system using Local Binary Patterns (LBP). AFV is mainly composed of two modules: Face Detection (FD) and Face Verification (FV). The purpose of FD is to determine whether there are any faces in an image, while FV involves confirming or denying the identity claimed by a person. The contributions of this thesis are the following: 1) a real-time multiview FD system which is robust to illumination and partial occlusion, 2) an FV system based on the adaptation of LBP features, 3) an extensive study of the performance evaluation of FD algorithms, in particular the effect of FD errors on FV performance. The first part of the thesis addresses the problem of frontal FD. We introduce the system of Viola and Jones, which is the first real-time frontal face detector. One of its limitations is its sensitivity to local lighting variations and partial occlusion of the face. To cope with these limitations, we propose to use LBP features. Special emphasis is given to the scanning process and to the merging of overlapping detections, because both have a significant impact on performance. We then extend our frontal FD module to multiview FD. In the second part, we present a novel generative approach to FV based on an LBP description of the face. The main advantages compared to previous approaches are a very fast and simple training procedure and robustness to bad lighting conditions. In the third part, we address the problem of estimating the quality of FD. We first show the influence of FD errors on the FV task and then empirically demonstrate the limitations of current detection measures when applied to this task. In order to properly evaluate the performance of a face detection module, we propose to embed FV into the performance-measuring process. We show empirically that the proposed methodology better matches the final FV performance.
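
    As a hedged illustration of an LBP description of a face region (not the thesis's exact configuration; the patch, radius and neighbourhood size are assumptions), the sketch below pools uniform LBP codes into a normalised histogram using scikit-image.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_face: np.ndarray, points: int = 8, radius: float = 1.0) -> np.ndarray:
    """Uniform LBP codes over a grayscale face crop, pooled into a normalised
    histogram that could feed a detector or verifier."""
    codes = local_binary_pattern(gray_face, P=points, R=radius, method="uniform")
    n_bins = points + 2                        # uniform patterns plus one "non-uniform" bin
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / (hist.sum() + 1e-9)

# Toy usage with a random patch; replace with a real grayscale face crop.
patch = (np.random.rand(64, 64) * 255).astype(np.uint8)
print(lbp_histogram(patch).round(3))
```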

    Gesture recognition with application to human-robot interaction

    Gestures are a natural form of communication, often transcending language barriers. Recently, much research has focused on achieving natural human-machine interaction using gestures. This dissertation presents the design of a gestural interface that can be used to control a robot. The system consists of two modes: far-mode and near-mode. In far-mode interaction, upper-body gestures are used to control the motion of a robot. Near-mode interaction uses static hand poses to control a graphical user interface. For upper-body gesture recognition, features are extracted from skeletal data. The extracted features consist of joint angles and relative joint positions and are extracted for each frame of the gesture sequence. A novel key-frame selection algorithm is used to align the gesture sequences temporally. A neural network and a hidden Markov model are then used to classify the gestures. The framework was tested on three datasets: the CMU Military dataset (3 users, 15 gestures, 10 repetitions per gesture), the VisApp2013 dataset (28 users, 8 gestures, 1 repetition per gesture) and a recorded dataset (15 users, 10 gestures, 3 repetitions per gesture). The system is shown to achieve a recognition rate of 100% across the three datasets using the key-frame selection and a neural network for gesture identification. Static hand-gesture recognition is achieved by first retrieving the 24-DOF hand model. The hand is segmented from the image using both depth and colour information. A novel calibration method is then used to automatically obtain the anthropometric measurements of the user’s hand. The k-curvature algorithm, a depth-based method and a parallel border-based method are used to detect fingertips in the image, with an average detection accuracy of 88%. A neural network and a k-means classifier are then used to classify the static hand gestures. This framework was tested on a dataset of 15 users, 12 gestures and 3 repetitions per gesture; a correct classification rate of 75% is achieved using the neural network. It is shown that the proposed system is robust to changes in skin colour and user hand size.
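
    To make the skeletal features concrete, here is a minimal sketch (an illustrative assumption, not the dissertation's code) of computing a joint angle and a relative joint position for one frame of 3-D skeleton data.

```python
import numpy as np

def joint_angle(parent: np.ndarray, joint: np.ndarray, child: np.ndarray) -> float:
    """Angle in radians at `joint` between the bones joint->parent and joint->child."""
    u, v = parent - joint, child - joint
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def relative_position(joint: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Joint position expressed relative to a reference joint (e.g. the torso)."""
    return joint - reference

# Toy frame: 3-D positions for shoulder, elbow, wrist and torso (metres).
shoulder = np.array([0.0, 1.4, 0.0])
elbow = np.array([0.3, 1.2, 0.0])
wrist = np.array([0.5, 1.4, 0.1])
torso = np.array([0.0, 1.0, 0.0])
print(joint_angle(shoulder, elbow, wrist), relative_position(wrist, torso))
```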

    Visuelle Benutzermodellierung mit Tracking und Zeigegestenerkennung für einen humanoiden Roboter


    Digital Interaction and Machine Intelligence

    This book is open access, which means that you have free and unlimited access. It presents the proceedings of the 9th Machine Intelligence and Digital Interaction (MIDI) Conference. Significant progress in the development of artificial intelligence (AI) and its wider use in many interactive products are quickly transforming further areas of our lives, resulting in the emergence of various new social phenomena. Many countries have been making efforts to understand these phenomena and to find answers to the question of how to put the development of artificial intelligence on the right track so that it supports the common good of people and societies. These attempts require interdisciplinary action, covering not only the scientific disciplines involved in the development of artificial intelligence and human-computer interaction but also close cooperation between researchers and practitioners. For this reason, the main goal of the MIDI conference, held as a virtual event on 9-10 December 2021, was to integrate two until recently independent fields of research in computer science: broadly understood artificial intelligence and human-technology interaction.

    New HCI techniques for better living through technology

    In the Human-Computer Interaction community, researchers work on many projects that investigate the efficacy of new technologies for better living but, unlike in other research fields, their approach is typically multi-disciplinary. Technology is constantly developing and improving our lives in areas such as education, health and communication, since it is ultimately meant to make life easier. This dissertation explores three main aspects: the first is learning with new technologies, the second is the improvement of everyday life through innovative devices, and the third is the use of mobile devices in combination with image-processing algorithms and computer-graphics techniques. We first describe the state of the art and the related work that were necessary to implement such tools on commodity hardware and to deploy them in both mobile and desktop settings. We propose the use of different technologies in different settings, comparing these solutions for enhancing the interaction experience by introducing virtual/augmented reality tools to support these activities. We also apply well-known gamification techniques from mobile applications to demonstrate how users can be entertained and motivated during their workouts. We describe our design and prototypes of several integrated systems created to improve the educational process, enhance the shopping experience, provide new experiences for travellers and even improve fitness and wellness activities. Finally, we discuss our findings and frame them in the broader context of better living through technology, drawing lessons learnt from each work while also proposing related future work.

    Understanding and designing for control in camera operation

    Cinematographers often use supportive tools to craft the camera moves they want. Recent technological advances have added new tools to the palette, such as gimbals, drones and robots. The combination of motor-driven actuation, computer vision and machine learning in such systems has also made new interaction techniques possible. In particular, a content-based interaction style was introduced in addition to the established axis-based style. On the one hand, content-based co-creation between humans and automated systems made it easier to reach high-level goals. On the other hand, the increased use of automation also introduced unwanted side effects. Camera operators usually want to feel in control while executing a camera move and, in the end, to feel like the authors of the recorded shots. While automation can assist experts and enable novices, it also takes away desired control from operators. Thus, if we want to support cinematographers with new tools and interaction techniques, the following question arises: How should we design interfaces for camera motion control that, despite being increasingly automated, provide cinematographers with an experience of control? Camera control has been studied for decades, especially in virtual environments. Applying content-based interaction to physical environments opens up new design opportunities but also faces less-researched, domain-specific challenges. To suit the needs of cinematographers, designs need to be crafted with care; in particular, they must adapt to the constraints of recording on location, which makes an interplay with established practices essential. Previous work has mainly focused on a technology-centered understanding of camera travel, which consequently influenced the design of camera control systems. In contrast, this thesis contributes to an understanding of the motives of cinematographers and how they operate on set, and it provides a user-centered foundation informing cinematography-specific research and design. The contribution of this thesis is threefold. First, we present ethnographic studies on expert users and their shooting practices on location. These studies highlight the challenges of introducing automation into a creative task (assistance vs. feeling in control). Second, we report on a domain-specific prototyping toolkit for in-situ deployment. The toolkit provides open-source software for low-cost replication, enabling the exploration of design alternatives. To better inform design decisions, we further introduce an evaluation framework for estimating the resulting quality and sense of control. By extending established methodologies with a recent neuroscientific technique, it provides data on explicit as well as implicit levels and is designed to be applicable to other domains of HCI. Third, we present evaluations of designs based on our toolkit and framework. We explored a dynamic interplay of manual control with various degrees of automation and examined different content-based interaction styles. Occlusion caused by graphical elements was identified and addressed by exploring visual-reduction strategies and mid-air gestures. Our studies demonstrate that high degrees of quality and sense of control are achievable with our tools, which also support creativity and established practices.
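
    As a hedged illustration of mixing manual input with automation at a chosen degree (an assumption for illustration, not the thesis's toolkit), the sketch below blends an operator's pan/tilt rate with an automated target-tracking rate using a single automation weight.

```python
import numpy as np

def blended_camera_rate(manual_rate: np.ndarray,
                        auto_rate: np.ndarray,
                        automation: float) -> np.ndarray:
    """Linear shared control: 0.0 = fully manual, 1.0 = fully automated.
    Rates are (pan, tilt) velocities in degrees per second."""
    automation = float(np.clip(automation, 0.0, 1.0))
    return (1.0 - automation) * manual_rate + automation * auto_rate

# Toy usage: the operator pans right while a content-based tracker wants to pan left and tilt up.
manual = np.array([12.0, 0.0])     # operator joystick command
auto = np.array([-4.0, 2.5])       # command from the automated tracker
for weight in (0.0, 0.5, 1.0):
    print(weight, blended_camera_rate(manual, auto, weight))
```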