
    DeePoint: Pointing Recognition and Direction Estimation From A Fixed View

    In this paper, we realize automatic visual recognition and direction estimation of pointing. We introduce the first neural pointing understanding method based on two key contributions. The first is the introduction of a first-of-its-kind large-scale dataset for pointing recognition and direction estimation, which we refer to as the DP Dataset. The DP Dataset consists of more than 2 million frames of over 33 people pointing in various styles, annotated for each frame with pointing timings and 3D directions. The second is DeePoint, a novel deep network model for joint recognition and 3D direction estimation of pointing. DeePoint is a Transformer-based network which fully leverages the spatio-temporal coordination of the body parts, not just the hands. Through extensive experiments, we demonstrate the accuracy and efficiency of DeePoint. We believe the DP Dataset and DeePoint will serve as a sound foundation for visual human intention understanding.
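    As a rough illustration of the kind of architecture described, the sketch below treats 2D body keypoints from a short clip as tokens for a small Transformer encoder that jointly predicts a pointing label and a unit 3D direction. The token layout, layer sizes, and head design are assumptions for illustration, not the authors' DeePoint implementation.

```python
# Minimal sketch: Transformer over body-part tokens for joint pointing
# recognition and 3D direction estimation (illustrative, not DeePoint itself).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointingTransformer(nn.Module):
    def __init__(self, n_joints=17, d_model=128, n_heads=4, n_layers=4, clip_len=15):
        super().__init__()
        self.embed = nn.Linear(2, d_model)                        # each 2D keypoint becomes a token
        self.pos = nn.Parameter(torch.zeros(clip_len * n_joints, d_model))  # learned joint/frame identity
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.cls_head = nn.Linear(d_model, 2)                     # pointing vs. not pointing
        self.dir_head = nn.Linear(d_model, 3)                     # 3D pointing direction

    def forward(self, kps):                                       # kps: (batch, frames, joints, 2)
        B, T, J, _ = kps.shape
        tokens = self.embed(kps.reshape(B, T * J, 2)) + self.pos[: T * J]
        feat = self.encoder(tokens).mean(dim=1)                   # pool spatio-temporal tokens
        logits = self.cls_head(feat)
        direction = F.normalize(self.dir_head(feat), dim=-1)      # unit 3D vector
        return logits, direction

# Example: a batch of 4 clips, 15 frames, 17 joints each.
model = PointingTransformer()
logits, direction = model(torch.randn(4, 15, 17, 2))
```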

    Tracking and modeling focus of attention in meetings [online]

    Abstract: This thesis addresses the problem of tracking the focus of attention of people. In particular, a system to track the focus of attention of participants in meetings is developed. Obtaining knowledge about a person's focus of attention is an important step towards a better understanding of what people do, how and with what or whom they interact, or to what they refer. In meetings, focus of attention can be used to disambiguate the addressees of speech acts, to analyze interaction, and for indexing of meeting transcripts. Tracking a user's focus of attention also greatly contributes to the improvement of human-computer interfaces, since it can be used to build interfaces and environments that become aware of what the user is paying attention to or with what or whom he is interacting. The direction in which people look, i.e., their gaze, is closely related to their focus of attention. In this thesis, we estimate a subject's focus of attention based on his or her head orientation. While the direction in which someone looks is determined by head orientation and eye gaze, relevant literature suggests that head orientation alone is a sufficient cue for the detection of someone's direction of attention during social interaction. We present experimental results from a user study and from several recorded meetings that support this hypothesis. We have developed a Bayesian approach to model at whom or what someone is looking based on his or her head orientation. To estimate head orientations in meetings, the participants' faces are automatically tracked in the view of a panoramic camera, and neural networks are used to estimate their head orientations from pre-processed images of their faces. Using this approach, the focus of attention target of subjects could be correctly identified 73% of the time in a number of evaluation meetings with four participants. In addition, we have investigated whether a person's focus of attention can be predicted from other cues. Our results show that focus of attention is correlated with who is speaking in a meeting and that it is possible to predict a person's focus of attention based on the information of who is talking or was talking before a given moment. We have trained neural networks to predict at whom a person is looking, based on information about who was speaking. Using this approach we were able to predict who is looking at whom with 63% accuracy on the evaluation meetings, using only information about who was speaking. We show that by using both head orientation and speaker information to estimate a person's focus, the accuracy of focus detection can be improved compared to using just one of the modalities. To demonstrate the generality of our approach, we have built a prototype system for focus-aware interaction with a household robot and other smart appliances in a room using the developed components for focus of attention tracking. In the demonstration environment, a subject could interact with a simulated household robot, a speech-enabled VCR, or with other people in the room, and the recipient of the subject's speech was disambiguated based on the user's direction of attention.
    Summary: This thesis is concerned with the automatic estimation and tracking of the focus of attention of people in meetings. Determining the focus of attention of people is very important for the understanding and automatic analysis of meeting records. It can be used, for example, to find out who addressed whom at a given point in time, or who was listening to whom. The automatic determination of the focus of attention can furthermore be used to improve human-machine interfaces. An important cue for the direction in which a person directs his or her attention is the person's head orientation. Therefore, a method for estimating the head orientations of people was developed. Artificial neural networks were used, which receive pre-processed images of a person's head as input and compute an estimate of the head orientation as output. With the trained networks, a mean error of nine to ten degrees was achieved for the estimation of the horizontal and vertical head orientation on image data of new persons, i.e., persons whose images were not contained in the training set. Furthermore, a probabilistic approach for determining attention targets is presented. A Bayesian approach is used to compute the a-posteriori probabilities of different attention targets given the observed head orientations of a person. The developed approaches were evaluated on several meetings with four to five participants. A further contribution of this work is the investigation of the extent to which the gaze direction of the meeting participants can be predicted based on who is currently speaking. A method was developed to estimate the focus of a person with the help of neural networks based on a short history of speaker constellations. We show that a clearly improved estimate can be achieved by combining the image-based and the speaker-based estimation of the focus of attention. Overall, this work presents for the first time a system for automatically tracking the attention of people in a meeting room. The developed approaches and methods can also be used for determining the attention of people in other areas, in particular for controlling computerized, interactive environments. This is demonstrated with an example application.
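    The Bayesian modelling step above lends itself to a compact illustration. The sketch below computes a posterior over focus-of-attention targets from an observed head pan angle, assuming a Gaussian likelihood per target; the target angles, noise level, and uniform prior are illustrative assumptions rather than the thesis's exact model.

```python
# Minimal sketch: P(target | head orientation) via Bayes' rule with Gaussian likelihoods.
import numpy as np

def focus_posterior(observed_pan_deg, target_angles_deg, sigma_deg=15.0, priors=None):
    """Posterior probability of each focus target given an observed head pan angle."""
    targets = np.asarray(target_angles_deg, dtype=float)
    priors = np.ones(len(targets)) / len(targets) if priors is None else np.asarray(priors)
    # Likelihood of the observed head pan under each target hypothesis.
    lik = np.exp(-0.5 * ((observed_pan_deg - targets) / sigma_deg) ** 2)
    post = lik * priors
    return post / post.sum()

# Four meeting participants seated at (assumed) known angles relative to the subject.
print(focus_posterior(observed_pan_deg=35.0, target_angles_deg=[-60, -20, 30, 70]))
```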

    Development of new intelligent autonomous robotic assistant for hospitals

    Continuous technological development in modern societies has increased the quality of life and average life-span of people. This imposes an extra burden on the current healthcare infrastructure, which also creates the opportunity for developing new, autonomous, assistive robots to help alleviate this extra workload. The research question explored the extent to which a prototypical robotic platform can be created and how it may be implemented in a hospital environment with the aim of assisting hospital staff with daily tasks, such as guiding patients and visitors, following patients to ensure safety, and making deliveries to and from rooms and workstations. In terms of major contributions, this thesis outlines five domains of the development of an actual robotic assistant prototype. Firstly, a comprehensive schematic design is presented in which mechanical, electrical, motor control and kinematics solutions have been examined in detail. Next, a new method is proposed for assessing the intrinsic properties of different flooring types using machine learning to classify mechanical vibrations. Thirdly, the technical challenge of enabling the robot to simultaneously map and localise itself in a dynamic environment is addressed, whereby leg detection is introduced to ensure that, whilst mapping, the robot is able to distinguish between people and the background. The fourth contribution integrates geometric collision prediction into stabilised dynamic navigation methods, thus optimising the ability to update real-time path planning in a dynamic environment. Lastly, the problem of detecting gaze at long distances is addressed by means of a new eye-tracking hardware solution which combines infra-red eye tracking and depth sensing. The research serves both to provide a template for the development of comprehensive mobile assistive-robot solutions, and to address some of the inherent challenges currently present in introducing autonomous assistive robots in hospital environments.
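    The vibration-based flooring classification mentioned as the second contribution can be illustrated with a small sketch: simple statistical and spectral features are computed from an accelerometer window and fed to an off-the-shelf classifier. The feature set, window length, and choice of a random forest are illustrative assumptions, not the method developed in the thesis.

```python
# Minimal sketch: classify flooring type from mechanical vibration windows.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def vibration_features(window):
    """Summarise one accelerometer window (1D array) into a small feature vector."""
    spectrum = np.abs(np.fft.rfft(window * np.hanning(len(window))))
    return np.array([
        window.std(),                        # vibration intensity
        np.mean(np.abs(np.diff(window))),    # roughness
        spectrum.argmax(),                   # dominant frequency bin
        spectrum.mean(),                     # broadband energy
    ])

# Hypothetical training data: windows of vertical acceleration with floor-type labels.
rng = np.random.default_rng(0)
windows = rng.normal(size=(200, 256))
labels = rng.integers(0, 3, size=200)        # e.g. 0=carpet, 1=tile, 2=linoleum
X = np.stack([vibration_features(w) for w in windows])
clf = RandomForestClassifier(n_estimators=100).fit(X, labels)
print(clf.predict(vibration_features(windows[0]).reshape(1, -1)))
```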

    Spatially Aware Computing for Natural Interaction

    Spatial information refers to the location of an object in a physical or digital world; it also includes the relative position of an object with respect to other objects around it. In this dissertation, three systems are designed and developed, all of which apply spatial information in different fields. The ultimate goal is to increase user friendliness and efficiency in those applications by utilizing spatial information. The first system is a novel Web page data extraction application, which takes advantage of 2D spatial information to discover structured records from a Web page. The extracted information is useful for re-organizing the layout of a Web page to fit mobile browsing. The second application utilizes the 3D spatial information of a mobile device within a large paper-based workspace to implement interactive paper that combines the merits of paper documents and mobile devices. This application can overlay digital information on top of a paper document based on the location of a mobile device within a workspace. The third application further integrates 3D spatial information with sound detection to realize an automatic camera management system. This application automatically controls multiple cameras in a conference room and creates an engaging video by intelligently switching camera shots among meeting participants based on their activities. Evaluations have been made of all three applications, and the results are promising. In summary, this dissertation comprehensively explores the usage of spatial information in various applications to improve their usability.
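    To illustrate the first system's use of 2D spatial information, the sketch below groups rendered page elements into candidate records by the vertical gaps between their bounding boxes. The gap heuristic and data layout are illustrative assumptions rather than the dissertation's extraction algorithm.

```python
# Minimal sketch: group page elements into records using 2D bounding-box geometry.
from dataclasses import dataclass

@dataclass
class Box:
    left: int
    top: int
    width: int
    height: int
    text: str

def group_into_records(boxes, gap=20):
    """Start a new record whenever the vertical gap between consecutive rows exceeds `gap` px."""
    rows = sorted(boxes, key=lambda b: b.top)
    records, current = [], [rows[0]]
    for prev, box in zip(rows, rows[1:]):
        if box.top - (prev.top + prev.height) > gap:
            records.append(current)
            current = []
        current.append(box)
    records.append(current)
    return records

items = [Box(10, 10, 200, 18, "Title A"), Box(10, 30, 200, 14, "Price A"),
         Box(10, 80, 200, 18, "Title B"), Box(10, 100, 200, 14, "Price B")]
for record in group_into_records(items):
    print([b.text for b in record])
```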

    An expandable walking in place platform

    The control of locomotion in 3D virtual environments should be an ordinary task from the user's point of view. Several navigation metaphors have been explored to control locomotion naturally, such as real walking, the use of simulators, and walking in place. These have shown that the more natural the approach used to control locomotion, the more immersed the user feels inside the virtual environment. Overcoming the high cost and complexity of most approaches in the field, we introduce a walking-in-place platform that is able to identify the orientation, displacement speed, and lateral steps of a person mimicking a walking pattern. This information is detected without additional sensors attached to the user's body. Our device is simple to mount, inexpensive, and allows almost natural use with lazy steps, thus freeing the hands for other uses. We also explore and test a passive tactile surface for safe use of our platform. The platform was conceived as an interface to control navigation in virtual environments and augmented reality. Extending our device and techniques, we elaborated a redirected walking metaphor to be used together with a cave automatic virtual environment. Another metaphor allowed the use of our technique for navigating point clouds for data tagging. We tested our technique with two different navigation modes: human walking and vehicle driving. In the human walking approach, the virtual orientation inhibits the displacement when sharp turns are made by the user. In vehicle mode, the virtual orientation and displacement occur together, more like driving a vehicle. We tested 52 subjects to determine navigation mode preferences and their ability to use our device, and identified a preference for the vehicle driving mode. Statistical analysis revealed that users easily learned to use our technique for navigation. Users were faster in vehicle mode, but human mode allowed more precise walking in the virtual test environment. The tactile platform proved to allow safe use of our device, being an effective and simple solution for the field. More than 200 people tested our device: at UFRGS Portas Abertas in 2013 and 2014, an event that presents academic work to the local community, and during 3DUI 2014, where our work was used together with a tool for point cloud manipulation. The main contribution of our work is a new approach for detecting walking in place that allows simple use with natural movements, is expandable to large areas (such as public spaces), and efficiently supplies orientation and speed for use in virtual environments or augmented reality with inexpensive hardware.
    The control of locomotion in 3D virtual environments should be a simple task from the user's point of view. Over the years, navigation metaphors have been explored to allow locomotion to be controlled naturally, such as real walking, the use of simulators, and imitation of walking. These techniques have shown that the more natural the approach used to control locomotion, the more immersed the user feels inside the virtual environment. Overcoming the high cost and complexity of most approaches in the area, we introduce a walking-in-place platform that is able to identify the orientation, displacement speed, and lateral steps of a person imitating walking. This information is detected without sensors attached to the users' bodies, using only the platform. Our device is simple to assemble, inexpensive, and allows almost natural use by ordinary people, with small steps, leaving the hands free for other tasks. We also explore and test a passive tactile surface for the safe use of our platform. The platform was conceived to be used as an interface for navigation in virtual environments. Extending our technique and device, we elaborated a redirected walking metaphor to be used together with projection caves (cave automatic virtual environments, CAVEs). We also created a second navigation metaphor, which allowed our technique to be used for navigating point clouds and assisting in tagging them, as part of the 3D User Interface contest held in Minnesota, in the United States, in 2014. We tested the technique and devices with two navigation variants: human walking and vehicle control. In the human walking approach, the rate of change of orientation generated by the user inhibited displacement when sharp turns were made. In vehicle mode, orientation and displacement occurred together, similarly to steering a vehicle. We applied tests with 52 subjects to determine the preferred navigation mode for our device, and identified a preference for the mode that resembles driving a vehicle. Statistical tests revealed that users easily learned to use our technique to navigate virtual environments. Users were faster in vehicle mode, but human mode ensured greater precision of displacement in the virtual environment. The tactile platform proved to allow safe use of our device, being an effective and simple solution for the area. More than 200 people tested our device and techniques: at the UFRGS Portas Abertas event in 2013 and 2014, where work carried out at the university is presented to the local community, and at the 3D User Interface symposium, where our technique and devices were used together with a point selection tool in a contest. The main contributions of our work are: a new approach for detecting imitation of walking, which allows simple use with natural movements, is expandable to large areas such as public spaces, and effectively captures usage information and provides orientation and speed for use in virtual or augmented reality environments, with inexpensive hardware.
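    The two navigation modes compared above can be sketched as alternative mappings from the platform's readings (step speed and turn rate) to virtual displacement: in "human" mode sharp turns inhibit displacement, while in "vehicle" mode turning and displacement are applied together. The damping rule and constants below are illustrative assumptions, not the platform's actual control law.

```python
# Minimal sketch: update the virtual pose from walking-in-place readings in two modes.
import math

def update_pose(x, y, heading_deg, step_speed, turn_rate_deg, dt, mode="vehicle"):
    """Advance the virtual pose given step speed (m/s) and turn rate (deg/s)."""
    heading_deg += turn_rate_deg * dt
    if mode == "human":
        # Sharp turns inhibit displacement: scale speed down as turn rate grows.
        step_speed *= max(0.0, 1.0 - abs(turn_rate_deg) / 90.0)
    x += step_speed * dt * math.cos(math.radians(heading_deg))
    y += step_speed * dt * math.sin(math.radians(heading_deg))
    return x, y, heading_deg

pose = (0.0, 0.0, 0.0)
for _ in range(10):                      # walk forward while turning at 30 deg/s
    pose = update_pose(*pose, step_speed=1.2, turn_rate_deg=30.0, dt=0.1, mode="human")
print(pose)
```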

    The feet in human–computer interaction: a survey of foot-based interaction

    Foot-operated computer interfaces have been studied since the inception of human-computer interaction. Thanks to the miniaturisation and decreasing cost of sensing technology, there is increasing interest in exploring this alternative input modality, but no comprehensive overview of its research landscape exists. In this survey, we review the literature on interfaces operated by the lower limbs. We investigate the characteristics of users and how they affect the design of such interfaces. Next, we describe and analyse foot-based research prototypes and commercial systems in terms of how they capture input and provide feedback. We then analyse the interactions between users and systems from the perspective of the actions performed in these interactions. Finally, we discuss our findings and use them to identify open questions and directions for future research.

    Activity Recognition Based on Micro-Doppler Signature with In-Home Wi-Fi

    Device-free activity recognition and monitoring has become a promising research area with increasing public interest in pattern-of-life monitoring and chronic health conditions. This paper proposes a novel framework for in-home Wi-Fi signal-based activity recognition in e-healthcare applications using passive micro-Doppler (m-D) signature classification. The framework includes signal modeling, Doppler extraction and m-D classification. A data collection campaign was designed to verify the framework, in which six m-D signatures corresponding to typical daily activities are successfully detected and classified using our software defined radio (SDR) demo system. Analysis of the data focused on potential discriminative characteristics, such as maximum Doppler frequency and time duration of activity. Finally, a sparsity-induced classifier is applied to adapt the method to healthcare application scenarios, and the results are compared with those from the well-known Support Vector Machine (SVM) method.
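    A rough sketch of the processing chain described above: a short-time Fourier transform of the baseband signal yields a micro-Doppler spectrogram, from which the maximum Doppler frequency and activity duration are extracted and fed to an SVM. The sampling rate, thresholds, and use of synthetic data are illustrative assumptions, not the paper's SDR pipeline or its sparsity-induced classifier.

```python
# Minimal sketch: micro-Doppler feature extraction (STFT) plus SVM classification.
import numpy as np
from scipy.signal import stft
from sklearn.svm import SVC

def micro_doppler_features(iq, fs=500, threshold_db=-30):
    """Extract max Doppler frequency and active-time duration from a complex baseband signal."""
    f, t, Z = stft(iq, fs=fs, nperseg=128, return_onesided=False)
    power_db = 20 * np.log10(np.abs(Z) + 1e-12)
    active = power_db > power_db.max() + threshold_db          # bins within 30 dB of the peak
    max_doppler = np.abs(f[active.any(axis=1)]).max()          # highest active Doppler frequency
    duration = np.ptp(t[active.any(axis=0)])                   # span of active time bins
    return np.array([max_doppler, duration])

# Hypothetical dataset: complex baseband recordings with activity labels.
rng = np.random.default_rng(1)
signals = rng.normal(size=(60, 4000)) + 1j * rng.normal(size=(60, 4000))
labels = rng.integers(0, 6, size=60)                           # six daily activities
X = np.stack([micro_doppler_features(s) for s in signals])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:5]))
```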