161 research outputs found

    Human robot interaction in a crowded environment

    Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered to be laborious, unsafe, or repetitive. Vision-based human robot interaction is a major component of HRI, with which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications, as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting navigation commands. To this end, it is necessary to associate the gesture with the correct person, and automatic reasoning is required to extract the most probable location of the person who has initiated the gesture. In this thesis, we propose a practical framework for addressing the above issues. It attempts to achieve a coarse-level understanding of a given environment before engaging in active communication. This includes recognising when a person intends to communicate with the robot. In this regard, it is necessary to differentiate whether people present are engaged with each other or with their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb them, or, if an individual is receptive to the robot's interaction, it may approach the person. Finally, if the user is moving in the environment, the robot can analyse further to determine whether any assistance can be offered. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine potential intentions. To improve system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7].
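    The abstract does not reproduce the thesis's Bayesian network, but the underlying fusion idea can be illustrated. The sketch below is a minimal illustration with hypothetical cue scores: it combines independent visual-cue likelihoods into a posterior over candidate commanding persons via Bayes' rule under a naive independence assumption, which is a simplification of the evolving network the thesis describes.

```python
import numpy as np

# Hypothetical cue likelihoods for three people in the scene: each row is a
# person, each column a visual cue (e.g. face detection score, hand motion,
# gaze toward the robot), expressed as P(cue | person is the commander).
cue_likelihoods = np.array([
    [0.9, 0.8, 0.7],   # person 0: strong evidence on all cues
    [0.4, 0.3, 0.6],   # person 1
    [0.2, 0.5, 0.1],   # person 2
])

prior = np.full(3, 1.0 / 3.0)  # uniform prior over candidate persons

# Naive Bayes fusion: posterior ∝ prior × product of cue likelihoods.
posterior = prior * cue_likelihoods.prod(axis=1)
posterior /= posterior.sum()

print("posterior over persons:", posterior)
print("most probable commander: person", posterior.argmax())
```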

    Computer Vision Tools for Rodent Monitoring

    Rodents are widely used in biomedical experiments and research. This is due to the similar characteristics that they share with humans, to the low cost and ease of their maintenance, and to the shortness of their life cycle, among other reasons. Research on rodents usually involves long periods of monitoring and tracking. When done manually, these tasks are very tedious and prone to error: they involve a technician annotating the location or the behavior of the rodent at each time step. Automatic tracking and monitoring solutions decrease the amount of manual labor and allow for longer monitoring periods. Several solutions for automatic animal monitoring use mechanical sensors. Even though these solutions have been successful in their intended tasks, video cameras are still indispensable for later validation. For this reason, it is logical to use computer vision as a means to monitor and track rodents. In this thesis, we present computer vision solutions to three related problems concerned with rodent tracking and observation. The first solution consists of a method to track rodents in a typical biomedical environment with minimal constraints. The method consists of two phases. In the first phase, a sliding-window technique based on three features is used to track the rodent and determine its coarse position in the frame. The second phase uses the edge map and a system of pulses to fit the boundaries of the tracking window to the contour of the rodent. This solution presents two contributions. The first contribution consists of a new feature, the Overlapped Histograms of Intensity (OHI). The second contribution consists of a new segmentation method that uses an online edge-based background subtraction to segment the edges of the rodent. The proposed solution's tracking accuracy is stable when applied to rodents of different sizes. It is also shown that the solution achieves better results than a state-of-the-art tracking algorithm. The second solution consists of a method to detect and identify three behaviors in rodents under typical biomedical conditions. The solution uses a rule-based method combined with a Multiple Classifier System (MCS) to detect and classify rearing, exploring, and being static. The solution offers two contributions. The first contribution is a new method to detect rodent behavior using the Motion History Image (MHI). The second contribution is a new fusion rule to combine the estimations of several Support Vector Machine (SVM) classifiers. The solution achieves an 87% recognition accuracy rate, which is compliant with typical requirements in biomedical research.
The solution also compares favorably to other state-of-the-art solutions. The third solution comprises a tracking algorithm that has the same apparent behavior as, and maintains the robustness of, the CONDENSATION algorithm. The tracking algorithm simplifies the operations and reduces the computational load of the CONDENSATION algorithm while conserving similar tracking accuracy. The solution contributes a new scheme to assign the particles at a certain time step to the particles of the previous time step. This scheme reduces the number of complex operations required by the classic CONDENSATION algorithm. The solution also contributes a method to reduce the average number of particles generated at each time step, while maintaining the same maximum number of particles as in the classic CONDENSATION algorithm. Finally, the solution achieves a 4.4× to 12× acceleration over the classic CONDENSATION algorithm, while maintaining roughly the same tracking accuracy.
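The thesis's specific particle-assignment scheme and particle-reduction method are not detailed in the abstract. As background, the sketch below shows one time step of the classic CONDENSATION-style particle filter (factored sampling, diffusion, reweighting) that the proposed algorithm accelerates; the 1-D state, Gaussian noise models, and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def condensation_step(particles, weights, measurement,
                      process_noise=2.0, meas_noise=5.0):
    """One CONDENSATION-style time step for a 1-D position tracker.

    A textbook sketch (Isard & Blake style), not the thesis's optimized
    variant: resample by weight, diffuse with the dynamic model, then
    reweight with the observation likelihood.
    """
    n = len(particles)
    # 1. Factored sampling: pick particles in proportion to their weights.
    idx = rng.choice(n, size=n, p=weights)
    particles = particles[idx]
    # 2. Predict: apply the (here trivial) dynamics plus diffusion noise.
    particles = particles + rng.normal(0.0, process_noise, size=n)
    # 3. Measure: weight each particle by a Gaussian observation likelihood.
    weights = np.exp(-0.5 * ((particles - measurement) / meas_noise) ** 2)
    weights /= weights.sum()
    return particles, weights

particles = rng.normal(0.0, 10.0, size=200)
weights = np.full(200, 1.0 / 200)
for z in [1.0, 2.5, 4.0, 5.5]:          # simulated measurements
    particles, weights = condensation_step(particles, weights, z)
print("state estimate:", np.sum(particles * weights))
```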

    Novel Technique for Gait Analysis Using Two Waist Mounted Gyroscopes

    Analysis of the human gait is used in many applications such as medicine, sports, and person identification. Several research studies have focused on the use of MEMS inertial sensors for gait analysis and showed promising results. The miniaturization of these sensors and their wearability have allowed long-term gait analysis outside the laboratory environment, which can reveal more information about the person, and have introduced the use of gait analysis in new applications such as indoor localization. Step detection and step length estimation are two basic and important gait analysis tasks. In fact, step detection is a prerequisite for the exploration of all other gait parameters. Researchers have proposed many methods for step detection, and their experimental results showed high accuracies that exceeded 99% in some cases. All of these methods rely on experimental thresholds selected based on a limited number of subjects and walking conditions. Selecting and verifying an optimal threshold is a difficult task, since it can vary according to many factors such as the user, footwear, and the walking surface material. Also, most of these methods do not distinguish walking from other activities; they can only distinguish a motion state from an idle state. Methods that can distinguish walking from other activities are mainly machine learning methods that need training and complex data labeling. On the other hand, the step length estimation methods used in the literature either need constant calibration for each user, rely on impractical sensor placement, or both. In this thesis, we exploit the bipedal nature of human walking for gait analysis using two MEMS gyroscopes, one attached to each side of the lower waist. This setup allows steps to be detected and discriminated from other, non-bipedal activities without the need for magnitude thresholds or training. We were also able to calculate the hip rotation angle in the sagittal plane, which allowed us to estimate the step length without the need for constant calibration. By mounting an accelerometer on the center of the back of the waist, we were able to develop a method to auto-calibrate the constant of the Weinberg method, one of the most accurate step length estimation methods, and increase its accuracy even further.
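    The Weinberg method mentioned above estimates step length from the range of vertical acceleration within a step, L ≈ K·(a_max − a_min)^(1/4). Below is a minimal sketch assuming the acceleration window for one step has already been segmented; the constant K is the user-specific quantity the thesis proposes to auto-calibrate, and the value used here is only a placeholder.

```python
import numpy as np

def weinberg_step_length(accel_z, K=0.45):
    """Estimate one step's length from vertical acceleration samples.

    Implements the Weinberg approximation
        L ≈ K * (a_max - a_min) ** (1/4)
    over a window covering a single step. K is a user-specific constant
    (the quantity auto-calibrated in the thesis); 0.45 is a placeholder.
    """
    bounce = np.max(accel_z) - np.min(accel_z)
    return K * bounce ** 0.25

# Hypothetical vertical-acceleration window for one step (m/s^2).
window = np.array([9.6, 10.8, 12.4, 11.1, 9.0, 8.2, 9.5])
print(f"estimated step length: {weinberg_step_length(window):.2f} m")
```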

    Machine learning approaches to video activity recognition: from computer vision to signal processing

    The research presented focuses on classification techniques for two different, though related, tasks, such that the second can be considered part of the first: human action recognition in videos and sign language recognition. In the first part, the starting hypothesis is that transforming a video's signals with the Common Spatial Patterns (CSP) algorithm (commonly used in electroencephalography systems) can yield new features that are useful for the subsequent classification of the videos with supervised classifiers. Different experiments have been carried out on several databases, including one created during this research from the point of view of a humanoid robot, with the intention of deploying the developed recognition system to improve human-robot interaction. In the second part, the previously developed techniques have been applied to sign language recognition; in addition, a method based on decomposing the signs is proposed for recognizing them, adding the possibility of better explainability. The final objective is to develop a sign language tutor capable of guiding users through the learning process, making them aware of the errors they make and the reasons for those errors.
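    The abstract does not detail how the video signals are arranged as input channels for CSP; the sketch below shows the standard EEG-style CSP computation that the hypothesis builds on, namely class covariance matrices, a generalized eigenproblem, and log-variance features for a supervised classifier. The toy trials are random stand-ins.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_pairs=2):
    """Compute CSP spatial filters from two classes of multichannel trials.

    trials_*: lists of (channels x samples) arrays. Returns filters that
    maximize variance for one class while minimizing it for the other, via
    the generalized eigenproblem  C_a w = lambda (C_a + C_b) w.
    """
    def mean_cov(trials):
        return np.mean([np.cov(t) for t in trials], axis=0)

    C_a, C_b = mean_cov(trials_a), mean_cov(trials_b)
    eigvals, eigvecs = eigh(C_a, C_a + C_b)     # ascending eigenvalues
    # Keep filters from both ends of the spectrum (most discriminative).
    order = np.argsort(eigvals)
    sel = np.r_[order[:n_pairs], order[-n_pairs:]]
    return eigvecs[:, sel].T

def csp_features(trial, filters):
    """Log-variance features of a trial projected through the CSP filters."""
    z = filters @ trial
    var = z.var(axis=1)
    return np.log(var / var.sum())

# Toy demo with random trials (8 channels, 200 samples each).
rng = np.random.default_rng(0)
A = [rng.normal(size=(8, 200)) for _ in range(30)]
B = [rng.normal(size=(8, 200)) * np.linspace(0.5, 2, 8)[:, None] for _ in range(30)]
W = csp_filters(A, B)
print("feature vector for one trial:", csp_features(A[0], W))
```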

    Hand gesture recognition system based on computer vision and machine learning: Applications on human-machine interaction

    Doctoral thesis in Electronics and Computer Engineering. Hand gesture recognition is a natural way of human-computer interaction and an area of very active research in computer vision and machine learning. It is an area with many different possible applications, giving users a simpler and more natural way to communicate with robots and computer-based systems, without the need for extra devices. The primary goal of gesture recognition research applied to Human-Computer Interaction (HCI) is therefore to create systems that can identify specific human gestures and use them to convey information or to control devices. For that, vision-based hand gesture interfaces require fast and extremely robust hand detection, and gesture recognition in real time. Nowadays, vision-based gesture recognition systems tend to be specific solutions, built to solve one particular problem and configured to work in a particular manner. This research project studied and implemented solutions, generic enough, with the help of machine learning algorithms, to allow their application in a wide range of human-computer interfaces for real-time gesture recognition. The proposed solution, the Gesture Learning Module Architecture (GeLMA), allows a set of commands to be defined in a simple way, based on static and dynamic gestures, and to be easily integrated and configured for use in a number of applications. It is easy to train and use, and since it is mainly built with open-source libraries it is also an inexpensive solution. Experiments carried out showed that the system achieved an accuracy of 99.2% in terms of hand posture recognition and an average accuracy of 93.72% in terms of dynamic gesture recognition. To validate the proposed framework, two systems were implemented. The first is an online system able to help a robotic soccer referee judge a game in real time. The proposed solution combines a vision-based hand gesture recognition system with a formal language definition, the Referee CommLang, into what is called the Referee Command Language Interface System (ReCLIS). The system builds a command based on system-interpreted static and dynamic referee gestures, and is able to send it to a computer interface, which can then transmit the proper commands to the robots. The second is an online system able to interpret Portuguese Sign Language. The experiments showed that the system was able to reliably recognize the vowels in real time. Although the implemented solution was only trained to recognize the five vowels, it is easily extended to recognize the rest of the alphabet. These experiments also showed that the core of vision-based interaction systems can be the same for all applications, thus facilitating their implementation. The proposed framework has the advantage of being generic enough, and a solid foundation, for the development of hand gesture recognition systems that can be integrated into any human-computer interface application. The interface definition language can be redefined, and the system can be easily configured and trained with a different set of gestures to be integrated into the final solution.
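    GeLMA's internal pipeline is not specified in the abstract. As a rough illustration of the static-gesture (hand posture) stage, the sketch below trains an off-the-shelf SVM on hand-shape feature vectors; the synthetic features and all parameter values are stand-in assumptions, not the thesis's configuration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for hand-shape feature vectors (e.g. normalized
# landmark coordinates): 3 posture classes, 60 samples each, 20 features.
n_cls, n_per, n_feat = 3, 60, 20
X = np.vstack([rng.normal(loc=c, scale=0.8, size=(n_per, n_feat))
               for c in range(n_cls)])
y = np.repeat(np.arange(n_cls), n_per)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# An RBF-kernel SVM is a common choice for static posture classification.
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```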

    Shortest Route at Dynamic Location with Node Combination-Dijkstra Algorithm

    Abstract— Online transportation has become a basic requirement of the general public in support of all activities, whether going to work, to school, or on vacation to tourist sights. Public transportation services compete to provide the best service so that consumers feel comfortable using the services offered; one aspect that receives attention is the search for the shortest route when picking up a buyer or delivering to a destination. The Node Combination method can minimize memory usage and is more optimal than A* and Ant Colony for shortest-route search in the style of Dijkstra's algorithm, but it cannot store the history of nodes that have been passed. As it stands, the node combination algorithm is therefore well suited to finding the shortest distance, but not the shortest route. This paper modifies the node combination algorithm to solve the problem of finding the shortest route at a dynamic location obtained from the transport fleet, displaying the nodes that make up the shortest route, and implements it in a geographic information system in the form of a map to facilitate use of the system. Keywords— Shortest Path, Dijkstra Algorithm, Node Combination, Dynamic Location
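    The paper's node-combination modification is not detailed in the abstract, but the route-history issue it addresses can be made concrete: classic Dijkstra, sketched below with predecessor tracking, recovers the shortest route (the node sequence) rather than only the shortest distance. The toy graph is illustrative.

```python
import heapq

def dijkstra(graph, source, target):
    """Dijkstra's algorithm with predecessor tracking, so the shortest
    *route* (sequence of nodes) is recovered, not just its length.

    graph: dict mapping node -> list of (neighbor, weight) pairs.
    Assumes target is reachable from source.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue                     # stale heap entry, skip it
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    # Walk the predecessor chain backwards to rebuild the route.
    path, node = [], target
    while node != source:
        path.append(node)
        node = prev[node]
    path.append(source)
    return dist[target], path[::-1]

graph = {"A": [("B", 2), ("C", 5)], "B": [("C", 1), ("D", 4)],
         "C": [("D", 1)], "D": []}
print(dijkstra(graph, "A", "D"))   # (4.0, ['A', 'B', 'C', 'D'])
```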

    Taming Crowded Visual Scenes

    Computer vision algorithms have played a pivotal role in commercial video surveillance systems for a number of years. However, a common weakness among these systems is their inability to handle crowded scenes. In this thesis, we have developed algorithms that overcome some of the challenges encountered in videos of crowded environments such as sporting events, religious festivals, parades, concerts, train stations, airports, and malls. We adopt a top-down approach by first performing a global-level analysis that locates dynamically distinct crowd regions within the video. This knowledge is then employed in the detection of abnormal behaviors and the tracking of individual targets within crowds. In addition, the thesis explores the utility of contextual information necessary for persistent tracking and re-acquisition of objects in crowded scenes. For the global-level analysis, a framework based on Lagrangian Particle Dynamics is proposed to segment the scene into dynamically distinct crowd regions or groupings. For this purpose, the spatial extent of the video is treated as a phase space of a time-dependent dynamical system in which transport from one region of the phase space to another is controlled by the optical flow. Next, a grid of particles is advected forward in time through the phase space using numerical integration to generate a flow map. The flow map relates the initial positions of particles to their final positions. The spatial gradients of the flow map are used to compute a Cauchy-Green deformation tensor that quantifies the amount by which neighboring particles diverge over the length of the integration. The maximum eigenvalue of the tensor is used to construct a forward Finite Time Lyapunov Exponent (FTLE) field that reveals the attracting Lagrangian Coherent Structures (LCS). The same process is repeated by advecting the particles backward in time to obtain a backward FTLE field that reveals the repelling LCS. The attracting and repelling LCS are the time-dependent invariant manifolds of the phase space and correspond to the boundaries between dynamically distinct crowd flows. The forward and backward FTLE fields are combined to obtain one scalar field that is segmented using a watershed segmentation algorithm to obtain the labeling of distinct crowd-flow segments. Next, abnormal behaviors within the crowd are localized by detecting changes in the number of crowd-flow segments over time. The global-level knowledge of the scene generated by the crowd-flow segmentation is then used as an auxiliary source of information for tracking an individual target within a crowd. This is achieved by developing a scene-structure-based force model. This force model captures the notion that an individual, when moving in a particular scene, is subjected to global and local forces that are functions of the layout of that scene and the locomotive behavior of other individuals in his or her vicinity. The key ingredients of the force model are three floor fields, inspired by research in the field of evacuation dynamics: the Static Floor Field (SFF), the Dynamic Floor Field (DFF), and the Boundary Floor Field (BFF). These fields determine the probability of moving from one location to the next by converting the long-range forces into local forces. The SFF specifies regions of the scene that are attractive in nature, such as an exit location.
The DFF, which is based on the idea of active walker models, corresponds to the virtual traces created by the movements of nearby individuals in the scene. The BFF specifies influences exerted by the barriers within the scene, such as walls and no-entry areas. By combining the influence of all three fields with the available appearance information, we are able to track individuals in high-density crowds. The results are reported on real-world sequences of marathons and railway stations that contain thousands of people. A comparative analysis with respect to an appearance-based mean-shift tracker is also conducted by generating the ground truth. The result of this analysis demonstrates the benefit of using floor fields in crowded scenes. Occlusions are very frequent in crowded scenes due to the high number of interacting objects. To overcome this challenge, we propose an algorithm that augments a generic tracking algorithm to perform persistent tracking in crowded environments. The algorithm exploits contextual knowledge, which is divided into two categories: motion context (MC) and appearance context (AC). The MC is a collection of trajectories that are representative of the motion of the occluded or unobserved object. These trajectories belong to other moving individuals in a given environment. The MC is constructed using a clustering scheme based on the Lyapunov Characteristic Exponent (LCE), which measures the mean exponential rate of convergence or divergence of nearby trajectories in a given state space. Next, the MC is used to predict the location of the occluded or unobserved object in a regression framework. It is important to note that the LCE is used for measuring divergence between a pair of particles, while the FTLE field is obtained by computing the LCE for a grid of particles. The appearance context (AC) of a target object consists of its own appearance history and the appearance information of the other objects that are occluded. The intent is to make the appearance descriptor of the target object more discriminative with respect to other unobserved objects, thereby reducing the possible confusion between the unobserved objects upon re-acquisition. This is achieved by learning the distribution of the intra-class variation of each occluded object using all of its previous observations. In addition, a distribution of inter-class variation for each target-unobservable object pair is constructed. Finally, the re-acquisition decision is made using both the MC and the AC.
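As a concrete rendering of the FTLE computation described above, the sketch below derives the FTLE field from an already-computed flow map (the final particle positions after advection through the optical flow). The grid spacing, integration length T, and the toy shear flow in the demo are assumptions; ridges of the resulting field mark the boundaries between dynamically distinct crowd flows.

```python
import numpy as np

def ftle_field(flow_x, flow_y, T):
    """Forward FTLE field from a flow map on a regular pixel grid.

    flow_x, flow_y: final x/y positions of particles advected for time T,
    each of shape (H, W). Returns sigma = (1/|T|) * ln(sqrt(lambda_max))
    per grid point, where lambda_max is the largest eigenvalue of the
    Cauchy-Green deformation tensor.
    """
    # Spatial gradients of the flow map (entries of the Jacobian).
    dxdy, dxdx = np.gradient(flow_x)    # axis 0 = rows (y), axis 1 = cols (x)
    dydy, dydx = np.gradient(flow_y)
    H, W = flow_x.shape
    sigma = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            J = np.array([[dxdx[i, j], dxdy[i, j]],
                          [dydx[i, j], dydy[i, j]]])
            delta = J.T @ J             # Cauchy-Green deformation tensor
            lmax = np.linalg.eigvalsh(delta)[-1]
            sigma[i, j] = np.log(np.sqrt(max(lmax, 1e-12))) / abs(T)
    return sigma

# Toy demo: shear flow map on a 64x64 grid advected for T = 10 frames.
ys, xs = np.mgrid[0:64, 0:64].astype(float)
fx, fy = xs + 0.2 * ys * 10, ys          # particles drift in x with shear
print("mean FTLE:", ftle_field(fx, fy, T=10).mean())
```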