Human robot interaction in a crowded environment
Human-Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered laborious, unsafe, or repetitive. Vision-based human-robot interaction is a major component of HRI, in which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3].
Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications, as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting navigation commands. To this end, it is necessary to associate the gesture with the correct person, and automatic reasoning is required to extract the most probable location of the person who initiated the gesture. In this thesis, we propose a practical framework for addressing the above issues. It attempts to achieve a coarse-level understanding of a given environment before engaging in active communication. This includes recognizing human-robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate whether the people present are engaged with each other or with their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb them; if an individual is receptive to the robot's interaction, it may approach the person.
Finally, if the user is moving in the environment, the robot can analyse further to understand whether any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine potential intentions. To improve system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7].
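The cue-combination idea above can be sketched as a tiny naive-Bayes fusion: a prior over detected people is multiplied by per-cue likelihoods and renormalized. The cue names, probabilities, and independence assumption here are illustrative only, not the thesis's actual network.

```python
# A minimal naive-Bayes sketch of fusing independent visual cues to decide
# which person in the scene most likely initiated a gesture. All numbers
# are illustrative, not the thesis's learned values.

def fuse_cues(prior, cue_likelihoods):
    """Multiply a prior over persons by per-cue likelihoods and renormalize."""
    post = dict(prior)
    for cue in cue_likelihoods:
        for person in post:
            post[person] *= cue.get(person, 1e-6)
    total = sum(post.values())
    return {p: v / total for p, v in post.items()}

prior = {"A": 0.5, "B": 0.5}            # two people detected in the scene
cues = [
    {"A": 0.9, "B": 0.2},               # e.g. face oriented toward the robot
    {"A": 0.7, "B": 0.4},               # e.g. raised-hand gesture detected
]
post = fuse_cues(prior, cues)           # person "A" becomes the likely commander
```

Contextual feedback would then adjust the priors and likelihood tables over time as the scene changes.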
Evaluation and analysis of hybrid intelligent pattern recognition techniques for speaker identification
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.

The rapid momentum of technological progress in recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem of identifying a speaker from their voice regardless of the content (i.e. text-independent), and to design efficient methods of combining face and voice to produce a robust authentication system.
A novel approach to speaker identification is developed using wavelet analysis and multiple neural networks, including the Probabilistic Neural Network (PNN), the General Regression Neural Network (GRNN) and the Radial Basis Function Neural Network (RBF-NN), combined with an AND voting scheme. This approach is tested on the GRID and VidTIMIT corpora, and comprehensive test results have been validated against state-of-the-art approaches. The system was found to be competitive: it improved the recognition rate by 15% compared to classical Mel-Frequency Cepstral Coefficients (MFCC), and reduced the recognition time by 40% compared to the Back-Propagation Neural Network (BPNN), Gaussian Mixture Models (GMM) and Principal Component Analysis (PCA).
Another novel approach using vowel formant analysis is implemented using Linear Discriminant Analysis (LDA). Vowel-formant-based speaker identification is well suited to real-time implementation and requires only a few bytes of information to be stored for each speaker, making it both storage and time efficient. Tested on GRID and VidTIMIT, the proposed scheme was found to be 85.05% accurate when Linear Predictive Coding (LPC) is used to extract the vowel formants, which is much higher than the accuracy of BPNN and GMM. Since the proposed scheme does not require any training time other than creating a small database of vowel formants, it is faster as well. Furthermore, an increasing number of speakers makes it difficult for BPNN and GMM to sustain their accuracy, whereas the proposed score-based methodology scales almost linearly.
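The score-based matching against a small stored formant database can be illustrated as nearest-neighbor search over (F1, F2) pairs. The distance measure, speaker names, and formant values below are hypothetical; the thesis's actual scoring may differ.

```python
# Hypothetical sketch of score-based speaker matching against a small stored
# database of vowel formants (F1, F2) in Hz. Data and metric are illustrative.

def identify_speaker(observed, database):
    """Return the speaker whose stored formant pairs lie closest to `observed`."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    best, best_score = None, float("inf")
    for speaker, refs in database.items():
        score = min(dist(observed, r) for r in refs)   # best-matching vowel
        if score < best_score:
            best, best_score = speaker, score
    return best

db = {
    "alice": [(310.0, 2020.0), (850.0, 1610.0)],   # /i/-like and /a/-like vowels
    "bob":   [(360.0, 2190.0), (750.0, 1090.0)],
}
speaker = identify_speaker((840.0, 1600.0), db)    # closest to alice's /a/
```

Because only a handful of formant pairs are stored per speaker, adding a new speaker costs a few bytes and no retraining, which is the storage and time advantage claimed above.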
Finally, a novel audio-visual fusion-based identification system is implemented using GMM and MFCC for speaker identification and PCA for face recognition. The results of speaker identification and face recognition are fused at different levels, namely the feature, score and decision levels. Both the score-level and decision-level (with OR voting) fusions were shown to outperform the feature-level fusion in terms of accuracy and error resilience. This result is in line with the distinct nature of the two modalities, which is lost when they are combined at the feature level. The GRID and VidTIMIT test results validate that the proposed scheme is one of the best candidates for the fusion of face and voice due to its low computational time and high recognition accuracy.
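The two fusion levels that performed best can be sketched in a few lines: score-level fusion combines the per-speaker match scores of the two modalities, while decision-level OR voting accepts an identity if either modality alone agrees with it. The weighting and score values are illustrative, not the thesis's tuned settings.

```python
# A minimal sketch of score-level and decision-level (OR-voting) fusion of a
# voice identifier and a face identifier. Weights and scores are illustrative.

def score_level_fusion(voice_scores, face_scores, w=0.5):
    """Weighted sum of per-speaker match scores from the two modalities."""
    return {k: w * voice_scores[k] + (1 - w) * face_scores[k]
            for k in voice_scores}

def decision_level_or(voice_decision, face_decision, claimed_identity):
    """Accept if either modality alone identifies the claimed person."""
    return claimed_identity in (voice_decision, face_decision)

voice = {"alice": 0.80, "bob": 0.30}    # e.g. GMM/MFCC match scores
face = {"alice": 0.60, "bob": 0.70}     # e.g. PCA face-match scores
fused = score_level_fusion(voice, face)
winner = max(fused, key=fused.get)      # "alice" wins on the combined score
```

Keeping the modalities separate until the score or decision stage is what preserves their "distinct nature", in contrast to concatenating raw features.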
Computer Vision Tools for Rodent Monitoring
Rodents are widely used in biomedical experiments and research. This is due to the similar characteristics that they share with humans, to the low cost and ease of their maintenance and to the shortness of their life cycle, among other reasons.
Research on rodents usually involves long periods of monitoring and tracking. When done manually, these tasks are very tedious and prone to error. They involve a technician annotating the location or the behavior of the rodent at each time step. Automatic tracking and monitoring solutions decrease the amount of manual labor and allow for longer monitoring periods. Several solutions have been provided for automatic animal monitoring that use mechanical sensors. Even though these solutions have been successful in their intended tasks, video cameras are still indispensable for later validation. For this reason, it is logical to use computer vision as a means to monitor and track rodents.
In this thesis, we present computer vision solutions to three related problems concerned with rodent tracking and observation. The first solution consists of a method to track rodents in a typical biomedical environment with minimal constraints. The method consists of two phases. In the first phase, a sliding window technique based on three features is used to track the rodent and determine its coarse position in the frame. The second phase uses the edge map and a system of pulses to fit the boundaries of the tracking window to the contour of the rodent. This solution presents two contributions. The first contribution consists of a new feature, the Overlapped Histograms of Intensity (OHI). The second contribution consists of a new segmentation method that uses online edge-based background subtraction to segment the edges of the rodent. The tracking accuracy of the proposed solution is stable when applied to rodents of different sizes. It is also shown that the solution achieves better results than a state-of-the-art tracking algorithm.
The second solution consists of a method to detect and identify three behaviors in rodents under typical biomedical conditions. The solution uses a rule-based method combined with a Multiple Classifier System (MCS) to detect and classify rearing, exploring and being static. The solution offers two contributions. The first contribution is a new method to detect rodent behavior using the Motion History Image (MHI). The second contribution is a new fusion rule to combine the estimations of several Support Vector Machine (SVM) classifiers. The solution achieves an 87% recognition accuracy rate, which is compliant with typical requirements in biomedical research. The solution also compares favorably to other state-of-the-art solutions.
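The MHI mentioned above is a simple recursive image: pixels where motion is detected are stamped with a maximum duration, and all others decay, so recent motion stays brightest. A pure-Python sketch on tiny 4x4 "frames" (the decay rule and `tau` are illustrative, not the thesis's settings):

```python
# A pure-Python sketch of a Motion History Image (MHI) update on 4x4 frames.
# Moving pixels are stamped with `tau`; the rest decay by one per frame.

def update_mhi(mhi, motion_mask, tau=15):
    """motion_mask[r][c] is True where frame differencing detected motion."""
    return [[tau if moved else max(0, value - 1)
             for value, moved in zip(mhi_row, mask_row)]
            for mhi_row, mask_row in zip(mhi, motion_mask)]

mhi = [[0] * 4 for _ in range(4)]
frame1 = [[False, True, True, False]] + [[False] * 4 for _ in range(3)]
frame2 = [[False] * 4 for _ in range(3)] + [[True, False, False, False]]
mhi = update_mhi(mhi, frame1)   # motion in the top row
mhi = update_mhi(mhi, frame2)   # newer motion in the bottom-left corner
```

The gradient of intensity across such an image encodes both where and how recently the rodent moved, which is what makes it usable as a behavior feature.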
The third solution comprises a tracking algorithm that has the same apparent behavior and maintains the robustness of the CONDENSATION algorithm. The tracking algorithm simplifies the operations and reduces the computational load of the CONDENSATION algorithm while preserving similar tracking accuracy. The solution contributes a new scheme to assign the particles at a given time step to the particles of the previous time step. This scheme reduces the number of complex operations required by the classic CONDENSATION algorithm. The solution also contributes a method to reduce the average number of particles generated at each time step, while maintaining the same maximum number of particles as in the classic CONDENSATION algorithm. Finally, the solution achieves a 4.4× to 12× speedup over the classic CONDENSATION algorithm, while maintaining roughly the same tracking accuracy.
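For reference, one iteration of the classic CONDENSATION (particle filter) loop that this solution simplifies consists of resampling by weight, diffusing with the motion model, and reweighting by the measurement likelihood. The 1-D Gaussian models and parameters below are illustrative; the thesis's reduced assignment scheme is not reproduced here.

```python
# A compact 1-D sketch of one CONDENSATION iteration: resample, predict,
# measure. Models and parameters are illustrative.
import math
import random

def condensation_step(particles, weights, measurement,
                      motion_std=1.0, meas_std=2.0):
    total = sum(weights)
    cumulative, running = [], 0.0
    for w in weights:
        running += w
        cumulative.append(running)
    new_particles = []
    for _ in particles:
        r = random.uniform(0.0, total)        # resample proportional to weight
        i = 0
        while cumulative[i] < r:
            i += 1
        x = particles[i] + random.gauss(0.0, motion_std)   # predict (diffuse)
        new_particles.append(x)
    new_weights = [math.exp(-0.5 * ((x - measurement) / meas_std) ** 2)
                   for x in new_particles]    # measurement likelihood
    return new_particles, new_weights

random.seed(0)
particles = [random.uniform(-20.0, 20.0) for _ in range(300)]
weights = [1.0] * 300
for _ in range(10):                           # object sits near x = 5
    particles, weights = condensation_step(particles, weights, 5.0)
estimate = sum(p * w for p, w in zip(particles, weights)) / sum(weights)
```

The linear scan over the cumulative weights is one of the per-particle costs that a cheaper particle-assignment scheme can cut down.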
Novel Technique for Gait Analysis Using Two Waist Mounted Gyroscopes
Analysis of the human gait is used in many applications such as medicine, sports, and person identification. Several research studies have focused on the use of MEMS inertial sensors for gait analysis and showed promising results. The miniaturization of these sensors and their wearability have allowed gait to be analyzed over long periods outside the laboratory environment, which can reveal more information about the person, and have introduced gait analysis into new applications such as indoor localization.
Step detection and step length estimation are two basic and important gait analysis tasks. In fact, step detection is a prerequisite for the exploration of all other gait parameters. Researchers have proposed many methods for step detection, and their experimental results have shown high accuracies, exceeding 99% in some cases. All of these methods rely on experimental thresholds selected based on a limited number of subjects and walking conditions. Selecting and verifying an optimal threshold is a difficult task, since it can vary with many factors such as the user, the footwear, and the walking surface material. Also, most of these methods do not distinguish walking from other activities; they can only separate a motion state from an idle state. Methods that can distinguish walking from other activities are mainly machine learning methods, which need training and complex data labeling. On the other hand, step length estimation methods used in the literature either need per-user calibration of constants, rely on impractical sensor placement, or both.
In this thesis, we exploit the bipedal nature of human walking for gait analysis using two MEMS gyroscopes, one attached to each side of the lower waist. This setup allows steps to be detected and discriminated from other, non-bipedal activities without the need for magnitude thresholds or training. We were also able to calculate the hip rotation angle in the sagittal plane, which allowed us to estimate the step length without calibrating any constants. By mounting an accelerometer on the center of the back of the waist, we were able to develop a method to auto-calibrate the constant of the Weinberg method, one of the most accurate step length estimation methods, and to increase its accuracy further.
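The Weinberg method referenced above approximates step length as proportional to the fourth root of the vertical-acceleration range within a step, scaled by a per-user constant K (the constant this thesis auto-calibrates). A minimal sketch, with illustrative sample values:

```python
# Sketch of the Weinberg step-length approximation:
#   step_length ~ K * (a_max - a_min) ** 0.25
# K is user-specific; the acceleration windows below are illustrative.

def weinberg_step_length(accel_window, k=0.45):
    """accel_window: vertical accelerations (m/s^2) over one detected step."""
    return k * (max(accel_window) - min(accel_window)) ** 0.25

soft_step = [9.3, 9.8, 10.6, 10.1, 9.5]      # small bounce -> shorter step
hard_step = [8.1, 9.8, 12.4, 11.0, 8.9]      # larger bounce -> longer step
short_len = weinberg_step_length(soft_step)
long_len = weinberg_step_length(hard_step)
```

Because the estimate depends directly on K, a mis-set constant biases every step, which is why auto-calibrating K from an additional accelerometer is valuable.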
Machine learning approaches to video activity recognition: from computer vision to signal processing
244 p.

The research presented focuses on classification techniques for two different, though related, tasks, such that the second can be considered part of the first: human action recognition in videos and sign language recognition.

In the first part, the starting hypothesis is that transforming the signals of a video with the Common Spatial Patterns (CSP) algorithm, commonly used in electroencephalography (EEG) systems, can yield new features that are useful for the subsequent classification of the videos with supervised classifiers. Experiments have been carried out on several databases, including one created during this research from the point of view of a humanoid robot, with the intention of deploying the developed recognition system to improve human-robot interaction.

In the second part, the techniques developed earlier are applied to sign language recognition. In addition, a method based on the decomposition of signs is proposed to recognize them, adding the possibility of better explainability. The final goal is to develop a sign language tutor capable of guiding users through the learning process, making them aware of the errors they make and the reasons for those errors.
Hand gesture recognition system based in computer vision and machine learning: Applications on human-machine interaction
Doctoral Thesis in Electronics and Computer Engineering.

Hand gesture recognition is a natural way of human-computer interaction and an area
of very active research in computer vision and machine learning. This is an area with many different possible applications, giving users a simpler and more natural way to communicate with robots and system interfaces, without the need for extra devices. Thus, the primary goal of gesture recognition research applied to Human-Computer Interaction (HCI) is to create systems that can identify specific human gestures and use them to convey information or to control devices. For that, vision-based hand gesture interfaces require fast and extremely robust hand detection, and gesture recognition in real time.

Nowadays, vision-based gesture recognition systems tend to be specific solutions, built to solve one particular problem and configured to work in a particular manner. This research project studied and implemented solutions generic enough, with the help of machine learning algorithms, to allow their application in a wide range of human-computer interfaces for real-time gesture recognition.

The proposed solution, the Gesture Learning Module Architecture (GeLMA), allows the simple definition of a set of commands that can be based on static and dynamic gestures and that can be easily integrated and configured for use in a number of applications. It is easy to train and use, and since it is mainly built with open-source libraries it is also an inexpensive solution. Experiments showed that the system achieved an accuracy of 99.2% in terms of hand posture recognition and an average accuracy of 93.72% in terms of dynamic gesture recognition.

To validate the proposed framework, two systems were implemented. The first is an online system able to help a robotic soccer referee judge a game in real time. The proposed solution combines a vision-based hand gesture recognition system with a formal language definition, the Referee CommLang, into what is called the Referee Command Language Interface System (ReCLIS). The system builds a command based on system-interpreted static and dynamic referee gestures, and is able to send it to a computer interface which can then transmit the proper commands to the robots. The second is an online system able to interpret a subset of Portuguese Sign Language. The experiments showed that the system was able to reliably recognize the vowels in real time. Although the implemented solution was only trained to recognize the five vowels, it is easily extended to recognize the rest of the alphabet. These experiments also showed that the core of vision-based interaction systems can be the same for all applications, thus facilitating implementation.

The proposed framework has the advantage of being generic enough, and a solid foundation, for the development of hand gesture recognition systems that can be integrated into any human-computer interface application. The interface language can be redefined, and the system can be easily configured and trained with a different set of gestures to be integrated into the final solution.
Shortest Route at Dynamic Location with Node Combination-Dijkstra Algorithm
Abstract— Online transportation has become a basic
requirement of the general public in support of all activities to go
to work, school or vacation to the sights. Public transportation
services compete to provide the best service so that consumers
feel comfortable using the services offered, so that all activities
are noticed, one of them is the search for the shortest route in
picking the buyer or delivering to the destination. Node
Combination method can minimize memory usage and this
methode is more optimal when compared to A* and Ant Colony
in the shortest route search like Dijkstra algorithm, but can’t
store the history node that has been passed. Therefore, using
node combination algorithm is very good in searching the
shortest distance is not the shortest route. This paper is
structured to modify the node combination algorithm to solve the
problem of finding the shortest route at the dynamic location
obtained from the transport fleet by displaying the nodes that
have the shortest distance and will be implemented in the
geographic information system in the form of map to facilitate
the use of the system.
Keywords— Shortest Path, Dijkstra's Algorithm, Node Combination, Dynamic Location
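For reference, the textbook Dijkstra computation that the node-combination variant optimizes can be sketched with a priority queue. This is the classical algorithm, not the paper's memory-saving variant (which merges the nearest node into the source); the road graph below is illustrative.

```python
# A textbook Dijkstra sketch using a binary heap. Graph and weights are
# illustrative; the paper's node-combination variant is not reproduced here.
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbor, weight), ...]}; returns shortest distances."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, skip
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

roads = {
    "depot": [("a", 4), ("b", 1)],
    "b": [("a", 2), ("c", 5)],
    "a": [("c", 1)],
}
dist = dijkstra(roads, "depot")           # depot -> b -> a -> c costs 1+2+1
```

Recovering the actual route (not just the distance) additionally requires storing each node's predecessor, which is exactly the history the plain node-combination method discards.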
Taming Crowded Visual Scenes
Computer vision algorithms have played a pivotal role in commercial video surveillance systems for a number of years. However, a common weakness among these systems is their inability to handle crowded scenes. In this thesis, we have developed algorithms that overcome some of the challenges encountered in videos of crowded environments such as sporting events, religious festivals, parades, concerts, train stations, airports, and malls. We adopt a top-down approach by first performing a global-level analysis that locates dynamically distinct crowd regions within the video. This knowledge is then employed in the detection of abnormal behaviors and tracking of individual targets within crowds. In addition, the thesis explores the utility of contextual information necessary for persistent tracking and re-acquisition of objects in crowded scenes. For the global-level analysis, a framework based on Lagrangian particle dynamics is proposed to segment the scene into dynamically distinct crowd regions or groupings. For this purpose, the spatial extent of the video is treated as a phase space of a time-dependent dynamical system in which transport from one region of the phase space to another is controlled by the optical flow. Next, a grid of particles is advected forward in time through the phase space using numerical integration to generate a flow map. The flow map relates the initial positions of particles to their final positions. The spatial gradients of the flow map are used to compute a Cauchy-Green deformation tensor that quantifies the amount by which neighboring particles diverge over the length of the integration. The maximum eigenvalue of the tensor is used to construct a forward Finite Time Lyapunov Exponent (FTLE) field that reveals the attracting Lagrangian Coherent Structures (LCS). The same process is repeated by advecting the particles backward in time to obtain a backward FTLE field that reveals the repelling LCS.
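The FTLE pipeline just described can be demonstrated numerically on a toy analytic flow standing in for the optical-flow field. Here a saddle flow (u = x, v = -y) is used, particles are advected with forward Euler, the flow-map Jacobian is taken by central differences, and the largest eigenvalue of the Cauchy-Green tensor gives the FTLE; all parameters are illustrative.

```python
# A self-contained numeric sketch of the FTLE computation on a toy saddle
# flow. In the thesis, the velocity field would come from optical flow.
import math

def velocity(x, y):
    return x, -y                         # saddle: stretching along x

def advect(x, y, T=1.0, dt=0.1):
    t = 0.0
    while t < T - 1e-9:
        u, v = velocity(x, y)
        x, y = x + dt * u, y + dt * v    # forward Euler step
        t += dt
    return x, y

def ftle(x, y, T=1.0, h=1e-4):
    xp, yp = advect(x + h, y); xm, ym = advect(x - h, y)
    xq, yq = advect(x, y + h); xr, yr = advect(x, y - h)
    a, b = (xp - xm) / (2 * h), (xq - xr) / (2 * h)   # flow-map Jacobian
    c, d = (yp - ym) / (2 * h), (yq - yr) / (2 * h)
    c11, c12, c22 = a * a + c * c, a * b + c * d, b * b + d * d
    tr, det = c11 + c22, c11 * c22 - c12 * c12        # Cauchy-Green tensor
    lam_max = (tr + math.sqrt(max(tr * tr - 4 * det, 0.0))) / 2.0
    return math.log(math.sqrt(lam_max)) / T           # FTLE

value = ftle(0.5, 0.5)   # positive: neighboring particles diverge along x
```

Evaluating this on a grid of particles and thresholding the resulting field is what exposes ridges corresponding to the LCS boundaries between crowd flows.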
The attracting and repelling LCS are the time dependent invariant manifolds of the phase space and correspond to the boundaries between dynamically distinct crowd flows. The forward and backward FTLE fields are combined to obtain one scalar field that is segmented using a watershed segmentation algorithm to obtain the labeling of distinct crowd-flow segments. Next, abnormal behaviors within the crowd are localized by detecting changes in the number of crowd-flow segments over time. Next, the global-level knowledge of the scene generated by the crowd-flow segmentation is used as an auxiliary source of information for tracking an individual target within a crowd. This is achieved by developing a scene structure-based force model. This force model captures the notion that an individual, when moving in a particular scene, is subjected to global and local forces that are functions of the layout of that scene and the locomotive behavior of other individuals in his or her vicinity. The key ingredients of the force model are three floor fields that are inspired by research in the field of evacuation dynamics; namely, Static Floor Field (SFF), Dynamic Floor Field (DFF), and Boundary Floor Field (BFF). These fields determine the probability of moving from one location to the next by converting the long-range forces into local forces. The SFF specifies regions of the scene that are attractive in nature, such as an exit location. The DFF, which is based on the idea of active walker models, corresponds to the virtual traces created by the movements of nearby individuals in the scene. The BFF specifies influences exhibited by the barriers within the scene, such as walls and no-entry areas. By combining influence from all three fields with the available appearance information, we are able to track individuals in high-density crowds. The results are reported on real-world sequences of marathons and railway stations that contain thousands of people. 
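The floor-field force model can be sketched as converting the three fields into local transition probabilities over neighboring cells, in the spirit of cellular evacuation models: the static and dynamic fields attract, the boundary field repels. The field values, coupling constants, and cell names below are illustrative, not the thesis's calibrated model.

```python
# A small sketch of combining Static (SFF), Dynamic (DFF), and Boundary (BFF)
# floor fields into move probabilities for a pedestrian. Values illustrative.
import math

def move_probabilities(neighbors, sff, dff, bff, kS=2.0, kD=1.0, kB=1.0):
    """Higher static/dynamic field attracts; boundary field repels."""
    scores = [math.exp(kS * sff[c] + kD * dff[c] - kB * bff[c])
              for c in neighbors]
    z = sum(scores)
    return {c: s / z for c, s in zip(neighbors, scores)}

cells = ["left", "ahead", "right"]
sff = {"left": 0.2, "ahead": 0.9, "right": 0.2}   # exit lies straight ahead
dff = {"left": 0.1, "ahead": 0.4, "right": 0.1}   # recent traces also go ahead
bff = {"left": 0.0, "ahead": 0.0, "right": 0.8}   # a wall on the right
probs = move_probabilities(cells, sff, dff, bff)
```

In the tracker, these transition probabilities are combined with the target's appearance likelihood at each candidate cell to score its next position.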
A comparative analysis with respect to an appearance-based mean shift tracker is also conducted by generating the ground truth. The result of this analysis demonstrates the benefit of using floor fields in crowded scenes. The occurrence of occlusion is very frequent in crowded scenes due to a high number of interacting objects. To overcome this challenge, we propose an algorithm that has been developed to augment a generic tracking algorithm to perform persistent tracking in crowded environments. The algorithm exploits the contextual knowledge, which is divided into two categories consisting of motion context (MC) and appearance context (AC). The MC is a collection of trajectories that are representative of the motion of the occluded or unobserved object. These trajectories belong to other moving individuals in a given environment. The MC is constructed using a clustering scheme based on the Lyapunov Characteristic Exponent (LCE), which measures the mean exponential rate of convergence or divergence of the nearby trajectories in a given state space. Next, the MC is used to predict the location of the occluded or unobserved object in a regression framework. It is important to note that the LCE is used for measuring divergence between a pair of particles while the FTLE field is obtained by computing the LCE for a grid of particles. The appearance context (AC) of a target object consists of its own appearance history and appearance information of the other objects that are occluded. The intent is to make the appearance descriptor of the target object more discriminative with respect to other unobserved objects, thereby reducing the possible confusion between the unobserved objects upon re-acquisition. This is achieved by learning the distribution of the intra-class variation of each occluded object using all of its previous observations. In addition, a distribution of inter-class variation for each target-unobservable object pair is constructed. 
Finally, the re-acquisition decision is made using both the MC and the AC.