14 research outputs found

    UWB Based Static Gesture Classification

    Our paper presents a robust framework for UWB-based static gesture recognition, leveraging proprietary UWB radar sensor technology. We collected extensive datasets covering five commonly used gestures. Our approach involves a comprehensive data pre-processing pipeline that encompasses outlier handling, aspect-ratio-preserving resizing, and false-color image transformation. Both CNN and MobileNet models were trained on the processed images, and our best-performing model achieved an accuracy of 96.78%. Additionally, we developed a user-friendly GUI framework to assess the model's system resource usage and processing times, which revealed low memory utilization and real-time task completion in under one second. This research marks a significant step towards enhancing static gesture recognition using UWB technology, promising practical applications in various domains.
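
The aspect-ratio-preserving resizing step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the authors' implementation; the function name `fit_with_padding`, the square target canvas, and the centred padding are all assumptions made here for illustration.

```python
def fit_with_padding(width, height, target):
    """Compute the geometry for resizing an image into a target x target
    canvas without distorting it (hypothetical helper, not from the paper).

    Returns (new_width, new_height, pad_left, pad_top).
    """
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height and target must be positive")

    # Scale so that the longer side exactly fills the target canvas.
    scale = target / max(width, height)
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))

    # Centre the resized image; the remaining border would be padded.
    pad_left = (target - new_width) // 2
    pad_top = (target - new_height) // 2
    return new_width, new_height, pad_left, pad_top


if __name__ == "__main__":
    print(fit_with_padding(640, 480, 224))
```

The returned sizes could then be passed to any image library's resize routine, with the padding applied afterwards.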

    A colour-based building recognition using support vector machine

    Many applications use image recognition to help humans recognise objects from digital images alone. A content-based building recognition system could overcome the limitation of using only text as search input. In this paper, a building recognition system based on colour histograms is proposed for recognising buildings in Ipoh city, Perak, Malaysia. The colour features of each building image are extracted, and a feature vector combining the mean, standard deviation, variance, skewness and kurtosis of the gray levels is formed to represent each image. These feature values are then used to train the system with a supervised learning algorithm, the Support Vector Machine (SVM). Lastly, the accuracy of the recognition system is evaluated using 10-fold cross-validation. The evaluation results show that the building recognition system is well trained and recognises the building images effectively, with a low misclassification rate.
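
The five-element feature vector described in this abstract can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name `gray_level_moments` is hypothetical, the input is taken to be a flat list of gray-level values, and population (not sample) moments are used. The SVM training and 10-fold cross-validation steps are not shown.

```python
import math


def gray_level_moments(pixels):
    """Return [mean, std, variance, skewness, kurtosis] of gray-level values.

    Hypothetical helper: uses population moments; kurtosis is the plain
    fourth standardised moment (not excess kurtosis).
    """
    n = len(pixels)
    if n == 0:
        raise ValueError("pixels must not be empty")

    mean = sum(pixels) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((p - mean) ** 2 for p in pixels) / n
    m3 = sum((p - mean) ** 3 for p in pixels) / n
    m4 = sum((p - mean) ** 4 for p in pixels) / n

    std = math.sqrt(m2)
    # A constant image has zero variance; define both ratios as 0 there.
    skewness = m3 / std ** 3 if std > 0 else 0.0
    kurtosis = m4 / m2 ** 2 if m2 > 0 else 0.0
    return [mean, std, m2, skewness, kurtosis]


if __name__ == "__main__":
    print(gray_level_moments([1, 2, 3, 4, 5]))
```

One such vector per image (or per colour channel, if that is how the features are organised) would form a row of the training matrix passed to the classifier.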

    Blur Classification Using Segmentation Based Fractal Texture Analysis

    The objective of vision-based gesture recognition is to design a system that can understand human actions and convey the acquired information using captured images. Image restoration is needed whenever an image is blurred during acquisition, since blurred images can severely degrade the performance of such systems. Image restoration recovers the true image from a degraded version; it is called blind restoration when the blur information is unknown. Blur identification is therefore essential before any blind restoration algorithm is applied. This paper presents a blur identification approach that categorises a hand gesture image into one of four classes: sharp, motion blur, defocus blur, or combined blur. Features extracted with the segmentation-based fractal texture analysis algorithm are fed to a neural-network-based classification system. The simulation results demonstrate the precision of the proposed method.

    Unsupervised Embedded Gesture Recognition Based on Multi-objective NAS and Capacitive Sensing

    Gesture recognition has become pervasive in many interactive environments. Recognition based on neural networks often reaches higher recognition rates than competing methods, at the cost of a higher computational complexity that becomes very challenging on low-resource computing platforms such as microcontrollers. New optimization methodologies, such as quantization and Neural Architecture Search (NAS), are steps forward in the development of embeddable networks. In addition, because neural networks are commonly used in a supervised fashion, labeling tends to introduce bias into the model. Unsupervised methods allow tasks such as classification to be performed without depending on labeling. In this work, we present an embedded and unsupervised gesture recognition system, composed of a neural network autoencoder and the K-Means clustering algorithm and optimized through a state-of-the-art multi-objective NAS. The method makes it possible to develop, deploy and perform unsupervised classification on low-resource embedded devices.

    Active Perception by Interaction with Other Agents in a Predictive Coding Framework: Application to Internet of Things Environment

    Predicting the state of an agent's partially observable environment is a problem of interest in many domains. Typically in the real world, the environment consists of multiple agents, not necessarily working towards a common goal. Though the goal and sensory observation for each agent is unique, one agent might have acquired some knowledge that may benefit the other. In essence, the knowledge base regarding the environment is distributed among the agents. An agent can sample this distributed knowledge base by communicating with other agents. Since an agent is not storing the entire knowledge base, its model can be small and its inference can be efficient and fault-tolerant. However, the agent needs to learn when, with whom and what to communicate (or, more generally, interact) under different situations. This dissertation presents an agent model that actively and selectively communicates with other agents to predict the state of its environment efficiently. Communication is a challenge when the internal models of other agents are unknown and unobservable. The proposed agent learns communication policies as mappings from its belief state to when, with whom and what to communicate. The policies are learned using predictive coding in an online manner, without any reinforcement. The proposed agent model is evaluated on widely studied applications, such as human activity recognition from multimodal, multisource and heterogeneous sensor data, and transferring knowledge across sensor networks. In the applications, either each sensor or each sensor network is assumed to be monitored by an agent. The recognition accuracy on benchmark datasets is comparable to the state of the art, even though our model has significantly fewer parameters and infers the state in a localized manner. The learned policy reduces the number of communications. The agent is tolerant to communication failures and can recognize the reliability of each agent from its communication messages. To the best of our knowledge, this is the first work on learning communication policies by an agent for predicting the state of its environment.

    Gesture interface for controlling a handheld 3D sensor (Interface gestuelle pour la commande d'un capteur 3D tenu en main)

    This thesis presents the development of a gesture-based user interface for the operation of handheld 3D scanning devices. This user interface allows the user to remotely engage with the software while walking around the target object, without having to return to the workstation. To this end, we develop a prototype using an Azure Kinect sensor pointed at the user. We propose a set of hand gestures and a machine learning-based approach to classification for triggering momentary actions in the software. Additionally, we define interaction metaphors for applying 3D rigid transformations to a virtual object on screen. We implement these components into a proof-of-concept application compatible with Creaform VXelements.

    Development of human-robot interaction based on multimodal emotion recognition (Multimodaalsel emotsioonide tuvastamisel põhineva inimese-roboti suhtluse arendamine)

    The electronic version of the dissertation does not include the publications. Automatic multimodal emotion recognition is a fundamental subject of interest in affective computing. Its main applications are in human-computer interaction. The systems developed for this purpose consider combinations of different modalities, based on vocal and visual cues. This thesis takes these modalities into account in order to develop an automatic multimodal emotion recognition system. More specifically, it takes advantage of the information extracted from speech and face signals. From speech signals, Mel-frequency cepstral coefficients, filter-bank energies and prosodic features are extracted. Two different strategies are considered for analyzing the facial data. First, the geometric relations between facial landmarks, i.e. distances and angles, are computed. Second, each emotional video is summarized into a reduced set of key-frames, and a convolutional neural network is trained on these key-frames to discriminate visually between the emotions. Afterward, the output confidence values of all three classifiers (one acoustic, two visual) are used to define a new feature space, which is then learned for the final emotion label prediction in a late fusion. The experiments are conducted on the SAVEE, Polish, Serbian, eNTERFACE'05 and RML datasets. The results show significant performance improvements by the proposed system in comparison to the existing alternatives, defining the current state of the art on all the datasets. Additionally, we provide a review of emotional body gesture recognition systems proposed in the literature. The aim of this review is to help identify possible future research directions for enhancing the performance of the proposed system. Specifically, we suggest that incorporating data representing gestures, which constitute another major component of the visual modality, can result in a more efficient framework.
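
The late-fusion step in this abstract, where the classifiers' output confidences define a new feature space, can be sketched roughly as below. The function name `fuse_confidences` and the example confidence values are hypothetical, and the abstract does not specify the final-stage learner, so none is shown.

```python
def fuse_confidences(*confidence_vectors):
    """Concatenate per-classifier confidence vectors into one feature vector.

    Hypothetical helper: each argument is the list of class confidences
    produced by one classifier for the same sample.
    """
    if not confidence_vectors:
        raise ValueError("at least one confidence vector is required")

    fused = []
    for vector in confidence_vectors:
        fused.extend(vector)
    return fused


if __name__ == "__main__":
    # Made-up confidences over two emotion classes from three classifiers.
    acoustic = [0.7, 0.3]
    geometric = [0.6, 0.4]
    cnn = [0.2, 0.8]
    print(fuse_confidences(acoustic, geometric, cnn))
```

The fused vectors, one per sample, would then serve as training inputs for whatever final classifier predicts the emotion label.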

    The State of the Art of Spatial Interfaces for 3D Visualization

    We survey the state of the art of spatial interfaces for 3D visualization. Interaction techniques are crucial to data visualization processes and the visualization research community has been calling for more research on interaction for years. Yet, research papers focusing on interaction techniques, in particular for 3D visualization purposes, are not always published in visualization venues, sometimes making it challenging to synthesize the latest interaction and visualization results. We therefore introduce a taxonomy of interaction techniques for 3D visualization. The taxonomy is organized along two axes: the primary source of input on the one hand and the visualization task they support on the other hand. Surveying the state of the art allows us to highlight specific challenges and missed opportunities for research in 3D visualization. In particular, we call for additional research in: (1) controlling 3D visualization widgets to help scientists better understand their data, (2) 3D interaction techniques for dissemination, which are under-explored yet show great promise for helping museums and science centers in their mission to share recent knowledge, and (3) developing new measures that move beyond traditional time and error metrics for evaluating visualizations that include spatial interaction.