181 research outputs found

Hand Gesture Recognition System

Author: Sahoo Lagnajeet
Publication venue
Publication date: 01/01/2015
Field of study

Hand gesture recognition is a well-researched topic in the machine learning, computer graphics and image processing communities. Systems based on recognition technology follow mathematically rich and complicated algorithms whose main aim is to teach a computer different gestures. Because the set of possible gestures is very large, the number of methodologies for identifying them is also large. In this thesis, I have concentrated on gestures that are based on hands. The thesis is divided into two sections: Static mode and Dynamic mode. The Static mode concentrates on gestures based on still images, and the Dynamic mode concentrates on gestures based on image sequences. As in every hand gesture recognition system, the recognition path for the Static mode is divided into four parts: Segmentation, Feature Extraction, Feature Selection and Classification. The algorithms used in the Static mode are the Graph Cut algorithm, the Bacterial Foraging Optimization (BFO) algorithm, the support vector machine, the binary tree color quantization algorithm and the block-based discrete cosine transform. The Graph Cut algorithm uses the min-cut of a graph to separate the non-hand pixels from the hand pixels. The min-cut of the graph is found by the max-flow algorithm. The binary tree color quantization algorithm is used to cluster the pixels into the required number of clusters. The BFO algorithm is used to find the optimum values of parameters that must be either maximized or minimized. BFO is an evolutionary algorithm that mirrors the swarming behavior of E. coli bacteria. The algorithm is a non-linear form of optimization, and its convergence is faster than that of other evolutionary algorithms. For the Dynamic mode, the path is divided into five parts: Segmentation, Tracking, Feature Extraction, Vector Quantization and Classification.
The Dynamic mode uses 150 frames of image data to trace the path of the hand and find the most likely gesture. The hand is isolated using a Gaussian mixture model. To make the system as fast as possible, hand tracking was designed to favor speed over accuracy, so some accuracy was sacrificed for the sake of performance. Because a sequence of images is involved, the hidden Markov model (HMM) was the preferred method for classification. The HMM was trained with the Baum-Welch method, an expectation-maximization procedure for estimating the parameters of the HMM. Training was followed by testing, in which an image sequence of 150 frames was passed to the system. The Viterbi algorithm was used for testing. It finds the most likely sequence of states that could have produced a particular sequence of observations.
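The Viterbi decoding step mentioned in the abstract can be illustrated with a minimal sketch for a discrete HMM. This is not the thesis's implementation: the two states (`A`, `B`), the two quantized observation symbols (`x`, `y`) and all probabilities are hypothetical values chosen only to make the example small.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its log-probability) for a discrete HMM."""

    def log(p):
        # Work in log space to avoid underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    # back[t][s]: predecessor of s on that best path.
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model (all values are assumptions for illustration only).
states = ["A", "B"]
start_p = {"A": 0.9, "B": 0.1}
trans_p = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
emit_p = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}}

path, logp = viterbi(["x", "x", "y", "y"], states, start_p, trans_p, emit_p)
print(path, math.exp(logp))
```

In a gesture classifier of the kind described, one HMM would typically be trained per gesture and the vector-quantized frame features would play the role of the observation symbols; that mapping is an assumption here, since the abstract does not give those details.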

ethesis@nitr

Structure-Constrained Basis Pursuit for Compressively Sensing Speech

Author: Dominguez Miguel
Publication venue: RIT Scholar Works
Publication date: 01/05/2016
Field of study

Compressed Sensing (CS) exploits the sparsity of many signals to enable sampling below the Nyquist rate. If the original signal is sufficiently sparse, the Basis Pursuit (BP) algorithm will perfectly reconstruct the original signal. Unfortunately, many signals that intuitively appear sparse do not meet the threshold for sufficient sparsity. These signals require so many CS samples for accurate reconstruction that the advantages of CS disappear. This is because Basis Pursuit and Basis Pursuit Denoising model only sparsity. We developed a Structure-Constrained Basis Pursuit (SCBP) that models the structure of somewhat sparse signals as upper and lower bound constraints on the Basis Pursuit Denoising solution. We applied it to speech, which seems sparse but does not compress well with CS, and obtained improved quality over Basis Pursuit Denoising. When a single parameter (i.e., the phone) is encoded, Normalized Mean Squared Error (NMSE) decreases by between 16.2% and 1.00% when sampling with CS at between 1/10 and 1/2 the Nyquist rate, respectively. When bounds are coded as a sum of Gaussians, NMSE decreases by between 28.5% and 21.6% in the same range. SCBP can be applied to any somewhat sparse signal with a predictable structure to enable improved reconstruction quality with the same number of samples.
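For orientation, the optimization problems named in the abstract can be written in their usual textbook form. The notation is an assumption, not the author's: $y$ denotes the CS measurements, $\Phi$ the sensing matrix, $\Psi$ a sparsifying basis, $s$ the coefficient vector with $x = \Psi s$, and $\epsilon$ a noise tolerance. The bounded variant is only a sketch of what "upper and lower bound constraints" could look like; the abstract does not say whether the bounds $\ell$ and $u$ apply to the coefficients themselves or to their magnitudes. The NMSE expression is the common definition and is likewise assumed.

```latex
% Notation is assumed for illustration; see the lead-in above.
\begin{align}
\text{BP:}\quad
  & \min_{s}\ \|s\|_1
    \quad \text{s.t.}\quad \Phi\Psi s = y \\
\text{BPDN:}\quad
  & \min_{s}\ \|s\|_1
    \quad \text{s.t.}\quad \|\Phi\Psi s - y\|_2 \le \epsilon \\
\text{Bounded BPDN (assumed form):}\quad
  & \min_{s}\ \|s\|_1
    \quad \text{s.t.}\quad \|\Phi\Psi s - y\|_2 \le \epsilon,
    \quad \ell \le s \le u \\
\text{NMSE (common definition):}\quad
  & \frac{\|x - \hat{x}\|_2^2}{\|x\|_2^2}
\end{align}
```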

RIT Scholar Works

A Survey of Applications and Human Motion Recognition with Microsoft Kinect

Authors: Lun Roanna, Zhao Wenbing
Publication venue: EngagedScholarship@CSU
Publication date: 09/07/2015
Field of study

Microsoft Kinect, a low-cost motion sensing device, enables users to interact with computers or game consoles naturally through gestures and spoken commands without any other peripheral equipment. As such, it has attracted intense interest in research and development on the Kinect technology. In this paper, we present a comprehensive survey of Kinect applications and of the latest research and development on motion recognition using data captured by the Kinect sensor. On the applications front, we review the applications of the Kinect technology in a variety of areas, including healthcare, education and performing arts, robotics, sign language recognition, retail services, workplace safety training, and 3D reconstruction. On the technology front, we provide an overview of the main features of both versions of the Kinect sensor together with the depth sensing technologies used, and review the literature on human motion recognition techniques used in Kinect applications. We provide a classification of motion recognition techniques to highlight the different approaches used in human motion recognition. Furthermore, we compile a list of publicly available Kinect datasets. These datasets are valuable resources for researchers investigating better methods for human motion recognition and lower-level computer vision tasks such as segmentation, object detection and human pose estimation.

Crossref

Cleveland-Marshall College of Law

Prioritizing Content of Interest in Multimedia Data Compression

Author: Shao Chong
Publication venue: University of North Carolina at Chapel Hill Graduate School
Publication date: 01/01/2019
Field of study

Image and video compression techniques make data transmission and storage in digital multimedia systems more efficient and feasible given a system's limited storage and bandwidth. Many generic image and video compression techniques such as JPEG and H.264/AVC have been standardized and are now widely adopted. Despite their great success, we observe that these standard compression techniques are not the best solution for data compression in special types of multimedia systems such as microscopy videos and low-power wireless broadcast systems. In these application-specific systems, where the content of interest in the multimedia data is known and well-defined, we should rethink the design of the data compression pipeline. We hypothesize that by identifying and prioritizing the content of interest in multimedia data, new compression methods can be invented that are far more effective than standard techniques. In this dissertation, a set of new data compression methods based on the idea of prioritizing the content of interest is proposed for three different kinds of multimedia systems. I will show that the key to designing efficient compression techniques in these three cases is to prioritize the content of interest in the data. The definition of the content of interest of multimedia data depends on the application. First, I show that for microscopy videos, the content of interest is defined as the spatial regions in the video frame whose pixels contain more than just noise. Keeping the data in those regions at high quality and discarding other information yields a novel microscopy video compression technique. Second, I show that for a Bluetooth low energy beacon based system, practical multimedia data storage and transmission is possible by prioritizing the content of interest. I designed custom image compression techniques that preserve edges in a binary image, or foreground regions of a color image of indoor or outdoor objects.
Last, I present a new indoor Bluetooth low energy beacon based augmented reality system that integrates a 3D moving object compression method that prioritizes the content of interest. Doctor of Philosophy

Carolina Digital Repository

Development in Signer-Independent Sign Language Recognition and the Ideas of Solving Some Key Problems

Author: Feng JIANG
Publication venue: 'Science China Press., Co. Ltd.'
Publication date: 01/01/2007
Field of study

Crossref

Verification of emotion recognition from facial expression

Author: Sun Yanjia
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/2016
Field of study

Analysis of facial expressions is an active topic of research with many potential applications, since the human face plays a significant role in conveying a person’s mental state. Because of its practical value, scientists and researchers from different fields such as psychology, finance, marketing, and engineering have developed significant interest in this area. Hence, there is more need than ever for an intelligent tool that can be employed in emotional Human-Computer Interfaces (HCI), analyzing facial expressions as a better alternative to traditional devices such as the keyboard and mouse. The face is a window to the human mind. The examination of mental states explores a person’s internal cognitive states. A facial emotion recognition system has the potential to read people’s minds and interpret their emotional thoughts to the world. High rates of recognition accuracy of facial emotions by intelligent machines have been achieved in existing efforts based on benchmark databases containing posed facial emotions. However, they are not qualified to interpret a person’s true feelings, even if the emotions are recognized. The difference between posed facial emotions and spontaneous ones has been identified and studied in the literature. One of the most interesting challenges in the field of HCI is to make computers more human-like for more intelligent user interfaces. In this dissertation, a Regional Hidden Markov Model (RHMM) based facial emotion recognition system is proposed. In this system, the facial features are extracted from three face regions: the eyebrows, eyes and mouth. These regions convey relevant information regarding facial emotions. As a marked departure from prior work, RHMMs are trained for the states of these three distinct face regions, instead of for the entire face, for each facial emotion type. In the recognition step, regional features are extracted from test video sequences.
These features are processed according to the corresponding RHMMs to learn the probabilities for the states of the three face regions. The combination of states is used to identify the estimated emotion type of a given frame in a video sequence. An experimental framework is established to validate the results of such a system. RHMM as a new classifier emphasizes the states of three facial regions, rather than the entire face. The dissertation proposes a method of forming observation sequences that represent the changes of states of facial regions for training RHMMs and for recognition. The proposed method is applicable to various forms of video clips, including real-time videos. The proposed system shows a human-like capability to infer people’s mental states from moderate levels of spontaneous facial emotions conveyed in daily life, in contrast to posed facial emotions. Moreover, the research associated with the proposed facial emotion recognition system is extended into the domains of finance and biomedical engineering. A CEO’s facial expression of fear was found to be a strong and positive predictor of the firm’s stock price in the market. In addition, the experimental results demonstrate the similarity between spontaneous facial reactions to stimuli and the inner affective states translated by brain activity. The results reveal the effectiveness of facial features combined with features extracted from brain activity signals for multiple-signal correlation analysis and affective state classification.

Digital Commons @ New Jersey Institute of Technology (NJIT)

Arabic Sign Language Recognition

Author
Publication venue
Publication date
Field of study

KFUPM ePrints