234 research outputs found

    Automatic Recognition of Arabic Poetry Meter from Speech Signal using Long Short-term Memory and Support Vector Machine

    The recognition of the poetry meter in spoken lines is a natural language processing application that aims to identify the pattern of stressed and unstressed syllables in a line of a poem. State-of-the-art studies include few works on the automatic recognition of Arud meters, all of which are text-based models; none is voice-based. Poetry meter recognition is not easy for an ordinary reader and is very difficult for a listener, and it is usually performed manually by experts. This paper proposes a model to detect the poetry meter from a single spoken line (“Bayt”) of an Arabic poem. A dataset of 230 samples collected from 10 Arabic poems, covering three meters and read by two speakers, is used in this work. The work extracts linear prediction cepstrum coefficient and Mel frequency cepstral coefficient (MFCC) features as a time-series input to the proposed long short-term memory (LSTM) classifier. In addition, a global feature set, computed from statistics of the features across all frames, feeds the support vector machine (SVM) classifier. The results show that the SVM model achieves the highest accuracy in the speaker-dependent approach, improving results by 3% compared to the state-of-the-art studies, whereas in the speaker-independent approach the MFCC feature with LSTM outperforms the other proposed models
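
The global feature set described above can be illustrated with a minimal sketch. The abstract only says "some statistics", so the choice of mean, standard deviation, minimum and maximum here is an assumption, and `global_features` is a hypothetical helper name, not the authors' code. It shows how a variable-length sequence of per-frame vectors (for example MFCCs) might be collapsed into one fixed-length vector suitable for an SVM.

```python
import math


def global_features(frames):
    """Collapse per-frame feature vectors into one fixed-length vector.

    For each coefficient, emit mean, standard deviation, min and max
    across all frames (the choice of statistics is an assumption).
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    out = []
    for j in range(dim):
        column = [frame[j] for frame in frames]
        mean = sum(column) / n
        variance = sum((v - mean) ** 2 for v in column) / n  # population variance
        out.extend([mean, math.sqrt(variance), min(column), max(column)])
    return out
```

A line with any number of frames yields a vector of length `4 * dim`, so lines of different duration become directly comparable.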

    Scanning Photo-Induced Impedance Microscopy - Resolution studies and polymer characterization

    Scanning Photo-Induced Impedance Microscopy (SPIM) is an impedance imaging technique that is based on photocurrent measurements at field-effect structures. The material under investigation is deposited onto a semiconductor-insulator substrate. A thin metal film or an electrolyte solution with an immersed electrode serves as the gate contact. A modulated light beam focused into the space charge region of the semiconductor produces a photocurrent, which is directly related to the local impedance of the material. The absolute impedance of a polymer film can be measured by calibrating photocurrents using a known impedance in series with the sample. Depending on the wavelength of light used, charge carriers are not only generated in the focus but also throughout the bulk of the semiconductor. This can have adverse effects on the lateral resolution. Two-photon experiments were carried out to confine charge carrier generation to the space charge layer. The lateral resolution of SPIM is also limited by the lateral diffusion of charge carriers in the semiconductor. This problem can be solved by using thin silicon layers as semiconductor substrates. A resolution of better than 1 µm was achieved using silicon on sapphire (SOS) substrates with a 1 µm thick silicon layer

    A Remark On Proper Left H*-Algebras

    W. Ambrose gave the theory of proper H*-algebras, and M. Smiley in (2) gave an example of a left H*-algebra which is not a two-sided H*-algebra. He then modified some of Ambrose's arguments to obtain the structure of proper right H*-algebras. In fact, he proved that a proper right H*-algebra is merely a proper H*-algebra in which the norm has been changed to a certain equivalent norm in each of the simple components. In this short paper, we define proper left H*-algebras and give two lemmas for this class. Then we prove the main result that every proper left H*-algebra is a proper H*-algebra. Thus, in this paper, we prove that the following are equivalent: (i) Proper left H*-algebras. (ii) Proper right H*-algebras. (iii) Proper H*-algebras

    Enhancing physical layer security of cognitive radio transceiver via chaotic OFDM

    Given the enormous potential of Cognitive Radio (CR) to improve spectral utilization, designing an adaptive access system and addressing its physical layer security are among the most important and challenging issues in CR networks. Since CR transceivers need to transmit over multiple non-contiguous frequency holes, a multi-carrier based system is one of the best candidates for CR's physical layer design. In this paper, we propose a combined chaotic scrambling (CS) and chaotic shift keying (CSK) scheme in Orthogonal Frequency Division Multiplexing (OFDM) based CR to enhance its physical layer security. By employing a chaos-based third-order Chebyshev map, which allows optimum bit error rate (BER) performance of CSK modulation, the proposed combined scheme outperforms the traditional OFDM system in an overlay scenario with a Rayleigh fading channel. Importantly, with two layers of encryption based on chaotic scrambling and CSK modulation, a large key size can be generated to resist brute-force attacks, leading to a significantly improved level of security
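
To make the chaotic scrambling idea concrete, here is a minimal sketch under stated assumptions. It uses the third-order Chebyshev map x_{n+1} = 4x_n^3 - 3x_n on [-1, 1], with the initial value acting as the key. Deriving a permutation by sorting the chaotic sequence, the burn-in length, and the function names are all illustrative choices, not the scheme from the paper, and the CSK modulation layer is not shown.

```python
def chebyshev_sequence(x0, n, burn_in=100):
    """Iterate the third-order Chebyshev map T3(x) = 4x^3 - 3x on [-1, 1]."""
    x = x0
    values = []
    for i in range(burn_in + n):
        x = 4.0 * x ** 3 - 3.0 * x
        x = max(-1.0, min(1.0, x))  # guard against floating-point drift
        if i >= burn_in:
            values.append(x)
    return values


def permutation_from_key(x0, n):
    """Turn the chaotic sequence into a permutation of range(n)."""
    seq = chebyshev_sequence(x0, n)
    return sorted(range(n), key=lambda i: seq[i])


def scramble(symbols, x0):
    perm = permutation_from_key(x0, len(symbols))
    return [symbols[p] for p in perm]


def descramble(scrambled, x0):
    perm = permutation_from_key(x0, len(scrambled))
    out = [None] * len(scrambled)
    for i, p in enumerate(perm):
        out[p] = scrambled[i]  # undo the reordering
    return out
```

A receiver holding the same initial value regenerates the same permutation and restores the symbol order; in this sketch the key is a single float, whereas a practical system would need a much larger key space.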

    Enhancing secrecy rate in cognitive radio networks via multilevel Stackelberg game

    In this letter, physical layer (PHY) security is investigated for both primary and secondary transmissions of a cognitive radio network (CRN) that is exposed to malicious attempts by an eavesdropper (ED). In our proposed system, the secondary transmitter (ST) acts as a trusted relay (TR) for primary transmission, and PHY security is facilitated by cooperation between the primary transmitter (PT) and the ST using the multilevel Stackelberg game. In particular, we formulate and solve the optimization problem of maximizing secrecy rates in different phases of primary and secondary transmissions. Finally, numerical examples are provided to demonstrate that spectrum leasing based on trading secondary access for cooperation is a promising framework for enhancing secrecy rate in CRNs

    Enhancing secrecy rate in cognitive radio networks via Stackelberg game

    In this paper, a game theory based cooperation scheme is investigated to enhance the physical layer security in both primary and secondary transmissions of a cognitive radio network (CRN). In CRNs, the primary network may decide to lease its own spectrum for a fraction of time to the secondary nodes in exchange for appropriate remuneration. We consider the secondary transmitter node as a trusted relay that forwards primary messages in a decode-and-forward (DF) fashion and, at the same time, allows part of its available power to be used to transmit artificial noise (i.e., a jamming signal) to enhance primary and secondary secrecy rates. In order to allocate power between message and jamming signals, we formulate and solve the optimization problem of maximizing the secrecy rates under malicious attempts from eavesdroppers (EDs). We then analyse the cooperation between the primary and secondary nodes from a game-theoretic perspective, modelling their interaction as a Stackelberg game with a theoretically proved and computed Stackelberg equilibrium. We show that spectrum leasing based on trading secondary access for cooperation by means of relay and jammer is a promising framework for enhancing security in CRNs
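
The power split between message and jamming signals can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the paper's formulation: a single link, hypothetical channel gains `g_main` and `g_eve`, artificial noise that degrades only the eavesdropper, and a plain grid search in place of the analytical solution. The Stackelberg game itself is not modelled. The secrecy rate is taken as the usual difference between the legitimate and eavesdropper rates, floored at zero.

```python
import math


def secrecy_rate(alpha, power, g_main, g_eve, noise=1.0):
    """Secrecy rate when a fraction alpha of power carries the message.

    The remaining (1 - alpha) * power is artificial noise, assumed here
    to affect only the eavesdropper (a simplifying assumption).
    """
    signal = alpha * power
    jam = (1.0 - alpha) * power
    rate_main = math.log2(1.0 + signal * g_main / noise)
    rate_eve = math.log2(1.0 + signal * g_eve / (noise + jam * g_eve))
    return max(0.0, rate_main - rate_eve)


def best_split(power, g_main, g_eve, noise=1.0, steps=1000):
    """Grid search over alpha in [0, 1] for the largest secrecy rate."""
    best_alpha, best_rate = 0.0, 0.0
    for k in range(steps + 1):
        alpha = k / steps
        rate = secrecy_rate(alpha, power, g_main, g_eve, noise)
        if rate > best_rate:
            best_alpha, best_rate = alpha, rate
    return best_alpha, best_rate
```

With an eavesdropper channel stronger than the legitimate one, spending all power on the message gives zero secrecy in this model, while an interior split that reserves some power for jamming yields a positive rate.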

    Efficient Kinect Sensor-based Kurdish Sign Language Recognition Using Echo System Network

    Sign language assists in building communication and bridging gaps in understanding. Automatic sign language recognition (ASLR) is a field that has recently been studied for various sign languages. However, Kurdish sign language (KuSL) is relatively new, and therefore research and purpose-built datasets for it are limited. This paper proposes a model to translate KuSL into text and presents a dataset recorded with a Kinect V2 sensor. The computational complexity of the feature extraction and classification steps, which is a serious problem for ASLR, is investigated in this paper. The paper proposes a feature engineering approach based on the skeleton position alone to provide a better representation of the features and avoid using all of the image information. In addition, the proposed model makes use of recurrent neural network (RNN)-based models. Training RNNs is inherently difficult, which motivates the investigation of alternatives. Besides the trainable long short-term memory (LSTM), this study proposes the untrained, low-complexity echo system network (ESN) classifier. The accuracy of both LSTM and ESN indicates that they can outperform the models in state-of-the-art studies. In addition, ESN, which has not been proposed thus far for ASLT, exhibits accuracy comparable to the LSTM with a significantly lower training time
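
The appeal of an untrained recurrent classifier can be sketched as follows. This assumes that the ESN in the abstract behaves like the reservoir model commonly known as an echo state network: the input and recurrent weights are random and fixed, and only a readout on top of the reservoir state would be trained. The reservoir size, weight ranges and function names are illustrative, and the readout is omitted.

```python
import math
import random


def make_reservoir(n_in, n_res, seed=0):
    """Create fixed random input and recurrent weights (never trained)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_res)]
    # Each row sums to at most 0.9 in absolute value, which bounds the
    # spectral radius below 1 so that old inputs fade from the state.
    scale = 0.9 / n_res
    w = [[rng.uniform(-scale, scale) for _ in range(n_res)] for _ in range(n_res)]
    return w_in, w


def final_state(sequence, w_in, w):
    """Run a sequence of input vectors through the reservoir."""
    state = [0.0] * len(w)
    for u in sequence:
        state = [
            math.tanh(
                sum(a * x for a, x in zip(w_in[i], u))
                + sum(b * s for b, s in zip(w[i], state))
            )
            for i in range(len(w))
        ]
    return state
```

The final state is a fixed-length summary of a variable-length sequence (for example skeleton joint features per frame), so a simple linear readout can be fitted on it without backpropagation through time.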

    Automatic Speech Emotion Recognition- Feature Space Dimensionality and Classification Challenges

    In the last decade, research in Speech Emotion Recognition (SER) has become a major endeavour in Human Computer Interaction (HCI) and speech processing. Accurate SER is essential for many applications, such as assessing customer satisfaction with quality of services and detecting or assessing the emotional state of children in care. The large number of studies published on SER reflects the demand for its use. The main concern of this thesis is the investigation of SER from pattern recognition and machine learning points of view. In particular, we aim to identify appropriate mathematical models of SER and examine the process of designing automatic emotion recognition schemes. There are major challenges to automatic SER, including ambiguity about the list and definition of emotions, the lack of agreement on a manageable set of uncorrelated speech-based emotion-relevant features, and the difficulty of collecting emotion-related datasets under natural circumstances. We shall begin by identifying appropriate sets of emotion-related features/attributes extractable from speech signals, as considered from psychological and computational points of view. We shall investigate the use of pattern-recognition approaches to remove redundancies and achieve a compact digital representation of the extracted data with minimal loss of information. The thesis will design new SER schemes or complement existing ones, and conduct large sets of experiments to empirically test their performance on different databases and to identify the advantages and shortcomings of using speech alone for emotion recognition. Existing SER studies seem to deal with the ambiguity and disagreement on a “limited” number of emotion-related features by expanding the list from the same speech signal source/sites and applying various feature selection procedures as a means of reducing redundancies. Attempts are made to discover features from speech that are more relevant to emotion. 
One of our investigations focuses on proposing a new set of features for SER, extracted from Linear Predictive (LP)-residual speech. We shall demonstrate the usefulness of the proposed relatively small set of features by testing the performance of an SER scheme that is based on fusing our set of features with the existing set of thousands of features using the common machine learning schemes of Support Vector Machine (SVM) and Artificial Neural Network (ANN). The challenge of the growing dimensionality of the SER feature space and its impact on increased model complexity is another major focus of our research project. By studying the pros and cons of the commonly used feature selection approaches, we argued in favour of meta-feature selection and developed various methods in this direction, not only to reduce dimension, but also to adapt and de-correlate emotional feature spaces for improved SER model recognition accuracy. We used Principal Component Analysis (PCA) and proposed Data Independent PCA (DIPCA) by training on independent emotional and non-emotional datasets. The DIPCA projections, especially when extracted from speech data coloured with different emotions or from neutral speech data, had comparable capability to the PCA in terms of SER performance. Another approach adopted in this thesis for dimension reduction is the use of Random Projection (RP) matrices, which are independent of the training data. We have shown that some versions of RP with an SVM classifier can offer an adaptation space for Speaker Independent SER that avoids over-fitting and hence improves recognition accuracy. Using PCA trained on a set of data, while testing on emotional data features, has significant implications for machine learning in general. The other major contribution of the thesis focuses on the classification aspects of SER. We investigate the drawbacks of the well-known SVM classifier when applied to data preprocessed by PCA and RP. 
We shall demonstrate the advantages of using the Linear Discriminant Classifier (LDC) instead, especially for PCA de-correlated meta-features. We initiated a variety of LDC-based ensemble classification schemes and tested their performance using a new form of bagging over different subsets of meta-features extracted by PCA, with encouraging results. The experiments were conducted on two benchmark datasets (Emo-Berlin and FAU-Aibo) and an in-house dataset in the Kurdish language. The recognition accuracies achieved are significantly higher than the state-of-the-art results on all datasets. The results, however, revealed a difficult challenge in the form of a persistent wide gap in accuracy across different datasets, which cannot be explained entirely by the differences between the natures of the datasets. We conducted various pilot studies based on visualizations of the confusion matrices for the “difficult” databases to build multi-level SER schemes. These studies provide initial evidence for the presence of more than one “emotion” in the same portion of speech. A possible solution may be to present recognition accuracy in a score-based measurement such as the spider chart. Such an approach may also reveal the presence of Doddington zoo phenomena in SER
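
The data-independent character of Random Projection mentioned in the abstract can be shown with a short sketch. It assumes a Gaussian projection matrix scaled by 1/sqrt(k); the thesis mentions "some versions of RP" without specifying them here, so this particular variant and the function names are assumptions.

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """Build a k x d Gaussian matrix without looking at any training data."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Map a d-dimensional feature vector to k dimensions."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]
```

Because the matrix depends only on the seed and the dimensions, the same projection can be applied to speakers or datasets never seen during training, unlike PCA, whose axes are fitted to a particular dataset.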

    Emotion recognition from speech: tools and challenges

    Human emotion recognition from speech is studied frequently for its importance in many applications, e.g. human-computer interaction. There is wide diversity and a lack of agreement about the basic emotions or emotion-related states on the one hand, and about where the emotion-related information lies in the speech signal on the other. This diversity motivates our investigations into extracting meta-features using the PCA approach, or using a non-adaptive random projection (RP), both of which significantly reduce the dimensionality of the large speech feature vectors that may contain a wide range of emotion-related information. Subsets of meta-features are fused to increase the performance of the recognition model, which adopts the score-based LDC classifier. We shall demonstrate that our scheme outperforms the state-of-the-art results when tested on non-prompted databases or acted databases (i.e. when subjects act specific emotions while uttering a sentence). However, the huge gap between the accuracy rates achieved on the different types of speech datasets raises questions about the way emotions modulate speech. In particular, we shall argue that emotion recognition from speech should not be dealt with as a classification problem. We shall demonstrate the presence of a spectrum of different emotions in the same speech portion, especially in the non-prompted datasets, which tend to be more “natural” than the acted datasets, where the subjects attempt to suppress all but one emotion. © (2015) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only