24 research outputs found

    An auditory classifier employing a wavelet neural network implemented in a digital design

    Get PDF
    This thesis addresses the problem of classifying audio as either voice or music. The goal was to solve this problem by means of digital logic circuit, capable of performing the classification in real time. Since digital audio is essentially a discrete non-periodic timeseries, it was necessary to extract features from the audio which are suitable for classification. The discrete wavelet transform combined with a feature extraction method was found to produce such features. The task of classifying these features was found to be best performed by an artificial neural network. Collectively known as a wavelet neural network, the digital logic design implementation of this architecture was effective in correctly identifying the test data sets. The wavelet neural network was first implemented as a software model, to develop the network architecture and parameters, and to determine ideal results. The unconstrained software simulation was capable of correctly classifying test data sets with greater than 90% accuracy. This model was not feasible as a digital logic design however, as the size of the implementation would have been prohibitive. The size of the resulting hardware model was constrained by reducing the widths of the data paths and storage registers. The hardware implementation of the wavelet processor consisted of a novel pipelined design with a novel data-flow control structure. The neural network training was performed entirely in software by way of a novel training algorithm, and the resulting weights were made to be available to be uploaded to the hardware model. The digital design of the wavelet neural network was modeled in VHDL and was synthesized with Synplicity Synplify, using Actel ProASICPlus APA600 synthesized library cells with a target clock frequency of 11.025 KHz, to match the sampling rate of the digital audio. The results of the synthesis indicated that the design could operate at 15.6 MHz, and required 96,265 logic cells. The resulting constrained wavelet neural network processor was capable of correctly classifying test data sets with greater than 70% accuracy. Additional modeling showed that with a reasonable increase in hardware size, greater than 86% accuracy is attainable. This thesis focused on classifying audio as either voice or music, and future research could readily extend this work to the problem of speaker recognition and multimedia indexing

    XVIII. Magyar Szåmítógépes Nyelvészeti Konferencia

    Get PDF

    Learning Sensory Representations with Minimal Supervision

    Get PDF

    Towards On-line Domain-Independent Big Data Learning: Novel Theories and Applications

    Get PDF
    Feature extraction is an extremely important pre-processing step to pattern recognition, and machine learning problems. This thesis highlights how one can best extract features from the data in an exhaustively online and purely adaptive manner. The solution to this problem is given for both labeled and unlabeled datasets, by presenting a number of novel on-line learning approaches. Specifically, the differential equation method for solving the generalized eigenvalue problem is used to derive a number of novel machine learning and feature extraction algorithms. The incremental eigen-solution method is used to derive a novel incremental extension of linear discriminant analysis (LDA). Further the proposed incremental version is combined with extreme learning machine (ELM) in which the ELM is used as a preprocessor before learning. In this first key contribution, the dynamic random expansion characteristic of ELM is combined with the proposed incremental LDA technique, and shown to offer a significant improvement in maximizing the discrimination between points in two different classes, while minimizing the distance within each class, in comparison with other standard state-of-the-art incremental and batch techniques. In the second contribution, the differential equation method for solving the generalized eigenvalue problem is used to derive a novel state-of-the-art purely incremental version of slow feature analysis (SLA) algorithm, termed the generalized eigenvalue based slow feature analysis (GENEIGSFA) technique. Further the time series expansion of echo state network (ESN) and radial basis functions (EBF) are used as a pre-processor before learning. In addition, the higher order derivatives are used as a smoothing constraint in the output signal. Finally, an online extension of the generalized eigenvalue problem, derived from James Stone’s criterion, is tested, evaluated and compared with the standard batch version of the slow feature analysis technique, to demonstrate its comparative effectiveness. In the third contribution, light-weight extensions of the statistical technique known as canonical correlation analysis (CCA) for both twinned and multiple data streams, are derived by using the same existing method of solving the generalized eigenvalue problem. Further the proposed method is enhanced by maximizing the covariance between data streams while simultaneously maximizing the rate of change of variances within each data stream. A recurrent set of connections used by ESN are used as a pre-processor between the inputs and the canonical projections in order to capture shared temporal information in two or more data streams. A solution to the problem of identifying a low dimensional manifold on a high dimensional dataspace is then presented in an incremental and adaptive manner. Finally, an online locally optimized extension of Laplacian Eigenmaps is derived termed the generalized incremental laplacian eigenmaps technique (GENILE). Apart from exploiting the benefit of the incremental nature of the proposed manifold based dimensionality reduction technique, most of the time the projections produced by this method are shown to produce a better classification accuracy in comparison with standard batch versions of these techniques - on both artificial and real datasets

    Exploiting behavioral biometrics for user security enhancements

    Get PDF
    As online business has been very popular in the past decade, the tasks of providing user authentication and verification have become more important than before to protect user sensitive information from malicious hands. The most common approach to user authentication and verification is the use of password. However, the dilemma users facing in traditional passwords becomes more and more evident: users tend to choose easy-to-remember passwords, which are often weak passwords that are easy to crack. Meanwhile, behavioral biometrics have promising potentials in meeting both security and usability demands, since they authenticate users by who you are , instead of what you have . In this dissertation, we first develop two such user verification applications based on behavioral biometrics: the first one is via mouse movements, and the second via tapping behaviors on smartphones; then we focus on modeling user web browsing behaviors by Fitts\u27 Law.;Specifically, we develop a user verification system by exploiting the uniqueness of people\u27s mouse movements. The key feature of our system lies in using much more fine-grained (point-by-point) angle-based metrics of mouse movements for user verification. These new metrics are relatively unique from person to person and independent of the computing platform. We conduct a series of experiments to show that the proposed system can verify a user in an accurate and timely manner, and induced system overhead is minor. Similar to mouse movements, the tapping behaviors of smartphone users on touchscreen also vary from person to person. We propose a non-intrusive user verification mechanism to substantiate whether an authenticating user is the true owner of the smartphone or an impostor who happens to know the passcode. The effectiveness of the proposed approach is validated through real experiments. to further understand user pointing behaviors, we attempt to stress-test Fitts\u27 law in the wild , namely, under natural web browsing environments, instead of restricted laboratory settings in previous studies. Our analysis shows that, while the averaged pointing times follow Fitts\u27 law very well, there is considerable deviations from Fitts\u27 law. We observe that, in natural browsing, a fast movement has a different error model from the other two movements. Therefore, a complete profiling on user pointing performance should be done in more details, for example, constructing different error models for slow and fast movements. as future works, we plan to exploit multiple-finger tappings for smartphone user verification, and evaluate user privacy issues in Amazon wish list

    Alzheimer’s Dementia Recognition Through Spontaneous Speech

    Get PDF

    Advances and Applications of Dezert-Smarandache Theory (DSmT) for Information Fusion (Collected Works), Vol. 4

    Get PDF
    The fourth volume on Advances and Applications of Dezert-Smarandache Theory (DSmT) for information fusion collects theoretical and applied contributions of researchers working in different fields of applications and in mathematics. The contributions (see List of Articles published in this book, at the end of the volume) have been published or presented after disseminating the third volume (2009, http://fs.unm.edu/DSmT-book3.pdf) in international conferences, seminars, workshops and journals. First Part of this book presents the theoretical advancement of DSmT, dealing with Belief functions, conditioning and deconditioning, Analytic Hierarchy Process, Decision Making, Multi-Criteria, evidence theory, combination rule, evidence distance, conflicting belief, sources of evidences with different importance and reliabilities, importance of sources, pignistic probability transformation, Qualitative reasoning under uncertainty, Imprecise belief structures, 2-Tuple linguistic label, Electre Tri Method, hierarchical proportional redistribution, basic belief assignment, subjective probability measure, Smarandache codification, neutrosophic logic, Evidence theory, outranking methods, Dempster-Shafer Theory, Bayes fusion rule, frequentist probability, mean square error, controlling factor, optimal assignment solution, data association, Transferable Belief Model, and others. More applications of DSmT have emerged in the past years since the apparition of the third book of DSmT 2009. Subsequently, the second part of this volume is about applications of DSmT in correlation with Electronic Support Measures, belief function, sensor networks, Ground Moving Target and Multiple target tracking, Vehicle-Born Improvised Explosive Device, Belief Interacting Multiple Model filter, seismic and acoustic sensor, Support Vector Machines, Alarm classification, ability of human visual system, Uncertainty Representation and Reasoning Evaluation Framework, Threat Assessment, Handwritten Signature Verification, Automatic Aircraft Recognition, Dynamic Data-Driven Application System, adjustment of secure communication trust analysis, and so on. Finally, the third part presents a List of References related with DSmT published or presented along the years since its inception in 2004, chronologically ordered

    Machine learning for automatic analysis of affective behaviour

    Get PDF
    The automated analysis of affect has been gaining rapidly increasing attention by researchers over the past two decades, as it constitutes a fundamental step towards achieving next-generation computing technologies and integrating them into everyday life (e.g. via affect-aware, user-adaptive interfaces, medical imaging, health assessment, ambient intelligence etc.). The work presented in this thesis focuses on several fundamental problems manifesting in the course towards the achievement of reliable, accurate and robust affect sensing systems. In more detail, the motivation behind this work lies in recent developments in the field, namely (i) the creation of large, audiovisual databases for affect analysis in the so-called ''Big-Data`` era, along with (ii) the need to deploy systems under demanding, real-world conditions. These developments led to the requirement for the analysis of emotion expressions continuously in time, instead of merely processing static images, thus unveiling the wide range of temporal dynamics related to human behaviour to researchers. The latter entails another deviation from the traditional line of research in the field: instead of focusing on predicting posed, discrete basic emotions (happiness, surprise etc.), it became necessary to focus on spontaneous, naturalistic expressions captured under settings more proximal to real-world conditions, utilising more expressive emotion descriptions than a set of discrete labels. To this end, the main motivation of this thesis is to deal with challenges arising from the adoption of continuous dimensional emotion descriptions under naturalistic scenarios, considered to capture a much wider spectrum of expressive variability than basic emotions, and most importantly model emotional states which are commonly expressed by humans in their everyday life. In the first part of this thesis, we attempt to demystify the quite unexplored problem of predicting continuous emotional dimensions. This work is amongst the first to explore the problem of predicting emotion dimensions via multi-modal fusion, utilising facial expressions, auditory cues and shoulder gestures. A major contribution of the work presented in this thesis lies in proposing the utilisation of various relationships exhibited by emotion dimensions in order to improve the prediction accuracy of machine learning methods - an idea which has been taken on by other researchers in the field since. In order to experimentally evaluate this, we extend methods such as the Long Short-Term Memory Neural Networks (LSTM), the Relevance Vector Machine (RVM) and Canonical Correlation Analysis (CCA) in order to exploit output relationships in learning. As it is shown, this increases the accuracy of machine learning models applied to this task. The annotation of continuous dimensional emotions is a tedious task, highly prone to the influence of various types of noise. Performed real-time by several annotators (usually experts), the annotation process can be heavily biased by factors such as subjective interpretations of the emotional states observed, the inherent ambiguity of labels related to human behaviour, the varying reaction lags exhibited by each annotator as well as other factors such as input device noise and annotation errors. In effect, the annotations manifest a strong spatio-temporal annotator-specific bias. Failing to properly deal with annotation bias and noise leads to an inaccurate ground truth, and therefore to ill-generalisable machine learning models. This deems the proper fusion of multiple annotations, and the inference of a clean, corrected version of the ``ground truth'' as one of the most significant challenges in the area. A highly important contribution of this thesis lies in the introduction of Dynamic Probabilistic Canonical Correlation Analysis (DPCCA), a method aimed at fusing noisy continuous annotations. By adopting a private-shared space model, we isolate the individual characteristics that are annotator-specific and not shared, while most importantly we model the common, underlying annotation which is shared by annotators (i.e., the derived ground truth). By further learning temporal dynamics and incorporating a time-warping process, we are able to derive a clean version of the ground truth given multiple annotations, eliminating temporal discrepancies and other nuisances. The integration of the temporal alignment process within the proposed private-shared space model deems DPCCA suitable for the problem of temporally aligning human behaviour; that is, given temporally unsynchronised sequences (e.g., videos of two persons smiling), the goal is to generate the temporally synchronised sequences (e.g., the smile apex should co-occur in the videos). Temporal alignment is an important problem for many applications where multiple datasets need to be aligned in time. Furthermore, it is particularly suitable for the analysis of facial expressions, where the activation of facial muscles (Action Units) typically follows a set of predefined temporal phases. A highly challenging scenario is when the observations are perturbed by gross, non-Gaussian noise (e.g., occlusions), as is often the case when analysing data acquired under real-world conditions. To account for non-Gaussian noise, a robust variant of Canonical Correlation Analysis (RCCA) for robust fusion and temporal alignment is proposed. The model captures the shared, low-rank subspace of the observations, isolating the gross noise in a sparse noise term. RCCA is amongst the first robust variants of CCA proposed in literature, and as we show in related experiments outperforms other, state-of-the-art methods for related tasks such as the fusion of multiple modalities under gross noise. Beyond private-shared space models, Component Analysis (CA) is an integral component of most computer vision systems, particularly in terms of reducing the usually high-dimensional input spaces in a meaningful manner pertaining to the task-at-hand (e.g., prediction, clustering). A final, significant contribution of this thesis lies in proposing the first unifying framework for probabilistic component analysis. The proposed framework covers most well-known CA methods, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Locality Preserving Projections (LPP) and Slow Feature Analysis (SFA), providing further theoretical insights into the workings of CA. Moreover, the proposed framework is highly flexible, enabling novel CA methods to be generated by simply manipulating the connectivity of latent variables (i.e. the latent neighbourhood). As shown experimentally, methods derived via the proposed framework outperform other equivalents in several problems related to affect sensing and facial expression analysis, while providing advantages such as reduced complexity and explicit variance modelling.Open Acces

    A survey of the application of soft computing to investment and financial trading

    Get PDF
    corecore