
    Towards On-line Domain-Independent Big Data Learning: Novel Theories and Applications

    Feature extraction is an extremely important pre-processing step for pattern recognition and machine learning problems. This thesis examines how one can best extract features from data in a fully online and purely adaptive manner. The solution to this problem is given for both labeled and unlabeled datasets by presenting a number of novel on-line learning approaches. Specifically, the differential equation method for solving the generalized eigenvalue problem is used to derive a number of novel machine learning and feature extraction algorithms. The incremental eigen-solution method is used to derive a novel incremental extension of linear discriminant analysis (LDA). Further, the proposed incremental version is combined with the extreme learning machine (ELM), in which the ELM is used as a preprocessor before learning. In this first key contribution, the dynamic random expansion characteristic of the ELM is combined with the proposed incremental LDA technique and shown to offer a significant improvement in maximizing the discrimination between points in two different classes, while minimizing the distance within each class, in comparison with other standard state-of-the-art incremental and batch techniques. In the second contribution, the differential equation method for solving the generalized eigenvalue problem is used to derive a novel, state-of-the-art, purely incremental version of the slow feature analysis (SFA) algorithm, termed the generalized eigenvalue based slow feature analysis (GENEIGSFA) technique. Further, the time-series expansions of an echo state network (ESN) and radial basis functions (RBF) are used as a pre-processor before learning. In addition, higher-order derivatives are used as a smoothing constraint on the output signal. Finally, an online extension of the generalized eigenvalue problem, derived from James Stone’s criterion, is tested, evaluated and compared with the standard batch version of the slow feature analysis technique to demonstrate its comparative effectiveness. In the third contribution, light-weight extensions of the statistical technique known as canonical correlation analysis (CCA), for both twinned and multiple data streams, are derived using the same method of solving the generalized eigenvalue problem. Further, the proposed method is enhanced by maximizing the covariance between data streams while simultaneously maximizing the rate of change of variances within each data stream. A recurrent set of connections, as used by the ESN, serves as a pre-processor between the inputs and the canonical projections in order to capture shared temporal information in two or more data streams. A solution to the problem of identifying a low-dimensional manifold in a high-dimensional data space is then presented in an incremental and adaptive manner. Finally, an online, locally optimized extension of Laplacian eigenmaps is derived, termed the generalized incremental Laplacian eigenmaps technique (GENILE). Apart from the benefit of its incremental nature, the proposed manifold-based dimensionality reduction technique is shown, in most cases, to produce projections that yield better classification accuracy than the standard batch versions of these techniques, on both artificial and real datasets.
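    As a point of reference for the methods above, the following is a minimal batch sketch of LDA posed as the generalized eigenvalue problem S_b w = λ S_w w, i.e. the formulation that the thesis's incremental algorithms extend. It is not the proposed incremental method; the function name and toy data are illustrative only.

```python
# Batch LDA via the generalized eigenvalue problem S_b w = lambda * S_w w.
# This is only the batch baseline formulation, not the thesis's incremental algorithm.
import numpy as np
from scipy.linalg import eigh

def lda_directions(X, y, n_components=1):
    """Return the top generalized eigenvectors of (S_b, S_w)."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_all).reshape(-1, 1)
        S_b += len(Xc) * (diff @ diff.T)
    # Solve S_b w = lambda * S_w w; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = eigh(S_b, S_w + 1e-6 * np.eye(d))  # small ridge for numerical stability
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:n_components]]

# Toy usage with two Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(2, 1, (100, 3))])
y = np.array([0] * 100 + [1] * 100)
W = lda_directions(X, y)
print(W.shape)  # (3, 1)
```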

    Industrial Robotics

    This book covers a wide range of topics relating to advanced industrial robotics, sensors, and automation technologies. Although highly technical and complex in nature, the papers presented in this book represent some of the latest cutting-edge technologies and advancements in industrial robotics technology. The book covers topics such as networking, properties of manipulators, forward and inverse robot arm kinematics, motion path-planning, machine vision, and many other practical topics too numerous to list here. The authors and editor of this book wish to inspire people, especially young ones, to get involved with robotics and mechatronics engineering technology and to develop new and exciting practical applications, perhaps using the ideas and concepts presented herein.
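    As a small illustration of one of the topics listed (forward robot arm kinematics), the following is a minimal sketch of the forward kinematics of a planar two-link arm; it is not drawn from any chapter of the book, and the link lengths and joint angles are arbitrary.

```python
# Forward kinematics of a planar two-link arm: end-effector position from joint angles.
# Illustrative sketch only; link lengths and angles are made up.
import math

def forward_kinematics_2link(theta1, theta2, l1=1.0, l2=0.8):
    """Return the (x, y) position of the end effector."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

print(forward_kinematics_2link(math.radians(30), math.radians(45)))
```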

    Machine Learning for Understanding Focal Epilepsy

    The study of neural dysfunctions requires strong prior knowledge of brain physiology combined with expertise in data analysis, signal processing, and machine learning. One of the unsolved issues regarding epilepsy is the localization of the pathological brain areas causing seizures. Nowadays the analysis of neural activity conducted with this goal still relies on visual inspection by clinicians and is therefore subject to human error, possibly leading to negative surgical outcomes. In the absence of any evidence from standard clinical tests, medical experts resort to invasive electrophysiological recordings, such as stereoelectroencephalography, to assess the pathological areas. These data are high dimensional, may suffer from spatial and temporal correlation, and are affected by high variability across the population. These aspects make any attempt at automation extremely challenging. In this context, this thesis tackles the problem of characterizing drug-resistant focal epilepsy. This work proposes methods to analyze intracranial electrophysiological recordings during the interictal state, leveraging the presurgical assessment of the pathological areas. The first contribution of the thesis is the design of a support tool for the identification of epileptic zones. This method relies on multiple decompositions of the signal and on similarity metrics. We built personalized models which share common usage of features across patients. The second main contribution aims at understanding whether there are particular frequency bands related to the epileptic areas and whether it is worthwhile to focus on shorter periods of time. Here we leverage the post-surgical outcome derived from the Engel classification. The last contribution focuses on the characterization of short patterns of activity at specific frequencies. We argue that this effort could be helpful in the clinical routine and at the same time provides useful insight into the understanding of focal epilepsy.
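    As an illustration of the kind of frequency-band analysis referred to in the second contribution, the following is a generic sketch that computes band-power features for a single intracranial EEG channel; it is not the thesis's decomposition or similarity pipeline, and the band edges and sampling rate are assumptions.

```python
# Band-power features for a single intracranial EEG channel using Welch's method.
# Generic sketch only; bands and sampling rate are assumptions, not the thesis's setup.
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 80)}

def band_powers(signal, fs=1000.0):
    """Average power spectral density within each canonical frequency band."""
    freqs, psd = welch(signal, fs=fs, nperseg=int(2 * fs))
    return {name: psd[(freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in BANDS.items()}

# Toy usage on synthetic noise standing in for one SEEG channel.
rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
print(band_powers(x))
```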

    Innovative signal processing and data mining techniques for aquatic animal health

    Problem: Aquatic animal health data are often stored in unstructured formats such as text and medical images, making large-scale analysis challenging due to the complexity of processing such data.
    Objectives: In this thesis, we aim to develop text mining, signal processing, image processing, and machine learning techniques to analyse unstructured data effectively. These methods enable the aggregation of information across large collections of unstructured aquatic animal health data.
    Methodology:
    • For text analysis, we have designed an ontology-based framework for extracting and storing information from aquatic animal post-mortem reports, with a focus on gross pathology reports. While we initially applied this framework to marine mammal stranding reports, it can be adapted to various species and report types.
    • For medical image analysis, we have created methods for identifying and analysing lesions in whole-slide images (WSIs) of Atlantic salmon gills. Our approach includes a novel feature extraction technique based on the empirical wavelet transform, and we enhance context-awareness by employing a variational autoencoder to identify regions of interest within histology images.
    Achievements: The research resulted in an ontology-based framework for systematic text extraction and storage from marine mammal gross pathology reports. We demonstrated the framework’s performance by using it to analyse bottlenose dolphin attacks on harbour porpoises. Additionally, we created methods for lesion detection in Atlantic salmon gill whole-slide images, incorporating the empirical wavelet transform, deep learning, and a variational autoencoder for context-awareness. These achievements collectively advance the analysis of unstructured aquatic animal health data, enabling more comprehensive and efficient data processing. At the time of writing, the project is the only one to apply data-driven approaches to marine mammal post-mortem reports and gill WSIs.
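    As a toy illustration of turning free-text post-mortem reports into structured records, the following sketch extracts a few fields with regular expressions; the thesis's actual framework is ontology-based, and the field names and patterns used here are assumptions.

```python
# Toy extraction of structured fields from free-text gross pathology report lines.
# Illustrative only: the thesis uses an ontology-based framework, not these regexes,
# and the field names and patterns below are assumptions.
import re

FIELD_PATTERNS = {
    "species": re.compile(r"Species:\s*(.+)", re.IGNORECASE),
    "cause_of_death": re.compile(r"Cause of death:\s*(.+)", re.IGNORECASE),
    "lesions": re.compile(r"Lesions?:\s*(.+)", re.IGNORECASE),
}

def extract_fields(report_text):
    """Return a dict of whichever known fields appear in the report text."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(report_text)
        if match:
            record[field] = match.group(1).strip()
    return record

example = """Species: Harbour porpoise
Cause of death: Bottlenose dolphin attack
Lesions: Multiple rib fractures, subcutaneous bruising"""
print(extract_fields(example))
```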

    Content-based Information Retrieval via Nearest Neighbor Search

    Content-based information retrieval (CBIR) has attracted significant interest in the past few years. Given a search query, the search engine compares the query with all the stored information in the database through nearest neighbor search and returns the most similar items. We contribute the following to CBIR research: first, Distance Metric Learning (DML) is studied to improve the retrieval accuracy of nearest neighbor search; additionally, Hash Function Learning (HFL) is considered to accelerate the retrieval process. On one hand, a new local metric learning framework is proposed: Reduced-Rank Local Metric Learning (R2LML). By considering a conical combination of Mahalanobis metrics, the proposed method is able to better capture information such as the data's similarity and location. A regularization term to suppress noise and avoid over-fitting is also incorporated into the formulation. Based on different methods for inferring the weights of the local metrics, we consider two frameworks: Transductive Reduced-Rank Local Metric Learning (T-R2LML), which utilizes transductive learning, and Efficient Reduced-Rank Local Metric Learning (E-R2LML), which employs a simpler and faster approximation. We also study the convergence properties of the proposed block coordinate descent algorithms for both frameworks. Extensive experiments show the superiority of our approaches. On the other hand, *Supervised Hash Learning (*SHL), which can be used in supervised, semi-supervised, and unsupervised learning scenarios, is proposed in the dissertation. By considering several codewords that can be learned from the data, the proposed method naturally reduces to several Support Vector Machine (SVM) problems. After providing an efficient training algorithm, we also study the theoretical generalization bound of the new hashing framework. In the final experiments, *SHL outperforms many other popular hash function learning methods. Additionally, in order to cope with large datasets, we also conducted experiments on big data using a parallel computing software package, namely LIBSKYLARK.
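    For context, the following is a minimal sketch of the retrieval step only: nearest-neighbour search under a Mahalanobis metric. It is not the R2LML learning algorithm; the metric M below is a placeholder that a metric-learning method would normally supply.

```python
# Nearest-neighbour retrieval under a Mahalanobis metric:
# d_M(x, q)^2 = (x - q)^T M (x - q).
# Retrieval step only; M is a placeholder, not a metric learned by R2LML.
import numpy as np

def mahalanobis_knn(query, database, M, k=5):
    """Return indices of the k database items closest to the query under metric M."""
    diffs = database - query                            # shape (n, d)
    dists = np.einsum("nd,de,ne->n", diffs, M, diffs)   # squared Mahalanobis distances
    return np.argsort(dists)[:k]

rng = np.random.default_rng(2)
db = rng.normal(size=(1000, 16))
q = rng.normal(size=16)
M = np.eye(16)  # identity metric = plain Euclidean; a learned M would replace this
print(mahalanobis_knn(q, db, M, k=3))
```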

    The Stylometric Processing of Sensory Open Source Data

    This research project’s end goal focuses on the Lone Wolf Terrorist. The project takes an exploratory approach to the self-radicalisation problem by creating a stylistic fingerprint of a person's personality, or self, from subtle characteristics hidden in a person's writing style. It separates the identity of one person from another based on their writing style. It also separates the writings of suicide attackers from ‘normal’ bloggers by critical slowing down, a dynamical property used to develop early warning signs of tipping points. It identifies changes in a person's moods, or shifts from one state to another, that might indicate a tipping point for self-radicalisation. Research into authorship identity using personality is a relatively new area in the field of neurolinguistics. There are very few methods that model how an individual's cognitive functions present themselves in writing. Here, we develop a novel algorithm, RPAS, which draws on cognitive functions such as aging, sensory processing, abstract or concrete thinking through referential activity (emotional experiences), and a person's internal gender for identity. We use well-known techniques such as Principal Component Analysis, Linear Discriminant Analysis, and the Vector Space Method to cluster multiple anonymously authored works. We then use a new approach, seriation with noise, to separate subtle features in individuals. We conduct time series analysis using modified variants of the 1-lag autocorrelation and the coefficient of skewness, two statistical metrics that change near a tipping point, to track serious life events in an individual through cognitive linguistic markers. In our journey of discovery, we uncover secrets about the Elizabethan playwrights hidden for over 400 years. We uncover markers for depression and anxiety in modern-day writers and identify linguistic cues for Alzheimer's disease much earlier than other studies using sensory processing. Applying these techniques to the Lone Wolf, we can separate the writing style used before their attacks from their other writing.
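    As an illustration of the two critical-slowing-down indicators mentioned above, the following is a generic sliding-window sketch of lag-1 autocorrelation and skewness; it does not implement the thesis's modified variants, and the window length and synthetic series are assumptions.

```python
# Sliding-window lag-1 autocorrelation and skewness, two indicators that tend to
# change near a tipping point. Generic sketch only, not the thesis's modified variants.
import numpy as np
from scipy.stats import skew

def tipping_point_indicators(series, window=50):
    """Return arrays of lag-1 autocorrelation and skewness for each sliding window."""
    ac1, sk = [], []
    for start in range(len(series) - window + 1):
        w = np.asarray(series[start:start + window], dtype=float)
        w = w - w.mean()
        denom = (w * w).sum()
        ac1.append((w[:-1] * w[1:]).sum() / denom if denom > 0 else 0.0)
        sk.append(skew(w))
    return np.array(ac1), np.array(sk)

# Toy usage on a synthetic series standing in for a per-document linguistic marker.
rng = np.random.default_rng(3)
x = np.cumsum(rng.normal(size=500)) * 0.01 + rng.normal(size=500)
ac, s = tipping_point_indicators(x)
print(ac[:3], s[:3])
```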

    Exploring and Evaluating the Scalability and Efficiency of Apache Spark using Educational Datasets

    Research into combining data mining and machine learning technology with web-based education systems (known as educational data mining, or EDM) is becoming imperative in order to enhance the quality of education by moving beyond traditional methods. With the worldwide growth of information and communication technology (ICT), data are becoming available in significantly large volumes, with high velocity and extensive variety. In this thesis, four popular data mining methods are applied on Apache Spark, using large volumes of data from online cognitive learning systems, to explore the scalability and efficiency of Spark. Various volumes of data are tested on Spark MLlib with different running configurations and parameter tunings. The thesis presents useful strategies for allocating computing resources and tuning parameters to take full advantage of Apache Spark's in-memory computing when conducting data mining and machine learning tasks. Moreover, it offers insights that education experts and data scientists can use to manage and improve the quality of education, as well as to analyze and discover hidden knowledge in the era of big data.
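    As a minimal illustration of the kind of experiment described, the following PySpark sketch sets two executor-level configuration options and fits a single MLlib classifier; the configuration values, file path, and column names are placeholders rather than the thesis's actual setup.

```python
# Minimal PySpark sketch: configure executor resources, load a dataset, fit one
# MLlib classifier. Config values, path, and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = (SparkSession.builder
         .appName("edm-scalability-sketch")
         .config("spark.executor.memory", "4g")   # example tuning knob
         .config("spark.executor.cores", "2")     # example tuning knob
         .getOrCreate())

df = spark.read.csv("students.csv", header=True, inferSchema=True)  # placeholder path
# Assumes a numeric "label" column and numeric feature columns (placeholder schema).
features = [c for c in df.columns if c != "label"]
assembler = VectorAssembler(inputCols=features, outputCol="features")
model = LogisticRegression(featuresCol="features", labelCol="label").fit(assembler.transform(df))
print(model.summary.accuracy)
spark.stop()
```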