69 research outputs found

    Informed pair selection for self-paced metric learning in Siamese neural networks.

    Siamese Neural Networks (SNNs) are deep metric learners that use paired instance comparisons to learn similarity. The neural feature maps learnt in this way provide useful representations for classification tasks. Learning in SNNs is not reliant on explicit class knowledge; instead, SNNs require knowledge about the relationship between pairs. Though often ignored, we have found that appropriate pair selection is crucial to maximising training efficiency, particularly in scenarios where examples are limited. In this paper, we study the role of informed pair selection and propose a two-phase strategy of exploration and exploitation. Random sampling provides the needed coverage for exploration, while areas of uncertainty modelled by neighbourhood properties of the pairs drive exploitation. We adopt curriculum learning to organise the ordering of pairs at training time, using similarity knowledge as a heuristic for pair sorting. The results of our experimental evaluation show that these strategies are key to optimising training.
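    The two-phase strategy described above can be sketched in a few lines of NumPy. The neighbourhood-disagreement heuristic used for exploitation and the easiest-first curriculum ordering below are illustrative assumptions rather than the authors' exact criteria.

```python
import numpy as np

def select_pairs(X, y, n_explore=200, n_exploit=200, k=5, seed=None):
    """Two-phase pair selection: random exploration plus exploitation of
    uncertain neighbourhoods, ordered easiest-first (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(X)

    # Phase 1: exploration -- uniformly random pairs give broad coverage.
    explore = rng.integers(0, n, size=(n_explore, 2))
    explore = explore[explore[:, 0] != explore[:, 1]]          # drop self-pairs

    # Phase 2: exploitation -- anchor on points whose k nearest neighbours
    # disagree on the class label, i.e. points in uncertain regions.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    knn = np.argsort(dists, axis=1)[:, :k]
    disagreement = (y[knn] != y[:, None]).mean(axis=1)
    anchors = np.argsort(-disagreement)[:n_exploit]
    partners = knn[anchors, rng.integers(0, k, size=len(anchors))]
    exploit = np.stack([anchors, partners], axis=1)

    pairs = np.concatenate([explore, exploit])
    labels = (y[pairs[:, 0]] == y[pairs[:, 1]]).astype(int)    # 1 = genuine pair

    # Curriculum heuristic: present the most similar (easiest) pairs first.
    order = np.argsort(dists[pairs[:, 0], pairs[:, 1]])
    return pairs[order], labels[order]
```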

    Similarity and explanation for dynamic telecommunication engineer support.

    Understanding similarity between different examples is a crucial aspect of Case-Based Reasoning (CBR) systems, but learning representations optimised for similarity comparisons can be difficult. CBR systems typically rely on separate algorithms to learn representations for cases and to compare those representations, as symbolised by the vocabulary and similarity knowledge containers respectively. Deep Metric Learners (DMLs) are a branch of deep learning architectures which learn a representation optimised for similarity comparison by leveraging direct case comparisons during training. In this thesis we explore the symbiotic relationship between these two fields of research. Firstly, we examine what can be learned from traditional CBR research to improve the training of DMLs through training strategies. We then examine how DMLs can fill the traditionally separate roles of the vocabulary and similarity knowledge containers. We perform this exploration on the real-world problem of experience transfer between experts and non-experts on service provisioning for telecommunication organisations. This problem also reveals the requirements for practical applications to be explainable to their intended user group. With that in mind, we conclude this thesis with work towards the development of an explanation framework designed to explain the recommendations of similarity-based classifiers. We support this practical contribution with an exploration of similarity knowledge to support autonomous measurement of explanation quality.

    Leveraging siamese networks for one-shot intrusion detection model

    The use of supervised Machine Learning (ML) to enhance Intrusion Detection Systems (IDS) has been the subject of significant research. Supervised ML is based upon learning by example, demanding significant volumes of representative instances for effective training and requiring the model to be retrained for every unseen cyber-attack class. However, retraining the models in-situ renders the network susceptible to attacks owing to the time-window required to acquire a sufficient volume of data. Although anomaly detection systems provide a coarse-grained defence against unseen attacks, these approaches are significantly less accurate and suffer from high false-positive rates. Here, a complementary approach referred to as “One-Shot Learning” is detailed, whereby a limited number of examples of a new attack-class is used to identify that class among many. The model can classify cyber-attack classes that were not seen during training, without retraining. A Siamese Network is trained to differentiate between classes based on pair similarities rather than individual features, allowing it to identify new and previously unseen attacks. The ability of a pre-trained model to classify new attack-classes from only one example is evaluated on three mainstream IDS datasets: CICIDS2017, NSL-KDD, and KDD Cup’99. The results confirm the adaptability of the model in classifying unseen attacks and the trade-off between performance and the need for distinctive class representations.
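    A minimal sketch of the one-shot classification step described above, assuming a Siamese branch embed() that has already been trained to map flow features into the learned metric space; the function name, the dictionary-based support set and the Euclidean distance are placeholder choices rather than the paper's implementation.

```python
import numpy as np

def one_shot_classify(query, support, embed):
    """Assign `query` to the attack class of its nearest support example.

    query   : 1-D feature vector of an unseen network flow
    support : dict mapping class name -> a single labelled example
    embed   : trained Siamese branch mapping features to the metric space
    """
    q = embed(query)
    # Distance in the learned embedding space stands in for similarity.
    scores = {cls: np.linalg.norm(q - embed(x)) for cls, x in support.items()}
    return min(scores, key=scores.get)

# One labelled example per new attack class is enough; the Siamese network
# itself is not retrained when new classes appear, e.g.
# one_shot_classify(new_flow, {"DoS": dos_example, "PortScan": scan_example}, embed)
```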

    Metric Selection and Metric Learning for Matching Tasks

    A quarter of a century after the world-wide web was born, we have grown accustomed to having easy access to a wealth of data sets and open-source software. The value of these resources is restricted if they are not properly integrated and maintained. A lot of this work boils down to matching: finding existing records about entities and enriching them with information from a new data source. In the realm of code this means integrating new code snippets into a code base while avoiding duplication. In this thesis, we address two such matching problems. First, we leverage the diverse and mature set of string similarity measures in an iterative semi-supervised learning approach to string matching. It is designed to query a user to make a sequence of decisions on specific cases of string matching. We show that we can find almost optimal solutions after only a small amount of such input. The low labelling complexity of our algorithm is due to addressing the cold-start problem that is inherent to Active Learning: we rank queries by variance before enough supervision information has arrived, and a self-regulating mechanism counteracts initial biases. Second, we address the matching of code fragments for deduplication. Programming code is not only a tool, but also a resource that itself demands maintenance. Code duplication is a frequent problem, arising especially from modern development practice. There are many reasons to detect and address code duplicates, for example to keep a clean and maintainable codebase. For more complex data structures such as these, string similarity measures are inadequate. In their stead, we study a modern supervised Metric Learning approach that models code similarity with Neural Networks. We find that representing the elementary tokens with a pretrained word embedding is the most important ingredient in such a model. Our results show both qualitatively (by visualization) that relatedness is modelled well by the embeddings, and quantitatively (by ablation) that the encoded information is useful for the downstream matching task. As a non-technical contribution, we unify the common challenges arising in supervised learning approaches to Record Matching, Code Clone Detection and generic Metric Learning tasks. We give a novel account of string similarity measures from a psychological standpoint, and point out and document one longstanding naming conflict in string similarity measures. Finally, we point out the overlap of latest research in Code Clone Detection with the field of Natural Language Processing.
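    The cold-start heuristic of ranking candidate queries by the variance of their similarity scores can be sketched as follows; the two measures used here (difflib's ratio and a character-bigram Jaccard) are stand-ins for the much larger set of string similarity measures considered in the thesis.

```python
from difflib import SequenceMatcher
import numpy as np

def bigram_jaccard(a, b):
    """Jaccard overlap of character bigrams."""
    A = {a[i:i + 2] for i in range(len(a) - 1)}
    B = {b[i:i + 2] for i in range(len(b) - 1)}
    return len(A & B) / len(A | B) if A | B else 1.0

MEASURES = [lambda a, b: SequenceMatcher(None, a, b).ratio(), bigram_jaccard]

def rank_queries(candidate_pairs):
    """Before any labels arrive, query the pairs on which the similarity
    measures disagree most, i.e. those with the highest score variance."""
    scores = np.array([[m(a, b) for m in MEASURES] for a, b in candidate_pairs])
    return [candidate_pairs[i] for i in np.argsort(-scores.var(axis=1))]

pairs = [("Jon Smith", "John Smith"), ("ACME Ltd", "acme limited"), ("foo", "bar")]
print(rank_queries(pairs))
```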

    Re-identifying people in the crowd

    Developing an automated surveillance system is of great interest for various reasons including forensic and security applications. In the case of a network of surveillance cameras with non-overlapping fields of view, person detection and tracking alone are insufficient to track a subject of interest across the network. In this case, instances of a person captured in one camera view need to be retrieved among a gallery of different people, in other camera views. This vision problem is commonly known as person re-identification (re-id). Cross-view instances of pedestrians exhibit varied levels of illumination, viewpoint, and pose variations which makes the problem very challenging. Despite recent progress towards improving accuracy, existing systems suffer from low applicability to real-world scenarios. This is mainly caused by the need for large amounts of annotated data from pairwise camera views to be available for training. Given the difficulty of obtaining such data and annotating it, this thesis aims to bring the person re-id problem a step closer to real-world deployment. In the first contribution, the single-shot protocol, where each individual is represented by a pair of images that need to be matched, is considered. Following the extensive annotation of four datasets for six attributes, an evaluation of the most widely used feature extraction schemes is conducted. The results reveal two high-performing descriptors among those evaluated, and show illumination variation to have the most impact on re-id accuracy. Motivated by the wide availability of videos from surveillance cameras and the additional visual and temporal information they provide, video-based person re-id is then investigated, and a supervised system is developed. This is achieved by improving and extending the best performing image-based person descriptor into three dimensions and combining it with distance metric learning. The system obtained achieves state-of-the-art results on two widely used datasets. Given the cost and difficulty of obtaining labelled data from pairwise cameras in a network to train the model, an unsupervised video-based person re-id method is also developed. It is based on a set-based distance measure that leverages rank vectors to estimate the similarity scores between person tracklets. The proposed system outperforms other unsupervised methods by a large margin on two datasets while competing with deep learning methods on another large-scale dataset.
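    A rough sketch of a set-based, rank-driven tracklet comparison in the spirit described above, assuming each tracklet is an array of per-frame descriptors; the average-rank aggregation used here is an illustrative interpretation, not the exact measure proposed in the thesis.

```python
import numpy as np

def rank_gallery(probe, gallery_tracklets):
    """Order gallery tracklets by a set-based, rank-driven distance to the probe.

    probe             : (n_frames, d) array of frame descriptors
    gallery_tracklets : list of (m_frames, d) arrays
    """
    # Pool all gallery frames so ranks are computed on a common reference set.
    gallery = np.concatenate(gallery_tracklets)
    owner = np.concatenate([np.full(len(t), i) for i, t in enumerate(gallery_tracklets)])

    # For every probe frame, rank every gallery frame by Euclidean distance.
    dists = np.linalg.norm(probe[:, None, :] - gallery[None, :, :], axis=-1)
    ranks = dists.argsort(axis=1).argsort(axis=1)   # one rank vector per probe frame

    # Set-based score: mean rank of each gallery tracklet's frames across
    # all probe frames; lower means more similar.
    scores = [ranks[:, owner == i].mean() for i in range(len(gallery_tracklets))]
    return np.argsort(scores)
```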

    Graph learning and its applications: a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, Massey University, Albany, Auckland, New Zealand

    Since graph features consider the correlations between two data points to provide high-order information, i.e., more complex correlations than the low-order information that considers correlations within individual data points, they have attracted much attention in real applications. The key to graph feature extraction is graph construction. Previous studies have demonstrated that the quality of the graph usually determines the effectiveness of the graph feature. However, the graph is usually constructed from the original data, which often contain noise and redundancy. To address this issue, graph learning is designed to iteratively adjust the graph and the model parameters, so as to improve the quality of the graph and output optimal model parameters. As a result, graph learning has become a very popular research topic in traditional machine learning and deep learning. Although previous graph learning methods have been applied in many fields by adding a graph regularization to the objective function, they still have some issues to be addressed. This thesis focuses on the study of graph learning, aiming to overcome the drawbacks of previous methods for different applications. We list the proposed methods as follows.
    • We propose a traditional graph learning method under supervised learning to consider the robustness and the interpretability of graph learning. Specifically, we propose utilizing self-paced learning to assign important samples large weights, conducting feature selection to remove redundant features, and learning a graph matrix from a low-dimensional representation of the original data to preserve the local structure of the data. As a consequence, both important samples and useful features are used to select support vectors in the SVM framework.
    • We propose a traditional graph learning method under semi-supervised learning to explore parameter-free fusion in graph learning. Specifically, we first employ the discrete wavelet transform and the Pearson correlation coefficient to obtain multiple fully connected Functional Connectivity brain Networks (FCNs) for every subject, and then learn a sparsely connected FCN for every subject. Finally, the ℓ1-SVM is employed to learn the important features and conduct disease diagnosis.
    • We propose a deep graph learning method to consider graph fusion in graph learning. Specifically, we first employ the Simple Linear Iterative Clustering (SLIC) method to obtain multi-scale features for every image, and then design a new graph fusion method to fine-tune the features of every scale. As a result, multi-scale feature fine-tuning, graph learning, and feature learning are embedded into a unified framework.
    All proposed methods are evaluated on real-world data sets by comparing to state-of-the-art methods. Experimental results demonstrate that our methods outperformed all comparison methods.
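    The common thread in the methods above, coupling a learned or constructed graph with model training through a regularization term, can be illustrated with a generic kNN graph and a Laplacian smoothness penalty; both choices below are textbook defaults rather than any of the thesis's specific formulations.

```python
import numpy as np

def knn_affinity(X, k=5, sigma=1.0):
    """Symmetric kNN affinity matrix with a Gaussian kernel."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.exp(-d ** 2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    keep = np.zeros_like(W, dtype=bool)
    np.put_along_axis(keep, d.argsort(axis=1)[:, 1:k + 1], True, axis=1)
    return W * (keep | keep.T)                    # keep kNN edges, symmetrised

def laplacian_penalty(F, W):
    """Graph regularizer tr(F^T L F): small when nodes joined by strong
    edges in W have similar rows in F (embeddings, predictions, ...)."""
    L = np.diag(W.sum(axis=1)) - W                # unnormalised graph Laplacian
    return float(np.trace(F.T @ L @ F))
```

    In a graph learning setting this penalty is added to the task objective with a trade-off weight, and W itself is updated alongside the model parameters rather than being fixed after construction.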

    Theory-Driven Analysis of Natural Language Processing Measures of Thought Disorder Using Generative Language Modeling

    BACKGROUND: Natural language processing (NLP) holds promise to transform psychiatric research and practice. A pertinent example is the success of NLP in the automatic detection of speech disorganization in formal thought disorder (FTD). However, we lack an understanding of precisely what common NLP metrics measure and how they relate to theoretical accounts of FTD. We propose tackling these questions by using deep generative language models to simulate FTD-like narratives by perturbing computational parameters instantiating theory-based mechanisms of FTD. METHODS: We simulated FTD-like narratives using Generative-Pretrained-Transformer-2 by either increasing word selection stochasticity or limiting the model's memory span. We then examined the sensitivity of common NLP measures of derailment (semantic distance between consecutive words or sentences) and tangentiality (how quickly meaning drifts away from the topic) in detecting and dissociating the two underlying impairments. RESULTS: Both parameters led to narratives characterized by greater semantic distance between consecutive sentences. Conversely, semantic distance between words was increased by increasing stochasticity, but decreased by limiting memory span. An NLP measure of tangentiality was uniquely predicted by limited memory span. The effects of limited memory span were nonmonotonic in that forgetting the global context resulted in sentences that were semantically closer to their local, intermediate context. Finally, different methods for encoding the meaning of sentences varied dramatically in performance. CONCLUSIONS: This work validates a simulation-based approach as a valuable tool for hypothesis generation and mechanistic analysis of NLP markers in psychiatry. To facilitate dissemination of this approach, we accompany the paper with a hands-on Python tutorial.
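    A minimal sketch of a derailment-style measure, computed as the mean cosine distance between consecutive sentence embeddings; the sentence-transformers encoder named here is a generic choice for illustration and not necessarily the encoder used in the paper or its accompanying tutorial.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # generic encoder, assumed installed

def derailment(sentences, model_name="all-MiniLM-L6-v2"):
    """Mean cosine distance between consecutive sentence embeddings:
    higher values indicate meaning drifting faster from sentence to sentence."""
    model = SentenceTransformer(model_name)
    emb = model.encode(sentences)                               # (n_sentences, d)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    consecutive_cos = (emb[:-1] * emb[1:]).sum(axis=1)
    return float(np.mean(1.0 - consecutive_cos))

narrative = [
    "I went to the shop to buy bread.",
    "The baker said the oven had just been fixed.",
    "Fixing things always reminds me of my uncle's boat.",
]
print(derailment(narrative))
```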