148 research outputs found

    Featured Anomaly Detection Methods and Applications

    Get PDF
    Anomaly detection is a fundamental research topic that has been widely investigated. From critical industrial systems, e.g., network intrusion detection systems, to people’s daily activities, e.g., mobile fraud detection, anomaly detection has become the very first vital resort to protect and secure public and personal properties. Although anomaly detection methods have been under consistent development over the years, the explosive growth of data volume and the continued dramatic variation of data patterns pose great challenges on the anomaly detection systems and are fuelling the great demand of introducing more intelligent anomaly detection methods with distinct characteristics to cope with various needs. To this end, this thesis starts with presenting a thorough review of existing anomaly detection strategies and methods. The advantageous and disadvantageous of the strategies and methods are elaborated. Afterward, four distinctive anomaly detection methods, especially for time series, are proposed in this work aiming at resolving specific needs of anomaly detection under different scenarios, e.g., enhanced accuracy, interpretable results, and self-evolving models. Experiments are presented and analysed to offer a better understanding of the performance of the methods and their distinct features. To be more specific, the abstracts of the key contents in this thesis are listed as follows: 1) Support Vector Data Description (SVDD) is investigated as a primary method to fulfill accurate anomaly detection. The applicability of SVDD over noisy time series datasets is carefully examined and it is demonstrated that relaxing the decision boundary of SVDD always results in better accuracy in network time series anomaly detection. Theoretical analysis of the parameter utilised in the model is also presented to ensure the validity of the relaxation of the decision boundary. 2) To support a clear explanation of the detected time series anomalies, i.e., anomaly interpretation, the periodic pattern of time series data is considered as the contextual information to be integrated into SVDD for anomaly detection. The formulation of SVDD with contextual information maintains multiple discriminants which help in distinguishing the root causes of the anomalies. 3) In an attempt to further analyse a dataset for anomaly detection and interpretation, Convex Hull Data Description (CHDD) is developed for realising one-class classification together with data clustering. CHDD approximates the convex hull of a given dataset with the extreme points which constitute a dictionary of data representatives. According to the dictionary, CHDD is capable of representing and clustering all the normal data instances so that anomaly detection is realised with certain interpretation. 4) Besides better anomaly detection accuracy and interpretability, better solutions for anomaly detection over streaming data with evolving patterns are also researched. Under the framework of Reinforcement Learning (RL), a time series anomaly detector that is consistently trained to cope with the evolving patterns is designed. Due to the fact that the anomaly detector is trained with labeled time series, it avoids the cumbersome work of threshold setting and the uncertain definitions of anomalies in time series anomaly detection tasks

    Incremental hashing with sample selection using dominant sets

    Get PDF
    In the world of big data, large amounts of images are available in social media, corporate and even personal collections. A collection may grow quickly as new images are generated at high rates. The new images may cause changes in the distribution of existing classes or the emergence of new classes, resulting in the collection being dynamic and having concept drift. For efficient image retrieval from an image collection using a query, a hash table consisting of a set of hash functions is needed to transform images into binaryhash codeswhich are used as the basis to find similar images to the query. If the image collection is dynamic, the hash table built at one time step may not work well at the next due to changes in the collection as a result of new images being added. Therefore, the hash table needs to be rebuilt or updated at successive time steps. Incremental hashing (ICH) is the first effective method to deal with the concept drift problem in image retrieval from dynamic collections. In ICH, a new hash table is learned based on newly emerging images only which represent data distribution of the current data environment. The new hash table is used to generate hash codes for all images including old and new ones. Due to the dynamic nature, new images of one class may not be similar to old images of the same class. In order to learn new hash table that preserves within-class similarity in both old and new images,incremental hashing with sample selection using dominant sets(ICHDS) is proposed in this paper, which selects representative samples from each class for training the new hash table. Experimental results show that ICHDS yields better retrieval performance than existing dynamic and static hashing methods

    Image authentication using LBP-based perceptual image hashing

    Get PDF
    Feature extraction is a main step in all perceptual image hashing schemes in which robust features will led to better results in perceptual robustness. Simplicity, discriminative power, computational efficiency and robustness to illumination changes are counted as distinguished properties of Local Binary Pattern features. In this paper, we investigate the use of local binary patterns for perceptual image hashing. In feature extraction, we propose to use both sign and magnitude information of local differences. So, the algorithm utilizes a combination of gradient-based and LBP-based descriptors for feature extraction. To provide security needs, two secret keys are incorporated in feature extraction and hash generation steps. Performance of the proposed hashing method is evaluated with an important application in perceptual image hashing scheme: image authentication. Experiments are conducted to show that the present method has acceptable robustness against perceptual content-preserving manipulations. Moreover, the proposed method has this capability to localize the tampering area, which is not possible in all hashing schemes

    NNMap: A method to construct a good embedding for nearest neighbor classification

    Get PDF
    a b s t r a c t This paper aims to deal with the practical shortages of nearest neighbor classifier. We define a quantitative criterion of embedding quality assessment for nearest neighbor classification, and present a method called NNMap to construct a good embedding. Furthermore, an efficient distance is obtained in the embedded vector space, which could speed up nearest neighbor classification. The quantitative quality criterion is proposed as a local structure descriptor of sample data distribution. Embedding quality corresponds to the quality of the local structure. In the framework of NNMap, one-dimension embeddings act as weak classifiers with pseudo-losses defined on the amount of the local structure preserved by the embedding. Based on this property, the NNMap method reduces the problem of embedding construction to the classical boosting problem. An important property of NNMap is that the embedding optimization criterion is appropriate for both vector and non-vector data, and equally valid in both metric and non-metric spaces. The effectiveness of the new method is demonstrated by experiments conducted on the MNIST handwritten dataset, the CMU PIE face images dataset and the datasets from UCI machine learning repository

    A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations

    Get PDF
    Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later it is manually translated to other languages. Despite the fact that there are large monolingual lexicons of biomedical terms, only a fraction of those term lexicons are translated to other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting most similar translation candidates for a biomedical term specified in one language (source) from another language (target). Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP)—a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target language using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. The proposed PVP method outperforms popular dimensionality reduction methods such as the singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task. Moreover, our experimental results covering several language pairs such as English–French, English–Spanish, English–Greek, and English–Japanese show that the proposed method outperforms several other feature projection methods in biomedical term translation prediction tasks

    Novel optimization schemes for service composition in the cloud using learning automata-based matrix factorization

    Get PDF
    A thesis submitted to the University of Bedfordshire, in partial fulfilment of the requirements for the degree of Doctor of PhilosophyService Oriented Computing (SOC) provides a framework for the realization of loosely couple service oriented applications (SOA). Web services are central to the concept of SOC. They possess several benefits which are useful to SOA e.g. encapsulation, loose coupling and reusability. Using web services, an application can embed its functionalities within the business process of other applications. This is made possible through web service composition. Web services are composed to provide more complex functions for a service consumer in the form of a value added composite service. Currently, research into how web services can be composed to yield QoS (Quality of Service) optimal composite service has gathered significant attention. However, the number and services has risen thereby increasing the number of possible service combinations and also amplifying the impact of network on composite service performance. QoS-based service composition in the cloud addresses two important sub-problems; Prediction of network performance between web service nodes in the cloud, and QoS-based web service composition. We model the former problem as a prediction problem while the later problem is modelled as an NP-Hard optimization problem due to its complex, constrained and multi-objective nature. This thesis contributed to the prediction problem by presenting a novel learning automata-based non-negative matrix factorization algorithm (LANMF) for estimating end-to-end network latency of a composition in the cloud. LANMF encodes each web service node as an automaton which allows v it to estimate its network coordinate in such a way that prediction error is minimized. Experiments indicate that LANMF is more accurate than current approaches. The thesis also contributed to the QoS-based service composition problem by proposing four evolutionary algorithms; a network-aware genetic algorithm (INSGA), a K-mean based genetic algorithm (KNSGA), a multi-population particle swarm optimization algorithm (NMPSO), and a non-dominated sort fruit fly algorithm (NFOA). The algorithms adopt different evolutionary strategies coupled with LANMF method to search for low latency and QoSoptimal solutions. They also employ a unique constraint handling method used to penalize solutions that violate user specified QoS constraints. Experiments demonstrate the efficiency and scalability of the algorithms in a large scale environment. Also the algorithms outperform other evolutionary algorithms in terms of optimality and calability. In addition, the thesis contributed to QoS-based web service composition in a dynamic environment. This is motivated by the ineffectiveness of the four proposed algorithms in a dynamically hanging QoS environment such as a real world scenario. Hence, we propose a new cellular automata-based genetic algorithm (CellGA) to address the issue. Experimental results show the effectiveness of CellGA in solving QoS-based service composition in dynamic QoS environment
    • …
    corecore