28 research outputs found

    Word alignment and smoothing methods in statistical machine translation: Noise, prior knowledge and overfitting

    This thesis discusses how to incorporate linguistic knowledge into an SMT system. Although one important category of linguistic knowledge is that obtained by a constituent / dependency parser, a POS / super tagger, and a morphological analyser, linguistic knowledge here covers a wider range than this: Multi-Word Expressions, Out-Of-Vocabulary words, paraphrases, lexical semantics (or non-literal translations), named-entities, coreferences, and transliterations. The first discussion is about word alignment, where we propose an MWE-sensitive word aligner. The second discussion is about smoothing methods for a language model and a translation model, where we propose a hierarchical Pitman-Yor process-based smoothing method. The common ground for these discussions is the examination of three exceptional cases from real-world data: the presence of noise, the availability of prior knowledge, and the problem of overfitting. A notable characteristic of this design is the careful use of (Bayesian) priors so that it can capture both frequent and linguistically important phenomena. This can be considered one example of how to address the problems of statistical models, which often aim to learn from frequent examples only and often overlook less frequent but linguistically important phenomena.

    End-to-end anomaly detection in stream data

    Nowadays, huge volumes of data are generated with increasing velocity through various systems, applications, and activities. This increases the demand for stream and time series analysis to react to changing conditions in real time, for enhanced efficiency and quality of service delivery as well as improved safety and security in the private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting an appropriate model that fits the observed data well and also carries over to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with various challenges, such as complex latent patterns, concept drift, and overfitting, that may mislead the model and cause a high false alarm rate. Handling these challenges leads advanced anomaly detection methods to develop sophisticated decision logic, which turns them into mysterious and inexplicable black boxes. Contrary to this trend, end-users expect transparency and verifiability in order to trust a model and the outcomes it produces. Also, pointing users to the most anomalous/malicious areas of a time series and to causal features could save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through the three essential phases of behavior prediction, inference, and interpretation. The first step focuses on devising a time series model that achieves high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that use the related contexts to reclassify observations and post-prune unjustified events.
Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its generated results, based on concepts understandable to a human. The provided insight can pinpoint the anomalous regions of a time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation to support our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains such as cybersecurity and health.

    Impact Resistance of Hybrid Metal-Organic Frameworks/Carbon Fibers Composites

    The increased use of carbon fiber-reinforced polymer (CFRP) composites in the aerospace industry has created a need to improve the properties and capabilities of these composites by adding nano-reinforcements to the carbon fibers; the results are also called hybrid fiber-reinforced polymer composites. In this study, the energy absorption due to low-speed impact is tested and simulated in four configurations of CFRPs, all using the same [0/90]S layout. The carbon fiber configurations used in this study are de-sized, acid-activated, metal-organic frameworks (MOF), and carbon nanotubes (CNTs). Nickel(II) nitrate, methylimidazole, and methanol were used to grow the MOF nano-reinforcement on the carbon fibers; CNTs, on the other hand, were grown by reducing the MOF on the carbon fibers and using ethylene (C2H4), nitrogen (N2), and hydrogen (H2). To evaluate the composites’ mechanical properties, tensile tests and impact tests were performed; furthermore, a dynamic mechanical analysis (DMA) was performed to assess the dynamic properties of the manufactured composites. Lastly, an impact simulation was performed in LS-Dyna using the properties obtained from the mechanical testing. The results suggest that an appropriate combination and recipe of MOF growth could potentially increase the energy absorption of carbon fiber-reinforced polymers.

    Towards sustainability in municipal solid waste management in South Africa : a survey of challenges and prospects

    Abstract: In most developing countries, the huge amount of unmanaged municipal solid waste and the inefficiency of the current waste management system have had an unprecedented effect on human health and the quality of the environment. The drive towards sustainability in solid waste management in South Africa has led to the promulgation of several laws and policies directed towards increasing the efficiency of solid waste management strategies. However, despite the progress in South Africa’s waste management systems over the years, they still constantly face challenges and shortcomings. To achieve sustainable development through the transition from a linear economic model to a circular economy, there is a need to revamp the waste management sector. This study presents a survey of the key physical elements of integrated waste management in South Africa. The study further discusses the challenges, with major emphasis on the future directions of integrated waste management. Waste management decisions are data-driven decisions. This study identifies the lack of accurate and reliable waste-related data as one of the major factors impeding fast-track growth towards sustainable waste management in South Africa. A data-mining approach that emphasizes intelligent modeling of waste management systems is recommended to support the national waste database, which will aid waste management decisions and optimize waste management facilities and investments. Sustainability in waste management in South Africa requires multi-sector intervention and involvement to stimulate sustainable development in waste management.

    Predicting remaining useful life of rotating machinery based artificial neural network

    Accurate remaining useful life (RUL) prediction of machines is important for condition-based maintenance (CBM) to improve reliability and reduce the cost of maintenance. This paper proposes an artificial neural network (ANN) as a method to improve the accuracy of RUL prediction for bearing failure. For this purpose, the ANN model takes as input time and the Weibull hazard rates fitted to the root mean square (RMS) and kurtosis measurements at the present and previous points. The normalized life percentage is selected as the output. In this way, the noise of the degradation signal from a target bearing can be minimized and the accuracy of the prognosis system can be improved. The ANN RUL prediction uses a FeedForward Neural Network (FFNN) with the Levenberg-Marquardt training algorithm. The results show that the proposed method achieves better performance in predicting bearing failure.
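
The input features named in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of non-excess (raw) kurtosis, and the two-parameter Weibull form with `shape` and `scale` are all assumptions made here.

```python
import math

def rms(signal):
    """Root mean square of a vibration window."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def kurtosis(signal):
    """Raw (non-excess) kurtosis: fourth central moment over squared variance."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((x - mean) ** 2 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    return m4 / (m2 ** 2)

def weibull_hazard(t, shape, scale):
    """Two-parameter Weibull hazard rate: h(t) = (shape/scale) * (t/scale)**(shape - 1).

    The shape and scale values are assumed to come from a separate fit to the
    RMS or kurtosis trend; that fitting step is not shown here.
    """
    return (shape / scale) * (t / scale) ** (shape - 1)
```

A feature vector for the network could then combine the current time with the hazard values at the present and previous points; how the original work arranges these inputs is not stated in the abstract.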

    Featured Anomaly Detection Methods and Applications

    Anomaly detection is a fundamental research topic that has been widely investigated. From critical industrial systems, e.g., network intrusion detection systems, to people’s daily activities, e.g., mobile fraud detection, anomaly detection has become the first vital means of protecting and securing public and personal property. Although anomaly detection methods have been under continuous development over the years, the explosive growth of data volume and the continued dramatic variation of data patterns pose great challenges to anomaly detection systems and are fuelling demand for more intelligent anomaly detection methods with distinct characteristics to cope with various needs. To this end, this thesis starts by presenting a thorough review of existing anomaly detection strategies and methods. The advantages and disadvantages of the strategies and methods are elaborated. Afterward, four distinctive anomaly detection methods, especially for time series, are proposed, aiming at resolving specific needs of anomaly detection under different scenarios, e.g., enhanced accuracy, interpretable results, and self-evolving models. Experiments are presented and analysed to offer a better understanding of the performance of the methods and their distinct features. More specifically, the key contents of this thesis are as follows: 1) Support Vector Data Description (SVDD) is investigated as a primary method for accurate anomaly detection. The applicability of SVDD to noisy time series datasets is carefully examined and it is demonstrated that relaxing the decision boundary of SVDD always results in better accuracy in network time series anomaly detection. A theoretical analysis of the parameter utilised in the model is also presented to ensure the validity of the relaxation of the decision boundary.
2) To support a clear explanation of the detected time series anomalies, i.e., anomaly interpretation, the periodic pattern of time series data is treated as contextual information to be integrated into SVDD for anomaly detection. The formulation of SVDD with contextual information maintains multiple discriminants, which help in distinguishing the root causes of the anomalies. 3) In an attempt to further analyse a dataset for anomaly detection and interpretation, Convex Hull Data Description (CHDD) is developed for realising one-class classification together with data clustering. CHDD approximates the convex hull of a given dataset with the extreme points, which constitute a dictionary of data representatives. Using this dictionary, CHDD is capable of representing and clustering all the normal data instances so that anomaly detection is realised with a degree of interpretability. 4) Besides better anomaly detection accuracy and interpretability, better solutions for anomaly detection over streaming data with evolving patterns are also researched. Under the framework of Reinforcement Learning (RL), a time series anomaly detector that is continually trained to cope with the evolving patterns is designed. Because the anomaly detector is trained with labeled time series, it avoids the cumbersome work of threshold setting and the uncertain definitions of anomalies in time series anomaly detection tasks.
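
The idea of a data description with a relaxed decision boundary can be illustrated with a deliberately simplified stand-in. The sketch below is not SVDD: it uses the centroid of the training data and a distance quantile instead of solving the SVDD optimisation, and it works in input space without a kernel. The names `fit_sphere`, `is_anomaly`, `quantile`, and `relax` are hypothetical.

```python
import math

def fit_sphere(train, quantile=0.95):
    """Describe normal data by a centre (the centroid) and a radius
    (a quantile of the training distances to that centre)."""
    dim = len(train[0])
    center = [sum(p[i] for p in train) / len(train) for i in range(dim)]
    dists = sorted(math.dist(p, center) for p in train)
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return center, dists[idx]

def is_anomaly(point, center, radius, relax=0.0):
    """Flag a point lying outside the sphere; relax > 0 enlarges the boundary."""
    return math.dist(point, center) > radius * (1.0 + relax)
```

With `relax=0.0` a point just outside the fitted radius is flagged; a positive `relax` widens the accepted region, which is the kind of boundary relaxation the abstract refers to, here in a much cruder form.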

    Movement Analytics: Current Status, Application to Manufacturing, and Future Prospects from an AI Perspective

    Data-driven decision making is becoming an integral part of manufacturing companies. Data is collected and commonly used to improve efficiency and produce high-quality items for customers. IoT-based and other forms of object tracking are an emerging tool for collecting movement data of objects/entities (e.g. human workers, moving vehicles, trolleys) over space and time. Movement data can provide valuable insights, such as process bottlenecks, resource utilization, and effective working time, that can be used for decision making and improving efficiency. Turning movement data into valuable information for industrial management and decision making requires analysis methods. We refer to this process as movement analytics. The purpose of this document is to review the current state of work on movement analytics, both in manufacturing and more broadly. We survey relevant work from both a theoretical perspective and an application perspective. From the theoretical perspective, we put an emphasis on useful methods from two research areas: machine learning and logic-based knowledge representation. We also review their combinations in view of movement analytics, and we discuss promising areas for future development and application. Furthermore, we touch on constraint optimization. From an application perspective, we review applications of these methods to movement analytics in a general sense and across various industries. We also describe currently available commercial off-the-shelf products for tracking in manufacturing, and we give an overview of the main concepts of digital twins and their applications.

    Unsupervised Anomaly Detection in Unstructured Log-Data for Root-Cause-Analysis

    Anomaly detection has attracted the attention of researchers from a variety of backgrounds as it finds numerous applications in industry. As a subfield, fault detection plays a crucial role in growing telecommunications networks, since failures lead to dissatisfaction and hence financial drawbacks. It aims at identifying unusual events in the system log files. System logs are messages from the elements of the network that report their status. The main challenge is to cope with the rate at which the data volume grows. Traditional methods such as expert systems are no longer practical, making machine learning approaches more valuable. In this thesis work, unsupervised anomaly (fault) detection in unstructured system logs is investigated. The effect of various feature extraction methods is investigated in terms of the gain they provide. Also, the baseline dimensionality reduction method, Principal Component Analysis (PCA), and its effects are presented. Additionally, autoencoders are studied as an alternative dimensionality reduction technique. Four different methods based on statistics and clustering, as well as a framework to clean datasets from anomalies, are discussed. A high detection (classification) rate with 99.69% precision and a 0.07% false alarm rate is achieved on one of the datasets, while similar results with variations in recall have been achieved on the other dataset. The studies show that dimensionality reduction can greatly improve the performance of the classifiers used and reduce the computational complexity of anomaly detection.
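
For readers unfamiliar with the reported metrics, the following sketch shows how precision, recall, and false alarm rate are conventionally computed from confusion-matrix counts. It assumes the standard definitions (false alarm rate as false positives over all true negatives plus false positives); the abstract does not state which definitions the thesis uses, and the counts in the usage note are made up for illustration.

```python
def precision(tp, fp):
    """Fraction of flagged events that are real anomalies."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of real anomalies that were flagged."""
    return tp / (tp + fn)

def false_alarm_rate(fp, tn):
    """Fraction of normal events that were wrongly flagged."""
    return fp / (fp + tn)
```

For example, with hypothetical counts of 90 true positives, 10 false positives, and 990 true negatives, `precision(90, 10)` is 0.9 and `false_alarm_rate(10, 990)` is 0.01.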

    Detection of unusual fish trajectories from underwater videos

    Fish behaviour analysis is a fundamental research area in marine ecology as it is helpful for detecting environmental changes by observing unusual fish patterns or new fish behaviours. The traditional way of analysing fish behaviour is by visual inspection using human observers, which is very time consuming and also limits the amount of data that can be processed. Therefore, there is a need for automatic algorithms to identify fish behaviours by using computer vision and machine learning techniques. The aim of this thesis is to help marine biologists with their work. We focus on behaviour understanding and analysis of detected and tracked fish with unusual behaviour detection approaches. Normal fish trajectories exhibit frequently observed behaviours while unusual trajectories are outliers or rare trajectories. This thesis proposes 3 approaches to detecting unusual trajectories: i) a filtering mechanism for normal fish trajectories, ii) an unusual fish trajectory classification method using clustered and labelled data and iii) an unusual fish trajectory classification approach using a clustering based hierarchical decomposition. The rule based trajectory filtering mechanism is proposed to remove normal fish trajectories which potentially helps to increase the accuracy of the unusual fish behaviour detection system. The aim is to reject normal fish trajectories as much as possible while not rejecting unusual fish trajectories. The results show that this method successfully filters out normal trajectories with a low false negative rate. This method is useful to assist building a ground truth data set from a very large fish trajectory repository, especially when the amount of normal fish trajectories greatly dominates the unusual fish trajectories. Moreover, it successfully distinguishes true fish trajectories from false fish trajectories which result from errors by the fish detection and tracking algorithms. 
A key contribution of this thesis is the proposed flat classifier, which uses an outlier detection method based on cluster cardinalities and a distance function to detect unusual fish trajectories. Clustered and labelled data are used to select the feature sets that perform best on a training set. To describe fish trajectories, 10 groups of trajectory descriptions are proposed that were not previously used for fish behaviour analysis. The proposed flat classifier improved the performance of unusual fish detection compared to the filtering approach. The performance of the flat classifier is further improved by integrating it into a hierarchical decomposition. This hierarchical decomposition method selects more specific features for different trajectory clusters, which is useful considering the variety of trajectories. Significantly improved results were obtained using this hierarchical decomposition in comparison to the flat classifier. This hierarchical framework is also applied to the classification of more general imbalanced data sets, which is a key current topic in machine learning. The experiments showed that the proposed hierarchical decomposition method is significantly better than state-of-the-art classification methods, other outlier detection methods, and unusual trajectory detection methods. Furthermore, it is successful at classifying imbalanced data sets even when the majority and minority classes contain varieties and the classes overlap, which is frequently seen in real-world applications. Finally, we explored the benefits of active learning in the context of the hierarchical decomposition method, where active learning query strategies choose the most informative training data. A substantial performance gain is possible using less labelled training data compared to learning from larger labelled data sets. Additionally, active learning with feature selection is investigated.
The results show that feature selection has a positive effect on the performance of active learning. However, we show that random selection can be as effective as popular active learning query strategies when combined with active learning and feature selection, especially for imbalanced data set classification.
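
The outlier rule behind the flat classifier, as described in this abstract, combines cluster cardinality with a distance function. A minimal sketch of that idea follows. It assumes cluster labels and centres are already available, and the thresholds `min_size` and `max_dist` as well as the function name are hypothetical; the thesis's actual features, distance function, and thresholds are not given in the abstract.

```python
import math

def flag_unusual(points, labels, centers, min_size, max_dist):
    """Flag a sample as unusual if it sits in a small cluster
    or lies far from the centre of its own cluster."""
    # Count how many samples fall in each cluster (the cluster cardinality).
    sizes = {}
    for lab in labels:
        sizes[lab] = sizes.get(lab, 0) + 1

    flags = []
    for p, lab in zip(points, labels):
        in_small_cluster = sizes[lab] < min_size
        far_from_center = math.dist(p, centers[lab]) > max_dist
        flags.append(in_small_cluster or far_from_center)
    return flags
```

Samples in a singleton cluster are flagged by the cardinality test, while a sample assigned to a large cluster but lying far from its centre is caught by the distance test.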

    Data Substantiation in Mobility

    The world is embracing the presence of connected autonomous vehicles, which are expected to play a major role in the future of intelligent transport systems. Given such connectivity, vehicles in the networks are vulnerable to making incorrect decisions due to anomalous data. No sophisticated attacks are required; a single vehicle reporting anomalous speeds would be enough to disrupt the entire traffic flow. Detection of such anomalies is vital to ensure the security of a vehicular network. This thesis proposes the use of traffic flow theory for anomalous data detection in vehicular networks, by evaluating the consistency of microscopic parameters derived from traffic flow theory with macroscopic views of traffic under different traffic conditions. Though little attention has been given to using traffic flow properties to identify anomalous basic safety message data, the fundamental nature of traffic flow properties makes them a robust assessment tool. The aim of this thesis is to develop a robust data substantiation framework for vehicular networks using traffic flow fundamentals. The aim is fulfilled through three objectives: (1) to provide an overview of the context in terms of existing data substantiation methods, vehicular communication, and traffic flow theory, (2) to develop data substantiation models to detect anomalies irrespective of the cause of the anomaly, and (3) to assess the applicability of traffic flow theory for data substantiation in vehicular networks. Chapters 1 and 2 are the introduction and literature review, respectively. The first main chapter describes the context of vehicular networks, traffic flow theory, and the intuition of applying traffic flow theory for substantiation in vehicular networks. The next three chapters elaborate, formulate, demonstrate, and evaluate the use of macroscopic views of traffic to substantiate microscopic data in vehicular networks.
The first of these discusses the use of steady state conditions in traffic flow theory to substantiate data in vehicular networks, and the second describes the use of shockwave theory in traffic to substantiate data in vehicular networks. The third chapter develops a data substantiation model utilising localised views of traffic to provide an additional resolution to the previous models
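
The idea of checking microscopic data against a macroscopic view can be illustrated with the fundamental relation of traffic flow, flow = density × speed. The sketch below is only an illustration of that consistency check, not the thesis's models: the function names, the units, and the relative `tolerance` are assumptions made here.

```python
def implied_speed(flow_veh_per_h, density_veh_per_km):
    """Mean speed (km/h) implied by the fundamental relation q = k * v."""
    if density_veh_per_km <= 0:
        raise ValueError("density must be positive")
    return flow_veh_per_h / density_veh_per_km

def is_consistent(reported_speed_kmh, flow_veh_per_h, density_veh_per_km, tolerance=0.25):
    """Accept a reported speed if it is within a relative tolerance
    of the speed implied by the observed flow and density."""
    expected = implied_speed(flow_veh_per_h, density_veh_per_km)
    return abs(reported_speed_kmh - expected) <= tolerance * expected
```

With an observed flow of 1800 vehicles per hour and a density of 30 vehicles per kilometre, the implied mean speed is 60 km/h, so a vehicle reporting 65 km/h passes the check while one reporting 120 km/h does not.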