742 research outputs found

    An Improved Differential Evolution Algorithm for Data Stream Clustering

    Several algorithms have been implemented for clustering data streams. Most of them require the user to fix the number of clusters (K) in advance based on the input data, and K then stays fixed throughout the clustering process. Choosing K is therefore a recurring difficulty in stream clustering. In this paper, we propose an efficient approach to data stream clustering that adopts an Improved Differential Evolution (IDE) algorithm. IDE is a fast, robust, and efficient global optimization approach for automatic clustering. Our approach also applies an entropy-based method to detect concept drift in the data stream and thereby update the clustering procedure online. We compared the proposed method with a Genetic Algorithm and found it to be the more proficient optimization algorithm. The proposed technique achieves an accuracy of 92.29%, a precision of 86.96%, a recall of 90.30%, and an F-measure of 88.60%.
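
As a quick consistency check on the figures reported in the abstract above, the sketch below recomputes the F-measure from the stated precision and recall. It assumes the abstract's F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function and variable names are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
reported_precision = 0.8696
reported_recall = 0.9030

# Assumption: the reported F-measure is F1; rounded to two decimals in percent.
f1_percent = round(100 * f_measure(reported_precision, reported_recall), 2)
print(f1_percent)
```

Under that assumption, the recomputed value agrees with the reported 88.60%.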

    Process Monitoring on Sequences of System Call Count Vectors

    We introduce a methodology for efficient monitoring of processes running on hosts in a corporate network. The methodology is based on collecting streams of system calls produced by all or selected processes on the hosts, and sending them over the network to a monitoring server, where machine learning algorithms are used to identify changes in process behavior due to malicious activity, hardware failures, or software errors. The methodology uses a sequence of system call count vectors as the data format, which can handle large and varying volumes of data. Unlike previous approaches, the methodology introduced in this paper is suitable for distributed collection and processing of data in large corporate networks. We evaluate the methodology both in a laboratory setting and on a real-life setup, and provide statistics characterizing its performance and accuracy. Comment: 5 pages, 4 figures, ICCST 201
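
The data format named in the abstract above, a sequence of system call count vectors, can be illustrated with a minimal sketch. The windowing rule (a fixed number of calls per window), the vocabulary, and the function names `count_vectors` and `_to_vector` are assumptions made for illustration; the abstract does not specify how the paper delimits windows.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def _to_vector(batch: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Count how often each vocabulary entry occurs in one window of calls."""
    counts = Counter(batch)
    # Calls that are not in the vocabulary are ignored.
    return [counts[name] for name in vocabulary]


def count_vectors(calls: Iterable[str],
                  vocabulary: Sequence[str],
                  window: int) -> List[List[int]]:
    """Turn a stream of system call names into one count vector per window.

    Assumption: a window is a fixed number of consecutive calls; a trailing
    partial window is emitted as well.
    """
    vectors: List[List[int]] = []
    batch: List[str] = []
    for name in calls:
        batch.append(name)
        if len(batch) == window:
            vectors.append(_to_vector(batch, vocabulary))
            batch = []
    if batch:
        vectors.append(_to_vector(batch, vocabulary))
    return vectors


# Hypothetical trace and vocabulary, for illustration only.
trace = ["open", "read", "read", "write", "close", "open", "read"]
vocab = ["open", "read", "write", "close"]
vectors = count_vectors(trace, vocab, window=4)
print(vectors)
```

Each vector has a fixed length regardless of how many calls a process makes, which is one plausible reason such a format copes with varying data volumes.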

    Data driven methods for updating fault detection and diagnosis system in chemical processes

    Modern industrial processes are becoming more complex, and consequently monitoring them has become a challenging task. Fault Detection and Diagnosis (FDD), as a key element of process monitoring, needs to be investigated because of its essential role in decision-making processes. Among the available FDD methods, data-driven approaches are currently receiving increasing attention because they are relatively simple to implement. Regardless of the FDD type, one of the main traits of a reliable FDD system is its ability to be updated when conditions that were not considered during its initial training appear in the process. These new conditions may emerge either gradually or abruptly, but they are equally important because in both cases they lead to poor FDD performance. Some methods have been proposed for these updating tasks, but mostly outside the research area of chemical engineering. They can be categorized into those dedicated to managing Concept Drift (CD), which appears gradually, and those that deal with novel classes, which appear abruptly. Besides lacking clear strategies for updating, the available methods are reported to suffer from performance weaknesses and inefficient training times. Accordingly, this thesis is mainly dedicated to data-driven FDD updating in chemical processes. The proposed schemes for handling novel classes of faults are based on unsupervised methods, while for coping with CD both supervised and unsupervised updating frameworks have been investigated. Furthermore, to enhance the functionality of FDD systems, some major data processing methods have been investigated, including imputation of missing values, feature selection, and feature extension. The suggested algorithms and frameworks for FDD updating have been evaluated on different benchmarks and scenarios.
As part of the results, the suggested algorithms for supervised handling of CD outperform traditional incremental learning on the MGM score (a dimensionless score defined from the weighted F1 score and the training time) by up to 50%. This improvement is achieved by algorithms that detect and forget redundant information and that properly adjust the data window so that the fault detection system is updated and retrained in a timely manner. Moreover, the proposed unsupervised FDD updating framework for dealing with novel faults in static and dynamic process conditions achieves up to 90% on the NPP score (a dimensionless score defined from the number of samples whose class is predicted correctly). This result relies on an innovative framework that assigns samples either to new classes or to available classes by exploiting one-class classification techniques and clustering approaches. Postprint (published version)

    Enhanced Industrial Machinery Condition Monitoring Methodology based on Novelty Detection and Multi-Modal Analysis

    This paper presents a condition-based monitoring methodology based on novelty detection applied to industrial machinery. The proposed approach includes both the classical classification of multiple scenarios known a priori and the innovative capability to detect new operating modes that were not previously available. Developing condition-based monitoring methodologies that can isolate unexpected scenarios is currently a trending topic, able to answer the demanding requirements of future industrial process monitoring systems. First, the method performs a temporal segmentation of the available physical magnitudes and estimates a set of time-based statistical features. Then, a double feature reduction stage based on Principal Component Analysis and Linear Discriminant Analysis is applied in order to optimize classification and novelty detection performance. The subsequent combination of a Feed-forward Neural Network and a One-Class Support Vector Machine allows known and unknown operating conditions to be interpreted properly. The effectiveness of this novel condition monitoring scheme has been verified by experimental results obtained from an automotive industry machine. Postprint (published version)
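
The first stage described in the abstract above, temporal segmentation followed by time-based statistical features, can be sketched as follows. The segment length, the non-overlapping windows, and the particular features (mean, standard deviation, RMS, peak) are assumptions chosen for illustration; the abstract does not list the features the authors use, and the later PCA, LDA, neural network, and one-class SVM stages are not shown.

```python
import math
from statistics import fmean, pstdev
from typing import Dict, List, Sequence


def segment(signal: Sequence[float], length: int) -> List[Sequence[float]]:
    """Cut a signal into consecutive, non-overlapping segments.

    A tail shorter than `length` is dropped.
    """
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, length)]


def time_features(seg: Sequence[float]) -> Dict[str, float]:
    """Compute a few common time-domain statistics for one segment."""
    return {
        "mean": fmean(seg),
        "std": pstdev(seg),  # population standard deviation
        "rms": math.sqrt(fmean([x * x for x in seg])),
        "peak": max(abs(x) for x in seg),
    }


# Hypothetical measurement series, for illustration only.
signal = [1.0, -1.0, 1.0, -1.0, 2.0, 2.0, 2.0, 2.0, 5.0]
features = [time_features(s) for s in segment(signal, length=4)]
for row in features:
    print(row)
```

Each segment becomes one fixed-length feature row, which is the kind of input a subsequent feature reduction and classification stage would consume.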

    Feature-based multi-class classification and novelty detection for fault diagnosis of industrial machinery

    Given the strategic role that maintenance plays in achieving profitability and competitiveness, many industries are devoting considerable effort and resources to improving their maintenance approaches. The concept of the Smart Factory and the possibility of highly connected plants enable the collection of massive data, which allows equipment to be monitored continuously and provides real-time feedback on its health status. The main issue industries face is the lack of data corresponding to faulty conditions, because of the environmental and safety issues that failed machinery might cause, in addition to production losses and product quality issues. In this paper, a complete and easy-to-implement procedure for streaming fault diagnosis and novelty detection, using different Machine Learning techniques, is applied to an industrial machinery sub-system. The paper aims to offer practitioners useful guidelines for choosing the best solution for their systems, including a model hyperparameter optimization technique that supports the choice of the best model. Results indicate that the methodology is easy, fast, and accurate. A small amount of training data is enough to give the classification models high accuracy and high generalization ability, while integrating a classifier and an anomaly detector reduces the number of false alarms and the computational time.

    Advances in Streaming Novelty Detection

    153 p. First, this thesis addresses a confusion between terms and problems, in which the same term is used to refer to different problems and, similarly, the same problem is referred to by different terms interchangeably. This hinders progress in the field, since related literature is hard to find and work tends to be repeated. The first contribution proposes a one-to-one assignment of terms to problems and a formalization of the learning scenarios, in an attempt to standardize the field. Second, the thesis addresses the problem of Streaming Novelty Detection. In this problem, a model is learned from a supervised dataset. The model then receives new unlabeled instances and predicts their class online, or in a stream. The model must be updated to cope with concept drift. In this classification scenario, new classes are assumed to emerge dynamically, so the model must be able to discover new classes automatically and without supervision. In this context, the thesis proposes two contributions: first, a solution based on Gaussian mixtures in which each class is modeled by one of the mixture components; second, the use of neural networks, such as Autoencoder networks and Deep Support Vector Data Description networks, to work with time series.

    Iterative Information Granulation for Novelty Detection in Complex Datasets

    Recognition memory in a number of mammals is usually utilised to identify novel objects that violate model predictions. In humans in particular, the recognition of novel objects is foremost associated with their ability to group objects that are highly compatible or similar. Granular computing mimics not only the way human cognition draws objects together but also its ability to capture associated properties by similarity, proximity or functionality. In this paper, an iterative information granulation approach is presented for the problem of novelty detection in complex data. Two granular compatibility measures based on the principles of Granular Computing are used, namely the multidimensional distance between the granules, and the granular density and volume. A two-stage iterative information granulation is proposed in this work. In the first stage, a predefined number of granular detectors are constructed. The granular detectors capture the relationships (rules) between the input-output data and then use this information in a second granulation stage in order to discriminate new samples as novel. The proposed iterative information granulation approach for novelty detection is then applied to three different benchmark problems in pattern recognition, demonstrating very good performance.

    Occurrence of antibiotics in mussels and clams from various FAO areas

    Filter feeders, like mussels and clams, are suitable bioindicators of environmental pollution. These shellfish, when destined for human consumption, undergo a depuration step that aims to nullify their pathogenic microorganism load and decrease chemical contamination. Nevertheless, the absence of contamination by drugs cannot be guaranteed. Antimicrobials are a class of drugs of particular concern due to the increasing phenomenon of antibiotic resistance, of which their use in breeding and aquaculture is a major cause. We developed a multiclass method for the HPLC–MS/MS analysis of 29 antimicrobials, validated according to the Commission Decision 2002/657/UE guidelines, and applied it to 50 mussel and 50 clam samples derived from various Food and Agricultural Organisation marine zones. The results obtained indicate a negligible presence of antibiotics. Just one clam sample showed the presence of oxytetracycline at a concentration slightly higher than the European Union maximum residue limit set for fish.