2,827 research outputs found

    A logic approach for exceptions and anomalies in association rules

    Get PDF
    Association rules have been used for obtaining information hidden in a database. Recent researches have pointed out that simple associations are insu cient for representing the diverse kinds of knowledge collected in a database. The use of exceptions and anomalies deal with a di erent type of knowledge sometimes more useful than simple associations. Moreover ex- ceptions and anomalies provide a more comprehensive understanding of the information provided by a database. This work intends to go deeper in the logic model studied in [5]. In the model, association rules can be viewed as general relations between two or more attributes quanti ed by means of a convenient quanti er. Using this formulation we establish the true semantics of the distinct kinds of knowledge we can nd in the database hidden in the four folds of the contingency table. The model is also useful for providing some measures for assessing the validity of those kinds of rulesPeer Reviewe

    Spark solutions for discovering fuzzy association rules in Big Data

    Get PDF
    The research reported in this paper was partially supported the COPKIT project from the 8th Programme Framework (H2020) research and innovation programme (grant agreement No 786687) and from the BIGDATAMED projects with references B-TIC-145-UGR18 and P18-RT-2947.The high computational impact when mining fuzzy association rules grows significantly when managing very large data sets, triggering in many cases a memory overflow error and leading to the experiment failure without its conclusion. It is in these cases when the application of Big Data techniques can help to achieve the experiment completion. Therefore, in this paper several Spark algorithms are proposed to handle with massive fuzzy data and discover interesting association rules. For that, we based on a decomposition of interestingness measures in terms of α-cuts, and we experimentally demonstrate that it is sufficient to consider only 10equidistributed α-cuts in order to mine all significant fuzzy association rules. Additionally, all the proposals are compared and analysed in terms of efficiency and speed up, in several datasets, including a real dataset comprised of sensor measurements from an office building.COPKIT project from the 8th Programme Framework (H2020) research and innovation programme 786687BIGDATAMED projects B-TIC-145-UGR18 P18-RT-294

    Outlier Detection Methods for Industrial Applications

    Get PDF
    An outlier is an observation (or measurement) that is different with respect to the other values contained in a given dataset. Outliers can be due to several causes. The measurement can be incorrectly observed, recorded or entered into the process computer, the observed datum can come from a different population with respect to the normal situation and thus is correctly measured but represents a rare event. In literature different definitions of outlier exist: the most commonly referred are reported in the following: - "An outlier is an observation that deviates so much from other observations as to arouse suspicions that is was generated by a different mechanism " (Hawkins, 1980). - "An outlier is an observation (or subset of observations) which appear to be inconsistent with the remainder of the dataset" (Barnet & Lewis, 1994). - "An outlier is an observation that lies outside the overall pattern of a distribution" (Moore and McCabe, 1999). - "Outliers are those data records that do not follow any pattern in an application" (Chen and al., 2002). - "An outlier in a set of data is an observation or a point that is considerably dissimilar or inconsistent with the remainder of the data" (Ramasmawy at al., 2000). Many data mining algorithms try to minimize the influence of outliers for instance on a final model to develop, or to eliminate them in the data pre-processing phase. However, a data miner should be careful when automatically detecting and eliminating outliers because, if the data are correct, their elimination can cause the loss of important hidden information (Kantardzic, 2003). Some data mining applications are focused on outlier detection and they are the essential result of a data-analysis (Sane & Ghatol, 2006). The outlier detection techniques find applications in credit card fraud, network robustness analysis, network intrusion detection, financial applications and marketing (Han & Kamber, 2001). A more exhaustive list of applications that exploit outlier detection is provided below (Hodge, 2004): - Fraud detection: fraudulent applications for credit cards, state benefits or fraudulent usage of credit cards or mobile phones. - Loan application processing: fraudulent applications or potentially problematical customers. - Intrusion detection, such as unauthorized access in computer networks

    Literature review of machine learning techniques to analyse flight data

    Get PDF
    This paper analyses the increasing trend of using modern machine learning technologies to analyze flight data efficiently. Flight data offers an important insight into the operations of an aircraft. This paper reviews the research undertaken so far on the use of Machine Learning techniques for the analyses of flight data by evaluating various anomaly detection algorithms and the significance of feature selection in Flight Data Monitoring. These algorithms are compared to determine the best class of algorithms for highlighting significant flight anomalies. Furthermore, these algorithms are analyzed for various flight data parameters to determine which class of algorithms is sensitive to continuous parameters and which is sensitive to discrete parameters of flight data. The paper also addresses the ability of each anomaly detection algorithm to be easily adaptable to different datasets and different phases of flight, including take-off and landing.peer-reviewe

    Rule-based system to detect energy efficiency anomalies in smart buildings, a data mining approach

    Get PDF
    The rapidly growing world energy use already has concerns over the exhaustion of energy resources andheavy environmental impacts. As a result of these concerns, a trend of green and smart cities has beenincreasing. To respond to this increasing trend of smart cities with buildings every time more complex,in this paper we have proposed a new method to solve energy inefficiencies detection problem in smartbuildings. This solution is based on a rule-based system developed through data mining techniques andapplying the knowledge of energy efficiency experts. A set of useful energy efficiency indicators is alsoproposed to detect anomalies. The data mining system is developed through the knowledge extracted bya full set of building sensors. So, the results of this process provide a set of rules that are used as a partof a decision support system for the optimisation of energy consumption and the detection of anomaliesin smart buildings.Comisión Europea FP7-28522

    Web usage mining for click fraud detection

    Get PDF
    Estágio realizado na AuditMark e orientado pelo Eng.º Pedro FortunaTese de mestrado integrado. Engenharia Informática e Computação. Faculdade de Engenharia. Universidade do Porto. 201

    D-FRI-Honeypot:A Secure Sting Operation for Hacking the Hackers Using Dynamic Fuzzy Rule Interpolation

    Get PDF
    As active network defence systems, honeypots are commonly used as a decoy to inspect attackers and their attack tactics in order to improve the cybersecurity infrastructure of an organisation. A honeypot may be successful provided that it disguises its identity. However, cyberattackers continuously endeavour to discover honeypots for evading any deception and bolstering their attacks. Active fingerprinting attack is one such technique that may be used to discover honeypots by sending specially designed traffic. Preventing a fingerprinting attack is possible but doing that may hinder the process of dealing with the attackers, counteracting the purpose of a honeypot. Instead, detecting an attempted fingerprinting attack in real-time can enhance a honeypot’s capability, uninterruptedly managing any immediate consequences and preventing the honeypot being identified. Nevertheless, it is difficult to detect and predict an attempted fingerprinting attack due to the challenge of isolating it from other similar attacks, particularly when imprecise observations are involved in the monitoring of the traffic. Dynamic fuzzy rule interpolation (D-FRI) enables an adaptive approach for effective reasoning with such situations by exploiting the best of both inference and interpolation. The dynamic rules produced by D-FRI facilitate approximate reasoning with perpetual changes that often occur in this type of application, where dynamic rules are required to cover new network conditions. This paper proposes a D-FRI-Honeypot, an enhanced honeypot running D-FRI framework in conjunction with Principal Component Analysis, to detect and predict an attempted fingerprinting attack on honeypots. This D-FRI-Honeypot works with a sparse rule base but is able to detect active fingerprinting attacks when it does not find any matching rules. Also, it learns from current network conditions and offers a dynamically enriched rule base to support more precise detection. This D-FRI-Honeypot is tested against five popular fingerprinting tools (namely, Nmap, Xprobe2, NetScanTools Pro, SinFP3 and Nessus), to demonstrate its successful applications

    Wind Turbine Fault Detection: an Unsupervised vs Semi-Supervised Approach

    Get PDF
    The need for renewable energy has been growing in recent years for the reasons we all know, wind power is no exception. Wind turbines are complex and expensive structures and the need for maintenance exists. Conditioning Monitoring Systems that make use of supervised machine learning techniques have been recently studied and the results are quite promising. Though, such systems still require the physical presence of professionals but with the advantage of gaining insight of the operating state of the machine in use, to decide upon maintenance interventions beforehand. The wind turbine failure is not an abrupt process but a gradual one. The main goal of this dissertation is: to compare semi-supervised methods to at tack the problem of automatic recognition of anomalies in wind turbines; to develop an approach combining the Mahalanobis Taguchi System (MTS) with two popular fuzzy partitional clustering algorithms like the fuzzy c-means and archetypal analysis, for the purpose of anomaly detection; and finally to develop an experimental protocol to com paratively study the two types of algorithms. In this work, the algorithms Local Outlier Factor (LOF), Connectivity-based Outlier Factor (COF), Cluster-based Local Outlier Factor (CBLOF), Histogram-based Outlier Score (HBOS), k-nearest-neighbours (k-NN), Subspace Outlier Detection (SOD), Fuzzy c-means (FCM), Archetypal Analysis (AA) and Local Minimum Spanning Tree (LoMST) were explored. The data used consisted of SCADA data sets regarding turbine sensorial data, 8 to tal, from a wind farm in the North of Portugal. Each data set comprises between 1070 and 1096 data cases and characterized by 5 features, for the years 2011, 2012 and 2013. The analysis of the results using 7 different validity measures show that, the CBLOF al gorithm got the best results in the semi-supervised approach while LoMST won in the unsupervised scenario. The extension of both FCM and AA got promissing results.A necessidade de produzir energia renovável tem vindo a crescer nos últimos anos pelas razões que todos sabemos, a energia eólica não é excepção. As turbinas eólicas são es truturas complexas e caras e a necessidade de manutenção existe. Sistemas de Condição Monitorizada utilizando técnicas de aprendizagem supervisionada têm vindo a ser estu dados recentemente e os resultados são bastante promissores. No entanto, estes sistemas ainda exigem a presença física de profissionais, mas com a vantagem de obter informa ções sobre o estado operacional da máquina em uso, para decidir sobre intervenções de manutenção antemão. O principal objetivo desta dissertação é: comparar métodos semi-supervisionados para atacar o problema de reconhecimento automático de anomalias em turbinas eólicas; desenvolver um método que combina o Mahalanobis Taguchi System (MTS) com dois mé todos de agrupamento difuso bem conhecidos como fuzzy c-means e archetypal analysis, no âmbito de deteção de anomalias; e finalmente desenvolver um protocolo experimental onde é possível o estudo comparativo entre os dois diferentes tipos de algoritmos. Neste trabalho, os algoritmos Local Outlier Factor (LOF), Connectivity-based Outlier Factor (COF), Cluster-based Local Outlier Factor (CBLOF), Histogram-based Outlier Score (HBOS), k-nearest-neighbours (k-NN), Subspace Outlier Detection (SOD), Fuzzy c-means (FCM), Archetypal Analysis (AA) and Local Minimum Spanning Tree (LoMST) foram explorados. Os conjuntos de dados utilizados provêm do sistema SCADA, referentes a dados sen soriais de turbinas, 8 no total, com origem num parque eólico no Norte de Portugal. Cada um está compreendendido entre 1070 e 1096 observações e caracterizados por 5 caracte rísticas, para os anos 2011, 2012 e 2013. A ánalise dos resultados através de 7 métricas de validação diferentes mostraram que, o algoritmo CBLOF obteve os melhores resultados na abordagem semi-supervisionada enquanto que o LoMST ganhou na abordagem não supervisionada. A extensão do FCM e do AA originou resultados promissores

    Security in Data Mining- A Comprehensive Survey

    Get PDF
    Data mining techniques, while allowing the individuals to extract hidden knowledge on one hand, introduce a number of privacy threats on the other hand. In this paper, we study some of these issues along with a detailed discussion on the applications of various data mining techniques for providing security. An efficient classification technique when used properly, would allow an user to differentiate between a phishing website and a normal website, to classify the users as normal users and criminals based on their activities on Social networks (Crime Profiling) and to prevent users from executing malicious codes by labelling them as malicious. The most important applications of Data mining is the detection of intrusions, where different Data mining techniques can be applied to effectively detect an intrusion and report in real time so that necessary actions are taken to thwart the attempts of the intruder. Privacy Preservation, Outlier Detection, Anomaly Detection and PhishingWebsite Classification are discussed in this paper
    • …
    corecore