183 research outputs found

    A Study Of Factors Contributing To Self-reported Anomalies In Civil Aviation

    Get PDF
    A study was conducted to investigate which factors lead pilots to submit voluntary anomaly reports about their flight performance. The study employed statistical methods, text mining, clustering, and dimensionality reduction techniques to determine relationships between factors and anomalies. A literature review was conducted to determine which factors contribute to these anomalous incidents and what research exists on human error, its causes, and its management. Data from the NASA Aviation Safety Reporting System (ASRS) were analyzed using traditional statistical methods such as frequencies and multinomial logistic regression. Recently formalized text-mining approaches such as Knowledge Based Discovery (KBD) and Literature Based Discovery (LBD) were employed to create associations between factors and anomalies, and these methods were also used to generate predictive models. Finally, advances in dimensionality reduction techniques were used to identify concepts or keywords within records, providing a framework for an unsupervised document classification system. Findings from this study reinforced established views on the factors contributing to civil aviation anomalies, and new associations between previously unrelated factors and conditions were also found. Dimensionality reduction also demonstrated the possibility of identifying salient factors from unstructured text records and was able to classify these records using the identified features.
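
    The abstract describes surfacing salient keywords from unstructured report text. The study's own dimensionality-reduction pipeline is not given here, so the sketch below swaps in plain TF-IDF weighting as a simpler illustration of ranking the most distinctive terms in each free-text record. The sample reports and the `top_keywords` helper are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a report and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(reports, k=3):
    """Return the k highest-scoring TF-IDF terms for each report (hypothetical helper)."""
    docs = [Counter(tokenize(r)) for r in reports]
    n_docs = len(docs)
    # Document frequency: the number of reports in which each term appears.
    df = Counter(term for doc in docs for term in doc)
    results = []
    for doc in docs:
        total = sum(doc.values())
        # Term frequency times inverse document frequency; a term found in
        # every report gets idf = log(1) = 0 and therefore scores zero.
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in doc.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:k]])
    return results


if __name__ == "__main__":
    reports = [
        "crew missed altitude crossing restriction during descent",
        "fatigue led crew to miss checklist item during taxi",
        "altitude deviation after autopilot disconnect during climb",
    ]
    print(top_keywords(reports))
    # [['crossing', 'descent', 'missed'], ['checklist', 'fatigue', 'item'], ['after', 'autopilot', 'climb']]
```

    Words shared by every report (here "during") drop to zero weight, which is why TF-IDF is a common first step before applying dimensionality reduction to text.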

    Unsupervised Anomaly Detectors to Detect Intrusions in the Current Threat Landscape

    Get PDF
    Anomaly detection aims at identifying unexpected fluctuations in the expected behavior of a given system. It is acknowledged as a reliable answer to the identification of zero-day attacks; to this end, several ML algorithms suited to binary classification have been proposed over the years. However, an experimental comparison of a wide pool of unsupervised algorithms for anomaly-based intrusion detection against a comprehensive set of attack datasets has not yet been carried out. To fill this gap, we evaluate seventeen unsupervised anomaly detection algorithms on eleven attack datasets. The results allow us to elaborate on a wide range of arguments, from the behavior of individual algorithms to the suitability of the datasets for anomaly detection. We conclude that algorithms such as Isolation Forests, One-Class Support Vector Machines and Self-Organizing Maps are more effective than their counterparts for intrusion detection, while clustering algorithms represent a good alternative due to their low computational complexity. Further, we detail how attacks with unstable, distributed or non-repeatable behavior, such as Fuzzing, Worms and Botnets, are more difficult to detect. Ultimately, we discuss the capability of the algorithms to detect anomalies generated by a wide pool of unknown attacks, showing that the achieved metric scores do not vary with respect to identifying single attacks.

    Comment: Will be published in ACM Transactions on Data Science.
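
    Isolation Forests, which the study found among the most effective detectors, score a point by how quickly random axis-parallel splits isolate it: anomalies tend to end up in short paths. The following is a simplified from-scratch sketch of that idea, not the configuration evaluated in the paper; the tree count, subsample size and toy data are arbitrary assumptions. In practice a library implementation such as scikit-learn's `IsolationForest` would normally be used.

```python
import math
import random


def c(n):
    """Average path length of an unsuccessful BST search over n points (normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth of the leaf reached by point, plus c(size) for leaves left unsplit."""
    if "split" not in node:
        return depth + c(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Grow n_trees isolation trees, each on a random subsample of the data."""
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(data, size), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, size


def anomaly_score(point, forest):
    """s = 2^(-E[h(x)] / c(psi)); values close to 1 indicate likely anomalies."""
    trees, size = forest
    mean_path = sum(path_length(point, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c(size))


if __name__ == "__main__":
    rng = random.Random(42)
    normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    forest = fit_forest(normal)
    print(anomaly_score((0.0, 0.0), forest), anomaly_score((8.0, 8.0), forest))
```

    The training data needs no attack labels, which is what makes the method usable against previously unseen (zero-day) behaviour.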

    A recommendation framework based on automated ranking for selecting negotiation agents. Application to a water market

    Full text link
    This thesis presents an approach that relies on automatic learning and data mining techniques to search for the best group of items from a set, according to the behaviour observed in previous groups. The approach is applied to the framework of a water market system, which aims to develop negotiation processes in which trading tables are built to trade users' water rights. Our task focuses on predicting which agents will show the most appropriate behaviour when they are invited to participate in a trading table, with the purpose of achieving the most beneficial agreement. To this end, a model is developed that learns from past transactions in the market. Then, when a new trading table is opened to trade a water right, the model predicts, taking into account the individual features of the trading table, the behaviour of all the agents that can be invited to join the negotiation and thus become potential buyers of the water right. Once the model has made the predictions for a trading table, the agents are ranked according to their probability, as assigned by the model, of becoming the buyer in that negotiation. Two different methods are proposed in the thesis for dealing with the ranked participants. Depending on the method used, either the desired number of participants is selected from this ranking to form the group, or only the top user of the list is chosen and the model is rebuilt with some aggregate information added in order to produce a more detailed prediction.

    Dura Garcia, EM. (2011). A recommendation framework based on automated ranking for selecting negotiation agents. Application to a water market. http://hdl.handle.net/10251/15875
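
    The first selection method described above (score every candidate agent, rank by predicted probability of becoming the buyer, keep the top k) can be sketched as follows. The logistic scoring function, its weights and the agent features (`past_purchases`, `distance_km`, `volume_m3`) are hypothetical placeholders for whatever model the thesis actually learns from past transactions.

```python
import math
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    past_purchases: int   # hypothetical feature
    distance_km: float    # hypothetical feature


def buy_probability(agent, volume_m3):
    """Hypothetical logistic model of P(agent becomes the buyer | trading table)."""
    z = -1.0 + 0.8 * agent.past_purchases - 0.05 * agent.distance_km + 0.001 * volume_m3
    return 1.0 / (1.0 + math.exp(-z))


def select_participants(agents, volume_m3, k):
    """Rank agents by predicted probability (highest first) and keep the top k names."""
    ranked = sorted(agents, key=lambda a: buy_probability(a, volume_m3), reverse=True)
    return [a.name for a in ranked[:k]]


if __name__ == "__main__":
    agents = [
        Agent("agent_a", 3, 10.0),
        Agent("agent_b", 0, 5.0),
        Agent("agent_c", 2, 40.0),
        Agent("agent_d", 5, 60.0),
    ]
    print(select_participants(agents, volume_m3=500.0, k=2))  # ['agent_a', 'agent_d']
```

    Because the ranking only needs an ordering, any model that outputs a per-agent probability could replace the hypothetical `buy_probability` without changing `select_participants`.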

    Heterogeneous neural networks: theory and applications

    Get PDF
    This work presents a class of functions that serve as generalized neuron models for use in artificial neural networks. They are defined as a similarity measure that acts as a flexible definition of a neuron viewed as a pattern recognizer. Similarity provides a conceptual framework, serves as a unifying cover for many neuron models in the literature, and supports the exploration of new instances of neuron models. The similarity-based view naturally leads to the integration of heterogeneous information, such as continuous and discrete (nominal and ordinal) quantities, and fuzzy or imprecise ones. Missing values are treated explicitly. A neuron of this class is called a heterogeneous neuron, and any neural architecture that uses it is a Heterogeneous Neural Network. In this work we concentrate on feed-forward neural networks as the initial focus of study. The learning algorithms are based on evolutionary algorithms, specially extended to work with heterogeneous information. This thesis describes how a certain class of heterogeneous neurons leads to neural networks with very satisfactory performance, comparable or superior to that of traditional neural networks (such as the multilayer perceptron or the radial basis function network), especially in the presence of heterogeneous information, which is common in current databases.

    This work presents a class of functions serving as generalized neuron models to be used in artificial neural networks. They are cast into the common framework of computing a similarity function, a flexible definition of a neuron as a pattern recognizer. The similarity endows the model with a clear conceptual view and serves as a unifying cover for many of the existing neural models, including those classically used for the MultiLayer Perceptron (MLP) and most of those used in Radial Basis Function Networks (RBF). These families of models are conceptually unified and their relation is clarified.
The possibilities of deriving new instances are explored, and several neuron models, representative of their families, are proposed. The similarity view naturally leads to further extensions of the models to handle heterogeneous information, that is, information coming from sources radically different in character, including continuous and discrete (ordinal) numerical quantities, nominal (categorical) quantities, and fuzzy quantities. Missing data are also explicitly considered. A neuron of this class is called a heterogeneous neuron, and any neural structure making use of such neurons is a Heterogeneous Neural Network (HNN), regardless of the specific architecture or learning algorithm. Among these, this work concentrates on feed-forward networks as the initial focus of study. The learning procedures may include a great variety of techniques, broadly divided into derivative-based methods (such as conjugate gradient) and evolutionary ones (such as variants of genetic algorithms). This thesis also explores a number of directions towards the construction of better neuron models, within an integrating envelope, that are better adapted to the problems they are meant to solve. It is shown how a certain generic class of heterogeneous models leads to satisfactory performance, comparable to, and often better than, that of classical neural models, especially in the presence of heterogeneous information and imprecise or incomplete data, in a wide range of domains, most of them corresponding to real-world problems.
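
    To make the similarity view concrete, the sketch below implements one possible heterogeneous neuron: it compares an input vector with a stored prototype (the neuron's weights) using a Gower-style similarity that handles continuous and nominal components and skips missing values. The Gower coefficient, the per-attribute `ranges` normalisation and the sample attributes are assumptions chosen for illustration; the thesis defines its own family of similarity measures, including ordinal and fuzzy components not covered here.

```python
MISSING = None  # marker used for a missing input value


def partial_similarity(x, w, kind, value_range):
    """Similarity in [0, 1] between one input component x and one weight w."""
    if kind == "continuous":
        # Range-normalised distance, clamped so far-away values score 0.
        return max(0.0, 1.0 - abs(x - w) / value_range)
    if kind == "nominal":
        # Categorical values either match exactly or not at all.
        return 1.0 if x == w else 0.0
    raise ValueError(f"unknown attribute kind: {kind}")


def heterogeneous_neuron(inputs, prototype, kinds, ranges):
    """Gower-style similarity between an input vector and the neuron's prototype."""
    total, count = 0.0, 0
    for x, w, kind, r in zip(inputs, prototype, kinds, ranges):
        if x is MISSING:
            continue  # missing values are skipped, not imputed
        total += partial_similarity(x, w, kind, r)
        count += 1
    # Average over the components actually present; no information gives 0.
    return total / count if count else 0.0


if __name__ == "__main__":
    prototype = (30.0, "clay", 1.5)                   # hypothetical weights
    kinds = ("continuous", "nominal", "continuous")
    ranges = (100.0, None, 3.0)                       # hypothetical attribute ranges
    print(heterogeneous_neuron((40.0, "clay", None), prototype, kinds, ranges))
```

    Because each attribute contributes a partial similarity in [0, 1], mixed-type inputs are combined without first forcing categorical values into numeric codes.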

    Context-Specific Preference Learning of One Dimensional Quantitative Geospatial Attributes Using a Neuro-Fuzzy Approach

    Get PDF
    Change detection is a topic of great importance for modern geospatial information systems. Digital aerial imagery provides an excellent medium for capturing geospatial information. Rapidly evolving environments and the availability of increasing amounts of diverse, multiresolution imagery create the need for frequent updates of these datasets. Analysing and querying potentially outdated spatial data may sometimes yield invalid results. Because of measurement errors (systematic and random) and incomplete knowledge of information (uncertainty), it is ambiguous whether a change in a spatial dataset has really occurred. Therefore, we need reliable, fast, and automated procedures that effectively report, based on information from a new image, whether a change has actually occurred or is simply the result of uncertainty. This thesis introduces a novel methodology for change detection in spatial objects using aerial digital imagery. The uncertainty of the extraction is used as a quality estimate to determine whether change has occurred. For this goal, we develop a fuzzy-logic system to estimate uncertainty values from the results of automated object extraction using active contour models (a.k.a. snakes). The differential snakes change detection algorithm is an extension of traditional snakes that incorporates prior information (i.e., the shape of the object and the uncertainty of extraction) as energy functionals. This process is followed by a procedure in which we examine the improvement of the uncertainty in the absence of change (versioning). We also introduce a post-extraction method for improving object extraction accuracy. In addition to linear objects, this thesis extends differential snakes to track deformations of areal objects (e.g., lake flooding, oil spills). From the polygonal description of a spatial object we can track its trajectory and areal changes.
Differential snakes can also be used as the basis for similarity indices for areal objects. These indices are based on areal moments that are invariant under general affine transformations. Experimental results demonstrate the performance of the differential snakes change detection algorithm. More specifically, we show that differential snakes minimize false positives in change detection and reliably track object deformations.
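
    Tracking areal change from a polygonal description, as the abstract mentions, at minimum requires each version's area and centroid. The sketch below uses the standard shoelace formulas for a simple (non-self-intersecting) polygon and reports the area difference and centroid displacement between two versions of an object. The two "flood extent" polygons are made-up inputs, and the affine-invariant moment indices of the thesis are not reproduced.

```python
def area_and_centroid(vertices):
    """Signed shoelace area and centroid of a simple polygon given as (x, y) vertices."""
    a = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]   # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    # Using the signed area keeps the centroid correct for either vertex orientation.
    return a, (cx / (6.0 * a), cy / (6.0 * a))


def areal_change(before, after):
    """Area difference and centroid displacement between two versions of an object."""
    a0, (x0, y0) = area_and_centroid(before)
    a1, (x1, y1) = area_and_centroid(after)
    displacement = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return abs(a1) - abs(a0), displacement


if __name__ == "__main__":
    lake_t0 = [(0, 0), (2, 0), (2, 2), (0, 2)]   # hypothetical extent at time t0
    lake_t1 = [(1, 0), (4, 0), (4, 3), (1, 3)]   # hypothetical extent at time t1
    print(areal_change(lake_t0, lake_t1))
```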

    Data mining using neural networks

    Get PDF
    Data mining is the search for relationships and global patterns in large and ever-growing databases. It benefits anyone who holds large amounts of data, for example customer, business, transaction, marketing, financial, manufacturing and web data. The results of data mining are also referred to as knowledge in the form of rules, regularities and constraints. Rule mining is one of the most popular data mining methods, since rules provide concise statements of potentially important information that are easily understood by end users and also form actionable patterns. Rule mining has received a good deal of attention from data mining researchers because it can solve many data mining problems, such as classification, association, customer profiling, summarization and segmentation. This thesis makes several contributions by proposing rule mining methods using genetic algorithms and neural networks. The thesis first proposes rule mining methods using a genetic algorithm. These methods are based on an integrated framework but are capable of mining three major classes of rules. Moreover, the rule mining processes in these methods are controlled by tuning two data mining measures, support and confidence. The thesis shows how to build predictive data mining models using the rules produced by the proposed methods. Another key contribution of the thesis is the proposal of rule mining methods using supervised neural networks. The thesis mathematically analyses the Widrow-Hoff learning algorithm of a single-layered neural network, which provides a foundation for rule mining algorithms using single-layered neural networks. On the basis of the proposed theorems, three rule mining algorithms using single-layered neural networks are proposed for the three major classes of rules. The thesis also looks at the problem of rule mining where user guidance is absent.
The thesis proposes a guided rule mining system to overcome this problem. The thesis extends this work further by comparing the performance of the algorithm used in the proposed guided rule mining system with that of the Apriori algorithm. Finally, the thesis studies the Kohonen self-organizing map as an unsupervised neural network for rule mining algorithms. Two approaches are adopted, based on how the self-organizing map is applied in the rule mining models. In the first approach, the self-organizing map is used for clustering, which provides class information to the rule mining process. In the second approach, automated rule mining takes the place of trained neurons as the map grows in a hierarchical structure.
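
    Support and confidence, the two measures the thesis tunes to control rule mining, are simple to compute for a rule X ⇒ Y over a set of transactions: support is the fraction of transactions containing X ∪ Y, and confidence is support(X ∪ Y) / support(X). The sketch below shows these standard definitions plus a threshold filter; the basket data and the `rule_holds` helper are invented for illustration and do not reproduce the thesis's genetic or neural rule miners.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(X u Y) / support(X)."""
    both = set(antecedent) | set(consequent)
    base = support(antecedent, transactions)
    return support(both, transactions) / base if base else 0.0


def rule_holds(antecedent, consequent, transactions, min_support, min_confidence):
    """Hypothetical filter: keep a rule only if it clears both user-chosen thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(both, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
    print(support({"bread", "milk"}, baskets))       # 0.5
    print(confidence({"bread"}, {"milk"}, baskets))  # 2/3
```

    Raising `min_support` prunes rare item combinations, while raising `min_confidence` prunes rules whose consequent does not reliably follow the antecedent.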

    Multivariate discretization of continuous valued attributes.

    Get PDF
    The area of knowledge discovery and data mining is growing rapidly. Feature discretization is a crucial issue in Knowledge Discovery in Databases (KDD), or data mining, because most data sets used in real-world applications have features with continuous values. Discretization is performed as a preprocessing step so that data mining techniques can be applied to these data sets. This thesis addresses the discretization problem by proposing a multivariate discretization (MVD) algorithm. It begins with a number of common discretization algorithms, such as equal-width discretization, equal-frequency discretization, naïve discretization, entropy-based discretization, chi-square discretization, and orthogonal hyperplanes, and then compares the accuracy achieved by the MVD algorithm with that of the other algorithms. The thesis is divided into six chapters; it covers a few common discretization algorithms, tests them on real-world datasets that vary in size and complexity, and shows how data visualization techniques can be effective in determining the degree of complexity of a given data set. We examined the MVD algorithm on the same data sets. We then classified the discretized data using artificial neural networks, namely a single-layer perceptron and a multilayer perceptron trained with the back-propagation algorithm. We trained the classifier on the training data set and tested its accuracy on the testing data set. Our experiments led to better accuracy on some data sets and lower accuracy on others, depending on the degree of data complexity. We then compared the accuracy of the MVD algorithm with the results achieved by the other discretization algorithms, and found that the MVD algorithm produces good accuracy compared with the other discretization algorithms.
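
    Equal-width and equal-frequency binning, the two simplest baselines listed above, can be sketched as follows; the MVD algorithm itself is not reproduced. Both functions are univariate, which is exactly the limitation a multivariate method aims to address. Bin counts and sample values are arbitrary.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of identical width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # min(..., k - 1) places the maximum value in the last bin instead of bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign bin labels so each bin holds (roughly) the same number of values.

    Tied values may be split across bins in this simple rank-based version.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels


if __name__ == "__main__":
    skewed = [1, 1, 2, 2, 3, 100]
    print(equal_width_bins(skewed, 3))      # [0, 0, 0, 0, 0, 2]
    print(equal_frequency_bins(skewed, 3))  # [0, 0, 1, 1, 2, 2]
```

    On skewed data a single outlier stretches the equal-width intervals so that almost everything lands in one bin, whereas equal-frequency binning keeps the bins balanced.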

    An Adaptive Landscape Classification Procedure using Geoinformatics and Artificial Neural Networks

    Full text link

    Artificial Intelligence in geospatial analysis: applications of self-organizing maps in the context of geographic information science.

    Get PDF
    A thesis submitted in partial fulfillment of the requirements for the degree of Doctor in Information Management, specialization in Geographic Information Systems.

    The size and dimensionality of available geospatial repositories increase every day, placing additional pressure on existing analysis tools, which are expected to extract more knowledge from these databases. Most of these tools were created in a data-poor environment and thus rarely address concerns of efficiency, dimensionality and automatic exploration. In addition, traditional statistical techniques rely on several assumptions that are not realistic in the geospatial data domain. An example is the statistical independence between observations required by most classical statistical methods, which conflicts with the well-known spatial dependence that exists in geospatial data. Artificial intelligence and data mining methods constitute an alternative for exploring and extracting knowledge from geospatial data that depends less on such assumptions. In this thesis, we study the possible adaptation of existing general-purpose data mining tools to geospatial data analysis. The characteristics of geospatial datasets seem to be similar in many ways to those of other, aspatial datasets for which several data mining tools have been used successfully to detect patterns and relations. It seems, however, that GIS-minded analyses and objectives require more than the results provided by these general tools, and adaptations are needed to meet the geographic information scientist's requirements. Thus, we propose several geospatial applications based on a well-known data mining method, the self-organizing map (SOM), and analyse the adaptations required in each application to fulfil those objectives and needs. Three main fields of GIScience are covered in this thesis: cartographic representation; spatial clustering and knowledge discovery; and location optimization. (...)
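
    The self-organizing map at the core of these applications can be sketched in a few lines. The version below is a plain one-dimensional SOM with a Gaussian neighbourhood and linearly decaying learning rate and radius; the grid size, schedules and two-cluster toy data are assumptions for illustration, not the geospatial adaptations proposed in the thesis.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(
        range(len(nodes)),
        key=lambda j: sum((wd - xd) ** 2 for wd, xd in zip(nodes[j], x)),
    )


def train_som(data, n_nodes=6, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 1-D self-organizing map and return the node weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each node from a randomly chosen data sample.
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = data[:]
        rng.shuffle(samples)
        for x in samples:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # learning rate decays towards 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood radius shrinks
            winner = best_matching_unit(nodes, x)
            for j, w in enumerate(nodes):
                # Gaussian neighbourhood measured along the 1-D grid index.
                h = math.exp(-((j - winner) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return nodes


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)]
    data += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(30)]
    som = train_som(data)
    print(best_matching_unit(som, (0.0, 0.0)), best_matching_unit(som, (5.0, 5.0)))
```

    After training, nearby grid nodes hold similar weight vectors, which is the topology-preserving property that makes SOMs attractive for spatial clustering and visualization.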
    corecore