585 research outputs found
Data granulation by the principles of uncertainty
Researches in granular modeling produced a variety of mathematical models,
such as intervals, (higher-order) fuzzy sets, rough sets, and shadowed sets,
which are all suitable to characterize the so-called information granules.
Modeling of the input data uncertainty is recognized as a crucial aspect in
information granulation. Moreover, the uncertainty is a well-studied concept in
many mathematical settings, such as those of probability theory, fuzzy set
theory, and possibility theory. This fact suggests that an appropriate
quantification of the uncertainty expressed by the information granule model
could be used to define an invariant property, to be exploited in practical
situations of information granulation. In this perspective, a procedure of
information granulation is effective if the uncertainty conveyed by the
synthesized information granule is in a monotonically increasing relation with
the uncertainty of the input data. In this paper, we present a data granulation
framework that elaborates over the principles of uncertainty introduced by
Klir. Being the uncertainty a mesoscopic descriptor of systems and data, it is
possible to apply such principles regardless of the input data type and the
specific mathematical setting adopted for the information granules. The
proposed framework is conceived (i) to offer a guideline for the synthesis of
information granules and (ii) to build a groundwork to compare and
quantitatively judge over different data granulation procedures. To provide a
suitable case study, we introduce a new data granulation technique based on the
minimum sum of distances, which is designed to generate type-2 fuzzy sets. We
analyze the procedure by performing different experiments on two distinct data
types: feature vectors and labeled graphs. Results show that the uncertainty of
the input data is suitably conveyed by the generated type-2 fuzzy set models.Comment: 16 pages, 9 figures, 52 reference
Role based behavior analysis
Tese de mestrado, Segurança Informática, Universidade de Lisboa, Faculdade de CiĂŞncias, 2009Nos nossos dias, o sucesso de uma empresa depende da sua agilidade e capacidade de se adaptar a condições que se alteram rapidamente. Dois requisitos para esse sucesso sĂŁo trabalhadores proactivos e uma infra-estrutura ágil de Tecnologias de InformacĂŁo/Sistemas de Informação (TI/SI) que os consiga suportar. No entanto, isto nem sempre sucede. Os requisitos dos utilizadores ao nĂvel da rede podem nao ser completamente conhecidos, o que causa atrasos nas mudanças de local e reorganizações. AlĂ©m disso, se nĂŁo houver um conhecimento preciso dos requisitos, a infraestrutura de TI/SI poderá ser utilizada de forma ineficiente, com excessos em algumas áreas e deficiĂŞncias noutras. Finalmente, incentivar a proactividade nĂŁo implica acesso completo e sem restrições, uma vez que pode deixar os sistemas vulneráveis a ameaças externas e internas. O objectivo do trabalho descrito nesta tese Ă© desenvolver um sistema que consiga caracterizar o comportamento dos utilizadores do ponto de vista da rede. Propomos uma arquitectura de sistema modular para extrair informação de fluxos de rede etiquetados. O processo Ă© iniciado com a criação de perfis de utilizador a partir da sua informação de fluxos de rede. Depois, perfis com caracterĂsticas semelhantes sĂŁo agrupados automaticamente, originando perfis de grupo. Finalmente, os perfis individuais sĂŁo comprados com os perfis de grupo, e os que diferem significativamente sĂŁo marcados como anomalias para análise detalhada posterior. Considerando esta arquitectura, propomos um modelo para descrever o comportamento de rede dos utilizadores e dos grupos. Propomos ainda mĂ©todos de visualização que permitem inspeccionar rapidamente toda a informação contida no modelo. O sistema e modelo foram avaliados utilizando um conjunto de dados reais obtidos de um operador de telecomunicações. Os resultados confirmam que os grupos projectam com precisĂŁo comportamento semelhante. AlĂ©m disso, as anomalias foram as esperadas, considerando a população subjacente. Com a informação que este sistema consegue extrair dos dados em bruto, as necessidades de rede dos utilizadores podem sem supridas mais eficazmente, os utilizadores suspeitos sĂŁo assinalados para posterior análise, conferindo uma vantagem competitiva a qualquer empresa que use este sistema.In our days, the success of a corporation hinges on its agility and ability to adapt to fast changing conditions. Proactive workers and an agile IT/IS infrastructure that can support them is a requirement for this success. Unfortunately, this is not always the case. The user’s network requirements may not be fully understood, which slows down relocation and reorganization. Also, if there is no grasp on the real requirements, the IT/IS infrastructure may not be efficiently used, with waste in some areas and deficiencies in others. Finally, enabling proactivity does not mean full unrestricted access, since this may leave the systems vulnerable to outsider and insider threats. The purpose of the work described on this thesis is to develop a system that can characterize user network behavior. We propose a modular system architecture to extract information from tagged network flows. The system process begins by creating user profiles from their network flows’ information. Then, similar profiles are automatically grouped into clusters, creating role profiles. Finally, the individual profiles are compared against the roles, and the ones that differ significantly are flagged as anomalies for further inspection. Considering this architecture, we propose a model to describe user and role network behavior. We also propose visualization methods to quickly inspect all the information contained in the model. The system and model were evaluated using a real dataset from a large telecommunications operator. The results confirm that the roles accurately map similar behavior. The anomaly results were also expected, considering the underlying population. With the knowledge that the system can extract from the raw data, the users network needs can be better fulfilled, the anomalous users flagged for inspection, giving an edge in agility for any company that uses it
Aggregation of classifiers: a justifiable information granularity approach.
In this paper, we introduced a new approach of combining multiple classifiers in a heterogeneous ensemble system. Instead of using numerical membership values when combining, we constructed interval membership values for each class prediction from the meta-data of observation by using the concept of information granule. In the proposed method, the uncertainty (diversity) of the predictions produced by the base classifiers is quantified by the interval-based information granules. The decision model is then generated by considering both bound and length of the intervals. Extensive experimentation using the UCI datasets has demonstrated the superior performance of our algorithm over other algorithms including six fixed combining methods, one trainable combining method, AdaBoost, bagging, and random subspace
A multi-objective optimization approach for the synthesis of granular computing-based classification systems in the graph domain
The synthesis of a pattern recognition system usually aims at the optimization of a given performance index. However, in many real-world scenarios, there exist other desired facets to take into account. In this regard, multi-objective optimization acts as the main tool for the optimization of different (and possibly conflicting) objective functions in order to seek for potential trade-offs among them. In this paper, we propose a three-objective optimization problem for the synthesis of a granular computing-based pattern recognition system in the graph domain. The core pattern recognition engine searches for suitable information granules (i.e., recurrent and/or meaningful subgraphs from the training data) on the top of which the graph embedding procedure towards the Euclidean space is performed. In the latter, any classification system can be employed. The optimization problem aims at jointly optimizing the performance of the classifier, the number of information granules and the structural complexity of the classification model. Furthermore, we address the problem of selecting a suitable number of solutions from the resulting Pareto Fronts in order to compose an ensemble of classifiers to be tested on previously unseen data. To perform such selection, we employed a multi-criteria decision making routine by analyzing different case studies that differ on how much each objective function weights in the ranking process. Results on five open-access datasets of fully labeled graphs show that exploiting the ensemble is effective (especially when the structural complexity of the model plays a minor role in the decision making process) if compared against the baseline solution that solely aims at maximizing the performances
Recommended from our members
Granular computing approach for intelligent classifier design
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London.Granular computing facilitates dealing with information by providing a theoretical framework to deal with information as granules at different levels of granularity (different levels of specificity/abstraction). It aims to provide an abstract explainable description of the data by forming granules that represent the features or the
underlying structure of corresponding subsets of the data. In this thesis, a granular computing approach to the design of intelligent classification systems is proposed. The proposed approach is employed for different
classification systems to investigate its efficiency. Fuzzy inference systems, neural networks, neuro-fuzzy systems and classifier ensembles are considered to evaluate the efficiency of the proposed approach. Each of the considered systems is designed using the proposed approach and classification performance is evaluated and compared to that of the standard system. The proposed approach is based on constructing information granules from data at multiple levels of granularity. The granulation process is performed using a modified fuzzy c-means algorithm that takes classification problem into account. Clustering is followed by a coarsening process that involves merging small clusters into large ones to form a lower granularity level. The resulted granules are used to build each of the considered binary classifiers in different settings and approaches.
Granules produced by the proposed granulation method are used to build a fuzzy classifier for each granulation level or set of levels. The performance of the classifiers is evaluated using real life data sets and measured by two classification performance measures: accuracy and area under receiver operating characteristic curve. Experimental results show that fuzzy systems constructed using the proposed method achieved better classification performance. In addition, the proposed approach is used for the design of neural network classifiers. Resulted granules from one or more granulation levels are used to train the classifiers at different levels of specificity/abstraction. Using this approach, the classification problem is broken down into the modelling of classification rules represented by the information granules resulting in more interpretable system. Experimental results show that neural network classifiers trained using the proposed approach have better classification performance for most of the data sets. In a similar manner, the proposed approach is used for the training of neuro-fuzzy systems resulting in similar improvement in classification performance. Lastly, neural networks built using the proposed approach are used to construct a classifier ensemble. Information granules are used to generate and train the base classifiers. The final ensemble output is produced by a weighted sum combiner. Based on the experimental results, the proposed approach has improved the classification performance of the base classifiers for most of the data sets. Furthermore, a genetic algorithm is used to determine the combiner weights automatically.Higher Committee for Education Development in Iraq (HCED
CHAracterization of Relevant Attributes using Cyber Trajectory Similarities
On secure networks, even sophisticated cyber hackers must perform multiple steps to implement attacks on sensitive data and critical servers hidden behind layers of firewalls. Therefore, there is a need to study these attacks at a higher multi-stage level. Traditional taxonomy of cyber attacks focuses on analyzing the final stage and overall effects of an attack but, not the characteristics of an attack movement or `trajectory\u27 on a network.
This work proposes to investigate trajectory similarities between multi-stage attacks, allowing for the characterization of both a hacker\u27s behavior and vulnerable attack paths within a network.
Currently, Intrusion Detection Systems (IDS) report alerts to a network analyst when a malicious activity is suspected to have occurred on a network. Previous work in this field has used IDS alerts as evidence of multi-stage attacks, and has thus been able to group correlated alerts into cyber attack tracks.
The main contribution of this work is to use a revised Longest Common Subsequence(LCS) algorithm to analyze attack tracks as trajectories. This allows a systematic analysis to determine which alert attributes within a track are of great value to the characterization of multi-stage attacks.
The basic LCS algorithm, which looks for the longest common sequence in two strings of data, is extended to support the non-uniformity of alert data using a time windowing system.
In addition, a normalization method will be applied to ensure that the attack track similarity measure is not adversely affected by differences in attack track length. By applying the revised LCS algorithm, attack trajectories defined in terms of various IDS alert attributes are analyzed. The results provide strong indicators of how multidimensional cyber attack trajectories can be used to differentiate attack tracks
A lightweight, graph-theoretic model of class-based similarity to support object-oriented code reuse.
The work presented in this thesis is principally concerned with the development of a method and set of tools designed to support the identification of class-based similarity in collections of object-oriented code. Attention is focused on enhancing the potential for software reuse in situations where a reuse process is either absent or informal, and the characteristics of the organisation are unsuitable, or resources unavailable, to promote and sustain a systematic approach to reuse. The approach builds on the definition of a formal, attributed, relational model that captures the inherent structure of class-based, object-oriented code. Based on code-level analysis, it relies solely on the structural characteristics of the code and the peculiarly object-oriented features of the class as an organising principle: classes, those entities comprising a class, and the intra and inter-class relationships existing between them, are significant factors in defining a two-phase similarity measure as a basis for the comparison process. Established graph-theoretic techniques are adapted and applied via this model to the problem of determining similarity between classes. This thesis illustrates a successful transfer of techniques from the domains of molecular chemistry and computer vision. Both domains provide an existing template for the analysis and comparison of structures as graphs. The inspiration for representing classes as attributed relational graphs, and the application of graph-theoretic techniques and algorithms to their comparison, arose out of a well-founded intuition that a common basis in graph-theory was sufficient to enable a reasonable transfer of these techniques to the problem of determining similarity in object-oriented code. The practical application of this work relates to the identification and indexing of instances of recurring, class-based, common structure present in established and evolving collections of object-oriented code. A classification so generated additionally provides a framework for class-based matching over an existing code-base, both from the perspective of newly introduced classes, and search "templates" provided by those incomplete, iteratively constructed and refined classes associated with current and on-going development. The tools and techniques developed here provide support for enabling and improving shared awareness of reuse opportunity, based on analysing structural similarity in past and ongoing development, tools and techniques that can in turn be seen as part of a process of domain analysis, capable of stimulating the evolution of a systematic reuse ethic
A method for re-modularising legacy code
This thesis proposes a method for the re-modularisation of legacy COBOL. Legacy code often performs a number of functions that if split, would improve software maintainability. For instance, program comprehension would benefit from a reduction in the size of the code modules. The method aims to identify potential reuse candidates from the functions re-modularised, and to ensure clear interfaces are present between the new modules. Furthermore, functionality is often replicated across applications and so the re-modularisation process can also seek to reduce commonality and hence the overall amount of a company's code requiring maintenance. A 10 step method is devised which assembles a number of new and existing techniques into an approach suitable for use by staff not having significant reengineering experience. Three main approaches are used throughout the method; that is the analysis of the PERFORM structure, the analysis of the data, and the use of graphical representations. Both top-down and bottom-up strategies to program comprehension are incorporated within the method as are automatable, and user controlled processes to reuse candidate selection. Three industrial case studies are used to demonstrate and evaluate the method. The case studies range in size to gain an indication of the scalability of the method. The case studies are used to evaluate the method on a step by step basis; both strong points and deficiencies are identified, as well as potential solutions to the deficiencies. A review is also presented to assesses the three main approaches of the methods; the analysis of the PERFORM and data structures, and the use of graphical representations. The review uses the process of software evolution for its evaluation using successive versions of COBOL software. The method is retrospectively applied to the earliest version and the known changes identified from the following versions are used to evaluate the re-modularisations. Within the evaluation chapters a new link within the dominance tree is proposed as is an approach for dealing with multiple dominance trees. The results show that «ach approach provides an important contribution to the method as well as giving a useful insight (in the form of graphical representations) of the process of software evolution
- …