    The mining and visualisation of application services data

    Many network monitoring tools do not provide sufficiently in-depth and useful reports on network usage, particularly in the domain of application services data. The optimisation of network performance is only possible if the networks are monitored effectively. Techniques that identify patterns of network usage can assist in the successful monitoring of network performance. The main goal of this research was to propose a model to mine and visualise application services data in order to support effective network management. To demonstrate the effectiveness of the model, a prototype, called NetPatterns, was developed using data for the Integrated Tertiary Software (ITS) application service collected by a network monitoring tool on the NMMU South Campus network. Three data mining algorithms for application services data were identified for the proposed model: classification (decision tree), clustering (K-Means) and association (correlation). Classifying application services data serves to categorise combinations of network attributes to highlight areas of poor network performance. The clustering of network attributes serves to indicate sparse and dense regions within the application services data. Association indicates the existence of any interesting relationships between different network attributes. Three visualisation techniques were selected to visualise the results of the data mining algorithms: the organisation chart, the bubble chart and scatterplots. Colour and a variety of other visual cues are used to complement the selected visualisation techniques. The effectiveness and usefulness of NetPatterns were determined by means of user testing. The results of the evaluation clearly show that the participants were highly satisfied with the visualisation of network usage presented by NetPatterns. All participants successfully completed the prescribed tasks and indicated that NetPatterns is a useful tool for the analysis of network usage patterns.
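
The association step described in this abstract looks for relationships between network attributes by correlation. As a minimal sketch of that idea only, assuming Pearson correlation and two hypothetical attributes (`bytes_sent` and `response_ms`, neither taken from the thesis), the coefficient can be computed as follows:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two attributes around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical samples, one value per monitoring interval (not thesis data).
bytes_sent = [100, 200, 300, 400, 500]
response_ms = [12, 19, 31, 42, 48]
r = pearson(bytes_sent, response_ms)
```

A value of `r` close to 1 or -1 would flag a pair of attributes as worth plotting, for example in a scatterplot; the threshold for "interesting" is a choice the abstract does not specify.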

    Run-time Variability with First-class Contexts

    Software must be regularly updated to keep up with changing requirements. Unfortunately, to install an update, the system must usually be restarted, which is inconvenient and costly. In this dissertation, we aim to overcome the need for a restart by enabling run-time changes at the programming language level. We argue that the best way to achieve this goal is to improve the support for encapsulation, information hiding and late binding by contextualizing behavior. In our approach, behavioral variations are encapsulated into context objects that alter the behavior of other objects locally. We present three contextual language features that demonstrate our approach. First, we present a feature to evolve software by scoping variations to threads. This way, arbitrary objects can be substituted over time without compromising safety. Second, we present a variant of dynamic proxies that operate by delegation instead of forwarding. The proxies can be used as building blocks to implement contextualization mechanisms from within the language. Third, we contextualize the behavior of objects to intercept exchanges of references between objects. This approach scales information hiding from objects to aggregates. The three language features are supported by formalizations and case studies, showing their soundness and practicality. With these three complementary language features, developers can easily design applications that can accommodate run-time changes.
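
The distinction between forwarding and delegation mentioned in this abstract can be illustrated with a small sketch. This is a Python approximation of the general idea, not the dissertation's mechanism; `Account`, `ForwardingProxy` and `DelegatingProxy` are hypothetical names. With forwarding, `self` inside the target's methods is the target, so the proxy's variation is invisible to self-calls; with delegation, `self` stays bound to the proxy.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def fee(self):
        return 1

    def withdraw(self, amount):
        # Self-call: which fee() runs depends on what `self` is bound to.
        self.balance -= amount + self.fee()
        return self.balance


class ForwardingProxy:
    """Forwards calls to the target; the target's self-calls bypass the proxy."""

    def __init__(self, target):
        self._target = target

    def fee(self):
        return 0  # variation that the target never sees

    def withdraw(self, amount):
        return self._target.withdraw(amount)


class DelegatingProxy:
    """Runs the target's method with `self` bound to the proxy."""

    def __init__(self, target):
        # Bypass our own __setattr__ so the reference lives on the proxy.
        object.__setattr__(self, "_target", target)

    def fee(self):
        return 0  # variation that self-calls now pick up

    def withdraw(self, amount):
        # Look the method up on the target's class, but pass the proxy as self.
        return type(self._target).withdraw(self, amount)

    def __getattr__(self, name):
        # State reads fall through to the target.
        return getattr(self._target, name)

    def __setattr__(self, name, value):
        # State writes land on the target.
        setattr(self._target, name, value)


forwarded = Account(100)
delegated = Account(100)
after_forwarding = ForwardingProxy(forwarded).withdraw(10)  # fee of 1 applies
after_delegation = DelegatingProxy(delegated).withdraw(10)  # fee of 0 applies
```

In this sketch the overridden `fee` only takes effect under delegation, which is why delegation is the more useful building block for altering an object's behavior from the outside.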

    Reconfigurable computing for large-scale graph traversal algorithms

    This thesis proposes a reconfigurable computing approach for supporting parallel processing in large-scale graph traversal algorithms. Our approach is based on a reconfigurable hardware architecture which exploits the capabilities of both FPGAs (Field-Programmable Gate Arrays) and a multi-bank parallel memory subsystem. The proposed methodology to accelerate graph traversal algorithms has been applied to three case studies, revealing that application-specific hardware customisations can benefit performance. A summary of our four contributions is as follows. First, a reconfigurable computing approach to accelerate large-scale graph traversal algorithms. We propose a reconfigurable hardware architecture which decouples computation and communication while keeping multiple memory requests in flight at any given time, taking advantage of the high bandwidth of multi-bank memory subsystems. Second, a demonstration of the effectiveness of our approach through two case studies: the breadth-first search algorithm, and a graphlet counting algorithm from bioinformatics. Both case studies involve graph traversal, but each of them adopts a different graph data representation. Third, a method for using on-chip memory resources in FPGAs to reduce off-chip memory accesses for accelerating graph traversal algorithms, through a case study of the All-Pairs Shortest-Paths algorithm. This case study has been applied to process human brain network data. Fourth, an evaluation of an approach based on instruction-set extension for FPGA design against many-core GPUs (Graphics Processing Units), based on a set of benchmarks with different memory access characteristics. It is shown that while GPUs excel at streaming applications, the proposed approach can outperform GPUs in applications with poor locality characteristics, such as graph traversal problems.
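
For readers unfamiliar with the breadth-first search case study, the following is a plain software sketch of level-synchronous BFS, not the thesis's hardware design. It assumes a compressed sparse row (CSR) layout, which the abstract does not name; `row_offsets` and `column_indices` are illustrative names. The data-dependent reads of `column_indices` and `level` are the kind of poor-locality accesses the abstract refers to.

```python
def bfs_levels(row_offsets, column_indices, source):
    """Return the BFS depth of every vertex from `source` (-1 if unreachable).

    The neighbours of vertex u are column_indices[row_offsets[u]:row_offsets[u + 1]].
    """
    num_vertices = len(row_offsets) - 1
    level = [-1] * num_vertices
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            # Each neighbour lookup is an irregular, data-dependent memory access.
            for e in range(row_offsets[u], row_offsets[u + 1]):
                v = column_indices[e]
                if level[v] == -1:  # first visit
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


# Small directed example: 0->1, 0->2, 1->3, 2->3, 3->4; vertex 5 is isolated.
row_offsets = [0, 2, 3, 4, 5, 5, 5]
column_indices = [1, 2, 3, 3, 4]
levels = bfs_levels(row_offsets, column_indices, source=0)
```

All vertices in one frontier can be expanded independently, which is what makes it possible to keep many memory requests in flight at once.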

    Dynamic data placement and discovery in wide-area networks

    The workloads of online services and applications such as social networks, sensor data platforms and web search engines have become increasingly global and dynamic, setting new challenges to providing users with low latency access to data. To achieve this, these services typically leverage a multi-site wide-area networked infrastructure. Data access latency in such an infrastructure depends on the network paths between users and data, which are determined by the data placement and discovery strategies. Current strategies are static: they offer low latencies upon deployment but perform worse under a dynamic workload. We propose dynamic data placement and discovery strategies for wide-area networked infrastructures, which adapt to the data access workload. We achieve this with data activity correlation (DAC), an application-agnostic approach for determining the correlations between data items based on access pattern similarities. By dynamically clustering data according to DAC, network traffic in clusters is kept local. We utilise DAC as a key component in reducing access latencies for two application scenarios, emphasising different aspects of the problem. The first scenario assumes the fixed placement of data at sites, and thus focusses on data discovery. This is the case for a global sensor discovery platform, which aims to provide low latency discovery of sensor metadata. We present a self-organising hierarchical infrastructure consisting of multiple DAC clusters, maintained with an online and distributed split-and-merge algorithm. This reduces the number of sites visited, and thus latency, during discovery for a variety of workloads. The second scenario focusses on data placement. This is the case for global online services that leverage a multi-data centre deployment to provide users with low latency access to data. We present a geo-dynamic partitioning middleware, which maintains DAC clusters with an online elastic partition algorithm. It supports the geo-aware placement of partitions across data centres according to the workload. This provides globally distributed users with low latency access to data for static and dynamic workloads.
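
The abstract does not say how DAC measures access pattern similarity. Purely as an illustration of grouping data items by similar access patterns, the sketch below assumes Jaccard similarity over the set of accessors of each item and a fixed threshold; `jaccard`, `correlated_groups`, the item names and the threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def correlated_groups(accesses, threshold):
    """Group items whose accessor sets are pairwise similar, transitively.

    accesses maps an item id to the set of clients that accessed it.
    """
    parent = {item: item for item in accesses}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(accesses, 2):
        if jaccard(accesses[a], accesses[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for item in accesses:
        groups.setdefault(find(item), set()).add(item)
    return list(groups.values())


# Hypothetical access log: which clients read which sensor records.
accesses = {
    "s1": {"eu1", "eu2"},
    "s2": {"eu1", "eu2", "eu3"},
    "s3": {"us1"},
    "s4": {"us1", "us2"},
}
groups = correlated_groups(accesses, threshold=0.5)
```

Items that end up in the same group are candidates for co-location at one site, so that requests from the clients that use them together stay local.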

    Mining User Behavior in Social Environments

    The growth of Web 2.0 has led to the widespread use of social media systems and to an increasing number of active users. As a result, each user interacts with too many other users and is overwhelmed by a huge amount of content, leading to the well-known “social interaction overload” problem. To address this problem, several research communities study Social Recommender Systems, which are information filtering systems that operate in the social media domain and aim to suggest to users items that are likely to interest them. Social Recommender Systems usually filter content by exploiting the social graph or by mining user content. Since the social domain is characterized by continuous and rapid growth in the amount of content and the number of users, both approaches struggle to produce accurate and up-to-date recommendations. This PhD thesis proposes social recommendation approaches based on mining user behavior, i.e., on exploiting the activity of users in social environments, in order to produce accurate and up-to-date recommendations.

    A more efficient document retrieval method for TEXPROS

    Document processing is a critical element of office automation. Through document classification, extraction and filing, documents are automatically placed into a knowledge base according to certain rules. Document retrieval is the process of getting a document back according to a user's requirements and showing the results to the user. Hence, a good user interface and an efficient retrieval algorithm become core parts of document retrieval. Unlike previous browsers that have been proposed for this purpose, this dissertation develops a new browser that has a user interface with more tools and a more efficient retrieval algorithm that can deal with a wide variety of retrieval situations. From the interface point of view, the new browser provides more functions, such as zoom in and zoom out (i.e. automatic scaling of the portion of a graph that is of interest to a user) and help. These functions give users an easier way to view a large graph in one window and provide users with help during the retrieval process. The new browser also provides an algorithm that makes retrieval more efficient by using a Reusable Base. The Reusable Base holds the information most closely related to the user's previous desires, and the information stored in the Reusable Base is more easily used to form the OP-Net than that in the System Catalog. Hence, it eliminates the need to go to the System Catalog to find the results. This speeds up retrieval significantly: it is at least twice as fast as without the Reusable Base. Further, the new browser provides information about the folder organization and the document type hierarchy in addition to the OP-Net. If users know the type of documents they want, or which folder they are interested in, they can go directly to the particular document type or folder.

    A survey of Bayesian Network structure learning

    LC an effective classification based association rule mining algorithm

    Classification using association rules is a research field in data mining that primarily applies association rule discovery techniques to classification benchmarks. Many research studies in the literature have confirmed that classification using association tends to generate more predictive classification systems than traditional classification data mining techniques such as probabilistic, statistical and decision tree approaches. In this thesis, we introduce a novel data mining algorithm based on classification using association called “Looking at the Class” (LC), which can be used for mining a range of classification data sets. Unlike known algorithms in the classification using association approach, such as the Classification based on Association rule (CBA) system and the Classification based on Predictive Association (CPAR) system, which merge disjoint items in the rule learning step without anticipating the class label similarity, the proposed algorithm merges only items with identical class labels. This avoids many unnecessary item combinations during the rule learning step, and consequently results in large savings in computational time and memory. Furthermore, the LC algorithm uses a novel prediction procedure that employs multiple rules to make the prediction decision instead of a single rule. The proposed algorithm has been evaluated thoroughly on real-world security data sets collected using an automated tool developed at Huddersfield University. The security application considered in this thesis is categorizing websites, based on their features, as legitimate or fake, which is a typical binary classification problem. Experiments were also conducted on a number of UCI data sets, and the measures used for evaluation include classification accuracy and memory usage, among others. The results show that the LC algorithm outperformed traditional classification algorithms such as C4.5, PART and Naïve Bayes, as well as known classification based association algorithms like CBA, with respect to classification accuracy, memory usage, and execution time on most of the data sets we consider.
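
The key idea stated in this abstract, merging only items that carry the same class label, can be sketched as a candidate-generation filter. This is an illustration under assumptions and not the LC algorithm itself: how LC attaches a class label to an item is not given here, so the sketch assumes the majority class among the rows containing the item, and the feature names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def majority_class(rows, item):
    """Most frequent class label among rows containing `item` (an assumption)."""
    counts = Counter(label for items, label in rows if item in items)
    return counts.most_common(1)[0][0]


def candidate_pairs(rows, same_class_only):
    """Generate 2-item candidates, optionally only for items sharing a label."""
    items = sorted({i for row_items, _ in rows for i in row_items})
    label_of = {i: majority_class(rows, i) for i in items}
    return [
        (a, b)
        for a, b in combinations(items, 2)
        if not same_class_only or label_of[a] == label_of[b]
    ]


# Hypothetical website features with a legitimate/fake label per row.
rows = [
    ({"https", "long_url"}, "fake"),
    ({"https", "old_domain"}, "legit"),
    ({"https", "old_domain"}, "legit"),
    ({"long_url", "ip_address"}, "fake"),
]
all_pairs = candidate_pairs(rows, same_class_only=False)
lc_style_pairs = candidate_pairs(rows, same_class_only=True)
```

On this toy data the filter keeps 2 of the 6 possible pairs, which shows where the savings in time and memory described above would come from.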

    Ant Colony Optimization

    Ant Colony Optimization (ACO) is the best example of how studies aimed at understanding and modeling the behavior of ants and other social insects can provide inspiration for the development of computational algorithms for the solution of difficult mathematical problems. Introduced by Marco Dorigo in his PhD thesis (1992) and initially applied to the travelling salesman problem, the ACO field has experienced tremendous growth, standing today as an important nature-inspired stochastic metaheuristic for hard optimization problems. This book presents state-of-the-art ACO methods and is divided into two parts: (I) Techniques, which includes parallel implementations, and (II) Applications, where recent contributions of ACO to diverse fields, such as traffic congestion and control, structural optimization, manufacturing, and genomics, are presented.


    This dissertation is a simulation study of factors and techniques involved in designing hyperlink recommender systems that recommend to users web pages that past users with similar navigation behaviors found interesting. The methodology involves identification of pertinent factors or techniques and, for each one, addresses the following questions: (a) room for improvement; (b) better approach, if any; and (c) performance characteristics of the technique in environments that hyperlink recommender systems operate in. The following four problems are addressed. Web Page Classification. A new metric (PageRank × Inverse Links-to-Word count ratio) is proposed for classifying web pages as content or navigation, to help in the discovery of user navigation behaviors from web user access logs. Results of a small user study suggest that this metric leads to desirable results. Data Mining. A new apriori algorithm for mining association rules from large databases is proposed. The new algorithm addresses the scaling problem of the classical apriori algorithm by eliminating an expensive join step and applying the apriori property to every row of the database. In this study, association rules show the correlation relationships between user navigation behaviors and web pages they find interesting. The new algorithm has better space complexity than the classical one, with better time efficiency under some conditions and comparable time efficiency under other conditions. Prediction Models for User Interests. We demonstrate that association rules that show the correlation relationships between user navigation patterns and web pages they find interesting can be transformed into collaborative filtering data. We investigate collaborative filtering prediction models based on two approaches for computing prediction scores: using simple averages and weighted averages. Our findings suggest that the weighted averages scheme more accurately computes predictions of user interests than the simple averages scheme does. Clustering. Clustering techniques are frequently applied in the design of personalization systems. We studied the performance of the CLARANS clustering algorithm in high dimensional space in relation to the PAM and CLARA clustering algorithms. While CLARA had the best time performance, CLARANS resulted in clusters with the lowest intra-cluster dissimilarities, and so was most effective in this regard.
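
The two prediction-score schemes compared in this abstract differ only in how neighbours' scores are combined. The sketch below shows the general form of each; the choice of weights (here a hypothetical similarity between the target user and each past user) and all values are assumptions, since the abstract does not define them.

```python
def simple_average(scores):
    """Every past user's interest score counts equally."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)


def weighted_average(scores, weights):
    """Scores from more heavily weighted past users count for more."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total


# Hypothetical data: three past users' interest in one page (1.0 = interested)
# and how similar each one's navigation behavior is to the target user's.
scores = [1.0, 0.0, 1.0]
similarity = [0.9, 0.1, 0.5]
simple = simple_average(scores)
weighted = weighted_average(scores, similarity)
```

Here the dissenting past user is also the least similar one, so the weighted score (about 0.93) is higher than the simple one (about 0.67).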
