
    Automatic network configuration with dynamic churn prediction

    Peer-to-Peer (P2P) systems have been deployed on millions of nodes worldwide, in environments that range from static to very dynamic and therefore exhibit different churn levels. Typically, P2P systems introduce redundancy to cope with the loss of nodes. In distributed hash tables, redundancy is often fixed during development or at initial deployment of the system. This can limit the applicability of the system to stable environments or make it inefficient in such environments. Automatic network configuration can make a system more adaptable to changing environments and reduce manual configuration tasks. This paper therefore proposes a replication mechanism based on churn prediction that automatically adapts its replication configuration to its environment. The mechanism, termed dynamic replication mechanism (dynamic RM), is developed and evaluated in this paper; it uses exponential moving averages to predict churn, and the prediction in turn determines a replication factor that meets a given reliability threshold. Simulations with synthetic data and experiments with data from torrent trackers show that churn behavior can be predicted accurately in any environment, from low churn rates to diurnal and high churn rates.
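The EMA-plus-threshold idea in the abstract can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (independent replica failures, a single smoothing constant), not the paper's actual dynamic RM:

```python
import math

def ema_update(prev_ema, observed_churn, alpha=0.2):
    """One exponential-moving-average step; larger alpha weights recent churn more."""
    return alpha * observed_churn + (1 - alpha) * prev_ema

def replication_factor(failure_prob, reliability=0.999, max_r=32):
    """Smallest r such that the chance of losing all r replicas is below
    1 - reliability, assuming independent node failures (a simplification)."""
    for r in range(1, max_r + 1):
        if failure_prob ** r <= 1 - reliability:
            return r
    return max_r

# Smooth a stream of per-interval churn observations, then derive r
# from the smoothed estimate.
ema = 0.1
for observed in [0.05, 0.20, 0.40, 0.35]:
    ema = ema_update(ema, observed)
r = replication_factor(ema)
```

A real mechanism would also have to pick the smoothing constant and re-evaluate the factor continuously as churn estimates change.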

    Why (and How) Networks Should Run Themselves

    The proliferation of networked devices, systems, and applications that we depend on every day makes managing networks more important than ever. The increasing security, availability, and performance demands of these applications suggest that these increasingly difficult network management problems be solved in real time, across a complex web of interacting protocols and systems. Alas, just as the importance of network management has increased, the network has grown so complex that it is seemingly unmanageable. In this new era, network management requires a fundamentally new approach. Instead of optimizations based on closed-form analysis of individual protocols, network operators need data-driven, machine-learning-based models of end-to-end and application performance based on high-level policy goals and a holistic view of the underlying components. Instead of anomaly detection algorithms that operate on offline analysis of network traces, operators need classification and detection algorithms that can make real-time, closed-loop decisions. Networks should learn to drive themselves. This paper explores this concept, discussing how we might attain this ambitious goal by more closely coupling measurement with real-time control and by relying on learning for inference and prediction about a networked application or system, as opposed to closed-form analysis of individual protocols.

    Proceedings of the 4th Student-STAFF Research Conference 2020 School of Computer Science and Engineering SSRC2020

    This volume contains the proceedings of the 4th Student-Staff Research Conference of the School of Computer Science and Engineering (SSRC2020). This traditional annual forum brings together, for a one-day intensive programme, established and young researchers from different areas, doctoral researchers, and postgraduate and undergraduate alumni; it covers both traditional and emerging topics and disseminates completed results as well as work in progress. During informal discussions at the conference sessions, attendees share their research findings with an open audience of academics and doctoral, postgraduate and undergraduate students. SSRC2020 was held online. A specific feature of this year's conference was the participation of alumni from the Informatics Institute of Technology (IIT, Sri Lanka) and Westminster International University in Tashkent (WIUT, Uzbekistan). The event met with great interest: it had more than 200 online participants, with one session accommodating an audience of 156. The presenters, whether established researchers or just at the start of their careers, not only shared their work but also gained invaluable feedback during the conference sessions. Twenty-one abstracts contributed by the speakers at SSRC2020 are assembled in the order of their presentation at the conference. The abstracts cover a wide spectrum of topics, including the development of online knowledge and learning repositories, data analysis, applications of machine learning in fraud detection, bankruptcy prediction, patient mortality, image synthesis, graph databases, image analysis for medical diagnostics, mobile app development, user experience design, wide-area networking, adaptive agent algorithms, plagiarism detection, process mining techniques for behavioural patterns, data mining for reablement, cloud computing, networking, and linguistic profiling.

    Built to Last or Built Too Fast? Evaluating Prediction Models for Build Times

    Automated builds are integral to the Continuous Integration (CI) software development practice. In CI, developers are encouraged to integrate early and often; however, long build times can be an issue when integrations are frequent. This research focuses on finding a balance between integrating often and keeping developers productive. We propose and analyze models that can predict the build time of a job. Such models can help developers better manage their time and tasks. Project managers can also explore different factors to determine the best setup for a build job that keeps the build wait time at an acceptable level. Software organizations transitioning to CI practices can use the predictive models to anticipate build times before CI is implemented. The research community can modify our predictive models to further understand the factors and relationships affecting build times.
    Comment: 4-page version published in the Proceedings of the IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pages 487-490, MSR 201
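As a toy illustration of the kind of predictive model the abstract describes, the sketch below fits a least-squares line to a hypothetical build history. The feature (lines changed), the data, and the choice of a single-feature linear model are all assumptions for illustration, not the paper's actual models:

```python
def fit_ols(xs, ys):
    """Fit y = a*x + b for a single feature by ordinary least squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

# Hypothetical history: (lines changed in the commit, build minutes).
lines_changed = [100, 400, 800, 1200]
build_minutes = [5.0, 11.0, 19.0, 27.0]

a, b = fit_ols(lines_changed, build_minutes)
predicted = a * 600 + b  # estimated build time for a 600-line change
```

In practice such models would draw on many job features (test count, dependency churn, machine type) rather than a single predictor.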

    Self-management for large-scale distributed systems

    Autonomic computing aims at making computing systems self-managing by using autonomic managers in order to reduce obstacles caused by management complexity. This thesis presents results of research on self-management for large-scale distributed systems. This research was motivated by the increasing complexity of computing systems and their management. In the first part, we present our platform, called Niche, for programming self-managing component-based distributed applications. In our work on Niche, we have faced and addressed the following four challenges in achieving self-management in a dynamic environment characterized by volatile resources and high churn: resource discovery, robust and efficient sensing and actuation, management bottleneck, and scale. We present results of our research on addressing the above challenges. Niche implements the autonomic computing architecture, proposed by IBM, in a fully decentralized way. Niche supports a network-transparent view of the system architecture, simplifying the design of distributed self-management, and provides a concise and expressive API for self-management. The implementation of the platform relies on the scalability and robustness of structured overlay networks. We proceed by presenting a methodology for designing the management part of a distributed self-managing application. We define design steps that include partitioning of management functions and orchestration of multiple autonomic managers. In the second part, we discuss robustness of management and data consistency, which are necessary in a distributed system. Dealing with the effect of churn on management increases the complexity of the management logic and thus makes its development time-consuming and error-prone. We propose the abstraction of Robust Management Elements, which are able to heal themselves under continuous churn.
Our approach is based on replicating a management element using finite state machine replication with a reconfigurable replica set. Our algorithm automates the reconfiguration (migration) of the replica set in order to tolerate continuous churn. For data consistency, we propose a majority-based distributed key-value store supporting multiple consistency levels that is based on a peer-to-peer network. The store enables a tradeoff between high availability and data consistency. Using majorities avoids the potential drawbacks of master-based consistency control, namely a single point of failure and a potential performance bottleneck. In the third part, we investigate self-management for Cloud-based storage systems with a focus on elasticity control using elements of control theory and machine learning. We have conducted research on a number of different designs of an elasticity controller, including a state-space feedback controller and a controller that combines feedback and feedforward control. We describe our experience in designing an elasticity controller for a Cloud-based key-value store using a state-space model that enables trading off performance for cost, and we describe the steps in designing such a controller. We conclude by presenting the design and evaluation of ElastMan, an elasticity controller for Cloud-based elastic key-value stores that combines feedforward and feedback control.
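The majority-based store described in the abstract can be illustrated with a small sketch. This is a simplification under stated assumptions (static replica set, no failures or concurrency, versions supplied by the caller), not the thesis's implementation; it shows only why two overlapping majorities let reads observe the latest write:

```python
class MajorityStore:
    """Toy majority-quorum key-value store over N in-process replicas."""

    def __init__(self, n_replicas=5):
        self.replicas = [{} for _ in range(n_replicas)]
        self.quorum = n_replicas // 2 + 1  # strict majority

    def put(self, key, value, version):
        # Write to one majority (here: the last `quorum` replicas).
        for replica in self.replicas[-self.quorum:]:
            replica[key] = (version, value)

    def get(self, key):
        # Read a different majority (the first `quorum` replicas). Any two
        # majorities intersect, so at least one replica holds the latest
        # write; return the highest-versioned value seen.
        answers = [r[key] for r in self.replicas[:self.quorum] if key in r]
        return max(answers)[1] if answers else None

store = MajorityStore()
store.put("x", "v1", version=1)
store.put("x", "v2", version=2)
latest = store.get("x")
```

A production store would additionally need failure detection, replica reconfiguration, and server-assigned versioning, which is where the thesis's churn-tolerance work comes in.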

    A Prediction Model for Bank Loans Using Agglomerative Hierarchical Clustering with Classification Approach

    Businesses depend on banks for financing and other services. The success or failure of a company depends in large part on the industry's ability to identify credit risk. As a result, banks must analyze whether or not a loan application will default in the future. In the past, financial firms relied on highly skilled personnel to evaluate whether a loan applicant was eligible. Machine learning algorithms and neural networks have since been used to train classifiers that forecast an individual's credit score based on their prior credit history, preventing loans from being granted to individuals who have defaulted on their obligations; however, these machine learning approaches require modification to address difficulties such as class imbalance, noise, and time complexity. Customers leaving a bank for a competitor is known as churn. Predicting in advance which customers will leave gives a firm an edge in client retention and growth. Banks may use machine learning to predict the behavior of trusted customers by assessing past data, and may also introduce special offers to retain those clients' trust. This study employed agglomerative hierarchical clustering, Decision Tree, and Random Forest classification techniques. The data with the decision tree obtained an accuracy of 84%, the data with the random forest obtained an accuracy of 85%, and the clustered data passed through agglomerative hierarchical clustering obtained an accuracy of 98.3% using the random forest classifier and 98.1% using the decision tree classifier.
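The pipeline's first stage, agglomerative hierarchical clustering, can be sketched with the standard library alone. The one-dimensional customer data, single-linkage choice, and two-cluster target below are assumptions for illustration; the study itself clusters real bank data and then trains Random Forest and Decision Tree classifiers on the result:

```python
def single_link_clusters(points, n_clusters):
    """Agglomerative (single-linkage) clustering of 1-D points: start with
    singleton clusters and repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between closest member pair.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is stable
    return clusters

# Hypothetical customer balances forming two obvious groups.
balances = [1.0, 1.2, 0.9, 10.0, 10.5, 9.8]
groups = single_link_clusters(balances, n_clusters=2)
```

The study's gain comes from feeding the resulting cluster structure into the classifiers, rather than classifying the raw data directly.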

    Candoia: A Platform and an Ecosystem for Building and Deploying Versatile Mining Software Repositories Tools

    Research on mining software repositories (MSR) has shown great promise during the last decade in solving many challenging software engineering problems. There exists, however, a ‘valley of death’ between these significant innovations in MSR research and their deployment in practice. The significant cost of converting a prototype to software; the need to support a wide variety of tools and technologies (e.g., CVS, SVN, Git, Bugzilla, Jira, Issues) to improve applicability; and the high cost of customizing tools to practitioner-specific settings are some key hurdles in the transition to practice. We describe Candoia, a platform and an ecosystem aimed at bridging this valley of death between innovations in MSR research and their deployment in practice. We have implemented Candoia and provide facilities to build and publish MSR ideas as Candoia apps. Our evaluation demonstrates that Candoia drastically reduces the cost of converting an idea to an app, thus lowering the barrier to transitioning research findings into practice. We also observe versatility in Candoia apps’ ability to work with the variety of tools and technologies that the platform supports. Finally, we find that customizing a Candoia app to fit project-specific needs is often well within the grasp of developers.

    Distributed Correlation-Based Feature Selection in Spark

    CFS (Correlation-Based Feature Selection) is an FS algorithm that has been successfully applied to classification problems in many domains. We describe Distributed CFS (DiCFS), a completely redesigned, scalable, parallel and distributed version of the CFS algorithm, capable of dealing with the large volumes of data typical of big data applications. Two versions of the algorithm were implemented and compared using the Apache Spark cluster computing model, which is currently gaining popularity due to its much faster processing times than Hadoop's MapReduce model. We tested our algorithms on four publicly available datasets, each with a large number of instances, two of which also have a large number of features. The results show that our algorithms were superior in terms of both time efficiency and scalability. In leveraging a computer cluster, they were able to handle larger datasets than the non-distributed WEKA version while maintaining the quality of the results, i.e., exactly the same features were returned by our algorithms as by the original algorithm available in WEKA.
    Comment: 25 pages, 5 figures
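For context, the subset "merit" heuristic that CFS maximizes (from Hall's original CFS formulation) is easy to compute once the correlations are known; the correlation values below are made-up numbers for illustration, and the distributed work lies in computing such correlations at scale:

```python
import math

def cfs_merit(k, rcf, rff):
    """CFS merit of a feature subset of size k, where rcf is the mean
    feature-class correlation and rff the mean feature-feature correlation:
        merit = k * rcf / sqrt(k + k*(k-1)*rff)
    High rcf (relevance) raises merit; high rff (redundancy) lowers it."""
    return k * rcf / math.sqrt(k + k * (k - 1) * rff)

# A subset of highly class-correlated, weakly inter-correlated features
# outscores an equally relevant but redundant subset.
good = cfs_merit(k=3, rcf=0.6, rff=0.1)
redundant = cfs_merit(k=3, rcf=0.6, rff=0.9)
```

DiCFS's contribution is not this formula but computing the underlying correlation matrix over Spark partitions for datasets too large for a single machine.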