    On Observability and Monitoring of Distributed Systems: An Industry Interview Study

    The business success of companies depends heavily on the availability and performance of their client applications. Under modern development paradigms such as DevOps and the microservice architectural style, applications are decoupled into services with complex interactions and dependencies. Although these paradigms enable individual development cycles with shortened delivery times, they create several challenges for managing services in distributed systems. One major challenge is observing and monitoring such distributed systems. This paper presents a qualitative study of the challenges and good practices in the field of observability and monitoring of distributed systems. In 28 semi-structured interviews with software professionals, we discovered increasing complexity and dynamics in the field. Observability in particular is becoming an essential prerequisite for ensuring stable services and the further development of client applications. However, participants reported a discrepancy in awareness of the topic's importance, from both the management and the developer perspective. Besides technical challenges, we identified a strong need for an organizational concept covering strategy, roles, and responsibilities. Our results support practitioners in developing and implementing systematic observability and monitoring for distributed systems.
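    The study is qualitative and prescribes no tooling, but a minimal sketch of the kind of service instrumentation its interviewees treat as a prerequisite may help; the stack choice (the prometheus_client library), the metric names, and the endpoint are assumptions, not the paper's.

```python
# Minimal service-instrumentation sketch; the interview study itself
# prescribes no specific stack, so prometheus_client is an assumed choice.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests", ["endpoint", "status"])
LATENCY = Histogram("app_request_seconds", "Request latency", ["endpoint"])

def handle(endpoint: str) -> None:
    with LATENCY.labels(endpoint).time():          # record latency per endpoint
        time.sleep(random.uniform(0.01, 0.1))      # stand-in for real work
        status = "500" if random.random() < 0.05 else "200"
    REQUESTS.labels(endpoint, status).inc()        # count outcomes per endpoint

if __name__ == "__main__":
    start_http_server(8000)                        # metrics served at :8000/metrics
    while True:
        handle("/checkout")
```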

    An adaptive anomaly request detection framework based on dynamic web application profiles

    A web application firewall (WAF) is highly effective at protecting the application and database layers of websites from attack traffic. This paper proposes a new WAF deployment method based on a Dynamic Web Application Profiling (DWAP) analysis technique, which deploys the firewall based on an analysis of website access data. DWAP is improved to integrate deeply into the structure of the website, increasing the compatibility of the anomaly detection system with each website and thereby improving its ability to detect abnormal requests. To improve the WAF's compatibility with the protected objects, the proposed system consists of two parts with the following main tasks: i) detect abnormal accesses in web application (WA) traffic; ii) semi-automatically update the attack data in the abnormal-access detection system during WA operation. This method is applicable in real-time detection systems, where updating with new attack data is essential since web attacks are increasingly complex and sophisticated.
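    The abstract does not spell out DWAP's features or thresholds, so the following is only a hypothetical sketch of per-endpoint request profiling: learn a baseline of parameter lengths and character sets from benign traffic, then flag requests that fall outside it. All names and the anomaly rule are illustrative.

```python
# Hypothetical DWAP-style sketch: learn a per-(endpoint, parameter) profile
# from benign traffic, then flag values outside the learned profile.
# The paper's actual features and thresholds are not given in the abstract.
from collections import defaultdict

class Profile:
    def __init__(self):
        self.max_len = 0
        self.charset: set[str] = set()

    def learn(self, value: str) -> None:
        self.max_len = max(self.max_len, len(value))
        self.charset |= set(value)

    def is_anomalous(self, value: str) -> bool:
        # Anomalous if far longer than anything seen, or using unseen characters.
        return len(value) > 2 * self.max_len or not set(value) <= self.charset

profiles: dict[tuple[str, str], Profile] = defaultdict(Profile)

def train(endpoint: str, params: dict[str, str]) -> None:
    for name, value in params.items():
        profiles[(endpoint, name)].learn(value)

def detect(endpoint: str, params: dict[str, str]) -> list[str]:
    return [n for n, v in params.items()
            if profiles[(endpoint, n)].is_anomalous(v)]

# Usage: train on benign traffic, then screen live requests.
train("/login", {"user": "alice", "pwd": "s3cret"})
print(detect("/login", {"user": "alice' OR '1'='1", "pwd": "x"}))  # ['user']
```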

    Doctor of Philosophy

    The next-generation mobile network (i.e., the 5G network) is expected to host emerging use cases with a wide range of requirements: from Internet of Things (IoT) devices that prefer a low-overhead, scalable network, to remote machine operation or remote healthcare services that require reliable end-to-end communications. Improving scalability and reliability is among the most important challenges in designing the next-generation mobile architecture. The current (4G) mobile core network relies heavily on hardware-based proprietary components. These core networks are expensive and therefore available in only a limited number of locations in the country. This leads to high end-to-end latency, due to the long latency between base stations and the mobile core, and limits innovation and the evolvability of the network. Moreover, at the protocol level, the current mobile network architecture was designed for a limited number of smartphones streaming large amounts of high-quality traffic, not for a massive number of low-capability devices sending small and sporadic traffic. The result is high-overhead control and data planes in the mobile core that are unsuitable for a massive number of future IoT devices. In terms of reliability, network operators already deploy multiple monitoring systems to detect service disruptions and fix problems when they occur. However, detecting all service disruptions is challenging. First, there is a complex relationship between network status and user-perceived service experience. Second, service disruptions can happen for reasons beyond the network itself. With technological advances in Software-Defined Networking (SDN) and Network Function Virtualization (NFV), the next-generation mobile network is expected to be NFV-based and deployed on NFV platforms. However, in contrast to telecom-grade hardware with built-in redundancy, the commodity off-the-shelf (COTS) hardware in NFV platforms is often not comparable in terms of reliability. Telecom-grade mobile core hardware typically offers 99.999% availability (i.e., "five-9s"), while most NFV platforms guarantee only "three-9s" availability, orders of magnitude less reliable. Therefore, an NFV-based mobile core network needs extra mechanisms to guarantee its availability. This Ph.D. dissertation focuses on using SDN/NFV, data analytics, and distributed systems techniques to enhance the scalability and reliability of the next-generation mobile core network. The dissertation makes the following contributions. First, it presents SMORE, a practical offloading architecture that reduces end-to-end latency and enables new functionality in mobile networks, and SIMECA, a lightweight and scalable mobile core network designed for a massive number of future IoT devices. Second, it presents ABSENCE, a passive service monitoring system that uses customer usage data and data analytics to detect silent failures in an operational mobile network. Lastly, it presents ECHO, a distributed mobile core architecture that improves the availability of NFV-based mobile core networks in public clouds.
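    A short worked computation makes the five-9s versus three-9s gap concrete. The downtime arithmetic follows directly from the quoted availability figures; the replication formula is the standard independent-failure model, offered as my framing of why redundancy mechanisms close the gap, not as a description of ECHO's actual design.

```python
# Worked availability arithmetic for the five-9s vs. three-9s comparison.
MIN_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

for a in (0.99999, 0.999):
    print(f"{a:.5f} availability -> {(1 - a) * MIN_PER_YEAR:.1f} min downtime/year")
# 0.99999 -> ~5.3 min/year; 0.999 -> ~525.6 min/year: two orders of magnitude

# Independent three-9s replicas needed to reach five-9s, under the (strong)
# assumptions of independent failures and instant failover -- the textbook
# model, not a claim about ECHO's mechanism.
a, target, n = 0.999, 0.99999, 1
while 1 - (1 - a) ** n < target:
    n += 1
print(f"replicas needed: {n}")  # 2, since 1 - 0.001**2 = 0.999999
```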

    DISco: a Distributed Information Store for network Challenges and their Outcome

    We present DISco, a storage and communication middleware designed to enable distributed, task-centric autonomic control of networks. DISco enables multi-agent identification of anomalous situations, so-called "challenges", and assists coordinated remediation that maintains a degraded but acceptable service level, while keeping track of each challenge's evolution to enable human-assisted diagnosis of flaws in the network. We propose to use state-of-the-art peer-to-peer publish/subscribe and distributed storage as the core building blocks of the DISco service.
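    To make the pattern concrete, here is a minimal single-process sketch of topic-based publish/subscribe with an append-only challenge history; DISco itself builds on peer-to-peer pub/sub and distributed storage, which this stand-in deliberately does not implement, and all class and topic names are hypothetical.

```python
# Minimal pub/sub sketch of the DISco pattern: agents publish "challenge"
# observations, subscribers coordinate remediation, and every event is
# appended to a store for later human-assisted diagnosis.
from collections import defaultdict
from typing import Callable

class ChallengeBus:
    def __init__(self):
        self.subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.store: list[tuple[str, dict]] = []    # append-only challenge history

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        self.store.append((topic, event))          # keep track of the evolution
        for handler in self.subscribers[topic]:
            handler(event)

bus = ChallengeBus()
bus.subscribe("challenge/link-failure",
              lambda e: print(f"remediate: reroute around {e['link']}"))
bus.publish("challenge/link-failure", {"link": "r1-r2", "loss": 0.4})
print(bus.store)  # history available for post-hoc diagnosis
```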

    Distributed services across the network from edge to core

    The current Internet architecture is evolving from a simple carrier of bits into a platform able to provide multiple complex services running across the entire Network Service Provider (NSP) infrastructure. This calls for increased flexibility in resource management and allocation to provide dedicated, on-demand network services, leveraging a distributed infrastructure of heterogeneous devices. More specifically, NSPs rely on a plethora of low-cost Customer Premise Equipment (CPE) devices, as well as more powerful appliances at the edge of the network and in dedicated data centers. A great deal of current research aims to provide this flexibility through fog computing, Network Functions Virtualization (NFV), and data plane programmability. Fog (or edge) computing extends compute and storage capabilities to the edge of the network, closer to the rapidly growing number of connected devices and applications that consume cloud services and generate massive amounts of data. A complementary technology is NFV, a network architecture concept targeting the execution of software Network Functions (NFs) in isolated Virtual Machines (VMs), potentially sharing a pool of general-purpose hosts rather than running on dedicated hardware (i.e., appliances). Such a solution enables virtual network appliances (i.e., VMs executing network functions) to be provisioned, allocated different amounts of resources, and possibly moved across data centers in little time, which is key to ensuring that the network can keep up with the flexibility in the provisioning and deployment of virtual hosts in today's virtualized data centers. Moreover, recent advances in networking hardware have introduced new programmable network devices that can efficiently execute complex operations at line rate. As a result, NFs can be (partially or entirely) folded into the network, speeding up the execution of distributed services. The work described in this Ph.D. thesis shows how various network services can be deployed throughout the NSP infrastructure, adapting to the different hardware capabilities of the various appliances, by applying and extending the above-mentioned solutions. First, we consider a data center environment and the deployment of (virtualized) NFs. In this scenario, we introduce a novel methodology for modeling different NFs, aimed at estimating their performance on different execution platforms. Moreover, we propose to extend traditional NFV deployment beyond the data center to leverage the entire NSP infrastructure. This can be achieved by integrating native NFs, commonly available in low-cost CPEs, with an existing NFV framework, which facilitates the provision of services that require NFs close to the end user (e.g., an IPsec terminator). Resource-hungry virtualized NFs, on the other hand, run in the NSP data center, where they can take advantage of superior computing and storage capabilities. As an application, we also present a novel technique for deploying a distributed service, specifically a web filter, that leverages both the low latency of a CPE and the computational power of a data center. We then show that the core network, today dedicated solely to packet routing, can also be exploited to provide useful services. In particular, we propose a novel method for providing distributed network services in core network devices by means of task distribution and seamless coordination among the peers involved. The aim is to transform existing network nodes (e.g., routers, switches, access points) into a highly distributed data acquisition and processing platform, which significantly reduces the storage requirements at the Network Operations Center and the packet duplication overhead. Finally, we propose to use new programmable network devices in data center networks to provide much-needed services to distributed applications. By offloading part of the computation directly to the networking hardware, we show that it is possible to reduce both the network traffic and the overall job completion time.
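    The CPE-versus-data-center trade-off the thesis exploits can be sketched as a tiny placement rule: prefer the low-latency CPE when the network function's resource demand fits there, and fall back to the data center otherwise. The scoring rule, the numbers, and all names below are illustrative assumptions, not the thesis's actual performance model.

```python
# Hypothetical placement sketch for the CPE-vs-data-center trade-off:
# pick the lowest-latency platform that can accommodate the NF's demand.
from dataclasses import dataclass

@dataclass
class Platform:
    name: str
    rtt_ms: float        # latency to the end user
    cpu_capacity: float  # available compute, arbitrary units

@dataclass
class NetworkFunction:
    name: str
    cpu_demand: float

def place(nf: NetworkFunction, platforms: list[Platform]) -> Platform:
    feasible = [p for p in platforms if p.cpu_capacity >= nf.cpu_demand]
    return min(feasible, key=lambda p: p.rtt_ms)   # lowest user latency wins

cpe = Platform("CPE", rtt_ms=2.0, cpu_capacity=1.0)
dc = Platform("data center", rtt_ms=25.0, cpu_capacity=100.0)

print(place(NetworkFunction("IPsec terminator", 0.5), [cpe, dc]).name)   # CPE
print(place(NetworkFunction("web filter (deep)", 10.0), [cpe, dc]).name) # data center
```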

    Investigating system intrusions with data provenance analytics

    To aid threat detection and investigation, enterprises increasingly rely on commercially available security solutions such as Intrusion Detection Systems (IDS) and Endpoint Detection and Response (EDR) tools. These solutions collect and analyze audit logs throughout the enterprise and generate threat alerts when suspicious activities occur. Security analysts then investigate those alerts to separate false alarms from true attacks by extracting contextual history from the audit logs, i.e., the trail of events that caused the alert. Unfortunately, investigating threats in enterprises is a notoriously difficult task, even for expert analysts, due to two main challenges. First, existing enterprise security solutions are optimized to miss as few threats as possible; as a result, they generate an overwhelming volume of false alerts, creating a backlog of investigation tasks. Second, modern computing systems are operationally complex and produce an enormous volume of audit logs per day, making it difficult to correlate events for threats that span multiple processes, applications, and hosts. In this dissertation, I propose leveraging data provenance analytics to address these challenges. I present five provenance-based techniques that enable system defenders to investigate malicious behaviors in enterprise settings effectively and efficiently. First, I present NoDoze, an alert triage system that automatically prioritizes alerts based on the anomalousness of their contextual history. RapSheet then brings the benefits of data provenance to commercial EDR tools and provides system defenders with a compact visualization of multi-stage attacks. Swift realizes a provenance graph database that generates contextual history around alerts in real time, even when analyzing audit logs containing tens of millions of events. Finally, OmegaLog and Zeek Agent introduce the vision of universal provenance analysis, which unifies all forensically relevant provenance information on the system, regardless of its layer of origin, improving investigation capabilities.
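    The triage idea can be sketched as a backward walk over a provenance graph that combines per-edge rarity scores into one score per alert; NoDoze's real scoring model differs, and the events, scores, and function names here are illustrative only.

```python
# Hypothetical provenance-based alert triage in the NoDoze spirit: walk
# backward from the alerting entity and accumulate per-edge rarity scores,
# then rank alerts by the anomalousness of their contextual history.
from collections import defaultdict

# provenance edges: child entity -> list of (parent entity, rarity in [0,1])
parents: dict[str, list[tuple[str, float]]] = defaultdict(list)

def record(parent: str, child: str, rarity: float) -> None:
    parents[child].append((parent, rarity))

def context_score(entity: str, seen: set[str] | None = None) -> float:
    """Max cumulative rarity over all backward paths from `entity`."""
    seen = seen or set()
    if entity in seen:                      # guard against cycles
        return 0.0
    seen.add(entity)
    return max((r + context_score(p, seen) for p, r in parents[entity]),
               default=0.0)

record("firefox.exe", "dropper.exe", 0.9)    # browser wrote an executable: rare
record("dropper.exe", "cmd.exe", 0.7)        # dropper spawned a shell: rare
record("explorer.exe", "notepad.exe", 0.01)  # routine activity

alerts = ["cmd.exe", "notepad.exe"]
for a in sorted(alerts, key=context_score, reverse=True):
    print(a, round(context_score(a), 2))     # cmd.exe 1.6, notepad.exe 0.01
```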

    Isolation of malicious external inputs in a security focused adaptive execution environment

    Reliable isolation of malicious application inputs is necessary to prevent the future success of an observed novel attack after the initial incident. In this paper we describe, measure, and analyze Input-Reduction, a technique that can quickly isolate the malicious external inputs embodying unforeseen and potentially novel attacks from other, benign application inputs. The Input-Reduction technique is integrated into an advanced, security-focused, adaptive execution environment that automates diagnosis and repair. Our experiments show that Input-Reduction is highly accurate and efficient at isolating attack inputs and determining causal relations between inputs. We also measure and show that the cost incurred by the key services that support reliable reproduction and fast attack isolation is reasonable in the adaptive execution environment.
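    The abstract does not give Input-Reduction's algorithm, so the following is a sketch of the general idea in the style of delta debugging (ddmin): repeatedly drop chunks of the recorded input sequence and keep any reduction that still reproduces the attack. The `triggers_attack` oracle is a hypothetical stand-in for replaying inputs inside the adaptive execution environment.

```python
# Hypothetical ddmin-style sketch of input reduction: shrink a recorded
# sequence of external inputs to a small subset that still reproduces the
# attack. `triggers_attack` stands in for replay in the execution environment.
def reduce_inputs(inputs, triggers_attack):
    """Greedy reduction: drop ever-smaller chunks that aren't needed."""
    chunk = len(inputs) // 2
    while chunk >= 1:
        i = 0
        while i < len(inputs):
            candidate = inputs[:i] + inputs[i + chunk:]
            if candidate and triggers_attack(candidate):
                inputs = candidate          # chunk was irrelevant, drop it
            else:
                i += chunk                  # chunk is needed, keep and move on
        chunk //= 2
    return inputs

# Toy oracle: the "attack" needs inputs 3 and 7 together.
oracle = lambda subset: {3, 7} <= set(subset)
print(reduce_inputs(list(range(10)), oracle))  # [3, 7]
```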