3,064 research outputs found

    Learning to Customize Network Security Rules

    Full text link
    Security is a major concern for organizations who wish to leverage cloud computing. In order to reduce security vulnerabilities, public cloud providers offer firewall functionalities. When properly configured, a firewall protects cloud networks from cyber-attacks. However, proper firewall configuration requires intimate knowledge of the protected system, high expertise and on-going maintenance. As a result, many organizations do not use firewalls effectively, leaving their cloud resources vulnerable. In this paper, we present a novel supervised learning method, and prototype, which compute recommendations for firewall rules. Recommendations are based on sampled network traffic meta-data (NetFlow) collected from a public cloud provider. Labels are extracted from firewall configurations deemed to be authored by experts. NetFlow is collected from network routers, avoiding expensive collection from cloud VMs, as well as relieving privacy concerns. The proposed method captures network routines and dependencies between resources and firewall configuration. The method predicts IPs to be allowed by the firewall. A grouping algorithm is subsequently used to generate a manageable number of IP ranges. Each range is a parameter for a firewall rule. We present results of experiments on real data, showing ROC AUC of 0.92, compared to 0.58 for an unsupervised baseline. The results prove the hypothesis that firewall rules can be automatically generated based on router data, and that an automated method can be effective in blocking a high percentage of malicious traffic.Comment: 5 pages, 5 figures, one tabl

    Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

    Full text link
    Protecting vast quantities of data poses a daunting challenge for the growing number of organizations that collect, stockpile, and monetize it. The ability to distinguish data that is actually needed from data collected "just in case" would help these organizations to limit the latter's exposure to attack. A natural approach might be to monitor data use and retain only the working-set of in-use data in accessible storage; unused data can be evicted to a highly protected store. However, many of today's big data applications rely on machine learning (ML) workloads that are periodically retrained by accessing, and thus exposing to attack, the entire data store. Training set minimization methods, such as count featurization, are often used to limit the data needed to train ML workloads to improve performance or scalability. We present Pyramid, a limited-exposure data management system that builds upon count featurization to enhance data protection. As such, Pyramid uniquely introduces both the idea and proof-of-concept for leveraging training set minimization methods to instill rigor and selectivity into big data management. We integrated Pyramid into Spark Velox, a framework for ML-based targeting and personalization. We evaluate it on three applications and show that Pyramid approaches state-of-the-art models while training on less than 1% of the raw data

    mARC: Memory by Association and Reinforcement of Contexts

    Full text link
    This paper introduces the memory by Association and Reinforcement of Contexts (mARC). mARC is a novel data modeling technology rooted in the second quantization formulation of quantum mechanics. It is an all-purpose incremental and unsupervised data storage and retrieval system which can be applied to all types of signal or data, structured or unstructured, textual or not. mARC can be applied to a wide range of information clas-sification and retrieval problems like e-Discovery or contextual navigation. It can also for-mulated in the artificial life framework a.k.a Conway "Game Of Life" Theory. In contrast to Conway approach, the objects evolve in a massively multidimensional space. In order to start evaluating the potential of mARC we have built a mARC-based Internet search en-gine demonstrator with contextual functionality. We compare the behavior of the mARC demonstrator with Google search both in terms of performance and relevance. In the study we find that the mARC search engine demonstrator outperforms Google search by an order of magnitude in response time while providing more relevant results for some classes of queries

    The Impact of Expert Knowledge on User Behavior in Recommender Systems

    Get PDF
    Using experts in recommender systems can improve the accuracy of recommendations as well as other quality aspects of recommendations. Most studies have tested the impact of expert knowledge in offline tests. However, it is still unclear how user behavior changes when experts are used for recommendation in an online scenario. We therefore deploy a live recommender system based on rules built by employed experts on the video-on-demand platform of a large television network. We find that expert-built rules lead to a similar amount of clip views and platform visits as a standard recommender. However, experts have an influence on the consumed content, focusing users on a few popular categories
    • …
    corecore