3,064 research outputs found
Generic system architecture for context-aware, distributed recommendation
In the existing literature on recommender systems, it is difficult to find an architecture for large-scale implementation. Often, the architectures proposed in papers are specific to an algorithm implementation or a domain. Thus, there is no clear architectural starting point for a new recommender system. This paper presents an architecture blueprint for a context-aware recommender system that provides scalability, availability, and security for its users. The architecture also contributes the dynamic ability to switch between single-device (offline), client-server (online), and fully distributed implementations. From this blueprint, a new recommender system could be built with minimal design and implementation effort, regardless of the application.
Learning to Customize Network Security Rules
Security is a major concern for organizations that wish to leverage cloud
computing. In order to reduce security vulnerabilities, public cloud providers
offer firewall functionalities. When properly configured, a firewall protects
cloud networks from cyber-attacks. However, proper firewall configuration
requires intimate knowledge of the protected system, high expertise and
on-going maintenance.
As a result, many organizations do not use firewalls effectively, leaving
their cloud resources vulnerable. In this paper, we present a novel supervised
learning method, and prototype, which compute recommendations for firewall
rules. Recommendations are based on sampled network traffic meta-data (NetFlow)
collected from a public cloud provider. Labels are extracted from firewall
configurations deemed to be authored by experts. NetFlow is collected from
network routers, avoiding expensive collection from cloud VMs, as well as
relieving privacy concerns.
The proposed method captures network routines and dependencies between
resources and firewall configuration. The method predicts IPs to be allowed by
the firewall. A grouping algorithm is subsequently used to generate a
manageable number of IP ranges. Each range is a parameter for a firewall rule.
We present results of experiments on real data, showing an ROC AUC of 0.92,
compared to 0.58 for an unsupervised baseline. The results support the
hypothesis that firewall rules can be automatically generated from router
data, and that an automated method can be effective in blocking a high
percentage of malicious traffic.
Comment: 5 pages, 5 figures, one table
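The grouping step this abstract describes (collapsing the predicted-allowed IPs into a manageable number of ranges, each serving as the parameter of one firewall rule) could be sketched with Python's standard `ipaddress` module. This is a hypothetical illustration, not the paper's actual algorithm; `group_ips` and `max_rules` are invented names:

```python
import ipaddress

def group_ips(allowed_ips, max_rules=10):
    """Collapse predicted-allowed IPs into a manageable number of CIDR
    ranges, each usable as the parameter of one firewall rule."""
    nets = [ipaddress.ip_network(ip) for ip in set(allowed_ips)]
    # Merge adjacent/overlapping addresses into minimal CIDR blocks.
    ranges = list(ipaddress.collapse_addresses(nets))
    # If there are still too many rules, widen prefixes step by step:
    # coarser ranges mean fewer rules, at the cost of admitting extra
    # addresses that were not explicitly predicted.
    prefix = 32
    while len(ranges) > max_rules and prefix > 0:
        prefix -= 1
        ranges = list(ipaddress.collapse_addresses(
            n.supernet(new_prefix=prefix) if n.prefixlen > prefix else n
            for n in ranges))
    return [str(r) for r in ranges]
```

For example, `group_ips(["10.0.0.1", "10.0.0.2", "10.0.0.3"], max_rules=1)` widens the blocks until a single range (`10.0.0.0/30`) covers all three addresses.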
Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization
Protecting vast quantities of data poses a daunting challenge for the growing
number of organizations that collect, stockpile, and monetize it. The ability
to distinguish data that is actually needed from data collected "just in case"
would help these organizations to limit the latter's exposure to attack. A
natural approach might be to monitor data use and retain only the working-set
of in-use data in accessible storage; unused data can be evicted to a highly
protected store. However, many of today's big data applications rely on machine
learning (ML) workloads that are periodically retrained by accessing, and thus
exposing to attack, the entire data store. Training set minimization methods,
such as count featurization, are often used to limit the data needed to train
ML workloads to improve performance or scalability. We present Pyramid, a
limited-exposure data management system that builds upon count featurization to
enhance data protection. As such, Pyramid uniquely introduces both the idea and
proof-of-concept for leveraging training set minimization methods to instill
rigor and selectivity into big data management. We integrated Pyramid into
Spark Velox, a framework for ML-based targeting and personalization. We
evaluate it on three applications and show that Pyramid approaches
state-of-the-art models while training on less than 1% of the raw data.
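Count featurization, the training-set minimization method the abstract builds on, replaces each high-cardinality categorical value with compact label statistics, so a model can train on counts instead of touching the raw data store. A minimal sketch with hypothetical class and method names (not Pyramid's implementation):

```python
from collections import defaultdict

class CountFeaturizer:
    """Replace a categorical value with label-count statistics, so a model
    trains on compact counts instead of the raw records."""

    def __init__(self):
        # value -> [count of label 0, count of label 1]
        self.counts = defaultdict(lambda: [0, 0])

    def update(self, value, label):
        """Fold one labeled observation into the count table."""
        self.counts[value][int(label)] += 1

    def transform(self, value, alpha=1.0):
        """Return [total observations, smoothed P(label=1 | value)]."""
        neg, pos = self.counts[value]
        total = neg + pos
        return [total, (pos + alpha) / (total + 2 * alpha)]
```

The count table is far smaller than the raw observations it summarizes, which is what lets the bulk of the data be evicted to protected storage between retrainings.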
mARC: Memory by Association and Reinforcement of Contexts
This paper introduces the memory by Association and Reinforcement of Contexts
(mARC). mARC is a novel data modeling technology rooted in the second
quantization formulation of quantum mechanics. It is an all-purpose incremental
and unsupervised data storage and retrieval system which can be applied to all
types of signal or data, structured or unstructured, textual or not. mARC can
be applied to a wide range of information classification and retrieval
problems, such as e-Discovery or contextual navigation. It can also be
formulated in the artificial-life framework, a.k.a. Conway's "Game of Life".
In contrast to Conway's approach, the objects evolve in a massively
multidimensional space.
To start evaluating the potential of mARC, we have built a mARC-based
Internet search engine demonstrator with contextual functionality. We compare
the behavior of the mARC demonstrator with Google search in terms of both
performance and relevance. In the study we find that the mARC search engine
demonstrator outperforms Google search by an order of magnitude in response
time while providing more relevant results for some classes of queries.
The Impact of Expert Knowledge on User Behavior in Recommender Systems
Using experts in recommender systems can improve the accuracy of recommendations as well as other quality aspects. Most studies have tested the impact of expert knowledge in offline tests. However, it is still unclear how user behavior changes when experts are used for recommendation in an online scenario. We therefore deploy a live recommender system based on rules built by employed experts on the video-on-demand platform of a large television network. We find that expert-built rules lead to a similar number of clip views and platform visits as a standard recommender. However, experts do influence the consumed content, focusing users on a few popular categories.