
    Large-scale Wireless Local-area Network Measurement and Privacy Analysis

    The edge of the Internet is increasingly becoming wireless. Understanding the wireless edge is therefore important for understanding the performance and security of the Internet experience. This need is especially pressing for enterprise-wide wireless local-area networks (WLANs), as organizations increasingly depend on WLANs for mission-critical tasks. Studying a live production WLAN, especially a large-scale network, is a difficult undertaking. Two fundamental difficulties are (1) building a scalable network measurement infrastructure to collect traces from a large-scale production WLAN, and (2) preserving user privacy while sharing the collected traces with the network research community. In this dissertation, we present our experience designing and implementing one of the largest distributed WLAN measurement systems in the United States, the Dartmouth Internet Security Testbed (DIST), with a particular focus on our solutions to the challenges of efficiency, scalability, and security. We also present an extensive evaluation of the DIST system. To understand the severity of potential trace-sharing risks for an enterprise-wide large-scale wireless network, we conduct a privacy analysis of one kind of wireless network trace, a user-association log, collected from a large-scale WLAN. We introduce a machine-learning-based approach that can extract and quantify sensitive information from a user-association log even after it has been sanitized. Finally, we present a case study that evaluates the tradeoff between utility and privacy in WLAN trace sanitization.
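    To make the trace-sharing risk concrete, the sketch below shows the general shape of such an attack: train a standard classifier on per-user association features from a sanitized log and check whether it recovers a sensitive attribute better than chance. This is an illustrative toy, not the dissertation's actual model; the log records, feature choice, and labels are all assumptions.

```python
# Illustrative toy only, not the dissertation's model: a generic classifier
# over per-user association counts from a hypothetical sanitized log.
from collections import Counter

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

# Hypothetical sanitized log: (pseudonymous user id, access point id) pairs.
log = [("u1", "ap_lib"), ("u1", "ap_dorm3"), ("u2", "ap_lab"),
       ("u2", "ap_lib"), ("u3", "ap_dorm3"), ("u3", "ap_dorm3"),
       ("u4", "ap_lab"), ("u4", "ap_lab")]

# Hypothetical sensitive attribute an adversary tries to recover.
labels = {"u1": "student", "u2": "staff", "u3": "student", "u4": "staff"}

# One feature vector per user: how often the user joined each access point.
per_user = {}
for user, ap in log:
    per_user.setdefault(user, Counter())[ap] += 1

train = ["u1", "u2", "u3"]  # users whose attributes the adversary knows
vec = DictVectorizer(sparse=False)
X = vec.fit_transform([per_user[u] for u in train])
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, [labels[u] for u in train])

# Predict the attribute of an unseen pseudonymous user from behavior alone.
print(clf.predict(vec.transform([per_user["u4"]])))  # likely ['staff']
```

    If even this generic model beats chance, the pseudonymized log still leaks; a real analysis would cross-validate over far more users and features.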

    Reference models for network trace anonymization

    Network security research can benefit greatly from testing environments capable of generating realistic, repeatable, and configurable background traffic. To conduct network security experiments on systems such as Intrusion Detection Systems and Intrusion Prevention Systems, researchers require isolated testbeds capable of recreating actual network environments, complete with infrastructure and traffic details. Unfortunately, due to privacy and flexibility concerns, actual network traffic is rarely shared by organizations, because sensitive information, such as IP addresses, device identities, and behavioral patterns, can be inferred from the traffic. Trace data anonymization is one solution to this problem. The research community has responded to this sanitization problem with anonymization tools that aim to remove sensitive information from network traces, and with attacks on anonymized traces that aim to evaluate the efficacy of the anonymization schemes. However, there is still no comprehensive model that distills all elements of the sanitization problem into a functional reference model. In this thesis we offer such a comprehensive functional reference model, one that identifies and binds together all the entities required to formulate the problem of network data anonymization. We build a new information flow model that illustrates the overly optimistic nature of inference attacks on anonymized traces. We also provide a probabilistic interpretation of the information model and develop a privacy metric for anonymized traces. Finally, we develop the architecture for a highly configurable, multi-layer network trace collection and sanitization tool. In addition to addressing privacy and flexibility concerns, our architecture allows for uniform anonymization and ease of data aggregation.
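    As a concrete example of the kind of step such sanitization tools perform, the sketch below pseudonymizes IP addresses in a trace record with a keyed MAC. It is a minimal illustration under assumed field names, not the multi-layer architecture developed in the thesis; production schemes such as prefix-preserving Crypto-PAn are considerably more involved.

```python
# A minimal sketch of one common sanitization step: replacing IP addresses
# with keyed pseudonyms. Record fields are assumptions for illustration.
import hashlib
import hmac
import ipaddress

KEY = b"secret key kept by the data owner"  # never shipped with the trace

def pseudonymize_ip(ip: str) -> str:
    """Map an IPv4 address to a consistent but hard-to-invert pseudonym."""
    digest = hmac.new(KEY, ip.encode(), hashlib.sha256).digest()
    # Fold the MAC into the IPv4 space so downstream tools still parse it.
    return str(ipaddress.IPv4Address(int.from_bytes(digest[:4], "big")))

record = {"src": "192.0.2.17", "dst": "198.51.100.4", "bytes": 1420}
sanitized = {**record,
             "src": pseudonymize_ip(record["src"]),
             "dst": pseudonymize_ip(record["dst"])}
print(sanitized)
```

    Because the mapping is keyed and consistent, flow structure survives for research use; yet, as the thesis's information flow model highlights, consistent pseudonyms can still leak behavioral information to an inference attack.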

    A Critical Look at the Evaluation of Knowledge Graph Question Answering

    PhD thesis in Information Technology.

    The field of information retrieval (IR) is concerned with systems that “make a given stored collection of information items available to a user population” [111]. The way in which information is made available to the user depends on how this broad concern of IR is formulated into specific tasks by which a system should address a user’s information need [85]. The specific IR task also dictates how the user may express their information need. The classic IR task is ad hoc retrieval, where the user issues a query to the system and receives in return a list of documents ranked by each document’s estimated relevance to the query [85]. However, it has long been acknowledged that users are often looking for answers to questions, rather than an entire document or a ranked list of documents [17, 141]. Question answering (QA) is thus another IR task; it comes in many flavors, but overall consists of taking in a user’s natural language (NL) question and returning an answer.

    This thesis describes work done within the scope of the QA task. Its primary focus is the flavor of QA called knowledge graph question answering (KGQA), which enables QA over factual questions against structured data in the form of a knowledge graph (KG). This means the KGQA system addresses a structured representation of knowledge rather than, as in other QA flavors, an unstructured prose context. KGs have the benefit that, given some identified entities or predicates, all associated properties are available and relationships can be exploited. KGQA thus enables users to access structured data using only NL questions, without requiring formal query language expertise. Even so, the construction of satisfactory KGQA systems remains a challenge. Machine learning with deep neural networks (DNNs) is a far more promising approach than manually engineering retrieval models [29, 56, 130]. The current era dominated by DNNs began with seminal work on computer vision, where the deep learning paradigm demonstrated its first cases of “superhuman” performance [32, 71]; subsequent work in other applications has demonstrated the same [58, 87]. As a result of its early position, and hence longer history, as a leading application of deep learning, computer vision with DNNs has been bolstered by much work on augmenting [120] or synthesizing [94] additional training data.

    The difficulty with machine learning approaches to KGQA appears to rest in large part on the limited volume, quality, and variety of available datasets for this task. Compared to labeled image data for computer vision, the problems of data collection, augmentation, and synthesis are only partially solved for QA, and especially for KGQA. There are few KGQA datasets overall, and little previous work has found unsupervised or semi-supervised learning approaches that address this sparsity of data; instead, neural network approaches to KGQA rely on either full or weak supervision [29]. We are thus concerned with neural models trained in a supervised setting to perform QA tasks, especially of the KGQA flavor. Given a clear task to delegate to a computational system, we naturally want the task performed as well as possible. But what methodological elements are important for ensuring good system performance within the chosen scope? And how should the quality of system performance be assessed?

    This thesis addresses these overarching questions through a number of more specific research questions. Altogether, we designate the topic of this thesis as KGQA evaluation, understood broadly to encompass four subtopics: (1) the impact of training data volume on performance, (2) information leakage between training and test splits due to unhygienic data partitioning, (3) the naturalness of the NL questions produced by a common approach to generating KGQA datasets, and (4) the axiomatic analysis and development of evaluation measures for a specific flavor of the KGQA task. Each subtopic is informed by previous work, but we aim in this thesis to critically examine the assumptions of that work, so as to uncover, verify, or address weaknesses in current practices surrounding KGQA evaluation.
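    For orientation on how KGQA output is typically scored, the sketch below shows one common baseline measure: set-based F1 between a system's answer set and the gold answers for a single question. It is shown for illustration only; the thesis analyzes such measures axiomatically rather than endorsing this particular one, and the example question is invented.

```python
# Set-based precision/recall/F1 over answer entities for one KGQA question.
# A common baseline measure, not the specific measures developed in the thesis.

def answer_f1(predicted: set[str], gold: set[str]) -> float:
    """F1 over answer entities; 1.0 iff the two sets match exactly."""
    if not predicted and not gold:
        return 1.0  # convention: both empty counts as a perfect match
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

# Invented example: "Who founded SpaceX?" with an over-generating system.
gold = {"Elon_Musk"}
predicted = {"Elon_Musk", "Gwynne_Shotwell"}
print(answer_f1(predicted, gold))  # 2 * 0.5 * 1.0 / 1.5 ≈ 0.667
```

    Averaging such per-question scores over a test split is only meaningful if the split is hygienic, which is exactly the kind of assumption subtopic (2) above scrutinizes.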

    Efficient and Flexible Discovery of PHP Application Vulnerabilities

    The Web today is a growing universe of pages and applications teeming with interactive content. The security of such applications is of the utmost importance, as exploits can have a devastating impact at personal and economic levels. The number one programming language in Web applications is PHP, powering more than 80% of the top ten million websites. Yet it was not designed with security in mind and today bears a patchwork of fixes and inconsistently designed functions with often unexpected and hardly predictable behavior that typically yields a large attack surface. Consequently, it is prone to different types of vulnerabilities, such as SQL Injection or Cross-Site Scripting. In this paper, we present an interprocedural analysis technique for PHP applications based on code property graphs that scales well to large amounts of code and is highly adaptable. We implement our prototype using the latest features of PHP 7, leverage an efficient graph database to store code property graphs for PHP, and subsequently identify different types of Web application vulnerabilities by means of programmable graph traversals. We show the efficacy and scalability of our approach by reporting on an analysis of 1,854 popular open-source projects, comprising almost 80 million lines of code.
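    The core idea of vulnerability discovery as graph traversal can be illustrated at toy scale. The sketch below hand-builds a few data-flow nodes for a vulnerable PHP snippet and flags any path from an attacker-controlled source to a sensitive sink. The paper's actual system constructs full code property graphs in a graph database with far richer traversals (for instance, checking for sanitizers along the path); the node contents and the use of networkx here are choices made for this illustration only.

```python
# Schematic sketch: model code as a graph, flag source-to-sink paths.
# Toy data-flow nodes for the PHP snippet:
#   $id = $_GET['id']; $q = "SELECT ..." . $id; mysql_query($q);
import networkx as nx

g = nx.DiGraph()
g.add_node("n1", code="$_GET['id']", kind="source")
g.add_node("n2", code="$id = $_GET['id']")
g.add_node("n3", code='$q = "SELECT * FROM t WHERE id=" . $id')
g.add_node("n4", code="mysql_query($q)", kind="sink")
# Data-flow edges: the value computed at each node reaches the next.
g.add_edge("n1", "n2")
g.add_edge("n2", "n3")
g.add_edge("n3", "n4")

sources = [n for n, d in g.nodes(data=True) if d.get("kind") == "source"]
sinks = [n for n, d in g.nodes(data=True) if d.get("kind") == "sink"]

# The "traversal" here is mere path existence; a real query would also
# require the absence of sanitizers (e.g. mysql_real_escape_string) en route.
for s in sources:
    for t in sinks:
        if nx.has_path(g, s, t):
            path = nx.shortest_path(g, s, t)
            print("potential SQL injection:",
                  " -> ".join(g.nodes[n]["code"] for n in path))
```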

    Collaborative, Trust-Based Security Mechanisms for a National Utility Intranet

    This thesis investigates security mechanisms for utility control and protection networks using IP-based protocol interaction. It proposes flexible, cost-effective solutions at strategic locations to protect both transitioning legacy and full IP-standards architectures. It also demonstrates how operational signatures can be defined to enact organizationally unique standard operating procedures for zero failure in environments with varying levels of uncertainty and trust. The research evaluates layering encryption, authentication, traffic filtering, content checks, and event correlation mechanisms over time-critical primary and backup control/protection signaling to prevent disruption by internal and external malicious activity or errors. Finally, it shows how a regional or national implementation can protect private communities of interest and foster a mix of centralized and distributed emergency prediction, mitigation, detection, and response, with secure, automatic peer-to-peer notifications that share situational awareness across control, transmission, and reliability boundaries and prevent widespread, catastrophic power outages.
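    As a flavor of what layering authentication over time-critical signaling can look like, here is a minimal sketch of one such layer: a keyed MAC plus a freshness check on a control message. The message fields, key handling, and timing bound are assumptions for illustration; the thesis evaluates a much richer, organizationally tuned stack of mechanisms.

```python
# Minimal sketch of one layer: authenticate a control message and reject
# stale or tampered copies. All names and the timing bound are assumptions.
import hashlib
import hmac
import json
import time

SHARED_KEY = b"per-link key provisioned out of band"  # assumed pre-shared
MAX_AGE_S = 0.5  # protection signaling is time-critical

def sign(msg: dict) -> dict:
    """Attach a MAC computed over the canonicalized message."""
    payload = json.dumps(msg, sort_keys=True).encode()
    mac = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return {**msg, "mac": mac}

def verify(msg: dict) -> bool:
    """Accept only messages that are both authentic and fresh."""
    body = {k: v for k, v in msg.items() if k != "mac"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    fresh = time.time() - msg["ts"] <= MAX_AGE_S
    return hmac.compare_digest(msg.get("mac", ""), expected) and fresh

trip = sign({"ts": time.time(), "breaker": "B12", "cmd": "TRIP"})
print(verify(trip))    # True: authentic and fresh
trip["cmd"] = "CLOSE"  # tampering in transit breaks the MAC
print(verify(trip))    # False
```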

    Moving Beyond “Theory T”: The Case of Quantum Field Theory

    A standard approach to interpreting physical theories proceeds by first identifying the theory with a set of mathematical objects, where such objects are defined according to mathematicians’ standards of rigor. In making this identification, philosophers rule out the relevance of many inferential methods that physicists use, as these often do not meet mathematicians’ standards of rigor. Philosophers thus sanitize physical theories of all mathematically messy or ambiguous parts before interpreting them. My dissertation argues against this sanitized approach to interpreting theories using the example of quantum field theory (QFT). When we look at the details of QFT, we find that the mathematical objects it requires differ according to the specific systems the theory is being applied to, in ways that advocates of the sanitized approach do not anticipate. Furthermore, the mathematical objects required for successful application are still being developed in some applicational contexts, so it would be unwise to determine in advance which objects constitute the theory. During this ongoing developmental process, physicists interpret the mathematics using strategies that violate the standards of pure mathematics. In contrast to the sanitized approach, these strategies are more sensitive to the ways in which the mathematics required for the relevant contexts is still under development. I argue that these strategies are not merely instrumental. They suggest alternative approaches to interpretation that philosophers should take into account.