Symbolic Model Learning: New Algorithms and Applications
In this thesis, we study algorithms which can be used to extract, or learn, formal mathematical models from software systems and then use these models to test whether the given software systems satisfy certain security properties such as robustness against code injection attacks. Specifically, we focus on studying learning algorithms for automata and transducers and the symbolic extensions of these models, namely symbolic finite automata (SFAs). At a high level, this thesis contributes the following results:
1. In the first part of the thesis, we present a unified treatment of many common variations of the seminal L* algorithm for learning deterministic finite automata (DFAs) as a congruence learning algorithm for the underlying Nerode congruence, which forms the basis of automata theory. Under this formulation, the basic data structures used by different variations are unified as different ways to implement the Nerode congruence using queries.
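Concretely, two prefixes are Nerode-equivalent iff no suffix distinguishes them with respect to the target language, and L*-style observation tables approximate this with a finite set of distinguishing suffixes. A minimal sketch of the idea (the target language and suffix set below are illustrative, not taken from the thesis):

```python
# Sketch: approximating the Nerode congruence with a finite suffix set,
# as in L*-style observation tables. The target language (even number of
# 'a's over {a, b}) and the suffix set are illustrative assumptions.

def member(w: str) -> bool:
    """Membership oracle: accepts words with an even count of 'a'."""
    return w.count("a") % 2 == 0

suffixes = ["", "a"]  # distinguishing extensions (the E set in L*)

def row(prefix: str):
    """A prefix's row: its membership behaviour on every suffix in E."""
    return tuple(member(prefix + s) for s in suffixes)

# Two prefixes are (approximately) Nerode-equivalent iff their rows agree.
assert row("b") == row("aa")   # both reach the "even" state
assert row("a") != row("")     # "a" is distinguished by the empty suffix
```

Each distinct row corresponds to a candidate state of the hypothesis DFA, which is why the different L* variants can be seen as different implementations of the same congruence.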
2. Next, building on the new formulation of L*-style algorithms, we proceed to develop new algorithms for learning transducer models. First, we present the first algorithm for learning deterministic partial transducers. Furthermore, we extend our algorithm to non-deterministic models by introducing a novel, generalized congruence relation over string transformations which is able to capture a subclass of string transformations with regular lookahead. We demonstrate that this class captures many practical string transformations from the domain of string sanitizers in Web applications.
3. Classical learning algorithms for automata and transducers operate over finite alphabets and have a query complexity that scales linearly with the size of the alphabet. In practice, however, this dependence on the alphabet size hinders the performance of the algorithms. To address this issue, we develop the MAT* algorithm for learning symbolic finite automata (SFAs), which operate over infinite alphabets. In practice, the MAT* learning algorithm allows us to plug in custom transition learning algorithms which efficiently infer the predicates on the transitions of the SFA without querying the whole alphabet.
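The intuition behind such pluggable predicate learners can be sketched as follows: over a large ordered alphabet, an interval predicate guarding a transition can be recovered with logarithmically many single-character queries instead of enumerating the alphabet. Everything below (oracle, target interval, function names) is an illustrative assumption, not the actual MAT* implementation:

```python
# Toy sketch of a pluggable predicate learner: binary-search the
# boundaries of an interval predicate [lo, hi] over a huge ordered
# alphabet, using O(log |alphabet|) membership queries.

def learn_interval(accepts, alpha_min, alpha_max, witness):
    """Infer an interval predicate from a membership oracle `accepts`
    and one character `witness` known to satisfy the predicate."""
    lo_l, lo_r = alpha_min, witness          # left boundary lies here
    while lo_l < lo_r:
        mid = (lo_l + lo_r) // 2
        if accepts(mid):
            lo_r = mid
        else:
            lo_l = mid + 1
    hi_l, hi_r = witness, alpha_max          # right boundary lies here
    while hi_l < hi_r:
        mid = (hi_l + hi_r + 1) // 2
        if accepts(mid):
            hi_l = mid
        else:
            hi_r = mid - 1
    return lo_l, hi_l

# Hypothetical transition that fires on the characters 'A'..'Z'.
target = lambda c: 0x41 <= c <= 0x5A
print(learn_interval(target, 0, 0x10FFFF, ord("A")))  # -> (65, 90)
```

Roughly 21 queries per boundary suffice here, versus over a million if every Unicode code point had to be tested.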
4. Finally, we use our learning algorithm toolbox as the basis for the development of a set of black-box testing algorithms. More specifically, we present Grammar Oriented Filter Auditing (GOFA), a novel technique which allows one to utilize our learning algorithms to evaluate the robustness of a string sanitizer or filter against a set of attack strings given as a context-free grammar. Furthermore, because such grammars are often unavailable, we developed sfadiff, a differential testing technique based on symbolic automata learning which can be used to perform differential testing of two different parser implementations, and we demonstrate how our algorithm can be used to develop program fingerprints. We evaluate our algorithms against state-of-the-art Web Application Firewalls and discover over 15 previously unknown vulnerabilities which allow evading the firewalls and performing code injection attacks on the backend Web application. Finally, we show how our learning algorithms can uncover vulnerabilities which are missed by other black-box methods such as fuzzing and grammar-based testing.
DECO: Liberating Web Data Using Decentralized Oracles for TLS
Thanks to the widespread deployment of TLS, users can access private data
over channels with end-to-end confidentiality and integrity. What they cannot
do, however, is prove to third parties the provenance of such data, i.e.,
that it genuinely came from a particular website. Existing approaches either
introduce undesirable trust assumptions or require server-side modifications.
As a result, the value of users' private data is locked up in its point of
origin. Users cannot export their data with preserved integrity to other
applications without help and permission from the current data holder.
We propose DECO (short for decentralized oracle) to
address the above problems. DECO allows users to prove that a piece of data
accessed via TLS came from a particular website and optionally prove statements
about such data in zero-knowledge, keeping the data itself secret. DECO is the
first such system that works without trusted hardware or server-side
modifications.
DECO can liberate data from centralized web-service silos, making it
accessible to a rich spectrum of applications. To demonstrate the power of
DECO, we implement three applications that are hard to achieve without it: a
private financial instrument using smart contracts, converting legacy
credentials to anonymous credentials, and verifiable claims against price
discrimination. This is the extended version of the CCS'20 paper.
Leveraging Client Processing for Location Privacy in Mobile Local Search
Usage of mobile services is growing rapidly. Most Internet-based services targeted at PC-based browsers now have mobile counterparts, which are often enhanced by using the user's location as one of the inputs. Even some PC-based services such as point-of-interest search, mapping, airline tickets, and software download mirrors now use the user's location to enhance their services. Location-based services are exactly these: services that take the user's location as an input and enhance the experience based on it. With increased use of these services comes increased risk to location privacy. Location is an attribute that users hold as important to their privacy. Compromise of one's location, in other words loss of location privacy, can have several detrimental effects on the user, ranging from trivial annoyance to unreasonable persecution.
More and more companies in the Internet economy rely exclusively on the huge data sets they collect about users. The more detailed and accurate the data a company has about its users, the more valuable the company is considered. No wonder these companies are often the same companies that offer their services for free: this gives them an opportunity to collect more accurate location information. The research community in the location privacy protection area has had to respond by modeling an adversary that could be the service provider itself. To drive this point home, we show that a well-equipped service provider can infer the user's location even when location information is not directly available, by using other information it collects about the user.
There is no dearth of proposals for protocols and algorithms that protect location privacy. Many of these earlier proposals require a trusted third party to act as an intermediary between the service provider and the user. These protocols use anonymization and/or obfuscation techniques to protect the user's identity and/or location. The requirement of a trusted third party comes with its own complications and risks and makes these proposals impractical in real-life scenarios. It is therefore preferable that protocols not require a trusted third party.
We look at existing proposals in the area of private information retrieval. We present a brief survey of several proposals in the literature and implement two representative algorithms. We run experiments using databases of different sizes to ascertain their practicality and performance characteristics. We show that private-information-retrieval-based protocols still have a long way to go before they become practical enough for local search applications.
We propose location privacy preserving mechanisms that take advantage of the processing power of modern mobile devices and provide configurable levels of location privacy. We propose these techniques for both the single-query and the multiple-query scenario. In the single-query scenario, the user issues a query to the server and obtains the answer. In the multiple-query scenario, the user keeps sending queries as she moves about in the area of interest. We show that the multiple-query scenario increases the accuracy of the adversary's determination of the user's location, and hence improvements are needed to cope with this situation. We therefore propose an extension of the single-query mechanism that addresses this riskier multiple-query scenario while maintaining practicality and acceptable performance when implemented on a modern mobile device. Finally, we propose a technique inspired by differential privacy in statistical databases. All three proposed mechanisms are implemented on realistic hardware or simulators, run against simulated but realistic data, and their characteristics are ascertained to show that they are practical and ready for adoption.
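As an illustration of the differential-privacy-inspired direction, a common approach in the literature is to perturb the reported coordinates with Laplace noise before they leave the device. The sketch below is a generic example with illustrative parameters (epsilon, sensitivity, coordinates), not the dissertation's actual mechanism:

```python
# Sketch of a differential-privacy-inspired location perturbation:
# Laplace noise is added to each coordinate on the client, so the
# server only ever sees an approximate location.
import math
import random

def laplace(scale: float) -> float:
    """Sample zero-mean Laplace noise via the inverse CDF."""
    u = random.random()
    u = min(max(u, 1e-12), 1 - 1e-12)  # clamp away from log(0)
    if u < 0.5:
        return scale * math.log(2 * u)
    return -scale * math.log(2 * (1 - u))

def perturb(lat: float, lon: float, epsilon: float,
            sensitivity: float = 0.01):
    """Report a noisy location; smaller epsilon means more privacy
    (and more noise). All parameter values here are illustrative."""
    scale = sensitivity / epsilon
    return lat + laplace(scale), lon + laplace(scale)

random.seed(0)
noisy = perturb(40.7128, -74.0060, epsilon=0.5)
# The reported point is close to, but not exactly, the true location.
```

The configurable privacy level mentioned above maps naturally onto the epsilon parameter: the client can trade accuracy of the search results against the amount of noise added.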
This dissertation studies the privacy issues of location-based services in mobile environments and proposes a set of new techniques that eliminate the need for a trusted third party by implementing efficient algorithms on modern mobile hardware.
Computer Aided Verification
This open access two-volume set LNCS 13371 and 13372 constitutes the refereed proceedings of the 34th International Conference on Computer Aided Verification, CAV 2022, which was held in Haifa, Israel, in August 2022. The 40 full papers presented together with 9 tool papers and 2 case studies were carefully reviewed and selected from 209 submissions. The papers were organized in the following topical sections: Part I: Invited papers; formal methods for probabilistic programs; formal methods for neural networks; software verification and model checking; hyperproperties and security; formal methods for hardware, cyber-physical, and hybrid systems. Part II: Probabilistic techniques; automata and logic; deductive verification and decision procedures; machine learning; synthesis and concurrency. This is an open access book.
Visual Analysis of Variability and Features of Climate Simulation Ensembles
This PhD thesis is concerned with the visual analysis of time-dependent scalar field ensembles as occur in climate simulations.
Modern climate projections consist of multiple simulation runs (ensemble members) that vary in parameter settings and/or initial values, which leads to variations in the resulting simulation data.
The goal of ensemble simulations is to sample the space of possible futures under the given climate model and provide quantitative information about uncertainty in the results.
The analysis of such data is challenging because, apart from the spatiotemporal data, variability also has to be analyzed and communicated.
This thesis presents novel techniques to analyze climate simulation ensembles visually.
A central question is how the data can be aggregated with minimal information loss.
To address this question, a key technique applied in several places in this work is clustering.
The first part of the thesis addresses the challenge of finding clusters in the ensemble simulation data.
Various distance metrics lend themselves to the comparison of scalar fields; these are explored theoretically and practically.
A visual analytics interface allows the user to interactively explore and compare multiple parameter settings for the clustering and investigate the resulting clusters, i.e. prototypical climate phenomena.
A central contribution here is the development of design principles for analyzing variability in decadal climate simulations, which has led to a visualization system centered around the new Clustering Timeline.
This is a variant of a Sankey diagram that utilizes clustering results to communicate climatic states over time coupled with ensemble member agreement.
It can reveal several interesting properties of the dataset, such as: into how many inherently similar groups the ensemble can be divided at any given time; whether the ensemble diverges in general; and whether there are different phases in the time lapse, perhaps periodicity, or outliers.
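The first of these questions, counting the inherently similar groups at a given time step, can be sketched with a toy per-time-step grouping of ensemble members. The distance metric, threshold, and data below are illustrative assumptions, not the thesis's actual pipeline:

```python
# Sketch: counting inherently similar groups of ensemble members at one
# time step, via Euclidean distance between flattened scalar fields and
# single-linkage grouping under a threshold.
import math

def field_dist(a, b):
    """Euclidean distance between two flattened scalar fields."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def group_members(fields, threshold):
    """Union-find single-linkage grouping: members closer than the
    threshold end up in the same group; returns the group count."""
    parent = list(range(len(fields)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i in range(len(fields)):
        for j in range(i + 1, len(fields)):
            if field_dist(fields[i], fields[j]) < threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(fields))})

# Three toy "members", each a 4-cell scalar field at one time step.
members = [[1.0, 2.0, 1.0, 0.0],
           [1.1, 2.1, 0.9, 0.1],   # close to member 0
           [5.0, 4.0, 3.0, 2.0]]   # an outlier
print(group_members(members, threshold=1.0))  # -> 2
```

Tracking this group count over all time steps is essentially what the Clustering Timeline visualizes, together with which members move between groups.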
The Clustering Timeline is also used to compare multiple climate simulation models and assess their performance.
The Hierarchical Clustering Timeline is an advanced version of the above.
It introduces the concept of a cluster hierarchy that may group the whole dataset, down to the individual static scalar fields, into clusters of various sizes and densities, recording the nesting relationships between them.
A further contribution of this work to visualization research is an investigation of ways to practically utilize a hierarchical clustering of time-dependent scalar fields for analyzing the data.
To this end, a system of different views is proposed which are linked through various interaction possibilities.
The main advantage of the system is that a dataset can now be inspected at an arbitrary level of detail without having to recompute a clustering with different parameters.
Interesting branches of the simulation can be expanded to reveal smaller differences in critical clusters or folded to show only a coarse representation of the less interesting parts of the dataset.
The last building block of the suite of visual analysis methods developed for this thesis aims at a robust, (largely) automatic detection and tracking of certain features in a scalar field ensemble.
Techniques are presented that can identify and track super- and sub-level sets. From these sets, I derive “centers of action”, which mark the locations of extremal climate phenomena that govern the weather (e.g., the Icelandic Low and Azores High).
The thesis also presents visual and quantitative techniques to evaluate the temporal change of the positions of these centers; such a displacement would be likely to manifest in changes in weather.
In a preliminary analysis with my collaborators, we indeed observed changes in the loci of the centers of action in a simulation with increased greenhouse gas concentration as compared to pre-industrial concentration levels.
A Labelling Technique Comparison for Indexing Large XML Database
The flexible nature of XML documents has motivated researchers to use them for data transmission and storage in different domains. The hierarchical structure of XML documents is an attractive target for research on processing user queries based on labelling, where each label describes the node's position in the tree. In this study, three categories of XML node labelling are analysed to address the open problems of each category. A number of experiments are executed to compare the execution time and storage space required for labelling the XML tree.
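As an illustration of one common labelling category, prefix-based (Dewey-style) labels encode the root-to-node path, so structural relationships can be checked directly on the labels without traversing the tree. The labels below are illustrative:

```python
# Sketch of prefix-based (Dewey) XML node labelling: each label encodes
# the path from the root, so an ancestor-descendant test is a simple
# string-prefix check on the labels.

def is_ancestor(a: str, d: str) -> bool:
    """a is an ancestor of d iff a's label is a proper dotted prefix."""
    return d.startswith(a + ".")

# /catalog -> "1", its first child /catalog/book -> "1.1", and so on.
assert is_ancestor("1", "1.1.2")
assert is_ancestor("1.1", "1.1.2")
assert not is_ancestor("1.2", "1.1.2")
assert not is_ancestor("1.1.2", "1.1.2")  # a node is not its own ancestor
```

The trade-off that such experiments measure is visible even here: prefix labels make structural queries cheap, but label length, and hence storage, grows with tree depth.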
A Network Science perspective of Graph Convolutional Networks: A survey
The mining and exploitation of graph structural information have been the
focal points in the study of complex networks. Traditional structural measures
in Network Science focus on the analysis and modelling of complex networks from
the perspective of network structure, such as the centrality measures, the
clustering coefficient, and motifs and graphlets, and they have become basic
tools for studying and understanding graphs. In comparison, graph neural
networks, especially graph convolutional networks (GCNs), are particularly
effective at integrating node features into graph structures via neighbourhood
aggregation and message passing, and have been shown to significantly improve
the performances in a variety of learning tasks. These two classes of methods
are, however, typically treated separately with limited references to each
other. In this work, aiming to establish relationships between them, we provide
a network science perspective of GCNs. Our novel taxonomy classifies GCNs from
three structural information angles, i.e., the layer-wise message aggregation
scope, the message content, and the overall learning scope. Moreover, as a
prerequisite for reviewing GCNs via a network science perspective, we also
summarise traditional structural measures and propose a new taxonomy for them.
Finally and most importantly, we draw connections between traditional
structural approaches and graph convolutional networks, and discuss potential
directions for future research.
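The neighbourhood aggregation and message passing described above can be illustrated with a single Kipf-Welling-style GCN layer; the toy graph, features, and weights below are illustrative:

```python
# Sketch of one GCN propagation step: H' = relu(D^-1/2 (A+I) D^-1/2 H W),
# i.e. symmetrically normalised neighbourhood aggregation (including a
# self-loop) followed by a linear map and a ReLU nonlinearity.
import numpy as np

def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One GCN layer over adjacency A, node features H, weights W."""
    A_hat = A + np.eye(A.shape[0])        # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)       # symmetric normalisation
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# Toy path graph 0-1-2 with 2-d node features and an identity weight map.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
W = np.eye(2)
H_next = gcn_layer(A, H, W)  # each row mixes a node with its neighbours
```

The normalisation by degree is exactly where the survey's network-science lens bites: the layer is a structural smoothing operator, closely related to the graph's normalised Laplacian.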
Recommended from our members
Enabling Data Security and Privacy for Database Services in the Cloud
Substantial advances in cloud technologies have made outsourcing data to the cloud highly beneficial today (e.g., cost savings, scalability, provisioning time). However, strong concerns from private companies and public institutions about the security of the outsourced data still hamper the adoption of cloud solutions. This reluctance is fed by frequent massive data breaches, caused either by external attacks against cloud service providers or by negligent or opaque practices of the service provider itself. For broader adoption of cloud services, this dissertation addresses the data security and privacy concerns in the cloud setting. The goal is to ensure security and privacy of outsourced data while maintaining the ability to execute queries efficiently. Security and privacy come at a cost in functionality and performance; therefore, we seek a proper balance in the space of security, privacy, functionality, and performance. This dissertation addresses the problems of range query execution over encrypted data, privacy-preserving data mining in the context of environmental sustainability studies, and access privacy in the cloud. To enable efficient and secure range query processing over traditional databases, we introduce PINED-RQ, a highly efficient and differentially private range query execution framework that constructs a novel differentially private index over an outsourced database. Second, this dissertation presents a comprehensive study of environmental sustainability metrics. Our contributions in this context are twofold: 1) to better evaluate the environmental impacts of industrial processes privately, we formally define a privacy-preserving certification paradigm and develop a framework that enables an untrusted third party to certify parties based on a well-agreed-upon set of criteria.
2) to explore the privacy concerns over publicizing industrial activities in the form of life cycle assessment (LCA) computations, a standard way of evaluating the impact of a product or service. This dissertation initiates a study of the privacy and security challenges that prevent organizations from making public disclosures about their activities. Finally, this dissertation explores access privacy in the cloud setting. We design and develop TaoStore, a highly efficient and practical cloud data store which ensures data confidentiality and hides access patterns from adversaries. Additionally, we propose a new ORAM security model, called aaob-security, which considers completely asynchronous network communication and concurrent processing of requests. This dissertation shows that it is possible to deliver practical and high-performance data services in the cloud without sacrificing security and privacy, provided the requirements of each application are analyzed correctly and a correct balance is found in the space of security, privacy, functionality, and performance.
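The core differential-privacy ingredient behind an index such as PINED-RQ's can be illustrated with a noisy histogram over the indexed attribute: each bucket count receives Laplace noise, and a range query is answered by summing the overlapping buckets. The bucketing, epsilon, and data below are illustrative, not PINED-RQ's actual index structure:

```python
# Sketch: a differentially private histogram for range queries. Each
# record lands in exactly one bucket, so the sensitivity of the count
# vector is 1 and Laplace(1/epsilon) noise per bucket suffices.
import math
import random

def laplace(scale: float) -> float:
    """Sample zero-mean Laplace noise via the inverse CDF."""
    u = min(max(random.random(), 1e-12), 1 - 1e-12)
    return scale * math.log(2 * u) if u < 0.5 else -scale * math.log(2 * (1 - u))

def noisy_histogram(values, buckets, epsilon):
    """Per-bucket counts perturbed with Laplace(1/epsilon) noise."""
    counts = [0] * len(buckets)
    for v in values:
        for i, (lo, hi) in enumerate(buckets):
            if lo <= v < hi:
                counts[i] += 1
                break
    return [c + laplace(1.0 / epsilon) for c in counts]

random.seed(1)
buckets = [(0, 10), (10, 20), (20, 30)]
noisy = noisy_histogram([3, 5, 12, 25, 27, 28], buckets, epsilon=1.0)
# Estimated count for the range [10, 30): sum the overlapping buckets.
estimate = noisy[1] + noisy[2]   # true count is 4, plus noise
```

This captures the balance the dissertation argues for: the server can route range queries using the noisy index without learning exact record counts.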