Data-Driven, Personalized Usable Privacy
We live in the "inverse-privacy" world, where service providers derive insights from users' data that the users do not even know about. This has been fueled by advances in machine learning technologies, which have allowed providers to go beyond the superficial analysis of users' transactions to the deep inspection of users' content. Users themselves have been facing several problems in coping with this widening information discrepancy. Although the interfaces of apps and websites are generally equipped with privacy indicators (e.g., permissions, policies, ...), this has not been enough to create the counter-effect. We identify three gaps in particular that have hindered the effectiveness and usability of privacy indicators:
- Scale Adaptation: The scale at which service providers are collecting data has been growing on multiple fronts. Users, on the other hand, have limited time, effort, and technological resources to cope with this scale.
- Risk Communication: Although providers utilize privacy indicators to announce what and (less often) why they need particular pieces of information, they rarely relay what can potentially be inferred from this data. Without this knowledge, users are less equipped to make informed decisions when they sign in to a site or install an application.
- Language Complexity: The information practices of service providers are buried in long, complex privacy policies. Generally, users do not have the time, and sometimes the skills, to decipher such policies, even when they are interested in knowing particular pieces of them.
In this thesis, we approach usable privacy from a data perspective. Instead of static privacy interfaces that are obscure, recurring, or unreadable, we develop techniques that bridge the understanding gap between users and service providers.
Towards that, we make the following contributions:
- Crowdsourced, data-driven privacy decision-making: In an effort to combat the growing scale of data exposure, we consider the context of files uploaded to cloud services. We propose C3P, a framework for automatically assessing the sensitivity of files, thus enabling real-time, fine-grained policy enforcement on top of unstructured data.
- Data-driven app privacy indicators: We introduce PrivySeal, which involves a new paradigm of dynamic, personalized app privacy indicators that bridge the risk understanding gap between users and providers. Through PrivySeal's online platform, we also study the emerging problem of interdependent privacy in the context of cloud apps and provide a usable privacy indicator to mitigate it.
- Automated question answering about privacy practices: We introduce PriBot, the first automated question-answering system for privacy policies, which allows users to pose questions about the privacy practices of any company in their own language. Through a user study, we show its effectiveness at achieving high accuracy and relevance for users, thus narrowing the complexity gap in navigating privacy policies.
A core aim of this thesis is paving the road for a future where privacy indicators are not bound by a specific medium or pre-scripted wording. We design and develop techniques that enable privacy to be communicated effectively in an interface that is approachable to the user. For that, we go beyond textual interfaces to enable dynamic, visual, and hands-free privacy interfaces that are fit for the variety of emerging technologies.
280 Birds with One Stone: Inducing Multilingual Taxonomies from Wikipedia using Character-level Classification
We propose a simple, yet effective, approach towards inducing multilingual
taxonomies from Wikipedia. Given an English taxonomy, our approach leverages
the interlanguage links of Wikipedia followed by character-level classifiers to
induce high-precision, high-coverage taxonomies in other languages. Through
experiments, we demonstrate that our approach significantly outperforms the
state-of-the-art, heuristics-heavy approaches for six languages. As a
consequence of our work, we release presumably the largest and the most
accurate multilingual taxonomic resource spanning over 280 languages
The Curious Case of the PDF Converter that Likes Mozart: Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps
Third party apps that work on top of personal cloud services such as Google
Drive and Dropbox, require access to the user's data in order to provide some
functionality. Through detailed analysis of a hundred popular Google Drive apps
from Google's Chrome store, we discover that the existing permission model is
quite often misused: around two thirds of analyzed apps are over-privileged,
i.e., they access more data than is needed for them to function. In this work,
we analyze three different permission models that aim to discourage users from
installing over-privileged apps. In experiments with 210 real users, we
discover that the most successful permission model is our novel ensemble method
that we call Far-reaching Insights. Far-reaching Insights inform the users
about the data-driven insights that apps can make about them (e.g., their
topics of interest, collaboration and activity patterns, etc.). Thus, they seek
to bridge the gap between what third parties can actually know about users and
users' perception of their privacy leakage. The efficacy of Far-reaching
Insights in bridging this gap is demonstrated by our results, as Far-reaching
Insights prove to be, on average, twice as effective as the current model in
discouraging users from installing over-privileged apps. In an effort to
promote general privacy awareness, we deploy a publicly available,
privacy-oriented app store that uses Far-reaching Insights. Based on the knowledge
extracted from data of the store's users (over 115 gigabytes of Google Drive
data from 1440 users with 662 installed apps), we also delineate the ecosystem
for third-party cloud apps from the standpoint of developers and cloud
providers. Finally, we present several general recommendations that can guide
future work on privacy in the cloud.
Scalable and Secure Aggregation in Distributed Networks
We consider the problem of computing an aggregation function in a
secure and scalable way. Whereas previous distributed solutions
with similar security guarantees have a communication cost of , we
present a distributed protocol that requires only a communication complexity of
, which we prove is near-optimal. Our protocol ensures perfect
security against a computationally-bounded adversary, tolerates
malicious nodes for any constant (not
depending on ), and outputs the exact value of the aggregated function with
high probability
Data-Driven Privacy Indicators
Third-party applications work on top of existing platforms that host users' data. Although these apps access this data to provide users with specific services, they can also use it for monetization or profiling purposes. In practice, there is a significant gap between users' privacy expectations and the actual access levels of third-party apps, which are often over-privileged. Due to weaknesses in the existing privacy indicators, users are generally not well informed about what data these apps get. Moreover, we are witnessing the rise of inverse privacy: third parties collect data that enables them to know information about users that users do not know, cannot remember, or cannot reach. In this paper, we describe our recent experiences with the design and evaluation of Data-Driven Privacy Indicators (DDPIs), an approach attempting to reduce the aforementioned privacy gap. DDPIs are realized by having a trusted party (e.g., the app platform) analyze the user's data and by integrating the analysis results into the privacy indicator's interface. We discuss DDPIs in the context of third-party apps on cloud platforms, such as Google Drive and Dropbox. Specifically, we present our recent work on Far-reaching Insights, which show users the insights that apps can infer about them (e.g., their topics of interest, collaboration and activity patterns, etc.). Then we present History-based Insights, a novel privacy indicator that informs the user about what data is already accessible by an app vendor, based on previous app installations by the user or her collaborators. We further discuss future ideas for new DDPIs, and we outline the challenges facing the wide-scale deployment of such indicators.
PBCOV: a property-based coverage criterion
Coverage criteria define test requirements to be satisfied and compute metric values that quantify the adequacy of test suites at revealing defects in programs. Typically, a test requirement is a structural program element, and the coverage metric value represents the percentage of elements covered by a test suite. Empirical studies show that existing criteria might characterize a test suite as highly adequate even though it does not reveal some of the existing defects. In other words, existing structural coverage criteria are not always sensitive to the presence of defects. This paper presents PBCOV, a Property-Based COVerage criterion, and empirically demonstrates its effectiveness. Given a program with properties therein, static analysis techniques, such as model checking, leverage formal properties to find defects. PBCOV is a dynamic analysis technique that also leverages properties and is characterized by the following: (a) it considers the state space of first-order logic properties as the test requirements to be covered; (b) it uses logic synthesis to compute the state space; and (c) it is practical, i.e., computable, because it considers an over-approximation of the reachable state space using a cut-based abstraction. We evaluated PBCOV using programs with test suites comprising passing and failing test cases. First, we computed metric values for PBCOV and structural coverage using the full test suites. Second, in order to quantify the sensitivity of the metrics to the absence of failing test cases, we computed the values for all considered metrics using only the passing test cases. In most cases, the structural metrics exhibited little or no decrease in their values, while PBCOV showed a considerable decrease. This suggests that PBCOV is more sensitive to the absence of failing test cases, i.e., it is more effective at characterizing the adequacy of a test suite for detecting defects and at revealing deficiencies in test suites.
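The evaluation procedure described in this abstract (compute a coverage percentage on the full test suite, recompute it on the passing tests only, and compare the drop) can be illustrated with a minimal Python sketch. This is not PBCOV itself: the requirement names, the test-to-requirement mapping, and the helper functions `coverage` and `sensitivity_drop` are all hypothetical and exist only to show the arithmetic.

```python
def coverage(requirements, covered_by_test, tests):
    """Percentage of test requirements covered by at least one test in `tests`."""
    if not requirements:
        raise ValueError("at least one test requirement is needed")
    covered = set()
    for test in tests:
        covered |= covered_by_test[test]
    return 100.0 * len(covered & requirements) / len(requirements)


def sensitivity_drop(requirements, covered_by_test, passing, failing):
    """Decrease in coverage when the failing tests are removed from the suite."""
    full = coverage(requirements, covered_by_test, passing + failing)
    passing_only = coverage(requirements, covered_by_test, passing)
    return full - passing_only


# Hypothetical data: t1 and t2 pass, t3 fails.
passing, failing = ["t1", "t2"], ["t3"]

# Structural requirements (e.g., statements) hit by each test.
structural_reqs = {"s1", "s2", "s3", "s4"}
structural_hits = {"t1": {"s1", "s2", "s3"}, "t2": {"s3", "s4"}, "t3": {"s1", "s4"}}

# Property-state requirements hit by each test (assumed, for illustration).
property_reqs = {"p1", "p2", "p3", "p4"}
property_hits = {"t1": {"p1"}, "t2": {"p2"}, "t3": {"p3", "p4"}}

structural_drop = sensitivity_drop(structural_reqs, structural_hits, passing, failing)
property_drop = sensitivity_drop(property_reqs, property_hits, passing, failing)
print(structural_drop, property_drop)
```

In this made-up data, the failing test exercises only statements that passing tests already reach, so the structural metric does not move, while the property-state metric loses the states that only the failing test visits. That is the kind of contrast the abstract reports, although the real criterion derives its requirements from logic synthesis rather than from a hand-written table.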
ACCOP: Adaptive Cost-Constrained and Delay-Optimized Data Allocation over Parallel Opportunistic Networks
As wireless and mobile technologies are becoming increasingly pervasive, uninterrupted connectivity on mobile devices is becoming a necessity rather than a luxury. In challenged networking environments, this necessity becomes harder to achieve in the absence of end-to-end paths from servers to mobiles. One of the main techniques employed under such conditions is to use the available parallel networks simultaneously. In this work, we tackle the problem of data allocation to parallel networks in challenged environments, targeting a minimized delay while abiding by a user-preset budget. We propose ACCOP, an Adaptive, Cost-Constrained, and delay-OPtimized data-to-channel allocation scheme that efficiently exploits parallel channels typically accessible from mobile devices. Our technique replaces traditional, inefficient brute-force schemes by employing Lagrange multipliers to minimize the delivery delay. Furthermore, we show how ACCOP can dynamically adjust to changing network conditions. Through analytical and experimental tools, we demonstrate that our system achieves faster delivery and higher performance while remaining computationally inexpensive.
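To make the idea of budget-constrained allocation with Lagrange multipliers concrete, here is a toy Python sketch. It does not reproduce ACCOP's formulation, which the abstract does not give. It assumes a simplified model in which channel i has a rate r_i and a per-unit cost c_i, the delay proxy to minimize is the sum of x_i^2 / (2 r_i), all data D must be sent, and total cost must not exceed budget B. Setting the derivative of the Lagrangian to zero gives x_i = r_i (lambda - mu c_i); the function name `allocate` and all parameters are hypothetical.

```python
def allocate(rates, costs, total_data, budget):
    """Split `total_data` across channels under a cost budget (toy model).

    Minimizes sum(x_i**2 / (2 * r_i)) subject to sum(x_i) == total_data and
    sum(c_i * x_i) <= budget, using the closed-form stationarity conditions.
    """
    R = sum(rates)
    S1 = sum(r * c for r, c in zip(rates, costs))

    # Budget not binding (mu = 0): allocation is proportional to channel rate.
    x = [r * total_data / R for r in rates]
    if S1 * total_data / R <= budget:
        return x

    # Budget binding: solve the two linear constraints for lambda and mu.
    S2 = sum(r * c * c for r, c in zip(rates, costs))
    denom = S2 - S1 * S1 / R  # non-negative; zero only if all costs are equal
    if denom <= 1e-12:
        raise ValueError("all channels cost the same; the budget cannot be met")
    mu = (S1 * total_data / R - budget) / denom
    lam = (total_data + mu * S1) / R
    x = [r * (lam - mu * c) for r, c in zip(rates, costs)]

    # This sketch does not handle the case where a channel should be dropped.
    if min(x) < 0:
        raise ValueError("budget too tight for this simplified model")
    return x


# Hypothetical example: a fast, expensive channel and a slow, cheap one.
shares = allocate(rates=[3.0, 1.0], costs=[2.0, 0.5], total_data=8.0, budget=10.0)
print(shares)
```

With these made-up numbers, the rate-proportional split (6 and 2 units) would cost 13, which exceeds the budget of 10, so the multiplier on the cost constraint becomes positive and shifts data toward the cheaper channel, giving roughly 4 units on each. A closed-form solve like this costs only a few sums per update, which is the sort of property that lets an allocation be recomputed cheaply when network conditions change.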