Extracting corpus specific knowledge bases from Wikipedia
Thesauri are useful knowledge structures for assisting information retrieval. Yet their production is labor-intensive, and few domains have comprehensive thesauri that cover domain-specific concepts and contemporary usage. One approach, which has been attempted without much success for decades, is to seek statistical natural language processing algorithms that work on free text. Instead, we propose to replace costly professional indexers with thousands of dedicated amateur volunteers, namely those who are producing Wikipedia. This vast, open encyclopedia represents a rich tapestry of topics and semantics and a huge investment of human effort and judgment. We show how this can be directly exploited to provide WikiSauri: manually defined yet inexpensive thesaurus structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We also offer concrete evidence of the effectiveness of WikiSauri for assisting information retrieval.
The Curious Case of the PDF Converter that Likes Mozart: Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps
Third-party apps that work on top of personal cloud services such as Google
Drive and Dropbox require access to the user's data in order to provide some
functionality. Through detailed analysis of a hundred popular Google Drive apps
from Google's Chrome store, we discover that the existing permission model is
quite often misused: around two thirds of analyzed apps are over-privileged,
i.e., they access more data than is needed for them to function. In this work,
we analyze three different permission models that aim to discourage users from
installing over-privileged apps. In experiments with 210 real users, we
discover that the most successful permission model is our novel ensemble method
that we call Far-reaching Insights. Far-reaching Insights inform the users
about the data-driven insights that apps can make about them (e.g., their
topics of interest, collaboration and activity patterns, etc.). Thus, they seek
to bridge the gap between what third parties can actually know about users and
users' perception of their privacy leakage. The efficacy of Far-reaching
Insights in bridging this gap is demonstrated by our results, as Far-reaching
Insights prove to be, on average, twice as effective as the current model in
discouraging users from installing over-privileged apps. In an effort to
promote general privacy awareness, we deploy a publicly available
privacy-oriented app store that uses Far-reaching Insights. Based on the knowledge
extracted from data of the store's users (over 115 gigabytes of Google Drive
data from 1440 users with 662 installed apps), we also delineate the ecosystem
for third-party cloud apps from the standpoint of developers and cloud
providers. Finally, we present several general recommendations that can guide
future work in the area of privacy for the cloud.