6,764 research outputs found
Entity-Linking via Graph-Distance Minimization
Entity-linking is a natural-language-processing task that consists in
identifying the entities mentioned in a piece of text, linking each to an
appropriate item in some knowledge base; when the knowledge base is Wikipedia,
the problem comes to be known as wikification (in this case, items are
wikipedia articles). One instance of entity-linking can be formalized as an
optimization problem on the underlying concept graph, where the quantity to be
optimized is the average distance between chosen items. Inspired by this
application, we define a new graph problem which is a natural variant of the
Maximum Capacity Representative Set. We prove that our problem is NP-hard for
general graphs; nonetheless, under some restrictive assumptions, it turns out
to be solvable in linear time. For the general case, we propose two heuristics:
one tries to enforce the above assumptions and another one is based on the
notion of hitting distance; we show experimentally how these approaches perform
with respect to some baselines on a real-world dataset.Comment: In Proceedings GRAPHITE 2014, arXiv:1407.7671. The second and third
authors were supported by the EU-FET grant NADINE (GA 288956
Knowledge Refinement via Rule Selection
In several different applications, including data transformation and entity
resolution, rules are used to capture aspects of knowledge about the
application at hand. Often, a large set of such rules is generated
automatically or semi-automatically, and the challenge is to refine the
encapsulated knowledge by selecting a subset of rules based on the expected
operational behavior of the rules on available data. In this paper, we carry
out a systematic complexity-theoretic investigation of the following rule
selection problem: given a set of rules specified by Horn formulas, and a pair
of an input database and an output database, find a subset of the rules that
minimizes the total error, that is, the number of false positive and false
negative errors arising from the selected rules. We first establish
computational hardness results for the decision problems underlying this
minimization problem, as well as upper and lower bounds for its
approximability. We then investigate a bi-objective optimization version of the
rule selection problem in which both the total error and the size of the
selected rules are taken into account. We show that testing for membership in
the Pareto front of this bi-objective optimization problem is DP-complete.
Finally, we show that a similar DP-completeness result holds for a bi-level
optimization version of the rule selection problem, where one minimizes first
the total error and then the size
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and point-of-interests, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.Comment: Accepted to TKDE. 30 pages, 1 figur
Novel approaches to anonymity and privacy in decentralized, open settings
The Internet has undergone dramatic changes in the last two decades, evolving from a mere communication network to a global multimedia platform in which billions of users actively exchange information. While this transformation has brought tremendous benefits to society, it has also created new threats to online privacy that existing technology is failing to keep pace with. In this dissertation, we present the results of two lines of research that developed two novel approaches to anonymity and privacy in decentralized, open settings. First, we examine the issue of attribute and identity disclosure in open settings and develop the novel notion of (k,d)-anonymity for open settings that we extensively study and validate experimentally. Furthermore, we investigate the relationship between anonymity and linkability using the notion of (k,d)-anonymity and show that, in contrast to the traditional closed setting, anonymity within one online community does necessarily imply unlinkability across different online communities in the decentralized, open setting. Secondly, we consider the transitive diffusion of information that is shared in social networks and spread through pairwise interactions of user connected in this social network. We develop the novel approach of exposure minimization to control the diffusion of information within an open network, allowing the owner to minimize its exposure by suitably choosing who they share their information with. We implement our algorithms and investigate the practical limitations of user side exposure minimization in large social networks. At their core, both of these approaches present a departure from the provable privacy guarantees that we can achieve in closed settings and a step towards sound assessments of privacy risks in decentralized, open settings.Das Internet hat in den letzten zwei Jahrzehnten eine drastische Transformation erlebt und entwickelte sich dabei von einem einfachen Kommunikationsnetzwerk zu einer globalen Multimedia Plattform auf der Milliarden von Nutzern aktiv Informationen austauschen. Diese Transformation hat zwar einen gewaltigen Nutzen und vielfältige Vorteile für die Gesellschaft mit sich gebracht, hat aber gleichzeitig auch neue Herausforderungen und Gefahren für online Privacy mit sich gebracht mit der die aktuelle Technologie nicht mithalten kann. In dieser Dissertation präsentieren wir zwei neue Ansätze für Anonymität und Privacy in dezentralisierten und offenen Systemen. Mit unserem ersten Ansatz untersuchen wir das Problem der Attribut- und Identitätspreisgabe in offenen Netzwerken und entwickeln hierzu den Begriff der (k, d)-Anonymität für offene Systeme welchen wir extensiv analysieren und anschließend experimentell validieren. Zusätzlich untersuchen wir die Beziehung zwischen Anonymität und Unlinkability in offenen Systemen mithilfe des Begriff der (k, d)-Anonymität und zeigen, dass, im Gegensatz zu traditionell betrachteten, abgeschlossenen Systeme, Anonymität innerhalb einer Online Community nicht zwingend die Unlinkability zwischen verschiedenen Online Communitys impliziert. Mit unserem zweiten Ansatz untersuchen wir die transitive Diffusion von Information die in Sozialen Netzwerken geteilt wird und sich dann durch die paarweisen Interaktionen von Nutzern durch eben dieses Netzwerk ausbreitet. Wir entwickeln eine neue Methode zur Kontrolle der Ausbreitung dieser Information durch die Minimierung ihrer Exposure, was dem Besitzer dieser Information erlaubt zu kontrollieren wie weit sich deren Information ausbreitet indem diese initial mit einer sorgfältig gewählten Menge von Nutzern geteilt wird. Wir implementieren die hierzu entwickelten Algorithmen und untersuchen die praktischen Grenzen der Exposure Minimierung, wenn sie von Nutzerseite für große Netzwerke ausgeführt werden soll. Beide hier vorgestellten Ansätze verbindet eine Neuausrichtung der Aussagen die diese bezüglich Privacy treffen: wir bewegen uns weg von beweisbaren Privacy Garantien für abgeschlossene Systeme, und machen einen Schritt zu robusten Privacy Risikoeinschätzungen für dezentralisierte, offene Systeme in denen solche beweisbaren Garantien nicht möglich sind
On the Measurement of Privacy as an Attacker's Estimation Error
A wide variety of privacy metrics have been proposed in the literature to
evaluate the level of protection offered by privacy enhancing-technologies.
Most of these metrics are specific to concrete systems and adversarial models,
and are difficult to generalize or translate to other contexts. Furthermore, a
better understanding of the relationships between the different privacy metrics
is needed to enable more grounded and systematic approach to measuring privacy,
as well as to assist systems designers in selecting the most appropriate metric
for a given application.
In this work we propose a theoretical framework for privacy-preserving
systems, endowed with a general definition of privacy in terms of the
estimation error incurred by an attacker who aims to disclose the private
information that the system is designed to conceal. We show that our framework
permits interpreting and comparing a number of well-known metrics under a
common perspective. The arguments behind these interpretations are based on
fundamental results related to the theories of information, probability and
Bayes decision.Comment: This paper has 18 pages and 17 figure
- …