Concealment Conserving the Data Mining of Groups & Individuals
We present an overview of privacy preserving data mining, one of the most popular directions in the data mining research community. In the first part of the chapter, we present approaches that have been proposed for the protection of either the sensitive data itself in the course of data mining or the sensitive data mining results, in the context of traditional (relational) datasets. In the second part of the chapter, we focus on one of the most recent and prominent directions in privacy preserving data mining: the mining of user mobility data. Although still in its infancy, privacy preserving data mining of mobility data has attracted a lot of research attention and already counts a number of methodologies, both with respect to sensitive data protection and to sensitive knowledge hiding. Finally, at the end of the chapter, we provide a roadmap for the field of privacy preserving mobility data mining as well as for the area of privacy preserving data mining at large.
When and Where: Predicting Human Movements Based on Social Spatial-Temporal Events
Predicting both the time and the location of human movements is valuable but
challenging for a variety of applications. To address this problem, we propose
an approach considering both the periodicity and the sociality of human
movements. We first define a new concept, Social Spatial-Temporal Event (SSTE),
to represent social interactions among people. For the time prediction, we
characterise the temporal dynamics of SSTEs with an ARMA (AutoRegressive Moving
Average) model. To dynamically capture the SSTE kinetics, we propose a Kalman
Filter based learning algorithm to learn and incrementally update the ARMA
model as a new observation becomes available. For the location prediction, we
propose a ranking model where the periodicity and the sociality of human
movements are simultaneously taken into consideration for improving the
prediction accuracy. Extensive experiments conducted on real data sets validate
our proposed approach.
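The incremental learning step described above can be sketched as a Kalman-style recursive update of AR coefficients as each new observation arrives. The snippet below is a minimal illustration of that idea, not the authors' exact SSTE/ARMA formulation; the AR order, noise variance `r`, and the toy series are assumptions.

```python
import numpy as np

def kalman_ar_update(theta, P, x_lags, y, r=1.0):
    """One Kalman-filter step that refines AR coefficients `theta`
    (with covariance `P`) when a new observation `y` arrives.
    `x_lags` holds the p most recent values of the series."""
    x = np.asarray(x_lags, dtype=float)
    y_pred = x @ theta                  # one-step-ahead prediction
    e = y - y_pred                      # innovation (prediction error)
    S = x @ P @ x + r                   # innovation variance
    K = P @ x / S                       # Kalman gain
    theta = theta + K * e               # updated coefficients
    P = P - np.outer(K, x) @ P          # updated covariance
    return theta, P, y_pred

# Toy usage: track an AR(2) process online.
rng = np.random.default_rng(0)
true = np.array([0.6, 0.3])
series = [0.1, 0.2]
theta, P = np.zeros(2), np.eye(2) * 10.0
for _ in range(500):
    lags = np.array(series[-1:-3:-1])   # [y_{t-1}, y_{t-2}]
    y = true @ lags + 0.05 * rng.standard_normal()
    theta, P, _ = kalman_ar_update(theta, P, lags, y)
    series.append(y)
```

After a few hundred observations the recursively updated coefficients track the generating process closely, which is the property the paper exploits to keep the ARMA model current without refitting from scratch.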
Video summarization by group scoring
In this paper a new model for user-centered video summarization is presented. Involvement of more than one expert in generating the final video summary is the main use case for this algorithm. The approach consists of three major steps. First, the video frames are scored by a group of operators. Next, these scores are averaged to produce a single value for each frame. Lastly, the highest-scoring frames, alongside the corresponding audio and textual content, are extracted and inserted into the summary. The effectiveness of this approach has been evaluated by comparing the video summaries generated by this system against the results of several automatic summarization tools that use different modalities for abstraction.
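The score-averaging and frame-selection steps can be sketched in a few lines. The function and the toy score matrix below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def summarize_by_group_scores(score_matrix, summary_len):
    """score_matrix: operators x frames importance scores.
    Averages the scores per frame and returns the indices of the
    highest-scoring frames, kept in temporal (playback) order."""
    mean_scores = np.mean(score_matrix, axis=0)        # one value per frame
    top = np.argsort(mean_scores)[::-1][:summary_len]  # best frames first
    return sorted(top.tolist())                        # restore time order

# Three operators score a 6-frame clip; pick a 2-frame summary.
scores = np.array([[1, 5, 2, 4, 1, 3],
                   [2, 4, 1, 5, 2, 2],
                   [1, 5, 2, 4, 1, 4]])
print(summarize_by_group_scores(scores, 2))  # → [1, 3]
```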
Location Privacy in Spatial Crowdsourcing
Spatial crowdsourcing (SC) is a new platform that engages individuals in
collecting and analyzing environmental, social and other spatiotemporal
information. With SC, requesters outsource their spatiotemporal tasks to a set
of workers, who will perform the tasks by physically traveling to the tasks'
locations. This chapter identifies privacy threats toward both workers and
requesters during the two main phases of spatial crowdsourcing, tasking and
reporting. Tasking is the process of identifying which tasks should be assigned
to which workers. This process is handled by a spatial crowdsourcing server
(SC-server). The latter phase is reporting, in which workers travel to the
tasks' locations, complete the tasks and upload their reports to the SC-server.
The challenge is to enable effective and efficient tasking as well as reporting
in SC without disclosing the actual locations of workers (at least until they
agree to perform a task) and the tasks themselves (at least to workers who are
not assigned to those tasks). This chapter aims to provide an overview of the
state-of-the-art in protecting users' location privacy in spatial
crowdsourcing. We provide a comparative study of a diverse set of solutions in
terms of task publishing modes (push vs. pull), problem focuses (tasking and
reporting), threats (server, requester and worker), and underlying technical
approaches (from pseudonymity, cloaking, and perturbation to exchange-based and
encryption-based techniques). The strengths and drawbacks of the techniques are
highlighted, leading to a discussion of open problems and future work.
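One concrete instance of the perturbation family surveyed here is planar Laplace noise, as used in geo-indistinguishability-style location obfuscation. The sketch below is a generic illustration of that technique, not any single solution from the chapter; the epsilon value and metre-based coordinates are assumptions.

```python
import math
import random

def planar_laplace_noise(epsilon):
    """Sample 2-D Laplace noise for location perturbation: the direction
    is uniform, and the radius follows the radial law of the planar
    Laplacian, pdf ∝ r·exp(-epsilon·r), i.e. Gamma(shape=2, scale=1/eps).
    Smaller epsilon means more noise (expected radius is 2/epsilon)."""
    theta = random.uniform(0.0, 2.0 * math.pi)
    r = random.gammavariate(2.0, 1.0 / epsilon)
    return r * math.cos(theta), r * math.sin(theta)

def cloak(x, y, epsilon):
    """Report a perturbed position instead of the true one."""
    dx, dy = planar_laplace_noise(epsilon)
    return x + dx, y + dy

# A worker at (100 m, 200 m) reports a noisy location to the SC-server;
# with epsilon = 0.05 per metre the expected displacement is ~40 m.
random.seed(7)
print(cloak(100.0, 200.0, epsilon=0.05))
```

The worker's true position is never sent; the SC-server matches tasks against the perturbed point, trading some tasking accuracy for location privacy.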
Trajectory-Based Spatiotemporal Entity Linking
Trajectory-based spatiotemporal entity linking matches the same moving object
across different datasets based on its movement traces. It is a fundamental
step in supporting spatiotemporal data integration and analysis. In
this paper, we study the problem of spatiotemporal entity linking using
effective and concise signatures extracted from their trajectories. This
linking problem is formalized as a k-nearest neighbor (k-NN) query on the
signatures. Four representation strategies (sequential, temporal, spatial, and
spatiotemporal) and two quantitative criteria (commonality and unicity) are
investigated for signature construction. A simple yet effective dimension
reduction strategy is developed together with a novel indexing structure called
the WR-tree to speed up the search. A number of optimization methods are
proposed to improve the accuracy and robustness of the linking. Our extensive
experiments on real-world datasets verify the superiority of our approach over
the state-of-the-art solutions in terms of both accuracy and efficiency.
Comment: 15 pages, 3 figures, 15 tables
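The k-NN query over signatures can be illustrated with toy histogram signatures. The representation below (visit-count histograms over grid cells, plain Euclidean distance, linear scan) is an assumed simplification; the paper's approach adds dimension reduction and the WR-tree index to make this search fast.

```python
import numpy as np

def knn_link(query_sig, candidate_sigs, k=1):
    """Rank candidate entities by signature distance and return the
    ids (row indices) of the k nearest. Signatures here are toy
    visit-count histograms over grid cells."""
    C = np.asarray(candidate_sigs, dtype=float)
    q = np.asarray(query_sig, dtype=float)
    d = np.linalg.norm(C - q, axis=1)   # Euclidean distance per candidate
    return np.argsort(d)[:k].tolist()

# Entity signatures from dataset B; query signature from dataset A.
candidates = [[5, 0, 1, 0],   # id 0
              [0, 4, 0, 2],   # id 1
              [4, 1, 1, 0]]   # id 2
print(knn_link([5, 0, 2, 0], candidates, k=2))  # → [0, 2]
```

The top-ranked candidate is the proposed link; returning k > 1 supports the ranking-based evaluation of linking accuracy.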
Combinatorial Algorithms for String Sanitization
String data are often disseminated to support applications such as
location-based service provision or DNA sequence analysis. This dissemination,
however, may expose sensitive patterns that model confidential knowledge. In
this paper, we consider the problem of sanitizing a string by concealing the
occurrences of sensitive patterns, while maintaining data utility, in two
settings that are relevant to many common string processing tasks.
In the first setting, we aim to generate the minimal-length string that
preserves the order of appearance and frequency of all non-sensitive patterns.
Such a string allows accurately performing tasks based on the sequential nature
and pattern frequencies of the string. To construct such a string, we propose a
time-optimal algorithm, TFS-ALGO. We also propose another time-optimal
algorithm, PFS-ALGO, which preserves a partial order of appearance of
non-sensitive patterns but produces a much shorter string that can be analyzed
more efficiently. The strings produced by either of these algorithms are
constructed by concatenating non-sensitive parts of the input string. However,
it is possible to detect the sensitive patterns by "reversing" the
concatenation operations. In response, we propose a heuristic, MCSR-ALGO, which
replaces letters in the strings output by the algorithms with carefully
selected letters, so that sensitive patterns are not reinstated, implausible
patterns are not introduced, and occurrences of spurious patterns are
prevented. In the second setting, we aim to generate a string that is at
minimal edit distance from the original string, in addition to preserving the
order of appearance and frequency of all non-sensitive patterns. To construct
such a string, we propose an algorithm, ETFS-ALGO, based on solving specific
instances of approximate regular expression matching.
Comment: Extended version of a paper accepted to ECML/PKDD 201
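As a rough illustration of concealing sensitive patterns while keeping non-sensitive parts, the toy sanitizer below cuts out every occurrence of each sensitive pattern and joins the remaining pieces with a separator letter assumed to be outside the string's alphabet, so concatenation cannot reinstate a sensitive pattern. This is a drastic simplification of TFS-ALGO/MCSR-ALGO, which avoid the extra letter, minimise length, and control spurious patterns.

```python
def sanitize(s, sensitive, sep="#"):
    """Remove all occurrences of each sensitive pattern from `s` and
    concatenate the surviving non-sensitive pieces, separated by `sep`
    (a letter not in the alphabet of `s`)."""
    pieces = [s]
    for pat in sensitive:
        new = []
        for piece in pieces:
            while pat in piece:
                i = piece.find(pat)
                new.append(piece[:i])          # keep the prefix
                piece = piece[i + len(pat):]   # drop the occurrence
            new.append(piece)
        pieces = new
    return sep.join(p for p in pieces if p)

print(sanitize("aababba", ["bab"]))  # → 'aa#ba'
```

The separator makes the cut points visible, which the paper's algorithms avoid; their letter-replacement heuristic hides the seams without reinstating sensitive or implausible patterns.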
Local Suppression and Splitting Techniques for Privacy Preserving Publication of Trajectories
Chronic infection: punctuated interpenetration and pathogen virulence
We apply an information dynamics formalism to the Levins and Lewontin vision of biological interpenetration between a 'cognitive condensation' including immune function embedded in social and cultural structure on the one hand, and an established, highly adaptive, parasite population on the other. We iterate the argument, beginning with direct interaction between cognitive condensation and pathogen, then extend the analysis to second order 'mutator' mechanisms inherent both to immune function and to certain forms of rapid pathogen antigenic variability.
The methodology, based on the Large Deviations Program of applied probability, produces synergistic cognitive/adaptive 'learning plateaus' that represent stages of chronic infection, and, for human populations, is able to encompass the fundamental biological reality of culture omitted by other approaches.
We conclude that, for 'evolution machine' pathogens like HIV and malaria, simplistic magic-bullet 'medical' interventions of drugs, vaccines, or behavior modification which do not address the critical context of overall living and working conditions may constitute selection pressures triggering adaptations in life history strategy that result in a marked increase of pathogen virulence.