
    Privacy-preserving distributed data mining

    This thesis is concerned with privacy-preserving distributed data mining algorithms. The main challenges in this setting are inference attacks and the formation of collusion groups. The inference problem is the reconstruction of sensitive data by attackers from non-sensitive sources, such as intermediate results, exchanged messages, or public information. Moreover, in a distributed scenario, malicious insiders can organize collusion groups to deploy more effective inference attacks. This thesis shows that existing privacy measures do not adequately protect privacy against inference and collusion. Therefore, new measures based on information theory are developed to overcome the identified limitations. Furthermore, a new distributed data clustering algorithm is presented. The clustering approach is based on an approximation of kernel density estimates that introduces a controlled amount of ambiguity into the density estimates and thereby provides privacy for the original data. In addition, this thesis introduces the first privacy-preserving algorithms for frequent pattern discovery in distributed time series. Time series are transformed into a set of n-dimensional data points, and finding frequent patterns is reduced to finding local maxima in the n-dimensional density space. The proposed algorithms are linear in the size of the dataset and have low communication costs, as validated by experimental evaluation on different datasets.

    This thesis addresses privacy-preserving data mining in distributed environments, with a focus on selected N-agent attack scenarios for the inference problem in data clustering and time series analysis. These are attacks by individual agents or subgroups of agents within a distributed data mining group, or by a single agent outside this group. First, the thesis presents two new privacy measures that, unlike existing ones, satisfy the properties generally required for privacy preservation in distributed data mining, and for which the measured degree of privacy relates to the data analysis method used and the number of attackers. For privacy-preserving distributed data clustering, a new method based on kernel density estimation, called KDECS, is presented. KDECS uses an approximation of the original local kernel density estimate, so that the original data of other agents in the data mining group can no longer be reconstructed with a probability higher than a predefined value. The method is provably more secure than data clustering with generative mixture models and SMC-based secure k-means data clustering. In addition, we present new methods, called DPD-TS, DPD-HE, and DPD-FS, for privacy-preserving distributed pattern discovery in time series, whose complexity and degree of security we analyze with the new privacy measures mentioned above. The minimum degree of security of DPD-TS and DPD-FS that each agent of a data mining group specifies depends only on the dimensionality reduction of the time series values and their discretization, and can be verified easily. The DPD-HE method offers even better protection of sensitive data by means of homomorphic encryption. In addition to the theoretical analysis, experimental performance evaluations of the developed methods were carried out on various publicly available datasets.
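The reduction from frequent patterns to density peaks described in this abstract can be illustrated with a minimal, non-private sketch. It is not the thesis's algorithm: the sliding-window embedding, the Gaussian kernel, the bandwidth, and the function names (`sliding_windows`, `kernel_density`, `densest_window`) are all assumptions made for illustration, and it only reports the single densest window rather than every local maximum.

```python
import math


def sliding_windows(series, n):
    """Turn a 1-D series into overlapping n-dimensional points (assumed embedding)."""
    return [tuple(series[i:i + n]) for i in range(len(series) - n + 1)]


def kernel_density(points, query, bandwidth):
    """Unnormalised Gaussian kernel density estimate at `query`."""
    total = 0.0
    for p in points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(p, query))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / len(points)


def densest_window(series, n, bandwidth):
    """Return the window with the highest estimated density and that density."""
    points = sliding_windows(series, n)
    densities = [kernel_density(points, p, bandwidth) for p in points]
    best = max(range(len(points)), key=densities.__getitem__)
    return points[best], densities[best]


# Toy series in which the shape (0, 1, 2) repeats three times.
series = [0, 1, 2, 0, 1, 2, 0, 1, 2, 7, 3]
pattern, density = densest_window(series, n=3, bandwidth=0.5)
print(pattern)
```

This naive version evaluates the density at every window against every other window, so it is quadratic in the series length; the linear cost claimed in the abstract would need a different evaluation strategy, and no privacy mechanism is modelled here.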

    Unsupervised Classification of Neolithic Pottery From the Northern Alpine Space Using t-SNE and HDBSCAN

    Terms for “Neolithic cultures” are still used to describe spatial and temporal differences in pottery styles across central Europe. These terms date back to research periods when absolute dating methods were lacking and typological classification was used to establish chronologies. They are charged with problematic, biasing notions of social configurations: cultural homogeneity, spatial boundedness, and immobility. In this article, we present an alternative approach to pottery classification using ceramics from dendrochronologically and C14-dated sites of the 40th–38th centuries BC located in the northern Alpine Foreland. The newly developed methodology uses a computational unsupervised classification based on profile shape and additional nominal characteristics, applying t-Distributed Stochastic Neighbour Embedding and Hierarchical Density-Based Spatial Clustering of Applications with Noise for the cluster analyses. Its role in our project was to provide a quantitative, algorithm-based approach to classifying large datasets of pottery while simultaneously accounting for a large number of variables. This enabled us to find similarity structures that would escape the human cognitive capacities on which typological classification is based. It formed one pillar of a mixed-methods research approach combining qualitative and quantitative methods of pottery classification. Our results show that the premises of cultural homogeneity are untenable but can be methodologically overcome by using the proposed classification approaches.

    Scaling and Placing Distributed Services on Vehicle Clusters in Urban Environments

    Many vehicles spend a significant amount of time in urban traffic congestion. Due to the evolution of autonomous vehicles, driver assistance systems, and in-vehicle entertainment, these vehicles have plentiful computational and communication capacity. How can we deploy data collection and processing tasks on these (slowly) moving vehicles to productively use any spare resources? To answer this question, we study the efficient placement of distributed services on a moving vehicle cluster. We present a macroscopic flow model for an intersection in Dublin, Ireland, using real vehicle density data. We show that such aggregate flows are highly predictable (even though the paths of individual vehicles are not known in advance), making it viable to deploy services that harness vehicles’ sensing capabilities. After studying the feasibility of using these vehicle clusters as infrastructure, we introduce a detailed mathematical specification for a task-based, distributed service placement model. The distributed service scales according to the resource requirements and is robust to the changes caused by the mobility of the cluster. We formulate this as a constrained optimization problem with the objective of minimizing overall processing and communication costs. Our results show that jointly scaling tasks and finding a mobility-aware, optimal placement reduces processing and communication costs compared to two schemes from the literature: a naive solution based on autonomous vehicular edge computing and a clustering-based solution.
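To make the idea of placement as constrained optimization concrete, here is a toy sketch. The abstract does not give the paper's formulation, so everything below is an assumption: the cost model (a per-task processing cost on each vehicle plus a fixed communication cost whenever two linked tasks sit on different vehicles), the capacity constraint, and the names `placement_cost` and `best_placement`. It ignores mobility and task scaling, and uses exhaustive search, which is only feasible for very small instances.

```python
from itertools import product


def placement_cost(assignment, proc_cost, comm_cost, links):
    """Processing cost per task plus a communication cost per split link."""
    cost = sum(proc_cost[task][vehicle] for task, vehicle in enumerate(assignment))
    cost += sum(comm_cost for a, b in links if assignment[a] != assignment[b])
    return cost


def best_placement(demand, capacity, proc_cost, comm_cost, links):
    """Exhaustively search task-to-vehicle assignments that respect capacity."""
    best, best_cost = None, float("inf")
    for assignment in product(range(len(capacity)), repeat=len(demand)):
        load = [0] * len(capacity)
        for task, vehicle in enumerate(assignment):
            load[vehicle] += demand[task]
        if any(used > cap for used, cap in zip(load, capacity)):
            continue  # violates a vehicle's resource capacity
        cost = placement_cost(assignment, proc_cost, comm_cost, links)
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


# Hypothetical instance: three tasks, two vehicles.
demand = [2, 2, 1]                        # resource units each task needs
capacity = [3, 3]                         # resource units each vehicle offers
proc_cost = [[1, 2], [2, 1], [1, 1]]      # proc_cost[task][vehicle]
links = [(0, 1), (1, 2)]                  # pairs of tasks that exchange data
placement, cost = best_placement(demand, capacity, proc_cost, comm_cost=3, links=links)
print(placement, cost)  # (0, 1, 1) 6
```

In this instance tasks 0 and 1 cannot share a vehicle because of capacity, so one link must be split; the search places task 2 alongside task 1 to avoid paying for the second link as well.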