
296 research outputs found

Privacy preserving spatio-temporal clustering on horizontally partitioned data

Author: İnan Ali
Saygın Yücel
Publication venue: Springer Science and Business Media LLC
Publication date: 01/09/2006

Time-stamped location information is regarded as spatio-temporal data and, by its nature, such data is highly sensitive from the perspective of privacy. In this paper, we propose a privacy preserving spatio-temporal clustering method for horizontally partitioned data which, to the best of our knowledge, has not been done before. Our methods are based on building the dissimilarity matrix through a series of secure multi-party trajectory comparisons managed by a third party. Our trajectory comparison protocol is compatible with most trajectory comparison functions, and complexity analysis of our methods shows that the protocol introduces no extra overhead when constructing the dissimilarity matrix, compared to the centralized approach. This work was funded by the Information Society Technologies programme of the European Commission, Future and Emerging Technologies, under the IST-014915 GeoPKDD project.

Sabanci University Research Database
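
The dissimilarity matrix mentioned in the abstract above can be illustrated with a minimal, non-secure sketch. It shows only the centralized baseline the abstract compares against: every pair of trajectories is compared once and the result is stored symmetrically. The function names and the mean point-wise Euclidean distance are assumptions chosen for illustration; the paper's secure multi-party comparison protocol is not reproduced here.

```python
from math import hypot


def euclidean_trajectory_distance(a, b):
    """Mean point-wise Euclidean distance between two equal-length trajectories.

    Illustrative comparison function only (an assumption, not the paper's).
    """
    if len(a) != len(b):
        raise ValueError("trajectories must have the same number of samples")
    return sum(hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b)) / len(a)


def dissimilarity_matrix(trajectories, distance=euclidean_trajectory_distance):
    """Build the symmetric pairwise dissimilarity matrix (centralized baseline)."""
    n = len(trajectories)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(trajectories[i], trajectories[j])
            matrix[i][j] = matrix[j][i] = d  # each pair is compared exactly once
    return matrix


if __name__ == "__main__":
    # Each trajectory is a list of (x, y) samples taken at the same time stamps.
    demo = [[(0, 0), (1, 0)], [(0, 3), (1, 4)], [(0, 0), (1, 0)]]
    for row in dissimilarity_matrix(demo):
        print(row)
```

Any other comparison function can be passed as `distance`, and a clustering algorithm that accepts precomputed distances can then run on the matrix alone.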

Secret sharing vs. encryption-based techniques for privacy preserving data mining

Author: Pedersen Thomas Brochmann
Savaş Erkay
Saygın Yücel
Publication venue: Eurostat
Publication date: 01/12/2007

Privacy preserving querying and data publishing have been studied in the context of statistical databases and statistical disclosure control. Recently, large-scale data collection and integration efforts have increased privacy concerns, which has motivated data mining researchers to investigate the privacy implications of data mining and how data mining can be performed without violating privacy. In this paper, we first provide an overview of privacy preserving data mining focusing on distributed data sources, then we compare two technologies used in privacy preserving data mining. The first technology is encryption-based and was used in earlier approaches. The second technology is secret sharing, which has recently been considered as a more efficient approach.

Sabanci University Research Database
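
As a rough illustration of the secret-sharing technology named in the abstract above, the sketch below uses additive secret sharing to compute a sum without any party revealing its input. It is a textbook construction under stated assumptions (semi-honest parties, a hypothetical modulus `PRIME`, all parties simulated in one process) and is not taken from the paper.

```python
import secrets

PRIME = 2**61 - 1  # assumed modulus; must exceed any sum being computed


def share(value, n_parties, modulus=PRIME):
    """Split value into n additive shares that sum to value mod modulus."""
    if n_parties < 2:
        raise ValueError("need at least two parties")
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def reconstruct(shares, modulus=PRIME):
    """Recombine additive shares into the original value."""
    return sum(shares) % modulus


def secure_sum(private_values, modulus=PRIME):
    """Simulate n parties summing their inputs through additive shares."""
    n = len(private_values)
    # Party i splits its value; its j-th share would be sent to party j.
    all_shares = [share(v, n, modulus) for v in private_values]
    # Party j adds up the shares it received and publishes only that partial sum.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % modulus
                    for j in range(n)]
    return reconstruct(partial_sums, modulus)


if __name__ == "__main__":
    print(secure_sum([10, 20, 12]))
```

Each individual share is uniformly random, so a single share or partial sum reveals nothing about one party's input. The only arithmetic involved is modular addition, which is the usual intuition for why such schemes can be cheaper than encryption-based ones.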

Privacy preserving distributed spatio-temporal data mining

Author: İnan Ali
Publication date: 01/01/2006

Time-stamped location information is regarded as spatio-temporal data due to its time and space dimensions and, by its nature, is highly vulnerable to misuse. Privacy issues related to the collection, use and distribution of individuals’ location information are the main obstacles impeding knowledge discovery in spatio-temporal data. Suppressing identifiers from the data does not suffice, since movement trajectories can easily be linked to individuals using publicly available information such as home or work addresses. Another solution could be to employ existing privacy preserving data mining techniques. However, these techniques are not suitable, since the time-stamped location observations of an object are not plain, independent attributes of that object. Therefore, new privacy preserving data mining techniques are required to handle spatio-temporal data specifically. In this thesis, we propose a privacy preserving data mining technique and two preprocessing steps for data mining related to privacy preservation in spatio-temporal datasets: (1) distributed clustering, (2) centralized anonymization and (3) distributed anonymization. We also provide a security and efficiency analysis of our algorithms, which shows that, under reasonable conditions, privacy preservation with minimal leakage of sensitive information is achievable for data mining purposes.

Sabanci University Research Database

Learning structure and schemas from heterogeneous domains in networked systems: a survey

Author: Biba Marenglen
Xhafa Fatos
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2010

The rapidly growing number of available digital documents in various formats, and the possibility of accessing them through internet-based technologies in distributed environments, have made it necessary to develop solid methods for properly organizing and structuring documents in large digital libraries and repositories. Specifically, the extremely large size of document collections makes it impossible to organize such documents manually. Additionally, most of the documents exist in an unstructured form and do not follow any schemas. Therefore, research efforts in this direction are being dedicated to automatically inferring structure and schemas. This is essential in order to better organize huge collections as well as to retrieve documents effectively and efficiently in heterogeneous domains in networked systems. This paper presents a survey of the state-of-the-art methods for inferring structure from documents and schemas in networked environments. The survey is organized around the most important application domains, namely bio-informatics, sensor networks, social networks, P2P systems, automation and control, transportation and privacy preserving, for which we analyze the recent developments in dealing with unstructured data.

Crossref

UPCommons. Portal del coneixement obert de la UPC

Efficient distributed privacy preserving clustering

Author: Doğanay Mahir Can
Publication date: 01/01/2008

With recent growing concerns about data privacy, researchers have focused their attention on developing new algorithms to perform privacy preserving data mining. However, the methods proposed so far are either too inefficient to handle large datasets or trade privacy against the accuracy of data mining results. Secure multiparty computation helps researchers develop privacy preserving data mining algorithms without having to trade the quality of data mining results against data privacy. It also provides formal guarantees about privacy. On the other hand, algorithms based on secure multiparty computation often rely on computationally expensive cryptographic operations, which makes them infeasible in real-world scenarios. In this thesis, we study the problem of privacy preserving distributed clustering, propose an efficient and secure algorithm for this problem based on secret sharing, and compare it to the state of the art. Experiments show that our algorithm has a lower communication overhead and a much lower computation overhead than the state of the art.

Sabanci University Research Database

Privacy leaks in spatio-temporal trajectory publishing

Author: Kaplan Emre
Publication date: 01/01/2009

Trajectories are spatio-temporal traces of moving objects which contain valuable information to be harvested by spatio-temporal data mining techniques. Applications like city traffic planning, identification of evacuation routes, trend detection, and many more can benefit from trajectory mining. However, the trajectories of individuals often contain private and sensitive information, so anyone who possesses trajectory data must take special care when disclosing it. Removing identifiers from trajectories before release is not effective against linkage attacks, and rich sources of background information make matters worse. An alternative is to apply transformation techniques that map the given set of trajectories into another set in which the distances are preserved. This way, the actual trajectories are not released, but the distance information can still be used for data mining techniques such as clustering. In this thesis, we show that an unknown private trajectory can be reconstructed using the available background information together with the mutual distances released for data mining purposes. The background knowledge is in the form of known trajectories and extra information such as the speed limit. We provide analytical results which bound the number of known trajectories needed to reconstruct private trajectories. Experiments performed on real trajectory data sets show that the number of known samples needed is surprisingly small compared with the theoretical bounds.

Sabanci University Research Database
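
The risk described in the abstract above, that released distances plus known samples can reveal a private one, can be made concrete with a deliberately simplified sketch. It reduces the problem to a single 2D point (not a full trajectory) and assumes exact Euclidean distances to three known, non-collinear points. The function name and the setup are illustrative assumptions, not the thesis's attack.

```python
from math import hypot


def locate_from_distances(known, dists):
    """Recover an unknown 2D point from its distances to three known points.

    Subtracting the first circle equation |p - a_0|^2 = d_0^2 from the other
    two leaves a 2x2 linear system in p, solved here with Cramer's rule.
    """
    (x0, y0), (x1, y1), (x2, y2) = known
    d0, d1, d2 = dists
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = d0**2 - d1**2 + x1**2 + y1**2 - x0**2 - y0**2
    b2 = d0**2 - d2**2 + x2**2 + y2**2 - x0**2 - y0**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("known points are collinear; the position is ambiguous")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


if __name__ == "__main__":
    known_points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
    secret = (1.0, 2.0)  # the "private" location, used only to generate distances
    released = [hypot(secret[0] - x, secret[1] - y) for x, y in known_points]
    print(locate_from_distances(known_points, released))
```

In this toy setting three known points are enough to pin down the private one exactly, which gives some intuition for why distance-preserving releases can leak more than expected.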

Proxy-secure computation model: application to k-means clustering implementation, analysis and improvements

Author: Pattuk Erman
Publication date: 01/01/2010

Distributed privacy preserving data mining applications, where data is divided among several parties, require large amounts of network communication. To overcome this overhead, we propose a scheme that reduces remote computations in distributed data mining applications to local computations on trusted hardware. Cell BE is used to realize the trusted hardware, which acts as a proxy for the parties. We design a secure two-party computation protocol that can be instrumental in realizing non-colluding parties in privacy-preserving data mining applications. Each party is represented by a signed and encrypted thread on a separate core of Cell BE running in an isolated mode, whereby its execution and data are secured by hardware means. Our implementations and experiments demonstrate that the new scheme yields a significant speedup. It is also possible to increase the number of non-colluding parties on Cell BE, which extends the proposed technique to most distributed privacy-preserving data mining protocols in the literature that require several non-colluding parties.

Sabanci University Research Database

Towards understanding privacy-aware artificial intelligence

Author: Tritsarolis Andreas
Τριτσαρώλης Ανδρέας
Publication date: 18/11/2022

National Technical University of Athens, Master's Thesis. Interdisciplinary-Interdepartmental Postgraduate Studies Programme (D.P.M.S.) "Data Science and Machine Learning"

DSpace at NTUA