16 research outputs found

    PrivaTree: Collaborative Privacy-Preserving Training of Decision Trees on Biomedical Data

    Biomedical data generation and collection have become faster and more ubiquitous. Consequently, datasets are increasingly spread across hospitals, research institutions, or other entities. Exploiting such distributed datasets simultaneously can be beneficial; in particular, classification using machine learning models such as decision trees is becoming increasingly common and important. However, because biomedical data is highly sensitive, sharing data records across entities or centralizing them in one location is often prohibited due to privacy concerns or regulations. We design PrivaTree, an efficient and privacy-preserving protocol for collaborative training of decision tree models on distributed, horizontally partitioned biomedical datasets. Although decision tree models may not always be as accurate as neural networks, they have better interpretability and are helpful in decision-making processes, which are crucial for biomedical applications. PrivaTree follows a federated learning approach, where raw data is not shared and where every data provider computes, on its private dataset, updates to a global decision tree model being trained. This is followed by privacy-preserving aggregation of these updates using additive secret-sharing, in order to collaboratively update the model. We implement PrivaTree and evaluate its computational and communication efficiency on three different biomedical datasets, as well as the accuracy of the resulting models. Compared to the model centrally trained on all data records, the obtained collaborative model presents a modest loss of accuracy, while consistently outperforming the accuracy of the local models, trained separately by each data provider. Moreover, PrivaTree is more efficient than existing solutions, which makes it usable for training decision trees with numerous nodes on large, complex datasets with both continuous and categorical attributes, as often found in the biomedical field.
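
    The aggregation step mentioned in the abstract above can be illustrated with a minimal sketch of additive secret-sharing. This is not PrivaTree's actual protocol: the modulus, the function names (`share`, `reconstruct`, `secure_sum`), and the idea that each provider contributes a single integer (for example, a local count at a candidate split) are all assumptions made for illustration.

```python
import secrets

# Assumed modulus for the illustration; the abstract does not specify one.
MODULUS = 2**61 - 1


def share(value, n_parties, modulus=MODULUS):
    """Split an integer into n_parties additive shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def reconstruct(shares, modulus=MODULUS):
    """Recombine additive shares into the shared value."""
    return sum(shares) % modulus


def secure_sum(private_values, modulus=MODULUS):
    """Hypothetical aggregation: sum private integers without revealing any single one.

    Party i splits its value into one share per party and sends share j to party j.
    Each party adds up the shares it received; only these partial sums are combined.
    """
    n = len(private_values)
    all_shares = [share(v, n, modulus) for v in private_values]
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    return reconstruct(partial_sums, modulus)
```

    For example, `secure_sum([12, 30, 7])` returns 49, while each individual share is a uniformly random value that reveals nothing about its provider's count on its own.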

    Privacy-Preserving Decision Tree Training and Prediction against Malicious Server

    Privacy-preserving machine learning enables secure outsourcing of machine learning tasks to an untrusted service provider (server) while preserving the privacy of the user's data (client). Attaining good concrete efficiency for complicated machine learning tasks, such as training decision trees, is one of the challenges in this area. Prior works on privacy-preserving decision trees required the parties to have comparable computational resources, and instructed the client to perform computation proportional to the complexity of the entire task. In this work we present new protocols for privacy-preserving decision trees, for both training and prediction, achieving the following desirable properties: 1. Efficiency: the client's complexity is independent of the training-set size during training, and of the tree size during prediction. 2. Security: privacy holds against malicious servers. 3. Practical usability: high accuracy, fast prediction, and feasible training demonstrated on standard UCI datasets, encrypted with fully homomorphic encryption. To the best of our knowledge, our protocols are the first to offer all these properties simultaneously. The core of our work consists of two technical contributions. First, a new low-degree polynomial approximation for functions, leading to faster protocols for training and prediction on encrypted data. Second, a design of an easy-to-use mechanism for proving privacy against malicious adversaries that is suitable for a wide family of protocols, and in particular, our protocols; this mechanism could be of independent interest.

    A survey of state-of-the-art methods for securing medical databases

    This review article presents a survey of recent work on state-of-the-art methods for securing medical databases. We concentrate on three main directions that have received attention recently: attribute-based encryption for enabling secure access to confidential medical databases distributed among several data centers; homomorphic encryption for providing answers to confidential queries in a secure manner; and privacy-preserving data mining used to analyze data stored in medical databases for verifying hypotheses and discovering trends. Only the most recent and significant work has been included.

    Aggregating privatized medical data for secure querying applications

    This thesis analyses the challenges of aggregating sensitive data and of querying aggregated data on a cloud server. It also delineates applications of aggregating sensitive medical data in several scenarios, and tests privatization techniques intended to improve the strength of privacy and utility.

    Design and Implementation of a Scalable Crowdsensing Platform for Geospatial Data

    In recent years, smart devices and small low-powered sensors have become ubiquitous and nearly everything is connected, which is a promising foundation for crowdsensing data related to various environmental and societal phenomena. Very often, such data is especially meaningful when related to time and location, which is possible thanks to the GPS capabilities already built into modern smart devices. However, in order to gain knowledge from high-volume crowd-sensed data, it has to be collected and stored in a central platform, where it can be processed and transformed for various use cases. Conventional approaches built around classical relational databases and monolithic backends, which load and process the geospatial data on a per-request basis, are not suitable for supporting the data requests of a large crowd willing to visualize phenomena. The possibly millions of data points introduce challenges for calculation, data transfer, and visualization on smartphones with limited graphics performance. We have created an architectural design that combines a cloud-native approach with Big Data concepts used in the Internet of Things. The architectural design can be used as a generic foundation to implement a scalable backend for a platform that covers aspects important for crowdsensing, such as social and incentive features, as well as a sophisticated stream processing concept to process incoming measurement data and store pre-aggregated results. The calculation is based on a global grid system that indexes geospatial data for efficient aggregation and builds a hierarchical geospatial relationship of averaged values, which can be used directly to provide data rapidly and efficiently on requests for visualization. We introduce the Noisemap project as an exemplary use case of such a platform and elaborate on certain requirements and challenges, including those related to frontend implementations. The goal of the project is to collect crowd-sensed noise measurements via smartphones and provide users with information and a visualization of noise levels in their environment, which requires storing and processing in a central platform. A prototype implementation for the measurement context of the Noisemap project shows that the architectural design is indeed feasible to realize.
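
    The idea of pre-aggregating measurements in a hierarchical grid, as described in the abstract above, can be sketched roughly as follows. The abstract does not name the grid system used, so this sketch substitutes a simple latitude/longitude square grid whose cell size halves at each level; `cell_id`, `GridAggregator`, and the sample coordinates are hypothetical.

```python
import math
from collections import defaultdict


def cell_id(lat, lon, level):
    """Map a coordinate to a grid cell; level 0 uses 1-degree cells (an assumption)."""
    size = 1.0 / (2 ** level)
    return (level, math.floor(lat / size), math.floor(lon / size))


class GridAggregator:
    """Keeps a running sum and count per cell at every level of the hierarchy."""

    def __init__(self, max_level=3):
        self.max_level = max_level
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def add(self, lat, lon, value):
        # Update the enclosing cell at each level, so averages are pre-aggregated.
        for level in range(self.max_level + 1):
            cid = cell_id(lat, lon, level)
            self.sums[cid] += value
            self.counts[cid] += 1

    def average(self, lat, lon, level):
        # Look up the stored aggregate; no raw measurements are scanned.
        cid = cell_id(lat, lon, level)
        count = self.counts.get(cid, 0)
        if count == 0:
            return None
        return self.sums[cid] / count


grid = GridAggregator(max_level=3)
grid.add(48.40, 9.98, 60.0)  # hypothetical noise readings in dB
grid.add(48.90, 9.20, 70.0)
```

    A visualization request for a zoomed-out view would read the coarse cell (`grid.average(48.40, 9.98, 0)`), while a zoomed-in view would read the fine cell (`grid.average(48.40, 9.98, 3)`), in both cases without touching the raw measurements.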

    An evaluation of the challenges of Multilingualism in Data Warehouse development

    In this paper we discuss Business Intelligence and define what is meant by support for Multilingualism in a Business Intelligence reporting context. We identify support for Multilingualism as a challenging issue which has implications for data warehouse design and reporting performance. Data warehouses are a core component of most Business Intelligence systems, and the star schema is the approach most widely used to develop data warehouses and dimensional Data Marts. We discuss the way in which Multilingualism can be supported in the Star Schema and identify that current approaches have serious limitations, which include data redundancy as well as data manipulation, performance, and maintenance issues. We propose a new approach to enable the optimal application of multilingualism in Business Intelligence. The proposed approach was found to produce satisfactory results when used in a proof-of-concept environment. Future work will include testing the approach in an enterprise environment.

    Recent Advances in Social Data and Artificial Intelligence 2019

    The importance and usefulness of subjects and topics involving social data and artificial intelligence are becoming widely recognized. This book contains invited review, expository, and original research articles dealing with, and presenting state-of-the-art accounts of, the recent advances in the subjects of social data and artificial intelligence, and potentially their links to Cyberspace.

    24th International Conference on Information Modelling and Knowledge Bases

    In the last three decades information modelling and knowledge bases have become important subjects not only in academic communities related to information systems and computer science but also in the business area where information technology is applied. The European-Japanese Conference on Information Modelling and Knowledge Bases (EJC) series originally started as a co-operation initiative between Japan and Finland in 1982. The practical operations were then organised by professor Ohsuga in Japan and professors Hannu Kangassalo and Hannu Jaakkola in Finland (Nordic countries). The geographical scope has expanded to cover Europe and also other countries. A workshop character is typical for the conference: discussion, enough time for presentations, and a limited number of participants (50) and papers (30). Suggested topics include, but are not limited to: 1. Conceptual modelling: Modelling and specification languages; Domain-specific conceptual modelling; Concepts, concept theories and ontologies; Conceptual modelling of large and heterogeneous systems; Conceptual modelling of spatial, temporal and biological data; Methods for developing, validating and communicating conceptual models. 2. Knowledge and information modelling and discovery: Knowledge discovery, knowledge representation and knowledge management; Advanced data mining and analysis methods; Conceptions of knowledge and information; Modelling information requirements; Intelligent information systems; Information recognition and information modelling. 3. Linguistic modelling: Models of HCI; Information delivery to users; Intelligent informal querying; Linguistic foundation of information and knowledge; Fuzzy linguistic models; Philosophical and linguistic foundations of conceptual models. 4. Cross-cultural communication and social computing: Cross-cultural support systems; Integration, evolution and migration of systems; Collaborative societies; Multicultural web-based software systems; Intercultural collaboration and support systems; Social computing, behavioral modeling and prediction. 5. Environmental modelling and engineering: Environmental information systems (architecture); Spatial, temporal and observational information systems; Large-scale environmental systems; Collaborative knowledge base systems; Agent concepts and conceptualisation; Hazard prediction, prevention and steering systems. 6. Multimedia data modelling and systems: Modelling multimedia information and knowledge; Content-based multimedia data management; Content-based multimedia retrieval; Privacy and context enhancing technologies; Semantics and pragmatics of multimedia data; Metadata for multimedia information systems. Overall we received 56 submissions. After careful evaluation, 16 papers have been selected as long papers, 17 papers as short papers, 5 papers as position papers, and 3 papers for presentation of perspective challenges. We thank all colleagues for their support of this issue of the EJC conference, especially the program committee, the organising committee, and the programme coordination team. The long and the short papers presented in the conference are revised after the conference and published in the Series of “Frontiers in Artificial Intelligence” by IOS Press (Amsterdam). The books “Information Modelling and Knowledge Bases” are edited by the Editing Committee of the conference. We believe that the conference will be productive and fruitful in the advance of research and application of information modelling and knowledge bases. Bernhard Thalheim, Hannu Jaakkola, Yasushi Kiyok