
    A method for treating dependencies between variables in a simulation risk analysis model

    Get PDF
    This thesis explores the need to recognise and accurately represent the interdependencies between uncertain quantitative components in a simulation model. The main aim of this research is therefore to help fill the gap between acknowledging the importance of modelling correlation and actually specifying and implementing a procedure for modelling accurate measures of Pearson's correlation. Two principal objectives are stated for the developed Research Correlation Model ("RCM"): (1) to generate Pearson-correlated paired samples of two continuous variables for which the sample correlation is a good approximation to the target correlation; and (2) to ensure that the sampled values of the two individual variables have very accurate means and variances. The results show that the samples generated by the RCM from the four chosen distributions have highly acceptable levels of precision when tested with χ² tests and others, and that the average improvement in the precision of correlation modelling exceeded 96 percent. Even with samples as small as 10, the worst-case correction factor is only just below 90 percent, with the average correction factor above 96 percent overall. When the sample size is 10, the RCM consistently generates samples whose correlation is markedly more precise than that generated by @RISK: the smallest observed improvement ratio of the RCM over @RISK is 2.3:1, occurring in a single case when medians were compared, while the average improvement ratio exceeded 100. It is concluded that the aim of specifying, formulating and developing a Pearson correlation model between a pair of continuous variables, suitable for incorporation into simulation models of complex applications, has been achieved successfully.
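
    The RCM's own sampling and correction procedure is not given in the abstract. As a minimal sketch of the underlying task, the following generates Pearson-correlated paired samples for two arbitrary continuous marginals using a Gaussian copula (a standard NORTA-style construction, not the thesis's method); the distributions and parameters are illustrative.

```python
# Illustrative sketch only: the thesis's RCM procedure is not reproduced here.
# This shows the standard NORTA-style idea of inducing a target Pearson
# correlation between two continuous variables via a correlated bivariate
# normal and inverse-CDF (percent-point) transforms.
import numpy as np
from scipy import stats

def correlated_pair(dist_x, dist_y, target_rho, n, seed=0):
    """Draw n paired samples from dist_x and dist_y whose sample Pearson
    correlation approximates target_rho, using a Gaussian copula."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, target_rho], [target_rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)  # correlated normals
    u = stats.norm.cdf(z)                                  # map to uniforms
    x = dist_x.ppf(u[:, 0])                                # inverse-CDF transforms
    y = dist_y.ppf(u[:, 1])
    return x, y

x, y = correlated_pair(stats.lognorm(s=0.5), stats.gamma(a=2.0), 0.7, 10_000)
print(np.corrcoef(x, y)[0, 1])  # close to, but not exactly, 0.7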

    Applying Secure Multi-party Computation in Practice

    Get PDF
    In this work, we present solutions for technical difficulties in deploying secure multi-party computation in real-world applications. We first give a brief overview of the current state of the art, bring out several shortcomings and address them. The main contribution of this work is an end-to-end process description of deploying secure multi-party computation for the first large-scale registry-based statistical study on linked databases. Involving large stakeholders such as government institutions also introduces non-technical requirements, such as signing contracts and negotiating with the Data Protection Agency.

    Privacy preserving data publishing with multiple sensitive attributes

    Get PDF
    Data mining is the process of extracting hidden predictive information from large databases; it has great potential to help governments, researchers and companies focus on the most significant information in their data warehouses. High-quality data and effective data publishing are needed to gain a high impact from the data mining process, yet there is a clear need to preserve individual privacy in the released data. Privacy-preserving data publishing is a research area concerned with eliminating privacy threats while still providing useful information in the released data. Datasets commonly include many sensitive attributes, may contain static or dynamic data, and may need to be published in multiple updated releases with different time stamps. As a concrete example, public opinions include highly sensitive information about an individual and may reflect a person's perspective, understanding, particular feelings, way of life, and desires. On the one hand, public opinion is often collected through a central server which keeps a user profile for each participant and needs to publish this data for researchers to analyse in depth. On the other hand, new privacy concerns arise and users' privacy can be at risk. A user's opinion is sensitive information and must be protected before and after data publishing. Each user's opinions concern only a few issues, while the total number of issues is huge, so an efficient model must handle multiple sensitive attributes. Furthermore, opinions are gathered and published periodically, and correlations between sensitive attributes in different releases may occur; the anonymization technique must therefore account for previous releases as well as the dependencies between released issues. This dissertation identifies a new privacy problem for public opinions. In addition, it presents two probabilistic anonymization algorithms based on the concepts of k-anonymity [1, 2] and l-diversity [3, 4] to solve the problems of publishing datasets with multiple sensitive attributes and of publishing dynamic datasets. The proposed algorithms provide a heuristic solution for multidimensional quasi-identifiers and multidimensional sensitive attributes using a probabilistic l-diverse definition. Experimental results show that these algorithms clearly outperform the existing algorithms in terms of anonymization accuracy.
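
    The dissertation's probabilistic algorithms are not reproduced in the abstract. As a minimal sketch of the two definitions it builds on, the following checks whether a released table satisfies k-anonymity and distinct l-diversity for a single sensitive attribute; the field names and records are illustrative.

```python
# Minimal sketch of the two guarantees the dissertation builds on:
# k-anonymity (each quasi-identifier combination appears >= k times) and
# distinct l-diversity (each such group holds >= l distinct sensitive values).
from collections import defaultdict

def check_k_anonymity_l_diversity(rows, quasi_ids, sensitive, k, l):
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_ids)    # equivalence-class key
        groups[key].append(row[sensitive])
    k_ok = all(len(vals) >= k for vals in groups.values())
    l_ok = all(len(set(vals)) >= l for vals in groups.values())
    return k_ok, l_ok

# Illustrative release with generalised quasi-identifiers.
release = [
    {"age": "20-29", "zip": "130**", "opinion": "A"},
    {"age": "20-29", "zip": "130**", "opinion": "B"},
    {"age": "30-39", "zip": "148**", "opinion": "A"},
    {"age": "30-39", "zip": "148**", "opinion": "C"},
]
print(check_k_anonymity_l_diversity(release, ["age", "zip"], "opinion", k=2, l=2))
# -> (True, True)
```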

    Modelling and inference for dynamic traffic networks : a thesis presented for the degree of Doctor of Philosophy in Statistics at Massey University, Palmerston North, New Zealand

    Get PDF
    Traffic congestion is a significant problem worldwide. With the noticeable rise in vehicle usage in recent years, and therefore in congestion, there has been a wealth of study into ways that congestion can be eased and the flow of traffic on the road improved. Controlling traffic congestion relies on good mathematical models of traffic systems: creating accurate and reliable traffic control systems is one of the crucial steps for active congestion control, and such systems generally use algorithms that depend on mathematical models of traffic. Day-to-day dynamic assignment models play a critical role in transport management and planning. These models can be either deterministic or stochastic and describe the day-to-day evolution of traffic flow across the network. This doctoral research is dedicated to understanding the difference between deterministic and stochastic models. Deterministic models have been well studied, but the properties of stochastic models are less well understood. We investigate how predictions of the long-term properties of the system differ between deterministic and stochastic models. We find that, in contrast to systems with a unique equilibrium, where the deterministic model can be a good approximation for the mean of the stochastic model, the situation is more complicated for a system with multiple equilibria. In such a case, even when deterministic and stochastic models appear to have comparable properties over a significant time frame, they may still behave very differently in the long run. Markov models are popular for stochastic day-to-day assignment. Properties of such models are difficult to analyse theoretically, so there has been interest in approximations which are more mathematically tractable. However, it is difficult to tell when an approximation will work well, both in a stationary state and during transient periods following a network disruption. The coefficient of reactivity introduced by Hazelton (2002) measures the degree to which a system reacts to a disruption. We propose that it can be used as a guide to when approximation models will work well, and we study this issue through a raft of numerical experiments. We find that the value of the coefficient of reactivity is useful in predicting the accuracy of approximation models, although its detailed interpretation depends to a modest degree on properties of the network such as its size and number of routes.
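
    As a minimal sketch of the deterministic/stochastic contrast described above (not a model from the thesis), the following simulates a Markov day-to-day assignment on a hypothetical two-route network with logit route choice, alongside the corresponding deterministic mean dynamic; all cost functions and parameters are assumed.

```python
# Illustrative sketch: a Markov day-to-day assignment on a hypothetical
# two-route network. Each day, N drivers choose route 1 with a logit
# probability based on yesterday's travel costs; the deterministic model
# iterates the mean of the same dynamic.
import math
import random

N = 100       # drivers (illustrative)
THETA = 0.1   # logit sensitivity, small enough for a stable mean dynamic
DAYS = 200

def costs(flow1):
    """Linear congestion costs on the two routes given route-1 flow."""
    return 10 + 0.10 * flow1, 12 + 0.08 * (N - flow1)

def p_route1(flow1):
    """Logit probability of choosing route 1 given yesterday's flows."""
    c1, c2 = costs(flow1)
    e1, e2 = math.exp(-THETA * c1), math.exp(-THETA * c2)
    return e1 / (e1 + e2)

# Stochastic (Markov) model: tomorrow's route-1 flow is Binomial(N, p).
flow = N // 2
for _ in range(DAYS):
    p = p_route1(flow)
    flow = sum(random.random() < p for _ in range(N))

# Deterministic counterpart: iterate the mean dynamic directly.
mean_flow = N / 2
for _ in range(DAYS):
    mean_flow = N * p_route1(mean_flow)

print(f"stochastic route-1 flow after {DAYS} days: {flow}")
print(f"deterministic mean flow after {DAYS} days: {mean_flow:.1f}")
```

    With a unique, stable equilibrium as here, the stochastic trajectory fluctuates around the deterministic one; the thesis's point is that this correspondence can break down when multiple equilibria exist.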

    Multivariate equi-width data swapping for private data publication

    No full text
    Also published in: Advances in Knowledge Discovery and Data Mining: 14th Pacific-Asia Conference, PAKDD 2010, Hyderabad, India, June 21-24, 2010: Proceedings, Part I / Mohammed J. Zaki, Jeffrey Xu Yu, B. Ravindran and Vikram Pudi (eds.), pp. 208-215.
    In many privacy-preserving applications, specific variables are required to be disturbed simultaneously in order to preserve correlations among them. Multivariate Equi-Depth Swapping (MEDS) is a natural solution in such cases, since it provides uniform privacy protection for each data tuple. However, this approach performs poorly both in computational complexity (essentially O(n³) for n data tuples) and in data utility for distance-based data analysis. This paper discusses the use of Multivariate Equi-Width Swapping (MEWS) to enhance utility preservation in such cases. With extensive theoretical analysis and experimental results, we show that MEWS can achieve privacy preservation similar to that of MEDS with only O(n) computational complexity.
    Yidong Li and Hong Shen
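
    The paper's full multivariate procedure is not shown in the abstract. The following is a single-attribute sketch of the equi-width swapping idea: partition the value range into equal-width bins and permute values only within each bin, so every perturbed value stays near the original. The data and bin count are illustrative.

```python
# Minimal single-attribute sketch of equi-width swapping: values are permuted
# only within equal-width bins, bounding how far any value can move.
# The paper's full multivariate MEWS procedure is more involved.
import random

def equi_width_swap(values, n_bins, seed=0):
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    bins = {}
    for i, v in enumerate(values):
        b = min(int((v - lo) / width), n_bins - 1)  # equal-width bin index
        bins.setdefault(b, []).append(i)
    out = list(values)
    for idx in bins.values():
        shuffled = idx[:]
        rng.shuffle(shuffled)                        # swap within the bin only
        for src, dst in zip(idx, shuffled):
            out[dst] = values[src]
    return out

data = [23, 25, 31, 38, 44, 47, 52, 58, 61, 69]
print(equi_width_swap(data, n_bins=3))
```

    Assigning values to equal-width bins is a single pass with no sorting, which is consistent with the abstract's O(n) complexity claim for MEWS.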

    Artificial intelligence for imaging in immunotherapy

    Get PDF

    Turvalise ühisarvutuse rakendamine

    Get PDF
    Data is useful only when it can be used, and the added value is especially great when data from different sources are combined. For example, by linking tax and education records, a government can analyse the return on investment of higher-education programmes. The same holds in the private sector: by linking banks' payment-obligation databases, customers with high credit risk can be identified more effectively. Such linking of databases, however, is often forbidden by confidentiality or privacy requirements, and justifiably so, because large linked data collections are attractive targets for hackers as well as for officials and database administrators who might abuse their access rights. Secure multi-party computation is a technology that protects against such attacks by allowing several parties to collaboratively analyse data without any of them accessing individual records, guarding user privacy against both insider and outsider threats. Since its first practical application in 2008, the technology has matured to the point where it is deployed in distributed applications over the internet and even offered as part of other services. In this work, we present solutions for technical difficulties in deploying secure multi-party computation in real-world applications. We first give a brief overview of the current state of the art, bring out several shortcomings and address them. The main contribution of this work is an end-to-end process description of deploying secure multi-party computation for the first large-scale registry-based statistical study on linked databases. Involving large stakeholders such as government institutions also introduces non-technical requirements, such as signing contracts and negotiating with the Data Protection Agency. Looking into the future, we propose to deploy secure multi-party computation technology as a service on a federated data exchange infrastructure; concretely, we propose extending X-tee, the data exchange layer of the Estonian state, with the Sharemind secure multi-party computation service. Such an architecture would allow many existing databases to be linked for studies efficiently and securely, without violating individual privacy, enabling privacy-preserving analysis to be carried out faster and more conveniently and thus promoting a more informed government.
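
    Neither Sharemind's protocols nor the study's deployment are reproduced here. As a minimal sketch of the core idea, assuming three honest-but-curious parties and additive secret sharing modulo 2³², the following computes a sum without any single party seeing an individual input.

```python
# Minimal sketch of additive secret sharing over a ring, the core idea behind
# systems like Sharemind (whose actual protocols are far richer than this).
# Each secret is split into three random shares; any single share reveals
# nothing, yet the parties can sum their shares locally to compute a total.
import secrets

RING = 2**32  # share arithmetic is done modulo 2^32

def share(secret):
    """Split a secret into three additive shares modulo RING."""
    s1, s2 = secrets.randbelow(RING), secrets.randbelow(RING)
    s3 = (secret - s1 - s2) % RING
    return [s1, s2, s3]

def reconstruct(shares):
    return sum(shares) % RING

# Three data owners' private incomes (illustrative values).
incomes = [1200, 2500, 1800]
shared = [share(v) for v in incomes]

# Party i holds one share of every input and sums them locally.
party_sums = [sum(s[i] for s in shared) % RING for i in range(3)]

# Only the reconstructed total is revealed, never an individual income.
print(reconstruct(party_sums))  # -> 5500
```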

    Special Topics in Information Technology

    Get PDF
    This open access book presents thirteen outstanding doctoral dissertations in Information Technology from the Department of Electronics, Information and Bioengineering, Politecnico di Milano, Italy. Information Technology has always been highly interdisciplinary, as many aspects have to be considered in IT systems. The doctoral studies program in IT at Politecnico di Milano emphasizes this interdisciplinary nature, which is becoming more and more important in recent technological advances, in collaborative projects, and in the education of young researchers. Accordingly, the focus of advanced research is on pursuing a rigorous approach to specific research topics starting from a broad background in various areas of Information Technology, especially Computer Science and Engineering, Electronics, Systems and Control, and Telecommunications. Each year, more than 50 PhDs graduate from the program. This book gathers the outcomes of the thirteen best theses defended in 2019-20 and selected for the IT PhD Award. Each of the authors provides a chapter summarizing his/her findings, including an introduction, description of methods, main achievements and future work on the topic. Hence, the book provides a cutting-edge overview of the latest research trends in Information Technology at Politecnico di Milano, presented in an easy-to-read format that will also appeal to non-specialists.

    Aggregating privatized medical data for secure querying applications

    Full text link
    This thesis analyses the challenges of aggregating sensitive data and of querying the aggregated data at a cloud server. It also delineates applications of aggregating sensitive medical data in several scenarios, and evaluates privatization techniques that help strengthen both privacy and utility.
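
    The abstract does not name the privatization techniques that were tested. As one common example of privatizing an aggregate before it is stored or queried, this sketch applies the Laplace mechanism from differential privacy to a count query; the dataset, query and epsilon are illustrative.

```python
# One common privatization technique (an assumption, not necessarily the
# thesis's method): the differential-privacy Laplace mechanism applied to a
# count query before the aggregate is released.
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise via inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon):
    """Release a count with epsilon-differential privacy.
    A count query has sensitivity 1, so the noise scale is 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical privatized query: how many patients are over 60?
patients = [{"age": 72}, {"age": 45}, {"age": 63}, {"age": 58}]
print(private_count(patients, lambda r: r["age"] > 60, epsilon=0.5))
```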