
    Estimating parameters for probabilistic linkage of privacy-preserved datasets.

    Background: Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. The linkage strategy and associated match probabilities are often estimated through investigations into data quality and manual inspection. However, as privacy-preserved datasets comprise encrypted data, such methods are not possible. In this paper, we present a method for estimating the probabilities and threshold values for probabilistic privacy-preserved record linkage using Bloom filters. Methods: Our method was tested through a simulation study using synthetic data, followed by an application using real-world administrative data. Synthetic datasets were generated with error rates from zero to 20%. Our method was used to estimate parameters (probabilities and thresholds) for de-duplication linkages. Linkage quality was assessed using the F-measure. Each dataset was privacy-preserved using separate Bloom filters for each field. Match probabilities were estimated using the expectation-maximisation (EM) algorithm on the privacy-preserved data. Threshold cut-off values were determined by an extension to the EM algorithm that allows linkage quality to be estimated for each possible threshold. De-duplication linkages of each privacy-preserved dataset were performed using both estimated and calculated probabilities. The F-measure at the estimated threshold values was also compared to the highest achievable F-measure. Three large administrative datasets were used to demonstrate the applicability of the probability and threshold estimation technique on real-world data. Results: Linkage of the synthetic datasets using the estimated probabilities produced an F-measure comparable to that obtained with calculated probabilities, even with up to 20% error. Linkage of the administrative datasets using estimated probabilities produced an F-measure higher than that obtained with calculated probabilities. Further, the threshold estimation yielded F-measure values only slightly below the highest possible for those probabilities. Conclusions: The method appears highly accurate across a spectrum of datasets with varying degrees of error. As there are few alternatives for parameter estimation, the approach is a major step towards providing a complete operational approach for probabilistic linkage of privacy-preserved datasets.
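
    A minimal sketch, not the authors' implementation, of the field-level Bloom-filter encoding and Dice-coefficient comparison on which privacy-preserved linkage of this kind is typically built; the filter length, number of hash functions, and q-gram size below are illustrative assumptions, and the EM-based estimation of match probabilities and thresholds described in the abstract is not reproduced here.

```python
# Sketch of Bloom-filter encoding of a single field and similarity scoring for
# privacy-preserving record linkage. All parameters are illustrative assumptions.
import hashlib

BF_LEN = 1000      # bits per field-level Bloom filter (assumed)
NUM_HASHES = 20    # hash functions per q-gram (assumed)
Q = 2              # q-gram size (bigrams)

def qgrams(value: str, q: int = Q) -> set:
    """Split a padded field value into q-grams."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def bloom_encode(value: str) -> set:
    """Encode a field as the set of Bloom-filter bit positions that are set."""
    bits = set()
    for gram in qgrams(value):
        for k in range(NUM_HASHES):
            digest = hashlib.sha256(f"{k}:{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % BF_LEN)
    return bits

def dice_similarity(a: set, b: set) -> float:
    """Dice coefficient of two encoded fields; 1.0 means identical bit patterns."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Pairwise comparison of the same (encrypted) field from two records.
sim = dice_similarity(bloom_encode("johnson"), bloom_encode("jonson"))
print(f"Dice similarity: {sim:.3f}")  # remains high despite the typo
```

    In a full pipeline, similarities of this kind feed the pairwise comparison vectors from which match probabilities and the decision threshold are then estimated.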

    A Practical, Accurate, Information Criterion for Nth Order Markov Processes

    The recent increase in the breadth of computational methodologies has been matched with a corresponding increase in the difficulty of comparing the relative explanatory power of models from different methodological lineages. In order to help address this problem, a Markovian information criterion (MIC) is developed that is analogous to the Akaike information criterion (AIC) in its theoretical derivation and yet can be applied to any model able to generate simulated or predicted data, regardless of its methodology. Both the AIC and the proposed MIC rely on the Kullback–Leibler (KL) distance between model predictions and real data as a measure of prediction accuracy. Instead of using the maximum likelihood approach like the AIC, the proposed MIC relies on the literal interpretation of the KL distance as the inefficiency of compressing real data using modelled probabilities, and therefore uses the output of a universal compression algorithm to obtain an estimate of the KL distance. Several Monte Carlo tests are carried out in order to (a) confirm the performance of the algorithm and (b) evaluate the ability of the MIC to identify the true data-generating process from a set of alternative models.
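
    A toy illustration, under assumptions that are not from the paper, of the compression-based reading of the KL distance that the MIC exploits: the achievable code length of the real data is approximated by a universal compressor (Python's lzma), the cost of coding the real data with a candidate model's probabilities is computed directly, and their difference approximates the KL inefficiency. The binary Markov setup, Laplace smoothing, and helper names below are illustrative; the MIC's actual derivation and correction terms are not reproduced.

```python
# Toy estimate of KL inefficiency: (bits/symbol to code the real data with a
# model's probabilities) minus (bits/symbol achieved by a universal compressor).
import lzma
import math
import random

def markov_chain(n, p_stay, rng):
    """Binary first-order Markov chain that keeps its state with probability p_stay."""
    state, out = 0, []
    for _ in range(n):
        if rng.random() > p_stay:
            state = 1 - state
        out.append(state)
    return out

def transition_probs(sim):
    """First-order transition probabilities estimated from a model's simulated output."""
    counts = {(a, b): 1.0 for a in (0, 1) for b in (0, 1)}  # Laplace smoothing
    for a, b in zip(sim, sim[1:]):
        counts[(a, b)] += 1.0
    return {(a, b): counts[(a, b)] / (counts[(a, 0)] + counts[(a, 1)])
            for a in (0, 1) for b in (0, 1)}

def cross_entropy_bits(real, probs):
    """Bits per symbol needed to code the real data with the modelled probabilities."""
    total = sum(-math.log2(probs[(a, b)]) for a, b in zip(real, real[1:]))
    return total / (len(real) - 1)

def compressed_bits_per_symbol(seq):
    """Universal-compressor proxy for the achievable code length of the real data."""
    return 8 * len(lzma.compress(bytes(seq))) / len(seq)

rng = random.Random(0)
real = markov_chain(20000, 0.95, rng)                 # "real" data-generating process
model_a = markov_chain(20000, 0.95, rng)              # candidate with the right dynamics
model_b = [rng.randrange(2) for _ in range(20000)]    # memoryless candidate

baseline = compressed_bits_per_symbol(real)
for name, sim in (("A", model_a), ("B", model_b)):
    score = cross_entropy_bits(real, transition_probs(sim)) - baseline
    print(f"model {name}: approx. KL inefficiency = {score:.3f} bits/symbol")
```

    A lower score indicates that the candidate's probabilities describe the real data more efficiently, which is the sense in which the MIC ranks competing models.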

    Cryptanalysis of Masked Ciphers: A not so Random Idea

    A new approach to the security analysis of hardware-oriented masked ciphers against second-order side-channel attacks is developed. By relying on techniques from symmetric-key cryptanalysis, concrete security bounds are obtained in a variant of the probing model that allows the adversary to make only a bounded, but possibly very large, number of measurements. Specifically, it is formally shown how a bounded-query variant of robust probing security can be reduced to the linear cryptanalysis of masked ciphers. As a result, the compositional issues of higher-order threshold implementations can be overcome without relying on fresh randomness. From a practical point of view, the aforementioned approach makes it possible to transfer many of the desirable properties of first-order threshold implementations, such as their low randomness usage, to the second-order setting. For example, a straightforward application to the block cipher LED results in a masking using less than 700 random bits including the initial sharing. In addition, the cryptanalytic approach introduced in this paper provides additional insight into the design of masked ciphers and allows for a quantifiable trade-off between security and performance.
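
    A minimal sketch, outside the scope of the paper, of the Boolean sharing on which such masked implementations rest: a secret byte is split into three XOR-shares so that any two shares in isolation are uniformly random, which is what second-order masking protects. The reduction to linear cryptanalysis and the LED masking described in the abstract are not reproduced here.

```python
# Second-order Boolean masking in miniature: split a secret byte into three
# XOR-shares and recombine them; any strict subset of shares leaks nothing.
import secrets

def share(secret: int, n_shares: int = 3) -> list:
    """Split a byte into n XOR-shares (three shares for second-order masking)."""
    parts = [secrets.randbelow(256) for _ in range(n_shares - 1)]
    last = secret
    for p in parts:
        last ^= p
    return parts + [last]

def unshare(shares: list) -> int:
    """Recombine the shares by XOR to recover the secret."""
    value = 0
    for s in shares:
        value ^= s
    return value

secret = 0xA5
shares = share(secret)
assert unshare(shares) == secret
print([hex(s) for s in shares], "->", hex(unshare(shares)))
```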

    Heterogeneous system GI/GI(n)/∞ with random customers capacities

    In this paper, we consider a queuing system with n types of customers. We assume that each customer arrives at the queue according to a renewal process and requires a random resource amount, independent of its service time. We write the Kolmogorov integro-differential equation, which, in general, cannot be solved analytically. Hence, we look for the solution under the condition of an infinitely growing service time and obtain multi-dimensional asymptotic approximations. We show that the n-dimensional probability distribution of the total resource amounts is asymptotically Gaussian, and we assess its accuracy via the Kolmogorov distance.
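
    A Monte Carlo sketch under simplifying assumptions not made in the paper (a single customer type with exponential inter-arrival, service, and resource distributions): it samples the total resource amount held in an infinite-server system at a fixed time and measures the Kolmogorov distance between its empirical distribution and a fitted Gaussian, the kind of accuracy check the abstract refers to.

```python
# Simulate the total resource amount in service at time T and compare its
# empirical distribution with a Gaussian fit via the Kolmogorov distance.
import math
import random
import statistics

def total_resource_at(T: float, lam: float, mu: float) -> float:
    """Total resource of customers still in service at time T (one replication)."""
    t, total = 0.0, 0.0
    while True:
        t += random.expovariate(lam)        # renewal (here exponential) arrivals
        if t > T:
            return total
        service = random.expovariate(mu)    # service time
        resource = random.expovariate(1.0)  # resource amount, independent of service time
        if t + service > T:                 # customer still occupies its resource at T
            total += resource

def normal_cdf(x: float, mean: float, std: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

random.seed(1)
samples = sorted(total_resource_at(T=50.0, lam=20.0, mu=0.5) for _ in range(2000))
mean, std = statistics.fmean(samples), statistics.stdev(samples)

# Kolmogorov distance between the empirical CDF and the fitted Gaussian.
n = len(samples)
distance = max(
    max(abs((i + 1) / n - normal_cdf(x, mean, std)),
        abs(i / n - normal_cdf(x, mean, std)))
    for i, x in enumerate(samples)
)
print(f"Kolmogorov distance to the Gaussian fit: {distance:.3f}")
```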

    Optimal estimation of the states of synchronous generalized flow of events of the second order under its complete observability

    We consider the problem of optimal estimation of the states of a synchronous generalized flow of events of the second order with two states; this flow is one of the mathematical models for an incoming stream of claims (events) in integrated services digital networks and belongs to the class of Markov chains. The observation conditions for this flow are such that each event is accessible to observation. We propose an optimal estimation algorithm for the flow states, in which the decision about the flow state is made by the criterion of maximum a posteriori probability. The results of the analytical calculation of the posterior probability and of simulation experiments with numerical results are presented.
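
    A generic, simplified sketch of the maximum a posteriori decision rule described here, under assumptions that differ from the paper's flow model: the state is allowed to change only at event epochs and the inter-event time is exponential with a state-dependent rate, so the recursion below is an ordinary two-state hidden-Markov filter rather than the paper's explicit posterior formulas. The rates and transition matrix are illustrative.

```python
# Two-state MAP filtering on observed inter-event intervals (simplified model).
import math

RATES = [2.0, 10.0]                    # event rate in state 0 / state 1 (assumed)
TRANS = [[0.95, 0.05], [0.10, 0.90]]   # state transition matrix at event epochs (assumed)

def map_states(intervals: list) -> list:
    """Return the MAP state decision after each observed inter-event interval."""
    posterior = [0.5, 0.5]             # uniform prior over the two states
    decisions = []
    for tau in intervals:
        unnormalised = []
        for j in range(2):
            # predict the state, then weight by the likelihood of the observed interval
            predicted = sum(posterior[i] * TRANS[i][j] for i in range(2))
            likelihood = RATES[j] * math.exp(-RATES[j] * tau)
            unnormalised.append(predicted * likelihood)
        total = sum(unnormalised)
        posterior = [u / total for u in unnormalised]
        decisions.append(max(range(2), key=lambda j: posterior[j]))  # MAP decision
    return decisions

print(map_states([0.6, 0.5, 0.7, 0.05, 0.08, 0.04]))  # observed inter-event intervals
```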

    Optimal state estimation of semi-synchronous event flow of the second order under its complete observability

    We consider the optimal estimation problem for the states of a semi-synchronous event flow of the second order with two states; this flow is one of the adequate mathematical models for an incoming stream of claims (events) in modern integrated services digital networks, telecommunication systems, and satellite communication networks. We find an explicit form for the posterior probabilities of the flow states. The decision about the flow state is made by the maximum a posteriori criterion.

    Estimation of the probability density parameters of the interval duration between events in correlated semi-synchronous event flow of the second order by the method of moments

    We consider a correlated semi-synchronous event flow of the second order with two states; this flow is one of the mathematical models for an incoming stream of claims (events) in modern integrated services digital networks, telecommunication systems, and satellite communication networks. We solve the problem of estimating the parameters of the probability density of the interval duration between the moments of event occurrence by the method of moments, for both general and special cases of the flow parameters. The results of statistical experiments performed on a simulation model of the flow are given.

    Distribution Parameters Estimation in Recurrent Synchronous Generalized Doubly Stochastic Flow of the Second Order

    We solve the problem of estimating the probability density parameters of the inter-event interval duration in a synchronous generalized flow of the second order, which can be used as a powerful mathematical model for arrival processes in queuing systems and networks. The explicit form of the parameter estimates is determined by the method of moments on the basis of observations of the doubly stochastic flow, under recurrence conditions formulated in terms of the joint probability density of the durations of two adjacent inter-event intervals. The quality of the estimates is assessed using a simulation model that reproduces the flow behavior under conditions of complete observability.
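
    A minimal sketch of the method of moments on inter-event intervals, under an assumption that is not from the paper: the interval density is taken here to be a two-parameter gamma density, so matching the first two sample moments gives closed-form estimates. The paper derives analogous moment estimates for the recurrent synchronous generalized flow, whose density has a different form.

```python
# Method-of-moments estimation of a gamma inter-event density from observed intervals.
import random
import statistics

def gamma_moment_estimates(intervals: list) -> tuple:
    """Match the sample mean and variance to a gamma(shape, scale) density."""
    mean = statistics.fmean(intervals)
    var = statistics.variance(intervals)
    return mean * mean / var, var / mean   # (shape, scale)

# Check the estimator on simulated intervals with known parameters.
random.seed(2)
true_shape, true_scale = 2.5, 0.4
sample = [random.gammavariate(true_shape, true_scale) for _ in range(10000)]
shape_hat, scale_hat = gamma_moment_estimates(sample)
print(f"estimated shape = {shape_hat:.2f}, scale = {scale_hat:.2f}")  # near 2.5 and 0.4
```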