    Supervised learning using a symmetric bilinear form for record linkage

    Record linkage is used to link records in two different files that correspond to the same individuals. These algorithms are used for database integration. In data privacy, they are used to evaluate the disclosure risk of a protected data set by linking records that belong to the same individual. The degree of success when linking the original (unprotected) data with the protected data gives an estimate of the disclosure risk. In this paper we propose a new parameterized aggregation operator and a supervised learning method for disclosure risk assessment. The parameterized operator is a symmetric bilinear form and the supervised learning method is formalized as an optimization problem. The target of the optimization problem is to find the values of the aggregation parameters that maximize the number of re-identifications (or correct links). We evaluate and compare our proposal with non-parameterized variations of record linkage, such as those using the Mahalanobis distance and the Euclidean distance (one of the most widely used approaches for this purpose). We also compare it with previously presented parameterized aggregation operators for record linkage, such as the weighted mean and the Choquet integral. These comparisons show that the proposed aggregation operator outperforms, or at least achieves results similar to, the other parameterized operators. We also study which conditions on the optimization problem are necessary for the described aggregation functions to be metric functions.
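As a rough illustration of the idea in this abstract, the sketch below links each protected record to its nearest original record under a distance of the form (a - b)^T W (a - b), where W is a symmetric weight matrix. It is not the authors' method: the function names, the toy data, and the choice of the identity matrix for W (which reduces the distance to the squared Euclidean distance) are assumptions made for the example, and the supervised step that learns W is left out.

```python
def bilinear_distance(a, b, weights):
    """Return (a - b)^T W (a - b) for a symmetric matrix W given as nested lists."""
    diff = [x - y for x, y in zip(a, b)]
    n = len(diff)
    return sum(diff[i] * weights[i][j] * diff[j] for i in range(n) for j in range(n))


def link_records(original, protected, weights):
    """Link each protected record to the index of its nearest original record."""
    links = []
    for p in protected:
        distances = [bilinear_distance(o, p, weights) for o in original]
        links.append(distances.index(min(distances)))
    return links


def correct_links(links):
    """Count re-identifications, assuming protected[i] was derived from original[i]."""
    return sum(1 for i, j in enumerate(links) if i == j)


# Hypothetical toy data: three records with two numerical attributes each.
identity = [[1.0, 0.0], [0.0, 1.0]]  # W = I gives the squared Euclidean distance
original = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
protected = [[1.2, 11.0], [1.9, 19.0], [3.1, 29.5]]

links = link_records(original, protected, identity)
print(links, correct_links(links))
```

Under these assumptions, the learning step described in the abstract would search over the entries of W for the matrix that maximizes the value returned by `correct_links`.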

    FORMALIZING INFORMALITY: THE PRAEDIAL REGISTRATION SYSTEM IN PERU

    The Praedial Property Registration system has been presented as an alternative to traditional registries for the formalization of immovable property. Much of the earlier design and pilot work for the Praedial Property Registration system was done by the Peruvian private organization Instituto Libertad y Democracia (ILD). They claim that in Peru they "have formalized over 150,000 properties much more quickly, and at dramatically less costs, than traditional titling and registration programs" in three-and-a-half years during the early 1990s. This property formalization system has been trademarked as PROFORM. It is being offered to other countries as a quick and inexpensive way to convert informal property in the hands of a large proportion of the population into legally recognized private property, and as a source of capital for the grassroots development of these countries. This study assesses the functioning of this system in Peru and its replicability in other countries. There is no easily accessible documentation on how this property formalization program has actually functioned in Peru, and it is therefore difficult for development agencies to determine its applicability elsewhere. This assessment of the Registro Predial in Peru is an attempt to document the functioning of an important component of this formalization program. The study examines different aspects of property formalization and related institutions and processes. The scope of this assessment, therefore, includes not only the Registro Predial registration system, but also the titling process (prior to registration) and the creditworthiness and credit opportunities for titled and registered property in both urban and rural areas in Lima that fall under the jurisdiction of the Registro Predial. The study also examines the concepts and legal framework of titling, registration, ownership rights, and possession rights within the Peruvian context.
    Keywords: Land titles--Registration and transfer--Peru, Land tenure--Government policy--Peru, Land administration--Peru, Land Economics/Use

    Semantic privacy-preserving framework for electronic health record linkage

    The combination of digitized health information and web-based technologies offers many possibilities for data analysis and business intelligence. In the healthcare and biomedical research domain, applications that depend on electronic health records (EHRs) identify privacy preservation as a major concern. Because of the potential for patient re-identification, existing solutions cannot always satisfy evolving research demands such as linking patient records across organizational boundaries. In this work, we show how semantic methods can be applied to support the formulation and enforcement of access control policy whilst ensuring that privacy leakage can be detected and prevented. The work is illustrated through a case study associated with the Australasian Diabetes Data Network (ADDN – www.addn.org.au), the national paediatric type-1 diabetes data registry, and the Australian Urban Research Infrastructure Network (AURIN – www.aurin.org.au) platform that supports Australia-wide access to urban and built environment data sets. We demonstrate that by extending the eXtensible Access Control Markup Language (XACML) with semantic capabilities, finer-grained access control encompassing data risk disclosure mechanisms can be supported. We discuss the contributions that this approach can make to socio-economic development and political management within business systems, especially in situations where secure data access and data linkage are required.

    Automatic privacy and utility evaluation of anonymized documents via deep learning

    Text anonymization methods are evaluated by comparing their outputs with human-based anonymizations through standard information retrieval (IR) metrics. On the one hand, the residual disclosure risk is quantified with the recall metric, which gives the proportion of re-identifying terms successfully detected by the anonymization algorithm. On the other hand, the preserved utility is measured with the precision metric, which accounts for the proportion of masked terms that were also annotated by the human experts. Nevertheless, because these evaluation metrics were meant for information retrieval rather than privacy-oriented tasks, they suffer from several drawbacks. First, they assume a unique ground truth, and this does not hold for text anonymization, where several masking choices could be equally valid to prevent re-identification. Second, annotation-based evaluation relies on human judgements, which are inherently subjective and may be prone to errors. Finally, both metrics weight terms uniformly, thereby ignoring the fact that some terms may influence the disclosure risk or utility preservation much more than others. To overcome these drawbacks, in this thesis we propose two novel methods to evaluate both the disclosure risk and the utility preserved in anonymized texts. Our approach leverages deep learning methods to perform this evaluation automatically, thereby not requiring human annotations. For assessing disclosure risks, we propose using a re-identification attack, which we define as a multi-class classification task built on top of state-of-the-art language models. To make it feasible, the attack has been designed to capture the means and computational resources expected to be available at the attacker's end. For utility assessment, we propose a method that measures the information loss incurred during the anonymization process, which relies on neural masked language modeling. We illustrate the effectiveness of our methods by evaluating the disclosure risk and retained utility of several well-known techniques and tools for text anonymization on a common dataset. Empirical results show significant privacy risks for all of them (including manual anonymization) and consistently proportional utility preservation.
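The annotation-based baseline that this abstract criticizes can be sketched in a few lines. The example below computes recall and precision of a set of masked terms against a set of human-annotated terms. The function name and the example terms are hypothetical, and treating terms as a flat set is a simplifying assumption. The deep-learning evaluation proposed in the thesis is not reproduced here.

```python
def recall_precision(masked_terms, annotated_terms):
    """Compare the terms masked by an anonymizer with the human annotations.

    recall    = share of annotated (re-identifying) terms that were masked
    precision = share of masked terms that were also annotated
    """
    masked = set(masked_terms)
    annotated = set(annotated_terms)
    true_positives = len(masked & annotated)
    recall = true_positives / len(annotated) if annotated else 0.0
    precision = true_positives / len(masked) if masked else 0.0
    return recall, precision


# Hypothetical example: four annotated terms, three masked terms, two in common.
annotated = ["Alice", "Paris", "1984", "nurse"]
masked = ["Alice", "Paris", "Monday"]

recall, precision = recall_precision(masked, annotated)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Every term counts equally in both ratios, which is the uniform weighting that the abstract lists as a drawback.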

    Property Rights and Development: The Contingent Case for Formalization


    The effects of land registration on financial development and economic growth - a theoretical and conceptual framework

    The author develops a theoretical framework to guide empirical analysis of how land registration affects financial development and economic growth. Most conceptual approaches investigate the effects of land registration on only one sector, but land registration is commonly observed to affect not only other sectors but the economy as a whole. The author builds on the well-tested link between secure land ownership and farm productivity, adding to the framework theory about positive information and transaction costs. To map the relationship between land registration and financial development and economic growth, the framework links: 1) Land tenure security and investment incentives. 2) Land title, collateral, and credit. 3) Land markets, transactions, and efficiency. 4) Labor mobility and efficiency. 5) Land liquidity, deposit mobilization, and investment. Empirical results from applying the framework to a single case study (Thailand, described in a separate paper) suggest that the framework is sound.
    Keywords: Labor Policies, Environmental Economics & Policies, Banks & Banking Reform, Economic Theory & Research, Payment Systems & Infrastructure, Municipal Financial Management, Rural Land Policies for Poverty Reduction

    Generalization-Based k-Anonymization

    Microaggregation is an anonymization technique that consists of partitioning the data into clusters of no fewer than k elements and then replacing each whole cluster with its prototypical representative. Most microaggregation techniques work on numerical attributes. However, many data sets are described by heterogeneous types of data, i.e., numerical and categorical attributes. In this paper we propose a new microaggregation method for achieving a compliant k-anonymous masked file for categorical microdata based on generalization. The goal is to build a generalized description satisfied by at least k domain objects and to replace these domain objects with the description. The way this generalization is constructed is similar to the one used in growing decision trees. Records that cannot be generalized satisfactorily are discarded, so some information is lost. The experiments we performed show that the new approach gives good results.
    © Springer International Publishing Switzerland 2015.
    This research is partially funded by the Spanish MICINN projects COGNITIO (TIN-2012-38450-C03-03), EdeTRI (TIN2012-39348-C02-01) and COPRIVACY (TIN2011-27076-C03-03), the grant 2009-SGR-1434 from the Generalitat de Catalunya, and the European Project DwB (Grant Agreement Number 262608).
    Peer reviewed
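To make the opening definition concrete, the sketch below shows plain microaggregation on a single numerical attribute: values are sorted, partitioned into clusters of at least k elements, and each value is replaced by its cluster mean. This is not the generalization-based method for categorical data that the paper proposes. The function name, the fixed-size grouping strategy, and the sample values are assumptions chosen for the example.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its cluster; every cluster has at least k elements."""
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    # Sort record indices by value so that each cluster groups similar records.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    masked = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder so that no cluster is smaller than k.
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        group = order[start:end]
        centroid = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = centroid
    return masked


# Hypothetical attribute values for seven records, masked with k = 3.
ages = [10, 1, 2, 11, 3, 12, 20]
masked_ages = microaggregate(ages, 3)
print(masked_ages)
```

In the masked output every value is shared by at least k records, which is the property that makes the file k-anonymous with respect to this attribute.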