365 research outputs found

    Supervised learning using a symmetric bilinear form for record linkage

    Get PDF
    Record linkage is used to link records in two different files that correspond to the same individuals. These algorithms are used for database integration. In data privacy, they are used to evaluate the disclosure risk of a protected data set by linking records that belong to the same individual: the degree of success when linking the original (unprotected) data with the protected data gives an estimate of the disclosure risk. In this paper we propose a new parameterized aggregation operator and a supervised learning method for disclosure risk assessment. The parameterized operator is a symmetric bilinear form, and the supervised learning method is formalized as an optimization problem whose objective is to find the values of the aggregation parameters that maximize the number of re-identifications (correct links). We evaluate and compare our proposal with non-parameterized variants of record linkage, such as those using the Mahalanobis distance and the Euclidean distance (one of the most widely used approaches for this purpose). We also compare it with previously presented parameterized aggregation operators for record linkage, such as the weighted mean and the Choquet integral. These comparisons show that the proposed aggregation operator outperforms, or at least matches, the other parameterized operators. We also study the conditions the optimization problem must satisfy for the described aggregation functions to be metrics.
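    As an illustration of the kind of distance such a symmetric bilinear form induces, the following is a minimal Python sketch; the matrix M, the nearest-neighbour linkage rule, and the toy data are assumptions made for illustration, not the paper's implementation.

        import numpy as np

        def bilinear_distance(a, b, M):
            # Distance induced by a symmetric bilinear form: d(a, b)^2 = (a - b)^T M (a - b).
            # M should be symmetric positive semi-definite for d to behave like a metric.
            diff = a - b
            return float(np.sqrt(diff @ M @ diff))

        def count_reidentifications(original, protected, M):
            # Link every original record to its nearest protected record and count
            # how many links recover the true (same-index) individual.
            hits = 0
            for i, rec in enumerate(original):
                dists = [bilinear_distance(rec, p, M) for p in protected]
                if int(np.argmin(dists)) == i:
                    hits += 1
            return hits

        # With M = identity the distance reduces to the Euclidean distance; a supervised
        # learner would instead search for the M that maximizes the re-identification count.
        rng = np.random.default_rng(0)
        original = rng.normal(size=(5, 3))
        protected = original + 0.05 * rng.normal(size=(5, 3))  # toy "protected" data set
        print(count_reidentifications(original, protected, np.eye(3)))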

    Fuzzy measures and integrals in re-identification problems

    Get PDF
    In this paper we give an overview of our approach to using aggregation operators, and more specifically fuzzy integrals, for solving re-identification problems. We show that Choquet integrals are suitable for some kinds of these problems.

    Aprendizaje supervisado para el enlace de registros a través de la media ponderada

    Get PDF
    In the area of data privacy, record linkage techniques are used to evaluate the disclosure risk of a protected data set. The main idea behind these techniques is to link records that refer to the same individual across different databases. This work presents a variation of record linkage based on a weighted mean for computing distances between records. Using a supervised learning method, our proposal determines the weights that maximize the number of links between the records of the original database and its protected version. The result of this work is applied to estimating the disclosure risk of protected data. This research is partially funded by MICINN (projects ARES-CONSOLIDER INGENIO 2010 CSD2007-00004, TIN2010-15764 and TIN2011-27076-C03-03) and by the EC (FP7/2007-2013) Data without Boundaries (grant number 262608). Some of the results presented in this article were obtained thanks to the Centro de Supercomputación de Galicia (CESGA). The work contributed by the first author has been part of a doctoral programme in Computer Science at the Universidad Autónoma de Barcelona (UAB).
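    The weighted-mean variant and the weight-learning idea can be sketched briefly in Python; the function names and the naive random search over the weight simplex are illustrative assumptions, whereas the paper formalizes weight learning as an optimization problem.

        import numpy as np

        def weighted_distance(a, b, w):
            # Weighted mean of per-attribute absolute differences between two records
            # (NumPy arrays); w is non-negative and sums to 1.
            return float(np.dot(w, np.abs(a - b)))

        def correct_links(original, protected, w):
            # Count original records whose nearest protected record is their own protected version.
            hits = 0
            for i, rec in enumerate(original):
                dists = [weighted_distance(rec, p, w) for p in protected]
                if int(np.argmin(dists)) == i:
                    hits += 1
            return hits

        def learn_weights(original, protected, trials=2000, seed=0):
            # Toy search: sample random weight vectors on the simplex and keep the one
            # that maximizes the number of correct links.
            rng = np.random.default_rng(seed)
            best_w, best_hits = None, -1
            for _ in range(trials):
                w = rng.dirichlet(np.ones(original.shape[1]))
                hits = correct_links(original, protected, w)
                if hits > best_hits:
                    best_w, best_hits = w, hits
            return best_w, best_hits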

    Efficient Data Driven Multi Source Fusion

    Get PDF
    Data/information fusion is an integral component of many existing and emerging applications, e.g., remote sensing, smart cars, the Internet of Things (IoT), and Big Data, to name a few. While fusion aims to achieve better results than any one individual input can provide, the challenge is often to determine the underlying mathematics of aggregation suitable for an application. In this dissertation, I focus on the following three aspects of aggregation: (i) efficient data-driven learning and optimization, (ii) extensions and new aggregation methods, and (iii) feature- and decision-level fusion for machine learning with applications to signal and image processing. The Choquet integral (ChI), a powerful nonlinear aggregation operator, is a parametric way (with respect to the fuzzy measure (FM)) to generate a wealth of aggregation operators. The FM has 2^N variables and N(2^(N-1)) monotonicity constraints for N inputs. As a result, learning the ChI parameters from data quickly becomes impractical for most applications. Herein, I propose a scalable learning procedure (linear with respect to training sample size) for the ChI that identifies and optimizes only data-supported variables; as such, the computational complexity of the learning algorithm is proportional to the complexity of the solver used. This method also includes an imputation framework to obtain scalar values for data-unsupported (aka missing) variables and a compression algorithm (lossy or lossless) for the learned variables. I also propose a genetic algorithm (GA) to optimize the ChI for non-convex, multi-modal, and/or analytical objective functions. This algorithm introduces two operators that automatically preserve the constraints, so there is no need to enforce them explicitly as traditional GAs require; it also provides an efficient representation of the search space with the minimal set of vertices. Furthermore, I study different strategies for extending the fuzzy integral to missing data, and I propose a goal programming framework to aggregate inputs from heterogeneous sources for ChI learning. Last, my work in remote sensing involves visual-clustering-based band group selection and Lp-norm multiple kernel learning for feature-level fusion in hyperspectral image processing to enhance pixel-level classification.
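    To make the core aggregation operator concrete, here is a minimal sketch of the discrete Choquet integral with the fuzzy measure stored as a dictionary over subsets of input indices; this representation and the example measure are assumptions for illustration, not the dissertation's scalable learning procedure.

        from itertools import combinations

        def choquet_integral(x, fm):
            # Discrete Choquet integral of inputs x with respect to fuzzy measure fm,
            # where fm maps frozensets of input indices to [0, 1], fm[frozenset()] == 0,
            # fm[frozenset(range(len(x)))] == 1, and fm is monotone w.r.t. set inclusion.
            order = sorted(range(len(x)), key=lambda i: x[i])   # indices by ascending value
            total, prev = 0.0, 0.0
            for k, i in enumerate(order):
                a_k = frozenset(order[k:])                      # inputs valued at least x[i]
                total += (x[i] - prev) * fm[a_k]
                prev = x[i]
            return total

        # Example for N = 3: the cardinality measure |A| / N makes the Choquet integral
        # coincide with the arithmetic mean of the inputs.
        n = 3
        fm = {frozenset(s): len(s) / n
              for r in range(n + 1) for s in combinations(range(n), r)}
        print(choquet_integral([0.2, 0.9, 0.4], fm))            # ~0.5, the mean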

    Design of an Integrated Analytics Platform for Healthcare Assessment Centered on the Episode of Care

    Full text link
    Assessing care quality and performance is essential to improve healthcare processes and population health management. However, due to poor system design and lack of access to the required data, this assessment is often delayed or not done at all. The goal of our research is to investigate an advanced analytics platform that enables healthcare quality and performance assessment. We used a user-centered design approach to identify the system requirements, with the episode of care as the building block of information for a key performance indicator analytics system. We implemented architecture and interface prototypes and performed a usability test with hospital users in managerial roles. The results show that, by using user-centered design, we created an analytics platform that provides a holistic and integrated view of the clinical, financial, and operational aspects of the institution. Our encouraging results warrant further studies to understand other aspects of usability.

    Process-oriented risk assessment methodology for manufacturing process evaluation

    Get PDF
    A process-oriented risk assessment methodology is proposed. Risks involved in a process and the corresponding risk factors are identified through an objectives-oriented risk identification approach and evaluated qualitatively in a Process FMEA (PFMEA). The critical risks from the PFMEA are then incorporated into the process model for further quantitative analysis using simulation. Using the proposed methodology as a decision-making tool, alternative scenarios are developed and evaluated against the developed risk measures. The risk measure values obtained from simulation are normalized and aggregated into a global risk indicator that ranks the alternative processes by desirability. The methodology is illustrated with a case study drawn from parts manufacturing but is applicable to a wide range of other processes.
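    The final normalization-and-aggregation step can be sketched as follows; the min-max normalization, the weighted-mean aggregation, and the scenario values are illustrative assumptions rather than the paper's actual risk measures or weights.

        import numpy as np

        def global_risk_indicator(measures, weights):
            # measures: rows = alternative scenarios, columns = risk measure values from simulation.
            # Min-max normalize each risk measure to [0, 1], then aggregate per scenario with a
            # weighted mean (weights non-negative, summing to 1). Lower scores are more desirable.
            m = np.asarray(measures, dtype=float)
            lo, hi = m.min(axis=0), m.max(axis=0)
            span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero on constant columns
            return ((m - lo) / span) @ np.asarray(weights)

        # Three alternative scenarios evaluated on three risk measures:
        scores = global_risk_indicator(
            [[12.0, 0.30, 5.0],
             [ 9.0, 0.45, 4.0],
             [15.0, 0.20, 6.0]],
            weights=[0.5, 0.3, 0.2],
        )
        print(scores, np.argsort(scores))            # ranking: most desirable scenario first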

    Transient anchorage of cross-linked glycosyl-phosphatidylinositol–anchored proteins depends on cholesterol, Src family kinases, caveolin, and phosphoinositides

    Get PDF
    How outer leaflet plasma membrane components, including glycosyl-phosphatidylinositol–anchored proteins (GPIAPs), transmit signals to the cell interior is an open question in membrane biology. By deliberately cross-linking several GPIAPs under antibody-conjugated 40-nm gold particles, transient anchorage of the gold particle–induced clusters of both Thy-1 and CD73, an ecto-5′-nucleotidase, occurred for periods ranging from 300 ms to 10 s in fibroblasts. Transient anchorage was abolished by cholesterol depletion, addition of the Src family kinase (SFK) inhibitor PP2, or in Src-Yes-Fyn knockout cells. Caveolin-1 knockout cells exhibited a reduced transient anchorage time, suggesting the partial participation of caveolin-1. In contrast, a transmembrane protein, the cystic fibrosis transmembrane conductance regulator, exhibited transient anchorage that occurred without deliberately enhanced cross-linking; moreover, it was only slightly inhibited by cholesterol depletion or SFK inhibition and depended completely on the interaction of its PDZ-binding domain with the cytoskeletal adaptor EBP50. We propose that cross-linked GPIAPs become transiently anchored via a cholesterol-dependent SFK-regulatable linkage between a transmembrane cluster sensor and the cytoskeleton.
    • …