
    Improvements to Iterated Local Search for Microaggregation

    Microaggregation is a disclosure control method that uses k-anonymity to protect confidentiality in microdata while seeking minimal information loss. The problem is NP-hard. Iterated local search for microaggregation (ILSM) is an effective metaheuristic algorithm that consistently identifies better-quality solutions than extant microaggregation methods. The present work presents improvements to the local search, the perturbation operations and the acceptance criterion within ILSM. The first, ILSMC, targets changed clusters within local search (LS) to avoid vast numbers of comparison tests, significantly reducing execution times. Second, a new probability distribution yields a better perturbation operator for most cases, significantly reducing the number of iterations needed to find solutions of similar quality. Third, the acceptance criterion is improved by replacing the static balance between intensification and diversification with a dynamic one, which helps ILSM escape local optima more quickly for some datasets and values of k. Experimental results with benchmark data show that ILSMC consistently and significantly reduces execution times: targeting changed clusters within LS avoids vast numbers of unproductive tests and lets the search concentrate on more productive ones, decreasing execution times by more than an order of magnitude for most benchmark test cases and by 75% in the worst case. Advantageously, the biggest improvements were with the largest datasets. Perturbing clusters with higher information loss tends to reduce information loss more; biasing the perturbation operations toward clusters with higher information loss increases the rate of improvement by more than 50 percent in the earliest iterations for two of the benchmarks. Occasionally accepting worse solutions provides diversification; moreover, increasing the probability of accepting worse solutions closer in quality to the current best solution aids in escaping local optima, increasing the rate of improvement by up to 30 percent in the earliest iterations. Combining the new perturbation operation with the new acceptance criterion can further increase the rate of improvement by as much as 20 percent for some test cases. All three improvements are orthogonal and can be combined for additive effect.
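    The ILS scheme the abstract describes, local search plus perturbation plus an acceptance criterion that trades off intensification and diversification, can be sketched generically. This is a minimal illustration, not the paper's ILSM implementation; the function names and the fixed acceptance probability are assumptions made here for clarity:

```python
import random

def iterated_local_search(initial, local_search, perturb, cost,
                          max_iters=1000, accept_prob=0.05):
    """Generic iterated local search skeleton (illustrative sketch only).

    The paper's three improvements would plug in here as: a `local_search`
    that revisits only changed clusters, a `perturb` biased toward clusters
    with higher information loss, and an acceptance rule whose probability
    grows as a worse candidate approaches the best cost found so far
    (instead of the fixed `accept_prob` used below).
    """
    current = local_search(initial)
    best = current
    for _ in range(max_iters):
        candidate = local_search(perturb(current))
        if cost(candidate) < cost(current):
            current = candidate   # intensification: accept any improvement
        elif random.random() < accept_prob:
            current = candidate   # diversification: occasionally accept worse
        if cost(current) < cost(best):
            best = current
    return best
```

    The three improvements are orthogonal precisely because each touches a different one of the three plug-in points above.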

    Privacy in trajectory micro-data publishing : a survey

    We survey the literature on the privacy of trajectory micro-data, i.e., spatiotemporal information about the mobility of individuals, whose collection is becoming increasingly simple and frequent thanks to emerging information and communication technologies. The focus of our review is on privacy-preserving data publishing (PPDP), i.e., the publication of databases of trajectory micro-data that preserve the privacy of the monitored individuals. We classify and present the literature of attacks against trajectory micro-data, as well as solutions proposed to date for protecting databases from such attacks. This paper serves as an introductory reading on a critical subject in an era of growing awareness about privacy risks connected to digital services, and provides insights into open problems and future directions for research. Comment: Accepted for publication at Transactions on Data Privacy.

    Heuristic Algorithm for Univariate Stratification Problem

    In sampling theory, stratification corresponds to a technique used in surveys that allows segmenting a population into homogeneous subpopulations (strata) to produce statistics with a higher level of precision. In particular, this article proposes a heuristic to solve the univariate stratification problem, which is widely studied in the literature. One of its versions sets the number of strata and the precision level and seeks to determine the limits that define such strata so as to minimize the sample size allocated to the strata. A heuristic based on a stochastic optimization method and an exact optimization method was developed to achieve this goal. The performance of this heuristic was evaluated through computational experiments, considering its application to various populations used in other works in the literature, based on 20 scenarios that combine different numbers of strata and levels of precision. From the analysis of the obtained results, it is possible to verify that the heuristic outperformed four algorithms from the literature in more than 94% of the cases, particularly the well-known algorithms of Kozak and Lavallee-Hidiroglou. Comment: 25 pages and 7 figures.
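    For context, the objective such heuristics minimize, the sample size implied by a given set of stratum boundaries under Neyman allocation, can be evaluated directly. The sketch below is a simplified illustration under standard survey-sampling formulas; the boundary search itself, which is the article's contribution, is omitted, and the function and parameter names are hypothetical:

```python
import statistics

def neyman_sample_size(population, boundaries, cv_target):
    """Sample size required under Neyman allocation for the strata induced
    by `boundaries` (hypothetical helper; not the article's algorithm).

    `cv_target` is the desired coefficient of variation of the estimated
    mean.  Uses n = (sum W_h*S_h)^2 / (V + sum W_h*S_h^2 / N), the standard
    Neyman-allocation formula with finite-population correction.
    """
    # Partition the sorted values into strata at the given cut points.
    strata, lo = [], float("-inf")
    for hi in list(boundaries) + [float("inf")]:
        stratum = [x for x in population if lo < x <= hi]
        if stratum:
            strata.append(stratum)
        lo = hi
    N = len(population)
    mean = sum(population) / N
    V = (cv_target * mean) ** 2   # target variance of the estimator
    a = sum(len(s) / N * statistics.pstdev(s) for s in strata)
    b = sum(len(s) / N * statistics.pvariance(s) for s in strata)
    return (a * a) / (V + b / N)
```

    A boundary-search heuristic like the article's would call such an evaluator repeatedly while moving the cut points.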

    Comparative Modeling and Functional Characterization of Two Enzymes of the Cyclooxygenase Pathway in Drosophila melanogaster

    Eicosanoids are biologically active molecules oxygenated from twenty-carbon polyunsaturated fatty acids. Natural eicosanoids exert potent biological effects in humans, and a great deal of pharmaceutical research has led to the discovery of compounds for selective inhibition of specific enzymes in eicosanoid biosynthesis. Coupled with different receptors, eicosanoids mediate various physiological and pathophysiological processes, including fever generation, pain response, vasoconstriction, vasodilation, platelet aggregation, platelet declumping, body temperature maintenance and sleep-wake cycle regulation. In mammals, eicosanoid biosynthesis has three pathways: the cyclooxygenase (COX) pathway, the lipoxygenase (LOX) pathway and the epoxygenase pathway. The COX pathway synthesizes prostanoids, which are important signaling molecules in inflammation. Because of their central role in inflammatory disease and human health, COX enzymes continue to be a focus of intense research as new details emerge about their mechanism of action and their interactions with NSAIDs. To date, the majority of studies dealing with the COX pathway are centered on mammalian systems. Although the literature is rich in speculations that prostaglandins are central signaling molecules for mediating and coordinating insect cellular immunity, genes encoding COX or COX-like enzymes and other enzymes in the COX pathway have not been reported in insects. The value of Drosophila melanogaster as a model organism is well established, and the fundamental regulatory signaling mechanisms that govern immunity at the cellular level in humans and flies are conserved. Given the importance of eicosanoids in mammalian and insect immunity, this study was designed to computationally identify and characterize the enzymes that mediate eicosanoid biosynthesis in D. melanogaster. After a preliminary extensive search for putative D. melanogaster homologues for all enzymes in the COX pathway, we conducted a systematic, comprehensive, and detailed computational investigation of two enzymes, COX and prostaglandin E synthase (PGES), in an endeavor to model and characterize the possible candidates and identify those that possess all the requisite sequence and structural motifs to qualify as valid COX/PGE synthase proteins. In this study, we report the presence of qualified D. melanogaster COX/PGE synthase proteins, characterize their biophysical properties, and compare them with their mammalian counterparts. This study lays the groundwork for further exploration of these proteins and for establishing their role in D. melanogaster inflammation and immunity, opening avenues for using this model organism to study COX signaling and its crosstalk with other signaling pathways.

    Erzeugung Mehrfach Imputierter Synthetischer Datensätze: Theorie und Implementierung (Generating Multiply Imputed Synthetic Datasets: Theory and Implementation)

    Get PDF
    The book describes different approaches to generating multiply imputed synthetic datasets to guarantee confidentiality. Such datasets can be made available to the interested research community without violating data protection. Each chapter is dedicated to one approach, first describing the general concept and then presenting a detailed application to a real dataset, providing useful guidelines on how to implement the theory in practice.

    Exact algorithms for minimum sum-of-squares clustering

    NP-Hardness of Euclidean sum-of-squares clustering -- Computational complexity -- An incorrect reduction from the K-section problem -- A new proof by reduction from the densest cut problem -- Evaluating a branch-and-bound RLT-based algorithm for minimum sum-of-squares clustering -- Reformulation-Linearization technique for the MSSC -- Branch-and-bound for the MSSC -- An attempt at reproducing computational results -- Breaking symmetry and convex hull inequalities -- A branch-and-cut SDP-based algorithm for minimum sum-of-squares clustering -- Equivalence of MSSC to 0-1 SDP -- A branch-and-cut algorithm for the 0-1 SDP formulation -- Computational experiments -- An improved column generation algorithm for minimum sum-of-squares clustering -- Column generation algorithm revisited -- A geometric approach -- Generalization to the Euclidean space -- Computational results
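    To make the objective these exact algorithms optimize concrete: minimum sum-of-squares clustering (MSSC) seeks the assignment of points to k clusters minimizing the total squared Euclidean distance to cluster centroids. For very small instances it can be solved exactly by exhaustive enumeration. The sketch below is illustrative only; the thesis studies far more scalable exact methods (branch-and-bound, branch-and-cut on an SDP formulation, and column generation), and the function names here are invented:

```python
from itertools import product

def mssc_cost(points, assignment, k):
    """Sum of squared Euclidean distances of points to their cluster centroids."""
    total = 0.0
    for c in range(k):
        cluster = [p for p, a in zip(points, assignment) if a == c]
        if not cluster:
            continue
        dim = len(cluster[0])
        centroid = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
        total += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim))
                     for p in cluster)
    return total

def exact_mssc(points, k):
    """Exact MSSC by brute-force enumeration of all k^n assignments.

    Exponential in len(points); usable only for tiny instances, which is
    exactly why the exact algorithms surveyed in the thesis exist.
    """
    best_cost, best_assignment = float("inf"), None
    for assignment in product(range(k), repeat=len(points)):
        cost = mssc_cost(points, assignment, k)
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment
```

    Heuristics such as k-means minimize the same objective but may stop at a local optimum; the exact methods above certify global optimality.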

    Understanding and Exploiting the Latent Space to improve Machine Learning models eXplainability

    In recent years, Artificial Intelligence (AI) and Machine Learning (ML) systems have dramatically increased their capabilities, achieving human-like or even human-superior performance in specific tasks. This increased performance has gone hand in hand with an increase in the complexity of AI and ML models, compromising their transparency and trustworthiness and making them inscrutable black boxes for decision making. Explainable AI (XAI) is a field that seeks to make the decisions suggested by ML models more transparent to human users by providing different types of explanations. This thesis explores the possibility of using a reduced feature space called the “latent space”, produced by a particular kind of ML model, as a means for the explanation process. First, we study the possibility of navigating the latent space as a form of interactive explanation to better understand the rationale behind the model’s predictions. Second, we propose an interpretable-by-design approach to make the explanation process completely transparent to the user. Third, we exploit mathematical properties of the latent space of certain ML models (similarity and linearity) to produce explanations that are shown to be more plausible and accurate than those of existing competitors in the state of the art. In order to validate our approach, we perform extensive benchmarking on different datasets, with respect to both existing metrics and new ones introduced in our work to highlight new XAI problems beyond the current literature.

    Privacy by Design in Data Mining

    Privacy is an ever-growing concern in our society: the lack of reliable privacy safeguards in many current services and devices often limits their diffusion more than expected. Moreover, people feel reluctant to provide true personal data unless it is absolutely necessary. Thus, privacy is becoming a fundamental aspect to take into account when one wants to use, publish and analyze data involving sensitive information. Many recent research works have focused on the study of privacy protection: some of these studies aim at individual privacy, i.e., the protection of sensitive individual data, while others aim at corporate privacy, i.e., the protection of strategic information at the organization level. Unfortunately, it is increasingly hard to transform the data in a way that protects sensitive information: we live in the era of big data, characterized by unprecedented opportunities to sense, store and analyze complex data that describes human activities in great detail and resolution. As a result, anonymization simply cannot be accomplished by de-identification. In the last few years, several techniques for creating anonymous or obfuscated versions of data sets have been proposed, which essentially aim to find an acceptable trade-off between data privacy on the one hand and data utility on the other. So far, the common result obtained is that no general method exists which is capable of both dealing with “generic personal data” and preserving “generic analytical results”. In this thesis we propose the design of technological frameworks to counter the threats of undesirable, unlawful effects of privacy violation, without obstructing the knowledge discovery opportunities of data mining technologies. Our main idea is to inscribe privacy protection into the knowledge discovery technology by design, so that the analysis incorporates the relevant privacy requirements from the start. Therefore, we propose the privacy-by-design paradigm, which sheds a new light on the study of privacy protection: once specific assumptions are made about the sensitive data and the target mining queries that are to be answered with the data, it is conceivable to design a framework to: a) transform the source data into an anonymous version with a quantifiable privacy guarantee, and b) guarantee that the target mining queries can be answered correctly using the transformed data instead of the original ones. This thesis investigates two new research issues which arise in modern data mining and data privacy: individual privacy protection in data publishing while preserving specific data mining analyses, and corporate privacy protection in data mining outsourcing.

    COMPUTATIONAL ANALYSIS OF G-PROTEIN COUPLED RECEPTOR SCREENING, DIMERIZATION, AND DESENSITIZATION

    Mechanistic models of G-protein coupled receptor (GPCR) signaling are used to gain insight into how changes in drug properties affect cellular response. Broadly, this work is divided into three areas focusing on drug screening, desensitization, and receptor dimerization. First, ordinary differential equation models are used to examine biases in drug screening assays such as those used in drug discovery. It is shown that some screens should be innately biased against detecting inverse agonists and as such may miss pharmaceutically valuable drug leads. However, the results also suggest ways in which the screening assay can be modified to correct this bias. Second, Monte Carlo simulations of protein diffusion and reaction are used to determine the effects of drug properties on GPCR activation and desensitization. For most GPCRs, drugs cause an initial burst of activity (activation) followed by an attenuation of the signal over long times (desensitization). Simulations of this activation and desensitization process show that the mean drug-receptor lifetime can affect desensitization in a way that allows receptor activation and desensitization to be partially decoupled. Third, Monte Carlo simulations of receptor dimerization and diffusion are used to show how dimerization can affect membrane organization. Many membrane-bound proteins, including GPCRs, form transient dimers, but the physiological reason for dimerization is not clear. The simulations show that dimerization under diffusion-limited conditions can lead to the formation of extended clusters. These clusters, in turn, can alter the receptor internalization rate and the degree of cross-talk among receptors, in agreement with experimental findings. Overall, this work has a variety of implications. Pharmacologically, this work presents a new way of making drug discovery a more rational process by focusing assays toward drugs with desirable efficacies and improved desensitization profiles. Similarly, receptor dimerization could also provide a novel mechanism for affecting drug signaling. For basic biology, the modeling work presented here suggests that dimerization could provide a new way to control protein organization within the cell membrane. Together, this work provides a more mechanistic understanding of how cells communicate via GPCRs.
    http://deepblue.lib.umich.edu/bitstream/2027.42/133962/1/woolf.thesis.pdf
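    The activation-then-desensitization profile described above, an initial burst of activity followed by long-time attenuation, can be illustrated with a minimal three-state ODE model integrated by forward Euler. This is a hypothetical sketch, not a model from the thesis; the species (resting, active, desensitized) and the rate constants are invented for illustration:

```python
def simulate_receptor(k_on, k_off, k_des, ligand, t_end=100.0, dt=0.01):
    """Toy receptor model (illustrative; not the thesis's model).

    R (resting) binds ligand to become A (active); A can relax back to R
    or desensitize irreversibly to D.  Returns a list of (t, A) samples,
    which shows the characteristic burst-then-attenuation profile.
    """
    R, A, D = 1.0, 0.0, 0.0   # fractions of total receptor; R + A + D = 1
    t, trace = 0.0, []
    while t < t_end:
        dR = k_off * A - k_on * ligand * R
        dA = k_on * ligand * R - k_off * A - k_des * A
        dD = k_des * A
        R += dR * dt
        A += dA * dt
        D += dD * dt
        trace.append((t, A))
        t += dt
    return trace
```

    With activation much faster than desensitization (e.g. k_on*ligand much larger than k_des), the active fraction peaks early and then decays as receptors accumulate in the desensitized state, qualitatively matching the behavior the abstract describes.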