34 research outputs found

    Random projection to preserve patient privacy

    With the availability of accessible and widely used cloud services, it is natural that large components of healthcare systems migrate to them; for example, patient databases can be stored and processed in the cloud. Such cloud services provide enhanced flexibility and additional gains, such as availability and ease of data sharing. This trend poses serious threats to the privacy of patients and to the trust that an individual must place in the healthcare system itself. Thus, there is a strong need for privacy preservation, which can be achieved through a variety of approaches. In this paper, we study the application of a random projection-based approach to patient data as a means to achieve two goals: (1) provably mask the identity of users under some adversarial-attack settings, (2) preserve enough information to allow for aggregate data analysis and application of machine-learning techniques. As far as we know, such approaches have not been applied and tested on medical data. We analyze the tradeoff between the loss of accuracy on the outcome of machine-learning algorithms and the resilience against an adversary. We show that random projections prove to be strong against known input/output attacks while offering high quality data, as long as the projected space is smaller than the original space, and as long as the amount of leaked data available to the adversary is limited
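
    As a rough illustration of the random projection idea described in this abstract, the following minimal sketch maps records from a d-dimensional space to a smaller k-dimensional space and checks how well pairwise distances survive. The Gaussian projection matrix, the dimensions, and the synthetic "records" are assumptions made for illustration; the abstract does not state which projection scheme or data the paper uses.

```python
import math
import random


def random_projection_matrix(d, k, rng):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (an assumed scheme)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Map a d-dimensional record x to k dimensions via a matrix-vector product."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]


def distance_ratios(original, projected):
    """Ratio of projected to original Euclidean distance for every pair of records."""
    ratios = []
    for i in range(len(original)):
        for j in range(i + 1, len(original)):
            ratios.append(
                math.dist(projected[i], projected[j])
                / math.dist(original[i], original[j])
            )
    return ratios


rng = random.Random(0)
d, k = 200, 100  # hypothetical sizes; the projected space is smaller than the original
records = [[rng.random() for _ in range(d)] for _ in range(20)]  # synthetic stand-in data
R = random_projection_matrix(d, k, rng)
projected = [project(R, x) for x in records]
ratios = distance_ratios(records, projected)
print(f"distance ratios range from {min(ratios):.2f} to {max(ratios):.2f}")
```

    Ratios close to 1 suggest that distance-based analysis (clustering, nearest neighbours) remains usable on the projected data, while only the projected records, not the originals or the matrix, would need to be shared.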

    Polynomial Time Approximation Schemes for All 1-Center Problems on Metric Rational Set Similarities

    In this paper, we investigate algorithms for finding centers of a given collection N of sets. In particular, we focus on metric rational set similarities, a broad class of similarity measures including Jaccard and Hamming. A rational set similarity S is called metric if D= 1 - S is a distance function. We study the 1-center problem on these metric spaces. The problem consists of finding a set C that minimizes the maximum distance of C to any set of N. We present a general framework that computes a (1 + ε) approximation for any metric rational set similarity
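
    To make the 1-center objective concrete, the sketch below evaluates it for the Jaccard distance D = 1 - S and picks the best center among the input sets themselves. This is not the paper's (1 + ε) framework; restricting the center to an input set is a simple baseline that, by the triangle inequality, costs at most twice the optimum in any metric space. The function names and the toy collection are assumptions for illustration.

```python
def jaccard_distance(a, b):
    """D = 1 - |a ∩ b| / |a ∪ b|; two empty sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def one_center_cost(center, sets):
    """Maximum distance from the candidate center to any set in the collection."""
    return max(jaccard_distance(center, s) for s in sets)


def best_input_center(sets):
    """Baseline: choose the input set that minimizes the 1-center cost."""
    return min(sets, key=lambda c: one_center_cost(c, sets))


collection = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3, 4, 5})]
center = best_input_center(collection)
print(sorted(center), one_center_cost(center, collection))
```

    On this toy collection the middle set {2, 3, 4} is selected, with maximum Jaccard distance 0.5 to the other two sets.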

    Spectral Relaxations and Fair Densest Subgraphs

    Reducing hidden bias in the data and ensuring fairness in algorithmic data analysis has recently received significant attention. In this paper, we address the problem of identifying a densest subgraph, while ensuring that none of one binary protected attribute is disparately impacted. Unfortunately, the underlying algorithmic problem is NP-hard, even in its approximation version: approximating the densest fair subgraph with a polynomial-time algorithm is at least as hard as the densest subgraph problem of at most k vertices, for which no constant approximation algorithms are known. Despite such negative premises, we are able to provide approximation results in two important cases. In particular, we are able to prove that a suitable spectral embedding allows recovery of an almost optimal, fair, dense subgraph hidden in the input data, whenever one is present, a result that is further supported by experimental evidence. We also show a polynomial-time, 2-approximation algorithm, whenever the underlying graph is itself fair. We finally prove that, under the small set expansion hypothesis, this result is tight for fair graphs. The above theoretical findings drive the design of heuristics, which we experimentally evaluate on a scenario based on real data, in which our aim is to strike a good balance between diversity and highly correlated items from Amazon co-purchasing graphs

    The Power of Uniform Sampling for Coresets

    Motivated by practical generalizations of the classic $k$-median and $k$-means objectives, such as clustering with size constraints, fair clustering, and Wasserstein barycenter, we introduce a meta-theorem for designing coresets for constrained-clustering problems. The meta-theorem reduces the task of coreset construction to one on a bounded number of ring instances with a much-relaxed additive error. This reduction enables us to construct coresets using uniform sampling, in contrast to the widely-used importance sampling, and consequently we can easily handle constrained objectives. Notably and perhaps surprisingly, this simpler sampling scheme can yield coresets whose size is independent of $n$, the number of input points. Our technique yields smaller coresets, and sometimes the first coresets, for a large number of constrained clustering problems, including capacitated clustering, fair clustering, Euclidean Wasserstein barycenter, clustering in minor-excluded graph, and polygon clustering under Fréchet and Hausdorff distance. Finally, our technique yields also smaller coresets for $1$-median in low-dimensional Euclidean spaces, specifically of size $\tilde{O}(\varepsilon^{-1.5})$ in $\mathbb{R}^2$ and $\tilde{O}(\varepsilon^{-1.6})$ in $\mathbb{R}^3$

    Actin Dynamics Regulate Multiple Endosomal Steps during Kaposi's Sarcoma-Associated Herpesvirus Entry and Trafficking in Endothelial Cells

    The role of actin dynamics in clathrin-mediated endocytosis in mammalian cells is unclear. In this study, we define the role of actin cytoskeleton in Kaposi's sarcoma-associated herpesvirus (KSHV) entry and trafficking in endothelial cells using an immunofluorescence-based assay to visualize viral capsids and the associated cellular components. In contrast to infectivity or reporter assays, this method does not rely on the expression of any viral and reporter genes, but instead directly tracks the accumulation of individual viral particles at the nuclear membrane as an indicator of successful viral entry and trafficking in cells. Inhibitors of endosomal acidification reduced both the percentage of nuclei with viral particles and the total number of viral particles docking at the perinuclear region, indicating endocytosis, rather than plasma membrane fusion, as the primary route for KSHV entry into endothelial cells. Accordingly, a viral envelope protein was only detected on internalized KSHV particles at the early but not late stage of infection. Inhibitors of clathrin- but not caveolae/lipid raft-mediated endocytosis blocked KSHV entry, indicating that clathrin-mediated endocytosis is the major route of KSHV entry into endothelial cells. KSHV particles were colocalized not only with markers of early and recycling endosomes, and lysosomes, but also with actin filaments at the early time points of infection. Consistent with these observations, transferrin, which enters cells by clathrin-mediated endocytosis, was found to be associated with actin filaments together with early and recycling endosomes, and to a lesser degree, with late endosomes and lysosomes. KSHV infection induced dynamic actin cytoskeleton rearrangements. Disruption of the actin cytoskeleton and inhibition of regulators of actin nucleation such as Rho GTPases and Arp2/3 complex profoundly blocked KSHV entry and trafficking. 
    Together, these results indicate an important role for actin dynamics in the internalization and endosomal sorting/trafficking of KSHV and clathrin-mediated endocytosis in endothelial cells

    Analysis of Smith's rule in stochastic machine scheduling

    In a landmark paper from 1986, Kawaguchi and Kyan show that scheduling jobs in order of their ratio of weight to processing time, also known as Smith’s rule, has a tight performance guarantee of approximately 1.207 for minimizing the weighted sum of completion times in parallel machine scheduling. We prove the counterintuitive result that the performance guarantee of Smith’s rule is not better than 1.243 when processing times are exponentially distributed
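
    The rule discussed in this abstract can be sketched for the deterministic case as follows: sort jobs by non-increasing weight over processing time, then give each job in turn to the machine that becomes idle first. The job data and function name are hypothetical, and the sketch uses fixed processing times; in the stochastic setting studied in the paper, processing times are random, which this example does not model.

```python
import heapq


def smith_rule_cost(jobs, machines):
    """Weighted sum of completion times under Smith's rule with list scheduling.

    jobs: list of (weight, processing_time) pairs with processing_time > 0.
    machines: number of identical parallel machines.
    """
    # Highest weight-to-processing-time ratio first.
    order = sorted(jobs, key=lambda job: job[0] / job[1], reverse=True)
    free_at = [0.0] * machines  # time at which each machine next becomes idle
    heapq.heapify(free_at)
    total = 0.0
    for weight, processing_time in order:
        start = heapq.heappop(free_at)  # earliest available machine
        finish = start + processing_time
        total += weight * finish
        heapq.heappush(free_at, finish)
    return total


jobs = [(3, 1), (1, 2), (3, 2), (4, 4)]  # hypothetical (weight, processing time) pairs
print(smith_rule_cost(jobs, machines=1))  # 49.0
print(smith_rule_cost(jobs, machines=2))  # 33.0
```

    On a single machine this ordering is optimal for the weighted sum of completion times; with several machines it is only approximately optimal, which is the regime the performance guarantees above refer to.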

    On coresets for logistic regression

    Coresets are one of the central methods to facilitate the analysis of large data. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show the negative result that no strongly sublinear sized coresets exist for logistic regression. To deal with intractable worst-case instances we introduce a complexity measure µ(X), which quantifies the hardness of compressing a data set for logistic regression. µ(X) has an intuitive statistical interpretation that may be of independent interest. For data sets with bounded µ(X)-complexity, we show that a novel sensitivity sampling scheme produces the first provably sublinear (1 ± ε)-coreset. We illustrate the performance of our method by comparing to uniform sampling as well as to state of the art methods in the area. The experiments are conducted on real world benchmark data for logistic regression

    Algorithms for fair k-clustering with multiple protected attributes

    We study fair center based clustering problems. In an influential paper, Chierichetti, Kumar, Lattanzi and Vassilvitskii (NIPS 2017) consider the problem of finding a good clustering, say of women and men, such that every cluster contains an equal number of women and men. They were able to obtain a constant factor approximation for this problem for most center based k-clustering objectives such as k-median, k-means, and k-center. Despite considerable interest in extending this problem for multiple protected attributes (e.g. women and men, with or without citizenship), so far constant factor approximations for these problems have remained elusive except in special cases. We settle this question in the affirmative by giving the first constant factor approximation for a wide range of center based k-clustering objectives

    (1 + ε)-approximate incremental matching in constant deterministic amortized time

    We study the matching problem in the incremental setting, where we are given a sequence of edge insertions and aim at maintaining a near-maximum cardinality matching of the graph with small update time. We present a deterministic algorithm that, for any constant ε > 0, maintains a (1 + ε)-approximate matching with constant amortized update time per insertion