34 research outputs found

    Differentially private correlation clustering

    Correlation clustering is a widely used technique in unsupervised machine learning. Motivated by applications where individual privacy is a concern, we initiate the study of differentially private correlation clustering. We propose an algorithm that achieves subquadratic additive error compared to the optimal cost. In contrast, straightforward adaptations of existing non-private algorithms all lead to a trivial quadratic error. Finally, we give a lower bound showing that any pure differentially private algorithm for correlation clustering requires additive error of Ω(n).

    Tight bounds for Double Coverage against weak adversaries

    We study the Double Coverage (DC) algorithm for the k-server problem in tree metrics in the (h,k)-setting, i.e., when DC with k servers is compared against an offline optimum algorithm with h ≤ k servers. It is well-known that in such metric spaces DC is k-competitive (and thus optimal) for h = k. We prove that even if k > h the competitive ratio of DC does not improve; in fact, it increases slightly as k grows, tending to h + 1. Specifically, we give matching upper and lower bounds of k(h+1)/(k+1) on the competitive ratio of DC on any tree metric.
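The bound stated in this abstract can be checked numerically. The sketch below only evaluates the formula k(h+1)/(k+1) for a fixed h and growing k; the function name `dc_ratio` is hypothetical and the code does not implement the DC algorithm itself.

```python
from fractions import Fraction


def dc_ratio(k: int, h: int) -> Fraction:
    """Evaluate the bound k(h+1)/(k+1) quoted in the abstract (needs 1 <= h <= k)."""
    if not 1 <= h <= k:
        raise ValueError("expected 1 <= h <= k")
    return Fraction(k * (h + 1), k + 1)


if __name__ == "__main__":
    h = 3
    for k in (3, 4, 10, 100, 1000):
        # For k = h the value is exactly h; it then grows towards h + 1.
        print(k, float(dc_ratio(k, h)))
```

For k = h the expression simplifies to h, matching the k-competitiveness quoted for h = k, and for every finite k it stays strictly below h + 1.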

    Monodopsis and Vischeria genomes shed new light on the biology of eustigmatophyte algae

    Acknowledgment: This study was supported by the National Science Foundation Dimensions of Biodiversity grant (1831428) to F.-W.L., and the Czech Science Foundation grant 20-27648S to M.E. We thank the reviewers and editor for their thoughtful comments. Peer reviewed.

    Learning-augmented dynamic power management with multiple states via new ski rental bounds

    We study the online problem of minimizing power consumption in systems with multiple power-saving states. During idle periods of unknown lengths, an algorithm has to choose between power-saving states of different energy consumption and wake-up costs. We develop a learning-augmented online algorithm that makes decisions based on (potentially inaccurate) predicted lengths of the idle periods. The algorithm's performance is near-optimal when predictions are accurate and degrades gracefully with increasing prediction error, with a worst-case guarantee almost identical to the optimal classical online algorithm for the problem. A key ingredient in our approach is a new algorithm for the online ski rental problem in the learning-augmented setting with tight dependence on the prediction error. We support our theoretical findings with experiments.
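For readers unfamiliar with ski rental, the sketch below shows the classical prediction-free break-even strategy, which is the textbook baseline and not the learning-augmented algorithm of this paper. It assumes renting costs 1 per day and buying costs `buy_cost`; both function names are hypothetical.

```python
def break_even_cost(days: int, buy_cost: int) -> int:
    """Cost of renting for buy_cost - 1 days and buying on day buy_cost."""
    if days < buy_cost:
        return days  # the season ended before the purchase day
    return (buy_cost - 1) + buy_cost  # rentals paid so far plus the purchase


def offline_optimum(days: int, buy_cost: int) -> int:
    """Cost when the season length is known in advance."""
    return min(days, buy_cost)


if __name__ == "__main__":
    buy_cost = 10
    for days in (5, 10, 30):
        print(days, break_even_cost(days, buy_cost), offline_optimum(days, buy_cost))
```

Under these assumptions the online cost never exceeds 2 * buy_cost - 1, so the strategy pays less than twice the offline optimum on every input.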

    Extreme genome diversity in the hyper-prevalent parasitic eukaryote Blastocystis

    Blastocystis is the most prevalent eukaryotic microbe colonizing the human gut, infecting approximately 1 billion individuals worldwide. Although Blastocystis has been linked to intestinal disorders, its pathogenicity remains controversial because most carriers are asymptomatic. Here, the genome sequence of Blastocystis subtype (ST) 1 is presented and compared to previously published sequences for ST4 and ST7. Despite a conserved core of genes, there is unexpected diversity between these STs in terms of their genome sizes, guanine-cytosine (GC) content, intron numbers, and gene content. ST1 has 6,544 protein-coding genes, which is several hundred more than reported for ST4 and ST7. The percentage of proteins unique to each ST ranges from 6.2% to 20.5%, greatly exceeding the differences observed within parasite genera. Orthologous proteins also display extreme divergence in amino acid sequence identity between STs (i.e., 59%–61% median identity), on par with observations of the most distantly related species pairs of parasite genera. The STs also display substantial variation in gene family distributions and sizes, especially for protein kinase and protease gene families, which could reflect differences in virulence. It remains to be seen to what extent these inter-ST differences persist at the intra-ST level. A full 26% of genes in ST1 have stop codons that are created on the mRNA level by a novel polyadenylation mechanism found only in Blastocystis. Reconstructions of pathways and organellar systems revealed that ST1 has a relatively complete membrane-trafficking system and a near-complete meiotic toolkit, possibly indicating a sexual cycle. Unlike some intestinal protistan parasites, Blastocystis ST1 has near-complete de novo pyrimidine, purine, and thiamine biosynthesis pathways and is unique amongst studied stramenopiles in being able to metabolize α-glucans rather than β-glucans. It lacks all genes encoding heme-containing cytochrome P450 proteins. Predictions of the mitochondrion-related organelle (MRO) proteome reveal an expanded repertoire of functions, including lipid, cofactor, and vitamin biosynthesis, as well as proteins that may be involved in regulating mitochondrial morphology and MRO/endoplasmic reticulum (ER) interactions. In sharp contrast, genes for peroxisome-associated functions are absent, suggesting Blastocystis STs lack this organelle. Overall, this study provides an important window into the biology of Blastocystis, showcasing significant differences between STs that can guide future experimental investigations into differences in their virulence and clarifying the roles of these organisms in gut health and disease.

    Online metric algorithms with untrusted predictions

    Machine-learned predictors, although achieving very good results for inputs resembling training data, cannot possibly provide perfect predictions in all situations. Still, decision-making systems that are based on such predictors need not only to benefit from good predictions but also to achieve a decent performance when the predictions are inadequate. In this paper, we propose a prediction setup for arbitrary metrical task systems (MTS) (e.g., caching, k-server and convex body chasing) and online matching on the line. We utilize results from the theory of online algorithms to show how to make the setup robust. Specifically for caching, we present an algorithm whose performance, as a function of the prediction error, is exponentially better than what is achievable for general MTS. Finally, we present an empirical evaluation of our methods on real-world datasets, which suggests practicality.

    The draft nuclear genome sequence and predicted mitochondrial proteome of Andalucia godoyi, a protist with the most gene-rich and bacteria-like mitochondrial genome

    [Background] Comparative analyses have indicated that the mitochondrion of the last eukaryotic common ancestor likely possessed all the key core structures and functions that are widely conserved throughout the domain Eucarya. To date, such studies have largely focused on animals, fungi, and land plants (primarily multicellular eukaryotes); relatively few mitochondrial proteomes from protists (primarily unicellular eukaryotic microbes) have been examined. To gauge the full extent of mitochondrial structural and functional complexity and to identify potential evolutionary trends in mitochondrial proteomes, more comprehensive explorations of phylogenetically diverse mitochondrial proteomes are required. In this regard, a key group is the jakobids, a clade of protists belonging to the eukaryotic supergroup Discoba, distinguished by having the most gene-rich and most bacteria-like mitochondrial genomes discovered to date. [Results] In this study, we assembled the draft nuclear genome sequence for the jakobid Andalucia godoyi and used a comprehensive in silico approach to infer the nucleus-encoded portion of the mitochondrial proteome of this protist, identifying 864 candidate mitochondrial proteins. The A. godoyi mitochondrial proteome has a complexity that parallels that of other eukaryotes, while exhibiting an unusually large number of ancestral features that have been lost particularly in opisthokont (animal and fungal) mitochondria. Notably, we find no evidence that the A. godoyi nuclear genome has or had a gene encoding a single-subunit, T3/T7 bacteriophage-like RNA polymerase, which functions as the mitochondrial transcriptase in all eukaryotes except the jakobids. [Conclusions] As genome and mitochondrial proteome data have become more widely available, a strikingly punctate phylogenetic distribution of different mitochondrial components has been revealed, emphasizing that the pathways of mitochondrial proteome evolution are likely complex and lineage-specific. Unraveling this complexity will require comprehensive comparative analyses of mitochondrial proteomes from a phylogenetically broad range of eukaryotes, especially protists. The systematic in silico approach described here offers a valuable adjunct to direct proteomic analysis (e.g., via mass spectrometry), particularly in cases where the latter approach is constrained by sample limitation or other practical considerations. Peer reviewed.

    Paging with Succinct Predictions

    Paging is a prototypical problem in the area of online algorithms. It has also played a central role in the development of learning-augmented algorithms, a recent line of research that aims to ameliorate the shortcomings of classical worst-case analysis by giving algorithms access to predictions. Such predictions can typically be generated using a machine learning approach, but they are inherently imperfect. Previous work on learning-augmented paging has investigated predictions on (i) when the current page will be requested again (reoccurrence predictions), (ii) the current state of the cache in an optimal algorithm (state predictions), (iii) all requests until the current page gets requested again, and (iv) the relative order in which pages are requested. We study learning-augmented paging from the new perspective of requiring the least possible amount of predicted information. More specifically, the predictions obtained alongside each page request are limited to one bit only. We consider two natural such setups: (i) discard predictions, in which the predicted bit denotes whether or not it is "safe" to evict this page, and (ii) phase predictions, where the bit denotes whether the current page will be requested in the next phase (for an appropriate partitioning of the input into phases). We develop algorithms for each of the two setups that satisfy all three desirable properties of learning-augmented algorithms (that is, they are consistent, robust and smooth) despite being limited to a one-bit prediction per request. We also present lower bounds establishing that our algorithms are essentially best possible.
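To make the one-bit discard-prediction interface concrete, the sketch below simulates a deliberately naive eviction rule: evict some cached page whose latest bit said "safe", otherwise fall back to least-recently-used. This is an assumption made for illustration only; it is not the algorithm from the paper and carries none of its consistency, robustness, or smoothness guarantees. The name `count_faults` is hypothetical.

```python
from collections import OrderedDict


def count_faults(requests, discard_bits, cache_size):
    """Count page faults for a naive rule driven by one-bit discard predictions."""
    # Maps page -> latest discard bit, ordered from least to most recently used.
    cache = OrderedDict()
    faults = 0
    for page, safe_to_evict in zip(requests, discard_bits):
        if page in cache:
            cache.move_to_end(page)  # hit: refresh recency
        else:
            faults += 1
            if len(cache) >= cache_size:
                # Prefer the least recently used page marked safe, else plain LRU.
                victim = next((p for p, bit in cache.items() if bit), None)
                if victim is None:
                    victim = next(iter(cache))
                del cache[victim]
        cache[page] = safe_to_evict  # store the newest prediction for this page
    return faults


if __name__ == "__main__":
    reqs = ["a", "b", "c", "a", "b", "c"]
    print(count_faults(reqs, [False] * 6, 2))  # no page ever marked safe: pure LRU
    print(count_faults(reqs, [False, True, False, True, False, False], 2))
```

On this toy sequence the bits steer evictions away from pages that are about to be requested again, which is the kind of benefit a single bit per request can convey.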