
    Fast Scalable Construction of (Minimal Perfect Hash) Functions

    Recent advances in random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space than existing techniques. The main obstacle to any practical application of these results is the cubic-time Gaussian elimination required to solve these linear systems: although the systems can be made very small, the computation is still too slow to be feasible. In this paper we describe in detail a number of heuristics and programming techniques that speed up the solution of these systems by several orders of magnitude, making the overall construction competitive with the standard and widely used MWHC technique, which is based on hypergraph peeling. In particular, we introduce broadword programming techniques for fast equation manipulation and a lazy Gaussian elimination algorithm. We also describe a number of technical improvements to the data structure which further reduce space usage and improve lookup speed. Our implementation of these techniques yields a minimal perfect hash function data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based ones, and a static function data structure which reduces the multiplicative overhead from 1.23 to 1.03.
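The abstract above names Gaussian elimination on linear systems over a finite field as the bottleneck, and word-level ("broadword") equation manipulation as one remedy. As a rough illustration only, the sketch below solves a system over GF(2) with each equation packed into a single Python integer, so that adding one equation to another is one XOR. It is plain Gauss-Jordan elimination, not the lazy variant the paper describes; the function name `solve_gf2` and the bit layout are assumptions made for this example.

```python
def solve_gf2(rows, rhs, num_vars):
    """Solve A x = b over GF(2) (illustrative sketch, not the paper's algorithm).

    rows[i] is an int whose bit j is the coefficient of variable j in
    equation i; rhs[i] is 0 or 1. Returns a list of 0/1 values, with free
    variables set to 0, or None if the system is inconsistent.
    """
    # Pack the right-hand side into bit `num_vars` of each equation.
    eqs = [r | (b << num_vars) for r, b in zip(rows, rhs)]
    pivot_row = {}  # pivot column -> index of its equation
    rank = 0
    for col in range(num_vars):
        mask = 1 << col
        # Look for a not-yet-used equation that contains this variable.
        pivot = next((i for i in range(rank, len(eqs)) if eqs[i] & mask), None)
        if pivot is None:
            continue  # no pivot: this variable is free
        eqs[rank], eqs[pivot] = eqs[pivot], eqs[rank]
        for i in range(len(eqs)):
            if i != rank and eqs[i] & mask:
                eqs[i] ^= eqs[rank]  # one XOR updates the whole equation
        pivot_row[col] = rank
        rank += 1
    rhs_mask = 1 << num_vars
    # A leftover equation reading "0 = 1" means there is no solution.
    if any(e == rhs_mask for e in eqs[rank:]):
        return None
    x = [0] * num_vars
    for col, i in pivot_row.items():
        x[col] = (eqs[i] >> num_vars) & 1
    return x
```

For example, `solve_gf2([0b011, 0b110, 0b101], [1, 0, 1], 3)` encodes x0 + x1 = 1, x1 + x2 = 0, x0 + x2 = 1 and returns one assignment satisfying all three equations.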

    Confirmation Sampling for Exact Nearest Neighbor Search

    Locality-sensitive hashing (LSH), introduced by Indyk and Motwani in STOC ’98, has been an extremely influential framework for nearest neighbor search in high-dimensional data sets. While theoretical work has focused on the approximate nearest neighbor problem, in practice LSH data structures with suitably chosen parameters are used to solve the exact nearest neighbor problem (with some error probability). Sublinear query time is often possible in practice even for exact nearest neighbor search, intuitively because the nearest neighbor tends to be significantly closer than other data points. However, theory offers little advice on how to choose LSH parameters outside of pre-specified worst-case settings. We introduce the technique of confirmation sampling for solving the exact nearest neighbor problem using LSH. First, we give a general reduction that transforms a sequence of data structures, each of which finds the nearest neighbor with a small, unknown probability, into a data structure that returns the nearest neighbor with probability 1−δ, using as few queries as possible. Second, we present a new query algorithm for the LSH Forest data structure with L trees that is able to return the exact nearest neighbor of a query point within the same time bound as an LSH Forest of Ω(L) trees with internal parameters specifically tuned to the query and data.

    Alternative splicing of TGF-betas and their high-affinity receptors TβRI, TβRII and TβRIII (betaglycan) reveal new variants in human prostatic cells

    Background: The transforming growth factors (TGF)-β, TGF-β1, TGF-β2 and TGF-β3, and their receptors [TβRI, TβRII, TβRIII (betaglycan)] elicit pleiotropic functions in the prostate. Although expression of the ligands and receptors has been investigated, the splice variants have never been analyzed. We have therefore analyzed all ligands, the receptors and the splice variants TβRIB, TβRIIB and TGF-β2B in human prostatic cells. Results: Interestingly, a novel human receptor transcript, TβRIIC, was identified; it encodes 36 additional amino acids in the extracellular domain and is expressed in the prostatic cancer cells PC-3, stromal hPCPs, and other human tissues. Furthermore, the receptor variant TβRIB, with four additional amino acids, was also identified in human. Expression of the variant TβRIIB was found in all prostate cell lines studied, with a preferential localization in epithelial cells in some human prostatic glands. Similarly, we observed localization of TβRIIC and TGF-β2B mainly in the epithelial cells, with a preferential localization of TGF-β2B in the apical cell compartment. Whereas in the androgen-independent hPCPs and PC-3 cells all TGF-β ligands and receptors are expressed, the androgen-dependent LNCaP cells failed to express all ligands. Additionally, stimulation of PC-3 cells with TGF-β2 resulted in a significant and strong increase in secretion of plasminogen activator inhibitor-1 (PAI-1), with a major participation of TβRII. Conclusion: In general, expression of the splice variants was more heterogeneous than that of the well-known isoforms. The identification of the splice variant TβRIB and the novel isoform TβRIIC in man clearly contributes to the growing complexity of the TGF-β family.

    Fair Near Neighbor Search: Independent Range Sampling in High Dimensions. PODS

    Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. There are several variants of the similarity search problem, and one of the most relevant is the r-near neighbor (r-NN) problem: given a radius r > 0 and a set of points S, construct a data structure that, for any given query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of fairness. We consider fairness in the sense of equal opportunity: all points that are within distance r from the query should have the same probability to be returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee. To address this, we propose efficient data structures for r-NN where all points in S that are near q have the same probability to be selected and returned by the query. Specifically, we first propose a black-box approach that, given any LSH scheme, constructs a data structure for uniformly sampling points in the neighborhood of a query. Then, we develop a data structure for fair similarity search under inner product that requires nearly-linear space and exploits locality sensitive filters. The paper concludes with an experimental evaluation that highlights (un)fairness in a recommendation setting on real-world datasets and discusses the inherent unfairness introduced by solving other variants of the problem. Comment: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS), Pages 191-204, June 202
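To make the equal-opportunity requirement in the abstract above concrete, here is a minimal linear-scan baseline: it collects every point within distance r of the query and returns one of them uniformly at random. This is not the LSH-based data structure the paper proposes (it takes time linear in the number of points); it only shows the output distribution such a structure is meant to reproduce. The function name and the use of Euclidean distance are assumptions for this example.

```python
import math
import random


def fair_near_neighbor(points, query, r, rng=random):
    """Return a uniformly random point within distance r of `query`.

    Naive baseline (hypothetical helper): scans all points, so every
    near neighbor is returned with the same probability. Returns None
    when no point lies within distance r.
    """
    near = [p for p in points if math.dist(p, query) <= r]
    return rng.choice(near) if near else None
```

A structure that instead returned, say, the first near point found in some hash bucket would also solve r-NN, but points that collide with the query more often would be returned more often, which is the unfairness the abstract refers to.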

    The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search

    This paper reconsiders common benchmarking approaches to nearest neighbor search. It is shown that the concept of local intrinsic dimensionality (LID) makes it possible to choose query sets covering a wide range of difficulty for real-world datasets. Moreover, the effect of different LID distributions on the running time of implementations is studied empirically. To this end, different visualization concepts are introduced that give a more fine-grained view of the inner workings of nearest neighbor search principles. The paper closes with remarks about the diversity of datasets commonly used for nearest neighbor search benchmarking. It is shown that such real-world datasets are not diverse: results on a single dataset predict results on all other datasets well. Comment: Preprint of the paper accepted at SISAP 201
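The abstract above does not say how LID is estimated. One commonly used choice is the maximum-likelihood estimator computed from the distances r_1 <= ... <= r_k of a query to its k nearest neighbors, LID = -( (1/k) * sum_i ln(r_i / r_k) )^(-1). The sketch below implements that formula; treating it as the estimator used in the paper is an assumption, and the function name is made up for this example.

```python
import math


def lid_mle(distances):
    """Maximum-likelihood LID estimate from k-nearest-neighbor distances.

    `distances` are the (positive) distances from a query to its k nearest
    neighbors, in any order. Assumed estimator:
        LID = -k / sum_i ln(r_i / r_k),  with r_k the largest distance.
    """
    d = sorted(distances)
    r_k = d[-1]
    if d[0] <= 0 or r_k <= 0:
        raise ValueError("distances must be positive")
    log_sum = sum(math.log(r / r_k) for r in d)
    if log_sum == 0:
        # All neighbors are equidistant; the estimate is undefined.
        raise ValueError("all distances are equal")
    return -len(d) / log_sum
```

Intuitively, a query whose neighbor distances are all close to r_k gets a high estimate (many points at nearly the same distance, hence a hard query), while distances that grow quickly give a low estimate.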

    Accurate and Fast Retrieval for Complex Non-metric Data via Neighborhood Graphs

    We demonstrate that a graph-based search algorithm (relying on the construction of an approximate neighborhood graph) can work directly with challenging non-metric and/or non-symmetric distances without resorting to metric-space mapping and/or distance symmetrization, both of which lead to substantial performance degradation. Although straightforward metrization and symmetrization are usually ineffective, we find that constructing an index using a modified, e.g., symmetrized, distance can improve performance. This observation paves the way for a new line of research on designing index-specific graph-construction distance functions.
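The idea in the abstract above, building the graph with a modified distance while searching with the original one, can be sketched as follows. Everything concrete here is an assumption for illustration: KL divergence stands in for a non-symmetric distance, the symmetrization is a simple average, the graph is an exact k-nearest-neighbor graph built by brute force (the paper relies on an approximate graph), and the search is a plain greedy walk.

```python
import heapq
import math


def kl_divergence(p, q):
    """Non-symmetric example distance between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetrized(dist):
    """Average of both argument orders (one possible symmetrization)."""
    return lambda a, b: 0.5 * (dist(a, b) + dist(b, a))


def build_knn_graph(data, k, index_dist):
    """Brute-force k-NN graph using the index-time distance."""
    graph = []
    for i, x in enumerate(data):
        others = (j for j in range(len(data)) if j != i)
        graph.append(heapq.nsmallest(k, others, key=lambda j: index_dist(x, data[j])))
    return graph


def greedy_search(data, graph, query, query_dist, start=0):
    """Walk to the neighbor closest to the query until no neighbor improves.

    Distances are evaluated as query_dist(data point, query); the argument
    order matters because the distance is not symmetric.
    """
    current, best = start, query_dist(data[start], query)
    while True:
        cand = min(graph[current], key=lambda j: query_dist(data[j], query))
        cand_dist = query_dist(data[cand], query)
        if cand_dist >= best:
            return current, best
        current, best = cand, cand_dist
```

In this sketch one would call `build_knn_graph(data, k, symmetrized(kl_divergence))` to build the index and `greedy_search(data, graph, query, kl_divergence)` to query it, so the symmetrized distance influences only which edges exist, not how candidates are ranked at query time.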

    Weighting non-covalent forces in the molecular recognition of C60. Relevance of concave–convex complementarity

    The relative contributions of several weak intermolecular forces to the overall stability of the complexes formed between structurally related receptors and [60]fullerene are compared, revealing a discernible contribution from concave–convex complementarity. Viruela Martin, Pedro Manuel; Viruela Martin, Rafael; Orti Guillen, Enrique

    Targeted disruption of Slc2a8 (GLUT8) reduces motility and mitochondrial potential of spermatozoa

    GLUT8 is a class 3 sugar transport facilitator which is predominantly expressed in testis and also detected in brain, heart, skeletal muscle, adipose tissue, adrenal gland, and liver. Since its physiological function in these tissues is unknown, we generated a Slc2a8 null mouse and characterized its phenotype. Slc2a8 knockout mice appeared healthy and exhibited normal growth, body weight development and glycemic control, indicating that GLUT8 does not play a significant role in the maintenance of whole body glucose homeostasis. However, analysis of the offspring distribution of heterozygous matings indicated a lower number of Slc2a8 knockout offspring (30.5:47.3:22.1%, Slc2a8+/+, Slc2a8+/−, and Slc2a8−/− mice, respectively), resulting in a deviation (p = 0.0024) from the expected Mendelian distribution. This difference was associated with lower ATP levels, a reduced mitochondrial membrane potential and a significant reduction of sperm motility in Slc2a8 knockout spermatozoa compared with wild-type spermatozoa. In contrast, the number and survival rate of spermatozoa were not altered. These data indicate that GLUT8 plays an important role in the energy metabolism of sperm cells.

    Cryptanalysis of GlobalPlatform Secure Channel Protocols

    GlobalPlatform (GP) card specifications are the de facto standards for the smart card industry. Being highly sensitive, GP specifications were defined with stringent security requirements in mind. In this paper, we analyze the cryptographic core of these requirements, namely the family of Secure Channel Protocols (SCP). Our main results are twofold. First, we demonstrate a theoretical attack against SCP02, which is the most popular protocol in the SCP family. We discuss the scope of our attack by presenting an actual scenario in which a malicious entity can exploit it in order to recover encrypted messages. Second, we investigate the security of SCP03, which was introduced as an amendment in 2009. We find that it provably satisfies strong notions of security. Of particular interest, we prove that SCP03 withstands algorithm substitution attacks (ASAs), as defined by Bellare et al., that may lead to secret mass surveillance. Our findings highlight the great value of the paradigm of provable security for standards and certification, since unlike extensive evaluation, it formally guarantees the absence of security flaws.

    Legal Paradigm Shifts and Their Impacts on the Socio-Spatial Exclusion of Asylum Seekers in Denmark

    This chapter discusses the genesis of Denmark’s asylum accommodation system and recent legal and socio-spatial changes made in reaction to the increase in arrivals. By elucidating the structures and objectives of asylum accommodation, I show that the state’s further tightening of restrictive reception and accommodation policies significantly impacts the socio-spatial configurations of accommodations, refugees’ access to housing and their well-being. I discuss the links between the tightening of laws, the deterioration of living conditions and the (re-)constitution of large accommodations as a means of socio-spatial exclusion. Using the case of Denmark’s Hovedstaden Region (Capital Region), I finally argue that asylum accommodation is a central instrument of Denmark’s approach to strategically isolating forced migrants and deterring them from migrating to Denmark.