Fast Scalable Construction of (Minimal Perfect Hash) Functions
Recent advances in random linear systems on finite fields have paved the way
for the construction of constant-time data structures representing static
functions and minimal perfect hash functions using less space with respect to
existing techniques. The main obstruction for any practical application of
these results is the cubic-time Gaussian elimination required to solve these
linear systems: although they can be made very small, the computation is still
too slow to be feasible.
In this paper we describe in detail a number of heuristics and programming
techniques to speed up the resolution of these systems by several orders of
magnitude, making the overall construction competitive with the standard and
widely used MWHC technique, which is based on hypergraph peeling. In
particular, we introduce broadword programming techniques for fast equation
manipulation and a lazy Gaussian elimination algorithm. We also describe a
number of technical improvements to the data structure which further reduce
space usage and improve lookup speed.
Our implementation of these techniques yields a minimal perfect hash function
data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based
ones, and a static function data structure which reduces the multiplicative
overhead from 1.23 to 1.03.
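As a rough illustration of the equation manipulation mentioned in the abstract above, the following Python sketch solves a small linear system over GF(2), storing each equation as a bit-packed integer so that adding two equations is a single XOR. It is plain Gauss-Jordan elimination, not the paper's lazy Gaussian elimination or its broadword routines; the function name `solve_gf2` and the bit encoding are assumptions made for this example.

```python
def solve_gf2(rows, rhs, num_vars):
    """Solve A x = b over GF(2) (illustrative sketch, not the paper's algorithm).

    rows[i] is an int whose bit j is set when variable j appears in equation i;
    rhs[i] is the constant term (0 or 1). Returns a list of bits, or None if the
    system has no solution. Free variables are set to 0.
    """
    # Store the constant term in bit `num_vars`, so one XOR updates both sides.
    eqs = [row | (b << num_vars) for row, b in zip(rows, rhs)]
    pivot_vars = []
    rank = 0
    for var in range(num_vars):
        mask = 1 << var
        # Look for a not-yet-used equation that contains this variable.
        pivot = next((i for i in range(rank, len(eqs)) if eqs[i] & mask), None)
        if pivot is None:
            continue  # free variable
        eqs[rank], eqs[pivot] = eqs[pivot], eqs[rank]
        # Eliminate the variable from every other equation with a single XOR.
        for i in range(len(eqs)):
            if i != rank and eqs[i] & mask:
                eqs[i] ^= eqs[rank]
        pivot_vars.append(var)
        rank += 1
    # A leftover equation reading "0 = 1" means the system is inconsistent.
    if any(e >> num_vars for e in eqs[rank:]):
        return None
    x = [0] * num_vars
    for r, var in enumerate(pivot_vars):
        x[var] = (eqs[r] >> num_vars) & 1
    return x
```

For example, `solve_gf2([0b011, 0b110, 0b101], [1, 0, 1], 3)` encodes the equations x0+x1=1, x1+x2=0 and x0+x2=1 (all modulo 2). Packing equations into machine words is what makes each row operation cheap; elimination remains cubic in the worst case, which is the cost the abstract sets out to reduce.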
Confirmation Sampling for Exact Nearest Neighbor Search
Locality-sensitive hashing (LSH), introduced by Indyk and Motwani in STOC ’98, has been an extremely influential framework for nearest neighbor search in high-dimensional data sets. While theoretical work has focused on the approximate nearest neighbor problem, in practice LSH data structures with suitably chosen parameters are used to solve the exact nearest neighbor problem (with some error probability). Sublinear query time is often possible in practice even for exact nearest neighbor search, intuitively because the nearest neighbor tends to be significantly closer than other data points. However, theory offers little advice on how to choose LSH parameters outside of pre-specified worst-case settings.
We introduce the technique of confirmation sampling for solving the exact nearest neighbor problem using LSH. First, we give a general reduction that transforms a sequence of data structures that each find the nearest neighbor with a small, unknown probability, into a data structure that returns the nearest neighbor with probability 1−δ, using as few queries as possible. Second, we present a new query algorithm for the LSH Forest data structure with L trees that is able to return the exact nearest neighbor of a query point within the same time bound as an LSH Forest of Ω(L) trees with internal parameters specifically tuned to the query and data.
Alternative splicing of TGF-betas and their high-affinity receptors TβRI, TβRII and TβRIII (betaglycan) reveal new variants in human prostatic cells
Background: The transforming growth factors (TGF)-β, TGF-β1, TGF-β2 and TGF-β3, and their receptors [TβRI, TβRII, TβRIII (betaglycan)] elicit pleiotropic functions in the prostate. Although expression of the ligands and receptors has been investigated, the splice variants have never been analyzed. We therefore analyzed all ligands, the receptors and the splice variants TβRIB, TβRIIB and TGF-β2B in human prostatic cells.
Results: A novel human receptor transcript, TβRIIC, was identified; it encodes an additional 36 amino acids in the extracellular domain and is expressed in the prostatic cancer cells PC-3, stromal hPCPs, and other human tissues. Furthermore, the receptor variant TβRIB, with four additional amino acids, was also identified in humans. Expression of the variant TβRIIB was found in all prostate cell lines studied, with a preferential localization in epithelial cells in some human prostatic glands. Similarly, we observed localization of TβRIIC and TGF-β2B mainly in the epithelial cells, with a preferential localization of TGF-β2B in the apical cell compartment. Whereas in the androgen-independent hPCPs and PC-3 cells all TGF-β ligands and receptors are expressed, the androgen-dependent LNCaP cells failed to express all ligands. Additionally, stimulation of PC-3 cells with TGF-β2 resulted in a significant and strong increase in secretion of plasminogen activator inhibitor-1 (PAI-1), with a major participation of TβRII.
Conclusion: In general, expression of the splice variants was more heterogeneous than that of the well-known isoforms. The identification of the splice variants TβRIB and the novel isoform TβRIIC in man clearly contributes to the growing complexity of the TGF-β family.
Fair Near Neighbor Search: Independent Range Sampling in High Dimensions. PODS
Similarity search is a fundamental algorithmic primitive, widely used in many
computer science disciplines. There are several variants of the similarity
search problem, and one of the most relevant is the r-near neighbor (r-NN)
problem: given a radius r > 0 and a set of points S, construct a data
structure that, for any given query point q, returns a point p within
distance at most r from q. In this paper, we study the r-NN problem in
the light of fairness. We consider fairness in the sense of equal opportunity:
all points that are within distance r from the query should have the same
probability to be returned. In the low-dimensional case, this problem was first
studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the
theoretically strongest approach to similarity search in high dimensions, does
not provide such a fairness guarantee. To address this, we propose efficient
data structures for r-NN where all points in S that are near q have the
same probability to be selected and returned by the query. Specifically, we
first propose a black-box approach that, given any LSH scheme, constructs a
data structure for uniformly sampling points in the neighborhood of a query.
Then, we develop a data structure for fair similarity search under inner
product that requires nearly-linear space and exploits locality sensitive
filters. The paper concludes with an experimental evaluation that highlights
(un)fairness in a recommendation setting on real-world datasets and discusses
the inherent unfairness introduced by solving other variants of the problem.
Comment: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on
Principles of Database Systems (PODS), Pages 191-204, June 202
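The fairness notion in the abstract above (every point within the query radius is returned with the same probability) can be illustrated with a small rejection-sampling sketch in Python. This is not the paper's data structure: it assumes the buckets that the query hashes to are already available as plain lists (the `buckets` argument is hypothetical), that no bucket contains the same point twice, and it is uniform only over near points that appear in at least one of those buckets. The name `fair_near_sample` and its parameters are made up for this example.

```python
import math
import random


def fair_near_sample(query, buckets, radius, rng, max_tries=10_000):
    """Return a near point chosen uniformly at random, or None.

    buckets: list of lists of points (tuples), one list per hash table,
    holding the points that collide with the query (hypothetical input).
    """
    sizes = [len(b) for b in buckets]
    if sum(sizes) == 0:
        return None
    for _ in range(max_tries):
        # Pick one (bucket, slot) pair uniformly among all slots.
        bucket = rng.choices(buckets, weights=sizes)[0]
        point = rng.choice(bucket)
        if math.dist(point, query) > radius:
            continue  # far point: reject
        # A point stored in m buckets is m times as likely to be drawn,
        # so accept it with probability 1/m to cancel that bias.
        multiplicity = sum(point in b for b in buckets)
        if rng.random() < 1.0 / multiplicity:
            return point
    return None
```

Returning the first near point found while scanning buckets would favor points that collide with the query in many tables; the acceptance step removes that bias at the cost of extra draws.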
The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search
This paper reconsiders common benchmarking approaches to nearest neighbor
search. It is shown that the concept of local intrinsic dimensionality (LID)
makes it possible to choose query sets of a wide range of difficulty for real-world
datasets. Moreover, the effect of different LID distributions on the running
time performance of implementations is empirically studied. To this end,
different visualization concepts are introduced that give a more
fine-grained overview of the inner workings of nearest neighbor search
principles. The paper closes with remarks about the diversity of datasets
commonly used for nearest neighbor search benchmarking. It is shown that such
real-world datasets are not diverse: results on a single dataset predict
results on all other datasets well.
Comment: Preprint of the paper accepted at SISAP 201
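For readers unfamiliar with LID, one widely used way to estimate it at a query point is the maximum-likelihood estimator computed from the distances to the k nearest neighbors: LID ≈ -1 / mean(ln(d_i / d_k)), where d_k is the largest of the k distances. The abstract does not say which estimator the paper uses, so the Python sketch below is an assumption chosen for illustration, and the function name `lid_mle` is hypothetical.

```python
import math


def lid_mle(distances):
    """Maximum-likelihood LID estimate from k nearest-neighbor distances.

    distances: positive distances from the query to its k nearest neighbors.
    """
    dists = sorted(distances)
    d_max = dists[-1]
    # Mean of ln(d_i / d_k); each term is <= 0 because d_i <= d_k.
    mean_log = sum(math.log(d / d_max) for d in dists) / len(dists)
    if mean_log == 0.0:
        return math.inf  # all neighbors at the same distance
    return -1.0 / mean_log
```

Neighbor distances that are all close to the k-th distance give a high estimate, which corresponds to a harder query; distances that spread out quickly give a low estimate.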
Accurate and Fast Retrieval for Complex Non-metric Data via Neighborhood Graphs
We demonstrate that a graph-based search algorithm, relying on the
construction of an approximate neighborhood graph, can directly work with
challenging non-metric and/or non-symmetric distances without resorting to
metric-space mapping and/or distance symmetrization, which, in turn, lead to
substantial performance degradation. Although the straightforward metrization
and symmetrization is usually ineffective, we find that constructing an index
using a modified, e.g., symmetrized, distance can improve performance. This
observation paves the way for a new line of research on designing index-specific
graph-construction distance functions.
Weighting non-covalent forces in the molecular recognition of C60. Relevance of concave–convex complementarity
The relative contributions of several weak intermolecular forces to the overall stability of the complexes formed between structurally related receptors and [60]fullerene are compared, revealing a discernible contribution from concave–convex complementarity.
Viruela Martin, Pedro Manuel; Viruela Martin, Rafael; Orti Guillen, Enrique
Targeted disruption of Slc2a8 (GLUT8) reduces motility and mitochondrial potential of spermatozoa
GLUT8 is a class 3 sugar transport facilitator which is predominantly expressed in testis and also detected in brain, heart, skeletal muscle, adipose tissue, adrenal gland, and liver. Since its physiological function in these tissues is unknown, we generated a Slc2a8 null mouse and characterized its phenotype. Slc2a8 knockout mice appeared healthy and exhibited normal growth, body weight development and glycemic control, indicating that GLUT8 does not play a significant role in the maintenance of whole body glucose homeostasis. However, analysis of the offspring distribution of heterozygous mating indicated a lower number of Slc2a8 knockout offspring (30.5:47.3:22.1%, Slc2a8+/+, Slc2a8+/−, and Slc2a8−/− mice, respectively) resulting in a deviation (p = 0.0024) from the expected Mendelian distribution. This difference was associated with lower ATP levels, a reduced mitochondrial membrane potential and a significant reduction of sperm motility of the Slc2a8 knockout in comparison to wild-type spermatozoa. In contrast, the number and survival rate of spermatozoa were not altered. These data indicate that GLUT8 plays an important role in the energy metabolism of sperm cells.
Cryptanalysis of GlobalPlatform Secure Channel Protocols
GlobalPlatform (GP) card specifications are the de facto standards for the industry of smart cards. Being highly sensitive, GP specifications were defined with stringent security requirements in mind. In this paper, we analyze the cryptographic core of these requirements, i.e., the family of Secure Channel Protocols (SCP). Our main results are twofold. First, we demonstrate a theoretical attack against SCP02, which is the most popular protocol in the SCP family. We discuss the scope of our attack by presenting an actual scenario in which a malicious entity can exploit it in order to recover encrypted messages. Second, we investigate the security of SCP03 that was introduced as an amendment in 2009. We find that it provably satisfies strong notions of security. Of particular interest, we prove that SCP03 withstands algorithm substitution attacks (ASAs) defined by Bellare et al. that may lead to secret mass surveillance. Our findings highlight the great value of the paradigm of provable security for standards and certification, since unlike extensive evaluation, it formally guarantees the absence of security flaws.
Legal Paradigm Shifts and Their Impacts on the Socio-Spatial Exclusion of Asylum Seekers in Denmark
This chapter discusses the genesis of Denmark’s asylum accommodation system and recent legal and socio-spatial changes as a reaction to the increase in arrivals. By elucidating the structures and objectives of asylum accommodation, I show that the state’s further tightening of restrictive reception and accommodation policies significantly impacts the socio-spatial configurations of accommodations, refugees’ access to housing and their well-being. I discuss the links between the tightening of laws, the reduction of living conditions and the (re-)constitution of large accommodations as means of socio-spatial exclusion. Using the case of Denmark’s Hovedstaden Region (Capital Region), I finally argue that asylum accommodation is a central instrument of Denmark’s approaches to strategically isolate forced migrants and to deter them from migrating to Denmark.