
    Scalability and Total Recall with Fast CoveringLSH

    Locality-sensitive hashing (LSH) has emerged as the dominant algorithmic technique for similarity search with strong performance guarantees in high-dimensional spaces. A drawback of traditional LSH schemes is that they may have false negatives, i.e., the recall is less than 100%. This limits the applicability of LSH in settings requiring precise performance guarantees. Building on the recent theoretical "CoveringLSH" construction that eliminates false negatives, we propose a fast and practical covering LSH scheme for Hamming space called Fast CoveringLSH (fcLSH). Inheriting the design benefits of CoveringLSH, our method avoids false negatives and always reports all near neighbors. Compared to CoveringLSH, we achieve an asymptotic improvement in hash function computation time from $\mathcal{O}(dL)$ to $\mathcal{O}(d + L\log L)$, where $d$ is the dimensionality of the data and $L$ is the number of hash tables. Our experiments on synthetic and real-world data sets demonstrate that fcLSH is comparable (and often superior) to traditional hashing-based approaches for search radii up to 20 in high-dimensional Hamming space.
    Comment: Short version appears in Proceedings of CIKM 201
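    To make the "no false negatives" property concrete, here is a minimal sketch of the covering idea as we understand the CoveringLSH construction: each coordinate i gets a random vector m(i) in {0,1}^(r+1), and for each of the L = 2^(r+1) - 1 nonzero vectors v, table v keeps only coordinates whose inner product <m(i), v> over GF(2) is odd. If x and y differ in at most r positions, the m(i) of those positions span at most r dimensions, so some nonzero v is orthogonal to all of them and table v ignores every differing bit. The random integer weights used to compress a masked vector, and the Walsh-Hadamard route to O(d + L log L), are our own assumptions and not necessarily fcLSH's actual technique; all names are hypothetical.

```python
import random

def make_covering_params(d, r, seed=0):
    """Hypothetical setup: a random m(i) in {0,1}^(r+1) per coordinate,
    plus random integer weights used to compress a masked vector to one number."""
    rng = random.Random(seed)
    m = [rng.randrange(2 ** (r + 1)) for _ in range(d)]
    w = [rng.randrange(1 << 32) for _ in range(d)]
    return m, w

def parity(z):
    return bin(z).count("1") % 2

def hashes_naive(x, m, w, r):
    """O(d * L): table v keeps coordinate i iff <m(i), v> is odd over GF(2)."""
    return [sum(w[i] for i, bit in enumerate(x) if bit and parity(m[i] & v))
            for v in range(1, 2 ** (r + 1))]

def fwht(a):
    """Unnormalised Walsh-Hadamard transform: W[v] = sum_u a[u] * (-1)^<u, v>."""
    a = list(a)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def hashes_fast(x, m, w, r):
    """O(d + L log L): bucket weights by m(i), then one transform serves all v."""
    c = [0] * (2 ** (r + 1))
    for i, bit in enumerate(x):
        if bit:
            c[m[i]] += w[i]
    total, W = sum(c), fwht(c)
    # The sum over buckets u with odd <u, v> equals (total - W[v]) / 2.
    return [(total - W[v]) // 2 for v in range(1, len(c))]

# Demo: y differs from x in exactly r positions, so some table must collide.
d, r = 64, 3
m, w = make_covering_params(d, r)
rng = random.Random(1)
x = [rng.randrange(2) for _ in range(d)]
y = list(x)
for i in rng.sample(range(d), r):
    y[i] ^= 1
hx, hy = hashes_fast(x, m, w, r), hashes_fast(y, m, w, r)
assert hx == hashes_naive(x, m, w, r)
print(any(a == b for a, b in zip(hx, hy)))  # True for any x, y within distance r
```

    The collision guarantee holds for every choice of m; randomness in m only serves to keep far-apart points from colliding too often.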

    LINVIEW: Incremental View Maintenance for Complex Analytical Queries

    Many analytics tasks and machine learning problems can be naturally expressed by iterative linear algebra programs. In this paper, we study the incremental view maintenance problem for such complex analytical queries. We develop a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost. Linear algebra operations tend to cause an avalanche effect where even very local changes to the input matrices spread out and infect all of the intermediate results and the final view, causing incremental view maintenance to lose its performance benefit over re-evaluation. We develop techniques based on matrix factorizations to contain such epidemics of change. As a consequence, our techniques make incremental view maintenance of linear algebra practical and usually substantially cheaper than re-evaluation. We show, both analytically and experimentally, the usefulness of these techniques when applied to standard analytics tasks. Our evaluation demonstrates the efficiency of LINVIEW in generating parallel incremental programs that outperform re-evaluation techniques by more than an order of magnitude.
    Comment: 14 pages, SIGMO
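    LINVIEW's framework is not reproduced here; the following is a toy illustration, with hypothetical function names, of the underlying idea that keeping deltas in factored (low-rank) form contains the spread of a change. For the view V = A^2, a rank-1 update A += u v^T can touch every entry of V, yet (A + u v^T)^2 - A^2 = (A u + (v^T u) u) v^T + u (A^T v)^T is a sum of two outer products, so it can be applied in O(n^2) instead of recomputing A^2 in O(n^3).

```python
def matmul(A, B):
    """Plain O(n^3) product, used only for re-evaluation and checking."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def add_outer(M, p, q):
    """In place M += p q^T, O(n^2)."""
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            M[i][j] += pi * qj

def update_square_view(A, V, u, v):
    """Maintain V = A @ A under the rank-1 change A += u v^T in O(n^2).

    (A + u v^T)^2 - A^2 = (A u + (v.u) u) v^T + u (A^T v)^T, a delta of rank <= 2.
    """
    Au = matvec(A, u)                            # computed from the old A
    Atv = matvec([list(c) for c in zip(*A)], v)  # A^T v, also from the old A
    vu = sum(a * b for a, b in zip(v, u))
    add_outer(V, [a + vu * b for a, b in zip(Au, u)], v)
    add_outer(V, u, Atv)
    add_outer(A, u, v)                           # finally apply the base change

A = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
V = matmul(A, A)
update_square_view(A, V, [1, 0, 2], [0, 1, 1])
print(V == matmul(A, A))  # True
```

    Integer matrices are used so the incremental result can be compared exactly with full re-evaluation.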

    Applying semantic web technologies to knowledge sharing in aerospace engineering

    This paper details an integrated methodology to optimise knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses ontologies as a central modelling strategy for the capture of knowledge from legacy documents via automated means, or directly in systems interfacing with knowledge workers via user-defined, web-based forms. The domain ontologies used for knowledge capture also guide the retrieval of the knowledge extracted from the data, using a Semantic Search System that supports multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale.

    Elucidating the role of Staphylococcus epidermidis serine-aspartate repeat protein G in platelet activation.

    BACKGROUND: Staphylococcus epidermidis is a commensal of the human skin that has been implicated in infective endocarditis and infections involving implanted medical devices. S. epidermidis induces platelet aggregation by an unknown mechanism. The fibrinogen-binding protein serine-aspartate repeat protein G (SdrG) is present in 67-91% of clinical strains. OBJECTIVES: To determine whether SdrG plays a role in platelet activation, and if so to investigate the role of fibrinogen in this mechanism. METHODS: SdrG was expressed in a surrogate host, Lactococcus lactis, in order to investigate its role in the absence of other staphylococcal components. Platelet adhesion and platelet aggregation assays were employed. RESULTS: L. lactis expressing SdrG stimulated platelet aggregation (lag time: 2.9 ± 0.5 min), whereas the L. lactis control did not. L. lactis SdrG-induced aggregation was inhibited by αIIbβ3 antagonists and aspirin. Aggregation was dependent on both fibrinogen and IgG, and the platelet IgG receptor FcγRIIa. Preincubation of the bacteria with Bβ-chain fibrinopeptide inhibited aggregation (delaying the lag time six-fold), suggesting that fibrinogen acts as a bridging molecule. Platelets adhered to L. lactis SdrG in the absence of fibrinogen. Adhesion was inhibited by αIIbβ3 antagonists, suggesting that this direct interaction involves αIIbβ3. Investigation using purified fragments of SdrG revealed a direct interaction with the B-domains. Adhesion to the A-domain involved both a fibrinogen and an IgG bridge. CONCLUSION: SdrG alone is sufficient to support platelet adhesion and aggregation through both direct and indirect mechanisms.

    On correctness in RDF stream processor benchmarking

    Two complementary benchmarks have been proposed so far for the evaluation and continuous improvement of RDF stream processors: SRBench and LSBench. They focus on different features of the evaluated systems, including coverage of the streaming extensions of SPARQL supported by each processor, query processing throughput, and an early analysis of query evaluation correctness based on comparing the results obtained by different processors for a set of queries. However, neither of them has analysed the operational semantics of these processors in order to assess the correctness of query evaluation results. In this paper, we propose a characterization of the operational semantics of RDF stream processors, adapting well-known models used in the stream processing engine community: CQL and SECRET. Through this formalization, we address correctness in RDF stream processor benchmarks, which allows us to determine the multiple answers that systems should provide. Finally, we present CSRBench, an extension of SRBench that verifies query result correctness using an automatic method.
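    Why a correctness check must accept several answers can be seen with a toy time-based window. The sketch below is a simplification and not CSRBench's method: it assumes a CQL-style window of width `width` and slide `slide`, and, loosely following SECRET's notion of window scope, that the time `t0` at which the first window opens is engine-dependent. The function and parameter names are hypothetical.

```python
def window_contents(events, width, slide, t0, t_end):
    """Group timestamped events into windows [t0 + i*slide, t0 + i*slide + width).

    events: list of (timestamp, item); windows are generated while start < t_end.
    """
    out = []
    start = t0
    while start < t_end:
        end = start + width
        items = sorted(item for t, item in events if start <= t < end)
        out.append(((start, end), items))
        start += slide
    return out

events = [(1, "a"), (3, "b"), (4, "c"), (7, "d")]
# Two engines that differ only in when their first window opens:
print(window_contents(events, width=4, slide=4, t0=0, t_end=8))
print(window_contents(events, width=4, slide=4, t0=2, t_end=8))
```

    Both groupings of the same four events are consistent with a width-4, slide-4 window, so a query that, say, counts items per window can legitimately return different results on different engines.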

    SRBench: A streaming RDF/SPARQL benchmark

    We introduce SRBench, a general-purpose benchmark primarily designed for streaming RDF/SPARQL engines, completely based on real-world data sets from the Linked Open Data cloud. With the growing problem of abundant streaming data but too few tools to extract knowledge from it, researchers have sought solutions in which Semantic Web technologies are adapted and extended for publishing, sharing, analysing and understanding streaming data. To help researchers and users compare streaming RDF/SPARQL (strRS) engines in a standardised application scenario, we have designed SRBench, with which one can assess the abilities of a strRS engine to cope with a broad range of use cases typically encountered in real-world scenarios. The data sets used in the benchmark have been carefully chosen such that they represent a realistic and relevant usage of streaming data. The benchmark defines a concise, yet comprehensive set of queries that cover the major aspects of strRS processing. Finally, our work is complemented with a functional evaluation of three representative strRS engines: SPARQLStream, C-SPARQL and CQELS. The presented results are meant to give a first baseline and illustrate the state of the art.

    Space-optimal Heavy Hitters with Strong Error Bounds

    The problem of finding heavy hitters and approximating the frequencies of items is at the heart of many problems in data stream analysis. It has been observed that several proposed solutions to this problem can outperform their worst-case guarantees on real data. This leads to the question of whether some stronger bounds can be guaranteed. We answer this in the positive by showing that a class of "counter-based algorithms" (including the popular and very space-efficient FREQUENT and SPACESAVING algorithms) provides much stronger approximation guarantees than previously known. Specifically, we show that errors in the approximation of individual elements do not depend on the frequencies of the most frequent elements, but only on the frequency of the remaining "tail." This shows that counter-based methods are the most space-efficient (in fact, space-optimal) algorithms having this strong error bound. This tail guarantee allows these algorithms to solve the "sparse recovery" problem. Here, the goal is to recover a faithful representation of the vector of frequencies, $f$. We prove that using space $O(k)$, the algorithms construct an approximation $f^*$ to the frequency vector $f$ so that the $L_1$ error $\|f - f^*\|_1$ is close to the best possible error $\min_{f_2} \|f_2 - f\|_1$, where $f_2$ ranges over all vectors with at most $k$ non-zero entries. This improves the previously best known space bound of about $O(k \log n)$ for streams without element deletions (where $n$ is the size of the domain from which stream elements are drawn). Other consequences of the tail guarantees are results for skewed (Zipfian) data, and guarantees for accuracy of merging multiple summarized streams.
    David & Lucile Packard Foundation (Fellowship); Center for Massive Data Algorithmics (MADALGO); National Science Foundation (U.S.). (Grant number CCF-0728645
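    A minimal sketch of SPACESAVING, one of the counter-based algorithms named above, written from the commonly described version of the algorithm: keep at most k counters; an unmonitored item evicts the item with the smallest counter and inherits that count plus one. The function name is ours, and the linear-time `min()` per eviction is a simplification; practical implementations use a stream-summary structure or a heap. The code illustrates the algorithm only and does not demonstrate the paper's tail bound.

```python
def space_saving(stream, k):
    """Return {item: estimated count} using at most k counters (SPACESAVING)."""
    counts = {}
    for x in stream:
        if x in counts:
            counts[x] += 1
        elif len(counts) < k:
            counts[x] = 1
        else:
            # Evict the item with the smallest counter; the newcomer inherits
            # that count plus one, so monitored items are never undercounted.
            victim = min(counts, key=counts.get)
            counts[x] = counts.pop(victim) + 1
    return counts

stream = ["x"] * 50 + [f"item{i}" for i in range(50)]
est = space_saving(stream, k=10)
print(est["x"], sum(est.values()))  # 50 100
```

    Each arrival adds exactly one to the total of all counters, so the counters always sum to the stream length; with at least as many counters as distinct items, the estimates are exact.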

    A Nonexistence Result for Abelian Menon Difference Sets Using Perfect Binary Arrays

    A Menon difference set has the parameters $(4N^2, 2N^2-N, N^2-N)$. In the abelian case it is equivalent to a perfect binary array, which is a multi-dimensional matrix with elements ±1 such that all out-of-phase periodic autocorrelation coefficients are zero. Suppose that the abelian group $H \times K \times \mathbb{Z}_{p^\alpha}$ contains a Menon difference set, where $p$ is an odd prime, $|K| = p^\alpha$, and $p^j \equiv -1 \pmod{\exp(H)}$ for some $j$. Using the viewpoint of perfect binary arrays, we prove that $K$ must be cyclic. A corollary is that there exists a Menon difference set in the abelian group $H \times K \times \mathbb{Z}_{3^\alpha}$, where $\exp(H) = 2$ or $4$ and $|K| = 3^\alpha$, if and only if $K$ is cyclic.
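    The equivalence between perfect binary arrays and Menon difference sets can be checked directly on a small case. This sketch is restricted to two-dimensional arrays, i.e. the group Z_n × Z_m, which is an assumption made for brevity (the paper's groups are more general), and the function names are hypothetical. It builds a 4×4 array as the outer product of the perfect sequence (1, 1, 1, -1) with itself (the periodic autocorrelation of an outer product factors into the two one-dimensional autocorrelations, so the result is again perfect) and confirms that the positions of -1 form a difference set with the Menon parameters for N = 2, namely (16, 6, 2).

```python
from itertools import product

def autocorrelation(A, shift):
    """Periodic autocorrelation of a 2-D +/-1 array at shift (a, b)."""
    n, m = len(A), len(A[0])
    a, b = shift
    return sum(A[i][j] * A[(i + a) % n][(j + b) % m]
               for i in range(n) for j in range(m))

def is_perfect_binary_array(A):
    """True iff every out-of-phase (nonzero-shift) coefficient is zero."""
    n, m = len(A), len(A[0])
    return all(autocorrelation(A, s) == 0
               for s in product(range(n), range(m)) if s != (0, 0))

def difference_counts(A):
    """D = positions of -1; count how often each nonzero g = d1 - d2 occurs."""
    n, m = len(A), len(A[0])
    D = [(i, j) for i in range(n) for j in range(m) if A[i][j] == -1]
    counts = {}
    for (i1, j1), (i2, j2) in product(D, D):
        if (i1, j1) != (i2, j2):
            g = ((i1 - i2) % n, (j1 - j2) % m)
            counts[g] = counts.get(g, 0) + 1
    return D, counts

s = [1, 1, 1, -1]
A = [[si * sj for sj in s] for si in s]
D, counts = difference_counts(A)
print(is_perfect_binary_array(A), len(D), set(counts.values()))  # True 6 {2}
```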