
    Memory-Efficient Algorithms for Finding Needles in Haystacks

    One of the most common tasks in cryptography and cryptanalysis is to find some interesting event (a needle) in an exponentially large collection (haystack) of $N = 2^n$ possible events, or to demonstrate that no such event is likely to exist. In particular, we are interested in finding needles, which are defined as events that happen with an unusually high probability of $p \gg 1/N$ in a haystack which is an almost uniform distribution on $N$ possible events. When the search algorithm can only sample values from this distribution, the best known time/memory tradeoff for finding such an event requires $O(1/(Mp^2))$ time given $O(M)$ memory. In this paper we develop much faster needle-searching algorithms in the common cryptographic setting in which the distribution is defined by applying some deterministic function $f$ to random inputs. Such a distribution can be modelled by a random directed graph with $N$ vertices in which almost all the vertices have $O(1)$ predecessors, while the vertex we are looking for has an unusually large number of $O(pN)$ predecessors. When we are given only a constant amount of memory, we propose a new search methodology which we call NestedRho. As $p$ increases, such random graphs undergo several subtle phase transitions, and thus the log-log dependence of the time complexity $T$ on $p$ becomes a piecewise linear curve which bends four times. Our new algorithm is faster than the $O(1/p^2)$ time complexity of the best previous algorithm in the full range of $1/N < p < 1$, and in particular it improves the previous time complexity by a significant factor of $\sqrt{N}$ for any $p$ in the range $N^{-0.75} < p < N^{-0.5}$. When we are given more memory, we show how to combine the NestedRho technique with the parallel collision search technique in order to further reduce its time complexity. Finally, we show how to apply our new search technique to more complicated distributions with multiple peaks, when we want to find all the peaks whose probabilities are higher than $p$.
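
    The setting above can be made concrete with a small simulation. The sketch below (the names f and y0 and all chosen parameters are illustrative, not taken from the paper) plants a single heavy output in an otherwise near-uniform function and runs the classical constant-memory pairwise-collision search whose roughly $1/p^2$ cost is the baseline that NestedRho improves on; the NestedRho algorithm itself is not reproduced here.

```python
import random

# Toy instance of the paper's setting (all names and parameters illustrative):
# a deterministic function f on N = 2^n inputs whose output distribution is
# almost uniform except for one planted "needle" y0 hit with probability
# p >> 1/N.  We run the classical O(1)-memory, ~1/p^2-time baseline: draw
# pairs of outputs until two independent samples collide.  Because p^2 >> 1/N
# in this toy setup, the first collision is almost certainly the needle.
n, p, y0 = 16, 2 ** -5, 12_345
N = 1 << n

random.seed(0)
table = [y0 if random.random() < p else random.randrange(N) for _ in range(N)]

def f(x: int) -> int:
    return table[x]

def find_needle_pairwise() -> tuple[int, int]:
    """Return (candidate needle, number of samples drawn)."""
    samples = 0
    while True:
        a, b = f(random.randrange(N)), f(random.randrange(N))
        samples += 2
        if a == b:          # a collision points at a heavy output w.h.p.
            return a, samples

candidate, cost = find_needle_pairwise()
print(f"candidate={candidate}  planted needle={y0}  samples={cost}")
# Expected cost is on the order of 1/p^2 = 2^10 pairs; NestedRho exploits the
# functional-graph structure (predecessor counts) to do substantially better.
```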

    Simple Local Computation Algorithms for the General Lovász Local Lemma

    We consider the task of designing Local Computation Algorithms (LCAs) for applications of the Lovász Local Lemma (LLL). LCAs are a class of sublinear algorithms proposed by Rubinfeld et al. that have received a lot of attention in recent years. The LLL is an existential, sufficient condition for a collection of sets to have non-empty intersection (in applications, often, each set comprises all objects having a certain property). The ground-breaking algorithm of Moser and Tardos made the LLL fully constructive, following earlier results by Beck and Alon giving algorithms under significantly stronger LLL-like conditions. LCAs under those stronger conditions were given by Rubinfeld et al., who asked whether the Moser-Tardos algorithm can be used to design LCAs under the standard LLL condition. The main contribution of this paper is to answer this question affirmatively. In fact, our techniques yield LCAs for settings beyond the standard LLL condition.
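
    As a point of reference for the constructive LLL mentioned above, the following is a minimal sketch of the Moser-Tardos resampling loop specialised to k-SAT. The clause encoding and the toy instance are illustrative choices, and the sketch omits the analysis showing that the LLL condition guarantees fast termination.

```python
import random

# Minimal sketch of the Moser-Tardos resampling algorithm for k-SAT.  The
# "bad events" are violated clauses; whenever one holds, only the variables
# of that clause are resampled.  Under the LLL condition this terminates
# quickly in expectation; the toy instance below is simply satisfiable.
Clause = list[tuple[int, bool]]     # (variable index, is_positive)

def violated(clause: Clause, assignment: list[bool]) -> bool:
    # A clause is violated exactly when every one of its literals is false.
    return all(assignment[v] != positive for v, positive in clause)

def moser_tardos(num_vars: int, clauses: list[Clause],
                 rng: random.Random) -> list[bool]:
    assignment = [rng.random() < 0.5 for _ in range(num_vars)]
    while True:
        bad = next((c for c in clauses if violated(c, assignment)), None)
        if bad is None:
            return assignment              # no bad event holds any more
        for v, _ in bad:                   # resample only the event's variables
            assignment[v] = rng.random() < 0.5

# Tiny usage example: a satisfiable 3-SAT instance on 4 variables.
clauses = [[(0, True), (1, False), (2, True)],
           [(1, True), (2, True), (3, False)],
           [(0, False), (2, False), (3, True)]]
print(moser_tardos(4, clauses, random.Random(7)))
```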

    Towards Exascale Scientific Metadata Management

    Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still needs to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions.
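
    As one illustration of point (2), storing metadata "within each dataset" and querying it through a standardized access interface is already possible with HDF5 attributes. The sketch below uses h5py with hypothetical file, dataset, and attribute names; it is not the authors' proposed infrastructure, only a minimal example of the idea.

```python
import numpy as np
import h5py

# Illustrative sketch only (not the authors' system): keep rich metadata
# inside the dataset itself as HDF5 attributes, so any tool that speaks the
# standard HDF5 interface can query it alongside the data.
data = np.random.rand(1000, 3)                   # stand-in simulation output

with h5py.File("run_0042.h5", "w") as f:         # file name is hypothetical
    dset = f.create_dataset("plasma/temperature", data=data)
    dset.attrs["units"] = "keV"
    dset.attrs["code"] = "toy-simulation"
    dset.attrs["timestep"] = 42
    dset.attrs["grid_shape"] = data.shape

# A later, metadata-oblivious consumer can still recover the annotations
# through the same standardized access interface used for the data itself.
with h5py.File("run_0042.h5", "r") as f:
    dset = f["plasma/temperature"]
    print({key: dset.attrs[key] for key in dset.attrs})
```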

    Reactome pathway analysis: a high-performance in-memory approach

    BACKGROUND: Reactome aims to provide bioinformatics tools for visualisation, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modelling, systems biology and education. Pathway analysis methods have a broad range of applications in physiological and biomedical research; one of the main problems, from the analysis methods' performance point of view, is the constantly increasing size of the data samples. RESULTS: Here, we present a new high-performance in-memory implementation of the well-established over-representation analysis method. To achieve the target, the over-representation analysis method is divided into four different steps and, for each of them, specific data structures are used to improve performance and minimise the memory footprint. The first step, finding out whether an identifier in the user’s sample corresponds to an entity in Reactome, is addressed using a radix tree as a lookup table. The second step, modelling the proteins, chemicals, their orthologues in other species and their composition in complexes and sets, is addressed with a graph. The third and fourth steps, which aggregate the results and calculate the statistics, are solved with a double-linked tree. CONCLUSION: Through the use of highly optimised, in-memory data structures and algorithms, Reactome has achieved a stable, high-performance pathway analysis service, enabling the analysis of genome-wide datasets within seconds and allowing interactive exploration and analysis of high-throughput data. The proposed pathway analysis approach is available on the Reactome production website, either via the AnalysisService for programmatic access or the user submission interface integrated into the PathwayBrowser. Reactome is an open data and open source project, and all of its source code, including the code described here, is available in the AnalysisTools repository on the Reactome GitHub (https://github.com/reactome/).
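
    For readers unfamiliar with the statistics step, the sketch below shows the hypergeometric test that typically underlies over-representation analysis in general; it is a minimal Python illustration with made-up numbers, not Reactome's in-memory Java implementation or its radix-tree and graph data structures.

```python
from scipy.stats import hypergeom

# Minimal sketch of the statistic behind over-representation analysis (not
# Reactome's implementation): for each pathway, the chance of seeing at least
# `hits` of the user's identifiers in the pathway by random sampling follows
# a hypergeometric distribution.
def pathway_pvalue(total_entities: int, pathway_size: int,
                   sample_size: int, hits: int) -> float:
    """P(X >= hits) when drawing `sample_size` entities without replacement
    from `total_entities`, of which `pathway_size` belong to the pathway."""
    return hypergeom.sf(hits - 1, total_entities, pathway_size, sample_size)

# Hypothetical numbers: 11,000 curated entities, a pathway containing 120 of
# them, and a user sample of 300 identifiers of which 12 map to the pathway.
print(f"p = {pathway_pvalue(11_000, 120, 300, 12):.3e}")
```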