Memory-Efficient Algorithms for Finding Needles in Haystacks
One of the most common tasks in cryptography and cryptanalysis is to find
some interesting event (a needle) in an exponentially large collection (haystack) of
$N = 2^n$ possible events, or to demonstrate that no such event is likely to
exist. In particular, we are interested in finding needles, defined as events that
happen with an unusually high probability $p \gg 1/N$, in a haystack which is an almost uniform
distribution on $N$ possible events. When the search algorithm can
only sample values from this distribution, the best known time/memory
tradeoff for finding such an event requires $O(1/(Mp^2))$ time given
$O(M)$ memory.
In this paper we develop much faster needle searching algorithms in the common
cryptographic setting in which the distribution is defined
by applying some deterministic function $f$ to random inputs.
Such a distribution can be modelled by a random directed graph with $N$ vertices in
which almost all the vertices have $O(1)$ predecessors, while
the vertex we are looking for has an unusually large number of predecessors.
When we are given only a constant amount of memory, we propose a new search methodology which we call
\textbf{NestedRho}. As $p$ increases, such random graphs undergo several subtle phase transitions,
and thus the log-log dependence of the time complexity $T(p)$ on $p$
becomes a piecewise linear curve which bends four times. Our new algorithm is faster than the
$1/p^2$ time complexity of the best previous algorithm in the full range of $1/N < p < 1$, and in particular
it improves the previous time complexity by a significant factor of $\sqrt{N}$ for any $p$ in the range $N^{-3/4} < p < N^{-1/2}$. When we are given more memory, we show how to combine the \textbf{NestedRho} technique with the parallel collision
search technique in order to further reduce its time complexity. Finally, we show how to apply our new search
technique to more complicated distributions with multiple peaks when we want to find all the peaks whose
probabilities are higher than $p$.
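The functional-graph setting described above is the one in which the classic rho method operates: iterating a deterministic function from a random start traces a "rho"-shaped path whose tail joins a cycle, and the joining point yields a collision. The sketch below, which illustrates only this standard building block (Floyd's cycle-finding) and not the paper's \textbf{NestedRho} algorithm itself, runs in constant memory on a random function modelled by a lookup table; the helper names are ours.

```python
import random

def rho_collision(f, x0):
    """Floyd cycle-finding on the functional graph of f, starting from x0.
    Returns (a, b) with f(a) == f(b): the two predecessors of the point
    where the rho's tail joins its cycle (a == b if x0 lies on the cycle)."""
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:          # phase 1: find a point on the cycle
        tortoise = f(tortoise)
        hare = f(f(hare))
    prev_a, prev_b = x0, hare        # phase 2: walk in lockstep to the entry
    a, b = f(prev_a), f(prev_b)
    while a != b:
        prev_a, prev_b = a, b
        a, b = f(a), f(b)
    return prev_a, prev_b

def find_collision(f, domain_size, rng):
    """Retry from random starts until a genuine collision (a != b) appears."""
    while True:
        a, b = rho_collision(f, rng.randrange(domain_size))
        if a != b:
            return a, b

# A pseudo-random function on a domain of size 2**16, modelled by a table.
N = 2 ** 16
rng = random.Random(1)
table = [rng.randrange(N) for _ in range(N)]
f = table.__getitem__
a, b = find_collision(f, N, rng)
assert a != b and f(a) == f(b)
```

Note that only the current and previous points are stored, which is what makes rho-style techniques attractive in the constant-memory regime the abstract targets.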
Simple Local Computation Algorithms for the General Lovasz Local Lemma
We consider the task of designing Local Computation Algorithms (LCA) for
applications of the Lov\'{a}sz Local Lemma (LLL). LCA is a class of sublinear
algorithms proposed by Rubinfeld et al.~\cite{Ronitt} that have received a lot
of attention in recent years. The LLL is an existential, sufficient condition
for a collection of sets to have non-empty intersection (in applications,
often, each set comprises all objects having a certain property). The
ground-breaking algorithm of Moser and Tardos~\cite{MT} made the LLL fully
constructive, following earlier results by Beck~\cite{beck_lll} and
Alon~\cite{alon_lll} giving algorithms under significantly stronger LLL-like
conditions. LCAs under those stronger conditions were given in~\cite{Ronitt},
where it was asked if the Moser-Tardos algorithm can be used to design LCAs
under the standard LLL condition. The main contribution of this paper is to
answer this question affirmatively. In fact, our techniques yield LCAs for
settings beyond the standard LLL condition.
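The Moser-Tardos algorithm cited above is strikingly simple: start from a uniformly random assignment and, while any bad event occurs, resample just the variables that event depends on. The following is a minimal Python sketch on a toy 3-SAT instance, where each bad event is an unsatisfied clause; the function names and the instance are ours, for illustration only.

```python
import random

def moser_tardos(n_vars, bad_events, rng):
    """Moser-Tardos resampling: keep a uniformly random boolean assignment,
    and while some bad event holds, resample only that event's variables."""
    x = [rng.random() < 0.5 for _ in range(n_vars)]
    while True:
        violated = next((vs for vs, holds in bad_events if holds(x)), None)
        if violated is None:
            return x                      # no bad event occurs: done
        for v in violated:                # resample the event's variables
            x[v] = rng.random() < 0.5

# Toy instance: bad events are unsatisfied 3-SAT clauses. A clause is a list
# of (variable, wanted_value) literals; it is violated when every literal fails.
clauses = [[(0, True), (1, False), (2, True)],
           [(1, True), (2, False), (3, True)],
           [(0, False), (2, True), (3, False)]]

def violated_by(clause):
    return lambda x: all(x[v] != want for v, want in clause)

events = [([v for v, _ in c], violated_by(c)) for c in clauses]
assignment = moser_tardos(4, events, random.Random(0))
assert all(any(assignment[v] == want for v, want in c) for c in clauses)
```

Under the LLL condition this loop terminates after an expected number of resamplings linear in the number of events; an LCA, by contrast, must answer queries about a single variable's final value without running the whole loop.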
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still needs
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions.
Reactome pathway analysis: a high-performance in-memory approach
BACKGROUND: Reactome aims to provide bioinformatics tools for visualisation, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modelling, systems biology and education. Pathway analysis methods have a broad range of applications in physiological and biomedical research; one of the main problems, from the performance point of view of the analysis methods, is the constantly increasing size of the data samples.

RESULTS: Here we present a new high-performance in-memory implementation of the well-established over-representation analysis method. To achieve this, the over-representation analysis method is divided into four steps and, for each of them, specific data structures are used to improve performance and minimise the memory footprint. The first step, finding out whether an identifier in the user's sample corresponds to an entity in Reactome, is addressed using a radix tree as a lookup table. The second step, modelling the proteins, chemicals, their orthologues in other species and their composition in complexes and sets, is addressed with a graph. The third and fourth steps, which aggregate the results and calculate the statistics, are solved with a double-linked tree.

CONCLUSION: Through the use of highly optimised, in-memory data structures and algorithms, Reactome has achieved a stable, high-performance pathway analysis service, enabling the analysis of genome-wide datasets within seconds and allowing interactive exploration and analysis of high-throughput data. The proposed pathway analysis approach is available on the Reactome production web site, either via the AnalysisService for programmatic access or through the user submission interface integrated into the PathwayBrowser. Reactome is an open data and open source project, and all of its source code, including the code described here, is available in the AnalysisTools repository on the Reactome GitHub (https://github.com/reactome/).
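The abstract does not name the statistic computed in the final step, but the standard choice for over-representation analysis is the hypergeometric tail (Fisher's exact test): given how many of the user's identifiers fall in a pathway, how likely is a count at least that large under random sampling from the mapped universe? A stdlib-only sketch of that computation, with illustrative numbers of our own choosing:

```python
from math import comb

def ora_pvalue(universe, pathway, sample, hits):
    """Hypergeometric upper tail: the probability that a random sample of
    `sample` identifiers drawn from a universe of `universe` contains at
    least `hits` of the pathway's `pathway` member identifiers."""
    return sum(comb(pathway, k) * comb(universe - pathway, sample - k)
               for k in range(hits, min(pathway, sample) + 1)) / comb(universe, sample)

# Toy numbers: 8 hits from a 40-member pathway in a 50-identifier sample,
# against a universe of 1000 mapped identifiers (~2 hits expected by chance).
p = ora_pvalue(1000, 40, 50, 8)
assert 0 < p < 0.01   # the pathway is significantly over-represented
```

In a full analysis this p-value would be computed once per pathway and then corrected for multiple testing; the radix tree and graph described in the abstract serve the earlier steps of mapping identifiers and expanding them to pathway entities.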
- …