Text Indexing for Long Patterns: Anchors are All you Need
PVLDB Artifact Availability:
The source code, data, and/or other artifacts have been made available at https://github.com/lorrainea/BDA-index. Copyright © 2023 the owner/author(s).
In many real-world database systems, a large fraction of the data is represented by strings: sequences of letters over some alphabet. This is because strings can easily encode data arising from different sources. It is often crucial to represent such string datasets in a compact form while simultaneously enabling fast pattern matching queries. This is the classic text indexing problem. The four absolute measures anyone should pay attention to when designing or implementing a text index are: (i) index space; (ii) query time; (iii) construction space; and (iv) construction time. Unfortunately, most (if not all) widely used indexes (e.g., the suffix tree, the suffix array, or their compressed counterparts) are not optimized for all four measures simultaneously, as it is difficult to have the best of all four worlds. Here, we take an important step in this direction by showing that text indexing with locally consistent anchors (lc-anchors) offers remarkably good performance in all four measures when we have at hand a lower bound ℓ on the length of the queried patterns, which is arguably quite a reasonable assumption in practical applications. Specifically, we improve on the construction of the index proposed by Loukides and Pissis, which is based on bidirectional string anchors (bd-anchors), a new type of lc-anchors, by: (i) designing an average-case linear-time algorithm to compute bd-anchors; and (ii) developing a semi-external-memory implementation to construct the index in small space using near-optimal work. We then present an extensive experimental evaluation, based on the four measures, using real benchmark datasets.
The results show that, for long patterns, the index constructed using our improved algorithms compares favorably to all classic indexes: the (compressed) suffix tree; the (compressed) suffix array; and the FM-index.
Funding: European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreements No 872539 and 956229, respectively; and UKRI through REPHRAIN (EP/V011189/1).
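As a rough illustration of the underlying idea (not the paper's average-case linear-time algorithm), a bd-anchor of order ℓ can be sketched as the starting position of the lexicographically smallest rotation of each length-ℓ window of the text, ties broken by the leftmost occurrence. The function names below are illustrative only, and the quadratic-per-window scan is a naive sketch:

```python
def bd_anchor(window):
    """Start of the lexicographically smallest rotation of the window
    (ties broken by the leftmost such start)."""
    l = len(window)
    doubled = window + window  # rotation i is doubled[i:i+l]
    best = 0
    for i in range(1, l):
        if doubled[i:i + l] < doubled[best:best + l]:
            best = i
    return best

def bd_anchors(text, l):
    """Set of bd-anchor positions over all length-l windows of text.
    Nearby windows tend to share anchors, so the set is sparse."""
    anchors = set()
    for j in range(len(text) - l + 1):
        anchors.add(j + bd_anchor(text[j:j + l]))
    return sorted(anchors)
```

For example, over "cabcab" with ℓ = 3, four windows collapse to just two anchor positions, which is the sampling effect the index exploits.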
Low Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array
The longest common prefix (LCP) array is a versatile auxiliary data structure in indexed string matching. It can be used to speed up searching using the suffix array (SA) and provides an implicit representation of the topology of an underlying suffix tree. The LCP array of a string of length n can be represented as an array of n words or, in the presence of the SA, as a bit vector of 2n bits plus asymptotically negligible support data structures. External memory construction algorithms for the LCP array have been proposed, but those proposed so far have a space requirement of O(n) words (i.e. O(n log n) bits) in external memory. This space requirement is in some practical cases prohibitively expensive. We present an external memory algorithm for constructing the 2n-bit version of the LCP array which uses O(n log σ) bits of additional space in external memory when given a (compressed) BWT with alphabet size σ and a sampled inverse suffix array at sampling rate O(log n). This is often a significant space gain in practice, where σ is usually much smaller than n or even constant. We also consider the case of computing succinct LCP arrays for circular strings.
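For context on the structure itself (this is unrelated to the external-memory method described above), the LCP array can be computed in internal memory from the suffix array with Kasai's classic linear-time algorithm. A minimal sketch, with a naive suffix-array construction included only for self-containment:

```python
def suffix_array(s):
    """Naive O(n^2 log n) suffix array; fine for short examples."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def lcp_array(s, sa):
    """Kasai's algorithm: lcp[r] is the length of the longest common
    prefix of the suffixes at ranks r-1 and r in the suffix array.
    Runs in O(n) because h decreases by at most 1 per iteration."""
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):            # process suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]   # suffix preceding suffix i in SA order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h:
                h -= 1            # invariant: lcp can drop by at most 1
        else:
            h = 0
    return lcp
```

On "banana" the suffix array is [5, 3, 1, 0, 4, 2] and the LCP array is [0, 1, 3, 0, 0, 2]; the n-word representation above is exactly what the succinct 2n-bit encoding compresses.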
Certifying Solvers for Clique and Maximum Common (Connected) Subgraph Problems
An algorithm is said to be certifying if it outputs, together with a solution to the problem it solves, a proof that this solution is correct. We explain how state-of-the-art maximum clique, maximum weighted clique, maximal clique enumeration, and maximum common (connected) induced subgraph algorithms can be turned into certifying solvers by using pseudo-Boolean models and cutting planes proofs, and demonstrate that this approach can also handle reductions between problems. The generality of our results suggests that this method is ready for widespread adoption in solvers for combinatorial graph problems.
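The pseudo-Boolean view mentioned above can be illustrated with a toy sketch: maximum clique becomes "maximise the number of selected vertices subject to x_u + x_v <= 1 for every non-edge". The helpers below (hypothetical names, not the paper's tooling) only build and check that constraint set; producing and verifying the cutting-planes optimality proof is the substantive part the paper addresses:

```python
def clique_pb_model(n, edges):
    """Pseudo-Boolean model of maximum clique on vertices 0..n-1:
    one 0/1 variable per vertex, and for every NON-edge (u, v) the
    constraint x_u + x_v <= 1. Any feasible 0/1 assignment selects
    a clique; the objective is to maximise the number of 1s."""
    edge_set = {frozenset(e) for e in edges}
    return [(u, v) for u in range(n) for v in range(u + 1, n)
            if frozenset((u, v)) not in edge_set]

def satisfies(constraints, selected):
    """Check the 'solution' half of a certificate: the selected vertex
    set must violate no non-edge constraint (i.e. it is a clique)."""
    return all(not (u in selected and v in selected)
               for u, v in constraints)
```

On the path graph 0-1-2 the only constraint is x_0 + x_2 <= 1, so {0, 1} is feasible while {0, 2} is not; checking feasibility is easy, and the cutting-planes proof is what certifies that no larger feasible set exists.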
Meteorological Navigation by Integrating Metocean Forecast Data and Ship Performance Models into an ECDIS-like e-Navigation Prototype Interface
In the complex processes of route planning, voyage monitoring, and post-voyage analysis, a key element is the capability of merging metocean forecast data with the available knowledge of ship responses in the encountered environmental conditions. In this context, a prototype system has been implemented that is capable of integrating metocean model forecasts with ship-specific performance data and models. The work is based on the exploitation of an open source ECDIS-like system originally developed in the e-Navigation framework. The resulting prototype system allows the uploading and visualization of metocean data, the consequent computation of fuel consumption along each analyzed route, and the evaluation of the encountered meteo-marine conditions at each route waypoint. This allows us to "effectively and deeply dig inside" the various layers of available metocean forecast data regarding atmospheric and marine conditions and to evaluate their effects on ship performance indicators. The system could also be used to trigger route optimization algorithms and subsequently evaluate the results. All these functionalities are tailored to facilitate the "what-if" analysis in the route selection process performed by deck officers. Many of the added functionalities can also be used in a shore-based fleet monitoring and management center. A description is given of the modeling and visualization approaches that have been implemented. Their potentialities are illustrated through the discussion of some examples in Mediterranean navigation.
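The leg-by-leg fuel computation described above can be sketched in a toy form, assuming a hypothetical consumption(waypoint) callback standing in for the ship performance model evaluated against the metocean forecast at that point (the callback name and the constant-speed assumption are illustrative, not the prototype's actual interface):

```python
import math

def haversine_nm(p, q):
    """Great-circle distance in nautical miles between (lat, lon)
    points given in degrees."""
    R_NM = 3440.065  # mean Earth radius in nautical miles
    la1, lo1, la2, lo2 = map(math.radians, (*p, *q))
    a = (math.sin((la2 - la1) / 2) ** 2
         + math.cos(la1) * math.cos(la2) * math.sin((lo2 - lo1) / 2) ** 2)
    return 2 * R_NM * math.asin(math.sqrt(a))

def route_fuel(waypoints, speed_kn, consumption):
    """Total fuel over a route: each leg's distance divided by speed
    gives hours underway; consumption(waypoint) returns tonnes/hour
    under the forecast conditions at that waypoint."""
    total = 0.0
    for p, q in zip(waypoints, waypoints[1:]):
        hours = haversine_nm(p, q) / speed_kn
        total += hours * consumption(p)
    return total
```

For instance, one degree of longitude along the equator is about 60 NM, so a single such leg at 10 knots with a constant 2 t/h burn costs roughly 12 tonnes; comparing such totals across candidate routes is the "what-if" analysis the prototype supports.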
Helmholtz Portfolio Theme Large-Scale Data Management and Analysis (LSDMA)
The Helmholtz Association funded the "Large-Scale Data Management and Analysis" portfolio theme from 2012 to 2016. Four Helmholtz centres, six universities, and another research institution in Germany joined to enable data-intensive science by optimising data life cycles in selected scientific communities. In our Data Life Cycle Labs, data experts performed joint R&D together with scientific communities. The Data Services Integration Team focused on generic solutions applied by several communities.