16 research outputs found

    Matching Statistics Speed up BWT Construction

    Suffix Sorting via Matching Statistics

    Funding Information: Academy of Finland grants 339070 and 351150. Publisher Copyright: © Zsuzsanna Lipták, Francesco Masillo, and Simon J. Puglisi. We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized suffix array. Our experimental evidence with a prototype implementation (a tool we call sacamats) shows that on string collections with highly similar strings we can construct the suffix array in time competitive with or faster than the fastest available methods. Along the way, we describe a heuristic for fast computation of the matching statistics of two strings, which may be of independent interest. Peer reviewed.
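
The abstract above does not define matching statistics, so the following is only a minimal sketch under the usual textbook definition: for each position i of a query string, the length of the longest prefix of query[i:] that occurs somewhere in the reference. The function name `matching_statistics` is hypothetical, and this naive version is neither the compressed representation nor the heuristic described in the paper.

```python
def matching_statistics(reference: str, query: str) -> list[int]:
    """ms[i] = length of the longest prefix of query[i:] that occurs in reference.

    Naive illustration only (repeated substring search); assumed definition,
    not the method of the paper.
    """
    ms = []
    length = 0
    for i in range(len(query)):
        # A match of length k at position i-1 implies a match of length
        # at least k-1 at position i, so we can resume from there.
        length = max(length - 1, 0)
        # Extend the current match one character at a time.
        while i + length < len(query) and query[i:i + length + 1] in reference:
            length += 1
        ms.append(length)
    return ms


print(matching_statistics("banana", "ananas"))
```

When the query is highly similar to the reference, most entries are large and decrease by one from position to position, which is the kind of regularity a compressed representation can exploit.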

    Multivariate sensor signals collected by aquatic drones involved in water monitoring: A complete dataset

    Sensor data generated by intelligent systems, such as autonomous robots, smart buildings and other systems based on artificial intelligence, represent valuable sources of knowledge in today’s data-driven society, since they contain information about the situations these systems face during their operation. These data are usually multivariate time series, since modern technologies enable the simultaneous acquisition of multiple signals over long periods of time. In this paper we present a dataset containing sensor traces of six data acquisition campaigns performed by autonomous aquatic drones involved in water monitoring. A total of 5.6 hours of navigation are available, with data coming from both lakes and rivers, and from different locations in Italy and Spain. The monitored variables concern both the internal state of the drone (e.g., battery voltage, GPS position and signals to propellers) and the state of the water (e.g., temperature, dissolved oxygen and electrical conductivity). Data were collected in the context of the EU-funded Horizon 2020 project INTCATCH (http://www.intcatch.eu), which aims to develop a new paradigm for monitoring water quality of catchments. The aquatic drones used for data acquisition are Platypus Lutra boats. Both autonomous and manual drive are used in different parts of the navigation. The dataset is analyzed in the paper “Time series segmentation for state-model generation of autonomous aquatic drones: A systematic framework” [1] by means of recent time series clustering/segmentation techniques to extract data-driven models of the situations faced by the drones in the data acquisition campaigns. These data have strong potential for reuse in other kinds of data analysis and evaluation of machine learning methods on real-world datasets [2].
Moreover, we consider this dataset valuable also for the variety of situations faced by the drone, from which machine learning techniques can learn behavioural patterns or detect anomalous activities. We also provide manual labeling for some known states of the drones, such as drone inside/outside the water, upstream/downstream navigation, manual/autonomous drive, and drone turning, which represent a ground truth for validation purposes. Finally, the real-world nature of the dataset makes it more challenging for machine learning methods because it contains noisy samples collected while the drone was exposed to atmospheric agents and uncertain water flow conditions.

    Subspace clustering for situation assessment in aquatic drones

    We propose a novel methodology based on subspace clustering for detecting, modeling and interpreting aquatic drone states in the context of autonomous water monitoring. It enables both more informative and focused analysis of the large amounts of data collected by the drone, and enhanced situation awareness, which can be exploited by operators and drones to improve decision making and autonomy. The approach is completely data-driven and unsupervised. It takes unlabeled sensor traces from several water monitoring missions and returns both a set of sparse drone state models and a clustering of data samples according to these models. We tested the methodology on a real dataset containing data of six different missions, two rivers and four lakes in different countries, for about 5.5 hours of navigation. Results show that the methodology is able to recognize known states “in/out of the water”, “upstream/downstream navigation” and “manual/autonomous drive”, and to discover meaningful unknown states from their data-based properties, enabling novelty detection.

    Constant Time and Space Updates for the Sigma-Tau Problem

    No full text
    Sawada and Williams in [SODA 2018] and [ACM Trans. Alg. 2020] gave algorithms for constructing Hamiltonian paths and cycles in the Sigma-Tau graph, thereby solving a problem of Nijenhuis and Wilf that had been open for over 40 years. The Sigma-Tau graph is the directed graph whose vertex set consists of all permutations of n, and there is a directed edge from π to π′ if π′ can be obtained from π either by a cyclic left-shift (sigma) or by exchanging the first two entries (tau). We improve the existing algorithms from O(n) time per permutation to O(1) time per permutation. Moreover, our algorithms require only O(1) extra space. The result is the first combinatorial generation algorithm for n-permutations that is optimal in both time and space, and which lists the objects in a Gray code order using only two types of changes.
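
As a small illustration of the two edge types in the Sigma-Tau graph defined above, the sketch below implements sigma and tau on tuples and explores the graph with a plain breadth-first search. The helper name `reachable` is hypothetical, and this is not the Hamiltonian path construction of Sawada and Williams nor the constant-time update scheme of the paper; it only shows the graph those algorithms traverse.

```python
from collections import deque


def sigma(p):
    """Cyclic left-shift: (p1, p2, ..., pn) -> (p2, ..., pn, p1)."""
    return p[1:] + p[:1]


def tau(p):
    """Exchange the first two entries."""
    return (p[1], p[0]) + p[2:]


def reachable(start):
    """All permutations reachable from `start` along sigma/tau edges (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for q in (sigma(p), tau(p)):
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return seen


print(sigma((1, 2, 3, 4)), tau((1, 2, 3, 4)))
print(len(reachable((1, 2, 3, 4))))
```

Every vertex has exactly two outgoing edges, so any listing of all permutations along edges of this graph uses only the two types of changes mentioned in the abstract.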

    When a dollar makes a BWT

    No full text
    The Burrows-Wheeler Transform (BWT) is a reversible string transformation which plays a central role in text compression and is fundamental in many modern bioinformatics applications. The BWT is a permutation of the characters, which is in general better compressible and allows several different query types to be answered more efficiently than on the original string. It is easy to see that not every string is a BWT image, and exact characterizations of BWT images are known. We investigate a related combinatorial question. In many applications, a sentinel character $ is added to mark the end of the string, and thus the BWT of a string ending with $ contains exactly one $-character. Given a string w, we ask in which positions, if any, the $-character can be inserted to turn w into the BWT image of a word ending with $. We show that this depends only on the standard permutation of w and present an O(n log n)-time algorithm for identifying all such positions, improving on the naive quadratic time algorithm. We also give a combinatorial characterization of such positions and develop bounds on their number and value. This is an extended version of [Giuliani et al. ICTCS 2019].
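
To make the question in the abstract above concrete, here is a brute-force sketch. For each insertion position it tries to invert the resulting string with the standard last-to-first walk and then verifies the candidate by recomputing its BWT. The function names are hypothetical, the code assumes that w contains no '$' and that '$' sorts before every character of w (true for ASCII letters and digits), and it is far slower than the O(n log n) algorithm of the paper.

```python
def bwt(text):
    """BWT via sorted rotations (illustration only, not efficient)."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def invert_candidate(v):
    """Follow the last-to-first mapping from row 0 and return a candidate word."""
    n = len(v)
    # Stable sort of positions by character gives the first column.
    order = sorted(range(n), key=lambda i: (v[i], i))
    lf = [0] * n
    for rank, idx in enumerate(order):
        lf[idx] = rank
    out = []
    row = 0
    for _ in range(n):
        out.append(v[row])
        row = lf[row]
    # The walk yields the word backwards, with the sentinel expected last.
    return "".join(reversed(out[:-1])) + out[-1]


def dollar_positions(w):
    """Positions i where w[:i] + '$' + w[i:] is the BWT of a word ending in '$'."""
    positions = []
    for i in range(len(w) + 1):
        v = w[:i] + "$" + w[i:]
        candidate = invert_candidate(v)
        # Accept only if the candidate really is a witness.
        if candidate.endswith("$") and bwt(candidate) == v:
            positions.append(i)
    return positions


print(bwt("banana$"))
print(dollar_positions("annbaa"))
```

Because every accepted position comes with a word whose BWT has been recomputed and compared, the check does not rely on any characterization beyond the definition of the BWT itself.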

    Suffix sorting via matching statistics

    No full text
    We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized suffix array. Our experimental evidence with a prototype implementation (a tool we call sacamats) shows that on string collections with highly similar strings we can construct the suffix array in time competitive with or faster than the fastest available methods. Along the way, we describe a heuristic for fast computation of the matching statistics of two strings, which may be of independent interest.

    Time series segmentation for state-model generation of autonomous aquatic drones: A systematic framework

    No full text
    Autonomous surface vessels are becoming increasingly important for water monitoring. Their aim is to navigate rivers and lakes with limited intervention of human operators, to collect real-time data about water parameters. To reach this goal, these intelligent systems must interact with the environment and act according to the situations they face. In this work we propose a framework based on the integration of recent time-series clustering/segmentation methods and cluster validity indices, for detecting, modeling and evaluating aquatic drone states. The approach is completely data-driven and unsupervised. It takes unlabeled multivariate time series of sensor traces and returns both a set of statistically significant state-models (generated by different mathematical approaches) and a related segmentation of the dataset. We test the approach on a real dataset containing data of six campaigns, two in rivers and four in lakes, in different countries, for about 5.6 h of navigation. Results show that the methodology is able to recognize known states and to discover unknown states, enabling novelty detection. The approach is therefore an easy-to-use tool for discovering and interpreting significant states in sensor data, which enables improved data analysis and drone autonomy.