16 research outputs found

Matching Statistics Speed up BWT Construction

Author: Masillo Francesco
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Annual European Symposium on Algorithms (ESA 2023)
Publication date: 01/01/2023

Dagstuhl Research Online Publication Server

Suffix Sorting via Matching Statistics

Author: Lipták Zsuzsanna
Masillo Francesco
Puglisi Simon J.
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Publication date: 01/01/2022

Funding Information: Academy of Finland grants 339070 and 351150. Publisher Copyright: © Zsuzsanna Lipták, Francesco Masillo, and Simon J. Puglisi.

We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized suffix array. Our experimental evidence with a prototype implementation (a tool we call sacamats) shows that on string collections with highly similar strings we can construct the suffix array in time competitive with or faster than the fastest available methods. Along the way, we describe a heuristic for fast computation of the matching statistics of two strings, which may be of independent interest.

Peer reviewed

Dagstuhl Research Online Publication Server

Catalogo dei prodotti della ricerca

Helsingin yliopiston digitaalinen arkisto

Multivariate sensor signals collected by aquatic drones involved in water monitoring: A complete dataset

Author: BLOISI Domenico Daniele
BLUM Jason Joseph
CASTELLINI ALBERTO
FARINELLI Alessandro
MASILLO FRANCESCO
Publication venue: Elsevier BV
Publication date: 01/01/2020

Sensor data generated by intelligent systems, such as autonomous robots, smart buildings and other systems based on artificial intelligence, represent valuable sources of knowledge in today’s data-driven society, since they contain information about the situations these systems face during their operation. These data are usually multivariate time series, since modern technologies enable the simultaneous acquisition of multiple signals over long periods of time. In this paper we present a dataset containing sensor traces of six data acquisition campaigns performed by autonomous aquatic drones involved in water monitoring. A total of 5.6 hours of navigation are available, with data coming from both lakes and rivers, and from different locations in Italy and Spain. The monitored variables concern both the internal state of the drone (e.g., battery voltage, GPS position and signals to propellers) and the state of the water (e.g., temperature, dissolved oxygen and electrical conductivity). Data were collected in the context of the EU-funded Horizon 2020 project INTCATCH (http://www.intcatch.eu), which aims to develop a new paradigm for monitoring water quality of catchments. The aquatic drones used for data acquisition are Platypus Lutra boats. Both autonomous and manual drive are used in different parts of the navigation. The dataset is analyzed in the paper “Time series segmentation for state-model generation of autonomous aquatic drones: A systematic framework” [1] by means of recent time series clustering/segmentation techniques to extract data-driven models of the situations faced by the drones in the data acquisition campaigns. These data have strong potential for reuse in other kinds of data analysis and evaluation of machine learning methods on real-world datasets [2].
Moreover, we consider this dataset valuable also for the variety of situations faced by the drone, from which machine learning techniques can learn behavioural patterns or detect anomalous activities. We also provide manual labeling for some known states of the drones, such as drone inside/outside the water, upstream/downstream navigation, manual/autonomous drive, and drone turning, that represent a ground truth for validation purposes. Finally, the real-world nature of the dataset makes it more challenging for machine learning methods because it contains noisy samples collected while the drone was exposed to atmospheric agents and uncertain water flow conditions.

Catalogo dei prodotti della ricerca

Subspace clustering for situation assessment in aquatic drones

Author: Alberto Castellini
Alessandro Farinelli
Domenico Bloisi
Francesco Masillo
Jason Blum
Manuele Bicego
Sergio Peigner
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2019

We propose a novel methodology based on subspace clustering for detecting, modeling and interpreting aquatic drone states in the context of autonomous water monitoring. It enables both more informative and focused analysis of the large amounts of data collected by the drone, and enhanced situation awareness, which can be exploited by operators and drones to improve decision making and autonomy. The approach is completely data-driven and unsupervised. It takes unlabeled sensor traces from several water monitoring missions and returns both a set of sparse drone state models and a clustering of data samples according to these models. We tested the methodology on a real dataset containing data of six different missions, two rivers and four lakes in different countries, for about 5.5 hours of navigation. Results show that the methodology is able to recognize known states “in/out of the water”, “upstream/downstream navigation” and “manual/autonomous drive”, and to discover meaningful unknown states from their data-based properties, enabling novelty detection.

Crossref

Catalogo dei prodotti della ricerca

When a Dollar in a Fully Clustered Word Makes a BWT

Author: Francesco Masillo
Sara Giuliani
Zsuzsanna Liptak
Publication date: 01/01/2022

Catalogo dei prodotti della ricerca

Suffix Sorting via Matching Statistics

Author: Lipták Zsuzsanna
Masillo Francesco
Puglisi Simon J.
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Publication date: 01/01/2022

Helsingin yliopiston digitaalinen arkisto

Constant Time and Space Updates for the Sigma-Tau Problem

Author: Aaron Williams
Francesco Masillo
Gonzalo Navarro
Zsuzsanna Liptak
Publication venue: Springer
Publication date: 01/01/2023

Sawada and Williams in [SODA 2018] and [ACM Trans. Alg. 2020] gave algorithms for constructing Hamiltonian paths and cycles in the Sigma-Tau graph, thereby solving a problem of Nijenhuis and Wilf that had been open for over 40 years. The Sigma-Tau graph is the directed graph whose vertex set consists of all permutations of n, and there is a directed edge from π to π′ if π′ can be obtained from π either by a cyclic left-shift (sigma) or by exchanging the first two entries (tau). We improve the existing algorithms from O(n) time per permutation to O(1) time per permutation. Moreover, our algorithms require only O(1) extra space. The result is the first combinatorial generation algorithm for n-permutations that is optimal in both time and space, and which lists the objects in a Gray code order using only two types of changes.
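
The two edge types of the Sigma-Tau graph can be sketched in a few lines of Python. This only illustrates the graph definition quoted in the abstract; it is not the constant-time generation algorithm of the paper. The names `sigma`, `tau` and `sigma_tau_graph` are hypothetical, chosen here for illustration, and permutations are assumed to be represented as tuples.

```python
from itertools import permutations


def sigma(p):
    """Cyclic left-shift: the first entry moves to the end."""
    return p[1:] + p[:1]


def tau(p):
    """Exchange the first two entries."""
    return (p[1], p[0]) + p[2:]


def sigma_tau_graph(n):
    """Map every permutation of 1..n to its two out-neighbours (sigma, tau)."""
    return {p: (sigma(p), tau(p)) for p in permutations(range(1, n + 1))}


graph = sigma_tau_graph(3)
for vertex, successors in graph.items():
    print(vertex, "->", successors)
```

For n ≥ 3 the two successors of a permutation are always distinct, so every vertex has out-degree 2; a Hamiltonian path in this graph visits each permutation once using only these two operations.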

Catalogo dei prodotti della ricerca

When a dollar makes a BWT

Author: Giuliani Sara
Lipták Zsuzsanna
Masillo Francesco
Rizzi Romeo
Publication venue: Elsevier BV
Publication date: 01/01/2021

The Burrows-Wheeler Transform (BWT) is a reversible string transformation which plays a central role in text compression and is fundamental in many modern bioinformatics applications. The BWT is a permutation of the characters, which is in general more compressible and allows several different query types to be answered more efficiently than on the original string. It is easy to see that not every string is a BWT image, and exact characterizations of BWT images are known. We investigate a related combinatorial question. In many applications, a sentinel character $ is added to mark the end of the string, and thus the BWT of a string ending with $ contains exactly one $-character. Given a string w, we ask in which positions, if any, the $-character can be inserted to turn w into the BWT image of a word ending with $. We show that this depends only on the standard permutation of w and present an O(n log n)-time algorithm for identifying all such positions, improving on the naive quadratic time algorithm. We also give a combinatorial characterization of such positions and develop bounds on their number and value. This is an extended version of [Giuliani et al. ICTCS 2019].
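
The question posed in the abstract can be made concrete with a brute-force check. The sketch below is not the O(n log n) algorithm of the paper: it simply tries every insertion position, attempts to invert the candidate, and verifies the result by recomputing the BWT. The function names are hypothetical, and the code assumes that `$` sorts before every other character of the input (true for ASCII letters and digits).

```python
def bwt(text):
    """BWT via sorted rotations; text is assumed to end with the sentinel '$'."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def invert_candidate(v):
    """Follow the standard permutation of v, starting from the position of '$'.

    If v is a BWT image, this returns the original word ending with '$';
    otherwise the result is some string that will fail the check below.
    """
    # Stable sort of positions by character: the standard permutation.
    order = sorted(range(len(v)), key=lambda i: v[i])
    k = v.index("$")
    out = []
    for _ in range(len(v)):
        k = order[k]
        out.append(v[k])
    return "".join(out)


def dollar_positions(w):
    """All positions i such that inserting '$' at i makes w a BWT image."""
    positions = []
    for i in range(len(w) + 1):
        v = w[:i] + "$" + w[i:]
        candidate = invert_candidate(v)
        if candidate.endswith("$") and bwt(candidate) == v:
            positions.append(i)
    return positions


print(bwt("banana$"))          # annb$aa
print(dollar_positions("ab"))  # [2]
```

For example, inserting `$` at position 4 of `annbaa` gives `annb$aa`, the BWT of `banana$`, so position 4 is among the valid positions for that string.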

arXiv.org e-Print Archive

Catalogo dei prodotti della ricerca

Suffix sorting via matching statistics

Author: Francesco Masillo
Simon J. Puglisi
Zsuzsanna Liptak

We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized suffix array. Our experimental evidence with a prototype implementation (a tool we call sacamats) shows that on string collections with highly similar strings we can construct the suffix array in time competitive with or faster than the fastest available methods. Along the way, we describe a heuristic for fast computation of the matching statistics of two strings, which may be of independent interest.
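
As a rough illustration of the central notion, the sketch below computes matching statistics naively. It assumes the usual definition (for each position i of a string s, the length of the longest prefix of s[i:] that occurs somewhere in the reference); it is neither the compressed representation nor the heuristic described in the abstract, and the function name is hypothetical.

```python
def matching_statistics(s, ref):
    """Naive matching statistics of s with respect to ref.

    ms[i] is the length of the longest prefix of s[i:] that occurs in ref.
    Uses the standard observation that ms[i] >= ms[i - 1] - 1, so each
    position resumes from the previous match length minus one.
    """
    ms = []
    length = 0
    for i in range(len(s)):
        # Dropping the first character of the previous match keeps a match.
        length = max(length - 1, 0)
        # Extend the match one character at a time while it occurs in ref.
        while i + length < len(s) and s[i:i + length + 1] in ref:
            length += 1
        ms.append(length)
    return ms


print(matching_statistics("banana", "ananas"))  # [0, 5, 4, 3, 2, 1]
```

When the strings in a collection are highly similar to the reference, most of these values are large, which is what makes them useful for skipping character comparisons during suffix sorting.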

Catalogo dei prodotti della ricerca

Time series segmentation for state-model generation of autonomous aquatic drones: A systematic framework

Author: Bicego Manuele
Castellini Alberto
Farinelli Alessandro
Masillo Francesco
Zuccotto Maddalena
Publication venue: Elsevier BV
Publication date: 01/01/2020

Autonomous surface vessels are becoming increasingly important for water monitoring. Their aim is to navigate rivers and lakes with limited intervention of human operators, to collect real-time data about water parameters. To reach this goal, these intelligent systems must interact with the environment and act according to the situations they face. In this work we propose a framework based on the integration of recent time-series clustering/segmentation methods and cluster validity indices, for detecting, modeling and evaluating aquatic drone states. The approach is completely data-driven and unsupervised. It takes unlabeled multivariate time series of sensor traces and returns both a set of statistically significant state-models (generated by different mathematical approaches) and a related segmentation of the dataset. We test the approach on a real dataset containing data of six campaigns, two in rivers and four in lakes, in different countries, for about 5.6 h of navigation. Results show that the methodology is able to recognize known states and to discover unknown states, enabling novelty detection. The approach is therefore an easy-to-use tool for discovering and interpreting significant states in sensor data that enables improved data analysis and drone autonomy.

Catalogo dei prodotti della ricerca