36 research outputs found

    Chase of datalog programs and its application to solve the functional dependencies implication problem

    [Abstract] This thesis presents results in two main areas. On the one hand, it presents results on the optimization of recursive queries (linear recursive datalog programs) in deductive database management systems (or conventional ones that comply with the SQL99 specification); on the other, it presents results on the implication of functional dependencies in the deductive data model. For the optimization of linear recursive programs, the approach adopted is semantic query optimization, which uses the constraints satisfied by the databases over which the queries are run to obtain a program that is more efficient to evaluate. Specifically, two algorithms are presented for optimizing linear recursive datalog programs when the database over which the queries are run satisfies a set of functional dependencies. The first is called the chase of datalog programs and the second the cyclic chase of datalog programs. Both algorithms pursue the same goal (though following two slightly different approaches): given a linear recursive datalog program P and a set of functional dependencies F, they obtain a program P' that is equivalent to P when both (P and P') are evaluated over databases that satisfy the functional dependencies F. Both algorithms are based on the chase, a procedure originally developed to check whether a decomposition (of a universal relation) into several relations loses information or not. Both algorithms use the basic idea of the chase (equating variables according to the functional dependencies) to equate variables within datalog programs
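
The basic chase idea mentioned in this abstract (equating variables according to functional dependencies) can be illustrated with a minimal Python sketch. It is not the thesis's chase or cyclic chase of datalog programs: it only chases the body of a single rule, assumes every argument is a variable, and uses hypothetical names (`chase_body`, `_find_violation`) and a hypothetical encoding in which a functional dependency is a triple `(predicate, lhs_positions, rhs_position)`.

```python
from itertools import combinations


def _find_violation(atoms, fds):
    """Return (keep, drop) for the first pair of atoms that violates an FD, else None."""
    for (p1, a1), (p2, a2) in combinations(atoms, 2):
        for pred, lhs, rhs in fds:
            same_pred = p1 == pred and p2 == pred
            if same_pred and all(a1[i] == a2[i] for i in lhs) and a1[rhs] != a2[rhs]:
                # Keep the lexicographically smaller variable name (arbitrary choice).
                return min(a1[rhs], a2[rhs]), max(a1[rhs], a2[rhs])
    return None


def chase_body(atoms, fds):
    """Equate variables in a rule body until no functional dependency is violated.

    atoms: list of (predicate, tuple_of_variables)
    fds:   list of (predicate, lhs_positions, rhs_position)
    """
    atoms = [(p, tuple(args)) for p, args in atoms]
    while True:
        violation = _find_violation(atoms, fds)
        if violation is None:
            break
        keep, drop = violation
        # Replace every occurrence of the dropped variable; each step removes
        # one distinct variable, so the loop terminates.
        atoms = [(p, tuple(keep if v == drop else v for v in args)) for p, args in atoms]
    # Atoms that became identical are redundant; drop duplicates, keep order.
    return list(dict.fromkeys(atoms))


# Body: e(X, Y), e(X, Z), r(Y, Z) with the FD "first argument of e determines the second".
body = [("e", ("X", "Y")), ("e", ("X", "Z")), ("r", ("Y", "Z"))]
chased = chase_body(body, [("e", (0,), 1)])
print(chased)
```

In this example the two `e` atoms agree on `X`, so `Y` and `Z` are equated and one `e` atom disappears, which is the kind of simplification semantic query optimization aims for. A full treatment would also apply the substitution to the rule head and handle recursion, which this sketch does not attempt.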

    Map algebra on raster datasets represented by compact data structures

    Funded for open access publication: Universidade da Coruña/CISUG. [Abstract]: The increase in the size of data repositories has forced the design of new computing paradigms able to process large volumes of data in a reasonable amount of time. One of them is in-memory computing, which advocates storing all the data in main memory to avoid the disk I/O bottleneck. Compression is one of the key technologies for this approach. For raster data, a compact data structure called (Formula presented.)-raster has recently been proposed. It compresses raster maps while still supporting fast retrieval of a given datum or a portion of the data directly from the compressed data. The original work on (Formula presented.)-raster introduced several queries in which it was superior to competitors. However, to be used as the basis of an in-memory system for raster data, it is mandatory to demonstrate its efficiency when performing more complex operations such as the map algebra operators. In this work, we present the algorithms to run a set of these operators directly on (Formula presented.)-raster without a decompression procedure.
    This work was supported by the National Natural Science Foundation of China (Grant Nos. 31171944, 31640068), Anhui Provincial Natural Science Foundation (Grant No. 2019B319), Earmarked Fund for Anhui Science and Technology Major Project (202003b06020016). Funding information: CITIC; Ministerio de Ciencia e Innovación, Grant/Award Numbers: PID2020-114635RB-I00; PDC2021-120917-C21; PDC2021-121239-C31; PID2019-105221RB-C41; TED2021-129245-C21; Xunta de Galicia, Grant/Award Numbers: ED431C 2021/53; IN852D 2021/3 (CO3). This work was partially supported by CITIC. CITIC is funded by the Xunta de Galicia through the collaboration agreement between the Department of Culture, Education, Vocational Training and Universities and the Galician universities for the reinforcement of the research centers of the Galician University System (CIGUS). IN852D 2021/3 (CO3): partially funded by UE (ERDF), GAIN, convocatoria Conecta COVID. GRC: ED431C 2021/53: partially funded by GAIN/Xunta de Galicia. TED2021-129245B-C21; PDC2021-121239-C31; PDC2021-120917-C21: partially funded by MCIN/AEI/10.13039/501100011033 and “NextGenerationEU”/PRTR. PID2020-114635RB-I00; PID2019-105221RB-C41: partially funded by MCIN/AEI/10.13039/501100011033. Funding for open access charge: Universidade da Coruña/CISUG. Xunta de Galicia; ED431C 2021/53. Xunta de Galicia; IN852D 2021/3 (CO3). National Natural Science Foundation of China; 31171944. National Natural Science Foundation of China; 31640068. Anhui Provincial Natural Science Foundation; 2019B31

    Scalable processing and autocovariance computation of big functional data

    This is the peer reviewed version of the following article: Brisaboa NR, Cao R, Paramá JR, Silva-Coira F. Scalable processing and autocovariance computation of big functional data. Softw Pract Exper. 2018; 48: 123–140, which has been published in final form at https://doi.org/10.1002/spe.2524. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions. This article may not be enhanced, enriched or otherwise transformed into a derivative work, without express permission from Wiley or by statutory rights under applicable legislation. Copyright notices must not be removed, obscured or modified. The article must be linked to Wiley’s version of record on Wiley Online Library and any embedding, framing or otherwise making available the article or pages thereof by third parties from platforms, services and websites other than Wiley Online Library must be prohibited.
    [Abstract]: This paper presents two main contributions. The first is a compact representation of huge sets of functional data or trajectories of continuous-time stochastic processes, which keeps the data compressed at all times, even during processing in main memory. It is designed to facilitate the efficient computation of the sample autocovariance function without first decompressing the data set, using only partial local decoding. The second contribution is a new memory-efficient algorithm to compute the sample autocovariance function. In our experiments, the combination of the compact representation and the new memory-efficient algorithm yielded the following benefits. The compressed data occupy on disk 75% of the space needed by the original data. The computation of the autocovariance function used up to 13 times less main memory and ran 65% faster than the classical method implemented, for example, in the R package.
    This work was supported by the Ministerio de Economía y Competitividad (PGE and FEDER) under grants [TIN2016-78011-C4-1-R; MTM2014-52876-R; TIN2013-46238-C4-3-R], Centro para el desarrollo Tecnológico e Industrial MINECO [IDI-20141259; ITC-20151247; ITC-20151305; ITC-20161074]; Xunta de Galicia (co-funded with FEDER) under Grupos de Referencia Competitiva grant ED431C-2016-015; Xunta de Galicia-Consellería de Cultura, Educación e Ordenación Universitaria (co-funded with FEDER) under Redes grants R2014/041, ED341D R2016/045; Xunta de Galicia-Consellería de Cultura, Educación e Ordenación Universitaria (co-funded with FEDER) under Centro Singular de Investigación de Galicia grant ED431G/01. Xunta de Galicia; D431C-2016-015. Xunta de Galicia; R2014/041. Xunta de Galicia; ED341D R2016/045. Xunta de Galicia; ED431G/0
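
As a point of reference for what a sample autocovariance function over a set of trajectories computes, here is a minimal one-pass Python sketch. It is not the paper's memory-efficient algorithm and does not work on compressed data. It assumes that all trajectories are sampled at the same `m` time points, that the estimator divides by `n` rather than `n - 1`, and the function name `streaming_autocovariance` is hypothetical.

```python
def streaming_autocovariance(trajectories, m):
    """Sample autocovariance gamma[s][t] over trajectories observed at m common times.

    Trajectories are consumed one at a time, so only the running sums
    (not the data set itself) are held in memory.
    """
    n = 0
    sums = [0.0] * m                        # running sum of x[s]
    cross = [[0.0] * m for _ in range(m)]   # running sum of x[s] * x[t], for s <= t
    for x in trajectories:
        n += 1
        for s in range(m):
            sums[s] += x[s]
            for t in range(s, m):
                cross[s][t] += x[s] * x[t]
    means = [v / n for v in sums]
    gamma = [[0.0] * m for _ in range(m)]
    for s in range(m):
        for t in range(s, m):
            # E[X(s) X(t)] - E[X(s)] E[X(t)], filled in symmetrically.
            g = cross[s][t] / n - means[s] * means[t]
            gamma[s][t] = gamma[t][s] = g
    return gamma


# Two toy trajectories observed at two time points.
gamma = streaming_autocovariance(iter([[1.0, 2.0], [3.0, 6.0]]), 2)
print(gamma)
```

The "mean of products minus product of means" form is convenient for a single pass but can lose precision when values are large relative to their spread; a production implementation would likely center the data or use a numerically stable update.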

    An index for moving objects with constant-time access to their compressed trajectories

    This is an Accepted Manuscript of an article published by Taylor & Francis in International Journal of Geographical Information Science in 2021, available at: https://doi.org/10.1080/13658816.2020.1833015. Final accepted version of: Nieves R. Brisaboa, Travis Gagie, Adrián Gómez-Brandón, Gonzalo Navarro & José R. Paramá (2021) An index for moving objects with constant-time access to their compressed trajectories, International Journal of Geographical Information Science, 35:7, 1392-1424, DOI: 10.1080/13658816.2020.1833015
    [Abstract]: As the number of vehicles and devices equipped with GPS technology has grown explosively, an urgent need has arisen for time- and space-efficient data structures to represent their trajectories. The most commonly desired queries are queries about an object’s trajectory, range queries, and nearest neighbor queries. In this paper, we consider that the objects can move freely and we present a new compressed data structure for storing their trajectories, based on a combination of logs and snapshots, with the logs storing sequences of the objects’ relative movements and the snapshots storing their absolute positions sampled at regular time intervals. We call our data structure ContaCT because it provides Constant-time access to Compressed Trajectories. Its logs are based on a compact partial-sums data structure that returns cumulative displacement in constant time, and allows us to compute in constant time any object’s position at any instant, enabling a speedup when processing several other queries. We have compared ContaCT experimentally with another compact data structure for trajectories, called GraCT, and with a classic spatio-temporal index, the MVR-tree. Our results show that ContaCT outperforms the MVR-tree by orders of magnitude in space and also outperforms the compressed representation in time performance.
    This work was supported by Xunta de Galicia/FEDER-UE under Grants [IN848D-2017-2350417; IN852A 2018/14; ED431C 2017/58]; Xunta de Galicia and European Union (European Regional Development Fund - Galicia 2014-2020 Program) with the support of CITIC research center under Grant [ED431G 2019/01]; Ministerio de Ciencia, Innovación y Universidades under Grants [TIN2016-78011-C4-1-R; RTC-2017-5908-7]; A.G. was supported by Ministerio de Educación y Formación Profesional (FPU) [grant number FPU16/02914]; G.N. was supported by ANID - Millennium Science Initiative Program under Grant [ICN17_002]; and Fondecyt under Grant [1-200038]. T.G. was supported by NSERC under grant [RGPIN-2020-07185]. Xunta de Galicia; IN848D-2017-2350417. Xunta de Galicia; IN852A 2018/14. Xunta de Galicia; ED431C 2017/58. Xunta de Galicia; ED431G 2019/0
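
The combination of snapshots and logs described in this abstract can be sketched in a few lines of Python. This is only an illustration of the idea, not ContaCT itself: plain prefix-sum lists stand in for the compact partial-sums data structure, nothing is compressed, and the class name `TrajectoryLog` and its parameters are hypothetical.

```python
from itertools import accumulate


class TrajectoryLog:
    """One object's trajectory as relative moves plus periodic absolute snapshots."""

    def __init__(self, start, moves, period):
        # start: (x, y) at time 0; moves[i]: (dx, dy) between time i and i + 1.
        self.period = period
        # Cumulative displacement up to each instant (uncompressed stand-in
        # for a partial-sums structure).
        self.cum_x = [0] + list(accumulate(dx for dx, _ in moves))
        self.cum_y = [0] + list(accumulate(dy for _, dy in moves))
        # Absolute positions sampled every `period` time steps.
        self.snapshots = [
            (start[0] + self.cum_x[t], start[1] + self.cum_y[t])
            for t in range(0, len(moves) + 1, period)
        ]

    def position(self, t):
        """Position at instant t: nearest earlier snapshot plus displacement since then."""
        s = t // self.period
        t0 = s * self.period
        sx, sy = self.snapshots[s]
        return (sx + self.cum_x[t] - self.cum_x[t0],
                sy + self.cum_y[t] - self.cum_y[t0])


log = TrajectoryLog(start=(10, 20), moves=[(1, 0), (0, 2), (-1, 1), (3, 3)], period=2)
print(log.position(3))
```

Each `position` call does one snapshot lookup and two differences of cumulative sums, so its cost does not depend on how far `t` is from the snapshot. That is the property the abstract attributes to the partial-sums logs.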

    Lossless Compression of Industrial Time Series With Direct Access

    Funded for open access publication: Universidade da Coruña/CISUG. [Abstract] The new opportunities generated by the data-driven economy in the manufacturing industry have caused many companies to opt for it. However, the size of the time series data that need to be captured creates the problem of having to assume high storage costs. Moreover, these costs, which are constantly growing, begin to have an impact on the profitability of companies. Thus, in this scenario, the need arises to develop techniques that allow obtaining reduced representations of the time series. In this paper, we present a lossless compression method for industrial time series that allows efficient access. That is, our aim goes beyond pure compression, where the usual way to access the data requires a complete decompression of the dataset before processing it. Instead, our method allows decompressing portions of the dataset and, moreover, allows directly querying the compressed data. Thus, the proposed method combines the efficient access typical of lossy methods with lossless compression.
    Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; IG240.2020.1.185. Xunta de Galicia; IN852A 2018/14. Gobierno Vasco; IT1330-19. For the A Coruña team: This work was supported by CITIC, as Research Center accredited by Galician University System, is funded by “Consellería de Cultura, Educación e Universidade from Xunta de Galicia”, supported in an 80% through ERDF Funds, ERDF Operational Programme Galicia 2014-2020, and the remaining 20% by “Secretaría Xeral de Universidades” (Grant ED431G 2019/01), Xunta de Galicia/FEDER-UE under Grants [IG240.2020.1.185; IN852A 2018/14] and Ministerio de Ciencia, Innovación under Grants [TIN2016-78011-C4-1-R; RTC-2017-5908-7]. For the Basque team: Ministerio de Ciencia, Innovación y Universidades under Grant [FEDER/TIN2016-78011-C4-2-R] and the Basque Government under Grant No. [IT1330-19]. Funding for open access charge: Universidade da Coruña/CISUG

    Space-Efficient Representations of Raster Time Series

    Funded for open access publication: Universidade da Coruña/CISUG. [Abstract] Raster time series, a.k.a. temporal rasters, are collections of rasters covering the same region at consecutive timestamps. These data have been used in many different applications, ranging from weather forecast systems to monitoring of forest degradation or soil contamination. Many different sensors are generating this type of data, which makes such analyses possible but also challenges the technological capacity to store and retrieve the data. In this work, we propose a space-efficient representation of raster time series that is based on Compact Data Structures (CDS). Our method uses a strategy of snapshots and logs to represent the data, in which both components are represented using CDS. We study two variants of this strategy, one with regular sampling and another based on a heuristic that determines at which timestamps the snapshots should be created to reduce the space redundancy. We perform a comprehensive experimental evaluation using real datasets. The results show that the proposed strategy is competitive in space with alternatives based on pure data compression, while providing much more efficient query times for different types of queries.
    The data used in this study were acquired as part of the mission of NASA’s Earth Science Division and archived and distributed by the Goddard Earth Sciences (GES) Data and Information Services Center (DISC). Funding: CITIC, as Research Center accredited by Galician University System, is funded by “Consellería de Cultura, Educación e Universidade from Xunta de Galicia”, supported in an 80% through ERDF Funds, ERDF Operational Programme Galicia 2014-2020, and the remaining 20% by “Secretaría Xeral de Universidades” (Grant ED431G 2019/01). This work was also supported by Xunta de Galicia/FEDER-UE under Grants [IG240.2020.1.185; IN852A 2018/14]; Ministerio de Ciencia, Innovación y Universidades under Grants [TIN2016-78011-C4-1-R; RTC-2017-5908-7; PID2019-105221RB-C41/AEI/10.13039/501100011033]; ANID - Millennium Science Initiative Program - Code ICN17_002; Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (CYTED) [Grant No. 519RT0579]. Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; IG240.2020.1.185. Xunta de Galicia; IN852A 2018/14. Chile. Agencia Nacional de Investigación y Desarrollo; ICN17_00

    Compact and indexed representation for LiDAR point clouds

    [Abstract]: LiDAR devices are capable of acquiring clouds of 3D points reflecting any object around them, and of adding attributes to each point such as color, position, time, etc. LiDAR datasets are usually large, and compressed data formats (e.g. LAZ) have been proposed over the years. These formats are capable of transparently decompressing portions of the data, but they are not focused on solving general queries over the data. In contrast to that traditional approach, a recent line of research focuses on designing data structures that combine compression and indexation, allowing the compressed data to be queried directly. Compression is used to keep the data structure in main memory at all times, thus getting rid of disk accesses, and indexation is used to query the compressed data as fast as querying the uncompressed data. In this paper, we present the first data structure capable of losslessly compressing point clouds that have attributes and jointly indexing all three dimensions of space and attribute values. Our method is able to run range queries and attribute queries up to 100 times faster than previous methods.
    Secretaría Xeral de Universidades; [ED431G 2019/01]. Ministerio de Ciencia e Innovación; [PID2020-114635RB-I00]. Ministerio de Ciencia e Innovación; [PDC2021-120917-C21]. Ministerio de Ciencia e Innovación; [PDC2021-121239-C31]. Ministerio de Ciencia e Innovación; [PID2019-105221RB-C41]. Xunta de Galicia; [ED431C 2021/53]. Xunta de Galicia; [IG240.2020.1.185

    Stronger compact representations of object trajectories

    [Abstract]: GraCT and ContaCT were the first compressed data structures to represent object trajectories, demonstrating that it was possible to use orders of magnitude less space than classical indexes while staying competitive in query times. In this paper we considerably enhance their space, query capabilities, and time performance with three contributions. (1) We design and evaluate algorithms for more sophisticated nearest neighbor queries, finding the trajectories closest to a given trajectory or to a given point during a time interval. (2) We modify the data structure used to sample the spatial positions of the objects along time. This improves the performance on the classic spatio-temporal and the nearest neighbor queries, by orders of magnitude in some cases. (3) We introduce RelaCT, a tradeoff between the faster and larger ContaCT and the smaller and slower GraCT, offering a new relevant space-time tradeoff for large repetitive datasets of trajectories.
    For the A Coruña team: This work was supported by GAIN/Xunta de Galicia: GRC: grants ED431C 2021/53, and CIGUS 2023-2026; Ministerio de Ciencia e Innovación and EU/ERDF A way of making Europe under grant [PID2022-141027NB-C21]; Ministerio de Ciencia e Innovación under grant [PID2020-114635RB-I00]; Ministerio de Ciencia e Innovación and Next-GenerationEU/PRTR under grants [TED2021-129245B-C21; PDC2021-120917-C21]; Gonzalo Navarro was funded by ANID – Millennium Science Initiative Program – Code ICN17_002 and by Fondecyt under grants [1-200038; 1-230755]. Travis Gagie was funded by Fondecyt under grant [1171058] and by NSERC Discovery under grant [RGPIN-07185-2020]. Xunta de Galicia; ED431C 2021/53. Xunta de Galicia; CIGUS 2023-2026. Chile. National Agency of Research and Development (ANID); ICN17_002. Chile. Fondo Nacional de Desarrollo Científico y Tecnológico (Fondecyt); 1-200038. Chile. Fondo Nacional de Desarrollo Científico y Tecnológico (Fondecyt); 1-230755. Chile. Fondo Nacional de Desarrollo Científico y Tecnológico (Fondecyt); 1171058. Canada. Natural Science and Engineering Research Council (NSERC); RGPIN-07185-202

    Efficient Processing of Raster and Vector Data

    [Abstract] In this work, we propose a framework to store and manage spatial data, which includes new efficient algorithms to perform operations that take a raster dataset and a vector dataset as input. More concretely, we present algorithms for solving a spatial join between a raster and a vector dataset under a restriction on the values of the raster cells, and an algorithm for retrieving the K objects of a vector dataset that overlap the highest (or lowest) cell values of a raster dataset. The raster data is stored using a compact data structure, which can directly manipulate compressed data without the need for prior decompression. This leads to better running times and lower memory consumption. In our experimental evaluation comparing our solution to other baselines, we obtain the best space/time trade-offs.
    This work has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 690941; from the Ministerio de Ciencia, Innovación y Universidades (PGE and ERDF) grant numbers TIN2016-78011-C4-1-R; TIN2016-77158 C4-3-R; RTC-2017-5908-7; from Xunta de Galicia (co-funded with ERDF) grant numbers ED431C 2017/58; ED431G/01; IN852A 2018/14; and University of Bío-Bío grant numbers 192119 2/R; 195119 GI/VC. Xunta de Galicia; ED431C 2017/58. Xunta de Galicia; ED431G/01. Xunta de Galicia; IN852A 2018/14. Universidad del Bío-Bío (Chile); 192119 2/R. Universidad del Bío-Bío (Chile); 195119 GI/V