12 research outputs found

    Compressed self-indexed XML representation with efficient XPath evaluation

    [Abstract] The popularity of the eXtensible Markup Language (XML) has grown continuously since its introduction, and it is now acknowledged as the de facto standard for semi-structured data representation and data exchange on the World Wide Web. In this scenario, several query languages were proposed to exploit the expressiveness of XML data, as well as systems to support them efficiently. At the same time, as research in compression became more and more relevant, work also focused on new approaches that provide efficient solutions using the minimum amount of space. Today, however, there is a lack of practical, available tools that combine efficient query support with minimal space requirements. In this thesis we address this problem and propose a new approach for storing, processing and querying XML documents in a time- and space-efficient way, focusing in particular on XPath queries. We have developed a new compressed self-indexed representation of XML documents that obtains compression ratios of about 30%-40%, over which a query module providing efficient XPath query evaluation has also been developed. Together, both parts make up a complete system, which we call XXS, for the efficient evaluation of XPath queries over compressed self-indexed XML documents. Experimental results show the outstanding performance of our proposal, which can successfully compete with some of the best-known solutions and largely outperforms them in terms of space.

    Revisiting compact RDF stores based on k²-trees

    We present a new compact representation to efficiently store and query large RDF datasets in main memory. Our proposal, called BMatrix, is based on the k²-tree, a data structure devised to represent binary matrices in a compressed way, and aims at improving the results of previous state-of-the-art alternatives, especially in datasets with a relatively large number of predicates. We introduce our technique, together with some improvements on the basic k²-tree that can be applied to our solution in order to boost compression. Experimental results on the flagship RDF dataset DBpedia show that our proposal achieves better compression than existing alternatives, while yielding competitive query times, particularly in the most frequent triple patterns and in queries with unbound predicate, in which we outperform existing solutions. Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
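    As a rough illustration of the underlying data structure mentioned in this abstract, the following is a minimal sketch of a basic k²-tree for a small binary matrix. It is not BMatrix and not the authors' code: the function names `build_k2tree` and `get_cell` are hypothetical, the matrix side is assumed to be a power of k, and rank is computed by a plain linear scan rather than with the succinct rank structures a real implementation would use.

```python
from collections import deque

def build_k2tree(matrix, k=2):
    """Build the two bit lists of a basic k2-tree (illustrative sketch).

    T holds the bits of the internal levels and L the bits of the last
    level, both in level (BFS) order. Assumes a square matrix whose side
    is a power of k.
    """
    n = len(matrix)
    T, L = [], []
    queue = deque([(0, 0, n)])  # (row, col, size) of submatrices to split
    while queue:
        r, c, size = queue.popleft()
        sub = size // k
        for i in range(k):
            for j in range(k):
                r0, c0 = r + i * sub, c + j * sub
                if sub == 1:
                    # Last level: store the cell value itself.
                    L.append(matrix[r0][c0])
                else:
                    nonempty = any(
                        matrix[x][y]
                        for x in range(r0, r0 + sub)
                        for y in range(c0, c0 + sub)
                    )
                    T.append(1 if nonempty else 0)
                    if nonempty:
                        # Only nonempty submatrices are subdivided further.
                        queue.append((r0, c0, sub))
    return T, L

def get_cell(T, L, n, row, col, k=2):
    """Return the value of cell (row, col) by descending the tree."""
    bits = T + L
    base = 0      # position of the first child of the current node
    size = n
    while True:
        sub = size // k
        idx = base + (row // sub) * k + (col // sub)
        if bits[idx] == 0:
            return 0          # empty submatrix: every cell in it is 0
        if idx >= len(T):
            return 1          # reached the last level
        # Children of the node at idx start at rank1(T, idx) * k^2.
        # Linear-time rank here; real implementations use O(1) rank.
        base = sum(T[: idx + 1]) * k * k
        row, col, size = row % sub, col % sub, sub
```

    Large empty regions cost a single 0 bit, which is where the compression on sparse matrices comes from.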

    An Architecture for Software Engineering Gamification

    [Abstract] Gamification has been applied in software engineering to improve quality and results by increasing people's motivation and engagement. A systematic mapping has identified research gaps in the field, one of them being the difficulty of creating an integrated gamified environment comprising all the tools of an organization, since most existing gamified tools are custom developments or prototypes. In this paper, we propose a gamification software architecture that allows us to transform the work environment of a software organization into an integrated gamified environment, i.e., the organization can maintain its tools, and the rewards obtained by the users for their actions in different tools will accumulate. We developed a gamification engine based on our proposal, and we carried out a case study in which we applied it in a real software development company. The case study shows that the gamification engine has allowed the company to create a gamified workplace by integrating custom-developed tools and off-the-shelf tools such as Redmine, TestLink, or JUnit with the gamification engine. Two main advantages can be highlighted: (i) our solution allows the organization to maintain its current tools, and (ii) the rewards for actions in any tool accumulate in a centralized gamified environment.

    Optimization in Sanger sequencing

    © 2019. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article Carpente, L., Cerdeira-Pena, A., Lorenzo-Freire, S., Places, Á.S., 2019. Optimization in Sanger sequencing. Computers & Operations Research 109, 250–262 has been accepted for publication in Computers & Operations Research. The Version of Record is available online at https://doi.org/10.1016/j.cor.2019.05.011
    [Abstract] The main objective of this paper is to solve the optimization problem associated with the classification of DNA samples in PCR plates for Sanger sequencing. To achieve this goal, we design an integer linear programming model. Given that real instances involve the classification of thousands of samples and the linear model can only be solved for small instances, the paper includes a heuristic to cope with bigger problems. The heuristic algorithm is based on the simulated annealing technique. This algorithm obtains satisfactory solutions to the problem in a short amount of time. It has been tested with real data and yields improved results compared to some commercial software typically used in (clinical) laboratories. Moreover, the algorithm has already been implemented in the laboratory and is being successfully used.
    This work has been supported by MINECO: MTM2014-53395-C3-1-P, MINECO: MTM2017-87197-C3-1-P, Xunta de Galicia/FEDER-UE ERDF: ED431C-2016-015, Xunta de Galicia/FEDER-UE ERDF: ED431G/01, FEDER-UE ESF, Xunta de Galicia Conecta Peme-2014: IN852A-2014/9, Xunta de Galicia/FEDER-UE CSI: ED431G/01, Xunta de Galicia/FEDER-UE GRC: ED431C 2017/58, MINECO-CDTI/FEDER-UE CIEN LPS-BIGGER: IDI-20141259, MINECO-CDTI/FEDER-UE INNTERCONECTA uForest: ITC-20161074, MINECO-AEI/FEDER-UE eDSalud: RTC-2016-5143-1, MINECO-AEI/FEDER-UE Datos 4.0: TIN2016-78011-C4-1-R and MINECO-AEI/FEDER-UE ETOME-RDFD3: TIN2015-69951-R.

    Space/time-efficient RDF stores based on circular suffix sorting

    In recent years, RDF has gained popularity as a format for the standardized publication and exchange of information in the Web of Data. In this paper we introduce RDFCSA, a data structure that is able to self-index an RDF dataset in small space and supports efficient querying. RDFCSA regards the triples of the RDF store as short circular strings and applies suffix sorting on those strings, so that triple-pattern queries reduce to prefix searching on the string set. The RDF store is then represented compactly using a Compressed Suffix Array (CSA), a proven technology in text indexing that efficiently supports prefix searches. Our experimental evaluation shows that RDFCSA is able to answer triple-pattern queries in a few microseconds per result while using less than 60% of the space required by the raw original data. We also support join queries, which provide the basis for full SPARQL query support. Even though smaller-space solutions exist, as well as faster ones, RDFCSA is shown to provide an excellent space/time tradeoff, with fast and consistent query times within much less space than alternatives that compete in time. Comment: This work has been submitted to the IEEE TKDE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
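    The idea that triple-pattern queries reduce to prefix searches over circular strings can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain sorted Python list of rotations with binary search instead of a Compressed Suffix Array, so it shows the query logic but none of the compression, and the names `build_index` and `match` are hypothetical rather than taken from RDFCSA.

```python
from bisect import bisect_left

def build_index(triples):
    """Sort the three rotations of every (s, p, o) triple.

    Each rotation is tagged with its starting position (0 = subject,
    1 = predicate, 2 = object) so that terms in different roles never
    share a prefix. A plain sorted list stands in for the CSA here.
    """
    rotations = []
    for s, p, o in triples:
        rotations.append((0, s, p, o))
        rotations.append((1, p, o, s))
        rotations.append((2, o, s, p))
    rotations.sort()
    return rotations

def match(rotations, pattern):
    """Answer a triple pattern; None marks an unbound position."""
    pattern = tuple(pattern)
    nbound = sum(t is not None for t in pattern)
    # Pick the rotation in which all bound terms come first, so the
    # pattern becomes a prefix of the rotated circular string.
    for start in range(3):
        rot = pattern[start:] + pattern[:start]
        if all(t is not None for t in rot[:nbound]):
            break
    prefix = (start,) + rot[:nbound]
    results = []
    i = bisect_left(rotations, prefix)   # first rotation >= prefix
    while i < len(rotations) and rotations[i][: len(prefix)] == prefix:
        terms = rotations[i][1:]
        # Undo the rotation to recover (s, p, o).
        triple = terms[-start:] + terms[:-start] if start else terms
        results.append(triple)
        i += 1
    return results
```

    For example, the pattern (s, ?, o) is rotated to (o, s, ?), so its two bound terms form a prefix among the rotations that start at the object.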

    Extending General Compact Querieable Representations to GIS Applications

    The raster model is commonly used for the representation of images in many domains, and is especially useful in Geographic Information Systems (GIS) to store information about continuous variables of the space (elevation, temperature, etc.). Current representations of raster data are usually designed for external memory or, when stored in main memory, lack efficient query capabilities. In this paper we propose compact representations to efficiently store and query raster datasets in main memory. We present different representations for binary raster data, general raster data and time-evolving raster data. We experimentally compare our proposals with traditional storage mechanisms such as linear quadtrees or compressed GeoTIFF files. Results show that our structures are up to 10 times smaller than classical linear quadtrees, and even comparable in space to non-queryable representations of raster data, while efficiently answering a number of typical queries. Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941

    XXS: Efficient XPath Evaluation on Compressed XML Documents

    ISI publication article. The eXtensible Markup Language (XML) is acknowledged as the de facto standard for semistructured data representation and data exchange on the Web and in many other scenarios. A well-known shortcoming of XML is its verbosity, which increases manipulation, transmission, and processing costs. Various structure-blind and structure-conscious compression techniques can be applied to XML, and some are even access-friendly, meaning that the documents can be efficiently accessed in compressed form. Direct access is necessary to implement the query languages XPath and XQuery, which are the standard ones to exploit the expressiveness of XML. While a good deal of theoretical and practical proposals exist to solve XPath/XQuery operations on XML, only a few are well integrated with a compression format that supports the required access operations on the XML data. In this work we go one step further and design a compression format for XML collections that boosts the performance of XPath queries on the data. This is done by designing compressed representations of the XML data that support some complex operations apart from just accessing the data, and those are exploited to solve key components of the XPath queries. Our system, called XXS, is aimed at XML collections containing natural language text, which are compressed to within 35%–50% of their original size while supporting a large subset of XPath operations in time competitive with, and many times outperforming, the best state-of-the-art systems that work on uncompressed representations. Funded in part by MICINN grants (PGE and FEDER) TIN2009-14560-C03-02- and TIN2010-21246-C02-01, Xunta de Galicia grants (co-funded with FEDER) GRC2013/053 and CN 2012/211, and MINECO grants (co-funded with CDTI and GAIN) CDTI EXP 00064563 and ITC-20133062 (for the Spanish group); and by Fondecyt grants 1-080019 and 1-110066, Chile (G.N.).