24 research outputs found

    TabbyXL2: Experiment Data

    This dataset is designed to evaluate TabbyXL2 v1.0.1, a tool for the rule-based transformation of spreadsheet data from arbitrary to relational tables, which is freely available on GitHub (https://github.com/cellsrg/tabbyxl2/releases/tag/v1.0.1). Our source data are based on the existing table dataset Troy_200 (http://tc11.cvc.uab.es/datasets/Troy_200_1), which contains 200 arbitrary tables as CSV files collected from 10 different government statistical websites. We use its earlier version, which stores the original tables with style features (fonts, alignment, and indentation) as Excel spreadsheets (available at http://tango.byu.edu/data). The dataset contains the following material: 1. All Troy_200 tables with style features, put into a single spreadsheet file; 2. The ground-truth data we prepared for the automatic performance evaluation of TabbyXL2 in the role and structural stages of table analysis; 3. CRL and CLP rulesets designed for transforming the arbitrary Troy_200 tables into the relational form; 4. The log files with the results of running the program and of the performance evaluation of TabbyXL2. The dataset provides all the data required to reproduce the automatic performance evaluation of TabbyXL2 using the following three options: 1. TabbyXL2 automatically generates Java source code from CRL rules with our CRL interpreter, compiles it to Java bytecode, and then runs the generated program with the JRE. 2. TabbyXL2 automatically maps CRL rules to DRL rules with the DSL specification and executes them with the Drools Expert (https://www.drools.org) rule engine. 3. TabbyXL2 executes the CLP ruleset corresponding to our CRL ruleset with the JESS (http://www.jessrules.com) rule engine. The performance evaluation confirms that the implemented rulesets are applicable to processing a variety of arbitrary tables of the same genre (government statistical websites). The experiment demonstrates that our tool, TabbyXL2, can be used for developing programs that transform spreadsheet data into the relational form. The README.md file included in this dataset provides a detailed description of the data and the steps to reproduce the experiment.
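
To make "transformation from arbitrary to relational tables" concrete, here is a minimal sketch of the idea on a toy cross-tabulated table. It is not TabbyXL2's rule engine and uses no CRL, DRL, or CLP rules; the function name `unpivot`, the record layout, and the sample table are all hypothetical and are not taken from Troy_200.

```python
from typing import List, Tuple

Record = Tuple[str, str, str]  # (row label, column label, entry value); assumed layout


def unpivot(table: List[List[str]]) -> List[Record]:
    """Flatten a simple cross-tab into relational records.

    Assumes the first row holds column labels and the first column holds
    row labels; real statistical tables are far less regular than this.
    """
    column_labels = table[0][1:]
    records: List[Record] = []
    for row in table[1:]:
        row_label, values = row[0], row[1:]
        for column_label, value in zip(column_labels, values):
            if value != "":  # skip empty cells
                records.append((row_label, column_label, value))
    return records


# Hypothetical toy table for illustration only.
table = [
    ["",       "2010", "2011"],
    ["Male",   "10",   "12"],
    ["Female", "11",   "13"],
]
records = unpivot(table)
```

Each resulting record pairs one data value with the labels that address it, which is roughly what a relational (canonical) form provides; a rule-based tool is needed precisely because real tables do not follow one fixed layout like this one.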

    TabbyXL: Dataset for the Performance Evaluation of a Software Platform for Rule-Based Spreadsheet Data Extraction and Transformation

    This dataset is designed to evaluate TabbyXL (version 1.0.3), a software platform for the rule-based transformation of spreadsheet data from arbitrary to relational tables, which is freely available at https://github.com/tabbydoc/tabbyxl/releases/tag/v1.0.3. The dataset provides all the data required to reproduce the performance evaluation, including running the program and the automatic performance evaluation of TabbyXL. The performance evaluation confirms that the implemented rulesets are applicable to processing a variety of arbitrary tables of the same genre (government statistical websites). This demonstrates that TabbyXL can be used for developing programs that transform spreadsheet data into the relational form. The README.md file included in this dataset provides a detailed description of the data and the steps to reproduce the experiment.

    TabbyXL: Experiment Data

    The data are designed to evaluate TabbyXL, a system for the rule-based transformation of spreadsheet data from arbitrary to relational tables, which is freely available on GitHub (https://github.com/cellsrg/tabbyxl). Our data are based on the existing table dataset Troy_200 [1]. It contains 200 arbitrary tables as CSV files collected from 10 different government statistical websites. They were collected for the experiment on data extraction from tables presented in [2]. We use its earlier version, which stores the original tables with style features (fonts, alignment, and indentation) as Excel spreadsheets, available at http://tango.byu.edu/data. We have put all of these tables with style features into a single spreadsheet file (data/TangoDataset.xlsx). Each of the 200 tables is located on a separate sheet. A pair of tags, $START and $END, marks its location inside the sheet. We initially used this file in our previous experiment, described in [3]. We have automatically transformed all tables of the single spreadsheet into the relational form using TabbyXL and the ruleset (data/rules.dslr). The folder data/results contains the obtained results. The folder data/gt contains the ground-truth data for the automated performance evaluation of TabbyXL in the role and structural stages of table analysis. Each table in data/results and data/gt is accompanied by two recordsets: ENTRIES and LABELS. The first of them specifies entries. Each record presents an entry as a triple . In the LABELS recordset, each record presents a label as a triple . We have also stored the log files: results.log with the results of running TabbyXL and eval.log with the results of its performance evaluation. REFERENCES [1] Nagy G. TANGO-DocLab web tables from international statistical sites (Troy_200), 1, ID: Troy_200_1. URL: http://tc11.cvc.uab.es/datasets/Troy_200_1. [2] Embley D., Krishnamoorthy M., Nagy G., & Seth S. (2016). Converting heterogeneous statistical tables on the web to searchable databases. Int. J. on Document Analysis and Recognition, 19(2), 119-138. URL: https://link.springer.com/article/10.1007/s10032-016-0259-1. [3] Shigarov A., Paramonov V., Belykh P., & Bondarev A. (2016). Rule-Based Canonicalization of Arbitrary Tables in Spreadsheets. Proc. 22nd Int. Conf. on Information and Software Technologies, pp. 78-91. URL: http://link.springer.com/chapter/10.1007/978-3-319-46254-7_7
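
The description above says that a pair of tags marks where each table sits inside its sheet. The following sketch shows one way such tags could be used to cut a table out of a sheet that has already been loaded into a list of rows. It is an illustration under assumptions, not TabbyXL's code: it assumes the tags are the literal cell values `$START` and `$END`, and that they sit just outside the table (above-left and below-right), which the description does not state. The names `find_tag` and `extract_table` are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[str]]]  # a sheet as rows of cell values (None = empty cell)


def find_tag(grid: Grid, tag: str) -> Tuple[int, int]:
    """Return (row, column) of the first cell whose text equals the tag."""
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell is not None and cell.strip() == tag:
                return r, c
    raise ValueError(f"tag {tag!r} not found")


def extract_table(grid: Grid, start_tag: str = "$START", end_tag: str = "$END") -> Grid:
    """Cut out the rectangle strictly between the two tagged cells.

    Assumption: the tags lie outside the table, so both tagged rows and
    columns are excluded from the result.
    """
    r0, c0 = find_tag(grid, start_tag)
    r1, c1 = find_tag(grid, end_tag)
    if r1 <= r0 or c1 <= c0:
        raise ValueError("end tag must lie below and to the right of the start tag")
    return [row[c0 + 1:c1] for row in grid[r0 + 1:r1]]


# Hypothetical sheet content for illustration only.
sheet = [
    ["$START", None,   None,   None],
    [None,     "",     "2010", None],
    [None,     "Male", "10",   None],
    [None,     None,   None,   "$END"],
]
table = extract_table(sheet)
```

If the tags in the actual file are placed inside the table's corner cells instead, the slice bounds would need to be inclusive; the README.md in the dataset is the authority on that.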

    TabbyXL: Dataset for the Performance Evaluation of a Software Platform for Rule-Based Spreadsheet Data Extraction and Transformation

    This dataset is designed to evaluate TabbyXL (version 1.0.4), a software platform for the rule-based transformation of spreadsheet data from arbitrary to relational tables, which is freely available at https://github.com/tabbydoc/tabbyxl/releases/tag/v1.0.4. The dataset provides all the data required to reproduce the performance evaluation, including running the program and the automatic performance evaluation of TabbyXL. The performance evaluation confirms that the implemented rulesets are applicable to processing a variety of arbitrary tables of the same genre (government statistical websites). This demonstrates that TabbyXL can be used for developing programs that transform spreadsheet data into the relational form. The README.md file included in this dataset provides a detailed description of the data and the steps to reproduce the experiment.

    TabbyXL: Dataset for the Performance Evaluation of a Software Platform for Rule-Based Spreadsheet Data Extraction and Transformation

    This dataset is designed to evaluate TabbyXL (version 1.1.0), a software platform for the rule-based transformation of spreadsheet data from arbitrary to relational tables, which is freely available on GitHub (https://github.com/tabbydoc/tabbyxl/releases/tag/v1.1.0). The dataset provides all the data required to reproduce the performance evaluation, including running the program and the automatic performance evaluation of TabbyXL. The performance evaluation confirms that the implemented rulesets are applicable to processing a variety of arbitrary tables of the same genre (government statistical websites). This demonstrates that TabbyXL can be used for developing programs that transform spreadsheet data into the relational form. The README.md file included in this dataset provides a detailed description of the data and the steps to reproduce the experiment.

    TabbyXL2: Experiment Data

    This dataset is designed to evaluate TabbyXL2, a tool for the rule-based transformation of spreadsheet data from arbitrary to relational tables, which is freely available on GitHub (https://github.com/cellsrg/tabbyxl2). Our source data are based on the existing table dataset Troy_200 (http://tc11.cvc.uab.es/datasets/Troy_200_1), which contains 200 arbitrary tables as CSV files collected from 10 different government statistical websites. We use its earlier version, which stores the original tables with style features (fonts, alignment, and indentation) as Excel spreadsheets (available at http://tango.byu.edu/data). The dataset contains the following material: 1. All Troy_200 tables with style features, put into a single spreadsheet file; 2. The ground-truth data we prepared for the automatic performance evaluation of TabbyXL2 in the role and structural stages of table analysis; 3. CRL and CLP rulesets designed for transforming the arbitrary Troy_200 tables into the relational form; 4. The log files with the results of running the program and of the performance evaluation of TabbyXL2. The dataset provides all the data required to reproduce the automatic performance evaluation of TabbyXL2 using the following three options: 1. TabbyXL2 automatically generates Java source code from CRL rules with our CRL interpreter, compiles it to Java bytecode, and then runs the generated program with the JRE. 2. TabbyXL2 automatically maps CRL rules to DRL rules with the DSL specification and executes them with the Drools Expert (https://www.drools.org) rule engine. 3. TabbyXL2 executes the CLP ruleset corresponding to our CRL ruleset with the JESS (http://www.jessrules.com) rule engine. The performance evaluation confirms that the implemented rulesets are applicable to processing a variety of arbitrary tables of the same genre (government statistical websites). The experiment demonstrates that our tool, TabbyXL2, can be used for developing programs that transform spreadsheet data into the relational form. The README.md file included in this dataset provides a detailed description of the data and the steps to reproduce the experiment.
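
The description above mentions ground-truth data prepared for automatic performance evaluation but does not name the metrics. As a hedged illustration, the sketch below compares extracted records with ground-truth records as sets and reports precision, recall, and F1; the choice of these metrics, the function name `precision_recall_f1`, and the record tuples are assumptions made for the example, not details taken from the dataset.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(
    extracted: Iterable[Hashable], ground_truth: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Compare two collections of records as sets.

    A record counts as correct only if it matches a ground-truth record exactly.
    """
    ext, gt = set(extracted), set(ground_truth)
    correct = len(ext & gt)
    precision = correct / len(ext) if ext else 0.0
    recall = correct / len(gt) if gt else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical records of the form (entry value, row label, column label).
extracted = [("10", "Male", "2010"), ("12", "Male", "2011"), ("11", "Female", "2011")]
ground_truth = [("10", "Male", "2010"), ("12", "Male", "2011"), ("11", "Female", "2010")]

precision, recall, f1 = precision_recall_f1(extracted, ground_truth)
```

In this toy case two of the three extracted records match, so a single mislabelled entry lowers both precision and recall. The eval.log files and README.md in the dataset describe what the actual evaluation measures.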

    KEFT: Knowledge Extraction and Graph Building from Statistical Data Tables

    Data provided by statistical models are commonly represented in textual, tabular, or graphical form in documents (reports, articles, posters, and presentations). These documents are often available in PDF format. Even though this makes accessing particular information more difficult, it is interesting to process the PDF documents directly. We present KEFT, a solution for the statistical domain, and we describe a fully functional pipeline for constructing a knowledge graph by extracting entities and relations from statistical data tables. We showcase how this approach can be used to construct a knowledge graph from different statistical studies.