German-Russian astroparticle data life cycle initiative

Author: Bychkov I.
Dubenskaya J.
Fedorov O.
Haungs A.
Heiss A.
Kang D.
Kazarina Y.
Korosteleva E.
Kostunin D.
Kryukov A.
Mikhailov A.
Nguyen M.-D.
Polgart F.
Polyakov S.
Postnikov E.
Shigarov A.
Shipilov D.
Streit A.
Tokareva V.
Wochele D.
Wochele J.
Zhurov D.
Publication date: 17/07/2020

TabbyXL2: Experiment Data

Author: Shigarov A
Publication venue: Data Archiving and Networked Services (DANS)

This dataset is designed to evaluate TabbyXL2 v1.0.1, a tool for the rule-based transformation of spreadsheet data from arbitrary to relational tables that is freely available on GitHub (https://github.com/cellsrg/tabbyxl2/releases/tag/v1.0.1). Our source data are based on the existing table dataset Troy_200 (http://tc11.cvc.uab.es/datasets/Troy_200_1), which contains 200 arbitrary tables as CSV files collected from 10 different government statistical websites. We use its earlier version, which stores the original tables with style features (fonts, alignment, and indentation) as Excel spreadsheets (available at http://tango.byu.edu/data).
The dataset contains the following material:
1. All Troy_200 tables with style features, put into a single spreadsheet file;
2. The ground-truth data we prepared for the automatic performance evaluation of TabbyXL2 in the role and structural stages of the table analysis;
3. CRL and CLP rulesets designed for transforming Troy_200 arbitrary tables into the relational form;
4. The log files with the results of running the program and with the results of the performance evaluation of TabbyXL2.
The dataset provides all data required to reproduce the automatic performance evaluation of TabbyXL2, using the following three options:
1. TabbyXL2 automatically generates Java source code from CRL rules with our CRL interpreter, compiles it to Java bytecode, and then runs the generated program with a JRE.
2. TabbyXL2 automatically maps CRL rules to DRL rules with the DSL specification and executes them with the Drools Expert (https://www.drools.org) rule engine.
3. TabbyXL2 executes the CLP ruleset corresponding to our CRL ruleset with the JESS (http://www.jessrules.com) rule engine.
The performance evaluation confirms that the implemented rulesets are applicable to a range of different arbitrary tables of the same genre (government statistical websites). The experiment demonstrates that our tool, TabbyXL2, can be used for developing programs for the transformation of spreadsheet data into the relational form. The README.md file included in this dataset provides a detailed description of the data and the steps to reproduce the experiment.

TabbyXL: Dataset for the Performance Evaluation of a Software Platform for Rule-Based Spreadsheet Data Extraction and Transformation

Author: Shigarov A (via Mendeley Data)
Publication date: 14/12/2018

This dataset is designed to evaluate TabbyXL (version 1.0.3), a software platform for the rule-based transformation of spreadsheet data from arbitrary to relational tables that is freely available at https://github.com/tabbydoc/tabbyxl/releases/tag/v1.0.3. The dataset provides all data required to reproduce the performance evaluation, including running the program and the automatic performance evaluation of TabbyXL. The performance evaluation confirms that the implemented rulesets are applicable to a range of different arbitrary tables of the same genre (government statistical websites). This demonstrates that TabbyXL can be used for developing programs for the transformation of spreadsheet data into the relational form. The README.md file included in this dataset provides a detailed description of the data and the steps to reproduce the experiment.

Electronic Archiving System

TabbyXL: Experiment Data

Author: Shigarov A (via Mendeley Data)
Publication date: 25/06/2017

The data are designed to evaluate TabbyXL, a system for the rule-based transformation of spreadsheet data from arbitrary to relational tables that is freely available on GitHub (https://github.com/cellsrg/tabbyxl). Our data are based on the existing table dataset Troy_200 [1]. It contains 200 arbitrary tables as CSV files collected from 10 different government statistical websites. They were collected for the experiment on data extraction from tables presented in the paper [2]. We use its earlier version, which stores the original tables with style features (fonts, alignment, and indentation) as Excel spreadsheets available at http://tango.byu.edu/data. We have put all of these tables with style features into a single spreadsheet file (data/TangoDataset.xlsx). Each of the 200 tables is located in a separate sheet. The pair of tags START and END marks its location inside the sheet. We initially used this file in our previous experiment described in the paper [3]. We have automatically transformed all tables of the single spreadsheet into the relational form, using TabbyXL and the ruleset (data/rules.dslr). The folder data/results contains the obtained results. The folder data/gt contains the ground-truth data for the automated performance evaluation of TabbyXL in the role and structural stages of the table analysis. Each table in our data/results and data/gt datasets is accompanied by two recordsets: ENTRIES and LABELS. The first of them specifies entries; each record presents an entry as a triple. In the LABELS recordset, each record presents a label as a triple. We have also stored the log files: results.log with the results of running TabbyXL and eval.log with the results of its performance evaluation.
REFERENCES
[1] Nagy G. TANGO-DocLab web tables from international statistical sites (Troy_200), 1, ID: Troy_200_1. URL: http://tc11.cvc.uab.es/datasets/Troy_200_1.
[2] Embley D., Krishnamoorthy M., Nagy G., & Seth S. (2016). Converting heterogeneous statistical tables on the web to searchable databases. Int. J. on Document Analysis and Recognition, 19(2), 119-138. URL: https://link.springer.com/article/10.1007/s10032-016-0259-1.
[3] Shigarov A., Paramonov V., Belykh P., & Bondarev A. (2016). Rule-Based Canonicalization of Arbitrary Tables in Spreadsheets. Proc. 22nd Int. Conf. on Information and Software Technologies, pp. 78-91. URL: http://link.springer.com/chapter/10.1007/978-3-319-46254-7_7.
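As an illustration of how a START/END tag pair can delimit a table inside a sheet, here is a minimal Python sketch. It is not TabbyXL's code. It assumes the sheet has already been loaded into a list of rows, that the tags appear as the literal cell texts "START" and "END", and that START sits just above and to the left of the table while END sits just below and to the right of it. The function names find_tag and extract_table are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[str]]]


def find_tag(grid: Grid, tag: str) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the first cell whose text equals `tag`, else None."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value is not None and str(value).strip() == tag:
                return r, c
    return None


def extract_table(grid: Grid, start_tag: str = "START", end_tag: str = "END") -> Grid:
    """Cut out the cells strictly between the START and END tag cells.

    Assumption (not confirmed by the dataset description): START is the cell
    diagonally above-left of the table and END is the cell diagonally
    below-right of it, so the table is the rectangle between them.
    """
    start = find_tag(grid, start_tag)
    end = find_tag(grid, end_tag)
    if start is None or end is None:
        raise ValueError("sheet does not contain both tags")
    (r0, c0), (r1, c1) = start, end
    if r1 <= r0 or c1 <= c0:
        raise ValueError("END tag must lie below and to the right of START")
    return [row[c0 + 1:c1] for row in grid[r0 + 1:r1]]


# Hypothetical sheet content, used only to exercise the sketch.
sheet = [
    ["START", None, None, None],
    [None, "Year", "Value", None],
    [None, "2016", "42", None],
    [None, None, None, "END"],
]
table = extract_table(sheet)
```

With real data, the rows could first be read from data/TangoDataset.xlsx with any spreadsheet library, one sheet per table. The actual tag text and placement should be checked against the file itself.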

Electronic Archiving System

TabbyXL: Dataset for the Performance Evaluation of a Software Platform for Rule-Based Spreadsheet Data Extraction and Transformation

Author: Shigarov A (via Mendeley Data)
Publication date: 16/12/2019

This dataset is designed to evaluate TabbyXL (version 1.1.0), a software platform for the rule-based transformation of spreadsheet data from arbitrary to relational tables that is freely available on GitHub (https://github.com/tabbydoc/tabbyxl/releases/tag/v1.1.0). The dataset provides all data required to reproduce the performance evaluation, including running the program and the automatic performance evaluation of TabbyXL. The performance evaluation confirms that the implemented rulesets are applicable to a range of different arbitrary tables of the same genre (government statistical websites). This demonstrates that TabbyXL can be used for developing programs for the transformation of spreadsheet data into the relational form. The README.md file included in this dataset provides a detailed description of the data and the steps to reproduce the experiment.

Electronic Archiving System

TabbyXL: Dataset for the Performance Evaluation of a Software Platform for Rule-Based Spreadsheet Data Extraction and Transformation

Author: Shigarov A (via Mendeley Data)
Publication date: 19/12/2018

This dataset is designed to evaluate TabbyXL (version 1.0.4), a software platform for the rule-based transformation of spreadsheet data from arbitrary to relational tables that is freely available at https://github.com/tabbydoc/tabbyxl/releases/tag/v1.0.4. The dataset provides all data required to reproduce the performance evaluation, including running the program and the automatic performance evaluation of TabbyXL. The performance evaluation confirms that the implemented rulesets are applicable to a range of different arbitrary tables of the same genre (government statistical websites). This demonstrates that TabbyXL can be used for developing programs for the transformation of spreadsheet data into the relational form. The README.md file included in this dataset provides a detailed description of the data and the steps to reproduce the experiment.

Electronic Archiving System

TabbyXL2: Experiment Data

Author: Shigarov A (via Mendeley Data)
Publication date: 29/06/2018

This dataset is designed to evaluate TabbyXL2, a tool for the rule-based transformation of spreadsheet data from arbitrary to relational tables that is freely available on GitHub (https://github.com/cellsrg/tabbyxl2). Our source data are based on the existing table dataset Troy_200 (http://tc11.cvc.uab.es/datasets/Troy_200_1), which contains 200 arbitrary tables as CSV files collected from 10 different government statistical websites. We use its earlier version, which stores the original tables with style features (fonts, alignment, and indentation) as Excel spreadsheets (available at http://tango.byu.edu/data).
The dataset contains the following material:
1. All Troy_200 tables with style features, put into a single spreadsheet file;
2. The ground-truth data we prepared for the automatic performance evaluation of TabbyXL2 in the role and structural stages of the table analysis;
3. CRL and CLP rulesets designed for transforming Troy_200 arbitrary tables into the relational form;
4. The log files with the results of running the program and with the results of the performance evaluation of TabbyXL2.
The dataset provides all data required to reproduce the automatic performance evaluation of TabbyXL2, using the following three options:
1. TabbyXL2 automatically generates Java source code from CRL rules with our CRL interpreter, compiles it to Java bytecode, and then runs the generated program with a JRE.
2. TabbyXL2 automatically maps CRL rules to DRL rules with the DSL specification and executes them with the Drools Expert (https://www.drools.org) rule engine.
3. TabbyXL2 executes the CLP ruleset corresponding to our CRL ruleset with the JESS (http://www.jessrules.com) rule engine.
The performance evaluation confirms that the implemented rulesets are applicable to a range of different arbitrary tables of the same genre (government statistical websites). The experiment demonstrates that our tool, TabbyXL2, can be used for developing programs for the transformation of spreadsheet data into the relational form. The README.md file included in this dataset provides a detailed description of the data and the steps to reproduce the experiment.

Electronic Archiving System

Rule-based spreadsheet data transformation from arbitrary to relational tables

Author: Abraham
Adelfio
Alexey O. Shigarov
Andrey A. Mikhailov
Barowy
Braunschweig
Cafarella
Cao
Chen
Chen
Chen
Chen
Chen
Crestan
Cunha
Cunha
de Vos
Deng
e Silva
Eberius
Embley
Embley
Embley
Fiorelli
Galkin
Goto
Govindaraju
Gulwani
Harris
Hung
Hurst
Jin
Kandel
Kim
Koci
Lautert
Lehmberg
Limaye
Mauro
Mulwad
Muñoz
Nagy
Nagy
Nagy
Nagy
Pivk
Raman
Seth
Shigarov
Shigarov
Shigarov
Shigarov
Tao
Tijerino
Venetis
Wang
Wang
Yang
Yoshida
Publication venue: 'Elsevier BV'

Crossref

KEFT: Knowledge Extraction and Graph Building from Statistical Data Tables

Author: A Klahold
A Shigarov
A Souili
A-O Shigarov
AL Gentile
F Ronzano
G Saporta
H-R Divakar
J Tekli
M Cremaschi
P Meneton
R Ahmad
R Rastan
R-M Heiberger
W Lu
X Guo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/11/2020

Data provided by statistical models are commonly represented in textual, tabular, or graphical form in documents (reports, articles, posters, and presentations). These documents are often available in PDF format. Even though this format makes accessing particular information more difficult, it is interesting to process the PDF documents directly. We present KEFT, a solution for the statistical domain, and we describe the fully functional pipeline for constructing a knowledge graph by extracting entities and relations from statistical data tables. We showcase how this approach can be used to construct a knowledge graph from different statistical studies.

Crossref

INRIA a CCSD electronic archive server

HAL-Paris 13

Oskar Bordeaux