47 research outputs found
Analysis and Synthesis of Metadata Goals for Scientific Data
The proliferation of discipline-specific metadata schemes contributes to artificial barriers that can impede interdisciplinary and transdisciplinary research. The authors considered this problem by examining the domains, objectives, and architectures of nine metadata schemes used to document scientific data in the physical, life, and social sciences. Using a mixed-methods content analysis and Greenberg’s (2005) metadata objectives, principles, domains, and architectural layout (MODAL) framework, they derived 22 metadata-related goals from textual content describing each metadata scheme. Relationships were identified between the domains (e.g., scientific discipline and type of data) and the categories of scheme objectives. For each strong correlation (> 0.6), a Fisher’s exact test for nonparametric data was used to determine significance (p < .05).
Significant relationships were found between the domains and objectives of the schemes. Schemes describing observational data are more likely to have “scheme harmonization” (compatibility and interoperability with related schemes) as an objective; schemes with the objective “abstraction” (a conceptual model exists separate from the technical implementation) also have the objective “sufficiency” (the scheme defines a minimal amount of information to meet the needs of the community); and schemes with the objective “data publication” do not have the objective “element refinement.” The analysis indicates that many metadata-driven goals expressed by communities are independent of scientific discipline or the type of data, although they are constrained by historical community practices and workflows as well as the technological environment at the time of scheme creation. The analysis reveals 11 fundamental metadata goals for metadata documenting scientific data in support of sharing research data across disciplines and domains. The authors report these results and highlight the need for more metadata-related research, particularly in the context of recent funding agency policy changes.
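As a hedged illustration of the statistical procedure described above (a sketch, not the authors' code), the snippet below runs a Fisher's exact test on a hypothetical 2x2 contingency table crossing a data-type domain with the presence of the "scheme harmonization" objective; the counts are invented for illustration only.

```python
# Illustrative only: hypothetical 2x2 contingency table (counts are made up),
# crossing data type (observational vs. other) with whether "scheme
# harmonization" is an objective, tested with Fisher's exact test.
from scipy.stats import fisher_exact

table = [[4, 1],   # observational data: objective present / absent
         [1, 3]]   # other data types:   objective present / absent

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")  # significant if p < .05
```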
Resources for Lothbrok: Optimizing SPARQL Queries over Decentralized Knowledge Graphs
A repository for the resources needed to reproduce the experiments in our paper "Optimizing SPARQL Queries over Decentralized Knowledge Graphs".
DBpedia RDF2Vec Graph Embeddings
Generation of RDF2Vec embeddings: DBpedia graph embeddings using RDF2Vec. The RDF2Vec embedding generation code can be found here and is based on a publication by Portisch et al. [1]. The embeddings dataset consists of 200-dimensional vectors of DBpedia entities (from 1/9/2021). A figure of cosine similarities between a selected set of DBpedia entities is provided in the dataset here.

Generating Embeddings
The code for generating these embeddings can be found here. Run the run.sh script, which wraps all the necessary commands to generate embeddings: bash run.sh. The script downloads a set of DBpedia files, which are listed in dbpedia_files.txt. It then builds a Docker image and runs a container of that image that generates the embeddings for the DBpedia graph defined by the DBpedia files. A folder files is created containing all the downloaded DBpedia files, and a folder embeddings/dbpedia is created containing the embeddings in vectors.txt along with a set of random walk files.

Run Time of Embeddings Generation
Generating the embeddings can take more than a day, depending on the number of DBpedia files chosen to be downloaded. Below are basic run time statistics for embeddings generated on a machine with 64 GB RAM, 8 cores (AMD EPYC, 1996.221 MHz), and a 1 TB SSD.
Total: 1 day, 8 hours, 52 minutes, 41 seconds
Walk generation: 0 days, 7 hours, 24 minutes, 36 seconds
Training: 1 day, 1 hour, 28 minutes, 5 seconds

Parameters Used
The parameters used to generate the embeddings provided here:
Number of walks per entity: 100
Depth (hops) per walk: 4
Walk generation mode: RANDOM_WALKS_DUPLICATE_FREE
Threads: # of processors / 2
Training mode: sg
Embeddings vector dimension: 200
Minimum word2vec word count: 1
Sample rate: 0.0
Training window size: 5
Training epochs:
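A minimal sketch (not part of this deposit) of how the resulting vectors.txt could be loaded and used to compute a cosine similarity between two DBpedia entities, assuming the word2vec plain-text format (one entity per line followed by its 200 vector components) and full DBpedia IRIs as keys; the two entity IRIs below are placeholders.

```python
# Sketch: load RDF2Vec vectors and compare two DBpedia entities by cosine
# similarity. Assumes word2vec text format; adjust parsing if the output differs.
import numpy as np

def load_vectors(path="embeddings/dbpedia/vectors.txt"):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 3:                # skip a possible "count dim" header
                continue
            vectors[parts[0]] = np.asarray(parts[1:], dtype=float)
    return vectors

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

vecs = load_vectors()
e1 = "http://dbpedia.org/resource/Copenhagen"  # placeholder entity IRIs
e2 = "http://dbpedia.org/resource/Aarhus"
print(cosine(vecs[e1], vecs[e2]))
```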
A core ontology for modeling life cycle sustainability assessment on the Semantic Web with Accompanying Database
To enable and support the uptake of semantic ontologies, we present a core ontology developed specifically to capture the data relevant for life cycle sustainability assessment. We further demonstrate the utility of the ontology by using it to integrate data relevant to sustainability assessments, such as EXIOBASE and the Yale Stocks and Flow Database, into the Semantic Web. These datasets can be accessed via a machine-readable endpoint using SPARQL, a semantic query language.
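As a hedged illustration of querying such an endpoint with SPARQL from Python, the sketch below uses the SPARQLWrapper library; the endpoint URL is a placeholder, and the query lists a few arbitrary triples rather than ontology-specific terms.

```python
# Sketch: run a simple SPARQL query against a (placeholder) endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.org/sparql")  # placeholder endpoint URL
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```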
Automatically Extracted SHACL Shapes for WikiData, DBpedia, YAGO-4, and LUBM & Associated Coverage Statistics
The uploaded datasets contain automatically extracted SHACL shapes for the following datasets:
WikiData (the truthy dump from September 2021, filtered by removing non-English strings) [1]
DBpedia [2]
YAGO-4 [3]
LUBM (scale factor 500) [4]
The validating shapes for these datasets are generated by a program that parses the corresponding RDF files (in `.nt` format). The extracted shapes encode various SHACL constraints, e.g., sh:minCount, sh:path, sh:class, and sh:datatype. For each shape, we encode coverage in terms of the number of entities satisfying that shape; this information is encoded using the void:entities predicate. We have provided, as an executable JAR file, the program we developed to extract these SHACL shapes. More details about the datasets used to extract these shapes and how to run the JAR are available on our GitHub repository: https://github.com/Kashif-Rabbani/validatingshapes.
[1] Vrandečić, Denny, and Markus Krötzsch. "Wikidata: A free collaborative knowledgebase." Communications of the ACM 57.10 (2014): 78-85.
[2] Auer, Sören, et al. "DBpedia: A nucleus for a web of open data." The Semantic Web. Springer, Berlin, Heidelberg, 2007. 722-735.
[3] Pellissier Tanon, Thomas, Gerhard Weikum, and Fabian Suchanek. "YAGO 4: A reason-able knowledge base." European Semantic Web Conference. Springer, Cham, 2020.
[4] Guo, Yuanbo, Zhengxiang Pan, and Jeff Heflin. "LUBM: A benchmark for OWL knowledge base systems." Journal of Web Semantics 3.2-3 (2005): 158-182.
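The following is a hedged sketch (separate from the provided JAR) of how the extracted shapes and their void:entities coverage could be inspected with rdflib, assuming the shapes are serialized as Turtle in a file named shapes.ttl and that the coverage counts are attached to node shapes.

```python
# Sketch: list node shapes, their target classes, and void:entities coverage.
from rdflib import Graph, Namespace, RDF

SH = Namespace("http://www.w3.org/ns/shacl#")
VOID = Namespace("http://rdfs.org/ns/void#")

g = Graph()
g.parse("shapes.ttl", format="turtle")        # placeholder file name

for shape in g.subjects(RDF.type, SH.NodeShape):
    target = g.value(shape, SH.targetClass)   # class the shape targets, if any
    coverage = g.value(shape, VOID.entities)  # number of entities satisfying it
    print(shape, target, coverage)
```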
The Project of Efficient and Error-bounded Spatiotemporal Quantile Monitoring in Edge Computing Environments
The source code, datasets, and scripts for reproducing the experiments of our paper entitled "Efficient and Error-bounded Spatiotemporal Quantile Monitoring in Edge Computing Environments".
Tutorial for the 2022 ACM SIGMOD Conference: Spatial Data Quality in the IoT Era: Management and Exploitation
Within the rapidly expanding Internet of Things (IoT), growing amounts of spatially referenced data are being generated. Due to the dynamic, decentralized, and heterogeneous nature of the IoT, spatial IoT data (SID) quality has attracted considerable attention in academia and industry. Inventing and using technologies for managing spatial data quality and for exploiting low-quality spatial data are key challenges in the IoT. In this tutorial, we highlight the SID consumption requirements in applications and offer an overview of spatial data quality in the IoT setting. In addition, we review pertinent technologies for quality management and low-quality data exploitation, and we identify trends and future directions for quality-aware SID management and utilization. The tutorial aims not only to help researchers and practitioners better comprehend SID quality challenges and solutions, but also to offer insights that may enable innovative research and applications.
Generalized Approximate Message Passing Practical 2D Phase Transition Simulations Dataset
This deposition contains the results from a simulation of phase transitions for various practical 2D problem suites when using the Generalised Approximate Message Passing (GAMP) reconstruction algorithm. The deposition consists of:
Five HDF5 databases containing the results from the phase transition simulations (gamp_practical_2d_phase_transitions_ID_[0-4]_of_5.hdf5).
The Python script which was used to create the databases (gamp_practical_2d_phase_transitions.py).
A Python module with tools needed to run the simulations (gamp_pt_tools.py).
MD5 and SHA256 checksums of the databases and Python scripts (gamp_practical_2d_phase_transitions.MD5SUMS / gamp_practical_2d_phase_transitions.SHA256SUMS).
The HDF5 databases are licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Since the CC BY 4.0 license is not well suited for source code, the Python scripts are licensed under the BSD 2-Clause license (http://opensource.org/licenses/BSD-2-Clause). The files are provided as-is with no warranty, as detailed in the above-mentioned licenses.
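As a hedged sketch of how a downloaded database might be checked and opened (this is not one of the deposited scripts), the snippet below verifies one HDF5 file against its SHA256 checksum and lists its top-level groups with h5py; the expected checksum is a placeholder to be copied from the .SHA256SUMS file.

```python
# Sketch: verify a database file's SHA256 checksum and peek into the HDF5 structure.
import hashlib
import h5py

db_path = "gamp_practical_2d_phase_transitions_ID_0_of_5.hdf5"
expected_sha256 = "<copy the value from gamp_practical_2d_phase_transitions.SHA256SUMS>"

with open(db_path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print("checksum ok:", digest == expected_sha256)

with h5py.File(db_path, "r") as db:
    print(list(db.keys()))  # top-level groups holding the simulation results
```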
Reconstruction Algorithms in Undersampled AFM Imaging - results
This data set contains numerical simulation results from experiments for the paper "Review of compressed sensing reconstruction algorithms in AFM cell imaging", submitted to IEEE Journal of Selected Topics in Signal Processing. The data set consists of an HDF5 file containing the simulation results, as well as MD5 and SHA checksums of the HDF5 database for validating the integrity of the data after download. The data set is licensed under CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Python scripts used for producing these results, as well as Python scripts for extracting images and data used in the accompanying paper from the database, can be found in the accompanying deposition http://doi.org/10.5281/zenodo.18745. The data set contains images, and reconstructed versions of these, originally published in the data set available at http://dx.doi.org/10.5281/zenodo.17573.
