3,063 research outputs found

Open data and the academy: an evaluation of CKAN for research data management

Author: Winn Joss
Publication date: 01/05/2013

This paper offers a full and critical evaluation of the open source CKAN software (http://ckan.org) for use as a Research Data Management (RDM) tool within a university environment. It presents a case study of CKAN's implementation and use at the University of Lincoln, UK, and highlights its strengths and current weaknesses as an institutional Research Data Management tool. The author draws on his prior experience of implementing a mixed media Digital Asset Management system (DAM), Institutional Repository (IR) and institutional Web Content Management System (CMS), to offer an outline proposal for how CKAN can be used effectively for data analysis, storage and publishing in academia. This will be of interest to researchers, data librarians, and developers, who are responsible for the implementation of institutional RDM infrastructure. This paper is presented as part of the dissemination activities of the Jisc-funded Orbital project (http://orbital.blogs.lincoln.ac.uk

LinkChains: Exploring the space of decentralised trustworthy Linked Data

Authors: Domingue John; Third Allan
Publication date: 01/01/2017

Distributed ledger platforms based on blockchains provide a fully distributed form of data storage which can guarantee data integrity. Certain use cases, such as medical applications, can benefit from guarantees that the results of arbitrary queries against a Linked Dataset faithfully represent its contents as originally published, without tampering or data corruption. We describe potential approaches to the storage and querying of Linked Data with varying degrees of decentralisation and guarantees of integrity, using distributed ledgers, and discuss their a priori differences in performance, storage limitations and reliability, setting out a programme for future empirical research
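The integrity guarantee described in this abstract can be illustrated with a minimal sketch. It is not the authors' design: it assumes the simplest conceivable arrangement, in which a single SHA-256 digest of the dataset is recorded on a ledger at publication time and later recomputed by a consumer. The function names, the `ex:` triples and the line-based serialisation are all hypothetical, and real RDF canonicalisation (blank nodes in particular) is more involved than sorting lines.

```python
import hashlib


def dataset_digest(triples):
    """Return a SHA-256 hex digest of a set of (subject, predicate, object) triples.

    Triples are serialised one per line and sorted, so the digest does not
    depend on the order in which they are stored or transmitted.
    """
    lines = sorted(" ".join(triple) + " ." for triple in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def is_untampered(triples, expected_digest):
    """Compare a recomputed digest with the one assumed to be on the ledger."""
    return dataset_digest(triples) == expected_digest


# Hypothetical dataset as originally published.
published = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:age", '"42"'),
]

# Assumption: this value is what would be written to the ledger.
ledger_digest = dataset_digest(published)

# Same triples in a different order still verify.
reordered = list(reversed(published))

# A single changed literal does not.
tampered = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:age", '"43"'),
]

print(is_untampered(reordered, ledger_digest))
print(is_untampered(tampered, ledger_digest))
```

Storing only a digest on the ledger keeps the on-chain footprint small, but it verifies the dataset as a whole rather than the result of an arbitrary query, which is the harder case the abstract raises.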

Constitute: The world’s constitutions to read, search, and compare

Authors: Elkins Zachary; Ginsburg Tom; Melton James; Miranker Daniel; Sequeda Juan; Shaffer Robert
Publication date: 07/07/2014

Constitutional design and redesign is constant. Over the last 200 years, countries have replaced their constitutions an average of every 19 years and some have amended them almost yearly. A basic problem in the drafting of these documents is the search and analysis of model text deployed in other jurisdictions. Traditionally, this process has been ad hoc and the results suboptimal. As a result, drafters generally lack systematic information about the institutional options and choices available to them. In order to address this informational need, the investigators developed a web application, Constitute [online at http://www.constituteproject.org], with the use of semantic technologies. Constitute provides searchable access to the world’s constitutions using the conceptualization, texts, and data developed by the Comparative Constitutions Project. An OWL ontology represents 330 ‘‘topics’’ – e.g. right to health – with which the investigators have tagged relevant provisions of nearly all constitutions in force as of September of 2013. The tagged texts were then converted to an RDF representation using R2RML mappings and Capsenta’s Ultrawrap. The portal implements semantic search features to allow constitutional drafters to read, search, and compare the world’s constitutions. The goal of the project is to improve the efficiency and systemization of constitutional design and, thus, to support the independence and self-reliance of constitutional drafters.Governmen

Texas ScholarWorks

knn-seq: Efficient, Extensible kNN-MT Framework

Authors: Deguchi Hiroyuki; Hirano Hayate; Hoshino Tomoki; Nishida Yuto; Vasselli Justin; Watanabe Taro
Publication date: 18/10/2023

k-nearest-neighbor machine translation (kNN-MT) boosts the translation quality of a pre-trained neural machine translation (NMT) model by utilizing translation examples during decoding. Translation examples are stored in a vector database, called a datastore, which contains one entry for each target token from the parallel data it is made from. Due to its size, it is computationally expensive both to construct and to retrieve examples from the datastore. In this paper, we present an efficient and extensible kNN-MT framework, knn-seq, for researchers and developers that is carefully designed to run efficiently, even with a billion-scale large datastore. knn-seq is developed as a plug-in on fairseq and easy to switch models and kNN indexes. Experimental results show that our implemented kNN-MT achieves a comparable gain to the original kNN-MT, and the billion-scale datastore construction took 2.21 hours in the WMT'19 German-to-English translation task. We publish our knn-seq as an MIT-licensed open-source project and the code is available on https://github.com/naist-nlp/knn-seq . The demo video is available on https://youtu.be/zTDzEOq80m0
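The retrieval step described in this abstract can be sketched in a few lines. This is a toy illustration, not the knn-seq API. It assumes the commonly described kNN-MT formulation: datastore keys are decoder hidden states, values are target token ids, neighbour distances are turned into a distribution with a temperature softmax, and the result is interpolated with the NMT distribution. The names `knn_probs`, `interpolate`, `lam` and `temperature` are hypothetical, and the brute-force search stands in for the approximate nearest-neighbour index a billion-scale datastore would need.

```python
import math


def knn_probs(query, datastore, k, temperature, vocab_size):
    """Turn the k nearest datastore entries into a distribution over target tokens."""
    # Squared L2 distance from the query to every key (brute force for clarity).
    nearest = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    )[:k]
    # Each neighbour votes for its token with weight exp(-distance / temperature).
    weights = [0.0] * vocab_size
    for dist, token in nearest:
        weights[token] += math.exp(-dist / temperature)
    total = sum(weights)
    return [w / total for w in weights]


def interpolate(p_nmt, p_knn, lam):
    """Mix the NMT and kNN distributions: lam * p_knn + (1 - lam) * p_nmt."""
    return [lam * pk + (1.0 - lam) * pm for pm, pk in zip(p_nmt, p_knn)]


# Toy datastore: one (hidden-state key, target-token id) entry per target token.
datastore = [
    ([0.0, 0.0], 1),
    ([0.1, 0.0], 1),
    ([5.0, 5.0], 2),
]

p_nmt = [0.2, 0.3, 0.5]  # hypothetical NMT output over a 3-token vocabulary
p_knn = knn_probs([0.0, 0.1], datastore, k=2, temperature=1.0, vocab_size=3)
p_final = interpolate(p_nmt, p_knn, lam=0.5)

print(p_final)
```

In this toy case the two nearest entries both carry token 1, so the retrieved examples shift the final prediction from token 2 (the NMT model's preference) to token 1.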

arXiv.org e-Print Archive