242,292 research outputs found
An open database of productivity in Vietnam's social sciences and humanities for public use
This study describes an open database of the scientific output of Vietnamese researchers in the social sciences and humanities, one that corrects for shortcomings of current research publication databases such as data duplication, slow updates, and the substantial cost of doing science. Using scientists' self-reports, open online sources, and cross-checking against the Scopus database, we introduce a manual system, and its semi-automated version, covering the profiles of 657 Vietnamese researchers in the social sciences and humanities who published in Scopus-indexed journals from 2008 to 2018. The final system also records 973 foreign co-authors, 1,289 papers, and 789 affiliations. The data collection method, highly applicable to other sources, could be replicated in other developing countries, while the content can be used in cross-sectional, multivariate, and network data analyses. The open database is expected to help Vietnam revamp its research capacity and meet the public demand for greater transparency in science management.
Rethinking databases: the fib project for a connected data repository
The need for reliable test data to verify scientific theories, together with the necessity to calibrate code-oriented expressions and their level of safety, has pushed researchers to develop test databases. Such databases collect the work of multiple independent scientific initiatives, compiling experimental evidence on similar test configurations but with different parameters such as geometry, size of the component tested, boundary conditions, and/or mechanical properties of the materials. Such databases, usually collected for specific purposes, have traditionally faced maintenance and, especially, consistency issues. They have also given rise to conflicting values due to different interpretations of the same data entries in databases elaborated by different researchers. In addition, such databases are frequently designed and configured to serve one purpose, although their data could be used in multiple manners. Recent developments in machine learning and artificial intelligence have opened the door to exploiting historically collected test results with data mining techniques. In an effort to improve the current situation concerning databases, the fib has launched a wide and open initiative on test data management. It consists of a common data structure, sufficiently flexible to serve multiple purposes, hosting very different test setups, and meant to be used for cross-analyses of data. This allows reusing data originally collected for one purpose to analyze other aspects, enhancing the value of current experimental work. A consistent management structure is also defined, taking advantage of the fib network, with database editors, collectors, and users, allowing for a transparent and documented peer-review process when incorporating new test data.
This paper presents the main basis grounding the test data management, as well as details on the first two completed applications, covering fiber-reinforced concrete (material data) and punching of slab-column connections (structural response), showcasing the framework's versatility and universality.
DataHub: Collaborative Data Science & Dataset Version Management at Scale
Relational databases have limited support for data collaboration, where teams
collaboratively curate and analyze large datasets. Inspired by software version
control systems like git, we propose (a) a dataset version control system,
giving users the ability to create, branch, merge, difference and search large,
divergent collections of datasets, and (b) a platform, DataHub, that gives
users the ability to perform collaborative data analysis building on this
version control system. We outline the challenges in providing dataset version
control at scale.
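The git-inspired operations the abstract names (create, branch, merge, difference) can be pictured with a toy in-memory model. This is a minimal sketch under stated assumptions, not DataHub's actual interface: the `DatasetRepo` class, its method names, and the snapshot-per-commit storage are all illustrative simplifications (real systems deduplicate storage via deltas or content addressing).

```python
class DatasetRepo:
    """Toy git-style version control over sets of records (illustrative only)."""

    def __init__(self):
        self.commits = {}               # commit id -> (parent id, frozenset of records)
        self.branches = {"main": None}  # branch name -> latest commit id
        self.next_id = 0

    def _snapshot(self, branch):
        # Dataset contents at the tip of a branch (empty before any commit).
        cid = self.branches[branch]
        return self.commits[cid][1] if cid is not None else frozenset()

    def commit(self, branch, records):
        # Record a new dataset version, linked to the branch's previous tip.
        cid = self.next_id
        self.next_id += 1
        self.commits[cid] = (self.branches[branch], frozenset(records))
        self.branches[branch] = cid
        return cid

    def branch(self, new_name, from_branch):
        # A new branch is just a pointer to the same commit as its source.
        self.branches[new_name] = self.branches[from_branch]

    def diff(self, b1, b2):
        # Records only in b1, and records only in b2.
        s1, s2 = self._snapshot(b1), self._snapshot(b2)
        return s1 - s2, s2 - s1

    def merge(self, target, source):
        # Naive union merge; real systems must also resolve conflicting edits.
        return self.commit(target, self._snapshot(target) | self._snapshot(source))


repo = DatasetRepo()
repo.commit("main", {("alice", 1), ("bob", 2)})
repo.branch("fix", "main")
repo.commit("fix", {("alice", 1), ("bob", 2), ("carol", 3)})
added, removed = repo.diff("fix", "main")
print(added)  # -> {('carol', 3)}
```

The snapshot-per-version design makes branching and diffing trivial but duplicates storage; the scale challenges the abstract mentions arise precisely because large datasets force more compact delta-based representations.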
Theory and Practice of Data Citation
Citations are the cornerstone of knowledge propagation and the primary means
of assessing the quality of research, as well as directing investments in
science. Science is increasingly becoming "data-intensive", where large volumes
of data are collected and analyzed to discover complex patterns through
simulations and experiments, and most scientific reference works have been
replaced by online curated datasets. Yet, given a dataset, there is no
quantitative, consistent and established way of knowing how it has been used
over time, who contributed to its curation, what results have been yielded or
what value it has.
The development of a theory and practice of data citation is fundamental for
considering data as first-class research objects with the same relevance and
centrality as traditional scientific products. Many works in recent years have
discussed data citation from different viewpoints: illustrating why data
citation is needed, defining the principles and outlining recommendations for
data citation systems, and providing computational methods for addressing
specific issues of data citation.
The current panorama is many-faceted and an overall view that brings together
diverse aspects of this topic is still missing. Therefore, this paper aims to
describe the lay of the land for data citation, both from the theoretical (the
why and what) and the practical (the how) angle. (Pre-print accepted in the Journal of the Association for Information Science and Technology, JASIST.)
Making visible the invisible through the analysis of acknowledgements in the humanities
Purpose: Science is subject to a normative structure that includes how the
contributions and interactions between scientists are rewarded. Authorship and
citations have been the key elements within the reward system of science,
whereas acknowledgements, despite being a well-established element in scholarly
communication, have not received the same attention. This paper aims to put
forward the bearing of acknowledgements in the humanities to bring to the
foreground contributions and interactions that, otherwise, would remain
invisible through traditional indicators of research performance.
Design/methodology/approach: The study provides a comprehensive framework for
understanding acknowledgements as part of the reward system, with a special
focus on their value in the humanities as a reflection of intellectual
indebtedness. The distinctive features of research in the humanities are
outlined, and the role of acknowledgements as a source of contributorship
information is reviewed to support these assumptions.
Findings: Peer interactive communication is the prevailing type of support
thanked in humanities acknowledgements, so the notion of acknowledgements as
super-citations makes special sense in this area. Since single-authored
papers still predominate as the publishing pattern in this domain, the study of
acknowledgements might help to understand social interactions and intellectual
influences that lie behind a piece of research and are not visible through
authorship.
Originality/value: Previous works have proposed and explored the prevailing
acknowledgement types by domain. This paper focuses on the humanities to show
the role of acknowledgements within the reward system and to highlight
publication patterns and inherent research features that make acknowledgements
particularly interesting in this area as a reflection of the socio-cognitive
structure of research.
A Survey on Array Storage, Query Languages, and Systems
Since scientific investigation is one of the most important providers of
massive amounts of ordered data, there is a renewed interest in array data
processing in the context of Big Data. To the best of our knowledge, a unified
resource that summarizes and analyzes array processing research over its long
existence is currently missing. In this survey, we provide a guide for past,
present, and future research in array processing. The survey is organized along
three main topics. Array storage discusses all the aspects related to array
partitioning into chunks. The identification of a reduced set of array
operators to form the foundation for an array query language is analyzed across
multiple such proposals. Lastly, we survey real systems for array processing.
The result is a thorough survey on array data storage and processing that
should be consulted by anyone interested in this research topic, independent of
experience level. The survey is not complete, though; we greatly appreciate
pointers to any work we may have forgotten to mention.
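The chunking idea at the core of the array-storage topic can be made concrete with a short sketch. This is a generic illustration of regular (aligned) chunking, not code from any surveyed system; the function names `chunk_of` and `chunks_overlapping` are hypothetical.

```python
from itertools import product

def chunk_of(coords, chunk_shape):
    """Map an N-dimensional cell coordinate to (chunk index, offset in chunk)."""
    chunk_idx = tuple(c // s for c, s in zip(coords, chunk_shape))
    offset = tuple(c % s for c, s in zip(coords, chunk_shape))
    return chunk_idx, offset

def chunks_overlapping(lo, hi, chunk_shape):
    """Enumerate chunk indices intersecting the half-open box [lo, hi)."""
    ranges = [range(l // s, (h - 1) // s + 1)
              for l, h, s in zip(lo, hi, chunk_shape)]
    return list(product(*ranges))

# A 100x100 array tiled into 10x10 chunks:
print(chunk_of((37, 58), (10, 10)))  # -> ((3, 5), (7, 8))
print(chunks_overlapping((8, 8), (12, 12), (10, 10)))
# -> [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The second function shows why chunk shape matters for query performance: a small range query straddling chunk boundaries touches four chunks here, so systems tune chunk shapes to the expected access patterns.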