
    An open database of productivity in Vietnam's social sciences and humanities for public use

    This study describes an open database of the scientific output of Vietnamese researchers in the social sciences and humanities, one that corrects for shortcomings of current research publication databases such as data duplication, slow updates, and the substantial cost of doing science. Using scientists' self-reports, open online sources, and cross-checking against the Scopus database, we introduce a manual system, and its semi-automated version, for a database of the profiles of 657 Vietnamese researchers in the social sciences and humanities who published in Scopus-indexed journals from 2008 to 2018. The final system also records 973 foreign co-authors, 1,289 papers, and 789 affiliations. The data collection method, readily applicable to other sources, could be replicated in other developing countries, while the database's content can be used in cross-sectional, multivariate, and network analyses. The open database is expected to help Vietnam revamp its research capacity and meet the public demand for greater transparency in science management.
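    The abstract names the database's core entities (researchers, foreign co-authors, papers, affiliations) without giving a schema. Below is a minimal sketch of how such profile records could be represented; all class and field names are hypothetical illustrations, not the authors' actual data model.

```python
# A minimal sketch (not the authors' schema) of one researcher profile and its
# linked records: papers, co-authors, and affiliations. Field names are
# hypothetical illustrations of the entities the abstract describes.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Affiliation:
    name: str
    country: str = "Vietnam"

@dataclass
class Paper:
    title: str
    year: int                       # 2008-2018 in the described dataset
    scopus_id: str                  # used for cross-checking against Scopus
    coauthor_names: List[str] = field(default_factory=list)

@dataclass
class ResearcherProfile:
    full_name: str
    discipline: str                 # e.g. "economics", "linguistics"
    affiliations: List[Affiliation] = field(default_factory=list)
    papers: List[Paper] = field(default_factory=list)

# Example: a profile assembled from a self-report and verified against Scopus.
profile = ResearcherProfile(
    full_name="Nguyen Van A",
    discipline="economics",
    affiliations=[Affiliation(name="Example University, Hanoi")],
    papers=[Paper(title="An example study", year=2017, scopus_id="2-s2.0-000000")],
)
print(profile.full_name, len(profile.papers))
```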

    Rethinking databases: the fib project for a connected data repository

    The need for reliable test data to verify scientific theories, together with the necessity to calibrate code-oriented expressions and their level of safety, has pushed researchers to develop test databases. Such databases collect the work of multiple independent scientific initiatives, compiling experimental evidence on similar test configurations but with different parameters such as geometry, size of the component tested, boundary conditions, and/or mechanical properties of the materials. Such databases, usually collected for specific purposes, have traditionally faced maintenance and, especially, consistency issues. They have also given rise to conflicting values due to different interpretations of the same data entries in databases elaborated by different researchers. In addition, such databases are frequently designed and configured to serve one purpose, although their data could be used in multiple ways. Recent developments in machine learning and artificial intelligence have opened the door to exploiting historically collected test results through data mining techniques. In an effort to improve the current situation concerning databases, the fib has launched a wide and open initiative on test data management. It consists of a common data structure, sufficiently flexible to serve multiple purposes and host very different test setups, and meant to be used for cross-analyses of data. This allows data originally collected for one purpose to be reused to analyze other aspects, enhancing the value of current experimental work. A consistent management structure is also defined, taking advantage of the fib network, with database editors, collectors, and users, allowing for a transparent and documented peer-review process when incorporating new test data. This paper presents the main principles grounding the test data management as well as details on the first two completed applications, referring to fiber-reinforced concrete (material data) and to punching of slab-column connections (structural response), showcasing the framework's versatility and universality.
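    As an illustration of what a common, flexible data structure for heterogeneous test records might look like, here is a minimal sketch in which test parameters and measured responses are stored as key-value pairs rather than fixed columns. The field names and the review-status workflow are assumptions made for illustration, not the fib schema.

```python
# A minimal sketch of a generic test record whose parameters and results are
# key-value pairs, so very different test setups fit the same structure.
# All names below are hypothetical, not the fib data model.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class TestRecord:
    test_id: str
    category: str                      # e.g. "FRC material" or "punching of slab-column connection"
    source: str                        # reference to the original publication
    parameters: Dict[str, float] = field(default_factory=dict)   # geometry, size, material properties...
    results: Dict[str, float] = field(default_factory=dict)      # measured responses
    review_status: str = "submitted"   # assumed workflow: submitted -> reviewed -> published

def cross_analysis(records: List[TestRecord], parameter: str, result: str) -> List[Tuple[float, float]]:
    """Reuse data collected for one purpose to study another parameter-result relation."""
    return [(r.parameters[parameter], r.results[result])
            for r in records
            if parameter in r.parameters and result in r.results]

# Example: relate slab effective depth to punching capacity across collected tests.
records = [
    TestRecord("P-001", "punching of slab-column connection", "Doe et al. (1995)",
               parameters={"effective_depth_mm": 200.0}, results={"failure_load_kN": 450.0}),
]
print(cross_analysis(records, "effective_depth_mm", "failure_load_kN"))
```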

    DataHub: Collaborative Data Science & Dataset Version Management at Scale

    Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference, and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale.
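    To make the git-inspired operations concrete, the following is a minimal in-memory sketch of dataset versioning with create, branch, commit, and diff. It illustrates the idea only; the class and method names are hypothetical and do not reflect the DataHub API.

```python
# A minimal in-memory sketch of git-style dataset versioning. Version ids are
# content hashes; branches are named pointers to versions; diff compares rows.
import hashlib
import json

class DatasetVersionStore:
    def __init__(self):
        self.versions = {}            # version_id -> (parent_id, rows)
        self.branches = {}            # branch name -> version_id

    def _vid(self, rows, parent):
        blob = json.dumps({"rows": rows, "parent": parent}, sort_keys=True)
        return hashlib.sha1(blob.encode()).hexdigest()[:10]

    def create(self, branch, rows):
        vid = self._vid(rows, None)
        self.versions[vid] = (None, list(rows))
        self.branches[branch] = vid
        return vid

    def commit(self, branch, rows):
        parent = self.branches[branch]
        vid = self._vid(rows, parent)
        self.versions[vid] = (parent, list(rows))
        self.branches[branch] = vid
        return vid

    def branch(self, new_branch, from_branch):
        self.branches[new_branch] = self.branches[from_branch]

    def diff(self, vid_a, vid_b):
        a = set(map(tuple, self.versions[vid_a][1]))
        b = set(map(tuple, self.versions[vid_b][1]))
        return {"added": b - a, "removed": a - b}

# Example: branch a dataset, modify it on the branch, inspect the row-level difference.
store = DatasetVersionStore()
v1 = store.create("main", [("alice", 1), ("bob", 2)])
store.branch("experiment", "main")
v2 = store.commit("experiment", [("alice", 1), ("bob", 2), ("carol", 3)])
print(store.diff(v1, v2))
```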

    Theory and Practice of Data Citation

    Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as directing investments in science. Science is increasingly becoming "data-intensive", where large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated datasets. Yet, given a dataset, there is no quantitative, consistent, and established way of knowing how it has been used over time, who contributed to its curation, what results it has yielded, or what value it has. The development of a theory and practice of data citation is fundamental for treating data as first-class research objects with the same relevance and centrality as traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining the principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation. The current panorama is many-faceted, and an overall view that brings together the diverse aspects of this topic is still missing. Therefore, this paper aims to describe the lay of the land for data citation, from both the theoretical (the why and what) and the practical (the how) angles.

    Making visible the invisible through the analysis of acknowledgements in the humanities

    Purpose: Science is subject to a normative structure that includes how the contributions and interactions between scientists are rewarded. Authorship and citations have been the key elements within the reward system of science, whereas acknowledgements, despite being a well-established element in scholarly communication, have not received the same attention. This paper aims to highlight the bearing of acknowledgements in the humanities, bringing to the foreground contributions and interactions that would otherwise remain invisible to traditional indicators of research performance. Design/methodology/approach: The study provides a comprehensive framework for understanding acknowledgements as part of the reward system, with a special focus on their value in the humanities as a reflection of intellectual indebtedness. The distinctive features of research in the humanities are outlined, and the role of acknowledgements as a source of contributorship information is reviewed to support these assumptions. Findings: Peer interactive communication is the prevailing form of support thanked in the acknowledgements of humanities papers, so the notion of acknowledgements as super-citations can make special sense in this area. Since single-authored papers still predominate as the publishing pattern in this domain, the study of acknowledgements might help to understand social interactions and intellectual influences that lie behind a piece of research and are not visible through authorship. Originality/value: Previous works have proposed and explored the prevailing acknowledgement types by domain. This paper focuses on the humanities to show the role of acknowledgements within the reward system and to highlight the publication patterns and inherent research features that make acknowledgements particularly interesting in this area as a reflection of the socio-cognitive structure of research.

    A Survey on Array Storage, Query Languages, and Systems

    Since scientific investigation is one of the most important providers of massive amounts of ordered data, there is renewed interest in array data processing in the context of Big Data. To the best of our knowledge, a unified resource that summarizes and analyzes array processing research over its long existence is currently missing. In this survey, we provide a guide for past, present, and future research in array processing. The survey is organized along three main topics. Array storage discusses all the aspects related to partitioning arrays into chunks. The identification of a reduced set of array operators to form the foundation of an array query language is analyzed across multiple such proposals. Lastly, we survey real systems for array processing. The result is a thorough survey of array data storage and processing that should be consulted by anyone interested in this research topic, independent of experience level. The survey is not complete, though. We greatly appreciate pointers to any work we might have forgotten to mention.
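    To illustrate the chunk-based storage model covered by the survey's first topic, here is a minimal sketch that partitions a dense 2-D array into fixed-size chunks and answers a range query by touching only the overlapping chunks. The chunking layout and function names are illustrative assumptions, not any particular system's format.

```python
# A minimal sketch of chunked array storage: partition a 2-D array into
# fixed-size chunks, then reassemble only the chunks that overlap a query.
import numpy as np

def to_chunks(array, chunk_shape):
    """Partition a 2-D array into a dict keyed by chunk coordinates."""
    chunks = {}
    rows, cols = array.shape
    cr, cc = chunk_shape
    for i in range(0, rows, cr):
        for j in range(0, cols, cc):
            chunks[(i // cr, j // cc)] = array[i:i + cr, j:j + cc]
    return chunks

def range_query(chunks, chunk_shape, row_slice, col_slice):
    """Answer a rectangular range query using only the chunks that overlap it."""
    out = np.empty((row_slice.stop - row_slice.start,
                    col_slice.stop - col_slice.start))
    cr, cc = chunk_shape
    for (bi, bj), block in chunks.items():
        r0, c0 = bi * cr, bj * cc
        # skip chunks that do not overlap the requested region
        if r0 >= row_slice.stop or c0 >= col_slice.stop:
            continue
        if r0 + block.shape[0] <= row_slice.start or c0 + block.shape[1] <= col_slice.start:
            continue
        # overlapping window in global coordinates
        gr0, gr1 = max(r0, row_slice.start), min(r0 + block.shape[0], row_slice.stop)
        gc0, gc1 = max(c0, col_slice.start), min(c0 + block.shape[1], col_slice.stop)
        out[gr0 - row_slice.start:gr1 - row_slice.start,
            gc0 - col_slice.start:gc1 - col_slice.start] = \
            block[gr0 - r0:gr1 - r0, gc0 - c0:gc1 - c0]
    return out

# Example: an 8x8 array stored as 4x4 chunks, queried for rows 2..4 and columns 1..5.
a = np.arange(64).reshape(8, 8)
chunks = to_chunks(a, (4, 4))
region = range_query(chunks, (4, 4), slice(2, 5), slice(1, 6))
assert np.array_equal(region, a[2:5, 1:6])
print(region)
```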