
    Theory and Practice of Data Citation

    Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as of directing investments in science. Science is increasingly "data-intensive": large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated datasets. Yet, given a dataset, there is no quantitative, consistent, and established way of knowing how it has been used over time, who contributed to its curation, what results it has yielded, or what value it has. Developing a theory and practice of data citation is fundamental for treating data as first-class research objects with the same relevance and centrality as traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation. The current landscape is multifaceted, and an overall view that brings together the diverse aspects of this topic is still missing. This paper therefore describes the lay of the land for data citation from both the theoretical (the why and what) and the practical (the how) angle. Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association for Information Science and Technology (JASIST), 201

    Data citation practices in the CRAWDAD wireless network data archive

    CRAWDAD (Community Resource for Archiving Wireless Data At Dartmouth) is a popular research data archive for wireless network data, archiving over 100 datasets used by over 6,500 users. In this paper we examine citation behaviour in 1,281 papers that use CRAWDAD datasets. We find that, in general, paper authors cite datasets in a manner that is sufficient both to give credit to the dataset authors and to provide access to the datasets that were used. Only 11.5% of papers did not do so; common problems included (1) citing the canonical papers rather than the dataset, (2) describing the dataset using unclear identifiers, and (3) not providing URLs or pointers to the datasets. We are thankful for the generous support of our current funders ACM SIGCOMM and ACM SIGMOBILE, and our past funders Aruba Networks, Intel and the National Science Foundation.

    Scientific data citation : scoping review

    Objective: To follow the evolution of studies on scientific data, this paper investigates the meaning of citations to data, seeking to answer: 1) What motivates researchers to cite scientific data? 2) What data citation practices are reported in the fields covered by this study? 3) What metric analyses exist for data citation? Methods: This is qualitative, descriptive research in the form of a scoping review of the literature, searching the Emerald, LISA, LISTA, Scopus, and Web of Science databases. Results: Regarding motivation, studies examined whether citing the underlying data correlates with increased citations to the traditional publications; many confirmed the correlation, others did not, and a common-cause hypothesis also emerged: research quality associated with greater resources. As for practices, the community is aware that current citations to data are not standardized, and there is a trend toward adopting a citation standard that meets the demands of different types of data. This lack of a standard hinders the metric analysis of citations to scientific data, which still needs to be explored in research, given that the same techniques used for traditional citation are simply being reapplied to this new source of information. Conclusions: Advancing science is the main advantage of making data available, but there are technical and credit-attribution difficulties that must be tackled jointly by researchers, institutions, funding agencies, data repositories, and publishers' editorial teams.

    WHO, WHAT, WHEN, WHERE, AND WHY? QUANTIFYING AND UNDERSTANDING BIOMEDICAL DATA REUSE

    Since the mid-2000s, new data-sharing mandates have led to an increase in the amount of research data available for reuse. Reuse of data benefits the scientific community and the public by potentially speeding scientific discovery and increasing the return on investment of publicly funded research. However, despite the potential benefits of reuse and the increasing availability of data, research on the impact of data reuse remains sparse. This dissertation provides a deeper understanding of the impacts of shared biomedical research data by exploring who is reusing data and for what purpose. Specifically, it examines use requests and dataset descriptions from three biomedical repositories that require potential requestors to submit descriptions of their planned reuse. Content analysis of use requests yields insight into who is requesting data and the methods and topics of their planned reuse. Comparing use requests to the descriptions of the original datasets provides insight into the breadth of impact of data reuse, and text mining of the original dataset descriptions helps determine the topics of datasets that are highly reused. This study demonstrates that patterns of reuse differ between dataset types: genomic datasets are used more frequently together in meta-analyses on topics that diverge from the original purpose of collection, while clinical datasets are used more often on their own, in a context similar to the reason for which they were collected. While requestors come from a range of career stages and from around the world, they are not evenly distributed; most requests come from English-speaking countries, especially the United States. This study also finds that datasets that receive the most requests soon after release continue to be more requested, and that datasets covering common diseases are requested more than datasets on rare diseases.
These findings have implications for several stakeholders, including funders and institutions developing policies to reward and incentivize data sharing, researchers who share data and those who reuse it, and repositories and data curators who must make choices about which datasets to curate and preserve

    CRAWDAD wireless network data citation bibliography

    This BibTeX file contains the corpus of papers that cite CRAWDAD wireless network datasets, as used in the paper:

    Tristan Henderson and David Kotz. Data citation practices in the CRAWDAD wireless network data archive. Proceedings of the Second Workshop on Linking and Contextualizing Publications and Datasets, London, UK, September 2014.

    Most of the fields are standard BibTeX fields. Two require further explanation.

    "citations": this field contains the citations for a paper as counted by Google Scholar as of 24 September 2014.

    "keywords": this field contains a set of tags indicating data citation practice. These are as follows:
    - "uses_crawdad_data": this paper uses a CRAWDAD dataset
    - "cites_insufficiently": this paper does not meet our sufficiency criteria
    - "cites_by_description": this paper cites a dataset by description rather than by dataset identifier
    - "cites_canonical_paper": this paper cites the original ("canonical") paper that collected a dataset, rather than pointing to the dataset
    - "cites_by_name": this paper cites a dataset by a colloquial name rather than by dataset identifier
    - "cites_crawdad_url": this paper cites the main CRAWDAD URL rather than a particular dataset
    - "cites_without_url": this paper does not provide a URL for dataset access
    - "cites_wrong_attribution": this paper attributes a dataset to CRAWDAD, Dartmouth, etc., rather than to the dataset authors
    - "cites_vaguely": this paper cites the used datasets (if any) too vaguely to be sufficient

    If you have any questions about the data, please contact us at [email protected]
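    As an illustration of how the "keywords" tags might be consumed, here is a minimal Python sketch that tallies citation-practice tags across BibTeX entries. It is a hypothetical example, not the authors' tooling, and rests on stated assumptions: tags inside each keywords field are comma-separated within braces with no nested braces, the helper names tally_practices and share_insufficient are invented for illustration, and the two sample entries are made up rather than taken from the real corpus. A file with nested braces or quoted values would call for a full BibTeX parser instead of a regular expression.

```python
import re
from collections import Counter

# Tag names as listed in the dataset description.
KNOWN_TAGS = {
    "uses_crawdad_data",
    "cites_insufficiently",
    "cites_by_description",
    "cites_canonical_paper",
    "cites_by_name",
    "cites_crawdad_url",
    "cites_without_url",
    "cites_wrong_attribution",
    "cites_vaguely",
}

# Assumption: each entry has a field like  keywords = {tag1, tag2}
# with comma-separated tags inside braces and no nested braces.
KEYWORDS_RE = re.compile(r"keywords\s*=\s*\{([^{}]*)\}", re.IGNORECASE)


def tally_practices(bibtex_text: str) -> Counter:
    """Count how many entries carry each known citation-practice tag."""
    counts = Counter()
    for match in KEYWORDS_RE.finditer(bibtex_text):
        # Deduplicate tags within one entry, ignore unknown ones.
        tags = {t.strip() for t in match.group(1).split(",") if t.strip()}
        counts.update(tags & KNOWN_TAGS)
    return counts


def share_insufficient(counts: Counter) -> float:
    """Fraction of dataset-using entries tagged cites_insufficiently."""
    users = counts["uses_crawdad_data"]
    return counts["cites_insufficiently"] / users if users else 0.0


# Hypothetical two-entry sample, invented for illustration only.
SAMPLE = """
@inproceedings{example1,
  title = {Example paper A},
  citations = {12},
  keywords = {uses_crawdad_data, cites_canonical_paper, cites_insufficiently}
}
@article{example2,
  title = {Example paper B},
  citations = {3},
  keywords = {uses_crawdad_data}
}
"""

if __name__ == "__main__":
    counts = tally_practices(SAMPLE)
    print(counts["uses_crawdad_data"])  # 2
    print(share_insufficient(counts))   # 0.5
```

    Using "uses_crawdad_data" as the denominator is itself an assumption; if every entry in the real file carries that tag, the resulting ratio would be directly comparable to the share of insufficient citations reported for the paper.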