166 research outputs found

    Archiving social survey data in Africa : an overview of African microdata curation and the role of survey data archives in data management in Africa

    Includes bibliographical references (p. 124-157). This study examines current practice in the curation of social survey data in African countries and makes suggestions for future improvements in this regard. Curation of data refers to its preservation and management for reuse. Utilising survey data for the study of social phenomena other than those for which the original survey was initiated is a relatively new research approach in Africa. Thus best practice for this type of research is still being put in place by African organisations. This involves the development of optimal means of processing and storing the data for re-use. Of concern to this study is what constitutes the most effective way of managing and sharing the information garnered from these surveys as a resource for economic and social development in Africa. Social survey data refers to both the statistical information which is the final product of censuses or sample surveys, and the documentation provided with the data to facilitate its reuse. Documentation includes technical notes and questionnaires used in the survey process, as well as metadata (detailed information about the data) and reports produced concerning the final survey findings. The research looks at the history of the management of social survey data worldwide and in African countries, and the policies and processes involved in curating survey information in these countries. The comparative component of the study examines developments in this field internationally and compares these to practices on the African continent. International best practice in the field has been used to evaluate current methods of survey data archiving in African countries. The study presents strategies to ensure the optimal preservation and effective sharing of survey data among countries of the region.
Strategies for the establishment of a Pan African network of data sharing organisations are suggested to support future repurposing of African census and survey data.

    Committing to Data Quality Review


    A Natural Language Processing Pipeline for Detecting Informal Data References in Academic Literature

    Discovering authoritative links between publications and the datasets that they use can be a labor-intensive process. We introduce a natural language processing pipeline that retrieves and reviews publications for informal references to research datasets, which complements the work of data librarians. We first describe the components of the pipeline and then apply it to expand an authoritative bibliography linking thousands of social science studies to the data-related publications in which they are used. The pipeline increases recall for literature to review for inclusion in data-related collections of publications and makes it possible to detect informal data references at scale. We contribute (1) a novel Named Entity Recognition (NER) model that reliably detects informal data references and (2) a dataset connecting items from social science literature with datasets they reference. Together, these contributions enable future work on data reference, data citation networks, and data reuse. Comment: 13 pages, 7 figures, 3 tables.

    Managing Big Data Issues Within a Research Data Repository: Dealing With the 21st Century Data Explosion

    Increasingly, organizations in both the public and private sectors that fund the collection of research information expect, and in many cases mandate, the public sharing of these data as a condition of support. While this represents a positive expression of the idea that information is a public good, it has also resulted in a veritable flood of new studies, surveys and administrative records entering the public domain: the emergence of the Big Data model in the secondary analysis of research information. Many of these data are managed by data repositories that ingest, process, clean and enhance these files so that they are accurate and consistent and introduce as little error as possible into the analysis stream of information. At the same time, this huge influx of Big Data resources has raised expectations for the rapid release of data, which complicates the thorough review and cleaning of files before distribution. This paper reviews emerging approaches within a research data repository that seek to maintain high quality control while managing Big Data streams as they enter the system. Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/139597/1/Managing Big Data Issues Within a Research Data Repository McNally Paper Submission.pdf

    Detecting Informal Data References in Academic Literature

    The Inter-university Consortium for Political and Social Research (ICPSR) is developing a machine learning approach using natural language processing (NLP) to assist in the detection of informal data references. Formal data citations that reference unique identifiers are readily discoverable; however, informal references indicating research data reuse are challenging to infer and detect. We contribute a model that uses a combination of cues, such as the presence of indicator terms and syntactical patterns, to assign a likelihood score to dataset mentions and extract candidate data citations from academic text. In production, the model will support the evaluation of candidate documents for ingest into the ICPSR Bibliography of Data-related Literature. This work supports a larger effort to measure the impact of research data. http://deepblue.lib.umich.edu/bitstream/2027.42/168392/1/Detecting_Informal_Data_Refs.pdf
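
    The cue-combination idea in this abstract can be illustrated with a small Python sketch. This is not ICPSR's model: the indicator terms, the regular expressions, the weights and the threshold below are all hypothetical values chosen for illustration, and a logistic function stands in for whatever scoring the actual model uses.

```python
import math
import re

# Hypothetical indicator terms; the abstract does not list the real ones.
INDICATOR_TERMS = {"data", "dataset", "survey", "study", "sample"}

# Hypothetical syntactic cue: a verb phrase that signals data use.
USE_PATTERN = re.compile(
    r"\b(?:use[sd]?|using|analy[sz]e[sd]?|drawn from|obtained from)\b",
    re.IGNORECASE,
)

# Hypothetical name cue: one to five capitalized words ending in a dataset-like noun.
NAME_PATTERN = re.compile(r"\b(?:[A-Z][A-Za-z]+\s+){1,5}(?:Survey|Study|Census|Panel)\b")

# Illustrative weights, set by hand rather than learned from data.
BIAS, W_TERM, W_USE, W_NAME = -3.0, 0.8, 1.5, 2.0


def score_sentence(sentence: str) -> float:
    """Combine the cues into a likelihood-style score between 0 and 1."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    term_hits = min(sum(1 for t in tokens if t in INDICATOR_TERMS), 3)
    has_use = 1 if USE_PATTERN.search(sentence) else 0
    has_name = 1 if NAME_PATTERN.search(sentence) else 0
    z = BIAS + W_TERM * term_hits + W_USE * has_use + W_NAME * has_name
    return 1.0 / (1.0 + math.exp(-z))


def extract_candidates(sentence: str, threshold: float = 0.5) -> list[str]:
    """Return dataset-like names only when the sentence scores above the threshold."""
    if score_sentence(sentence) < threshold:
        return []
    return [m.group(0) for m in NAME_PATTERN.finditer(sentence)]


if __name__ == "__main__":
    for text in (
        "We used the General Social Survey to estimate trust.",
        "The weather was pleasant during the conference.",
    ):
        print(round(score_sentence(text), 3), extract_candidates(text))
```

    In a real system the weights would be learned from labeled examples and the cues would be richer. The sketch only shows how several weak signals can be combined into a single score that ranks candidate mentions for review.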

    Qualitative data sharing practices in social sciences

    Social scientists have been sharing data for a long time. Sharing qualitative data, however, has not become a common practice, despite the context of e-Research, information growth, and funding agencies’ mandates on research data archiving and sharing. Since most systematic and comprehensive studies are based on quantitative data practices, little is known about how social scientists share their qualitative data. This dissertation study aims to fill this void. By synergizing the theory of Knowledge Infrastructure (KI) and the Theory of Remote Scientific Collaboration (TORSC), this dissertation study develops a series of instruments to investigate data-sharing practices in social sciences. Five sub-studies (two preliminary studies and three case studies) are conducted to gather information from different stakeholder groups in social sciences, including early career social scientists, social scientists who have deposited qualitative data at research data repositories, and eight information professionals at the world’s largest social science data repository, ICPSR. The sub-studies are triangulated using four dimensions: data characteristics, individual, technological, and organizational aspects. The results confirm the inactive data sharing practices in social sciences: the majority of faculty and students do not share data or are unaware of data sharing. Additional findings regarding social scientists’ qualitative data-sharing behaviors include: 1) those who have shared qualitative data in data repositories are more likely to share research tools than their raw data; and 2) the perceived technical supports and extrinsic motivations are both strong predictors for qualitative data sharing. These findings also confirm that preparing qualitative data sharing packages is time- and labor-consuming, because both researchers and data repositories need to spend extra effort to prevent sensitive data from disclosure. 
This dissertation makes contributions in three key aspects: 1) descriptive facts regarding current data-sharing practices in social sciences based on empirical data collection, 2) an in-depth analysis of determinants leading to qualitative data sharing, and 3) managerial recommendations for different stakeholders in developing a sustainable data-sharing environment in social sciences and beyond.

    Research Libraries and Research Data Management within the Humanities and Social Sciences

    Research Data Management (RDM) is a process that is designed to deliver high quality datasets, which comply with scholarly, legal and ethical requirements. There are two outputs of the RDM process: (1) long-term preservation of datasets through archiving, and (2) sharing and reuse of datasets for further research and other purposes in society at large. This proposal outlines the creation of a coherent Research Data Management organization at Lund University that utilizes existing resources both within and outside the university and establishes new organizational units and information systems specific to this new task. We propose the establishment of a new unit for Research Data Management and Coordination at the University Library whose responsibility would be to coordinate the network of existing agents who support research activities, such as faculty libraries and ethical, legal, archival and data management experts. We further propose the creation of a new information system, the Lund University Dataset Directory, which will facilitate management of datasets and information retrieval throughout the data lifecycle. We expect that research datasets could be deposited for sharing at national or disciplinary repositories and eventually archived when a solution is in place at the University Archive. Advanced RDM, such as semantic web technologies, will require online data services not currently provided by national agents. We therefore propose a Data Laboratory within the RDM network at Lund University. Finally, it's important to recognize that Research Data Management is a new way of organizing information with its own set of tasks for the library organization. Our efforts in RDM will require us to invest significant effort in learning new systems, ways of working and collaboration.

    Research Data Management Practices And Impacts on Long-term Data Sustainability: An Institutional Exploration

    With the 'data deluge' leading to an institutionalized research environment for data management, U.S. academic faculty have increasingly faced pressure to deposit research data into open online data repositories, which, in turn, is engendering a new set of practices to adapt formal mandates to local circumstances. When these practices involve reorganizing workflows to align the goals of local and institutional stakeholders, we might call them 'data articulations.' This dissertation uses interviews to establish a grounded understanding of the data articulations behind deposit in three studies: (1) a phenomenological study of genomics faculty data management practices; (2) a grounded theory study developing a theory of data deposit as articulation work in genomics; and (3) a comparative case study of genomics and social science researchers to identify factors associated with the institutionalization of research data management (RDM). The findings of this research offer an in-depth understanding of the data management and deposit practices of academic research faculty, and surface institutional factors associated with data deposit. Additionally, the studies led to a theoretical framework of data deposit to open research data repositories. The empirical insights into the impacts of institutionalization of RDM and data deposit on long-term data sustainability update our knowledge of the impacts of increasing guidelines for RDM. The work also contributes to the body of data management literature through the development of the data articulation framework, which can be applied and further validated by future work. In terms of practice, the studies offer recommendations for data policymakers, data repositories, and researchers on defining strategies and initiatives to leverage data reuse and employ computational approaches to support data management and deposit.

    Disaster Planning and Trustworthy Digital Repositories

    Master's Thesis. The goal of this study is to understand whether digital repositories that have a preservation mandate are engaging in disaster planning, particularly in relation to their pursuit of trusted digital repository status. For those that are engaging in disaster planning, the study examines the creation of formal disaster response and recovery plans, finding that in most cases the process of going through an audit for certification as a trusted repository provides the impetus for the creation of formalized disaster planning documentation. This paper also discusses obstacles that repositories encounter and finds that most repositories struggle with making their documentation available. https://deepblue.lib.umich.edu/bitstream/2027.42/137664/1/Frank_MSI_Thesis_DeepBlue.pdf

    Revolutionary or evolutionary? Making research data management manageable

    This chapter investigates the role of academic librarians, particularly those at small liberal arts institutions, in providing research data management services. Research data management may not seem like an obvious fit for curricular libraries whose primary mission is supporting teaching rather than faculty research, nor is data curation an obvious need for schools without a data repository or staff who specialize in the preservation and dissemination of data. Yet numerous reports cite data management and data services as critical services for the future of academic libraries (ACRL Planning and Review Committee, 2013; Johnson, 2014; Cox, 2013; Tenopir, 2012). The question raised, then, is how and why are data management services important in the liberal arts context? What can librarians at these institutions do to develop expertise in this growing area of the profession? What services are college and university libraries beginning to provide, and how successfully can existing models be adapted to other institutions? Does the addition of data services transform the mission of liberal arts libraries, and if so, is that transition revolutionary or evolutionary? Liberal arts librarians, as they have with numerous other shifts and trends in librarianship, can turn to models in the literature from research universities, develop communities of practice amongst themselves, and also innovate from within their own unique contexts. The authors argue that such collaboration and innovation reflect an evolutionary process as librarians build on existing skills, strategies, workflows, and knowledge. The following pages of this chapter survey the current environment, offer case studies from two small liberal arts institutions, the College of Saint Benedict/Saint John’s University and Carleton College, and provide readers with recommended action steps to develop a path of gradual, manageable, shared, and sustainable work in research data management.