68 research outputs found
GIS and Data: Making Space @ MIT: Development of the MIT Libraries GIS and Data Lab
The objectives of this presentation are to understand the MIT context for GIS (geographic information system) and RDM (research data management) services; follow the development of the GIS and Data Lab space; understand the space assessment goals and results; and explore future plans.
Keynote presentation at the National Network of Libraries of Medicine New England Region e-Science Forum, Marlborough, MA, USA, on March 29, 2019
From Data Citation to Scholarly Impact: Marking a Path and Clearing a Way for Access and Analysis
Starting from Mooney and Newton's work on data citation (http://dx.doi.org/10.7710/2162-3309.1035), we decided to examine what happens to a data set after it is set on its path as a piece of scholarly communication. Briefly reviewing a selection of datasets, repositories, and platforms, we found an uneven application of commonly accepted standards. Although guides for repositories such as Dryad or identifier registrars such as DataCite recommend inclusion of the key elements of Author, Title, Published date, and Publisher, there are two notable trends: to leave off the Material designator, leading to confusion when differentiating between data sets, other data publications, and articles; and to unify the Electronic retrieval location and Persistent identifier. To encourage scholars to develop and cite data, consistent practices must be promulgated for data citation and indexing.
Columbia's Evolving Research Data Storage Strategy
The Academic Commons repository hosted by the Columbia University Libraries / Information Services' (CUL/IS) Center for Digital Research and Scholarship (CDRS) continues not only to collect and preserve the scholarship of the faculty, but also to make it accessible through search and discovery tools. The scholarly products preserved by Academic Commons are not limited to datasets and raw data, but may additionally include works based on those data such as articles, book chapters, essays, monographs, working papers, technical reports, conference presentations, multimedia creations (e.g., simulations, three-dimensional maps), and other materials in digital formats. As the scale of data generated by research grows and the types of data evolve, CUL/IS is developing policies, procedures, communications, and training to accommodate these transformations.
Building Blocks: Laying the Foundation for a Research Data Management Program
Establishing a research data management (RDM) program has become a pressing imperative for many research libraries, but relatively few have a program in place. The challenges are many; these include learning about RDM principles and issues, assessing the local institution's greatest needs, selecting and implementing a repository environment, working with researchers to convey the importance of this work, preparing training materials, building expertise among library staff, and establishing metadata guidelines. Building Blocks offers detailed guidance at two levels: Part 1, Laying the Foundation, is directed at institutions that have yet to begin implementation, with the objective of guiding them through the steps necessary to establish a firm, supportive foundation on which to build. Part 2, Building Up and Out, is for those who are somewhat further along and ready to create the structure of a full RDM program. In addition to guiding readers through the full array of stages in building a program, Building Blocks includes more than 100 citations to resources that implementers can learn from and leverage. This work is part of our research collections and support efforts to inform current thinking about research collections and the emerging services that libraries are offering to support contemporary modes of scholarship. We are encouraging the development of new ways for libraries to build and provide these types of collections and deliver distinctive services.
Skills, Standards, and Sapp Nelson's Matrix: Evaluating Research Data Management Workshop Offerings
Objective: To evaluate library workshops on their coverage of data management topics.
Methods: We used a modified version of Sapp Nelson's Competency Matrix for Data Management Skills, a matrix of learning goals organized by data management competency and complexity level, against which we compared our educational materials: slide decks and worksheets. We examined each of the educational materials against the 333 learning objectives in our modified version of the Matrix to determine which of the learning objectives applied.
Conclusions: We found it necessary to change certain elements of the Matrix's structure to increase its clarity and functionality: reinterpreting the "behaviors," shifting the organization from the three domains of Bloom's taxonomy to increasing complexity solely within the cognitive domain, as well as creating a comprehensive identifier schema. We appreciated the Matrix for its specificity of learning objectives, its organizational structure, the comprehensive range of competencies included, and its ease of use. On the whole, the Matrix is a useful instrument for the assessment of data management programming.
Improving Discovery of and Access to Digital Repository Contents Using Semantic Web Standards: Columbia University's Academic Commons
This article describes the progress made towards developing Academic Commons (AC), Columbia University's digital repository, as an interoperable repository through the use of RDF and non-RDF Semantic Web technologies. Approaches taken include the implementation of microdata to add semantic markup to HTML content; a collaboration with Oregon State University's (OSU) digital repository, ScholarsArchive@OSU (SA@OSU), to implement an application that indexes RDF data from OSU for use in AC; as well as an exploration of the recently released MODS RDF.
Addressing the Gaps: Recommendations for Supporting the Long Tail of Research Data
Horstmann W, Nurnberger A, Shearer K, Wolski M. Addressing the Gaps: Recommendations for Supporting the Long Tail of Research Data. Research Data Alliance; 2017.
Major societal challenges such as health, climate change, energy, food availability, migration and peace depend on the contributions of a distributed and diverse international network of researchers and subject experts. The aim of open science is to improve the accessibility of research outputs, including articles, data and other research objects, so that researchers, industry and the public can make use of, build on, and ensure the validity of these research outputs.
Among research outputs, research data are often the most diverse, as diverse as the international network of experts that perform research. Datasets may be small or large, simple or complex, structured or unstructured. Data may stem from hundreds of different subjects, may be produced by numerous methodologies, and exist in a plethora of different formats. The diversity of data is also characterized by a variety of data management practices, of varying quality and comprehensiveness. Historically, large structured datasets in well-established disciplines are more likely to adopt unified and standardized formats that are disciplinarily defined and accepted. Similarly, well-established disciplines tend to have common and understood workflows, whereas in the long tail of research it is not unusual for researchers to use a variety of tools and to develop ad hoc data workflows. Long tail datasets, on the other hand, which vary radically in source, discipline, size, subject, provenance, funding, format, longevity, location and complexity, are less likely to adhere to common standards. The wide distribution and diversity of long-tail data means that ensuring such data is discoverable and stored in appropriate formats with relevant curation and metadata to facilitate reuse is challenging, and that these data have received less attention historically. Furthermore, the terms used to refer to long tail data, e.g. "small data", "legacy data" or "orphan data", have contributed to diminishing the perceived importance of such data.
Considering that a large portion of research datasets (and associated research funding) are found in the long tail, it is paramount that we address the specific and unique data management challenges for this data. The risks of neglecting long-tail data are real and significant. These include both limiting the reproducibility, transparency, and verifiability of research results, and unnecessary costs associated with the duplication of research data. Moreover, the potential benefits for reuse are significantly reduced.
The Research Data Alliance (RDA) "Long Tail of Research Data Interest Group" has been assessing the situation of long tail data over the last three years, and urges the broader community to consider the risks and opportunities related to long-tail data. This document provides seven recommendations for a variety of stakeholders, including governments, funders, research institutions and researchers to help improve the current approach to managing long tail data. We call on the community to work together to create necessary and sufficient conditions to ensure we are able to properly steward these valuable research outputs for future generations of researchers.
Connecting Data Publication to the Research Workflow: A Preliminary Analysis
The data curation community has long encouraged researchers to document collected research data during active stages of the research workflow, to provide robust metadata earlier, and support research data publication and preservation. Data documentation with robust metadata is one of a number of steps in effective data publication. Data publication is the process of making digital research objects "FAIR", i.e. findable, accessible, interoperable, and reusable; attributes increasingly expected by research communities, funders and society. Research data publishing workflows are the means to that end. Currently, however, much published research data remains inconsistently and inadequately documented by researchers. Documentation of data closer in time to data collection would help mitigate the high cost that repositories associate with the ingest process. More effective data publication and sharing should in principle result from early interactions between researchers and their selected data repository. This paper describes a short study undertaken by members of the Research Data Alliance (RDA) and World Data System (WDS) working group on Publishing Data Workflows. We present a collection of recent examples of data publication workflows that connect data repositories and publishing platforms with research activity "upstream" of the ingest process. We re-articulate previous recommendations of the working group, to account for the varied upstream service components and platforms that support the flow of contextual and provenance information downstream. These workflows should be open and loosely coupled to support interoperability, including with preservation and publication environments. Our recommendations aim to stimulate further work on researchers' views of data publishing and the extent to which available services and infrastructure facilitate the publication of FAIR data.
We also aim to stimulate further dialogue about, and definition of, the roles and responsibilities of research data services and platform providers for the "FAIRness" of research data publication workflows themselves.
Environmental Math in the Classroom: What Do Your Walls Say?
This article shares a template and framework to help teachers see their classrooms in new ways by thinking about Environmental Math, a type of environmental print. Who, what, where, when, and how do the walls in your classroom talk about mathematics?
Environment And Genetics in Lung cancer Etiology (EAGLE) study: An integrative population-based case-control study of lung cancer
Background: Lung cancer is the leading cause of cancer mortality worldwide. Tobacco smoking is its primary cause, and yet the precise molecular alterations induced by smoking in lung tissue that lead to lung cancer and impact survival have remained obscure. A new framework of research is needed to address the challenges offered by this complex disease.
Methods/Design: We designed a large population-based case-control study that combines a traditional molecular epidemiology design with a more integrative approach to investigate the dynamic process that begins with smoking initiation, proceeds through dependency/smoking persistence, continues with lung cancer development and ends with progression to disseminated disease or response to therapy and survival. The study allows the integration of data from multiple sources in the same subjects (risk factors, germline variation, genomic alterations in tumors, and clinical endpoints) to tackle the disease etiology from different angles. Before beginning the study, we conducted a phone survey and pilot investigations to identify the best approach to ensure acceptable participation in the study from cases and controls. Between 2002 and 2005, we enrolled 2101 incident primary lung cancer cases and 2120 population controls, with participation rates of 86.6% and 72.4%, respectively, from a catchment area including 216 municipalities in the Lombardy region of Italy. Lung cancer cases were enrolled in 13 hospitals and population controls were randomly sampled from the area to match the cases by age, gender and residence. Detailed epidemiological information and biospecimens were collected from each participant, and clinical data and tissue specimens from the cases. Collection of follow-up data on treatment and survival is ongoing.
Discussion: EAGLE is a new population-based case-control study that explores the full spectrum of lung cancer etiology, from smoking addiction to lung cancer outcome, through examination of epidemiological, molecular, and clinical data. We have provided a detailed description of the study design, field activities, management, and opportunities for research following this integrative approach, which allows a sharper and more comprehensive vision of the complex nature of this disease. The study is poised to accelerate the emergence of new preventive and therapeutic strategies with potentially enormous impact on public health.