A practitioner's guide to managing geoscience information
In the UK the Natural Environment Research Council (NERC) manages its scientific data holdings through a series of Environmental Data Centres covering Atmosphere, Bioinformatics, Earth Sciences, Earth Observation, Hydrology, Marine Science and Polar Science. Within the Earth Science sector the National Geoscience Data Centre (NGDC), a component of the British Geological Survey (BGS), is responsible for managing the geoscience data resource. The purpose of the NGDC is to maintain the national geoscience database and to ensure efficient and effective delivery by providing geoscientists with ready access to data and information that is timely, fit for purpose, and in which the user has confidence. The key benefits that NERC derives from this approach are:
- Risk Reduction;
- Increased Productivity; and
- Higher Quality Science.
The paper briefly describes the key benefits of managing geoscientific information effectively and how these benefits are realised within the NGDC and BGS.
Earth Observations in Social Science Research for Management of Natural Resources and the Environment: Identifying the Contribution of the U.S. Land Remote Sensing (Landsat) Program
This paper surveys and describes the peer-reviewed social science literature in which data from the U.S. land remote sensing program, Landsat, inform public policy in managing natural resources and the environment. The Landsat program has provided the longest collection of observations of Earth from the vantage point of space. The paper differentiates two classes of research: methodology exploring how to use the data (for example, designing and testing algorithms or verifying the accuracy of the data) and applications of data to decision-making or policy implementation in managing land, air quality, water, and other natural and environmental resources. Selection of the studies uses social science-oriented bibliographic search indices and expands results of previous surveys that target only researchers specializing in remote sensing or photogrammetry. The usefulness of Landsat as a basis for informing public investment in the Landsat program will be underestimated if this body of research goes unrecognized.
Keywords: natural resources policy, environmental policy, Landsat, social science, environmental management
University of Massachusetts Amherst Response to Draft Desirable Characteristics of Repositories for Managing and Sharing Data Resulting From Federally Funded Research
Response to the Office of Science and Technology Policy's Draft Desirable Characteristics of Repositories for Managing and Sharing Data Resulting from Federally Funded Research.
Original call for public response is available at https://www.federalregister.gov/documents/2020/01/17/2020-00689/request-for-public-comment-on-draft-desirable-characteristics-of-repositories-for-managing-an
Data science: a game changer for science and innovation
This paper shows data science's potential for disruptive innovation in science, industry, policy, and people's lives. We present how data science will impact science and society at large in the coming years, including ethical problems in managing human behavior data, and consider the quantitative expectations of data science's economic impact. We introduce concepts such as open science and e-infrastructure as useful tools for supporting ethical data science and for training new generations of data scientists. Finally, this work outlines the SoBigData Research Infrastructure as an easy-to-access platform for executing complex data science processes. The services proposed by SoBigData are aimed at using data science to understand the complexity of our contemporary, globally interconnected society.
Enabling Interactive Analytics of Secure Data using Cloud Kotta
Research, especially in the social sciences and humanities, is increasingly
reliant on the application of data science methods to analyze large amounts of
(often private) data. Secure data enclaves provide a solution for managing and
analyzing private data. However, such enclaves do not readily support discovery
science---a form of exploratory or interactive analysis by which researchers
execute a range of (sometimes large) analyses in an iterative and collaborative
manner. The batch computing model offered by many data enclaves is well suited
to executing large compute tasks; however, it is far from ideal for day-to-day
discovery science. As researchers must submit jobs to queues and wait for
results, the high latencies inherent in queue-based, batch computing systems
hinder interactive analysis. In this paper we describe how we have augmented
the Cloud Kotta secure data enclave to support collaborative and interactive
analysis of sensitive data. Our model uses Jupyter notebooks as a flexible
analysis environment and Python language constructs to support the execution of
arbitrary functions on private data within this secure framework.
Comment: To appear in Proceedings of Workshop on Scientific Cloud Computing, Washington, DC USA, June 2017 (ScienceCloud 2017), 7 pages
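The abstract says the system uses Python language constructs to execute arbitrary functions on private data inside the secure enclave, but gives no API details. The sketch below illustrates only that general pattern; `run_in_enclave` and `_ENCLAVE_DATA` are invented names for illustration, not the Cloud Kotta interface.

```python
# Generic sketch of the pattern: an arbitrary analysis function is routed to
# run against data that never leaves a secure store, and only the aggregate
# result crosses the enclave boundary. All names are illustrative assumptions;
# a real system would serialize the function (e.g. with cloudpickle) and
# dispatch it to the enclave over the network.

_ENCLAVE_DATA = {"incomes.csv": [42_000, 58_000, 61_000]}  # private data

def run_in_enclave(func):
    """Decorator: run func 'inside' the enclave on a named dataset."""
    def submit(dataset_name, *args, **kwargs):
        data = _ENCLAVE_DATA[dataset_name]   # raw rows stay enclave-side
        return func(data, *args, **kwargs)   # only the result is returned
    return submit

@run_in_enclave
def mean_income(rows):
    return sum(rows) / len(rows)

result = mean_income("incomes.csv")  # caller never sees the raw rows
```

In the system the paper describes, the submission step would go through the enclave's secure job machinery (batch or interactive); the decorator here only models the control flow.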
A Data Science Course for Undergraduates: Thinking with Data
Data science is an emerging interdisciplinary field that combines elements of
mathematics, statistics, computer science, and knowledge in a particular
application domain for the purpose of extracting meaningful information from
the increasingly sophisticated array of data available in many settings. These
data tend to be non-traditional, in the sense that they are often live, large,
complex, and/or messy. A first course in statistics at the undergraduate level
typically introduces students to a variety of techniques for analyzing small,
neat, and clean data sets. However, whether they pursue more formal training in
statistics or not, many of these students will end up working with data that is
considerably more complex, and will need facility with statistical computing
techniques. More importantly, these students require a framework for thinking
structurally about data. We describe an undergraduate course in a liberal arts
environment that provides students with the tools necessary to apply data
science. The course emphasizes modern, practical, and useful skills that cover
the full data analysis spectrum, from asking an interesting question to
acquiring, managing, manipulating, processing, querying, analyzing, and
visualizing data, as well as communicating findings in written, graphical, and oral forms.
Comment: 21 pages total including supplementary material
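The "full data analysis spectrum" the course description lists, from acquiring messy data through querying and analyzing it, can be sketched end to end in a few lines. The inline dataset and field names below are made up for illustration; the point is the acquire → clean → query → analyze flow using only the standard library.

```python
import csv
import io
from collections import defaultdict

# Deliberately messy inline data: inconsistent casing, stray spaces, a
# non-numeric value. Illustrative only.
raw = """city,temp_f
Boston, 41
boston,39
Amherst, n/a
Amherst,35
"""

# Acquire and clean: normalize city names, drop unparseable rows.
rows = []
for rec in csv.DictReader(io.StringIO(raw)):
    try:
        rows.append((rec["city"].strip().title(), float(rec["temp_f"])))
    except ValueError:
        continue  # messy data: skip non-numeric temperatures

# Query and analyze: mean temperature per city.
by_city = defaultdict(list)
for city, temp in rows:
    by_city[city].append(temp)
means = {city: sum(ts) / len(ts) for city, ts in by_city.items()}
```

A course like the one described would typically layer visualization and communication on top of this kind of pipeline; the sketch stops at the aggregation step.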
Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure
Big data research has attracted great attention in science, technology,
industry and society. It is developing with the evolving scientific paradigm,
the fourth industrial revolution, and the transformational innovation of
technologies. However, its nature and fundamental challenge have not been
recognized, and its own methodology has not been formed. This paper explores
and answers the following questions: What is big data? What are the basic
methods for representing, managing and analyzing big data? What is the
relationship between big data and knowledge? Can we find a mapping from big
data into knowledge space? What kind of infrastructure is required to support
not only big data management and analysis but also knowledge discovery, sharing
and management? What is the relationship between big data and science paradigm?
What is the nature and fundamental challenge of big data computing? A
multi-dimensional perspective is presented toward a methodology of big data
computing.
Comment: 59 pages
Challenges in Complex Systems Science
FuturICT foundations are social science, complex systems science, and ICT.
The main concerns and challenges in the science of complex systems in the
context of FuturICT are laid out in this paper with special emphasis on the
Complex Systems route to Social Sciences. These include complex systems having:
many heterogeneous interacting parts; multiple scales; complicated transition
laws; unexpected or unpredicted emergence; sensitive dependence on initial
conditions; path-dependent dynamics; networked hierarchical connectivities;
interaction of autonomous agents; self-organisation; non-equilibrium dynamics;
combinatorial explosion; adaptivity to changing environments; co-evolving
subsystems; ill-defined boundaries; and multilevel dynamics. In this context,
science is seen as the process of abstracting the dynamics of systems from
data. This presents many challenges including: data gathering by large-scale
experiment, participatory sensing and social computation, managing huge
distributed dynamic and heterogeneous databases; moving from data to dynamical
models, going beyond correlations to cause-effect relationships, understanding
the relationship between simple and comprehensive models with appropriate
choices of variables, ensemble modeling and data assimilation, modeling systems
of systems of systems with many levels between micro and macro; and formulating
new approaches to prediction, forecasting, and risk, especially in systems that
can reflect on and change their behaviour in response to predictions, and
systems whose apparently predictable behaviour is disrupted by apparently
unpredictable rare or extreme events. These challenges are part of the FuturICT
agenda.
Overview of the Kepler Science Processing Pipeline
The Kepler Mission Science Operations Center (SOC) performs several critical
functions including managing the ~156,000 target stars, associated target
tables, science data compression tables and parameters, as well as processing
the raw photometric data downlinked from the spacecraft each month. The raw
data are first calibrated at the pixel level to correct for bias, smear induced
by a shutterless readout, and other detector and electronic effects. A
background sky flux is estimated from ~4500 pixels on each of the 84 CCD
readout channels, and simple aperture photometry is performed on an optimal
aperture for each star. Ancillary engineering data and diagnostic information
extracted from the science data are used to remove systematic errors in the
flux time series that are correlated with these data prior to searching for
signatures of transiting planets with a wavelet-based, adaptive matched filter.
Stars with signatures exceeding 7.1 sigma are subjected to a suite of
statistical tests including an examination of each star's centroid motion to
reject false positives caused by background eclipsing binaries. Physical
parameters for each planetary candidate are fitted to the transit signature,
and signatures of additional transiting planets are sought in the residual
light curve. The pipeline is operational, finding planetary signatures and
providing robust eliminations of false positives.
Comment: 8 pages, 3 figures
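Two of the steps the abstract describes, background-subtracted simple aperture photometry and the 7.1-sigma detection cut, reduce to short arithmetic. The sketch below is illustrative only (made-up pixel values, not SOC pipeline code), but the operations match what the text says the pipeline does.

```python
# Illustrative sketch, not Kepler SOC pipeline code. Pixel values and the
# aperture indices are invented; the 7.1-sigma threshold is from the abstract.

def aperture_flux(pixels, aperture, background_per_pixel):
    """Simple aperture photometry: sum background-subtracted flux over the
    optimal-aperture pixel indices."""
    return sum(pixels[i] - background_per_pixel for i in aperture)

# A tiny 5-pixel postage stamp; the "optimal aperture" is the 3 central pixels.
pixels = [102.0, 480.0, 950.0, 470.0, 98.0]
flux = aperture_flux(pixels, aperture=[1, 2, 3], background_per_pixel=100.0)

def exceeds_threshold(statistic, sigma, threshold=7.1):
    """Kepler-style cut: only candidates above 7.1 sigma go on to the
    false-positive tests (e.g. centroid-motion examination)."""
    return statistic / sigma > threshold
```

In the real pipeline the detection statistic comes from the wavelet-based adaptive matched filter applied to the systematics-corrected flux time series; the threshold comparison itself is as simple as shown.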
Modeling Mortality Data: A Case Study in Data Management for Computer Science Research
This poster presents the results of a case study completed as part of the Simmons College course Scientific Research Data Management. The case study focuses on research conducted by a computer science laboratory on factors that influence patient mortality, using topic modeling and other computational techniques. Based on the research narrative, data management practices and needs specific to computer science research are mapped to the New England Collaborative Data Management Curriculum Modules for Managing Research Data. These areas of focus will help information professionals to identify the data management challenges presented by computer science research, as well as the tools and techniques recommended to create an effective data management plan. Overall, this case study demonstrates the importance of valuing and managing source code as data, in order to ensure reproducibility of results and open access to data.