
    A practitioner's guide to managing geoscience information

    In the UK the Natural Environment Research Council (NERC) manages its scientific data holdings through a series of Environmental Data Centres covering Atmosphere, Bioinformatics, Earth Sciences, Earth Observation, Hydrology, Marine Science and Polar Science. Within the Earth Science sector the National Geoscience Data Centre (NGDC), a component of the British Geological Survey (BGS), is responsible for managing the geosciences data resource. The purpose of the NGDC is to maintain the national geoscience database and to ensure efficient and effective delivery by providing geoscientists with ready access to data and information that is timely, fit for purpose, and in which the user has confidence. The key benefits that NERC derives from this approach are: Risk Reduction; Increased Productivity; and Higher Quality Science. The paper briefly describes the key benefits of managing geoscientific information effectively and describes how these benefits are realised within the NGDC and BGS.

    Earth Observations in Social Science Research for Management of Natural Resources and the Environment: Identifying the Contribution of the U.S. Land Remote Sensing (Landsat) Program

    This paper surveys and describes the peer-reviewed social science literature in which data from the U.S. land remote sensing program, Landsat, inform public policy in managing natural resources and the environment. The Landsat program has provided the longest collection of observations of Earth from the vantage point of space. The paper differentiates two classes of research: methodology exploring how to use the data (for example, designing and testing algorithms or verifying the accuracy of the data) and applications of the data to decision-making or policy implementation in managing land, air quality, water, and other natural and environmental resources. Selection of the studies uses social science-oriented bibliographic search indices and expands on the results of previous surveys that target only researchers specializing in remote sensing or photogrammetry. The usefulness of Landsat as a basis for informing public investment in the Landsat program will be underestimated if this body of research goes unrecognized.

    Keywords: natural resources policy, environmental policy, Landsat, social science, environmental management

    Data science: a game changer for science and innovation

    This paper shows the potential of data science for disruptive innovation in science, industry, policy, and people's lives. We discuss how data science will impact science and society at large in the coming years, including ethical problems in managing human behavior data and the quantitative expectations of data science's economic impact. We introduce concepts such as open science and e-infrastructure as useful tools for supporting ethical data science and training new generations of data scientists. Finally, this work outlines the SoBigData Research Infrastructure as an easy-to-access platform for executing complex data science processes. The services proposed by SoBigData are aimed at using data science to understand the complexity of our contemporary, globally interconnected society.

    Enabling Interactive Analytics of Secure Data using Cloud Kotta

    Research, especially in the social sciences and humanities, is increasingly reliant on the application of data science methods to analyze large amounts of (often private) data. Secure data enclaves provide a solution for managing and analyzing private data. However, such enclaves do not readily support discovery science---a form of exploratory or interactive analysis by which researchers execute a range of (sometimes large) analyses in an iterative and collaborative manner. The batch computing model offered by many data enclaves is well suited to executing large compute tasks; however, it is far from ideal for day-to-day discovery science. As researchers must submit jobs to queues and wait for results, the high latencies inherent in queue-based, batch computing systems hinder interactive analysis. In this paper we describe how we have augmented the Cloud Kotta secure data enclave to support collaborative and interactive analysis of sensitive data. Our model uses Jupyter notebooks as a flexible analysis environment and Python language constructs to support the execution of arbitrary functions on private data within this secure framework.

    Comment: To appear in Proceedings of Workshop on Scientific Cloud Computing, Washington, DC, USA, June 2017 (ScienceCloud 2017), 7 pages
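
    The execution model this abstract describes can be illustrated with a short sketch. The code below is not the Cloud Kotta API; it is a minimal sketch under stated assumptions: a hypothetical enclave_task decorator wraps an arbitrary Python function so a notebook can submit it for asynchronous execution and remain interactive, with a local thread pool standing in for the remote secure executor.

        # Minimal sketch (hypothetical API, not Cloud Kotta's): submit arbitrary
        # Python functions for asynchronous execution; a local thread pool
        # stands in for the enclave's execution service.
        from concurrent.futures import ThreadPoolExecutor
        import functools

        _executor = ThreadPoolExecutor(max_workers=4)  # stand-in for the enclave backend

        def enclave_task(func):
            """Wrap func so calls return a future immediately, keeping the
            notebook interactive while the job runs. A real implementation
            would serialize func (e.g. with cloudpickle) and send it to the
            enclave's execution service rather than to a local pool."""
            @functools.wraps(func)
            def submit(*args, **kwargs):
                return _executor.submit(func, *args, **kwargs)
            return submit

        @enclave_task
        def count_records(dataset_path):
            # Runs next to the private data; only the aggregate result
            # leaves the enclave.
            with open(dataset_path) as f:
                return sum(1 for _ in f)

        # future = count_records("/secure/data/records.csv")  # path is hypothetical
        # print(future.result())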

    A Data Science Course for Undergraduates: Thinking with Data

    Data science is an emerging interdisciplinary field that combines elements of mathematics, statistics, computer science, and knowledge in a particular application domain for the purpose of extracting meaningful information from the increasingly sophisticated array of data available in many settings. These data tend to be non-traditional, in the sense that they are often live, large, complex, and/or messy. A first course in statistics at the undergraduate level typically introduces students to a variety of techniques for analyzing small, neat, and clean data sets. However, whether they pursue more formal training in statistics or not, many of these students will end up working with data that are considerably more complex, and will need facility with statistical computing techniques. More importantly, these students require a framework for thinking structurally about data. We describe an undergraduate course in a liberal arts environment that provides students with the tools necessary to apply data science. The course emphasizes modern, practical, and useful skills that cover the full data analysis spectrum, from asking an interesting question to acquiring, managing, manipulating, processing, querying, analyzing, and visualizing data, as well as communicating findings in written, graphical, and oral forms.

    Comment: 21 pages total including supplementary material
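
    To make the "full data analysis spectrum" concrete, here is a minimal sketch in Python with pandas and matplotlib; the column names and values are hypothetical, and a small inline table stands in for a larger acquired data set.

        import pandas as pd
        import matplotlib.pyplot as plt

        # Acquire: in practice this step might be pd.read_csv(...) or a web API
        # call; an inline table (hypothetical columns) keeps the sketch
        # self-contained.
        df = pd.DataFrame({
            "dep_hour":  [6, 6, 9, 9, 17, 17, 17, 23],
            "dep_delay": [2.0, None, 5.5, 3.0, 21.0, 34.5, 18.0, 7.0],
        })

        # Manage/manipulate: drop records with missing delays (messy data is the norm).
        df = df.dropna(subset=["dep_delay"])

        # Query/analyze: summarize mean departure delay by hour of day.
        summary = df.groupby("dep_hour")["dep_delay"].mean()

        # Visualize and communicate the finding.
        summary.plot(kind="bar", xlabel="Departure hour", ylabel="Mean delay (min)")
        plt.tight_layout()
        plt.show()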

    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Big data research has attracted great attention in science, technology, industry and society. It is developing with the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenge have not been recognized, and its own methodology has not been formed. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing and management? What is the relationship between big data and science paradigm? What is the nature and fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing.

    Comment: 59 pages

    Challenges in Complex Systems Science

    FuturICT foundations are social science, complex systems science, and ICT. The main concerns and challenges in the science of complex systems in the context of FuturICT are laid out in this paper with special emphasis on the Complex Systems route to Social Sciences. These include complex systems having: many heterogeneous interacting parts; multiple scales; complicated transition laws; unexpected or unpredicted emergence; sensitive dependence on initial conditions; path-dependent dynamics; networked hierarchical connectivities; interaction of autonomous agents; self-organisation; non-equilibrium dynamics; combinatorial explosion; adaptivity to changing environments; co-evolving subsystems; ill-defined boundaries; and multilevel dynamics. In this context, science is seen as the process of abstracting the dynamics of systems from data. This presents many challenges, including: data gathering by large-scale experiment, participatory sensing and social computation; managing huge, distributed, dynamic and heterogeneous databases; moving from data to dynamical models; going beyond correlations to cause-effect relationships; understanding the relationship between simple and comprehensive models with appropriate choices of variables; ensemble modeling and data assimilation; modeling systems of systems of systems with many levels between micro and macro; and formulating new approaches to prediction, forecasting, and risk, especially in systems that can reflect on and change their behaviour in response to predictions, and systems whose apparently predictable behaviour is disrupted by apparently unpredictable rare or extreme events. These challenges are part of the FuturICT agenda.
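
    One property in the list above, sensitive dependence on initial conditions, admits a tiny self-contained illustration (not drawn from the paper): iterating the logistic map x_{n+1} = r*x_n*(1 - x_n) at r = 4, two trajectories that start 1e-9 apart diverge to order one within roughly 30-40 steps.

        # Sensitive dependence on initial conditions in the chaotic logistic map.
        def logistic_trajectory(x0, r=4.0, steps=50):
            xs = [x0]
            for _ in range(steps):
                xs.append(r * xs[-1] * (1 - xs[-1]))
            return xs

        a = logistic_trajectory(0.400000000)
        b = logistic_trajectory(0.400000001)  # differs by 1e-9
        for n in (0, 10, 20, 30, 40):
            # The gap roughly doubles each step (Lyapunov exponent ln 2 at r = 4).
            print(f"step {n:2d}: |a - b| = {abs(a[n] - b[n]):.3e}")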

    Overview of the Kepler Science Processing Pipeline

    The Kepler Mission Science Operations Center (SOC) performs several critical functions including managing the ~156,000 target stars, associated target tables, science data compression tables and parameters, as well as processing the raw photometric data downlinked from the spacecraft each month. The raw data are first calibrated at the pixel level to correct for bias, smear induced by a shutterless readout, and other detector and electronic effects. A background sky flux is estimated from ~4500 pixels on each of the 84 CCD readout channels, and simple aperture photometry is performed on an optimal aperture for each star. Ancillary engineering data and diagnostic information extracted from the science data are used to remove systematic errors in the flux time series that are correlated with these data prior to searching for signatures of transiting planets with a wavelet-based, adaptive matched filter. Stars with signatures exceeding 7.1 sigma are subjected to a suite of statistical tests including an examination of each star's centroid motion to reject false positives caused by background eclipsing binaries. Physical parameters for each planetary candidate are fitted to the transit signature, and signatures of additional transiting planets are sought in the residual light curve. The pipeline is operational, finding planetary signatures and providing robust eliminations of false positives.

    Comment: 8 pages, 3 figures
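
    The background estimation and simple aperture photometry steps named above can be sketched schematically. The Python below is a toy illustration, not the SOC pipeline code: a median over designated sky pixels stands in for the pipeline's per-channel background estimate, and flux is summed over a boolean mask marking the optimal aperture.

        # Schematic sketch of background subtraction + simple aperture photometry.
        import numpy as np

        def aperture_photometry(image, aperture_mask, background_pixels):
            """image: 2-D calibrated pixel array (e-/s);
            aperture_mask: boolean array marking the optimal aperture for one star;
            background_pixels: boolean array marking sky pixels for background."""
            # Robust background level from designated sky pixels (the pipeline
            # estimates the background across each CCD channel; a median is a
            # stand-in here).
            background = np.median(image[background_pixels])
            # Simple aperture photometry: total background-subtracted flux.
            return np.sum(image[aperture_mask] - background)

        # Toy usage with a synthetic 10x10 frame: flat sky plus one bright star.
        rng = np.random.default_rng(0)
        frame = rng.normal(100.0, 1.0, size=(10, 10))
        frame[4:6, 4:6] += 500.0                      # injected stellar signal
        aperture = np.zeros((10, 10), bool)
        aperture[3:7, 3:7] = True
        sky = ~aperture
        print(f"flux = {aperture_photometry(frame, aperture, sky):.1f} e-/s")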

    Modeling Mortality Data: A Case Study in Data Management for Computer Science Research

    This poster presents the results of a case study completed as part of the Simmons College course Scientific Research Data Management. The case study focuses on research conducted by a computer science laboratory on factors that influence patient mortality, using topic modeling and other computational techniques. Based on the research narrative, data management practices and needs specific to computer science research are mapped to the New England Collaborative Data Management Curriculum Modules for Managing Research Data. These areas of focus will help information professionals to identify the data management challenges presented by computer science research, as well as the tools and techniques recommended to create an effective data management plan. Overall, this case study demonstrates the importance of valuing and managing source code as data, in order to ensure reproducibility of results and open access to data.