10 research outputs found

    Workshop Report: Container Based Analysis Environments for Research Data Access and Computing

    Report of the first workshop on Container Based Analysis Environments for Research Data Access and Computing, supported by the National Data Service and the Data Exploration Lab and held at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign.

    A CyberGIS Integration and Computation Framework for High‐Resolution Continental‐Scale Flood Inundation Mapping

    We present a Digital Elevation Model (DEM)-based hydrologic analysis methodology for continental flood inundation mapping (CFIM), implemented as a cyberGIS scientific workflow in which a 1/3rd arc-second (10 m) Height Above Nearest Drainage (HAND) raster dataset for the conterminous U.S. (CONUS) was computed and employed for subsequent inundation mapping. A cyberGIS framework was developed to enable spatiotemporal integration and scalable computing of the entire inundation mapping process on a hybrid supercomputing architecture. The first 1/3rd arc-second CONUS HAND raster dataset was computed in 1.5 days on the CyberGIS ROGER supercomputer. The inundation mapping process developed in our exploratory study couples HAND with National Water Model (NWM) forecast data to enable near-real-time inundation forecasts for CONUS. The computational performance of HAND and the inundation mapping process was profiled to gain insight into their computational characteristics in high-performance parallel computing scenarios. The establishment of the CFIM computational framework has broad and significant research implications that may lead to further development and improvement of flood inundation mapping methodologies.
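    The HAND-based mapping step itself is conceptually simple: a cell is inundated when the forecast stage height of its nearest drainage exceeds the cell's HAND value, and the local water depth is the difference. Below is a minimal NumPy/rasterio sketch of that thresholding step; the file names, the per-catchment stage lookup, and the overall structure are illustrative assumptions, not the paper's actual workflow code.

```python
import numpy as np
import rasterio

# Illustrative inputs (hypothetical file names): the HAND raster and, for each
# cell, the ID of the catchment of its nearest drainage.
with rasterio.open("hand_conus_10m.tif") as src:
    hand = src.read(1)             # height above nearest drainage, meters
    profile = src.profile
with rasterio.open("catchment_ids.tif") as src:
    catchments = src.read(1)       # nearest-drainage catchment ID per cell

# Hypothetical per-catchment forecast stage heights (m), e.g. from NWM output.
stage_by_catchment = {101: 2.4, 102: 0.8, 103: 5.1}

# A cell is inundated where forecast stage exceeds HAND; depth is the difference.
stage = np.vectorize(lambda c: stage_by_catchment.get(int(c), 0.0))(catchments)
depth = np.where(stage > hand, stage - hand, 0.0).astype("float32")

profile.update(dtype="float32")
with rasterio.open("inundation_depth.tif", "w", **profile) as dst:
    dst.write(depth, 1)
```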

    Toward Open and Reproducible Environmental Modeling by Integrating Online Data Repositories, Computational Environments, and Model Application Programming Interfaces

    Cyberinfrastructure needs to be advanced to enable open and reproducible environmental modeling research. Recent efforts toward this goal have focused on advancing online repositories for data and model sharing, online computational environments along with containerization technology and notebooks for capturing reproducible computational studies, and Application Programming Interfaces (APIs) for simulation models to foster intuitive programmatic control. The objective of this research is to show how these efforts can be integrated to support reproducible environmental modeling. We first present the high-level concept and general approach for integrating these three components. We then present one possible implementation that integrates HydroShare (an online repository), CUAHSI JupyterHub and CyberGIS-Jupyter for Water (computational environments), and pySUMMA (a model API) to support open and reproducible hydrologic modeling. We apply the example implementation for a hydrologic modeling use case to demonstrate how the approach can advance reproducible environmental modeling through the seamless integration of cyberinfrastructure services.
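    The integration pattern the paper describes (repository, then computational environment, then model API) can be illustrated in a few lines: pull a shared model instance from HydroShare, then drive the model through its Python API. The sketch below is a hedged illustration only; the hsclient and pySUMMA calls, the resource ID, and the file paths are assumptions about those libraries' interfaces, not code from the paper.

```python
# Hedged sketch of the repository -> environment -> model-API pattern.
# Assumes the hsclient and pysumma packages; method names are assumptions.
from hsclient import HydroShare
from pysumma import Simulation

# 1) Repository: fetch a shared model instance from HydroShare
#    (hypothetical credentials and resource ID).
hs = HydroShare(username="user", password="password")
resource = hs.resource("0123456789abcdef0123456789abcdef")
resource.download("model_instance/")   # pull SUMMA input files locally

# 2) Model API: configure and run the model programmatically
#    (executable and file-manager paths are illustrative).
sim = Simulation("summa.exe", "model_instance/settings/fileManager.txt")
sim.run("local")                       # execute SUMMA on the local machine
print(sim.output)                      # inspect the simulation output
```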

    Enabling collaborative numerical modeling in earth sciences using knowledge infrastructure

    Knowledge Infrastructure is an intellectual framework for creating, sharing, and distributing knowledge. In this paper, we use Knowledge Infrastructure to address common barriers to entry for numerical modeling in the Earth sciences: computational modeling education, replicating published model results, and reusing published models to extend research. We outline six critical functional requirements: 1) workflows designed for new users; 2) a community-supported collaborative web platform; 3) distributed data storage; 4) a software environment; 5) a personalized cloud-based high-performance computing platform; and 6) a standardized open source modeling framework. Our methods meet these functional requirements by providing three interactive computational narratives for hands-on, problem-based research demonstrating how to use Landlab on HydroShare. Landlab is an open-source toolkit for building, coupling, and exploring two-dimensional numerical models. HydroShare is an online collaborative environment for the sharing of data and models. We describe the methods we are using to accelerate knowledge development by providing a suite of modular and interoperable process components that allows students, domain experts, collaborators, researchers, and sponsors to learn by exploring shared data and modeling resources. The system is designed to support uses on the continuum from fully-developed modeling applications to prototyping research software tools.
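    Landlab's component model is the concrete mechanism behind the "modular and interoperable process components" mentioned above: a grid holds shared fields, and each process component advances them in time. A minimal sketch, with assumed grid size, diffusivity, and run length, of building and stepping a two-dimensional Landlab model:

```python
# Minimal Landlab sketch: evolve topography with linear hillslope diffusion.
# Grid dimensions, diffusivity, and timestep are assumed example values.
import numpy as np
from landlab import RasterModelGrid
from landlab.components import LinearDiffuser

grid = RasterModelGrid((50, 50), xy_spacing=10.0)      # 50x50 nodes, 10 m apart
z = grid.add_zeros("topographic__elevation", at="node")
z += np.random.rand(grid.number_of_nodes)              # small initial roughness

diffuser = LinearDiffuser(grid, linear_diffusivity=0.01)  # m^2/yr, assumed

for _ in range(1000):                  # 1000 steps of 100 years each
    diffuser.run_one_step(100.0)

print(z.mean())                        # mean elevation after relaxation
```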

    Design and Implementation of a Python Module for Representing Geographic Data from DataFrames

    It is extremely difficult to extract useful information from spatial data when tables are the only form of representation, and this project combines the two tasks: starting from tables, using input DataFrames (with a significant amount of data), we found a way to represent their values as spatial data on a map (using the Python library Folium). The project consists of creating a Python module for the visual, geographic representation of data; besides being open source, it stands out for its ease of use. The module merges two strengths of Python, spatial analysis and the management of large data volumes, so it can also be employed by Big Data companies. It is implemented as a Python library that can easily transform a large input dataset into geographic representations of that data, with functions adaptable to the representation of statistics and geographic studies. The library provides the following functions: • representation of DataFrames as points of interest; • representation of DataFrames on choropleth maps; • representation of DataFrames with Google Maps URLs; • representation of DataFrames as Geohashes. Python entered the world of geographic information systems (GIS) as a programming language that is relatively easy to learn and use; over time, given its clear expansion, it has become ubiquitous, offering solutions for different types of users. All of this is achieved through a Python module that combines these capabilities in a single block, portable to any browser and any mobile device. The module is built entirely on several major Python libraries: NumPy, Pandas, and Folium. Leaflet, a JavaScript library for publishing maps, statistics, and charts on the web quickly and efficiently, is the foundation of the project; through Folium, both worlds are joined, manipulating data in Python and visualizing large data volumes as GIS representations. Visualizing spatial data on maps has clear advantages: it yields a visual representation of the exact location of the points in a geographic dataset, which makes it easy to relate those points to the real world, and it generates geographic perspectives from the dataset and positions at hand. Plotting spatial data on a map provides geographic information in a way that no other form of representation or chart type can, so using maps instead of other kinds of charts lets us highlight trends and statistics, discover patterns, and reveal realities that would otherwise remain invisible, bringing clarity to the data. As Alberto Cairo stated in his book "The Functional Art: An Introduction to Information Graphics and Visualization": "Graphics should not simplify messages. They should clarify them, highlight trends, uncover patterns, and reveal realities that were previously unseen."
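    The point-of-interest function described above maps naturally onto Folium's API. A minimal sketch, with a made-up DataFrame and column names (the module's own function names are not given in the abstract), of plotting DataFrame rows as map markers:

```python
import pandas as pd
import folium

# Hypothetical input DataFrame: one row per point of interest.
df = pd.DataFrame({
    "name": ["Museum", "Station", "Park"],
    "lat": [40.4138, 40.4066, 40.4153],
    "lon": [-3.6921, -3.6892, -3.6845],
})

# Center the map on the mean coordinate and add one marker per row.
m = folium.Map(location=[df["lat"].mean(), df["lon"].mean()], zoom_start=14)
for row in df.itertuples():
    folium.Marker([row.lat, row.lon], popup=row.name).add_to(m)

# Save as a self-contained HTML page, portable to any browser or mobile device.
m.save("points_of_interest.html")
```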

    Detection and analysis of thermokarst-related landscape processes using temporally and spatially high-resolution Planet CubeSat data

    This Master’s thesis provides an overview of methods to automatically detect different landscape processes in thermokarst areas. As the Arctic region is vulnerable to climate change, several developments are leading to a fast-changing landscape. Three processes are of particular interest: coastal erosion, retrogressive thaw slumps, and thermokarst lakes. To detect the influence of these processes on the environment, different methods were tested, with Planet images used as the basis for the evaluation. These data are acquired by nanosatellites with a resolution of 3 meters; owing to their small size and low cost, more than 200 active satellites are in orbit, monitoring the entire Earth daily. With the help of a Python script, automatic detection is possible, followed by classification and an evaluation of the generated data. The evaluation showed that the coast of Alaska is eroding at over 20 meters per year and that the growth rate of thaw slumps in the Noatak Valley exceeds 25 meters per year. Surprisingly, thermokarst lakes in Siberia tend to be stable, and no drainage could be detected.
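    The abstract does not spell out the detection algorithm, but a common pattern for this kind of multi-temporal analysis is to classify water in each scene and difference the masks between acquisitions. A hedged sketch of that pattern with NumPy and rasterio (the band indices, threshold, and file names are assumptions, not the thesis's actual script):

```python
import numpy as np
import rasterio

def water_mask(path, green_band=2, nir_band=4, threshold=0.0):
    """Classify water via NDWI = (green - NIR) / (green + NIR)."""
    with rasterio.open(path) as src:
        green = src.read(green_band).astype("float32")
        nir = src.read(nir_band).astype("float32")
    ndwi = (green - nir) / np.maximum(green + nir, 1e-6)
    return ndwi > threshold

# Hypothetical Planet scenes of the same coastal tile, one year apart.
before = water_mask("coast_2017.tif")
after = water_mask("coast_2018.tif")

# Land that became water between acquisitions marks erosion; at 3 m pixels,
# the eroded area (and, along transects, the retreat rate) follows directly.
eroded = after & ~before
print("eroded area: %.1f m^2" % (eroded.sum() * 3 * 3))
```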

    IDENTIFYING AN OPTIMIZATION TECHNIQUE FOR MAKER USAGE TO ADDRESS COVID-19 SUPPLY SHORTFALLS

    Fused Deposition Modeling (FDM) printers can be purchased for under five hundred dollars. The availability of these inexpensive systems has created a large hobbyist (or maker) community, and makers put FDM printing to numerous uses. With the onset of the COVID-19 pandemic, the need for Personal Protective Equipment (PPE) skyrocketed. COVID-19 mitigation strategies such as social distancing, business closures, and shipping delays created significant supply shortfalls, and the maker community stepped in to fill gaps in PPE supplies. In the case of 3DP, optimization remains the domain of commercial entities; for makers, optimization is, at best, ad hoc. Given the need for PPE supplies and COVID-19-related supply delays, optimization techniques would be of great value to makers. The objective functions in this research are throughput and cost, with quality factored into both. Several parameters are common to both throughput and surface roughness, including layer thickness, print speed, infill density, raster width, and wall thickness. This research will utilize a 2-level fractional factorial design, in which each process parameter has a specified upper (+1) and lower (-1) level. By using the upper and lower limits, this study will more closely align with the common maker workflow. The design will have a total of 16 trials; no main effect or 2-factor interaction is confounded with any other main effect or 2-factor interaction, which allows the parameters to be estimated separately from one another without conducting a full factorial (32 trials). Ordinary Least Squares (OLS) regression will be completed on throughput and cost independently, with quality considered a component of both. For example, an OLS model will be fit for throughput to determine the respective effects of the process parameters on throughput; using a 95% confidence level, a process parameter with a p-value smaller than 0.05 will be taken to have a significant effect on throughput. Upon completion of each OLS model, the ε-constraint methodology will be used to jointly optimize the process parameters. Validation trials will be completed to test the optimized process parameters. The results will be documented and discussed.
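    The screening analysis described here, fitting OLS to a 2-level design and reading off p-values, is straightforward in Python. A hedged sketch with statsmodels (the factor columns, coded ±1 levels, design generator, and response values are made up for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical 16-run, 2-level fractional factorial in coded (-1/+1) units
# for five process parameters (a resolution V 2^(5-1) design).
base = np.array([[int(b) * 2 - 1 for b in f"{i:04b}"] for i in range(16)])
design = pd.DataFrame(base, columns=["layer", "speed", "infill", "raster"])
design["wall"] = design.prod(axis=1)   # generator: wall = layer*speed*infill*raster

# Made-up throughput response: speed and layer thickness matter, plus noise.
y = 10 + 3 * design["speed"] + 2 * design["layer"] + rng.normal(0, 0.5, 16)

# Fit OLS with an intercept; parameters with p < 0.05 are deemed significant.
model = sm.OLS(y, sm.add_constant(design)).fit()
print(model.pvalues.round(4))
```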

    Evaluating and Enabling Scalable High Performance Computing Workloads on Commercial Clouds

    Performance, usability, and accessibility are critical components of high performance computing (HPC). Usability and performance are especially important to academic researchers, as they generally have little time to learn a new technology and demand a certain level of performance to ensure the quality and quantity of their research results. We have observed that while not all workloads run well in the cloud, some workloads perform well. We have also observed that although commercial cloud adoption by industry has been growing at a rapid pace, its use by academic researchers has not grown as quickly. We aim to help close this gap and enable researchers to utilize the commercial cloud more efficiently and effectively. We present our results on architecting and benchmarking an HPC environment on Amazon Web Services (AWS), where we observe that particular types of applications are, and are not, suited for the commercial cloud. Then, we present our results on architecting and building a provisioning and workflow management tool (PAW), an application that enables a user to launch an HPC environment in the cloud, execute a customizable workflow, and, after the workflow has completed, delete the HPC environment automatically. We then present our results on the scalability of PAW and the commercial cloud for compute-intensive workloads by deploying a 1.1 million vCPU cluster. We then discuss our research into the feasibility of utilizing commercial cloud infrastructure to help tackle the large spikes and data-intensive characteristics of Transportation Cyberphysical Systems (TCPS) workloads. Then, we present our research in utilizing the commercial cloud for urgent HPC applications by deploying a 1.5 million vCPU cluster to process 211 TB of traffic video data to be utilized by first responders during an evacuation situation. Lastly, we present the contributions and conclusions drawn from this work.
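    PAW's launch-run-delete lifecycle can be illustrated with the AWS SDK for Python. A heavily hedged sketch with boto3 (the AMI ID, instance type, and counts are placeholders; this shows the lifecycle pattern, not PAW's actual implementation):

```python
import boto3

ec2 = boto3.resource("ec2")

# 1) Provision: launch compute nodes for the HPC environment
#    (AMI ID, instance type, and count are placeholders).
instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="c5.18xlarge",
    MinCount=1,
    MaxCount=4,
)
for inst in instances:
    inst.wait_until_running()

# 2) Execute: hand the node list to the user's customizable workflow here
#    (e.g. configure a scheduler and submit jobs; omitted in this sketch).

# 3) Tear down: delete the environment once the workflow has completed,
#    so idle instances stop accruing cost.
for inst in instances:
    inst.terminate()
```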

    FROM RECHARGE TO REEF: ASSESSING THE SOURCES, QUANTITY, AND TRANSPORT OF GROUNDWATER ON TUTUILA ISLAND, AMERICAN SAMOA

    Ph.D. Thesis, University of Hawaiʻi at Mānoa, 201

    Web service-based exploration of Earth Observation time-series data for analyzing environmental changes

    The increasing amount of Earth observation (EO) data requires tremendous change in order to properly handle the number of observations and their storage size. Due to open data strategies and the increasing size of data archives, a new market has developed to provide analysis- and application-ready data, services, and platforms. It is not only scientists and geospatial processing specialists who work with EO data; stakeholders, thematic experts, and software developers do too. There is thus a great demand for improving the discovery, access, and analysis of EO data in line with the new possibilities of web-based infrastructures. With the aim of bridging the gap between users and EO data archives, various topics have been researched: 1) user requirements and their relation to web services and output formats; 2) technical requirements for the discovery and access of multi-source EO time-series data; and 3) management of EO time-series data focusing on application-ready data. Web services for EO data discovery and access, time-series data processing, and EO platforms have been reviewed and related to the requirements of users. The diversity of data providers and web services requires specific knowledge of systems and specifications. Although service specifications for the discovery of EO data exist, improvements are still necessary to meet the requirements of different user personas. For the processing of EO time-series data, various data formats and processing steps need to be handled, and there remains a gap between EO time-series data access and analysis tools that needs to be addressed to simplify work with such data. Within this thesis, web services for the discovery, access, and analysis of EO time-series data have been described and evaluated against different user requirements, and standardized web service specifications and output and data formats are proposed, introduced, and described to meet the needs of the different user personas.
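    As a concrete illustration of the kind of standardized discovery service the thesis evaluates, the sketch below queries a SpatioTemporal Asset Catalog (STAC) search endpoint for a time series over an area of interest. The endpoint URL, collection name, bounding box, and dates are placeholders, and STAC is one example of such a specification, not necessarily the one the thesis proposes.

```python
import requests

# Placeholder STAC API endpoint; the STAC spec standardizes this /search call.
STAC_SEARCH = "https://example.com/stac/search"

# Discover all scenes of a collection that cover an area of interest
# across a multi-year time series (bbox and dates are made up).
query = {
    "collections": ["sentinel-2-l2a"],
    "bbox": [13.05, 52.33, 13.76, 52.68],   # lon/lat corners of the AOI
    "datetime": "2018-01-01T00:00:00Z/2020-12-31T23:59:59Z",
    "limit": 100,
}

response = requests.post(STAC_SEARCH, json=query, timeout=30)
response.raise_for_status()

for item in response.json()["features"]:
    # Each STAC item carries its acquisition time and links to the data assets.
    print(item["id"], item["properties"]["datetime"])
```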