
    Limiting Data Friction by Reducing Data Download Using Spatiotemporally Aligned Data Organization Through STARE

    Current data processing practice limits the volume and variety of relevant geoscience data that can practically be applied to important problems. File archives in centralized data centers are the principal means by which Earth Science data are accessed. This approach, however, requires laborious search, retrieval, and eventual customization/adaptation for the data to be used. Such fractionation makes it even more difficult to share outcomes, i.e. research artifacts and data products, hampering reusability and repeatability, since end users generally have their own research agenda and preferences as well as scarce resources. Thus, while finding and downloading data files from central data centers are already costly for end users working in their own field, using data products from other disciplines rapidly becomes prohibitive. This curtails scientific productivity, limits avenues of study, and endangers quality and reproducibility. The Spatio-Temporal Adaptive Resolution Encoding (STARE) is a unifying scheme that facilitates the indexing, access, and fusion of diverse Earth Science data. STARE implements an innovative encoding of geo-spatiotemporal information, originally developed for aligning datasets with diverse spatiotemporal characteristics in an array database. The spatial component of STARE recursively quadfurcates a root polyhedron, producing a hierarchical scheme for addressing geographic locations and regions. The temporal component of STARE uses conventional date-time units as an indexing hierarchy. The additional encoding of spatial and temporal resolution information in STARE enables comparisons and conditional selections across diverse datasets. Moreover, spatiotemporal set-operations, e.g. union and intersection, are mapped to efficient integer operations with STARE. Applied to existing data models (point, grid, spacecraft swath) and corresponding granules, STARE indexes provide a streamlined description usable as geo-spatiotemporal metadata. When coupled with large-scale, distributed hardware and software, STARE-based data access reduces pre-analysis data preparation costs by offering a convenient means to align different datasets spatiotemporally without specialized effort in parallel computing or distributed data management.
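
    To make the idea of mapping spatial set-operations onto integer operations concrete, the following toy sketch (not the actual STARE library or bit layout; the encoding and level depth are assumptions for illustration) shows how a quadtree-style hierarchical index can reduce a containment test to an integer prefix comparison.

```python
# Toy sketch of a hierarchical quadtree-style index; the bit layout and the
# maximum level (27) are assumptions for illustration, not the real STARE encoding.

def encode(path, max_level=27):
    """Pack a sequence of quadrant choices (0-3) into a left-aligned integer,
    so an ancestor's bits form a prefix of every descendant's bits."""
    level = len(path)
    value = 0
    for q in path:
        value = (value << 2) | q
    value <<= 2 * (max_level - level)      # left-align the quadrant bits
    return value, level

def contains(coarse, fine, max_level=27):
    """True if the coarser index spatially contains the finer one."""
    (vc, lc), (vf, lf) = coarse, fine
    if lc > lf:
        return False
    shift = 2 * (max_level - lc)
    return (vc >> shift) == (vf >> shift)  # containment == shared prefix

# A level-2 cell contains the level-5 cell that shares its first two quadrants.
print(contains(encode([1, 3]), encode([1, 3, 0, 2, 1])))   # True
```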

    An Innovative Infrastructure with a Universal Geo-Spatiotemporal Data Representation Supporting Cost-Effective Integration of Diverse Earth Science Data

    The SpatioTemporal Adaptive Resolution Encoding (STARE) is a unifying scheme encoding geospatial and temporal information for organizing data on scalable computing/storage resources, minimizing expensive data transfers. STARE provides a compact representation that turns set-logic functions, e.g. conditional sub-setting, into integer operations, taking into account representative spatiotemporal resolutions of the data in the datasets. STARE geo-spatiotemporally aligns the placement of diverse data on massively parallel resources to maximize performance. Automating important scientific functions (e.g. regridding) and computational functions (e.g. data placement) allows scientists to focus on domain-specific questions instead of expending their efforts and expertise on data processing. With STARE-enabled automation, SciDB (Scientific Database) plus STARE provides a database interface, reducing costly data preparation, increasing the volume and variety of interoperable data, and easing result sharing. Using SciDB plus STARE as part of an integrated analysis infrastructure dramatically eases combining diametrically different datasets.
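
    As a hedged illustration of what STARE-style conditional sub-setting can look like at the code level (the truncation scheme follows the toy encoding sketched above and is not the real STARE or SciDB API), records from two datasets can be co-located by truncating their integer indices to a common resolution level and joining on the truncated values.

```python
# Hypothetical co-location join on toy STARE-like integer indices; the index
# layout and level depth are assumptions carried over from the sketch above.
from collections import defaultdict

def truncate(index, level, max_level=27):
    """Coarsen an integer index to the given resolution level."""
    shift = 2 * (max_level - level)
    return (index >> shift) << shift

def colocate(dataset_a, dataset_b, level):
    """Yield (payload_a, payload_b) pairs whose indices share a cell at `level`.
    Each dataset is an iterable of (integer_index, payload) records."""
    buckets = defaultdict(list)
    for index, payload in dataset_a:
        buckets[truncate(index, level)].append(payload)
    for index, payload in dataset_b:
        for other in buckets.get(truncate(index, level), []):
            yield other, payload
```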

    Collaborative WorkBench for Researchers - Work Smarter, Not Harder

    It is important to define some commonly used terminology related to collaboration to facilitate clarity in later discussions. We define provisioning as infrastructure capabilities such as computation, storage, data, and tools provided by some agency or similarly trusted institution. Sharing is defined as the process of exchanging data, programs, and knowledge among individuals (often strangers) and groups. Collaboration is a specialized case of sharing. In collaboration, sharing with others (usually known colleagues) is done in pursuit of a common scientific goal or objective. Collaboration entails more dynamic and frequent interactions and can occur at different speeds. Synchronous collaboration occurs in real time, such as editing a shared document on the fly, chatting, or video conferencing, and typically requires a peer-to-peer connection. Asynchronous collaboration is episodic in nature, based on a push-pull model. Examples of asynchronous collaboration include email exchanges, blogging, repositories, etc. The purpose of a workbench is to provide a customizable framework for different applications. Since the workbench will be common to all the customized tools, it promotes building modular functionality that can be used and reused by multiple tools. The objective of our Collaborative Workbench (CWB) is thus to create such an open and extensible framework for the Earth Science community via a set of plug-ins. Our CWB is based on the Eclipse [2] Integrated Development Environment (IDE), which is designed as a small kernel containing a plug-in loader for hundreds of plug-ins. The kernel itself is an implementation of a known specification to provide an environment for the plug-ins to execute. This design enables modularity, where discrete chunks of functionality can be reused to build new applications. The minimal set of plug-ins necessary to create a client application is called the Eclipse Rich Client Platform (RCP) [3]. The Eclipse RCP also supports thousands of community-contributed plug-ins, making it a popular development platform for many diverse applications, including the Science Activity Planner developed at JPL for the Mars rovers [4] and the scientific experiment tool Gumtree [5]. By leveraging the Eclipse RCP to provide an open, extensible framework, the CWB supports customizations via plug-ins to build rich user applications specific to Earth Science. More importantly, CWB plug-ins can be used by existing science tools built on Eclipse, such as IDL or PyDev, to provide seamless collaboration functionality.
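
    The plug-in pattern described above can be sketched generically; the registry below is an assumed Python stand-in for illustration only, not the Eclipse RCP mechanism, and the plug-in name and behavior are hypothetical.

```python
# Minimal plug-in registry: a small "kernel" that loads and runs discrete,
# reusable chunks of functionality. Plug-in names and behavior are hypothetical.
PLUGINS = {}

def plugin(name):
    """Decorator registering a class as a named plug-in."""
    def register(cls):
        PLUGINS[name] = cls
        return cls
    return register

@plugin("share-notebook")
class ShareNotebook:
    def run(self, path):
        # Stand-in for real collaboration behavior (e.g. pushing to a shared workspace).
        print(f"sharing {path} with collaborators")

def launch(name, *args):
    """The kernel's job: look up a registered plug-in and execute it."""
    PLUGINS[name]().run(*args)

launch("share-notebook", "analysis.ipynb")
```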

    An Application Programming Interface for Synthetic Snowflake Particle Structure and Scattering Data

    The work by Kuo and colleagues on growing synthetic snowflakes and calculating their single-scattering properties has demonstrated great potential to improve the retrievals of snowfall. To grant colleagues flexible and targeted access to their large collection of sizes and shapes at fifteen (15) microwave frequencies, we have developed a web-based Application Programming Interface (API) integrated with NASA Goddard's Precipitation Processing System (PPS) Group. It is our hope that the API will enable convenient programmatic utilization of the database. To help users better understand the API's capabilities, we have developed an interactive web interface called the OpenSSP API Query Builder, which implements an intuitive system of mechanisms for selecting shapes, sizes, and frequencies to generate queries, with which the API can then extract and return data from the database. The Query Builder also allows for the specification of normalized particle size distributions by setting pertinent parameters, with which the API can also return mean geometric and scattering properties for each size bin. Additionally, the Query Builder interface enables downloading of raw scattering and particle structure data packages. This presentation will describe some of the challenges and successes associated with developing such an API. Examples of its usage will be shown, both by downloading output and pulling it into a spreadsheet and by querying the API programmatically and working with the output in code.
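
    A sketch of programmatic querying is given below; the endpoint URL and parameter names are placeholders invented for illustration, not the documented OpenSSP interface, so the Query Builder should be consulted for the actual query format.

```python
# Hypothetical OpenSSP API query; BASE_URL and all parameter names are
# placeholders, not the real endpoint or schema.
import requests

BASE_URL = "https://example.nasa.gov/openssp/api/particles"   # placeholder URL

params = {
    "shape": "aggregate",       # assumed shape identifier
    "frequency_ghz": 35.6,      # one of the fifteen microwave frequencies
    "size_min_mm": 1.0,         # lower bound of the size selection
    "size_max_mm": 5.0,         # upper bound of the size selection
}

response = requests.get(BASE_URL, params=params, timeout=30)
response.raise_for_status()
records = response.json()       # e.g. per-particle structure and scattering properties
print(len(records), "records returned")
```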

    Leveraging Data Intensive Computing to Support Automated Event Services

    A large portion of Earth Science investigations is phenomenon- or event-based, such as the studies of Rossby waves, mesoscale convective systems, and tropical cyclones. However, except for a few high-impact phenomena, e.g. tropical cyclones, comprehensive records are absent for the occurrences or events of these phenomena. Phenomenon-based studies therefore often focus on a few prominent cases while the lesser ones are overlooked. Without an automated means to gather the events, comprehensive investigation of a phenomenon is at least time-consuming if not impossible. An Earth Science event (ES event) is defined here as an episode of an Earth Science phenomenon. A cumulus cloud, a thunderstorm shower, a rogue wave, a tornado, an earthquake, a tsunami, a hurricane, or an El Niño is each an episode of a named ES phenomenon, and, from the small and insignificant to the large and potent, all are examples of ES events. An ES event has a finite duration and an associated geolocation as a function of time; it is therefore an entity in four-dimensional (4D) spatiotemporal space. The interests of Earth scientists typically center on Earth Science phenomena with potential to cause massive economic disruption or loss of life, but broader scientific curiosity also drives the study of phenomena that pose no immediate danger. We generally gain understanding of a given phenomenon by observing and studying individual events, usually beginning by identifying the occurrences of these events. Once representative events are identified or found, we must locate associated observed or simulated data prior to commencing analysis and concerted studies of the phenomenon. Knowledge concerning the phenomenon can accumulate only after analysis has started. However, except for a few high-impact phenomena, such as tropical cyclones and tornadoes, finding events and locating associated data currently may take a prohibitive amount of time and effort on the part of an individual investigator. And even for these high-impact phenomena, the availability of comprehensive records is still only a recent development. A major reason for the lack of comprehensive records for the majority of the ES phenomena is the perception that they do not pose immediate and/or severe threats to life and property and are thus not consistently tracked, monitored, and catalogued. Many phenomena even lack commonly accepted criteria for their definitions. However, the lack of comprehensive records is also due to the increasingly prohibitive volume of observations and model data that must be examined. NASA's Earth Observing System Data and Information System (EOSDIS) alone archives several petabytes (PB) of satellite remote sensing data, and the volume steadily increases. All of these factors contribute to the difficulty of methodically identifying events corresponding to a given phenomenon and significantly impede systematic investigations. In the following we present a couple of motivating scenarios, demonstrating the issues faced by Earth scientists studying ES phenomena.
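
    The definition of an ES event as a 4D entity maps naturally onto a simple data structure; the sketch below (field names are assumptions, not taken from the text) records a finite-duration track of time-stamped geolocations.

```python
# Minimal sketch of an ES event as a finite-duration, time-stamped geolocation track.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TrackPoint:
    time: datetime
    lat: float              # degrees north
    lon: float              # degrees east

@dataclass
class ESEvent:
    phenomenon: str         # e.g. "mesoscale convective system"
    track: list             # list of TrackPoint: geolocation as a function of time

    @property
    def duration(self):
        """Finite duration of the event, from first to last track point."""
        return self.track[-1].time - self.track[0].time
```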

    Anticipated Changes in Conducting Scientific Data-Analysis Research in the Big-Data Era

    A Big-Data environment is one that is capable of orchestrating quick-turnaround analyses involving large volumes of data for numerous simultaneous users. Based on our experiences with a prototype Big-Data analysis environment, we anticipate some important changes in research behaviors and processes while conducting scientific data-analysis research in the near future as such Big-Data environments become mainstream. The first anticipated change will be the reduced effort and difficulty in most parts of the data management process. A Big-Data analysis environment is likely to house most of the data required for a particular research discipline along with appropriate analysis capabilities. This will reduce the need for researchers to download local copies of data. In turn, this also reduces the need for compute and storage procurement by individual researchers or groups, as well as the associated maintenance and management afterwards. It is almost certain that Big-Data environments will require a different "programming language" to fully exploit their latent potential. In addition, the process of extending the environment to provide new analysis capabilities will likely be more involved than, say, compiling a piece of new or revised code. We thus anticipate that researchers will require support from dedicated organizations associated with the environment that are composed of professional software engineers and data scientists. A major benefit will likely be that such extensions are of higher quality and broader applicability than ad hoc changes by physical scientists. Another anticipated significant change is improved collaboration among the researchers using the same environment. Since the environment is homogeneous within itself, many barriers to collaboration are minimized or eliminated. For example, data and analysis algorithms can be seamlessly shared, reused, and re-purposed. In conclusion, we will be able to achieve a new level of scientific productivity in Big-Data analysis environments.

    Assessing Deep Neural Networks as Probability Estimators

    Deep Neural Networks (DNNs) have performed admirably in classification tasks. However, the characterization of their classification uncertainties, required for certain applications, has been lacking. In this work, we investigate the issue by assessing DNNs' ability to estimate conditional probabilities and propose a framework for systematic uncertainty characterization. Denoting the input sample as x and the category as y, the classification task of assigning a category y to a given input x can be reduced to the task of estimating the conditional probabilities p(y|x), as approximated by the DNN at its last layer using the softmax function. Since softmax yields a vector whose elements all fall in the interval (0, 1) and sum to 1, it suggests a probabilistic interpretation of the DNN's outcome. Using synthetic and real-world datasets, we look into the impact of various factors, e.g., probability density f(x) and inter-categorical sparsity, on the precision of DNNs' estimations of p(y|x), and find that the likelihood probability density and the inter-categorical sparsity have greater impacts than the prior probability on DNNs' classification uncertainty. Comment: Accepted at the 2021 IEEE International Conference on Big Data (IEEE BigData 2021).
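
    The softmax reading of the last layer can be made concrete in a few lines; the sketch below (PyTorch assumed, logits invented for illustration) shows how last-layer outputs are mapped to a vector interpretable as an estimate of p(y|x).

```python
# Softmax over last-layer logits: elements fall in (0, 1) and sum to 1,
# so the vector can be read as the DNN's estimate of p(y|x).
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])     # last-layer outputs for one sample x
p_y_given_x = F.softmax(logits, dim=-1)       # approximately [0.786, 0.175, 0.039]
print(p_y_given_x, p_y_given_x.sum())         # probabilities and their sum (1.0)
```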

    Invariant Imbedded T-Matrix Method for Axial Symmetric Hydrometeors with Extreme Aspect Ratios

    The single-scattering properties (SSPs) of hydrometeors are the fundamental quantities for physics-based precipitation retrievals. Thus, efficient computation of their electromagnetic scattering is of great value. Whereas the semi-analytical T-Matrix methods are likely the most efficient for nonspherical hydrometeors with axial symmetry, they are not suitable for arbitrarily shaped hydrometeors lacking any significant symmetry, for which volume integral methods such as those based on the Discrete Dipole Approximation (DDA) are required. Currently, the two leading T-matrix methods are the Extended Boundary Condition Method (EBCM) and the Invariant Imbedding T-matrix Method incorporating Lorentz-Mie Separation of Variables (IITM+SOV). EBCM is known to outperform IITM+SOV for hydrometeors with modest aspect ratios. However, in cases when aspect ratios become extreme, such as needle-like particles with large height-to-diameter ratios, EBCM fails to converge. Such hydrometeors with extreme aspect ratios are known to be present in solid precipitation, and their SSPs are required to model the radiative responses accurately. In these cases, IITM+SOV is shown to converge. An efficient, parallelized C++ implementation of both EBCM and IITM+SOV has been developed to conduct a performance comparison between EBCM, IITM+SOV, and DDSCAT (a popular implementation of DDA). We present the comparison results and discuss details. Our intent is to release the combined EBCM/IITM+SOV software to the community under an open source license.

    MCRadar: A Monte Carlo Solver for Cloud and Precipitation Radar

    Multiple scattering produces anomalous echoes in observed radar profiles that cannot be explained by other phenomena. These effects are most obvious for spaceborne platforms, at shorter wavelengths, and in convection, as multiple scattering is governed by antenna beam width, optical depth, and albedo; however, multiple scattering has been observed in a range of precipitating conditions. To account for the effects of multiple scattering, Monte Carlo integration is employed in a flexible framework to enable arbitrary radar configurations with finite Gaussian beams for three-dimensional atmospheric scenarios. The three-dimensional nature of the model, coupled with the finite antenna response, also allows for consideration of nonuniform beam filling effects that often coincide with multiple scattering. Examples of airborne and spaceborne radars at various wavelengths are used to illustrate the effects of multiple scattering and nonuniform beam filling. The MCRadar code is currently available in the development version of ARTS.
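
    To illustrate why Monte Carlo integration is a natural fit for multiple scattering (this is a toy one-dimensional slab model with isotropic scattering, not MCRadar's finite-beam, three-dimensional formulation), the sketch below launches photons into a layer of prescribed optical depth and single-scattering albedo and tallies those returned toward the sensor by scattering order.

```python
# Toy 1-D Monte Carlo slab model (not MCRadar): exponential free paths,
# isotropic scattering, absorption handled by weighting with the albedo.
import math
import random

def backscatter_by_order(tau=2.0, albedo=0.9, n_photons=100_000, max_order=10):
    """Tally photon weight returned toward the sensor, binned by scattering order."""
    returned = [0.0] * (max_order + 1)
    for _ in range(n_photons):
        depth, mu, weight, order = 0.0, 1.0, 1.0, 0
        while order < max_order:                           # drop photons scattered too often
            depth += -math.log(1.0 - random.random()) * mu # exponential free path along mu
            if depth < 0.0:                                # exits back toward the sensor
                returned[order] += weight
                break
            if depth > tau:                                # exits the far side of the layer
                break
            order += 1                                     # a scattering event occurs
            weight *= albedo                               # account for absorption
            mu = random.uniform(-1.0, 1.0)                 # isotropic direction cosine
    return [w / n_photons for w in returned]

# returned[1] is the single-scatter echo; orders >= 2 are the multiple-scattering excess.
print(backscatter_by_order())
```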

    Tool for Automated Retrieval of Generic Event Tracks (TARGET)

    Methods have been developed to identify and track tornado-producing mesoscale convective systems (MCSs) automatically over the continental United States, in order to facilitate systematic studies of these powerful and often destructive events. Several data sources were combined to ensure event identification accuracy. Records of watches and warnings issued by the National Weather Service (NWS), and tornado locations and tracks from the Tornado History Project (THP), were used to locate MCSs in high-resolution precipitation observations and GOES infrared (11-micron) Rapid Scan Operation (RSO) imagery. Thresholds are then applied to the latter two data sets to define MCS events and track their development. MCSs produce a broad range of severe convective weather events that significantly affect the living conditions of the populations exposed to them. Understanding how MCSs grow and develop could help scientists improve their weather prediction models, and also provide tools to decision-makers whose goals are to protect populations and their property. Associating storm cells across frames of remotely sensed images poses a difficult problem because storms evolve, split, and merge. Any storm-tracking method should include the following processes: storm identification, storm tracking, and quantification of storm intensity and activity. The spatiotemporal coordinates of the tracks will enable researchers to obtain other coincident observations to conduct more thorough studies of these events. In addition to their tracked locations, their areal extents, precipitation intensities, and accumulations, all as functions of time, were also obtained and recorded for these events. All parameters so derived can be catalogued into a moving object database (MODB) for custom queries. The purpose of this software is to provide a generalized, cross-platform, pluggable tool for identifying events within a set of scientific data based upon specified criteria, with the possibility of storing identified events in a searchable database. The core of the application uses an implementation of the connected component labeling (CCL) algorithm to identify areas of interest, then uses a set of criteria to establish spatial and temporal relationships between identified components. The CCL algorithm is commonly used for identifying objects within images in computer vision; this application applies it to scientific data sets using arbitrary criteria. The most novel concept was applying a generalized CCL implementation to scientific data sets for establishing events both spatially and temporally. The combination of several existing concepts (pluggable components, generalized CCL algorithm, etc.) into one application is also novel. In addition, the design of the system, i.e., its extensibility with pluggable components and its configurability with a simple configuration file, is innovative. This allows the system to be applied to new scenarios with ease.
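
    A minimal sketch of the identify-then-link idea is shown below; the threshold, field values, and overlap-based linking rule are assumptions for illustration, not the TARGET configuration or code.

```python
# Connected-component labeling on a thresholded field, then linking of components
# across consecutive frames by footprint overlap. Values and threshold are synthetic.
import numpy as np
from scipy import ndimage

def label_frame(field, threshold):
    """Label contiguous regions where the field exceeds the threshold."""
    labels, n_components = ndimage.label(field > threshold)
    return labels, n_components

def link_frames(labels_t0, labels_t1):
    """Return (label_t0, label_t1) pairs whose footprints overlap between frames."""
    both = (labels_t0 > 0) & (labels_t1 > 0)
    return sorted({(int(a), int(b)) for a, b in zip(labels_t0[both], labels_t1[both])})

# Synthetic precipitation-like frames: one feature drifting one column to the east.
frame0 = np.zeros((8, 8)); frame0[2:5, 2:5] = 10.0
frame1 = np.zeros((8, 8)); frame1[2:5, 3:6] = 10.0
l0, _ = label_frame(frame0, threshold=5.0)
l1, _ = label_frame(frame1, threshold=5.0)
print(link_frames(l0, l1))   # [(1, 1)]: the same component persists across frames
```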