Map Generation from Large Scale Incomplete and Inaccurate Data Labels
Accurately and globally mapping human infrastructure is an important and challenging task, with applications in routing, regulation-compliance monitoring, and natural disaster response management. In this paper we present progress in developing an algorithmic pipeline and distributed compute system that automates the process of map creation using high-resolution aerial images. Unlike previous studies, most of which use datasets available only in a few cities across the world, we utilize publicly available imagery and map data, both of which cover the contiguous United States (CONUS). We approach the technical challenge of inaccurate and incomplete training data by adopting state-of-the-art convolutional neural network architectures such as the U-Net and the CycleGAN to incrementally generate maps with increasingly accurate and complete labels of man-made infrastructure such as roads and houses. Since scaling the mapping task to CONUS calls for parallelization, we then adopted an asynchronous distributed stochastic parallel gradient descent training scheme to distribute the computational workload onto a cluster of GPUs with nearly linear speed-up.
Comment: This paper is accepted by KDD 202
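The data-parallel training idea behind the scheme above can be sketched in a toy form: each "worker" computes a gradient on its own data shard, and the gradients are combined into one update. This is a minimal synchronous, single-process sketch on a synthetic linear-regression problem, not the paper's asynchronous GPU-cluster implementation; all data and hyperparameters here are illustrative.

```python
import random

def make_shards(n_workers=4, n_per_shard=50, seed=0):
    # Synthetic shards: each worker holds samples of y = 2x - 1.
    rng = random.Random(seed)
    true_w, true_b = 2.0, -1.0
    shards = []
    for _ in range(n_workers):
        xs = [rng.uniform(-1, 1) for _ in range(n_per_shard)]
        shards.append([(x, true_w * x + true_b) for x in xs])
    return shards

def shard_gradient(shard, w, b):
    # Mean-squared-error gradient computed locally on one worker's shard.
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += 2 * err * x / len(shard)
        gb += 2 * err / len(shard)
    return gw, gb

def train(shards, lr=0.1, steps=200):
    w = b = 0.0
    for _ in range(steps):
        grads = [shard_gradient(s, w, b) for s in shards]  # one per worker
        gw = sum(g[0] for g in grads) / len(grads)          # average gradients
        gb = sum(g[1] for g in grads) / len(grads)
        w -= lr * gw
        b -= lr * gb
    return w, b

w, b = train(make_shards())
print(round(w, 2), round(b, 2))  # converges close to the true (2.0, -1.0)
```

In the asynchronous variant the averaging barrier is dropped: each worker applies its gradient to the shared parameters as soon as it is ready, which is what allows the near-linear speed-up across a GPU cluster.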
Unlocking Insights into Crop Growth and Nutrient Distribution: A Geospatial Analysis Approach Using Satellite Imagery and Soil Data
Accurate monitoring of crop growth and nutrient distribution is crucial for optimizing agricultural practices, promoting a sustainable environment, and ensuring long-term food production. In this study, we propose a novel and comprehensive approach to monitor crop growth and nutrient distribution in large-scale agricultural landscapes. Our methodology combines advanced geospatial and temporal analysis techniques, providing valuable insights into the intricate relationships between crop health, soil nutrients, and other essential soil properties.
To monitor vegetation dynamics, we obtained data from the IBM EIS (Environment Intelligence Suite), processed it on our HPC (High-Performance Computing) infrastructure, and ingested it into our CRADLE (Common Research Analytics and Data Lifecycle Environment). The IBM EIS consists of vast amounts of geospatial data curated from diverse sources and readily available for analysis. Leveraging the Normalized Difference Vegetation Index (NDVI) and MODIS Aqua satellite imagery, we classify vegetation on a daily basis, yielding a detailed assessment of land use and growth. Additionally, by integrating MODIS Aqua data with USDA historical crop-planting data, we identify the dominant crops in each region and monitor their growth and health across Texas and Ohio during 2019.
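The NDVI computation at the core of this classification is the standard normalized difference of near-infrared and red reflectance. A minimal sketch follows; the pixel values and the 0.3 vegetation threshold are illustrative placeholders, not actual MODIS Aqua data or the study's tuned parameters.

```python
def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index, in [-1, 1]."""
    return (nir - red) / (nir + red + eps)

def classify(value, threshold=0.3):
    # Hedged rule of thumb: dense vegetation has high NIR and low red
    # reflectance; the real threshold would be calibrated per land-cover class.
    return "vegetated" if value > threshold else "non-vegetated"

pixels = [(0.45, 0.08), (0.20, 0.18)]  # hypothetical (nir, red) reflectances
for nir, red in pixels:
    v = ndvi(nir, red)
    print(round(v, 3), classify(v))
```

Applied per pixel per day, this yields the daily vegetation maps that are then cross-referenced with the crop-planting labels.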
To investigate soil properties and their influence on crop health, we utilize prominent soil databases from IBM EIS such as the Soil Survey Geographic Database (SSURGO) and the World Soil Information Service (WoSIS). These databases provide essential information on key soil properties, including pH, texture, water holding capacity, and organic carbon. By correlating these properties with soil nitrogen content, we can assess their interdependencies and infer their impacts on crop health. Furthermore, we analyze the correlation between crop health and nitrogen content, gaining valuable insights into the effects of soil nitrogen on crop well-being.
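The correlation analysis described above can be sketched with a plain Pearson correlation coefficient. The soil values below are made-up placeholders for illustration, not SSURGO or WoSIS records, and the study's actual statistical treatment may differ.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-site measurements: organic carbon (%) vs. nitrogen (%).
organic_carbon = [1.1, 2.3, 0.8, 3.0, 1.9]
nitrogen       = [0.09, 0.21, 0.07, 0.28, 0.17]
r = pearson(organic_carbon, nitrogen)
print(round(r, 3))  # near +1 for this strongly linear toy sample
```

The same coefficient, computed between NDVI-derived crop-health indicators and soil nitrogen, underlies the crop-health correlations reported in the study.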
By integrating remote sensing technology, soil science, and data science, this interdisciplinary study contributes to the development of sustainable agricultural management strategies. The findings of this research enhance food production capabilities and provide valuable information for policy decision-making, ultimately promoting environmental conservation within large-scale agricultural systems.
Learning and Recognizing Archeological Features from LiDAR Data
We present a remote sensing pipeline that processes LiDAR (Light Detection And Ranging) data through machine and deep learning for the application of archeological feature detection on big geo-spatial data platforms such as IBM PAIRS Geoscope.
Today, archeologists are overwhelmed by the task of visually surveying huge amounts of (raw) LiDAR data in order to identify areas of interest for inspection on the ground. We showcase a software system pipeline that results in significant savings in terms of expert productivity while missing only a small fraction of the artifacts.
Our work employs artificial neural networks in conjunction with an efficient spatial segmentation procedure based on domain knowledge. Data processing is constrained by a limited amount of training labels and noisy LiDAR signals due to vegetation cover and decay of ancient structures. We aim to identify geo-spatial areas with archeological artifacts in a supervised fashion, allowing the domain expert to flexibly tune parameters based on her needs.
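One expert-tunable segmentation step of this kind can be sketched as local-relief thresholding on a LiDAR-derived elevation grid: cells that rise above their neighborhood mean by more than a threshold become candidate areas for inspection. The grid and threshold below are illustrative, not the paper's actual procedure or data.

```python
def local_mean(dem, r, c):
    # Mean elevation over the 3x3 neighborhood, clipped at the grid edges.
    rows, cols = len(dem), len(dem[0])
    vals = [dem[i][j]
            for i in range(max(0, r - 1), min(rows, r + 2))
            for j in range(max(0, c - 1), min(cols, c + 2))]
    return sum(vals) / len(vals)

def candidate_cells(dem, threshold=0.5):
    # Flag cells standing out above their surroundings (e.g. mounds, walls).
    # `threshold` is the kind of parameter a domain expert would tune.
    hits = []
    for r in range(len(dem)):
        for c in range(len(dem[0])):
            if dem[r][c] - local_mean(dem, r, c) > threshold:
                hits.append((r, c))
    return hits

dem = [
    [10.0, 10.1, 10.0, 10.2],
    [10.1, 12.0, 10.1, 10.0],   # 12.0: a mound-like bump in flat terrain
    [10.0, 10.1, 10.0, 10.1],
]
print(candidate_cells(dem))  # → [(1, 1)]
```

In the full pipeline, regions flagged by such domain-knowledge-driven segmentation are then passed to the neural network for supervised classification, which is what keeps the expert's visual-survey workload small.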
Towards a Reference Architecture with Modular Design for Large-scale Genotyping and Phenotyping Data Analysis: A Case Study with Image Data
With the rapid advancement of computing technologies, various scientific research communities have been extensively using cloud-based software tools and applications. Cloud-based applications allow users to access software from web browsers while relieving them from installing any software in their desktop environment. For example, Galaxy, GenAP, and iPlant Collaborative are popular cloud-based systems for scientific workflow analysis in the domain of plant Genotyping and Phenotyping. These systems are being used for conducting research, devising new techniques, and sharing computer-assisted analysis results among collaborators. Researchers need to integrate their new workflows/pipelines, tools, or techniques with the base system over time. Moreover, large-scale data need to be processed within the timeline for more effective analysis. Recently, Big Data technologies have emerged to facilitate large-scale data processing with commodity hardware. Among the above-mentioned systems, GenAP utilizes Big Data technologies for specific cases only. The structure of such a cloud-based system is highly variable and complex in nature. Software architects and developers need to consider very different properties and challenges during the development and maintenance phases compared to traditional business/service-oriented systems. Recent studies report that software engineers and data engineers confront challenges in developing analytic tools that support large-scale and heterogeneous data analysis. Unfortunately, software researchers have given little focus to devising a well-defined methodology and frameworks for the flexible design of a cloud system for the Genotyping and Phenotyping domain. To that end, more effective design methodologies and frameworks are urgently needed for the development of cloud-based Genotyping and Phenotyping analysis systems that also support large-scale data processing.
In our thesis, we conduct a few studies in order to devise a stable reference architecture and modularity model for software developers and data engineers in the domain of Genotyping and Phenotyping. In the first study, we analyze the architectural changes of existing candidate systems to find out the stability issues. Then, we extract architectural patterns of the candidate systems and propose a conceptual reference architectural model. Finally, we present a case study on the modularity of computation-intensive tasks as an extension of the data-centric development. We show that the data-centric modularity model is at the core of the flexible development of a Genotyping and Phenotyping analysis system. Our proposed model and case study with thousands of images provide a useful knowledge base for software researchers, developers, and data engineers for cloud-based Genotyping and Phenotyping analysis system development.