
    Map Generation from Large Scale Incomplete and Inaccurate Data Labels

    Full text link
    Accurately and globally mapping human infrastructure is an important and challenging task with applications in routing, regulation compliance monitoring, and natural disaster response management. In this paper we present progress in developing an algorithmic pipeline and distributed compute system that automates the process of map creation using high-resolution aerial images. Unlike previous studies, most of which use datasets available only in a few cities across the world, we utilize publicly available imagery and map data, both of which cover the contiguous United States (CONUS). We approach the technical challenge of inaccurate and incomplete training data by adopting state-of-the-art convolutional neural network architectures, such as the U-Net and the CycleGAN, to incrementally generate maps with increasingly accurate and complete labels of man-made infrastructure such as roads and houses. Since scaling the mapping task to CONUS calls for parallelization, we adopted an asynchronous distributed stochastic parallel gradient descent training scheme to distribute the computational workload onto a cluster of GPUs with nearly linear speed-up. Comment: This paper is accepted by KDD 202
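The asynchronous parallel gradient descent idea the abstract mentions can be sketched in miniature: several workers read a shared parameter, compute a stochastic gradient on their own samples, and write updates back without coordinating (Hogwild-style). This is a minimal illustrative sketch on a toy linear model, not the paper's actual GPU training system; the data, learning rate, and worker count are assumptions.

```python
import threading
import random

def make_data(n=2000, true_w=3.0, seed=0):
    # Toy regression data y = true_w * x + noise.
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [true_w * x + rng.gauss(0, 0.01) for x in xs]
    return xs, ys

def worker(shared, xs, ys, steps, lr=0.1):
    # Each worker repeatedly reads the current weight, computes a
    # single-sample gradient, and writes the update back without a lock
    # (asynchronous, possibly lossy updates).
    n = len(xs)
    rng = random.Random(threading.get_ident())
    for _ in range(steps):
        i = rng.randrange(n)
        w = shared["w"]
        grad = 2.0 * (w * xs[i] - ys[i]) * xs[i]
        shared["w"] = w - lr * grad

xs, ys = make_data()
shared = {"w": 0.0}
threads = [threading.Thread(target=worker, args=(shared, xs, ys, 2000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared["w"])  # converges near the true weight 3.0
```

Occasional lost updates between workers slow convergence slightly but do not prevent it, which is why such asynchronous schemes can scale with near-linear speed-up.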

    Unlocking Insights into Crop Growth and Nutrient Distribution: A Geospatial Analysis Approach Using Satellite Imagery and Soil Data

    Get PDF
    Accurate monitoring of crop growth and nutrient distribution is crucial for optimizing agricultural practices, promoting a sustainable environment, and ensuring long-term food production. In this study, we propose a novel and comprehensive approach to monitor crop growth and nutrient distribution in large-scale agricultural landscapes. Our methodology combines advanced geospatial and temporal analysis techniques, providing valuable insights into the intricate relationships between crop health, soil nutrients, and other essential soil properties. To monitor vegetation dynamics, we obtained data from the IBM EIS (Environment Intelligence Suite) and processed it on our High-Performance Computing (HPC) infrastructure, ingesting the results into our Common Research Analytics and Data Lifecycle Environment (CRADLE). The IBM EIS consists of vast amounts of geospatial data curated from diverse sources, readily available for analysis. Leveraging the Normalized Difference Vegetation Index (NDVI) algorithm and MODIS Aqua satellite imagery, we classified vegetation on a daily basis, yielding a detailed assessment of land use and growth. Additionally, by integrating MODIS Aqua data with USDA historical crop planting data, we identify the dominant crops in each region and monitor their growth and health across Texas and Ohio during 2019. To investigate soil properties and their influence on crop health, we utilize prominent soil databases from IBM EIS such as the Soil Survey Geographic Database (SSURGO) and the World Soil Information Service (WoSIS). These databases provide essential information on key soil properties, including pH, texture, water holding capacity, and organic carbon. By correlating these properties with soil nitrogen content, we can assess their interdependencies and infer their impacts on crop health.
Furthermore, we analyze the correlation between crop health and nitrogen content, gaining valuable insights into the effects of soil nitrogen on crop well-being. By integrating remote sensing technology, soil science, and data science, this interdisciplinary study contributes to the development of sustainable agricultural management strategies. The findings of this research enhance food production capabilities and provide valuable information for policy decision-making, ultimately promoting environmental conservation within large-scale agricultural systems.
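The NDVI computation at the core of the vegetation classification is the standard normalized band ratio NDVI = (NIR − Red) / (NIR + Red), which ranges from −1 to 1. The sketch below is illustrative only: the band reflectances and the classification thresholds are assumptions, not the study's actual MODIS-derived values or classes.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    denom = nir + red
    if denom == 0:
        return 0.0  # guard against division by zero on no-signal pixels
    return (nir - red) / denom

def classify(value):
    # Simple illustrative thresholds, not MODIS product classes.
    if value < 0.1:
        return "bare/water"
    if value < 0.4:
        return "sparse vegetation"
    return "dense vegetation"

# A healthy-vegetation pixel reflects strongly in NIR and weakly in red:
pixel = ndvi(nir=0.55, red=0.08)
print(round(pixel, 3), classify(pixel))  # → 0.746 dense vegetation
```

Applied per pixel per day over a MODIS tile, this index yields the daily vegetation maps described above.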

    Learning and Recognizing Archeological Features from LiDAR Data

    Full text link
    We present a remote sensing pipeline that processes LiDAR (Light Detection And Ranging) data through machine and deep learning for the application of archeological feature detection on big geo-spatial data platforms such as IBM PAIRS Geoscope. Today, archeologists are overwhelmed by the task of visually surveying huge amounts of (raw) LiDAR data in order to identify areas of interest for inspection on the ground. We showcase a software system pipeline that results in significant savings in terms of expert productivity while missing only a small fraction of the artifacts. Our work employs artificial neural networks in conjunction with an efficient spatial segmentation procedure based on domain knowledge. Data processing is constrained by a limited amount of training labels and noisy LiDAR signals due to vegetation cover and decay of ancient structures. We aim at identifying geo-spatial areas with archeological artifacts in a supervised fashion, allowing the domain expert to flexibly tune parameters based on their needs.
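One common way LiDAR-derived elevation data is preprocessed to expose the subtle earthworks such a pipeline looks for is a local relief model: each cell's elevation minus the mean elevation of its neighborhood, which suppresses regional terrain and highlights small anomalies. This is a hedged sketch of that general technique, not the paper's actual segmentation procedure; the synthetic DEM and the window radius are assumptions.

```python
def local_relief(dem, radius=1):
    # dem: 2D list of elevations. Returns elevation minus the mean of a
    # (2*radius+1)^2 window, clipped at the grid edges.
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = []
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        vals.append(dem[rr][cc])
            out[r][c] = dem[r][c] - sum(vals) / len(vals)
    return out

# Flat 5x5 terrain with a faint 0.3 m mound in the center, the kind of
# low-amplitude feature that is hard to spot in raw elevation values:
dem = [[10.0] * 5 for _ in range(5)]
dem[2][2] = 10.3
relief = local_relief(dem)
print(round(relief[2][2], 3))  # the mound appears as a positive anomaly
```

Feeding such relief rasters, rather than raw elevations, to a neural network is one way to cope with the noisy signals and decayed structures the abstract describes.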

    Towards a Reference Architecture with Modular Design for Large-scale Genotyping and Phenotyping Data Analysis: A Case Study with Image Data

    Get PDF
    With the rapid advancement of computing technologies, various scientific research communities have been extensively using cloud-based software tools or applications. Cloud-based applications allow users to access software from web browsers while relieving them from installing any applications in their desktop environment. For example, Galaxy, GenAP, and iPlant Collaborative are popular cloud-based systems for scientific workflow analysis in the domain of plant Genotyping and Phenotyping. These systems are being used for conducting research, devising new techniques, and sharing computer-assisted analysis results among collaborators. Researchers need to integrate their new workflows/pipelines, tools, or techniques with the base system over time. Moreover, large-scale data need to be processed within the required timeline for more effective analysis. Recently, Big Data technologies have been emerging to facilitate large-scale data processing with commodity hardware. Among the above-mentioned systems, GenAP utilizes Big Data technologies for specific cases only. The structure of such a cloud-based system is highly variable and complex in nature. Software architects and developers need to consider totally different properties and challenges during the development and maintenance phases compared to traditional business/service-oriented systems. Recent studies report that software engineers and data engineers confront challenges in developing analytic tools for supporting large-scale and heterogeneous data analysis. Unfortunately, software researchers have given little focus to devising a well-defined methodology and frameworks for flexible design of a cloud system for the Genotyping and Phenotyping domain. To that end, more effective design methodologies and frameworks are urgently needed for developing cloud-based Genotyping and Phenotyping analysis systems that also support large-scale data processing.
In our thesis, we conduct a few studies in order to devise a stable reference architecture and modularity model for software developers and data engineers in the domain of Genotyping and Phenotyping. In the first study, we analyze the architectural changes of existing candidate systems to find out the stability issues. Then, we extract architectural patterns of the candidate systems and propose a conceptual reference architectural model. Finally, we present a case study on the modularity of computation-intensive tasks as an extension of the data-centric development. We show that the data-centric modularity model is at the core of the flexible development of a Genotyping and Phenotyping analysis system. Our proposed model and case study with thousands of images provide a useful knowledge base for software researchers, developers, and data engineers for cloud-based Genotyping and Phenotyping analysis system development.
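The data-centric modularity idea can be illustrated with a small sketch: computation-intensive steps are pluggable modules sharing a common data interface, so a new tool can be integrated without changing the pipeline core. This is an illustrative sketch of the general pattern, not the thesis's actual architecture; the module names and record schema are hypothetical.

```python
from typing import Callable, Dict, List

# Each module consumes and produces the same data shape, so modules
# compose freely and can be swapped or extended independently.
Record = Dict[str, float]
Module = Callable[[List[Record]], List[Record]]

def run_pipeline(data: List[Record], modules: List[Module]) -> List[Record]:
    # The core only moves data between modules; all domain logic lives
    # in the replaceable modules themselves.
    for module in modules:
        data = module(data)
    return data

def normalize(records: List[Record]) -> List[Record]:
    # Hypothetical module: scale values into [0, 1].
    hi = max(r["value"] for r in records)
    return [{**r, "value": r["value"] / hi} for r in records]

def threshold(records: List[Record], cut: float = 0.5) -> List[Record]:
    # Hypothetical module: keep records above a cutoff.
    return [r for r in records if r["value"] >= cut]

result = run_pipeline(
    [{"value": 2.0}, {"value": 8.0}, {"value": 10.0}],
    [normalize, threshold],
)
print(result)  # → [{'value': 0.8}, {'value': 1.0}]
```

Keeping the shared data contract at the center, rather than the individual tools, is what lets an analysis system absorb new workflows over time.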