
    Incremental Principal Component Analysis Based Outliers Detection Methods for Spatiotemporal Data Streams

    In this paper, we address outliers in spatiotemporal data streams obtained from sensors placed across geographically distributed locations. Outliers may appear in such sensor data for various reasons, such as instrumental error and environmental change. Real-time detection of these outliers is essential to prevent errors from propagating into subsequent analyses and results. Incremental Principal Component Analysis (IPCA) is one possible approach for detecting outliers in such spatiotemporal data streams. IPCA has been widely used in many real-time applications such as credit card fraud detection, pattern recognition, and image analysis. However, the suitability of IPCA for outlier detection in spatiotemporal data streams is unknown and needs to be investigated. To fill this research gap, this paper contributes two new IPCA-based outlier detection methods and a comparative analysis with existing IPCA-based outlier detection methods to assess their suitability for spatiotemporal sensor data streams.
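
    The two new methods are not detailed in the abstract; the following is a minimal sketch of the general IPCA approach using scikit-learn's IncrementalPCA, flagging stream points whose reconstruction error in the learned principal subspace is anomalously large. The stream simulator and threshold rule are illustrative assumptions, not the paper's methods.

        # Minimal sketch: reconstruction-error outlier flagging with incremental PCA.
        # Illustrative only; the paper's two new methods are not given in the abstract.
        import numpy as np
        from sklearn.decomposition import IncrementalPCA

        rng = np.random.default_rng(0)

        def stream_batches(n_batches=20, batch_size=100, n_sensors=8):
            """Simulate a spatiotemporal stream: correlated sensors plus rare spikes."""
            for _ in range(n_batches):
                base = rng.normal(size=(batch_size, 1))
                batch = base + 0.1 * rng.normal(size=(batch_size, n_sensors))
                spikes = rng.random(batch_size) < 0.01   # ~1% instrumental errors
                batch[spikes] += rng.normal(5.0, 1.0, size=(spikes.sum(), n_sensors))
                yield batch

        ipca = IncrementalPCA(n_components=2)
        threshold = None
        for i, batch in enumerate(stream_batches()):
            if i < 3:                                    # warm-up: fit only
                ipca.partial_fit(batch)
                continue
            # Reconstruction error in the current principal subspace
            err = np.linalg.norm(batch - ipca.inverse_transform(ipca.transform(batch)), axis=1)
            if threshold is None:
                threshold = err.mean() + 3 * err.std()
            outliers = err > threshold
            print(f"batch {i}: {outliers.sum()} outlier(s) flagged")
            if (~outliers).sum() >= 2:                   # update the model on clean rows only
                ipca.partial_fit(batch[~outliers])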

    Detection of inconsistencies in geospatial data with geostatistics

    Almost every researcher has encountered observations that “drift” from the rest of the sample, suggesting some inconsistency. The aim of this paper is to propose a new method for detecting inconsistent data in continuous geospatial data, based on geostatistics and independent of the cause of the inconsistency (measurement and execution errors, or the inherent variability of the data). Geostatistics was chosen for its desirable properties, such as the avoidance of systematic errors. A new inconsistency detection method is needed because some existing methods used on geospatial data rely on theoretical assumptions that are rarely satisfied. Likewise, the choice of data set reflects the importance of LiDAR (Light Detection and Ranging) technology in the production of Digital Elevation Models (DEMs). With the new methodology it was possible to detect and map discrepant data. A comparison with a widely used detection method, the boxplot, confirmed the importance and usefulness of the new method, since the boxplot did not flag any data as discrepant. The proposed method identified, on average, 1.2% of the data as possible regionalized lower outliers and, on average, 1.4% as possible regionalized upper outliers, relative to the data set used in the study.
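
    The abstract does not specify the geostatistical estimator. The following sketch illustrates the underlying idea (flag observations that deviate strongly from what their spatial neighbourhood predicts), with inverse-distance weighting standing in for a kriging-based estimate; the data, neighbourhood size, and threshold are invented for illustration.

        # Illustrative sketch: flag spatially inconsistent points by comparing each
        # observation with an estimate from its neighbours. Inverse-distance
        # weighting stands in for the paper's geostatistical estimator.
        import numpy as np
        from scipy.spatial import cKDTree

        def regionalized_outliers(xy, z, k=8, n_sigma=3.0):
            """Return boolean masks of lower and upper spatial outliers."""
            tree = cKDTree(xy)
            dist, idx = tree.query(xy, k=k + 1)   # first neighbour is the point itself
            dist, idx = dist[:, 1:], idx[:, 1:]
            w = 1.0 / np.maximum(dist, 1e-9) ** 2  # inverse-distance weights
            z_hat = (w * z[idx]).sum(axis=1) / w.sum(axis=1)
            resid = z - z_hat
            s = resid.std()
            return resid < -n_sigma * s, resid > n_sigma * s

        # Synthetic DEM-like surface with a few gross elevation errors
        rng = np.random.default_rng(1)
        xy = rng.uniform(0, 1000, size=(5000, 2))
        z = 50 + 0.01 * xy[:, 0] + rng.normal(0, 0.5, 5000)
        bad = rng.choice(5000, 50, replace=False)
        z[bad] += rng.choice([-10.0, 10.0], size=50)
        low, high = regionalized_outliers(xy, z)
        print(f"{low.mean():.1%} lower, {high.mean():.1%} upper outliers flagged")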

    The Influence of Measurement Scale and Uncertainty on Interpretations of River Migration

    Environmental scientists increasingly use remotely sensed images to measure how rivers develop over time and respond to upstream changes in environmental drivers such as land use, urbanization, deforestation, and agricultural practices. These measurements are subject to uncertainty that can bias conclusions. The first step toward accurate interpretation of river channel change is properly quantifying and accounting for the uncertainty involved in measuring changes in river morphology. In Chapter 2 we develop a comprehensive framework for quantifying uncertainty in measurements of river change derived from aerial images. The framework builds upon previous uncertainty research by describing best practices and context-specific strategies, comparing each approach, and outlining how best to handle measurements that fall below the minimum level of detection. We use this framework in subsequent chapters to reduce the impact of erroneous measurements. Chapter 3 evaluates how the time interval between aerial images influences the rates at which river channels appear to migrate laterally across their floodplains. Multiple lines of evidence indicate that river migration measurements obtained over longer time intervals (20+ years) will underestimate the ‘true’ rate because the river channel is more likely to have reversed its direction of migration, which erases part of the record of gross erosion as seen from aerial images. If the images do not capture channel reversals and periodic episodes of fast erosion, the river appears to have migrated a shorter distance (which corresponds to a slower rate) than it did in reality. Obtaining multiple measurements over shorter time intervals (< 5 years) and limiting direct comparisons to similar time intervals can reduce bias when inferring how river migration rates may have changed over time. Chapter 4 explores the physical processes governing the relationship between river curvature and the rate of river migration along a series of meander bends. We used fine-scale empirical measurements and geospatial analyses to confirm theory and models indicating that migration and curvature exhibit a monotonic relationship. The results will improve models that seek to emulate river meander migration patterns.
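
    Chapter 2's framework is not reproduced in the abstract. One common ingredient of such frameworks is a minimum level of detection (LoD) that combines co-registration and digitizing errors in quadrature, below which measured change is censored as noise; the following sketch works under that assumption, with illustrative (not the dissertation's) error values.

        # Sketch: censor channel-change measurements below a minimum level of
        # detection (LoD). Error magnitudes here are illustrative assumptions.
        import numpy as np

        def level_of_detection(georef_rmse_1, georef_rmse_2, digitizing_err, z=1.96):
            """Combine independent error sources in quadrature (metres)."""
            return z * np.sqrt(georef_rmse_1**2 + georef_rmse_2**2 + digitizing_err**2)

        def migration_rate(displacement_m, years, lod_m):
            """Rate in m/yr; measurements below the LoD are censored to NaN."""
            displacement_m = np.asarray(displacement_m, dtype=float)
            return np.where(displacement_m >= lod_m, displacement_m / years, np.nan)

        lod = level_of_detection(2.5, 3.0, 1.0)   # hypothetical aerial-image errors
        print(f"LoD = {lod:.1f} m")
        print(migration_rate([3.0, 12.5, 40.0], years=10, lod_m=lod))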

    A Multi-Stage Machine Learning Approach to Predict Dengue Incidence: A Case Study in Mexico

    The mosquito-borne dengue fever is a major public health problem in tropical countries, where it is strongly conditioned by climate factors such as temperature. In this paper, we formulate a holistic machine learning strategy to analyze the temporal dynamics of temperature and dengue data and to use this knowledge to produce accurate annual-scale predictions of dengue incidence from temperature. The temporal dynamics are extracted from historical data with a novel multi-stage combination of auto-encoding, window-based data representation, and trend-based temporal clustering. Prediction is performed with a trend association-based nearest-neighbour predictor. The effectiveness of the proposed strategy is evaluated in a case study comprising the numbers of dengue and dengue hemorrhagic fever cases collected over the period 1985-2010 in 32 federal states of Mexico. The empirical study demonstrates the viability of the proposed strategy and confirms that it outperforms various state-of-the-art competitors formulated both as regression and as time series forecasting methods.
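
    Only the stage names are given in the abstract. The following is a compressed, synthetic-data sketch of the pipeline's flavour (window-based representation, trend clustering, nearest-neighbour prediction); the auto-encoding stage is omitted and all data, window sizes, and cluster counts are invented.

        # Compressed, illustrative sketch of the pipeline's flavour. All data are
        # synthetic; the paper's auto-encoding stage is omitted for brevity.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.neighbors import KNeighborsRegressor

        rng = np.random.default_rng(2)
        n_states, n_weeks = 32, 52
        temperature = rng.normal(25, 3, size=(n_states, n_weeks))  # one year per state
        dengue_cases = 100 + 10 * temperature.mean(axis=1) + rng.normal(0, 20, n_states)

        # Window-based representation: summarize each series by per-window means
        windows = temperature.reshape(n_states, 13, 4).mean(axis=2)  # 13 four-week windows

        # Trend-based clustering groups states with similar temperature dynamics
        labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(windows)
        print("states per trend cluster:", np.bincount(labels))

        # Nearest-neighbour prediction within the trend representation
        knn = KNeighborsRegressor(n_neighbors=3).fit(windows[:24], dengue_cases[:24])
        print(np.round(knn.predict(windows[24:])), np.round(dengue_cases[24:]))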

    Multisensor Fusion Remote Sensing Technology For Assessing Multitemporal Responses In Ecohydrological Systems

    Earth's ecosystems and environment have been changing rapidly as a result of human development and technology, and the impacts of human activities are difficult to evaluate because of the pace of these changes. Remote sensing (RS) technology has been adopted for environmental management: it is widely used to measure and monitor the Earth's environment and its changes, it allows large-scale measurements over a large region within a very short period of time, and continuous, repeatable measurement is one of its indispensable features. Soil moisture is a critical element of the hydrological cycle, especially in semiarid and arid regions. Characterizing the soil moisture distribution of a vast watershed from point measurements is difficult because soil moisture patterns can vary greatly in time and space. Space-borne radar imaging satellites are popular because they can observe in all weather conditions, yet methods for estimating soil moisture from active or passive satellite imagery remain uncertain. This study presents a systematic soil moisture estimation method for the Choke Canyon Reservoir Watershed (CCRW), a semiarid watershed with an area of over 14,200 km2 in south Texas. With the aid of five corner reflectors, RADARSAT-1 Synthetic Aperture Radar (SAR) images of the study area acquired in April and September 2004 were first processed with both radiometric and geometric calibration. New soil moisture estimation models derived with a genetic programming (GP) technique were then developed and applied to support the analysis of the soil moisture distribution. The GP-based nonlinear function derived in the evolutionary process uniquely links a series of crucial topographic and geographic features, including slope, aspect, vegetation cover, and soil permeability, to complement the well-calibrated SAR data. The research indicates that this novel application of GP is useful for generating a highly nonlinear regression structure that exhibits statistically strong correlations between the model estimates and ground-truth measurements (volumetric water content) on unseen data sets. Producing soil moisture distributions across seasons ultimately allows local- to regional-scale soil moisture variability to be characterized and the water storage of the terrestrial hydrosphere to be estimated. A new evolutionary computational, supervised classification scheme (the Riparian Classification Algorithm, RICAL) was developed and used to identify temporal and spatial change in the riparian zones of a semi-arid watershed. The case study demonstrates an effort to incorporate both vegetation indices and soil moisture estimates, based on Landsat 5 TM and RADARSAT-1 imagery, to improve riparian classification in the Choke Canyon Reservoir Watershed (CCRW), South Texas. The CCRW, mostly agricultural and range land in a semi-arid coastal environment, was selected as the study area contributing to the reservoir. This makes change detection in riparian buffers significant, owing to their capacity to intercept non-point source impacts within the buffer zones and to maintain ecosystem integrity region-wide.
    The soil moisture estimates derived from RADARSAT-1 Synthetic Aperture Radar (SAR) satellite imagery, as previously developed, were used. Eight commonly used vegetation indices were calculated from the reflectance obtained from Landsat 5 TM satellite images. Each vegetation index was used to classify vegetation cover in association with a genetic programming algorithm. The soil moisture and vegetation indices were integrated with the Landsat TM images on a per-pixel, per-channel basis for riparian classification. Two classification algorithms were used: genetic programming, and a combination of ISODATA and maximum likelihood supervised classification. The white-box nature of genetic programming revealed the relative contribution of all input parameters. The GP algorithm achieved more than 90% accuracy on unseen ground data using a vegetation index together with Landsat reflectance bands 1, 2, 3, and 4. Detecting changes in the buffer zone proved technically feasible with high accuracy. Overall, the development of the RICAL algorithm may lead to more effective management strategies for non-point source pollution control, bird habitat monitoring, and grazing and livestock management. Soil properties, landscapes, channels, fault lines, erosion and deposition patches, and bedload transport history reflect the geologic and geomorphologic features of a watershed. In response to these unique characteristics, the hydrology of large-scale watersheds is often very complex. Precipitation, infiltration and percolation, streamflow, plant transpiration, soil moisture change, and groundwater recharge are intimately related, forming the water balance dynamics at the surface of these watersheds. This chapter presents an optimal site selection technology that uses a grey integer programming (GIP) model to assimilate remote sensing-based geo-environmental patterns in an uncertain environment subject to technical and resource constraints. It enables us to retrieve hydrological trends and pinpoint the most critical locations for deploying monitoring stations in a vast watershed. The geo-environmental information amassed in this study includes soil permeability, surface temperature, soil moisture, precipitation, leaf area index (LAI), and normalized difference vegetation index (NDVI). With the aid of the remote sensing-based GIP analysis, only five locations out of more than 800 candidate sites were selected by the spatial analysis and then confirmed by a field investigation. The methodology developed in this analysis should significantly advance the state of the art in the optimal arrangement and distribution of water sensor platforms for maximum sensing coverage and information-extraction capacity. Effective water resources management is a critically important priority across the globe. While water scarcity limits the use of water in many ways, floods have also caused much damage and loss of life. To use a limited amount of water more efficiently, or to provide adequate lead time for flood warning, we sought advanced techniques for improving streamflow forecasting. The objective of this part of the research is to incorporate sea surface temperature (SST), Next Generation Radar (NEXRAD), and meteorological characteristics with historical stream data to forecast actual streamflow using genetic programming.
    This case study concerns forecasting the stream discharge of a complex-terrain, semi-arid watershed. It relates microclimatological factors to the resulting streamflow rate of the river system, given the influence of dynamic basin features such as soil moisture, soil temperature, ambient relative humidity, air temperature, sea surface temperature, and precipitation. The forecasting results are evaluated in terms of the percentage error (PE), the root-mean-square error (RMSE), and the square of the Pearson product-moment correlation coefficient (the r-squared value). The developed models can predict streamflow with very good accuracy, with an r-squared of 0.84 and a PE of 1% for a 30-day prediction.
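
    The abstract names its evaluation metrics; the sketch below shows how they are commonly computed. The study's exact PE definition is not given, so the percentage error of the totals is one assumed reading, and the numbers are invented.

        # Sketch of the named evaluation metrics on synthetic data. "PE" here is
        # an assumed definition (percentage error of totals).
        import numpy as np

        def evaluate(observed, predicted):
            observed, predicted = np.asarray(observed), np.asarray(predicted)
            rmse = np.sqrt(np.mean((observed - predicted) ** 2))
            pe = 100 * abs(predicted.sum() - observed.sum()) / observed.sum()
            r = np.corrcoef(observed, predicted)[0, 1]
            return {"PE_%": pe, "RMSE": rmse, "r_squared": r**2}

        obs = np.array([12.0, 15.5, 9.8, 20.1, 18.4])    # streamflow, e.g. m3/s
        pred = np.array([11.5, 16.0, 10.4, 19.0, 18.9])
        print(evaluate(obs, pred))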

    Rethinking the Polar Cap: Eccentric Dipole Structuring of ULF Power at the Highest Corrected Geomagnetic Latitudes

    The day-to-day evolution and statistical features of Pc3-Pc7 band ultralow frequency (ULF) power throughout the southern polar cap suggest that corrected geomagnetic (CGM) coordinates do not adequately organize the observed hydromagnetic spatial structure. It is shown that the local-time distributions of ULF power at sites along CGM latitudinal parallels exhibit fundamental differences and that the CGM latitude of a site is in general not indicative of the site's projection into the magnetosphere. Thus, ULF characteristics observed at a single site in the polar cap cannot be freely generalized to other sites at similar CGM latitude but separated in magnetic local time, and the inadequacy of CGM coordinates in the polar cap has implications for conjugacy and mapping studies in general. In seeking alternative, observationally motivated systems of “polar cap latitudes,” it is found that eccentric dipole (ED) coordinates have several strengths in organizing the hydromagnetic spatial structure in the polar cap region. ED latitudes appear to better classify the local-time ULF power in both magnitude and morphology and to better differentiate the “deep polar cap” (where the ULF power is largely UT dependent and nearly free of local-time structure) from the “peripheral polar cap” (where near-magnetic-noon pulsations dominate at lower and lower frequencies as ED latitude increases). Eccentric local time is shown to better align the local-time profiles in the magnetic east component over several PcX bands but to worsen the alignment in the magnetic north component. It is suggested that a hybrid ED-CGM coordinate system might capture the strengths of both CGM and ED coordinates. It is shown that the local-time morphology of median ULF power at high-latitude sites is dominantly driven by where they project into the magnetosphere, which is best quantified by their proximity to the low-altitude cusp on the dayside (and not necessarily by a site's CGM latitude), and that variations in the local-time morphology at sites of similar ED latitude are due both to geographic local-time control (relative amplification or damping by the diurnal variation in local ionospheric conductivity) and to geomagnetic coastal effects (enhanced power in a coastally mediated direction). Regardless of cause, it is emphasized that the application of CGM latitudes in the polar cap region is not entirely meaningful and likely should be dispensed with in favor of a scheme in better accord with the observed hydromagnetic spatial structure.
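
    The study's processing chain is not given in the abstract. As a hedged sketch of the basic diagnostic its claims rest on (the local-time morphology of median band power at a site), the following bins synthetic ULF power by local-time hour; real work would additionally require measured magnetometer spectra and IGRF-based CGM and eccentric-dipole coordinate transformations.

        # Sketch: median local-time profile of band-limited ULF power at one site.
        # Input data are synthetic; the noon peak mimics cusp-proximity behavior.
        import numpy as np

        rng = np.random.default_rng(3)
        hours = rng.uniform(0, 24, 50_000)        # magnetic local time of samples
        power = np.exp(-((hours - 12) / 4) ** 2) + 0.1 * rng.random(50_000)

        def local_time_profile(mlt, band_power, n_bins=24):
            """Median power in each one-hour local-time bin."""
            bins = np.floor(mlt).astype(int) % n_bins
            return np.array([np.median(band_power[bins == b]) for b in range(n_bins)])

        profile = local_time_profile(hours, power)
        print("peak bin (hour):", profile.argmax(), "median power:", profile.max().round(2))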

    The Challenge of Machine Learning in Space Weather Nowcasting and Forecasting

    The numerous recent breakthroughs in machine learning (ML) make it imperative to carefully ponder how the scientific community can benefit from a technology that, although not necessarily new, is today living its golden age. This Grand Challenge review paper focuses on the present and future role of machine learning in space weather. The purpose is twofold. On the one hand, we discuss previous works that use ML for space weather forecasting, focusing in particular on the few areas that have seen the most activity: the forecasting of geomagnetic indices, of relativistic electrons at geosynchronous orbit, of solar flare occurrence, of coronal mass ejection propagation time, and of solar wind speed. On the other hand, this paper serves as a gentle introduction to the field of machine learning tailored to the space weather community and as a pointer to a number of open challenges that we believe the community should undertake in the next decade. The recurring themes throughout the review are the need to shift our forecasting paradigm to a probabilistic approach focused on the reliable assessment of uncertainties, and the combination of physics-based and machine learning approaches, known as gray-box modeling.
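
    The review's call for probabilistic forecasts with reliably assessed uncertainties can be illustrated with a standard verification metric. A minimal sketch on synthetic data follows; the Brier score is one common choice for binary events (e.g. flare / no flare), not necessarily the review's recommended metric.

        # Sketch: verifying a probabilistic binary forecast with the Brier score
        # and a skill score against climatology. Data are synthetic.
        import numpy as np
        from sklearn.metrics import brier_score_loss

        rng = np.random.default_rng(4)
        event = rng.random(1000) < 0.2            # observed outcomes (20% base rate)
        # A "forecaster" that is informative but imperfectly calibrated
        prob = np.clip(0.2 + 0.5 * (event - 0.2) + rng.normal(0, 0.15, 1000), 0, 1)

        bs = brier_score_loss(event, prob)
        bs_climatology = brier_score_loss(event, np.full(1000, event.mean()))
        skill = 1 - bs / bs_climatology           # Brier skill score vs climatology
        print(f"Brier score: {bs:.3f}, skill over climatology: {skill:.2f}")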

    Evaluating Coastal Landscape Response to Sea-Level Rise in the Northeastern United States - Approach and Methods

    The U.S. Geological Survey is examining the effects of future sea-level rise on the coastal landscape from Maine to Virginia by producing spatially explicit, probabilistic predictions using sea-level projections, vertical land movement rates (due to isostasy), elevation data, and land-cover data. The sea-level-rise scenarios used as model inputs are generated from multiple sources of information, including Coupled Model Intercomparison Project Phase 5 models following representative concentration pathways 4.5 and 8.5 in the Intergovernmental Panel on Climate Change Fifth Assessment Report. A Bayesian network is used to develop a predictive coastal response model that integrates the sea-level, elevation, and land-cover data with assigned probabilities that account for interactions with coastal geomorphology as well as the corresponding ecological and societal systems it supports. The effects of sea-level rise are presented as (1) the level of landscape submergence and (2) the coastal response type, characterized as either static (that is, inundation) or dynamic (that is, landform or landscape change). Results are produced at a spatial scale of 30 meters for four decades (the 2020s, 2030s, 2050s, and 2080s). The probabilistic predictions can be applied to landscape management decisions based on sea-level-rise effects as well as to assessments of prediction uncertainty and of the need for improved data or fundamental understanding. This report describes the methods used to produce the predictions, including information on input datasets, the modeling approach, model outputs, data-quality-control procedures, and how to access the data and metadata online.
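
    The report describes its Bayesian network only at a high level. A toy, hand-rolled conditional-probability-table sketch (all variables, states, and probabilities invented) can illustrate how scenario weights and land cover combine into a probabilistic coastal response prediction of the static/dynamic kind described above.

        # Toy illustration of the probabilistic idea: a hand-rolled conditional
        # probability table combining a sea-level-rise scenario and land cover
        # into a coastal response prediction. All probabilities are invented.

        # P(response | slr_scenario, land_cover): response in {"static", "dynamic"}
        cpt = {
            ("RCP4.5", "beach"):   {"static": 0.3, "dynamic": 0.7},
            ("RCP4.5", "wetland"): {"static": 0.5, "dynamic": 0.5},
            ("RCP8.5", "beach"):   {"static": 0.1, "dynamic": 0.9},
            ("RCP8.5", "wetland"): {"static": 0.3, "dynamic": 0.7},
        }
        p_slr = {"RCP4.5": 0.5, "RCP8.5": 0.5}    # assumed scenario weights

        def p_response(land_cover):
            """Marginalize the response distribution over sea-level-rise scenarios."""
            out = {"static": 0.0, "dynamic": 0.0}
            for slr, w in p_slr.items():
                for r, p in cpt[(slr, land_cover)].items():
                    out[r] += w * p
            return out

        for lc in ("beach", "wetland"):
            print(lc, p_response(lc))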

    A dependability framework for WSN-based aquatic monitoring systems

    Wireless Sensor Networks (WSN) are progressively being used in several application areas, particularly to collect data and monitor physical processes. Sensor nodes used in environmental monitoring applications, such as aquatic sensor networks, are often subject to harsh environmental conditions while monitoring complex phenomena. Non-functional requirements, like reliability, security, or availability, are increasingly important and must be accounted for in application development. For that purpose, there is a large body of knowledge on dependability techniques for distributed systems, which provides a good basis for understanding how to satisfy these non-functional requirements of WSN-based monitoring applications. Given the data-centric nature of monitoring applications, it is of particular importance to ensure that data are reliable or, more generically, that they have the necessary quality. The problem of ensuring the desired quality of data for dependable monitoring using WSNs is studied herein. From a dependability-oriented perspective, we review the possible impairments to dependability and the prominent existing solutions to remove or mitigate these impairments. Despite the variety of components that may form a WSN-based monitoring system, particular attention is given to understanding which faults can affect sensors, how they affect the quality of the information, and how this quality can be improved and quantified. Open research issues for the specific case of aquatic monitoring applications are also discussed. One of the challenges in achieving dependable system behavior is to overcome the external disturbances affecting sensor measurements and to detect failure patterns in sensor data. This is a particular problem in environmental monitoring, owing to the difficulty of distinguishing faulty behavior from the signature of a natural phenomenon. Existing solutions for failure detection assume that physical processes can be accurately modeled, or that deviations are large enough to be detected with coarse techniques, or, more commonly, that the network is a high-density sensor network with value-redundant sensors. This thesis defines a new methodology for dependable data quality in environmental monitoring systems, aiming to detect faulty measurements and increase the quality of sensor data. The framework of the methodology is presented through a generically applicable design that can be employed on any environmental sensor network dataset. The methodology is evaluated on various datasets from different WSNs, using machine learning to model each sensor's behavior and exploiting the correlated data provided by neighboring sensors. Data fusion strategies are explored in order to effectively detect potential failures for each sensor and, simultaneously, to distinguish truly abnormal measurements from deviations due to natural phenomena. This is accomplished by successfully applying the methodology to detect and correct outlier, offset, and drift failures in datasets from real monitoring networks. In the future, the methodology can be applied to optimize the data quality control processes of new and already operating monitoring networks and to assist in network maintenance operations.
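
    A minimal sketch of the methodology's central ingredient, under the assumption stated in the abstract that neighboring sensors provide correlated data: fit a model of the target sensor from its neighbors on a presumed fault-free period and flag measurements with anomalous residuals. The data are synthetic, and linear regression stands in for the thesis's machine learning models.

        # Sketch: model a target sensor from correlated neighbours and flag
        # measurements with anomalous residuals (candidate outlier/offset/drift
        # failures). Data are synthetic; linear regression is a stand-in model.
        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(5)
        t = np.arange(2000)
        signal = 20 + 2 * np.sin(2 * np.pi * t / 288)   # shared natural phenomenon
        neighbours = signal[:, None] + rng.normal(0, 0.2, (2000, 3))
        target = signal + rng.normal(0, 0.2, 2000)
        target[1500:] += 0.005 * (t[1500:] - 1500)      # inject a slow drift failure

        model = LinearRegression().fit(neighbours[:1000], target[:1000])  # fault-free period
        resid = target - model.predict(neighbours)
        sigma = resid[:1000].std()
        flagged = np.abs(resid) > 4 * sigma
        print(f"first flagged index: {flagged.argmax()} (drift starts at 1500)")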