
    Development and Application of a Statistically-Based Quality Control for Crowdsourced Air Temperature Data

    In urban areas, dense atmospheric observational networks with high-quality data are still a challenge due to the high costs of installation and maintenance over time. Citizen weather stations (CWS) could be one answer to that issue. Since more and more owners of CWS share their measurement data publicly, crowdsourcing, i.e., the automated collection of large amounts of data from an undefined crowd of citizens, opens new pathways for atmospheric research. However, the most critical issue is found to be the quality of data from such networks. In this study, a statistically-based quality control (QC) is developed to identify suspicious air temperature (T) measurements in crowdsourced data sets. The newly developed QC exploits the combined knowledge of the dense network of CWS to statistically identify implausible measurements, independent of external reference data. The evaluation of the QC is performed using data from Netatmo CWS in Toulouse, France, and Berlin, Germany, over a 1-year period (July 2016 to June 2017), comparing the quality-controlled data with data from two networks of reference stations. The new QC efficiently identifies erroneous data due to solar exposure and siting issues, which are common error sources of CWS. Estimation of T is improved when averaging data from a group of stations within a restricted area rather than relying on data from individual CWS. However, a positive deviation in CWS data compared to reference data is identified, particularly for daily minimum T. To illustrate the transferability of the newly developed QC and the applicability of CWS data, a mapping of T is performed over the city of Paris, France, where the spatial density of CWS is especially high.
    Funding: DFG, 322579844, Hitzewellen in Berlin, Deutschland - Stadtklimamodifikationen; BMBF, 01LP1602A, Verbundprojekt Stadtklima: Evaluierung von Stadtklimamodellen (Modul B), 3DO Teilprojekt 1: Dreidimensionales Monitoring atmosphärischer Prozesse in Berlin
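    As a rough illustration of a crowd-based plausibility check of this kind (not the authors' exact QC), the following Python sketch flags readings whose robust z-score against the network median at the same timestamp exceeds a threshold; the DataFrame layout, the column names, and the threshold of 3.5 are assumptions made for the example.

```python
import numpy as np
import pandas as pd

def flag_implausible(df, z_max=3.5):
    """Flag readings that deviate strongly from the network consensus at the
    same timestamp (robust z-score based on the median and a scaled MAD)."""
    med = df.groupby('time')['temp_c'].transform('median')
    abs_dev = (df['temp_c'] - med).abs()
    mad = 1.4826 * abs_dev.groupby(df['time']).transform('median')
    out = df.copy()
    # Division by NaN (zero spread) yields NaN, which compares as not suspect.
    out['suspect'] = (abs_dev / mad.replace(0, np.nan)) > z_max
    return out

# Toy example: three stations at one timestamp, one sun-exposed outlier.
readings = pd.DataFrame({
    'time': ['2016-07-01 12:00'] * 3,   # hypothetical layout
    'station_id': ['a', 'b', 'c'],
    'temp_c': [24.1, 24.8, 38.5],
})
print(flag_implausible(readings))
```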

    Mining sensor datasets with spatiotemporal neighborhoods

    Many spatiotemporal data mining methods depend on how relationships between a spatiotemporal unit and its neighbors are defined. These relationships are often termed the neighborhood of a spatiotemporal object. The focus of this paper is the discovery of spatiotemporal neighborhoods, i.e., the automatic identification of spatiotemporal sub-regions in a sensor dataset. This research is motivated by the need to characterize large sensor datasets like those found in oceanographic and meteorological research. The approach presented in this paper finds spatiotemporal neighborhoods in sensor datasets by combining an agglomerative method that creates temporal intervals with a graph-based method that finds spatial neighborhoods within each temporal interval. These methods were tested on real-world datasets including (a) sea surface temperature data from the Tropical Atmosphere Ocean (TAO) array in the Equatorial Pacific Ocean and (b) NEXRAD precipitation data from the Hydro-NEXRAD system. The results were evaluated based on known patterns of the phenomenon being measured. Furthermore, the results were quantified by hypothesis testing, with statistical significance established using Monte Carlo simulations. The approach was also compared with existing approaches using validation metrics, namely spatial autocorrelation and temporal interval dissimilarity. The results of these experiments show that our approach indeed identifies highly refined spatiotemporal neighborhoods.
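    A minimal sketch of the two ingredients described above, assuming point sensors with 2-D coordinates and scalar readings; the greedy merging rule and the thresholds `tol`, `max_dist`, and `max_diff` are illustrative choices, not the paper's actual parameters.

```python
import numpy as np
import networkx as nx

def temporal_intervals(series, tol=1.0):
    """Greedy agglomerative pass over an ordered series: adjacent time steps
    are merged into one interval while the next value stays within `tol` of
    the running interval mean. Returns (start, end) index pairs."""
    intervals, start, running = [], 0, float(series[0])
    for t in range(1, len(series)):
        mean = running / (t - start)
        if abs(series[t] - mean) <= tol:
            running += series[t]
        else:
            intervals.append((start, t - 1))
            start, running = t, float(series[t])
    intervals.append((start, len(series) - 1))
    return intervals

def spatial_neighborhoods(coords, values, max_dist=1.5, max_diff=0.5):
    """Connect sensors that are close in space and similar in value; the
    connected components of the resulting graph are the neighborhoods."""
    g = nx.Graph()
    g.add_nodes_from(range(len(coords)))
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            close = np.linalg.norm(np.asarray(coords[i]) - np.asarray(coords[j])) <= max_dist
            similar = abs(values[i] - values[j]) <= max_diff
            if close and similar:
                g.add_edge(i, j)
    return [sorted(c) for c in nx.connected_components(g)]

# Temporal intervals for a short series, then neighborhoods within one interval.
print(temporal_intervals([20.1, 20.3, 20.2, 24.8, 25.1], tol=1.0))
print(spatial_neighborhoods([(0, 0), (1, 0), (5, 5)], [20.1, 20.3, 24.8]))
```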

    Detection of Anomalous Traffic Patterns and Insight Analysis from Bus Trajectory Data

    Detection of anomalous patterns in traffic data is closely related to the analysis of traffic accidents, fault detection, flow management, and new infrastructure planning. Existing methods for traffic anomaly detection are modelled on taxi trajectory data and have the shortcoming that such data may lose much information about the actual road traffic situation, as taxi drivers can select optimal routes for themselves to avoid traffic anomalies. We employ bus trajectory data, as it reflects real traffic conditions on the road, to detect city-wide anomalous traffic patterns and to provide a broader range of insights into these anomalies. With these considerations in mind, we first propose a feature visualization method that maps extracted 3-dimensional hidden features to the red-green-blue (RGB) color space using a deep sparse autoencoder (DSAE). A color trajectory (CT) is produced by encoding a trajectory with RGB colors. Then, a novel algorithm is devised to detect spatio-temporal outliers using spatial and temporal properties extracted from the CT. We also integrate the CT with a geographic information system (GIS) map to obtain insights into the locations of traffic anomalies and, more importantly, the influence of the corresponding anomalies on the surrounding roads. Our proposed method was tested on three real-world bus trajectory data sets and demonstrated excellent performance, with high detection rates and low false alarm rates.
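    A minimal sketch of the feature-to-color step only, assuming the 3-dimensional hidden features have already been produced by a trained deep sparse autoencoder (the autoencoder itself is not implemented here); the per-dimension min-max scaling to RGB is an illustrative encoding, not necessarily the authors' exact scheme.

```python
import numpy as np

def features_to_rgb(hidden, low=None, high=None):
    """Map 3-D hidden features (one row per trajectory point) to RGB colors
    in [0, 255] by per-dimension min-max scaling."""
    hidden = np.asarray(hidden, dtype=float)
    low = hidden.min(axis=0) if low is None else low
    high = hidden.max(axis=0) if high is None else high
    span = np.where(high - low == 0, 1, high - low)   # avoid division by zero
    scaled = (hidden - low) / span
    return np.clip(np.rint(scaled * 255), 0, 255).astype(np.uint8)

# A color trajectory: one RGB triple per trajectory point (toy hidden features).
hidden_features = np.array([[0.1, 0.9, 0.3],
                            [0.2, 0.8, 0.4],
                            [0.9, 0.1, 0.9]])
color_trajectory = features_to_rgb(hidden_features)
print(color_trajectory)
```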

    Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities

    With the increasing amount of spatial-temporal (ST) ocean data, numerous spatial-temporal data mining (STDM) studies have been conducted to address various oceanic issues, e.g., climate forecasting and disaster warning. Compared with typical ST data (e.g., traffic data), ST ocean data is more complicated, with some unique characteristics, e.g., diverse regionality and high sparsity. These characteristics make it difficult to design and train STDM models. Unfortunately, an overview of these studies is still missing, which hinders computer scientists from identifying the research issues in ocean science and discourages researchers in ocean science from applying advanced STDM techniques. To remedy this situation, we provide a comprehensive survey that summarizes existing STDM studies for the ocean. Concretely, we first summarize the widely used ST ocean datasets and identify their unique characteristics. Then, typical ST ocean data quality enhancement techniques are discussed. Next, we classify existing STDM studies for the ocean into four types of tasks, i.e., prediction, event detection, pattern mining, and anomaly detection, and elaborate on the techniques for these tasks. Finally, promising research opportunities are highlighted. This survey will help scientists from both computer science and ocean science gain a better understanding of the fundamental concepts, key techniques, and open challenges of STDM in the ocean.
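    One common way to cope with the high sparsity mentioned above (land cells, missing observations in gridded ocean fields) is to mask invalid grid cells out of the loss or evaluation metric; the sketch below shows this idea in isolation and is not tied to any particular model from the survey.

```python
import numpy as np

def masked_mse(pred, target, ocean_mask):
    """Mean squared error over valid ocean grid cells only; land or missing
    cells (mask == False) are excluded so they do not inflate the error."""
    diff2 = (pred - target) ** 2
    return diff2[ocean_mask].mean()

# Toy 2 x 3 sea surface temperature grid with one invalid (NaN) cell.
target = np.array([[26.1, 25.8, np.nan],
                   [25.9, 26.3, 26.0]])
pred   = np.array([[26.0, 25.9, 0.0],
                   [26.1, 26.2, 25.7]])
mask   = ~np.isnan(target)
print(masked_mse(pred, target, mask))
```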

    Complete and homogeneous monthly air temperature series for constructing the 1981–2010 climate normals for Croatia

    Providing climatological normals is one of the most important tasks of national meteorological services. Estimating the statistical characteristics of climate variables from incomplete and inhomogeneous data can result in biased estimates; thus, it is necessary to fill in missing values and remove inhomogeneities. Although it is very important, the homogenization procedure is still not a standard part of data quality-control procedures. In this work, monthly temperature data from 39 meteorological stations in Croatia for the period 1981–2010 were examined for missing data and inhomogeneities. The stations were divided into three climatic regions, and homogenization was performed for each region separately. The performance of the homogenization method was tested by (1) comparing correlation coefficients amongst stations and (2) examining changes in rotated principal components for the datasets before and after homogenization. The detected homogeneity breaks were compared with metadata and the published literature. Changes in the statistical characteristics of the 1981–2010 temperature climate normals (e.g., long-term means and decadal trends) between the original and homogenized series were examined at annual and seasonal scales. The significance of the changes in the mean was tested using Student's t-test, while the significance of the trends was tested with the Mann-Kendall test. The homogenization was carried out with the R package climatol.
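    The homogenization itself is done with the R package climatol and is not reproduced here; the sketch below only illustrates the two significance tests mentioned above (a paired Student's t-test for the change in mean and a Mann-Kendall test for trend), applied to made-up series values.

```python
import numpy as np
from scipy import stats

def mann_kendall(x):
    """Two-sided Mann-Kendall trend test (no tie correction, for brevity).
    Returns the S statistic and the p-value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / np.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / np.sqrt(var_s)
    else:
        z = 0.0
    return s, 2 * (1 - stats.norm.cdf(abs(z)))

# Illustrative annual mean series before and after a hypothetical adjustment.
original    = np.array([10.2, 10.4, 10.1, 10.6, 10.9, 10.8, 11.0, 11.2])
homogenized = original + np.array([0.3, 0.3, 0.3, 0.3, 0.0, 0.0, 0.0, 0.0])

# Change in mean between the two versions of the series (paired t-test),
# then trend significance of the homogenized series (Mann-Kendall).
t_stat, p_mean = stats.ttest_rel(original, homogenized)
s, p_trend = mann_kendall(homogenized)
print(f"mean change p={p_mean:.3f}, trend S={s}, p={p_trend:.3f}")
```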