10,042 research outputs found

    A disposition of interpolation techniques

    A large collection of interpolation techniques is available for application in environmental research. To help environmental scientists choose an appropriate technique, a disposition is made based on 1) applicability in space, time and space-time, 2) quantification of the accuracy of interpolated values, 3) incorporation of ancillary information, and 4) incorporation of process knowledge. The described methods include inverse distance weighting, nearest neighbour methods, geostatistical interpolation methods, Kalman filter methods, Bayesian Maximum Entropy methods, etc. The applicability of the methods in aggregation (upscaling) and disaggregation (downscaling) is discussed. Software for interpolation is described. The application of interpolation techniques is illustrated in two case studies: temporal interpolation of indicators for ecological water quality, and spatio-temporal interpolation and aggregation of pesticide concentrations in Dutch surface waters. A valuable next step will be to construct a decision tree or decision support system that guides environmental scientists to easy-to-use software implementations appropriate to their interpolation problem. Validation studies are needed to assess the quality of interpolated values and the quality of the uncertainty information provided by the interpolation method.
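    Inverse distance weighting, the first method named in the abstract above, can be illustrated with a minimal sketch. This is not taken from the report: the function name `idw_interpolate`, the sample layout and the default exponent `power=2.0` are assumptions chosen for illustration.

```python
import math


def idw_interpolate(samples, target, power=2.0):
    """Estimate a value at `target` from scattered (location, value) samples.

    samples: list of ((x, y), value) pairs (hypothetical layout)
    target:  (x, y) location to interpolate at
    power:   distance exponent; 2.0 is a common choice, assumed here
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in samples:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a sample: return the observed value.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

    Nearby samples dominate the estimate because their weights grow as the distance shrinks. Note that this plain form gives no measure of accuracy for the interpolated value, which is one of the criteria the abstract lists.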

    Event detection in location-based social networks

    With the advent of social networks and the rise of mobile technologies, users have become ubiquitous sensors capable of monitoring various real-world events in a crowd-sourced manner. Location-based social networks have proven to be faster than traditional media channels in reporting and geo-locating breaking news; for example, Osama Bin Laden’s death was first confirmed on Twitter even before the announcement from the communication department at the White House. However, the deluge of user-generated data on these networks requires intelligent systems capable of identifying and characterizing such events in a comprehensive manner. The data mining community coined the term event detection to refer to the task of uncovering emerging patterns in data streams. Nonetheless, most data mining techniques do not reproduce the underlying data generation process, which hampers their ability to self-adapt in fast-changing scenarios. Because of this, we propose a probabilistic machine learning approach to event detection which explicitly models the data generation process and enables reasoning about the discovered events. To set forth the differences between the two approaches, we present two techniques for the problem of event detection in Twitter: a data mining technique called Tweet-SCAN and a machine learning technique called Warble. We assess and compare both techniques on a dataset of tweets geo-located in the city of Barcelona during its annual festivities. Last but not least, we present the algorithmic changes and data processing frameworks needed to scale up the proposed techniques to big data workloads.
    This work is partially supported by Obra Social “la Caixa”, by the Spanish Ministry of Science and Innovation under contract (TIN2015-65316), by the Severo Ochoa Program (SEV2015-0493), by SGR programs of the Catalan Government (2014-SGR-1051, 2014-SGR-118), Collectiveware (TIN2015-66863-C2-1-R) and BSC/UPC NVIDIA GPU Center of Excellence. We would also like to thank the reviewers for their constructive feedback. Peer reviewed. Postprint (author's final draft).

    Searching and mining in enriched geo-spatial data

    The emergence of new data collection mechanisms in geo-spatial applications, paired with a heightened tendency of users to volunteer information, provides an ever-increasing flow of data that is high in volume, complex in nature, and often associated with inherent uncertainty. Such mechanisms include crowdsourcing, automated knowledge inference, tracking, and social media data repositories. Such data, bearing additional information from multiple sources like probability distributions, text or numerical attributes, social context, or multimedia content, can be called multi-enriched. Searching and mining this abundance of information holds many challenges if all of the data's potential is to be released. This thesis addresses several major issues arising in that field, namely path queries using multi-enriched data, trend mining in social media data, and handling uncertainty in geo-spatial data. In all cases, the developed methods have made significant contributions and have appeared in or were accepted into various renowned international peer-reviewed venues. A common use of geo-spatial data is path queries in road networks, where traditional methods optimise results based on absolute and often singular metrics, e.g., finding the shortest paths based on distance or the best trade-off between distance and travel time. Integrating additional aspects like qualitative or social data, by enriching the data model with knowledge derived from the sources mentioned above, allows for queries that fit a broader scope of needs or preferences. This thesis presents two implementations of incorporating multi-enriched data into road networks. In the first case, a range of qualitative data sources is evaluated to gain knowledge about user preferences, which is subsequently matched with locations represented in a road network and integrated into its components. Several methods are presented for highly customisable path queries that incorporate a wide spectrum of data. 
In the second case, a framework is described for resource distribution with reappearance in road networks to serve one or more clients, resulting in paths that provide maximum gain based on a probabilistic evaluation of available resources. Applications for this include finding parking spots. Social media trends are an emerging research area giving insight into user sentiment and important topics. Such trends consist of bursts of messages concerning a certain topic within a time frame, significantly deviating from the average appearance frequency of the same topic. By investigating the dissemination of such trends in space and time, this thesis presents methods to classify trend archetypes in order to predict the future dissemination of a trend. Processing and querying uncertain data is particularly demanding given the additional knowledge required to yield results with probabilistic guarantees. Since such knowledge is not always available and queries are not easily scaled to larger datasets due to the #P-complete nature of the problem, many existing approaches reduce the data to a deterministic representation of its underlying model to eliminate uncertainty. However, data uncertainty can also provide valuable insight into the nature of the data that cannot be represented in a deterministic manner. This thesis presents techniques for clustering uncertain data as well as for query processing that take the additional information from uncertainty models into account while preserving scalability using a sampling-based approach, whereas previous approaches could only provide one of the two. 
The given solutions enable the application of various existing clustering techniques or query types to a framework that manages the uncertainty.

    A new approach to spatial data interpolation using higher-order statistics

    Interpolation techniques for spatial data have been applied frequently in various fields of geosciences. Although most conventional interpolation methods assume that it is sufficient to use first- and second-order statistics to characterize random fields, researchers have now realized that these methods cannot always provide reliable interpolation results, since geological and environmental phenomena tend to be very complex, presenting non-Gaussian distributions and/or non-linear inter-variable relationships. This paper proposes a new approach to the interpolation of spatial data, which can be applied with great flexibility. Suitable cross-variable higher-order spatial statistics are developed to measure the spatial relationship between the random variable at an unsampled location and those in its neighbourhood. Given the computed cross-variable higher-order spatial statistics, the conditional probability density function is approximated via polynomial expansions and is then used to determine the interpolated value at the unsampled location as an expectation. In addition, the uncertainty associated with the interpolation is quantified by constructing prediction intervals of interpolated values. The proposed method is applied to a mineral deposit dataset, and the results demonstrate that it outperforms kriging methods in uncertainty quantification. The introduction of the cross-variable higher-order spatial statistics noticeably improves the quality of the interpolation since it enriches the information that can be extracted from the observed data, and this benefit is substantial when working with data that are sparse or have non-trivial dependence structures.

    Geographically weighted evidence combination approaches for combining discordant and inconsistent volunteered geographical information

    There is much interest in being able to combine crowdsourced data. One of the critical issues in information sciences is how to combine data or information that are discordant or inconsistent in some way. Many previous approaches have taken a majority-rules approach under the assumption that most people are correct most of the time. This paper analyses crowdsourced land cover data generated by the Geo-Wiki initiative in order to infer the land cover present at locations on a 50 km grid. It compares four evidence combination approaches (Dempster-Shafer, Bayes, Fuzzy Sets and Possibility) applied under a geographically weighted kernel with the geographically weighted average approach applied in many current Geo-Wiki analyses. A geographically weighted approach uses a moving kernel under which local analyses are undertaken. The contribution (or salience) of each data point to the analysis is weighted by its distance to the kernel centre, reflecting Tobler’s 1st law of geography. A series of analyses was undertaken using different kernel sizes (or bandwidths). Each of the geographically weighted evidence combination methods generated spatially distributed measures of belief in hypotheses associated with the presence of individual land cover classes at each location on the grid. These were compared with GlobCover, a global land cover product. The results from the geographically weighted average approach in general had higher correspondence with the reference data, and this increased with bandwidth. However, for some classes other evidence combination approaches had higher correspondences, possibly because of greater ambiguity over class conceptualisations and/or lower densities of crowdsourced data. The outputs also allowed the beliefs in each class to be mapped. The differences between the soft and the crisp maps are clearly associated with the logics of each evidence combination approach and, of course, the different questions that they ask of the data. 
The results show that discordant data can be combined (rather than being removed from analysis) and that data integrated in this way can be parameterised by different measures of belief uncertainty. The discussion highlights a number of critical areas for future research.
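    The geographically weighted average described in this abstract can be sketched roughly as follows. The abstract does not name the kernel function, so the Gaussian kernel here is an assumption, as are the function name `gw_class_support` and the observation layout; this is an illustration of the idea, not the paper's implementation.

```python
import math
from collections import defaultdict


def gw_class_support(observations, centre, bandwidth):
    """Distance-weighted share of each land cover label around `centre`.

    observations: list of ((x, y), label) pairs (hypothetical layout)
    centre:       (x, y) of the kernel centre, e.g. a grid location
    bandwidth:    kernel size; a Gaussian kernel is assumed here
    """
    weight_by_label = defaultdict(float)
    weight_total = 0.0
    for (x, y), label in observations:
        distance = math.hypot(x - centre[0], y - centre[1])
        # Observations closer to the kernel centre contribute more.
        weight = math.exp(-0.5 * (distance / bandwidth) ** 2)
        weight_by_label[label] += weight
        weight_total += weight
    if weight_total == 0.0:
        return {}
    return {label: w / weight_total for label, w in weight_by_label.items()}
```

    With a small bandwidth only observations near the centre matter; with a large bandwidth the result approaches a plain vote share over all observations, which is one way to read the abstract's comparison across kernel sizes.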

    Post-drought decline of the Amazon carbon sink

    Amazon forests have experienced frequent and severe droughts in the past two decades. However, little is known about the large-scale legacy of droughts on carbon stocks and dynamics of forests. Using systematic sampling of forest structure measured by LiDAR waveforms from 2003 to 2008, here we show a significant loss of carbon over the entire Amazon basin at a rate of 0.3 ± 0.2 (95% CI) PgC yr−1 after the 2005 mega-drought, which continued persistently over the next 3 years (2005–2008). The changes in forest structure, captured by average LiDAR forest height and converted to above-ground biomass carbon density, show an average loss of 2.35 ± 1.80 MgC ha−1 one year after the drought (2006) in its epicenter. With more frequent droughts expected in the future, Amazon forests may lose their role as a robust sink of carbon, leading to a significant positive climate feedback and exacerbating warming trends.
    The research was partially supported by a NASA Terrestrial Ecology grant at the Jet Propulsion Laboratory, California Institute of Technology, and by partial funding to the UCLA Institute of Environment and Sustainability from previous National Aeronautics and Space Administration and National Science Foundation grants. The authors thank NSIDC, BYU, USGS, and NASA Land Processes Distributed Active Archive Center (LP DAAC) for making their data available. Published version.

    Regional geochemical and geophysical surveys in the Berwyn Dome and adjacent areas, north Wales

    This report describes stream sediment and gravity surveys carried out across the Berwyn Dome and adjacent areas. The gravity survey confirmed the presence of a broad regional Bouguer anomaly low in the central part of the Dome, on which are superimposed several smaller irregular highs and lows. Some of these local anomalies possibly reflect small igneous bodies, but more detailed gravity surveys would be needed to determine their form. Near Corwen the Bryneglwys Fault coincides with a 4.5 mGal anomaly, but southwards the two features diverge, suggesting that the density interface is related either to a splay fault or to the eastern margin of the Lower Palaeozoic Montgomery trough. Some other structural trends are weakly reflected on the Bouguer anomaly and aeromagnetic maps, but there is no clear correlation with known base metal mineralisation. The Bouguer anomalies cannot be attributed to particular structures with any certainty but are probably due to a number of factors, including variation in the Precambrian basement and changes in the lithology and thickness of Lower Palaeozoic sedimentary rocks. There is no evidence for a large granitic body in Lower Palaeozoic rocks underlying the mineralisation at Llangynog. The aeromagnetic map suggests the presence of a magnetic basement at a depth of 3-4 km centred beneath the northwestern margin of the Dome. The stream sediment survey involved the collection of a -100 mesh stream sediment, a panned concentrate and a water sample from each of the 399 sites sampled. The sample density was 1 site per 1.5 km². Cu, Pb, Zn, Ba, Fe, Mn, Co, V, Cr, Ni, Zr, Mo and Sn were determined in the stream sediments; Cu, Pb, Zn, Ba, Fe, Mn, Ce, Sn, Sb, Ti, Ni and As in the panned concentrates; and Cu, Pb and Zn in stream waters. 
Major variations in the results are related to (i) hydrous oxide precipitation processes, (ii) contamination from human activities, (iii) base metal and baryte mineralisation, (iv) monazite concentrations in panned concentrates, (v) hitherto unrecorded gold mineralisation and (vi) lithological variations. The latter were related principally to shale-sandstone variation, but groups of elements attributable to the presence of basic intrusions, phosphatic rocks, coal measures, sandstones, limestones and volcanics were also discerned. Threshold levels were established from cumulative frequency curve analysis, and some anomalous sites were examined in the field. Anomalies did not form prominent coherent groups and were generally weak and scattered, with a wide variety of element groupings reflecting a range of causes. Many anomalous panned concentrates were examined mineralogically to try to determine whether anomalies were related to chemically extreme background lithologies, contamination, or mineralisation. All the anomalies were related to one or more of the major causes of variation, although because of the very limited amount of follow-up work carried out, the precise cause of many anomalies remains uncertain. No anomaly is considered to represent a strong prospect, but several deserve further limited investigation, notably those associated with (i) gold mineralisation in the northwest of the area, (ii) baryte, perhaps accompanied by base metal mineralisation, associated with Caradocian volcanics and phosphatic rocks at several localities, (iii) mineralisation associated with Llandeilian limestones and volcanic rocks north of Llanrhaeadr, and (iv) copper mineralisation associated with intrusives near the eastern margin of the Dome, where survey data are most incomplete.

    Feature-rich networks: going beyond complex network topologies.

    The growing availability of multirelational data gives rise to an opportunity for novel characterization of complex real-world relations, supporting the proliferation of diverse network models such as Attributed Graphs, Heterogeneous Networks, Multilayer Networks, Temporal Networks, Location-aware Networks, Knowledge Networks, Probabilistic Networks, and many other task-driven and data-driven models. In this paper, we propose an overview of these models and their main applications, described under the common denomination of Feature-rich Networks, i.e. models where the expressive power of the network topology is enhanced by exposing one or more peculiar features. The aim is also to sketch a scenario that can inspire the design of novel feature-rich network models, which in turn can support innovative methods able to exploit the full potential of mining complex network structures in domain-specific applications.