1,038 research outputs found

    Discovery of Spatiotemporal Event Sequences

    Finding frequent patterns plays a vital role in many analytics tasks such as finding itemsets, associations, correlations, and sequences. In recent decades, spatiotemporal frequent pattern mining has emerged with the main goal of developing data-driven analysis frameworks for understanding underlying spatial and temporal characteristics in massive datasets. In this thesis, we focus on discovering spatiotemporal event sequences from large-scale region trajectory datasets with event annotations. Spatiotemporal event sequences are series of event types whose trajectory-based instances follow each other in a spatiotemporal context. We introduce new data models for storing and processing evolving region trajectories, provide a novel framework for modeling spatiotemporal follow relationships, and present novel spatiotemporal event sequence mining algorithms.

    Representation learning on heterogeneous spatiotemporal networks

    “The problem of learning latent representations of heterogeneous networks with spatial and temporal attributes has been gaining traction in recent years, given its myriad real-world applications. Most systems with applications in the fields of transportation, urban economics, medical information, online e-commerce, etc., handle big data that can be structured into Spatiotemporal Heterogeneous Networks (SHNs), thereby making efficient analysis of these networks extremely vital. In recent years, representation learning models have proven to be quite efficient in capturing effective lower-dimensional representations of data. However, capturing efficient representations of SHNs continues to pose a challenge for the following reasons: (i) spatiotemporal data structured as SHNs encapsulate complex spatial and temporal relationships that exist among real-world objects, rendering traditional feature engineering approaches inefficient and compute-intensive; (ii) due to the unique nature of SHNs, existing representation learning techniques cannot be directly adopted to capture their representations. To address the problem of learning representations of SHNs, four novel frameworks that focus on their unique spatial and temporal characteristics are introduced: (i) collective representation learning, which focuses on quantifying the importance of each latent feature using Laplacian scores; (ii) modality-aware representation learning, which learns from complex user mobility patterns; (iii) distributed representation learning, which focuses on learning human mobility patterns by leveraging Natural Language Processing algorithms; and (iv) representation learning with node sense disambiguation, which learns contrastive senses of nodes in SHNs. The developed frameworks can help us capture higher-order spatial and temporal interactions of real-world SHNs. 
Through data-driven simulations, machine learning and deep learning models trained on the representations learned from the developed frameworks are proven to be much more efficient and effective”--Abstract, page iii

    Soil fungi, but not bacteria, track vegetation reassembly across a 30-year restoration chronosequence in the northern jarrah forest, Western Australia

    Plant communities have been the primary focus of ecological restoration initiatives; however, the integration of the soil microbiome has become of interest to restoration practice and theory. The inter-dependent nature of the above- and belowground biological environments has led to assumptions that reciprocal shifts in community compositions will occur in response to disturbance and restoration. Ecological restoration of post-mining landscapes within the northern jarrah forest re-instates vegetation communities that are representative of those in adjacent reference forest. The limited studies of soil microbial communities have not addressed whether these communities recover along similar trajectories to plant communities aboveground. Here, a 30-year restoration chronosequence of vegetation development was compared with that of the belowground assemblages of bacteria and fungi, identified using environmental DNA methods. Novel findings of this study highlight similarities between restoration trajectories of fungal and vegetation assemblages, though both remained distinct from reference jarrah forest compositions after 27 years. In contrast, soil bacterial assemblages in restored jarrah forest re-assembled rapidly, with substrate depth being a greater driver of composition than vegetation. Explanatory environmental variables, such as litter cover and initial fertiliser application, were significantly associated with vegetation composition. High covariance among physico-chemical factors made it difficult to establish influences of individual variables on bacterial and fungal communities. Litter depth was significantly associated with fungal composition across the restoration chronosequence, whilst available potassium was associated with both bacterial and fungal community composition. 
My findings add to a growing body of literature which acknowledges the rich diversity of the belowground microbial community and its potential use as a predictor of restoration trajectories. Future research could focus on direct associations between fungi and plant communities, such as the potential for fungal inoculation to assist in the rapid reinstatement of missing plants which rely on symbiotic associations with the belowground microbiome.

    Spatial Big Data Analytics: Classification Techniques for Earth Observation Imagery

    University of Minnesota Ph.D. dissertation. August 2016. Major: Computer Science. Advisor: Shashi Shekhar. 1 computer file (PDF); xi, 120 pages. Spatial Big Data (SBD), e.g., earth observation imagery, GPS trajectories, temporally detailed road networks, etc., refers to geo-referenced data whose volume, velocity, and variety exceed the capability of current spatial computing platforms. SBD has the potential to transform our society. Vehicle GPS trajectories together with engine measurement data provide a new way to recommend environmentally friendly routes. Satellite and airborne earth observation imagery plays a crucial role in hurricane tracking, crop yield prediction, and global water management. The potential value of earth observation data is so significant that the White House recently declared that full utilization of this data is one of the nation's highest priorities. However, SBD poses significant challenges to current big data analytics. In addition to its huge dataset size (NASA collects petabytes of earth images every year), SBD exhibits four unique properties related to the nature of spatial data that must be accounted for in any data analysis. First, SBD exhibits spatial autocorrelation effects. In other words, we cannot assume that nearby samples are statistically independent. Current analytics techniques that ignore spatial autocorrelation often perform poorly, showing low prediction accuracy and salt-and-pepper noise (i.e., pixels mistakenly predicted as different from their neighbors). Second, spatial interactions are not isotropic and vary across directions. Third, spatial dependency exists at multiple spatial scales. Finally, spatial big data exhibits heterogeneity, i.e., identical feature values may correspond to distinct class labels in different regions. Thus, learned predictive models may perform poorly in many local regions. My thesis investigates novel SBD analytics techniques to address some of these challenges. 
To date, I have mostly focused on the challenges of spatial autocorrelation and anisotropy by developing novel spatial classification models such as spatial decision trees for raster SBD (e.g., earth observation imagery). To scale up the proposed models, I developed efficient learning algorithms via computational pruning. The proposed techniques have been applied to real-world remote sensing imagery for wetland mapping. I also developed a spatial ensemble learning framework to address the challenge of spatial heterogeneity, particularly the class ambiguity issue in geographical classification, i.e., samples with the same feature values belong to different classes in different spatial zones. Evaluations on three real-world remote sensing datasets confirmed that the proposed spatial ensemble learning outperforms current approaches such as bagging, boosting, and mixture of experts when class ambiguity exists.

    CLIMATE ANOMALIES AND PRIMARY PRODUCTION IN LAKE SUPERIOR

    This dissertation supports the modeling of primary production in Lake Superior by offering site-specific kinetics and algorithms developed from lab experiments performed on the natural phytoplankton assemblage of Lake Superior. Functions, developed for temperature, light and nutrient conditions and the maximum specific rate of primary production, were incorporated in a 1D specific primary production model and confirmed against published in-situ measured rates of primary production. An extensive data set (supporting model calibration and confirmation), with a fine spatiotemporal resolution, was developed from field measurements taken bi-weekly during the sampling seasons of 2011, 2012 and 2014, considered to be meteorologically average, extremely warm and cold years, respectively. Samples were taken at 11 stations along a 26 km transect extending lakeward from Michigan’s Keweenaw Peninsula, covering the nearshore to offshore gradient. Measurements included temperature, solar radiation, transparency, beam attenuation, chlorophyll-a fluorescence, colored dissolved organic matter, suspended solids, and phosphorus and carbon constituents. Based on these measurements and application of the developed primary production model, patterns in primary production and driving forces (i.e., temperature, light and nutrients) are described in a seasonal, spatial, and interannual fashion. The signal feature in 2011 was the development of a mid-summer “desert” in the offshore surface waters (a period of suboptimal temperatures coincident with a high degree of phosphorus limitation). The manifestation of the “summer desert”, however, was most extreme during the warm year and nonexistent during the cold year. 
Offshore primary production in all years manifested a subsurface maximum in the upper area of the metalimnion, distinctly above the deep chlorophyll maximum, with rates of production being highest in 2011 (~20 mg C m⁻³ d⁻¹) followed by 2012 (~17 mg C m⁻³ d⁻¹) and lowest in 2014 (~12 mg C m⁻³ d⁻¹). Driven by variances in biomass and forcing conditions, offshore areal primary production manifested differences in seasonal patterns between years as well. In 2011 and 2014 a negatively skewed bell-shape pattern was observed, differing in magnitude and timing. The pattern in 2012 differed from these years in magnitude and timing, manifesting elevated production in April and decreased production in September. Greatest areal production in 2012 occurred in July and August (~320 mg C m⁻² d⁻¹), in 2014 in August (~265 mg C m⁻² d⁻¹) and in 2011 production was greatest in July (253 mg C m⁻² d⁻¹). Areal production in the summer of 1998, calculated for EPA’s 19 offshore stations in Lake Superior, manifested comparable rates and averaged 224 ± 90 mg C m⁻² d⁻¹. Although in all years the development of the thermal bar (TB) occurred after the spring runoff event, an increase in chlorophyll-a concentration during the presence of the TB was observed in 2012. Rates of primary production during this period, however, decreased while the opposite occurred in 2014, signifying that changes in chlorophyll-a concentration should be interpreted carefully (especially if used to identify spring blooms). The information presented in this work not only offers site-specific kinetics, appropriate algorithms in support of primary production modeling and an extensive dataset supporting model calibration and confirmation, it also offers new insights into the dynamics of the Lake Superior ecosystem and the forces driving its function.

    Linking Environmental Variability to the Biogeography of Placopecten magellanicus in the Gulf of Maine

    The Atlantic sea scallop (Placopecten magellanicus) supports a highly valuable fishery in the United States over its range on the Northwest Atlantic Shelf. Scallop distribution has been shown to be highly affected by changes in climatic variables. Therefore, long-term changes in the thermal regime of the Gulf of Maine are expected to greatly impact scallop ecology; however, these projected changes have rarely been quantified. The modeling framework developed for my dissertation research will improve our understanding of the distribution of scallop habitat as well as the biogeography of this species. Additionally, this modeling capacity will provide several tangible tools to visualize species distribution over space and time as well as to evaluate potential impacts of a changing Gulf of Maine ecosystem. The framework for my dissertation research comprises 1) a bioclimate envelope covering the Gulf of Maine to quantify spatiotemporal variability in scallop habitat; 2) a statistical species distribution model to predict spatiotemporal changes in scallop distribution in the Gulf of Maine; 3) the design of a dredge survey in the Northern Gulf of Maine to obtain scallop biomass estimates; and 4) a two-stage modeling and computer simulation framework to refine fisheries surveys. Due to changing oceanographic conditions within the Gulf of Maine ecosystem, it is becoming increasingly important to view resource management within the context of climate change. Effective management of marine resources requires knowledge of population distribution and dynamics; however, fisheries managers must frequently base decisions on limited information. The modeling framework developed in my dissertation establishes the ability to better visualize sea scallop distribution as well as to evaluate the potential impacts of a changing ecosystem on this species. 
The results provided by this research increase the extent of knowledge about sea scallop ecology and have the potential to contribute to the conservation of this species. Additionally, the modeling approaches developed throughout my dissertation are highly generalizable to a variety of commercially important species and may be useful in advising conservation efforts for other fisheries in the Northwest Atlantic to help ensure the implementation of adaptive management strategies under uncertain climate conditions.

    Computation in Complex Networks

    Complex networks are one of the most challenging research focuses across disciplines including physics, mathematics, biology, medicine, engineering, and computer science, among others. Interest in complex networks is growing due to their ability to model many everyday systems, such as technology networks, the Internet, and communication, chemical, neural, social, political and financial networks. The Special Issue “Computation in Complex Networks” of Entropy offers a multidisciplinary view on how some complex systems behave, providing a collection of original and high-quality papers within the research fields of community detection, complex network modelling, complex network analysis, node classification, information spreading and control, network robustness, social networks, and network medicine.

    Modeling Visit Potential of Geographic Locations Based on Mobility Data

    Every day people interact with the environment by passing or visiting geographic locations. Information about such entity-location interactions can be used in a number of applications and its value has been recognized by companies and public institutions. However, although the necessary tracking technologies such as GPS, GSM or RFID have long found their way into everyday life, the practical usage of visit information is still limited. Besides economic and ethical reasons for the restricted usage of entity-location interactions, there are also two very basic problems. First, no formal definition of entity-location interaction quantities exists. Second, at the current state of technology, no tracking technology guarantees complete observations, and the treatment of missing data in mobility applications has been neglected in trajectory data mining so far. This thesis therefore focuses on the definition and estimation of quantities about the visiting behavior between mobile entities and geographic locations from incomplete mobility data. In a first step we provide an application-independent language to evaluate entity-location interactions. Based on a uniform notation, we define a family of quantities called visit potential, which contains the most basic interaction quantities and can be extended as needed. By identifying the common background of all quantities we are able to analyze relationships between different quantities and to infer consistency requirements between related parameterizations of the quantities. We demonstrate the general applicability of visit potential using two real-world applications for which we give a precise definition of the employed entity-location interaction quantities in terms of visit potential. Second, this thesis provides the first systematic analysis of methods for the handling of missing data in mobility mining. 
We select a set of promising methods that take different approaches to handling missing data and test their robustness with respect to different scenarios. Our analyses consider different mechanisms and intensities of missing data under artificial censoring as well as varying visit intensities. We thereby not only analyze the applicability of the selected methods but also provide a systematic approach for parameterization and testing that can also be applied to the analysis of other mobility data sets. Our experiments show that only two of the tested methods supply unbiased estimates of visit potential quantities and are applicable to the domain. In addition, both methods supply unbiased estimates only of a single quantity. Therefore, it will be a future challenge to design methods for the entire collection of visit potential quantities. The topic of this thesis is motivated by applied research at the Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS for business applications in outdoor advertisement. We will use the outdoor advertisement scenario throughout this thesis for demonstration and experimentation. Modeling Visit Quantities of Geographic Locations Based on Mobility Data. Every day, people interact with their environment by moving through geographic space or by deliberately visiting geographic locations. Information about such visits is very valuable and can be used in a range of applications. Typically, the movements of people are tracked using GPS, GSM or RFID technologies. The visits can then be extracted by spatially intersecting the trajectories with the position of a given location. However, the use of visit information in practice is currently limited. Besides economic and ethical reasons, this has two fundamental causes. 
First, no formal definition of quantities exists for evaluating visit information in a uniform way. Second, current technologies cannot guarantee complete capture of movement information. This means that the base data for evaluating visit information inherently contain gaps. For an error-free evaluation of the data, these gaps must be treated adequately. However, this topic has so far been neglected in the data mining literature on the analysis of movement data. This dissertation is therefore devoted to the definition of quantities for evaluating visit information and to the estimation of these quantities from incomplete movement data. In the first part of the dissertation, an application-independent description language is formulated for evaluating visit information. Based on a uniform notation, a family of quantities called visit potential is defined, which contains basic visit quantities and is open to extension. The common basis of all visit quantities further allows relationships between different quantities to be analyzed and consistency requirements between similar parameterizations of the quantities to be derived. Finally, the thesis shows the general applicability of the defined visit quantities in two real-world applications, for which a precise definition of the employed statistics is given in terms of the visit quantities. The second part of the dissertation contains the first systematic analysis of methods for handling incomplete movement data. For this purpose, four promising methods from different areas of missing-data handling are selected and tested for robustness under various assumptions. Using artificial censoring, different mechanisms and degrees of missing data are examined. In addition, the robustness of the methods is considered for different visit levels. 
The experiments not only provide information on the applicability of the tested methods but also supply a systematic procedure for testing and parameterizing further methods. The results of the experiments show that only two of the four selected methods are suitable for estimating visit quantities. However, each of the two methods yields unbiased estimates for only one visit quantity. A future challenge is therefore to develop estimation methods for the entirety of visit quantities. This work is motivated by application-oriented research at the Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS in the field of outdoor advertising. The outdoor advertising scenario and the application data made available through it are used throughout the thesis for demonstration and for the experiments.