1,345 research outputs found

    End-to-end anomaly detection in stream data

    Nowadays, huge volumes of data are generated with increasing velocity through various systems, applications, and activities. This increases the demand for stream and time series analysis to react to changing conditions in real time for enhanced efficiency and quality of service delivery as well as upgraded safety and security in private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting an appropriate model that fits the observed data well and also carries over to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with various challenges like complex latent patterns, concept drift, and overfitting that may mislead the model and cause a high false alarm rate. Handling these challenges leads advanced anomaly detection methods to develop sophisticated decision logic, which turns them into inexplicable black boxes. Contrary to this trend, end-users expect transparency and verifiability to trust a model and the outcomes it produces. Also, pointing the users to the most anomalous/malicious areas of time series and causal features could save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through the three essential phases of behavior prediction, inference, and interpretation. The first step is focused on devising a time series model that leads to high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that utilize the related contexts to reclassify the observations and post-prune the unjustified events. Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its generated results based on concepts understandable to a human. The provided insight can pinpoint the anomalous regions of time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation to support our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains like cybersecurity, and health

    Bayesian methods for source attribution using HIV deep sequence data

    The advent of pathogen deep-sequencing technology provides new opportunities for infectious disease surveillance, especially for fast-evolving viruses like human immunodeficiency virus (HIV). In particular, multiple reads per host contain detailed information on viral within-host diversity. This information allows the reconstruction of partial directed transmission networks, where estimates of who is source and who is recipient are directly available from the phylogenetic ordering of the viruses of any two individuals. This is a new approach for phylodynamics, and the topic of my thesis. In this thesis, I present updates to the bioinformatics pipeline used by the Phylogenetics And Networks for Generalised Epidemics in Africa consortium for processing HIV deep sequence data and running the phyloscanner program. I then present a semi-parametric Bayesian Poisson model for inferring infectious disease transmission flows and the sources of infection at the population level. The framework is computationally scalable in high-dimensional flow spaces thanks to Hilbert Space Gaussian process approximations, allows for sampling bias adjustments, and enables estimation of gender- and age-specific transmission flows at a finer resolution than previously possible. In this sense, the methods that I developed overcome some problems that conventional phylodynamic approaches have been unable to solve. We apply the approach to densely sampled, population-based HIV deep-sequence data from Rakai, Uganda. I focus on characterising age-specific transmission dynamics, and examining the sources of HIV infections in adolescent and young women in particular.

    Scale challenges in inventory of forests aided by remote sensing

    The impact of changing the scale of observation on information derived from forest inventories is the basis of scale-related research in forest inventory and analysis (FIA). Interactions between the scale of observation and observed heterogeneity in studied variables highlight a dependence on scale that affects measurements, estimates, and relationships between inventory data from terrestrial and remote sensing surveys. This doctoral research defines "scale" as the divisions of continuous space over which measurements are made, or hierarchies of discrete units of study/analysis in space. Therefore, the "scale of observation" (also known as support) refers to that integral of space over which statistics are computed and forest inventory variables regionalized. Given the ubiquitous nature of scale issues, a case study approach was undertaken in this research (Articles I-IV) with the goal of providing a fundamental understanding of responses to the scale of observation for specific FIA variables. The studied forest inventory variables are forest stand structural heterogeneity, forest cover proportion and tree species identities. Forest cover proportion (or simply forest area) and tree species are traditional and fundamental forest inventory variables commonly assessed over large areas using both terrestrial samples and remote sensing data, whereas forest stand structural heterogeneity is a contemporary FIA variable that is increasingly demanded in multi-resource inventories to inform management and conservation efforts, as it is linked to biodiversity, ecosystem functioning and productivity, and is used as auxiliary data in forest inventory. This research has two overall aims: (1) to improve the understanding of the association between the scale of observation and observed heterogeneity in inventory of forest stand structural heterogeneity, forest-cover proportions, and identification of tree species from a combination of terrestrial samples and remote sensing data; and (2) to contribute knowledge to the estimation of scale-dependence in inventory of forest stand structural heterogeneity, forest-cover proportions, and identification of tree species from a combination of terrestrial samples and remote sensing data. Different scales of observation were considered across the four case studies, encompassing individual leaf, crown-part or branch, single-tree crown, forest stand, landscape and global levels of analysis. Terrestrial and remote sensing data sets from a variety of temperate forests in Germany and France were utilized across case studies. In cases where no inventory data were available, synthetic data were simulated at different scales of observation. Heterogeneity in FIA variable estimates was monitored across scales of observation using estimators of variance and associated precision. Because excessive heterogeneity is hard to interpret owing to a low signal-to-noise ratio, object-based image analysis (OBIA) methods were used to manage heterogeneity in high resolution remote sensing data before evaluating scale dependence or scaling across observed scales. Similarly, ensemble classification techniques were applied to address methodological heterogeneity across classifiers in a case study on classification of two physically and spectrally similar Pinus species. Across case studies, a dependence on the scale of observation was determined by linking estimates of heterogeneity to their respective scales of observation using linear regression and a combination of geo-statistics and Monte-Carlo approaches.
To address scale-dependence, thresholds to scale domains were identified to enable efficient observation of the studied FIA variables, and scaling approaches were proposed to bridge observations across scales. For scaling, this research evaluated the potential of different regression techniques to map forest stand structural heterogeneity and tree species wall-to-wall from remote sensing data. In addition, radiative transfer modelling was evaluated in the transfer between leaf and crown hyperspectra, and a global sampling grid framework was proposed to efficiently link different stages of survey sampling. This research shows that the scale of observation affected all studied FIA variables, albeit to varying degrees, conditioned on the spatial structure and aggregation properties of the assessed FIA variable (i.e. whether the variable is extensive, intensive or scale-specific) and the method used in aggregation on support (e.g. mean, variance, quantile). The scale of observation affected measurements or estimates of the studied FIA variables as well as relationships between spatially structured FIA variables. The scale of observation determined observed heterogeneity in FIA variables, affected parameter retrieval from radiative transfer models, and affected variable selection and performance of models linking terrestrial and remote sensing data. On the other hand, this research shows that it is possible to determine domains of scale dependence within which to efficiently observe the studied FIA variables and to bridge between scales of observation using various scaling methods. The findings of this doctoral research are relevant for the general understanding of scale issues in FIA. Research in Article I, for example, informs optimization of plot sizes for efficient inventory and mapping of forest structural heterogeneity, as well as for the design of natural resource inventories.
Similarly, research in Article II is applicable to large-area forest (or general land) cover monitoring from sampling by both visual interpretation of high resolution remote sensing imagery and terrestrial surveys. This research is also useful for determining observation designs for efficient inventory of land cover. Research in Article III contributes to many contexts of remote-sensing-assisted inventory of forests, especially management and conservation planning, pest and disease control, and the estimation of biomass. Lastly, research in Article IV highlights currently understudied scale-related effects in passive optical remote sensing of forests and can ultimately contribute to sensor calibration and modelling approaches

    Graph Deep Learning for Time Series Forecasting

    Graph-based deep learning methods have become popular tools to process collections of correlated time series. Unlike traditional multivariate forecasting methods, neural graph-based predictors take advantage of pairwise relationships by conditioning forecasts on a (possibly dynamic) graph spanning the time series collection. The conditioning can take the form of an architectural inductive bias on the neural forecasting architecture, resulting in a family of deep learning models called spatiotemporal graph neural networks. Such relational inductive biases enable the training of global forecasting models on large time-series collections, while at the same time localizing predictions w.r.t. each element in the set (i.e., graph nodes) by accounting for local correlations among them (i.e., graph edges). Indeed, recent theoretical and practical advances in graph neural networks and deep learning for time series forecasting make the adoption of such processing frameworks appealing and timely. However, most of the studies in the literature focus on proposing variations of existing neural architectures by taking advantage of modern deep learning practices, while foundational and methodological aspects have not been subject to systematic investigation. To fill the gap, this paper aims to introduce a comprehensive methodological framework that formalizes the forecasting problem and provides design principles for graph-based predictive models and methods to assess their performance. At the same time, together with an overview of the field, we provide design guidelines, recommendations, and best practices, as well as an in-depth discussion of open challenges and future research directions

    BusTr: Predicting Bus Travel Times from Real-Time Traffic

    We present BusTr, a machine-learned model for translating road traffic forecasts into predictions of bus delays, used by Google Maps to serve the majority of the world's public transit systems where no official real-time bus tracking is provided. We demonstrate that our neural sequence model improves over DeepTTE, the state-of-the-art baseline, both in performance (-30% MAPE) and training stability. We also demonstrate significant generalization gains over simpler models, evaluated on longitudinal data to cope with a constantly evolving world. Comment: 14 pages, 2 figures, 5 tables. Citation: "Richard Barnes, Senaka Buthpitiya, James Cook, Alex Fabrikant, Andrew Tomkins, Fangzhou Xu (2020). BusTr: Predicting Bus Travel Times from Real-Time Traffic. 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. doi: 10.1145/3394486.3403376"

    Data analytics 2016: proceedings of the fifth international conference on data analytics


    Operationalization of Remote Sensing Solutions for Sustainable Forest Management

    The great potential of remote sensing technologies for operational use in sustainable forest management is addressed in this book, which is the reprint of papers published in the Remote Sensing Special Issue “Operationalization of Remote Sensing Solutions for Sustainable Forest Management”. The studies come from three continents and cover multiple remote sensing systems (including terrestrial mobile laser scanning, unmanned aerial vehicles, airborne laser scanning, and satellite data acquisition) and a diversity of data processing algorithms, with a focus on machine learning approaches. The focus of the studies ranges from identification and characterization of individual trees to deriving national- or even continental-level forest attributes and maps. There are studies carefully describing exercises on the case study level, and there are also studies introducing new methodologies for transdisciplinary remote sensing applications. Even though most of the authors look forward to continuing their research, nearly all studies introduced are ready for operational use or have already been implemented in practical forestry