Machine Learning Approaches for Natural Resource Data
Abstract
Real-life applications involving efficient management of natural resources depend on accurate geographical information. This information is usually obtained by manual on-site data collection, by automatic remote sensing methods, or by a mixture of the two. Besides accurate data collection, natural resource management also requires detailed analysis of the data, which in an era of data flood can be a cumbersome process. With the rising trend in both computational power and storage capacity, together with falling hardware prices, data-driven decision analysis plays an ever greater role.
In this thesis, we examine the predictability of terrain trafficability conditions and forest attributes by using a machine learning approach with geographic information system data. Quantitative measures of the prediction performance for terrain conditions using natural resource data sets are given for five distinct research areas located around Finland. Furthermore, the estimation capability for key forest attributes is inspected with a multitude of modeling and feature selection techniques. The research results provide empirical evidence on whether the natural resource data used is sufficiently accurate for practical applications, or whether further refinement of the data is needed. The results are especially important to the forest industry, since even slight improvements to the natural resource data sets used in practice can result in large savings in operation time and costs.
Model evaluation is also addressed in this thesis by proposing a novel method for estimating the prediction performance of spatial models. Classical goodness-of-fit measures usually rely on the assumption of independent and identically distributed data samples, an assumption that normally does not hold for spatial data sets. Spatio-temporal data sets contain an intrinsic property called spatial autocorrelation, which is partly responsible for breaking these assumptions. The proposed cross-validation-based evaluation method provides model performance estimates in which the optimistic bias due to spatial autocorrelation is decreased by partitioning the data sets in a suitable way.
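The partitioning idea can be illustrated with a minimal sketch. The abstract does not describe the thesis's actual partitioning scheme, so grid blocking is swapped in here as one common way to keep nearby points out of opposite sides of a split. The function name `spatial_block_folds` and the parameters `block_size` and `n_folds` are hypothetical.

```python
from collections import defaultdict


def spatial_block_folds(coords, block_size, n_folds):
    """Yield (train_idx, test_idx) pairs in which whole grid blocks are held out together.

    Assumption: grid blocking stands in for the thesis's method, which the
    abstract does not specify. coords is a list of (x, y) tuples.
    """
    # Group point indices by the grid cell they fall into.
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        key = (int(x // block_size), int(y // block_size))
        blocks[key].append(i)

    # Assign whole blocks to folds round-robin, so a block is never split.
    fold_members = [[] for _ in range(n_folds)]
    for j, key in enumerate(sorted(blocks)):
        fold_members[j % n_folds].extend(blocks[key])

    # Each fold serves once as the test set; all other folds form the training set.
    for k in range(n_folds):
        test = sorted(fold_members[k])
        train = sorted(
            i for m, members in enumerate(fold_members) if m != k for i in members
        )
        yield train, test


coords = [(0.5, 0.5), (0.6, 0.4), (5.2, 5.1), (5.3, 5.4), (9.9, 0.1), (9.8, 0.2)]
folds = list(spatial_block_folds(coords, block_size=1.0, n_folds=3))
```

Because neighbouring points share a block, a model is never tested on a point whose close neighbour in the same block it saw during training, which is the source of the optimistic bias described above. Points on either side of a block border can still be close, so a buffer between blocks may be needed in practice.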
Keywords: Open natural resource data, machine learning, model evaluation
Tiivistelmä (Finnish abstract, translated)
Practical applications that involve natural resource management depend on accurate geospatial data. This data is often collected manually on site, by automatic remote sensing methods, or by a combination of the two. Besides accurate data collection, natural resource management also requires detailed analysis of the data, which can be a demanding process in the era of data flood. With rising computational power and storage capacity and falling hardware prices, data-driven decision making plays an ever greater role.
This dissertation studies the predictability of terrain trafficability and forest attributes using machine learning methods with geospatial data. Prediction of terrain trafficability is measured quantitatively using remote sensing data from five different study areas around Finland. We also examine the predictability of the most important forest attributes with many different modeling techniques and with feature selection. The results of the dissertation provide empirical evidence on whether or not the natural resource data used is of sufficient quality for practical applications. The results are especially important to the forest industry, because even small improvements to natural resource data in practical applications can lead to large savings in both operation time and costs.
This work also addresses model evaluation by presenting a new method for estimating the prediction performance of spatial models. Classical model selection criteria usually rely on the assumption of independent and identically distributed data samples, which most often does not hold for spatial data sets. Spatio-temporal data sets contain an intrinsic property called spatial autocorrelation. This property is partly responsible for violating these assumptions. The presented cross-validation-based evaluation method provides a measure of model prediction performance in which the effect of spatial autocorrelation is reduced by partitioning the data sets in a suitable way.
Keywords: Open natural resource data, machine learning, model evaluation
Machine learned daily life history classification using low frequency tracking data and automated modelling pipelines: application to North American waterfowl
Background: Identifying animal behaviors, life history states, and movement patterns is a prerequisite for many animal behavior analyses and effective management of wildlife and habitats. Most approaches classify short-term movement patterns with high frequency location or accelerometry data. However, patterns reflecting life history across longer time scales can have greater relevance to species biology or management needs, especially when available in near real-time. Given limitations in collecting and using such data to accurately classify complex behaviors in the long-term, we used hourly GPS data from 5 waterfowl species to produce daily activity classifications with machine-learned models using “automated modelling pipelines”.
Methods: Automated pipelines are computer-generated code that complete many tasks including feature engineering, multi-framework model development, training, validation, and hyperparameter tuning to produce daily classifications from eight activity patterns reflecting waterfowl life history or movement states. We developed several input features for modeling grouped into three broad categories, hereafter “feature sets”: GPS locations, habitat information, and movement history. Each feature set used different data sources or data collected across different time intervals to develop the “features” (independent variables) used in models.
Results: Automated modelling pipelines rapidly developed easily reproducible data preprocessing and analysis steps, identification and optimization of the best performing model and provided outputs for interpreting feature importance. Unequal expression of life history states caused unbalanced classes, so we evaluated feature set importance using a weighted F1-score to balance model recall and precision among individual classes. Although the best model using the least restrictive feature set (only 24 hourly relocations in a day) produced effective classifications (weighted F1 = 0.887), models using all feature sets performed substantially better (weighted F1 = 0.95), particularly for rarer but demographically more impactful life history states (i.e., nesting).
Conclusions: Automated pipelines generated models producing highly accurate classifications of complex daily activity patterns using relatively low frequency GPS and incorporating more classes than previous GPS studies. Near real-time classification is possible which is ideal for time-sensitive needs such as identifying reproduction. Including habitat and longer sequences of spatial information produced more accurate classifications but incurred slight delays in processing.
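For readers unfamiliar with the metric, the following is a minimal sketch of a support-weighted F1-score: the per-class F1 is averaged with weights proportional to how often each class occurs in the true labels. This is the usual definition of a "weighted" F1, assumed here because the abstract does not spell out the formula. The function name `weighted_f1` and the label "other" are hypothetical; "nesting" is taken from the abstract.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its share of the true labels."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n / total) * f1
    return score


# Toy labels: one rare "nesting" day that the classifier misses entirely.
y_true = ["other", "other", "other", "other", "nesting"]
y_pred = ["other", "other", "other", "other", "other"]
score = weighted_f1(y_true, y_pred)
```

In this toy case the rare class contributes an F1 of zero but carries only a fifth of the weight, so the overall score stays fairly high. This is why per-class results for rare states such as nesting are worth inspecting alongside the weighted figure.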
Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016)
Timisoara, Romania. February 8-11, 2016. The PhD Symposium was a very good opportunity for young researchers to share information and knowledge, to
present their current research, and to discuss topics with other students in order to look for synergies and common research
topics. The idea was very successful and the assessment made by the PhD students was very good. It also helped to
achieve one of the major goals of the NESUS Action: to establish an open European research network targeting sustainable
solutions for ultrascale computing, aiming at cross-fertilization among HPC, large-scale distributed systems, and big
data management, as well as training, and helping to connect researchers working across different areas by providing a meeting
ground where they can exchange ideas, identify synergies, and pursue common activities in
research topics such as sustainable software solutions (applications and system software stack), data management, energy
efficiency, and resilience.
European Cooperation in Science and Technology. COS
Exploiting Multi-Level Parallelism in Streaming Applications for Heterogeneous Platforms with GPUs
Heterogeneous computing platforms support the traditional types of
parallelism, such as instruction-level, data, task, and pipeline
parallelism, and provide the opportunity to exploit a combination of
different types of parallelism at different platform levels. The
architectural diversity of platform components makes tapping into the
platform potential a challenging programming task. This thesis makes an
important step in this direction by introducing a novel methodology for
automatic generation of structured, multi-level parallel programs from
sequential applications. We introduce a novel hierarchical intermediate
program representation (HiPRDG) that captures the notions of structure
and hierarchy in the polyhedral model used for compile-time program
transformation and code generation. Using the HiPRDG as the starting
point, we present a novel method for generation of multi-level programs
(MLPs) featuring different types of parallelism, such as task, data, and
pipeline parallelism. Moreover, we introduce concepts and techniques for
data parallelism identification, GPU code generation, and asynchronous
data-driven execution on heterogeneous platforms with efficient
overlapping of host-accelerator communication and computation. By
enabling the modular, hybrid parallelization of program model components
via HiPRDG, this thesis opens the door for highly efficient tailor-made
parallel program generation and auto-tuning for next generations of
multi-level heterogeneous platforms with diverse accelerators.
Computer Systems, Imagery and Medi
Action-oriented Scene Understanding
In order to allow robots to act autonomously, it is crucial that they not only describe their environment accurately but also identify how to interact with their surroundings.
While we have witnessed tremendous progress in descriptive computer vision, approaches that explicitly target action are scarcer.
This cumulative dissertation approaches the goal of interpreting visual scenes “in the wild” with respect to actions implied by the scene. We call this approach action-oriented scene understanding. It involves identifying and judging opportunities for interaction with constituents of the scene (e.g. objects and their parts) as well as understanding object functions and how interactions will impact the future. All of these aspects are addressed on three levels of abstraction: elements, perception and reasoning.
On the elementary level, we investigate semantic and functional grouping of objects by analyzing annotated natural image scenes. We compare object label-based and visual context definitions with respect to their suitability for generating meaningful object class representations. Our findings suggest that representations generated from visual context are on par in terms of semantic quality with those generated from large quantities of text.
The perceptive level concerns action identification. We propose a system to identify possible interactions for robots and humans with the environment (affordances) on a pixel level using state-of-the-art machine learning methods. Pixel-wise part annotations of images are transformed into 12 affordance maps. Using these maps, a convolutional neural network is trained to densely predict affordance maps from unknown RGB images. In contrast to previous work, this approach operates exclusively on RGB images during both training and testing, and yet achieves state-of-the-art performance.
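The transformation from part annotations to affordance maps can be pictured with a small sketch. The abstract does not give the dissertation's part-to-affordance assignment, so the part names, the affordance names, and the function `parts_to_affordance_maps` below are hypothetical, and only two of the 12 maps are illustrated.

```python
def parts_to_affordance_maps(part_labels, part_affordances, affordances):
    """Turn a 2-D grid of part labels into one binary mask per affordance.

    Assumption: every affordance listed in part_affordances also appears in
    affordances; the mapping itself is illustrative only.
    """
    # Start with an all-zero mask of the image's shape for every affordance.
    maps = {a: [[0] * len(row) for row in part_labels] for a in affordances}
    for r, row in enumerate(part_labels):
        for c, part in enumerate(row):
            # A pixel is switched on in every map its part is assigned to.
            for a in part_affordances.get(part, ()):
                maps[a][r][c] = 1
    return maps


part_labels = [
    ["handle", "blade"],
    ["background", "handle"],
]
part_affordances = {"handle": ["grasp"], "blade": ["cut"]}  # hypothetical mapping
maps = parts_to_affordance_maps(part_labels, part_affordances, ["grasp", "cut"])
```

Masks of this kind could serve as dense per-pixel training targets, with one output channel per affordance, so that a pixel may carry several affordances at once.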
At the reasoning level, we extend the question from asking what actions are possible to what actions are plausible. For this, we gathered a dataset of household images associated with human ratings of the likelihoods of eight different actions. Based on the judgement provided by the human raters, we train convolutional neural networks to generate plausibility scores from unseen images.
Furthermore, having considered only static scenes previously in this thesis, we propose a system that takes video input and predicts plausible future actions. Since this requires careful identification of relevant features in the video sequence, we analyze this particular aspect in detail using a synthetic dataset for several state-of-the-art video models. We identify feature learning as a major obstacle for anticipation in natural video data.
The presented projects analyze the role of action in scene understanding from various angles and in multiple settings while highlighting the advantages of assuming an action-oriented perspective.
We conclude that action-oriented scene understanding can augment classic computer vision in many real-life applications, in particular robotics.
Development of Deep Learning Hybrid Models for Hydrological Predictions
The abstract is currently unavailable because the thesis is under embargo.
Dynamic Generalisation of Continuous Action Spaces in Reinforcement Learning: A Neurally Inspired Approach
Institute for Adaptive and Neural Computation
Award number: 98318242.
This thesis is about the dynamic generalisation of continuous action spaces in
reinforcement learning problems.
The standard Reinforcement Learning (RL) account provides a principled and comprehensive
means of optimising a scalar reward signal in a Markov Decision Process.
However, the theory itself does not directly address the imperative issue of generalisation
which naturally arises as a consequence of large or continuous state and action
spaces. A current thrust of research is aimed at fusing the generalisation capabilities
of supervised (and unsupervised) learning techniques with the RL theory. An example
par excellence is Tesauro’s TD-Gammon.
Although much effort has gone into researching ways to represent and generalise over
the input space, much less attention has been paid to the action space. This thesis
first considers the motivation for learning real-valued actions, and then proposes a
set of key properties desirable in any candidate algorithm addressing generalisation
of both input and action spaces. These properties include provision of adaptive and
online generalisation, adherence to the standard theory with a central focus on estimating
expected reward, provision for real-valued states and actions, and full support
for a real-valued discounted reward signal. Of particular interest are issues pertaining
to robustness in non-stationary environments, scalability, and efficiency for real-time
learning in applications such as robotics. Since exploring the action space is discovered
to be a potentially costly process, the system should also be flexible enough to
enable maximum reuse of learned actions.
A new approach is proposed which succeeds for the first time in addressing all of the
key issues identified. The algorithm, which is based on the ubiquitous self-organising
map, is analysed and compared with other techniques including those based on the
backpropagation algorithm. The investigation uncovers some important implications
of the differences between these two particular approaches with respect to RL. In particular,
the distributed representation of the multi-layer perceptron is judged to be
something of a double-edged sword offering more sophisticated and more scalable
generalising power, but potentially causing problems in dynamic or non-equiprobable
environments, and tasks involving a highly varying input-output mapping.
The thesis concludes that the self-organising map can be used in conjunction with current
RL theory to provide real-time dynamic representation and generalisation of continuous
action spaces. The proposed model is shown to be reliable in non-stationary,
unpredictable and noisy environments and judged to be unique in addressing and satisfying
a number of desirable properties identified as important to a large class of RL
problems.
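As background for the self-organising map mentioned above, here is a minimal sketch of one standard Kohonen update step on a one-dimensional chain of units. It is not the thesis's algorithm, which the abstract does not detail. Treating each unit's weight vector as a candidate real-valued action is an assumption made for illustration, and the name `som_update` and its parameters are hypothetical.

```python
import math


def som_update(weights, x, learning_rate=0.5, radius=1.0):
    """Apply one standard SOM update and return (best_matching_unit, new_weights).

    weights is a list of vectors, one per unit on a 1-D chain; x is the input vector.
    """
    # Best matching unit: the unit whose weight vector is closest to x.
    bmu = min(
        range(len(weights)),
        key=lambda i: sum((w - xi) ** 2 for w, xi in zip(weights[i], x)),
    )
    updated = []
    for i, w in enumerate(weights):
        # Gaussian neighbourhood: units near the winner on the chain move more.
        h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
        updated.append([wj + learning_rate * h * (xj - wj) for wj, xj in zip(w, x)])
    return bmu, updated


# Three units holding scalar values (illustratively, candidate actions in [0, 1]).
weights = [[0.0], [0.5], [1.0]]
bmu, new_weights = som_update(weights, [0.9])
```

Each update is local and incremental, so the map can keep adapting as inputs drift. That is consistent with the online, non-stationary setting the abstract emphasises, although how the thesis couples the map to reward estimates is not shown here.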
Machine Intelligence in Africa: a survey
In the last 5 years, the availability of large audio datasets in African
countries has opened unlimited opportunities to build machine intelligence (MI)
technologies that are closer to the people and speak, learn, understand, and do
business in local languages, including for those who cannot read and write.
Unfortunately, these audio datasets are not fully exploited by current MI
tools, leaving several Africans out of MI business opportunities. Additionally,
many state-of-the-art MI models are not culture-aware, and the ethics of their
adoption indexes are questionable. This lack of cultural awareness is a major drawback in many
applications in Africa. This paper summarizes recent developments in machine
intelligence in Africa from a multi-layer multiscale and culture-aware ethics
perspective, showcasing MI use cases in 54 African countries through 400
articles on MI research, industry, government actions, as well as uses in art,
music, the informal economy, and small businesses in Africa. The survey also
opens discussions on the reliability of MI rankings and indexes in the African
continent as well as algorithmic definitions of unclear terms used in MI.
Comment: Accepted and to be presented at DSAI 202