8 research outputs found
MML Probabilistic Principal Component Analysis
Principal component analysis (PCA) is perhaps the most widely used method for
data dimensionality reduction. A key question in any PCA decomposition of data
is deciding how many factors to retain. This manuscript describes a new
approach to automatically selecting the number of principal components, based
on the Bayesian minimum message length (MML) method of inductive inference. We
also derive a new estimate of the isotropic residual variance and demonstrate,
via numerical experiments, that it improves on the usual maximum likelihood
approach.
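The kind of automatic component selection the abstract describes can be illustrated with a simpler penalized-likelihood analogue. The sketch below scores each candidate rank k by the maximised probabilistic-PCA log-likelihood (Tipping & Bishop) minus a BIC penalty; this is an assumption-laden stand-in for intuition only, not the paper's MML criterion, and the parameter count `m` is the standard PPCA one.

```python
import numpy as np

def select_n_components(X, max_k=None):
    """Pick the number of PPCA components with a BIC-style criterion.

    Illustrative analogue of automatic component selection; the paper's
    actual criterion is minimum message length, not BIC.
    """
    n, d = X.shape
    max_k = max_k or d - 1
    # Eigenvalues of the sample covariance, largest first.
    S = np.cov(X, rowvar=False)
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]
    best_k, best_score = 1, -np.inf
    for k in range(1, max_k + 1):
        sigma2 = lam[k:].mean()            # ML isotropic residual variance
        # Maximised PPCA log-likelihood (Tipping & Bishop, 1999).
        ll = -0.5 * n * (d * np.log(2 * np.pi)
                         + np.log(lam[:k]).sum()
                         + (d - k) * np.log(sigma2) + d)
        m = d * k - k * (k - 1) / 2 + 1    # free parameters of the rank-k model
        score = ll - 0.5 * m * np.log(n)   # BIC
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

On synthetic data with a strongly separated spectrum this recovers the true latent dimension; with weaker signal the penalty term dominates and fewer components are kept.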
Managing uncertainty in integrated environmental modelling: the UncertWeb framework
Web-based distributed modelling architectures are gaining increasing recognition as potentially useful tools to build holistic environmental models, combining individual components in complex workflows. However, existing web-based modelling frameworks currently offer no support for managing uncertainty. On the other hand, the rich array of modelling frameworks and simulation tools which support uncertainty propagation in complex and chained models typically lack the benefits of web-based solutions such as ready publication, discoverability and easy access. In this article we describe the developments within the UncertWeb project, which are designed to provide uncertainty support in the context of the proposed ‘Model Web’. We give an overview of uncertainty in modelling, review uncertainty management in existing modelling frameworks and consider the semantic and interoperability issues raised by integrated modelling. We describe the scope and architecture required to support uncertainty management as developed in UncertWeb. This includes tools which support elicitation, aggregation/disaggregation, visualisation and uncertainty/sensitivity analysis. We conclude by highlighting areas that require further research and development in UncertWeb, such as model calibration and inference within complex environmental models.
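The core idea of uncertainty propagation through chained models can be sketched with plain Monte Carlo sampling. The two toy components and all their coefficients below are invented for illustration; they are not part of the UncertWeb architecture or its services, which exchange uncertain quantities between distributed web components rather than in-process functions.

```python
import numpy as np

def rainfall_runoff(rain_mm):
    """Toy first component: runoff as a nonlinear function of rainfall."""
    return 0.6 * rain_mm ** 1.2

def runoff_to_river_level(runoff):
    """Toy second component consuming the first component's output."""
    return 0.01 * runoff + 1.0

rng = np.random.default_rng(42)
# Uncertain input: rainfall ~ N(50, 10) mm, represented by samples
# (clipped at zero, since negative rainfall is not physical).
rain_samples = np.clip(rng.normal(50.0, 10.0, size=20_000), 0.0, None)
# Chain the components sample-by-sample: the output samples carry the
# propagated uncertainty, from which summaries can be computed.
level_samples = runoff_to_river_level(rainfall_runoff(rain_samples))
mean, sd = level_samples.mean(), level_samples.std()
```

The same sample-based representation is what lets a chained workflow report a full output distribution instead of a single deterministic value.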
Ergonomics of the Operative Field in Paediatric Minimal Access Surgery
Outside The Machine Learning Blackbox: Supporting Analysts Before And After The Learning Algorithm
Applying machine learning to real problems is non-trivial because many important steps are needed to prepare for learning and to interpret the results after learning. This dissertation investigates four problems that arise before and after applying learning algorithms. First, how can we verify a dataset contains "good" information? I propose cross-data validation for quantifying the quality of a dataset relative to a benchmark dataset and define a data efficiency ratio that measures how efficiently the dataset in question collects information (relative to the benchmark). Using these methods I demonstrate the quality of bird observations collected by the eBird citizen science project, which has few quality controls. Second, can off-the-shelf algorithms learn a model with good task-specific performance, or must the user have expertise both in the domain and in machine learning? In many applications, standard performance metrics are inappropriate, and most analysts lack the expertise or time to customize algorithms to optimize task-specific metrics. Ensemble selection offers a potential solution: build an ensemble to optimize the desired metric. I evaluate ensemble selection's ability to optimize for domain-specific metrics on natural language processing tasks and show that ensemble selection usually improves performance but sometimes overfits. Third, how can we understand complex models? Understanding a model is often as important as its accuracy. I propose and evaluate statistics for measuring the importance of inputs used by a decision tree ensemble. The statistics agree with sensitivity analysis and, in an application to bird distribution models, are 500 times faster to compute. The statistics have been used to study hundreds of bird distribution models. Fourth, how should data be pre-processed when learning a high-performing ensemble? I examine the behavior of variable selection and bagging using a bias-variance analysis of error.
The results show that the most accurate variable subset corresponds to the best bias-variance trade-off point. Often, this is not the point separating relevant from irrelevant inputs. Variable selection should be viewed as a variance reduction method and thus is often redundant for low-variance methods like bagging. The best bagged model performance usually is obtained using all available inputs.
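The ensemble-selection idea discussed above can be sketched as greedy forward selection: repeatedly add (with replacement) the model whose inclusion most improves the target metric on a hillclimb set. This is a minimal sketch of the general technique, not the dissertation's implementation; the accuracy metric and the toy predictions in the usage test are invented for illustration.

```python
import numpy as np

def ensemble_select(preds, y, metric, rounds=10):
    """Greedy ensemble selection: at each round, add the model (with
    replacement) whose inclusion maximises `metric` on the held-out labels.
    `preds` is a list of per-model probability predictions for `y`."""
    chosen = []
    ens_sum = np.zeros_like(y, dtype=float)
    for _ in range(rounds):
        scores = [metric(y, (ens_sum + p) / (len(chosen) + 1)) for p in preds]
        best = int(np.argmax(scores))
        chosen.append(best)
        ens_sum = ens_sum + preds[best]
    return chosen, ens_sum / len(chosen)

def accuracy(y, p):
    """Example task metric; any domain-specific metric can be plugged in."""
    return float(np.mean((p > 0.5) == y))
```

Because selection is driven entirely by the plugged-in metric, the same loop optimises whatever domain-specific measure the analyst cares about; the overfitting risk the abstract mentions arises when the hillclimb set is small.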
UNDERSTANDING OF THE VARIABILITY OF PHYTOPLANKTON ECOSYSTEM FUNCTION PROPERTIES: A SYNERGISTIC USE OF REMOTE SENSING AND IN SITU DATA
The majority of the Earth's surface (~71%) is covered by the aquatic
environment, of which 97% is the oceanic regime. Almost every part of the
aquatic regime is dominated by microscopic plants called phytoplankton. Being
at the bottom of the food chain, these ecological drivers influence the
Earth's climate system as well as the biodiversity trends of other organisms
such as zooplankton, fish, seabirds and marine mammals.
The aim of this research was to understand the ecology of phytoplankton and
assess which environmental, physical, biological, and spatiotemporal factors influence
their distribution and abundance. Using this information a knowledge-based expert
system discriminated phytoplankton functional types. The ecological knowledge was
derived from the Continuous Plankton Recorder (CPR) survey, whereas information
regarding the physical regime was acquired from satellite remote sensing. The data
matrix was analysed using Generalised Additive Models (GAMs) and Artificial
Neural Networks (ANNs).
The significant relationships established by the synergistic use of the CPR
measure of phytoplankton biomass and satellite chlorophyll-a (Chl-a) allowed
the production of a >50-year Chl-a dataset in the Northeast Atlantic and North
Sea. It was found that the documented mid-1980s regime shift corresponded to a
60% increase in Chl-a since 1948, the result of an 80% increase in Chl-a
during winter alongside a smaller summer increase.
GAMs indicated that the combined effects of high solar radiation, shallow
mixed layer depth and increased temperatures explained more than 89% of the
coccolithophore variation. The June 1998 bloom, which was associated with high
light intensity, unusually high sea-surface temperature (SST) and a very shallow
mixed layer, was found to be one of the most extensive (~1 million km²) blooms
ever recorded. There was a pronounced SST shift in the mid-1990s, with a peak
in 1998,
suggesting that exceptionally large blooms are caused by pronounced environmental
conditions and the variability of the physical environment strongly affects the spatial
extent of these blooms.
Diatom abundance in the epipelagic zone of the Northern North Atlantic was
mainly driven by SST. The ANNs indicated that higher SSTs could lead to a rapid
decrease in diatom abundance: increased SST can stratify the water column for
longer, preventing nutrients from reaching the surface. Therefore, further SST
increases may be devastating to diatoms but may benefit smaller plankton such
as coccolithophores and/or dinoflagellates.
Finally, the knowledge gained through the developed methodological
approaches was used to identify/discriminate phytoplankton functional groups
(diatoms, dinoflagellates, coccolithophores and silicoflagellates) with an
accuracy of greater than 70%. The most important information for phytoplankton
functional group discrimination was spatiotemporal information, and the most
important physical-environment variable was SST. Future research aimed at the
identification of functional groups from remotely sensed data should include
fundamental information on the physical environment as well as spatiotemporal
information, rather than being based on bio-optical measurements alone.
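The flavour of a knowledge-based expert system driven by spatiotemporal and physical-environment inputs can be sketched with a few hand-written rules. Every threshold and rule below is invented purely for illustration; they are not the thesis's calibrated rules, which were derived from CPR and satellite data.

```python
def classify_group(month, sst_c, chl_mg_m3):
    """Toy rule base assigning a phytoplankton functional group from the
    month (spatiotemporal input), sea-surface temperature in deg C and
    chlorophyll-a concentration. Thresholds are illustrative assumptions."""
    if month in (4, 5) and sst_c < 12 and chl_mg_m3 > 1.0:
        return "diatom"           # cool, well-mixed spring-bloom conditions
    if month in (6, 7, 8) and sst_c > 14:
        return "coccolithophore"  # warm, stratified mid-summer water
    if month in (8, 9) and chl_mg_m3 < 1.0:
        return "dinoflagellate"   # late-summer, low-biomass stratified water
    return "mixed/other"
```

The appeal of this representation is that each rule is readable ecological knowledge, which is why the discrimination accuracy and the ranking of input importance (spatiotemporal first, then SST) can be interpreted directly.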
Further development, potential applications and future research are discussed.
Sir Alister Hardy Foundation for Ocean Science