Using individual tracking data to validate the predictions of species distribution models

Authors: Cornulier Thomas; Neat Francis; Pinto Cecilia; Scott Beth E.; Thorburn James A.; Travis Justin M. J.; Wright Peter J.; Wright Serena
Publication venue: Wiley
Publication date: 23/03/2017

The authors would like to thank the College of Life Sciences of Aberdeen University and Marine Scotland Science which funded CP's PhD project. Skate tagging experiments were undertaken as part of Scottish Government project SP004. We thank Ian Burrett for help in catching the fish and the other fishermen and anglers who returned tags. We thank José Manuel Gonzalez-Irusta for extracting and making available the environmental layers used as environmental covariates in the environmental suitability modelling procedure. We also thank Jason Matthiopoulos for insightful suggestions on habitat utilization metrics as well as Stephen C. F. Palmer and three anonymous reviewers for useful suggestions to improve the clarity and quality of the manuscript. Peer reviewed. Postprint.

Aberdeen University Research

Stacked Penalized Logistic Regression for Selecting Views in Multi-View Learning

Authors: de Rooij Mark; Fokkema Marjolein; Szabo Botond; van Loon Wouter
Publication venue: Elsevier BV
Publication date: 01/01/2020

In biomedical research, many different types of patient data can be collected, such as various types of omics data and medical imaging modalities. Applying multi-view learning to these different sources of information can increase the accuracy of medical classification models compared with single-view procedures. However, collecting biomedical data can be expensive and/or burdensome for patients, so it is important to reduce the amount of required data collection. It is therefore necessary to develop multi-view learning methods which can accurately identify those views that are most important for prediction. In recent years, several biomedical studies have used an approach known as multi-view stacking (MVS), where a model is trained on each view separately and the resulting predictions are combined through stacking. In these studies, MVS has been shown to increase classification accuracy. However, the MVS framework can also be used for selecting a subset of important views. To study the view selection potential of MVS, we develop a special case called stacked penalized logistic regression (StaPLR). Compared with existing view-selection methods, StaPLR can make use of faster optimization algorithms and is easily parallelized. We show that nonnegativity constraints on the parameters of the function which combines the views play an important role in preventing unimportant views from entering the model. We investigate the performance of StaPLR through simulations, and consider two real data examples. We compare the performance of StaPLR with an existing view selection method called the group lasso and observe that, in terms of view selection, StaPLR is often more conservative and has a consistently lower false positive rate. Comment: 26 pages, 9 figures. Accepted manuscript.
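The stacking idea described in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a ridge-penalized logistic regression as the base learner for each view, out-of-fold predictions as meta-features, and an L1-penalized logistic regression with nonnegative weights as the combining function. All function names, penalty values, and the simulated two-view data are hypothetical.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def predict(b, w, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, l2=0.0, l1=0.0, nonneg=False, steps=500, lr=0.5):
    """Penalized logistic regression by proximal gradient descent.

    The intercept b is unpenalized; l2 is a ridge penalty, l1 a lasso penalty,
    and nonneg=True projects the weights onto [0, inf) after each step.
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(steps):
        err = [predict(b, w, x) - yi for x, yi in zip(X, y)]
        grads = [sum(e * x[j] for e, x in zip(err, X)) / n + l2 * w[j]
                 for j in range(p)]
        b -= lr * sum(err) / n
        for j in range(p):
            wj = w[j] - lr * grads[j]
            wj = math.copysign(max(abs(wj) - lr * l1, 0.0), wj)  # soft-threshold
            w[j] = max(wj, 0.0) if nonneg else wj
    return b, w


def cv_predictions(Xv, y, folds=5):
    """Out-of-fold predicted probabilities from a ridge-penalized base learner."""
    n = len(y)
    out = [0.0] * n
    for k in range(folds):
        train = [i for i in range(n) if i % folds != k]
        b, w = fit_logistic([Xv[i] for i in train], [y[i] for i in train], l2=0.1)
        for i in range(k, n, folds):
            out[i] = predict(b, w, Xv[i])
    return out


# Simulated data (assumption): view 0 carries signal, view 1 is pure noise.
random.seed(0)
n = 100
y = [i % 2 for i in range(n)]
views = [
    [[(2 * yi - 1) + random.gauss(0, 1), random.gauss(0, 1)] for yi in y],
    [[random.gauss(0, 1), random.gauss(0, 1)] for _ in y],
]

# One meta-feature per view: its out-of-fold predicted probability.
Z = [list(row) for row in zip(*(cv_predictions(Xv, y) for Xv in views))]

# Meta-learner: nonnegative, L1-penalized logistic regression over the views.
b_meta, w_meta = fit_logistic(Z, y, l1=0.02, nonneg=True)
print("view weights:", w_meta)
```

A view whose meta-weight is shrunk to zero is dropped from the stacked model. In this toy setup one would expect the noise view to receive a weight at or near zero, although the sketch does not guarantee it.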

arXiv.org e-Print Archive

Archivio istituzionale della Ricerca - Bocconi

Reliability Assessment of Legacy Safety-Critical Systems Upgraded with Fault-Tolerant Off-the-Shelf Software

Author: Popov P. T.
Publication venue: Centre for Software Reliability, City University London
Publication date: 01/01/2012

This paper presents a new way of applying Bayesian assessment to systems that consist of many components. Full Bayesian inference with such systems is problematic, because it is computationally hard and, far more seriously, one needs to specify a multivariate prior distribution with many counterintuitive dependencies between the probabilities of component failures. The approach taken here is one of decomposition. The system is decomposed into partial views of the system, or parts thereof, with different degrees of detail, and then a mechanism for propagating the knowledge obtained with the more refined views back to the coarser views is applied (recalibration of coarse models). The paper describes the recalibration technique and then evaluates the accuracy of recalibrated models numerically on contrived examples using two techniques: u-plot and prequential likelihood, developed by others for software reliability growth models. The results indicate that the recalibrated predictions are often more accurate than the predictions obtained with the less detailed models, although this is not guaranteed. The techniques used to assess the accuracy of the predictions are accurate enough for one to be able to choose the model giving the most accurate prediction.
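The two accuracy checks named in this abstract can be sketched as follows. This is a generic illustration rather than the paper's procedure. It assumes exponential one-step-ahead predictive distributions, and the inter-failure times and both "models" are made up. In a u-plot, each observed time is passed through its own predictive CDF; well-calibrated predictions give values that look uniform on (0, 1), and the Kolmogorov distance from the uniform CDF summarizes the departure. The prequential likelihood is the product of the predictive densities evaluated at the observed times, compared between models as a log ratio.

```python
import math


def u_values(times, rates):
    """u_i = F_i(t_i) for exponential predictive CDFs F_i(t) = 1 - exp(-rate_i * t)."""
    return [1.0 - math.exp(-r * t) for t, r in zip(times, rates)]


def u_plot_distance(us):
    """Kolmogorov distance between the empirical CDF of the u_i and the uniform CDF."""
    us = sorted(us)
    n = len(us)
    return max(max((i + 1) / n - u, u - i / n) for i, u in enumerate(us))


def log_prequential_likelihood(times, rates):
    """Sum of log predictive densities, with f_i(t) = rate_i * exp(-rate_i * t)."""
    return sum(math.log(r) - r * t for t, r in zip(times, rates))


# Made-up inter-failure times and two hypothetical sets of predictions.
times = [0.2, 0.7, 1.1, 1.6, 2.9]
model_a = [1.0] * 5  # predicts a mean time of 1.0 at every step
model_b = [5.0] * 5  # predicts a mean time of 0.2 at every step (too pessimistic)

dist_a = u_plot_distance(u_values(times, model_a))
dist_b = u_plot_distance(u_values(times, model_b))
log_plr = (log_prequential_likelihood(times, model_a)
           - log_prequential_likelihood(times, model_b))
print(dist_a, dist_b, log_plr)
```

A smaller u-plot distance and a positive log prequential likelihood ratio both favour model A on this toy data.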

City Research Online

Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

Authors: Banzhaf W.; Bergstra J.; Feurer M.; Hastie T. J.; Snoek J.; Urbanowicz R. J.
Publication date: 19/03/2016

As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design. Comment: 8 pages, 5 figures, preprint to appear in GECCO 2016, edits not yet made from reviewer comments.
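The Pareto step mentioned in this abstract can be illustrated with a small sketch. It does not use TPOT's code or API. It assumes each candidate pipeline is summarized by two objectives, accuracy (to maximize) and number of operators (to minimize, as a proxy for complexity). The `Candidate` type, the function names, and the scores are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical pipeline summary; not a TPOT data structure."""
    name: str
    accuracy: float   # higher is better
    n_operators: int  # lower is better (proxy for pipeline complexity)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.n_operators <= b.n_operators
    better = a.accuracy > b.accuracy or a.n_operators < b.n_operators
    return no_worse and better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]


# Made-up scores for illustration only.
pipelines = [
    Candidate("A", 0.90, 5),
    Candidate("B", 0.90, 2),
    Candidate("C", 0.85, 1),
    Candidate("D", 0.80, 3),
]
front = pareto_front(pipelines)
print([c.name for c in front])  # prints ['B', 'C']
```

Pipeline A is dropped because B reaches the same accuracy with fewer operators, which is the sense in which a Pareto criterion can favour compact pipelines without giving up accuracy.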

arXiv.org e-Print Archive

Crossref

Scipedia

Inference of the genetic network regulating lateral root initiation in Arabidopsis thaliana

Authors: Bennett M.; Byrne H. M.; de Smet I.; Hodgman C.; King J. R.; Muraro D.; Voß U.; Wilson M.
Publication date: 01/01/2012

Regulation of gene expression is crucial for organism growth, and it is one of the challenges in Systems Biology to reconstruct the underlying regulatory biological networks from transcriptomic data. The formation of lateral roots in Arabidopsis thaliana is stimulated by a cascade of regulators of which only the interactions of its initial elements have been identified. Using simulated gene expression data with known network topology, we compare the performance of inference algorithms, based on different approaches, for which ready-to-use software is available. We show that their performance improves with the network size and the inclusion of mutants. We then analyse two sets of genes, whose activity is likely to be relevant to lateral root initiation in Arabidopsis, by integrating sequence analysis with the intersection of the results of the best performing methods on time series and mutants to infer their regulatory network. The methods applied capture known interactions between genes that are candidate regulators at early stages of development. The network inferred from genes significantly expressed during lateral root formation exhibits distinct scale-free, small-world and hierarchical properties, and the nodes with a high out-degree may warrant further investigation.
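Ranking nodes by out-degree, as the last sentence of this abstract suggests, can be sketched in a few lines. The edge list below is a made-up directed network with placeholder gene names, not the network inferred in the paper.

```python
from collections import Counter

# Hypothetical directed regulatory edges (regulator -> target).
edges = [
    ("g1", "g2"),
    ("g1", "g3"),
    ("g1", "g4"),
    ("g2", "g3"),
    ("g4", "g2"),
]

# Out-degree: number of targets each gene regulates.
out_degree = Counter(regulator for regulator, _ in edges)

# Genes sorted from highest to lowest out-degree.
ranked = out_degree.most_common()
print(ranked)
```

Genes that regulate nothing (here `g3`) do not appear in `ranked`, but `out_degree["g3"]` still returns 0 because `Counter` defaults missing keys to zero.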

Ghent University Academic Bibliography

Oxford University Research Archive