HCmodelSets: An R package for specifying sets of well-fitting models in high dimensions

Author: Battey H
Hoeltgebaum H
Publication venue: The R Foundation
Publication date: 25/12/2019

In the context of regression with a large number of explanatory variables, Cox and Battey (2017) emphasize that if there are alternative reasonable explanations of the data that are statistically indistinguishable, one should aim to specify as many of these explanations as is feasible. The standard practice, by contrast, is to report a single model effective for prediction. The present paper illustrates the R implementation of the new ideas in the package HCmodelSets, using simple reproducible examples and real data. Results of some simulation experiments are also reported.

Spiral - Imperial College Digital Repository

On inference in high-dimensional logistic regression models with separated data

Author: Battey H
Lewis R
Publication venue: Oxford University Press
Publication date: 12/10/2023

Direct use of the likelihood function typically produces severely biased estimates when the dimension of the parameter vector is large relative to the effective sample size. With linearly separable data generated from a logistic regression model, the log-likelihood function asymptotes and the maximum likelihood estimator does not exist. We show that an exact analysis for each regression coefficient produces half-infinite confidence sets for some parameters when the data are separable. Such conclusions are not vacuous, but an honest portrayal of the limitations of the data. Finite confidence sets are only achievable when additional, perhaps implicit, assumptions are made. Under a notional double-asymptotic regime in which the dimension of the logistic coefficient vector increases with the sample size, the present paper considers the implications of enforcing a natural constraint on the vector of logistic-transformed probabilities. We derive a relationship between the logistic coefficients and a notional parameter obtained as a probability limit of an ordinary least squares estimator. The latter exists even when the data are separable. Consistency is ascertained under weak conditions on the design matrix.

Spiral - Imperial College Digital Repository
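
To make the separation phenomenon described in the abstract above concrete, here is a minimal Python sketch that is not taken from the paper. It assumes a single-slope logistic model without intercept and a hypothetical, perfectly separated toy data set, and evaluates the log-likelihood at increasing slopes. The values climb toward 0 without ever attaining a maximum, which is why the maximum likelihood estimator does not exist.

```python
import math


def log1pexp(t):
    """Numerically stable log(1 + exp(t))."""
    if t > 0:
        return t + math.log1p(math.exp(-t))
    return math.log1p(math.exp(t))


def log_likelihood(beta, xs, ys):
    """Log-likelihood of the assumed model P(y = 1 | x) = 1 / (1 + exp(-beta * x)).

    For y = 1 the contribution is -log(1 + exp(-beta * x)); for y = 0 it is
    -log(1 + exp(beta * x)). Both are written via the signed margin m.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        m = beta * x if y == 1 else -beta * x
        total -= log1pexp(-m)
    return total


# Hypothetical toy data, perfectly separated at x = 0: y = 1 exactly when x > 0.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

# The log-likelihood keeps increasing toward 0 as the slope grows,
# so no finite slope maximises it.
betas = (1.0, 5.0, 10.0, 50.0)
values = [log_likelihood(b, xs, ys) for b in betas]
for b, v in zip(betas, values):
    print(f"beta = {b:5.1f}  log-likelihood = {v:.3e}")
```

On data like these, an iterative optimiser simply keeps increasing the slope until a convergence tolerance or numerical limit stops it, so any reported "estimate" is an artefact of the stopping rule.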

Nonparametric estimation of the intensity function of a spatial point process on a Riemannian manifold

Author: Battey H
Cohen E
Ward S
Publication venue: Oxford University Press (OUP)
Publication date: 13/02/2023

This paper is concerned with nonparametric estimation of the intensity function of a point process on a Riemannian manifold. It provides a first-order asymptotic analysis of the proposed kernel estimator for Poisson processes, supplemented by empirical work to probe the behaviour in finite samples and under other generative regimes. The investigation highlights the scope for finite-sample improvements by allowing the bandwidth to adapt to local curvature.

Spiral - Imperial College Digital Repository
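
The abstract above does not state the estimator's formula, so the following is only a generic illustration under stated assumptions, not the paper's estimator. It computes a kernel intensity estimate on the unit circle, the simplest compact Riemannian manifold, using a von Mises kernel whose concentration `kappa` acts as an inverse bandwidth. The data are simulated and hypothetical, and because a one-dimensional manifold has no intrinsic curvature, the curvature-adaptive bandwidth discussed in the paper is not reproduced. Each kernel integrates to 1 over the circle, so the estimate integrates to the number of observed points.

```python
import numpy as np


def intensity_estimate(theta, points, kappa):
    """Kernel intensity estimate on the unit circle S^1 (illustrative sketch).

    Each observed point contributes a von Mises kernel centred at it. The
    kernel depends only on the angular (geodesic) difference and integrates
    to 1 over the circle. Larger kappa means a narrower kernel.
    """
    theta = np.atleast_1d(theta)[:, None]        # evaluation angles as a column
    diffs = theta - np.asarray(points)[None, :]  # pairwise angular differences
    kernels = np.exp(kappa * np.cos(diffs)) / (2 * np.pi * np.i0(kappa))
    return kernels.sum(axis=1)


rng = np.random.default_rng(0)
# Hypothetical data: 200 points clustered around the angle pi/2.
points = rng.vonmises(mu=np.pi / 2, kappa=4.0, size=200)

grid = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
lam = intensity_estimate(grid, points, kappa=20.0)

# Riemann sum over the circle: should recover the observed count (200).
total_mass = lam.sum() * (2 * np.pi / len(grid))
print(f"integrated intensity = {total_mass:.6f}")
print(f"estimated mode       = {grid[np.argmax(lam)]:.3f}")
```

Choosing `kappa` plays the role of bandwidth selection here; the paper's point is that, on curved manifolds, letting this choice vary with local curvature can improve finite-sample behaviour.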

Communication-constrained distributed quantile regression with optimal statistical guarantees

Author: Battey H
Tan KM
Zhou W-X
Publication venue: Microtome Publishing
Publication date: 01/08/2022

We address the problem of how to achieve optimal inference in distributed quantile regression without stringent scaling conditions. This is challenging due to the non-smooth nature of the quantile regression (QR) loss function, which invalidates the use of existing methodology. The difficulties are resolved through a double-smoothing approach that is applied to the local (at each data source) and global objective functions. Despite the reliance on a delicate combination of local and global smoothing parameters, the quantile regression model is fully parametric, thereby facilitating interpretation. In the low-dimensional regime, we establish a finite-sample theoretical framework for the sequentially defined distributed QR estimators. This reveals a trade-off between the communication cost and statistical error. We further discuss and compare several alternative confidence set constructions, based on inversion of Wald and score-type tests and resampling techniques, detailing an improvement that is effective for more extreme quantile coefficients. In high dimensions, a sparse framework is adopted, where the proposed doubly-smoothed objective function is complemented with an ℓ1-penalty. We show that the corresponding distributed penalized QR estimator achieves the global convergence rate after a near-constant number of communication rounds. A thorough simulation study further elucidates our findings.

Spiral - Imperial College Digital Repository
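
The abstract above attributes the difficulty to the non-smooth QR loss and resolves it by smoothing. As a hedged, single-machine illustration (not the paper's doubly-smoothed distributed estimator), the sketch below shows the check loss, which has a kink at zero, and a convolution-smoothed version. The Gaussian kernel and the bandwidth `h` are assumptions chosen because they give the closed form u(τ − Φ(−u/h)) + hφ(u/h). The sketch then estimates a location-only quantile by minimising the smoothed loss.

```python
import math


def check_loss(u, tau):
    """Quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0}); kinked at u = 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def smoothed_check_loss(u, tau, h):
    """Check loss convolved with a N(0, h^2) kernel; differentiable everywhere."""
    return u * (tau - normal_cdf(-u / h)) + h * normal_pdf(u / h)


def smoothed_quantile(ys, tau, h, tol=1e-10):
    """Minimise sum_i smoothed_check_loss(y_i - c, tau, h) over a scalar c.

    The first-order condition is sum_i Phi((c - y_i) / h) = n * tau. Its
    left-hand side increases in c, so bisection finds the unique root.
    """
    target = len(ys) * tau
    lo, hi = min(ys) - 10 * h, max(ys) + 10 * h
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(normal_cdf((mid - y) / h) for y in ys) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical data 1, 2, ..., 100; by symmetry the smoothed median is 50.5.
ys = [float(v) for v in range(1, 101)]
median_hat = smoothed_quantile(ys, tau=0.5, h=0.5)
print(f"smoothed median estimate = {median_hat:.6f}")
```

Because the smoothed loss is differentiable, its first-order condition can be solved with standard root-finding or gradient methods, whereas the check loss itself has only subgradients at zero.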

D. R. Cox: Extracts from a memorial lecture

Author: Battey H
Publication venue: MIT Press - Journals
Publication date: 31/03/2023

Directory of Open Access Journals

Spiral - Imperial College Digital Repository

Can you make morphometrics work when you know the right answer? Pick and mix approaches for apple identification

Author: Battey Nick H.
Christodoulou Maria D.
Culham Alastair
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2018

Morphological classification of living things has challenged science for several centuries and has led to a wide range of objective morphometric approaches in data gathering and analysis. In this paper we explore those methods using apple cultivars, a model biological system in which discrete groups are pre-defined but in which there is a high level of overall morphological similarity. The effectiveness of morphometric techniques in discovering the groups is evaluated using statistical learning tools. No single technique proved optimal for classification on every occasion; linear morphometric techniques slightly outperformed geometric ones (72.6% accuracy on the test set versus 66.7%). Combining these techniques with post-hoc knowledge of their individual successes with particular cultivars achieves a notably higher classification accuracy (77.8%). From this we conclude that even with pre-determined discrete categories, a range of approaches is needed where those categories are intrinsically similar to each other, and we raise the question of whether, in studies where potentially continuous natural variation is being categorised, the level of match between categories is routinely set too high.

Central Archive at the University of Reading

Directory of Open Access Journals

Oxford University Research Archive

FigShare

SolCyc: a database hub at the Sol Genomics Network (SGN) for the manual curation of metabolic networks in Solanum and Nicotiana specific databases

Author: A. Bombarely
H. Foerster
J.N. Battey
L.A. Mueller
N. Sierro
N.V. Ivanov
Publication venue: Oxford University Press (OUP)
Publication date: 10/05/2018

SolCyc is the entry portal to pathway/genome databases (PGDBs) for major species of the Solanaceae family hosted at the Sol Genomics Network. Currently, SolCyc comprises six organism-specific PGDBs for tomato, potato, pepper, petunia, tobacco and one Rubiaceae, coffee. The metabolic networks of those PGDBs have been computationally predicted by the PathoLogic component of the Pathway Tools software using the manually curated multi-domain database MetaCyc (http://www.metacyc.org/) as reference. SolCyc has been recently extended by taxon-specific databases, i.e. the family-specific SolanaCyc database, containing only curated data pertinent to species of the nightshade family, and NicotianaCyc, a genus-specific database that stores all relevant metabolic data of the Nicotiana genus. Through manual curation of the published literature, new metabolic pathways have been created in those databases, which are complemented by the continuously updated, relevant species-specific pathways from MetaCyc. At present, SolanaCyc comprises 199 pathways and 29 superpathways and NicotianaCyc accounts for 72 pathways and 13 superpathways. Curator-maintained, taxon-specific databases such as SolanaCyc and NicotianaCyc are characterized by an enrichment of data specific to these taxa and are free of falsely predicted pathways. Both databases have been used to update recently created Nicotiana-specific databases for Nicotiana tabacum, Nicotiana benthamiana, Nicotiana sylvestris and Nicotiana tomentosiformis by propagating verifiable data into those PGDBs. In addition, in-depth curation of the pathways in N. tabacum has been carried out, which resulted in the elimination of 156 pathways from the 569 pathways predicted by Pathway Tools. Together, in-depth curation of the predicted pathway network and the supplementation with curated data from taxon-specific databases have substantially improved the curation status of the species-specific N. tabacum PGDB.
The implementation of this strategy will significantly advance the curation status of all organism-specific databases in SolCyc, improving database accuracy, data analysis and the visualization of biochemical networks in those species.

AIR Universita degli studi di Milano

Arabidopsis annexin1 mediates the radical-activated plasma membrane Ca2+- and K+-permeable conductance in root cells

Author: Battey N. H.
Brownlee C.
Coxon K. M.
Cuin T. A.
Davies J. M.
Laohavisit A.
Macpherson N.
Mortimer J. C.
Park O. K.
Rubio L.
Sentenac H.
Shabala S.
Shang Z.
Very A.-A.
Wang A.
Webb A. A. R.
Publication venue: American Society of Plant Biologists (ASPB)
Publication date: 01/01/2012

Plant cell growth and stress signaling require Ca2+ influx through plasma membrane transport proteins that are regulated by reactive oxygen species. In root cell growth, adaptation to salinity stress, and stomatal closure, such proteins operate downstream of the plasma membrane NADPH oxidases that produce extracellular superoxide anion, a reactive oxygen species that is readily converted to extracellular hydrogen peroxide and hydroxyl radicals, OH•. In root cells, extracellular OH• activates a plasma membrane Ca2+-permeable conductance that permits Ca2+ influx. In Arabidopsis thaliana, distribution of this conductance resembles that of annexin1 (ANN1). Annexins are membrane binding proteins that can form Ca2+-permeable conductances in vitro. Here, the Arabidopsis loss-of-function mutant for annexin1 (Atann1) was found to lack the root hair and epidermal OH•-activated Ca2+- and K+-permeable conductance. This manifests in both impaired root cell growth and an impaired ability to elevate root cell cytosolic free Ca2+ in response to OH•. An OH•-activated Ca2+ conductance is reconstituted by recombinant ANN1 in planar lipid bilayers. ANN1 therefore presents as a novel Ca2+-permeable transporter providing a molecular link between reactive oxygen species and cytosolic Ca2+ in plants.

Central Archive at the University of Reading

Crossref

Adelaide Research & Scholarship

HAL Descartes

PubMed Central

ProdInra

The Protein Model Portal

Author: A Bairoch
A Hillisch
A Tramontano
AM Jenkinson
AR Ortiz
C Yeats
D Baker
D Chivian
EA Merritt
Florian Kiefer
H Berman
H Huang
Helen M. Berman
HM Berman
J Kopp
James N. D. Battey
JN Battey
John D. Westbrook
Jürgen Kopp
Konstantin Arnold
Lorenza Bordoli
MC Peitsch
Michael Podvinec
MJ Hartshorn
N Mirkovic
PJ Kraulis
RD Finn
S Yooseph
SF Altschul
T Schwede
Torsten Schwede
U Pieper
Y Zhang
Publication venue: Springer Netherlands
Publication date: 01/01/2008

Structural Genomics has been successful in determining the structures of many unique proteins in a high-throughput manner. Still, the number of known protein sequences is much larger than the number of experimentally solved protein structures. Homology (or comparative) modeling methods make use of experimental protein structures to build models for evolutionarily related proteins. Thereby, experimental structure determination efforts and homology modeling complement each other in the exploration of the protein structure space. One of the challenges in using model information effectively has been to access all models available for a specific protein in heterogeneous formats at different sites using various incompatible accession code systems. Often, structure models for hundreds of proteins can be derived from a given experimentally determined structure, using a variety of established methods. This has been done by all of the PSI centers, and by various independent modeling groups. The goal of the Protein Model Portal (PMP) is to provide a single portal which gives access to the various models that can be leveraged from PSI targets and other experimental protein structures. A single interface allows all existing pre-computed models across these various sites to be queried simultaneously, and provides links to interactive services for template selection, target-template alignment, model building, and quality assessment. The current release of the portal consists of 7.6 million model structures provided by different partner resources (CSMP, JCSG, MCSG, NESG, NYSGXRC, JCMM, ModBase, SWISS-MODEL Repository). The PMP is available at http://www.proteinmodelportal.org and from the PSI Structural Genomics Knowledgebase.

Crossref

Springer - Publisher Connector

edoc

PubMed Central

Activation of neuromedin B-preferring bombesin receptors on rat glioblastoma C-6 cells increases cellular Ca2+ and phosphoinositides

Author: D H Coy
E Wada
J F Battey
J T Lin
L H Wang
R T Jensen
S Mantey
Publication venue: Portland Press Ltd.

Crossref