New statistical method identifies cytokines that distinguish stool microbiomes

Author: Deych Elena
Hanson Blake
Johnson Jethro
Shands Berkley
Shannon William D.
Sodergren Erica
Weinstock George
Yang Dake
Zhou Xin
Publication venue: Digital Commons@Becker
Publication date: 01/01/2019

Regressing an outcome (dependent) variable onto a set of input (independent) variables allows the analyst to measure associations between the two, so that changes in the outcome can be described and predicted by changes in the inputs. Classical statistics offers many ways of doing this when the dependent variable has certain properties (e.g., a scalar, a survival time, a count), but little progress has been made on regression for microbiome taxa counts that does not impose extremely strict conditions on the data. In this paper, we propose and apply a new regression model that combines the Dirichlet-multinomial distribution with recursive partitioning, providing a fully non-parametric regression model. This model, called DM-RPart, is applied to cytokine data and microbiome taxa count data; it is applicable to any microbiome taxa count/metadata, is fit automatically, and is intuitively interpretable. The model can be applied to any microbiome or other compositional data, and software (R package HMP) is available through the R CRAN website.
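To make the idea of pairing a Dirichlet-multinomial likelihood with recursive partitioning more concrete, here is a minimal Python sketch of a single split step. It is not the DM-RPart algorithm or the HMP package API. The function names, the fixed `concentration`, the `pseudo` count, and the way each node's parameters are set from pooled proportions are all assumptions made for illustration.

```python
from math import lgamma


def dm_logpmf(counts, alpha):
    """Log-probability of one count vector under a Dirichlet-multinomial."""
    n = sum(counts)
    a = sum(alpha)
    out = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += lgamma(x + al) - lgamma(al) - lgamma(x + 1)
    return out


def node_loglik(samples, concentration=10.0, pseudo=0.5):
    """Score a node: assumed alpha = concentration * pooled taxa proportions."""
    k = len(samples[0])
    totals = [sum(s[j] for s in samples) + pseudo for j in range(k)]
    grand = sum(totals)
    alpha = [concentration * t / grand for t in totals]
    return sum(dm_logpmf(s, alpha) for s in samples)


def best_split(covariate, samples):
    """Return (threshold, score) maximizing the summed child log-likelihoods."""
    values = sorted(set(covariate))
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [s for c, s in zip(covariate, samples) if c <= thr]
        right = [s for c, s in zip(covariate, samples) if c > thr]
        score = node_loglik(left) + node_loglik(right)
        if best is None or score > best[1]:
            best = (thr, score)
    return best
```

Applying `best_split` recursively to each child node would grow a tree. A real implementation would also estimate the Dirichlet-multinomial parameters properly and prune the tree, both of which are omitted here.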

The Jackson Laboratory: The Mouseion at the JAXlibrary

Digital Commons@Becker

Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization

Author: Heifets Abraham
Wallach Izhar
Publication venue: 'American Chemical Society (ACS)'
Publication date: 09/05/2018

Undetected overfitting can occur when there are significant redundancies between training and validation data. We describe AVE, a new measure of training-validation redundancy for ligand-based classification problems that accounts for the similarity among inactive molecules as well as active ones. We investigated seven widely used benchmarks for virtual screening and classification, and show that the amount of AVE bias strongly correlates with the performance of ligand-based predictive methods irrespective of the predicted property, chemical fingerprint, similarity measure, or previously applied unbiasing techniques. Therefore, it may be that the previously reported performance of most ligand-based methods can be explained by overfitting to benchmarks rather than good prospective accuracy.
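The abstract does not give the AVE formula, so the following Python sketch shows only one plausible way to quantify training-validation redundancy in the same spirit. It compares how closely validation actives and inactives sit to same-class versus opposite-class training molecules. The fingerprint representation (sets of bit indices), the Tanimoto distance, the threshold grid, and all function names are assumptions made for illustration.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity between two fingerprints given as sets of bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def nn_coverage(valid, train, thresholds):
    """Mean, over thresholds, of the fraction of validation molecules whose
    nearest training neighbour lies within that distance threshold."""
    nearest = [min(tanimoto_distance(v, t) for t in train) for v in valid]
    fractions = [sum(d <= thr for d in nearest) / len(nearest)
                 for thr in thresholds]
    return sum(fractions) / len(fractions)


def redundancy_bias(valid_act, valid_inact, train_act, train_inact, steps=100):
    """(actives near actives - actives near inactives)
    + (inactives near inactives - inactives near actives)."""
    thresholds = [i / steps for i in range(steps + 1)]
    aa = nn_coverage(valid_act, train_act, thresholds)
    ai = nn_coverage(valid_act, train_inact, thresholds)
    ii = nn_coverage(valid_inact, train_inact, thresholds)
    ia = nn_coverage(valid_inact, train_act, thresholds)
    return (aa - ai) + (ii - ia)
```

Under these assumptions, a value near zero suggests that nearest-neighbour lookup alone cannot separate the validation classes, while a large positive value suggests that memorizing the training set would already score well.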

arXiv.org e-Print Archive

FigShare

Rule-based Machine Learning Methods for Functional Prediction

Author: Indurkhya N.
Weiss S. M.
Publication date: 01/01/1995

We describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable solutions. This rule-based decision model can be extended to search efficiently for similar cases prior to approximating function values. Experimental results on real-world data demonstrate that the new techniques are competitive with existing machine learning and statistical methods and can sometimes yield superior regression performance. Comment: See http://www.jair.org/ for any accompanying file.
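As a rough illustration of what an ordered rule list for regression can look like, the Python sketch below evaluates rules top to bottom and returns the value attached to the first rule whose conditions all hold. It shows only the representation, not the induction method from the paper. The feature names, thresholds, and predicted values are hypothetical.

```python
import operator

# Comparison operators allowed in a rule condition.
OPS = {"<=": operator.le, ">": operator.gt}


def matches(conditions, x):
    """True when every (feature, op, threshold) condition holds for x."""
    return all(OPS[op](x[feature], threshold)
               for feature, op, threshold in conditions)


def predict(rules, default, x):
    """Return the value of the first matching rule, else the default."""
    for conditions, value in rules:
        if matches(conditions, x):
            return value
    return default


# Hypothetical ordered rules: each is a conjunction paired with a value.
RULES = [
    ([("age", "<=", 30), ("income", ">", 50.0)], 12.5),
    ([("age", ">", 60)], 3.0),
]
```

Because each rule is a conjunction and the list as a whole acts as a disjunction, the model stays readable: a prediction can be explained by pointing at the single rule that fired.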

arXiv.org e-Print Archive

CiteSeerX

Testate amoebae as a proxy for reconstructing Holocene water table dynamics in southern Patagonian peat bogs

Author: Aaby
Adema
Amesbury
Barber
Bobrov
Booth
Borromei
Charman
Daley
De Vleeschouwer
Deflandre
Foissner
Fritz
Garreaud
Gomaa
Heal
Heger
Hendon
Juggins
Kleinebecker
Lamarre
Lamentowicz
Lamy
Loisel
Mauquoy
Mitchell
Moschen
Payne
Pendall
Qin
R Core Team
Roland
Schoning
Schouten
Smith
Stanek
Sullivan
Swindles
Telford
Ter Braak
Tonello
Turner
van Bellen
Warner
Wilmshurst
Yu
Publication venue: 'Wiley'
Publication date: 15/07/2014

Acknowledgements: This work was supported by the Natural Environment Research Council (grant numbers NE/I022809/1, NE/I022981/1, NE/I022833/1 and NE/I023104/1). We thank Ricardo Muza and the Wildlife Conservation Society (WCS) Karukinka Park rangers for facilitating access to Karukinka Park. We also thank François De Vleeschouwer, Gaël Le Roux, Heleen Vanneste, Sébastien Bertrand, Zakaria Ghazoui and Jean-Yves De Vleeschouwer for fieldwork assistance. Nelson Bahamonde (INIA, Punta Arenas, Chile) and Ernesto Teneb (UMag, Punta Arenas, Chile) provided logistical support for the fieldwork in Chile. Dr Andrea Coronato (CADIC, Ushuaia) kindly provided logistical support for the research in Argentina. Thanks to Jenny Johnston for cartography, David Jolley for assistance in microscopic photography and Audrey Innes for laboratory assistance. We highly appreciate reviews by Matt Amesbury and an anonymous reviewer. R.P. is supported by an Impact Fellowship from the University of Stirling. Peer reviewed. Publisher PD

Aberdeen University Research

Crossref

Bandwidth selection for kernel estimation in mixed multi-dimensional spaces

Author: Aurélie Bugeau
Patrick Pérez
Projet Vista
Publication date: 01/01/2007

Kernel estimation techniques, such as mean shift, suffer from one major drawback: the kernel bandwidth selection. The bandwidth can be fixed for the whole data set or can vary at each point. Automatic bandwidth selection becomes a real challenge in the case of multidimensional heterogeneous features. This paper presents a solution to this problem. It is an extension of [Comaniciu03a], which was based on the fundamental property of normal distributions regarding the bias of the normalized density gradient. The selection is done iteratively for each type of feature, by looking for the stability of local bandwidth estimates across a predefined range of bandwidths. Pseudo balloon mean shift filtering and partitioning are introduced. The validity of the method is demonstrated in the context of color image segmentation based on a 5-dimensional space.
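To show why bandwidth matters for mean shift, the following Python sketch runs a plain one-dimensional mean shift with a fixed Gaussian kernel. It is not the paper's bandwidth selection scheme or its pseudo balloon variant. The function name, tolerance, and iteration cap are assumptions made for illustration.

```python
from math import exp


def mean_shift_mode(start, data, bandwidth, tol=1e-6, max_iter=500):
    """Follow Gaussian-kernel mean shift updates from `start` to a mode."""
    x = start
    for _ in range(max_iter):
        # Gaussian weight of every data point relative to the current estimate.
        weights = [exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in data]
        # Move to the weighted mean of the data.
        new_x = sum(w * p for w, p in zip(weights, data)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x
```

With two well-separated clusters, a small bandwidth sends each starting point to the mode of its own cluster, while a bandwidth much larger than the gap merges both clusters into a single mode. That sensitivity is what makes automatic bandwidth selection worthwhile.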

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1