328 research outputs found

The Bayesian Decision Tree Technique with a Sweeping Strategy

Author: Bailey T. C.
Everson R. M.
Fieldsend J. E.
Hernandez A.
Krzanowski W. J.
Partridge D.
Schetinin V.
Publication date: 01/01/2004

The uncertainty of classification outcomes is of crucial importance for many safety-critical applications including, for example, medical diagnostics. In such applications the uncertainty of classification can be reliably estimated within a Bayesian model averaging technique that allows the use of prior information. Decision Tree (DT) classification models used within such a technique give experts additional information by making the classification scheme observable. The use of the Markov Chain Monte Carlo (MCMC) methodology of stochastic sampling makes the Bayesian DT technique feasible to perform. However, in practice, the MCMC technique may become stuck in a particular DT which is far away from a region of maximal posterior. Sampling such DTs causes bias in the posterior estimates, and as a result the evaluation of classification uncertainty may be incorrect. In particular cases, the negative effect of such sampling may be reduced by giving additional prior information on the shape of DTs. In this paper we describe a new approach based on sweeping the DTs without additional priors on the favorite shape of DTs. The performance of Bayesian DT techniques with the standard and sweeping strategies is compared on synthetic data as well as on real datasets. Quantitatively evaluating the uncertainty in terms of the entropy of class posterior probabilities, we found that the sweeping strategy is superior to the standard strategy.
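The abstract measures uncertainty as the entropy of class posterior probabilities. A minimal sketch of that idea follows, assuming equal weights for the sampled trees and Shannon entropy in bits; the paper's exact definition is not given here, and the function names and the three probability vectors are hypothetical.

```python
import math

def average_posterior(per_tree_probs):
    """Average class-probability vectors over sampled trees (equal weights assumed)."""
    n_trees = len(per_tree_probs)
    n_classes = len(per_tree_probs[0])
    return [sum(p[c] for p in per_tree_probs) / n_trees for c in range(n_classes)]

def entropy(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical class probabilities from three sampled trees for one input.
samples = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
posterior = average_posterior(samples)
uncertainty = entropy(posterior)
```

An entropy near zero indicates that the averaged posterior is concentrated on one class, while a value near one bit (for two classes) indicates that the sampled trees disagree or are individually unsure.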

arXiv.org e-Print Archive

CiteSeerX

Unsupervised Feature Selection with Adaptive Structure Learning

Author: Alelyani S.
He X.
Hou C.
Krzanowski W.
Li Z.
Liu J.
Liu X.
Nie F.
Qian M.
Takeuchi I.
Yang Y.
Zhao Z.
Publication date: 02/04/2015

The problem of feature selection has attracted considerable interest in the past decade. Traditional unsupervised methods select the features which can faithfully preserve the intrinsic structures of data, where the intrinsic structures are estimated using all the input features of data. However, the estimated intrinsic structures are unreliable or inaccurate when the redundant and noisy features are not removed. Therefore, we face a dilemma: one needs the true structures of data to identify the informative features, and one needs the informative features to accurately estimate the true structures of data. To address this, we propose a unified learning framework which performs structure learning and feature selection simultaneously. The structures are adaptively learned from the results of feature selection, and the informative features are reselected to preserve the refined structures of data. By leveraging the interactions between these two essential tasks, we are able to capture accurate structures and select more informative features. Experimental results on many benchmark data sets demonstrate that the proposed method outperforms many state-of-the-art unsupervised feature selection methods.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Bayesian averaging over Decision Tree models for trauma severity scoring

Author: Achilleos
Acid
Bailey
Bayat
Becalick
Bouamra
Boyd
Breiman
Brohi
Chipman
Constantinou
Denison
Dietterich
Dinesen
Free
Green
Hastie
Jakaite
Janssens
Kilgo
Koller
Korb
Kramer
Krzanowski
L. Jakaite
Lefering
Magni
National Center for Health Statistics
Negrin
Osler
Robert
Rogers
Schechner
Schetinin
Steyerberg
The American College of Surgeons
V. Schetinin
W. Krzanowski
Publication venue: 'Elsevier BV'
Publication date: 11/01/2018

Health care practitioners analyse possible risks of misleading decisions and need to estimate and quantify uncertainty in predictions. We have examined the “gold” standard of screening a patient's conditions for predicting survival probability, based on logistic regression modelling, which is used in trauma care for clinical purposes and quality audit. This methodology is based on theoretical assumptions about data and uncertainties. Models induced within such an approach have exposed a number of problems, including unexplained fluctuation of predicted survival and low accuracy in estimating the uncertainty intervals within which predictions are made. A Bayesian method, which in theory is capable of providing accurate predictions and uncertainty estimates, has been adopted in our study using Decision Tree models. Our approach has been tested on a large set of patients registered in the US National Trauma Data Bank and has outperformed the standard method in terms of prediction accuracy, thereby providing practitioners with accurate estimates of the predictive posterior densities of interest that are required for making risk-aware decisions.

Crossref

University of Bedfordshire Repository

Phenotypic stability via the AMMI model with "bootstrap" resampling (Estabilidade fenotípica via modelo AMMI com reamostragem "Bootstrap")

Author: DIAS C. T. dos S.
KRZANOWSKI W. J.
LAVORANTI O. J.
Publication date: 16/03/2016

Repository Open Access to Scientific Information from Embrapa

Recipes for sparse LDA of horizontal data

Author: A Marshall
A Montanari
A Rencher
B Flury
BG Osborne
C Hage
D Bragoli
DG Calò
DM Witten
GH Golub
H Shin
IS Dhillon
IT Jolliffe
J Duchene
J Duintjer Tebbens
J Fan
JC Gower
L Clemmensen
M Ng
M Vichi
M Zou
ME Timmerman
N Boumal
N Hao
NA Campbell
NT Trendafilov
P Bickel
P-A Absil
R Tibshirani
RA Fisher
S Mussard
T Cai
T Hastie
TP Conrads
W Gander
WJ Krzanowski
Z Wen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Many important modern applications require analyzing data with more variables than observations, called horizontal data for short. In such situations the classical Fisher’s linear discriminant analysis (LDA) has no solution because the within-group scatter matrix is singular. Moreover, the number of variables is usually huge, and the classical solutions (discriminant functions) are difficult to interpret because they involve all available variables. Nowadays, the aim is to develop fast and reliable algorithms for sparse LDA of horizontal data. The resulting discriminant functions depend on very few original variables, which facilitates their interpretation. The main theoretical and numerical challenge is how to cope with the singularity of the within-group scatter matrix. This work aims at classifying the existing approaches according to the way they tackle this singularity issue, and at suggesting new ones.
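The singularity mentioned in the abstract can be checked on a toy example. The sketch below is an illustration only, not one of the paper's algorithms: it builds the within-group scatter matrix for hypothetical data with 4 observations in 2 groups and 5 variables, then computes its exact rank with rational arithmetic. The helper names and the data are assumptions.

```python
from fractions import Fraction

def within_group_scatter(groups):
    """Sum over groups of centered cross-products; returns a p x p matrix."""
    p = len(groups[0][0])
    W = [[Fraction(0)] * p for _ in range(p)]
    for rows in groups:
        n = len(rows)
        mean = [Fraction(sum(r[j] for r in rows), n) for j in range(p)]
        for r in rows:
            d = [r[j] - mean[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    W[a][b] += d[a] * d[b]
    return W

def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            factor = m[i][col] / m[rk][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[rk])]
        rk += 1
    return rk

# Hypothetical horizontal data: 2 groups, 2 observations each, 5 variables.
groups = [
    [[1, 2, 0, 4, 1], [3, 1, 2, 0, 5]],
    [[0, 1, 1, 2, 3], [2, 5, 0, 1, 1]],
]
W = within_group_scatter(groups)
n_variables = len(W)
scatter_rank = rank(W)
```

With n observations in g groups the rank of the within-group scatter matrix is at most n - g, so with more variables than observations the matrix cannot be inverted as classical LDA requires.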

Crossref

Springer - Publisher Connector

Open Research Online

High connectivity among locally adapted populations of a marine fish (Menidia menidia)

Author: Conover D. O.
Conover D. O.
David O. Conover
Hendry A. P.
Hildebrand W. C.
Krzanowski W. J.
Lora M. Clarke
Mach M.
Simon R. Thorrold
Stephan B. Munch
Swearer S. E.
Publication venue: 'Wiley'
Publication date: 01/12/2010

Author Posting. © Ecological Society of America, 2010. This article is posted here by permission of Ecological Society of America for personal use, not for redistribution. The definitive version was published in Ecology 91 (2010): 3526–3537, doi:10.1890/09-0548.1. Patterns of connectivity are important in understanding the geographic scale of local adaptation in marine populations. While natural selection can lead to local adaptation, high connectivity can diminish the potential for such adaptation to occur. Connectivity, defined as the exchange of individuals among subpopulations, is presumed to be significant in most marine species due to life histories that include widely dispersive stages. However, evidence of local adaptation in marine species, such as the Atlantic silverside, Menidia menidia, raises questions concerning the degree of connectivity. We examined geochemical signatures in the otoliths, or ear bones, of adult Atlantic silversides collected in 11 locations along the northeastern coast of the United States from New Jersey to Maine in 2004 and eight locations in 2005 using laser ablation inductively coupled plasma mass spectrometry (ICP-MS) and isotope ratio monitoring mass spectrometry (irm-MS). These signatures were then compared to baseline signatures of juvenile fish of known origin to determine the natal origin of these adult fish. We then estimated migration distances and the degree of mixing from these data. In both years, fish generally had the highest probability of originating from the same location in which they were captured (0.01–0.80), but evidence of mixing throughout the sample area was present. Furthermore, adult M. menidia exhibit highly dispersive behavior, with some fish migrating over 700 km. The probability of adult fish returning to natal areas differed between years, with the probability being, on average, 0.2 higher in the second year.
These findings demonstrate that marine species with largely open populations are capable of local adaptation despite apparently high gene flow. This work was funded by the National Science Foundation (grant OCE-0425830 to D. O. Conover and grant OCE-0134998 to S. R. Thorrold) and the New York State Department of Environmental Conservation.

Crossref

Woods Hole Open Access Server

Bayesian averaging over decision tree models: an application for estimating uncertainty in trauma severity scoring

Author: Achilleos
Bailey
Becalick
Benjamin
Bouamra
Boyd
Breiman
Brohi
Champion
Chipman
Cook
Denison
Gabbe
Gennarelli
González-Robledo
Green
Hosmer
Jakaite
Kilgo
Koller
Kramer
Krzanowski
L. Jakaite
Lefering
Lu
Magni
Meyfroidt
Negrin
Osler
Ozenne
Patil
Paul
Peng
Pennell
Philip
Robert
Rogers
Saito
Schetinin
Schluter
Steyerberg
The American Association for the Surgery of Trauma
The American College of Surgeons
Toma
TraumaCalc
V. Schetinin
W. Krzanowski
Publication venue: 'Elsevier BV'
Publication date: 11/01/2018

Introduction: For making reliable decisions, practitioners need to estimate uncertainties that exist in data and decision models. In this paper we analyse uncertainties of predicting survival probability for patients in trauma care. The existing prediction methodology employs logistic regression modelling of Trauma and Injury Severity Score (TRISS), which is based on theoretical assumptions. These assumptions limit the capability of TRISS methodology to provide accurate and reliable predictions. Methods: We adopt the methodology of Bayesian model averaging and show how this methodology can be applied to decision trees in order to provide practitioners with new insights into the uncertainty. The proposed method has been validated on a large set of 447,176 cases registered in the US National Trauma Data Bank in terms of discrimination ability evaluated with receiver operating characteristic (ROC) and precision–recall (PRC) curves. Results: Areas under curves were improved for ROC from 0.951 to 0.956 (p = 3.89 × 10^−18) and for PRC from 0.564 to 0.605 (p = 3.89 × 10^−18). The new model has significantly better calibration in terms of the Hosmer–Lemeshow Ĥ statistic, showing an improvement from 223.14 (the standard method) to 11.59 (p = 2.31 × 10^−18). Conclusion: The proposed Bayesian method is capable of improving the accuracy and reliability of survival prediction. The new method has been made available for evaluation purposes as a web application.

Crossref

University of Bedfordshire Repository

Standard survey methods for estimating colony losses and explanatory risk factors in Apis mellifera

Author: AAPOR
Adjlane Noureddine
Alison Gray
Aykut Kence
Bach Kim Nguyen
Céline Holzmann
DE LEEUW E D
DOBSON A J
EFRON B
Flemming Vejsnæs
Franco Mutinelli
GRAY A
Grażyna Topolska
HARDIN J W
KINDT R
KLEINBAUM D G
KRZANOWSKI W J
LEHTONEN R
Lennard Pisa
LOHR S L
Magnus Peterson
Mary F Coffey
MYERS R H
OTT R L
Preben Kristiansen
Robert Brodschneider
Romée van der Zee
Róbert Chlebo
SAMUELS M L
SCHAEFFER R L
Selwyn Wilkins
SOROKER V
TWISK J W R
VANENGELSDORP D
Victoria Soroker
Publication venue: 'International Bee Research Association'
Publication date: 01/01/2013

This chapter addresses survey methodology and questionnaire design for the collection of data pertaining to estimation of honey bee colony loss rates and identification of risk factors for colony loss. Sources of error in surveys are described. Advantages and disadvantages of different random and non-random sampling strategies and different modes of data collection are presented to enable the researcher to make an informed choice. We discuss survey and questionnaire methodology in some detail, for the purpose of raising awareness of issues to be considered during the survey design stage in order to minimise error and bias in the results. Aspects of survey design are illustrated using surveys in Scotland. Part of a standardized questionnaire, developed by the COLOSS working group for Monitoring and Diagnosis, is given as a further example. Approaches to data analysis are described, focussing on estimation of loss rates. Dutch monitoring data from 2012 were used for an example of a statistical analysis with the public domain R software. We demonstrate the estimation of the overall proportion of losses and corresponding confidence interval using a quasi-binomial model to account for extra-binomial variation. We also illustrate generalized linear model fitting when incorporating a single risk factor, and derivation of relevant confidence intervals.
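The quasi-binomial estimate of an overall loss proportion can be sketched as follows. The chapter works in R; this Python version is an illustration under stated assumptions: an intercept-only model on the logit scale, a Pearson-based dispersion estimate, and a normal quantile of 1.96 for an approximate 95% interval (R's own output for quasi families may differ slightly). The colony and loss counts are hypothetical, not the Dutch monitoring data.

```python
import math

def quasi_binomial_overall(colonies, losses, z=1.96):
    """Overall loss proportion with an approximate CI inflated for overdispersion."""
    total = sum(colonies)
    p = sum(losses) / total
    # Pearson chi-square divided by residual degrees of freedom (k - 1).
    pearson = sum(
        (y - n * p) ** 2 / (n * p * (1 - p)) for n, y in zip(colonies, losses)
    )
    dispersion = pearson / (len(colonies) - 1)
    # Standard error of the intercept on the logit scale, scaled by dispersion.
    se_logit = math.sqrt(dispersion / (total * p * (1 - p)))
    logit = math.log(p / (1 - p))
    lower = 1 / (1 + math.exp(-(logit - z * se_logit)))
    upper = 1 / (1 + math.exp(-(logit + z * se_logit)))
    return p, dispersion, (lower, upper)

# Hypothetical data: colonies kept and colonies lost for five beekeepers.
colonies = [10, 20, 15, 30, 25]
losses = [1, 6, 2, 9, 2]
p_hat, phi, (ci_low, ci_high) = quasi_binomial_overall(colonies, losses)
```

A dispersion estimate above 1 signals extra-binomial variation, and the interval is correspondingly wider than a plain binomial interval would be.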

Archives ouvertes de l'Université M'hamed Bougara Boumerdes

Crossref

University of Strathclyde Institutional Repository

OpenMETU (Middle East Technical University)

Theory and simulations of covariance mapping in multiple dimensions for data analysis in high-event-rate experiments

Author: Carl Nordling
J. H. D. Eland
L. J. Frasinski
R. Feifel
V. Zhaunerchyk
W. J. Krzanowski
Publication venue: 'American Physical Society (APS)'

Crossref