3,652 research outputs found

How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition

Author: Anderson-Cook Christine M.
Fugate Michael L.
Lu Lu
Myers Kary L.
Pawley Norma
Quinlan Kevin R.
Publication venue: 'Wiley'
Publication date: 01/01/2019

Data competitions rely on real-time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become quite popular and prevalent, particularly in supervised learning formats, their implementations by the host are highly variable. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. This paper outlines some important considerations for strategically designing relevant and informative data sets to maximize the learning outcome from hosting a competition based on our experience. It also describes a post-competition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as well as greater understanding of the regions of the input space that are well-solved. The post-competition analysis, which complements the leaderboard, uses exploratory data analysis and generalized linear models (GLMs). The GLMs not only expand the range of results we can explore, they also provide more detailed analysis of individual sub-questions including similarities and differences between algorithms across different types of scenarios, universally easy or hard regions of the input space, and different learning objectives. When coupled with a strategically planned data generation approach, the methods provide richer and more informative summaries to enhance the interpretation of results beyond just the rankings on the leaderboard. The methods are illustrated with a recently completed competition to evaluate algorithms capable of detecting, identifying, and locating radioactive materials in an urban environment. Comment: 36 pages

arXiv.org e-Print Archive

USFSP Digital Archive

Scholar Commons - University of South Florida

Interpretable Machine Learning Model for Clinical Decision Making

Author: El-Sharif Ali
Publication venue: NSUWorks
Publication date: 01/01/2021

Despite machine learning models being increasingly used in medical decision-making and meeting classification predictive accuracy standards, they remain untrusted black-boxes due to decision-makers' lack of insight into their complex logic. Therefore, it is necessary to develop interpretable machine learning models that will engender trust in the knowledge they generate and contribute to clinical decision-makers' intention to adopt them in the field. The goal of this dissertation was to systematically investigate the applicability of interpretable model-agnostic methods to explain predictions of black-box machine learning models for medical decision-making. As proof of concept, this study addressed the problem of predicting the risk of emergency readmissions within 30 days of being discharged for heart failure patients. Using a benchmark data set, supervised classification models of differing complexity were trained to perform the prediction task. More specifically, Logistic Regression (LR), Random Forests (RF), Decision Trees (DT), and Gradient Boosting Machines (GBM) models were constructed using the Healthcare Cost and Utilization Project (HCUP) Nationwide Readmissions Database (NRD). The precision, recall, and area under the ROC curve for each model were used to measure predictive accuracy. Local Interpretable Model-Agnostic Explanations (LIME) was used to generate explanations from the underlying trained models. LIME explanations were empirically evaluated using explanation stability and local fit (R²). The results demonstrated that local explanations generated by LIME created better estimates for Decision Trees (DT) classifiers

NSU Works

Empirical Modeling Of Piping Along Mississippi River Levees In Southwestern Illinois

Author: Shields Rebecca F.
Publication venue: eGrove Press
Publication date: 01/01/2015

eGrove (Univ. of Mississippi)

Risky Sex and HIV Acquisition Among HIV Serodiscordant Couples in Zambia, 2002-2012: What Does Alcohol Have To Do With It?

Author: Allen Susan
Brill Ilene
Chomba Elwyn
Comulada W Scott
Gorbach Pamina M
Javanbakht Marjan
Joseph Davey Dvora
Khu Naw Htee
Kilembe William
Mulenga Joseph
Tichacek Amanda
Vwalika Bellington
Wall Kristin M
Publication venue: eScholarship, University of California
Publication date: 01/07/2017

In this paper we evaluate the effects of heavy alcohol consumption on sexual behavior, HIV acquisition, and antiretroviral treatment (ART) initiation in a longitudinal open cohort of 1929 serodiscordant couples in Lusaka, Zambia from 2002 to 2012. We evaluated factors associated with baseline heavy alcohol consumption and its association with condomless sex with the study partner, sex outside of the partnership, and ART initiation using multivariable logistic regression. We estimated the effect of alcohol consumption on HIV acquisition using multivariable Cox models. Baseline factors significantly associated with women's heavy drinking (drunk weekly or more in 12-months before enrollment) included woman's older age (adjusted prevalence odds ratio [aPOR] = 1.04), partner heavy drinking (aPOR = 3.93), and being HIV-infected (aPOR = 2.03). Heavy drinking among men was associated with less age disparity with partner (aPOR per year disparity = 0.97) and partner heavy drinking (aPOR = 1.63). Men's being drunk daily (aOR = 1.18), women's being drunk less than monthly (aOR = 1.39) vs. never drunk and being in a male HIV-negative and female HIV-positive union (aOR = 1.45) were associated with condomless sex. Heavy alcohol use was associated with having 1 or more outside sex partners among men (aOR drunk daily = 1.91, drunk weekly = 1.32, drunk monthly = 2.03 vs. never), and women (aOR drunk monthly = 2.75 vs. never). Being drunk weekly or more increased men's risk of HIV acquisition (adjusted hazard ratio [aHR] = 1.72). Men and women being drunk weekly or more was associated (p < 0.1) with women's seroconversion (aHR = 1.42 and aHR = 3.71 respectively). HIV-positive women who were drunk monthly or more had lower odds of initiating ART (aOR = 0.83; 95% CI = 0.70-0.99) adjusting for age, months since baseline and previous pregnancies. 
Individuals in HIV-serodiscordant couples who reported heavy drinking had more outside sex partnerships and condomless sex with their study partner and were more likely to acquire HIV. HIV-positive women had lower odds of initiating ART if they were heavy drinkers

eScholarship - University of California

Spatial prediction of rotational landslide using geographically weighted regression, logistic regression, and support vector machine models in Xing Guo area (China)

Author: Chen W
Hong H
Pradhan B
Sameen MI
Xu C
Publication venue: 'Informa UK Limited'
Publication date: 01/12/2017

© 2017 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. This study evaluated the geographically weighted regression (GWR) model for landslide susceptibility mapping in Xing Guo County, China. In this study, 16 conditioning factors, such as slope, aspect, altitude, topographic wetness index, stream power index, sediment transport index, soil, lithology, normalized difference vegetation index (NDVI), land use, rainfall, distance to road, distance to river, distance to fault, plan curvature, and profile curvature, were analyzed. The chi-square feature selection method was adopted to compare the significance of each factor with landslide occurrence. The GWR model was compared with two well-known models, namely, logistic regression (LR) and support vector machine (SVM). Results of chi-square feature selection indicated that lithology and slope are the most influential factors, whereas SPI was found to be statistically insignificant. Four landslide susceptibility maps were generated by the GWR, SGD-LR, SGD-SVM, and SVM models. The GWR model exhibited the highest performance in terms of success rate and prediction accuracy, with values of 0.789 and 0.819, respectively. The SVM model exhibited slightly lower AUC values than those of the GWR model. Validation results for the four models indicate that GWR is a better model than other widely used models

OPUS - University of Technology Sydney

Directory of Open Access Journals

A novel preoperative model to predict 90-day surgical mortality in patients considered for renal cell carcinoma surgery

Author: Bahler Clinton D.
Boris Ronald S.
Calaway Adam C.
Cary Clint
Monn M. Francesca
Publication venue: 'Elsevier BV'
Publication date: 01/10/2018

Introduction: Surgical benefits for renal cell carcinoma must be weighed against competing causes of mortality, especially in the elderly patient population. We used a large cancer registry to evaluate the impact of patient and cancer-specific factors on 90-day mortality (90DM). A nomogram to predict the odds of short-term mortality was created. Materials and Methods: The National Cancer Database was queried to identify all patients with clinically localized, nonmetastatic disease treated with partial or radical nephrectomy. Using a random sample of 60%, multiple logistic regression with 90DM outcomes was performed to identify preoperative variables associated with mortality. Variables included age, sex, race, co-morbidity score, tumor size, and presence of a thrombus. A nomogram was created and tested on the remaining 40% of patients to predict 90DM. Results: 183,407 patients met inclusion criteria. Overall 90DM for the cohort was 1.9%. All preoperative variables significantly influenced the risk of 90DM. Patient age was by far the strongest predictor. Nomogram scores ranged from 0 to 12. Compared to patients with 0 to 1 points, those with 2 to 3 (odds ratio [OR] 2.89, 2.42–3.46; P 6 (OR 12.86, 10.83–15.27; P 80 years of age alone placed patients into the highest risk of surgical mortality. Conclusions: Management of localized kidney cancer must consider competing causes of mortality, especially in elderly patients with multiple co-morbidities. We present a preoperative tool to calculate risk of surgical short-term mortality to aid surgeon–patient counseling

IUPUIScholarWorks

Crowding Perception in a Tourist City: A Question of Preference

Author: Neuts B.
Nijkamp P.
Publication venue: Amsterdam
Publication date: 01/01/2011

Two main topics are analysed in this paper: a crowding model for an urban destination is tested by the use of a binary logistic model in order to identify the variables influencing crowding perception; and the inherent negativity of the crowding concept, as is often assumed, is examined through association statistics. The results confirmed that personal and behavioural variables have a larger effect on the perception of crowding than use-level. Furthermore, the relationship between crowding and experience, while significantly negative, could only be found in respondents with a preference for low, and a perception of high, use levels, while for the majority of individuals the perception of a certain crowding level did not lead to a negative evaluation of the conditions. This proves that the concept of crowding cannot be assumed to be implicitly negative, and needs individual preferences to be fully understood.

Lirias

VU Research Portal