Reducing Attrition Bias using Targeted Refreshment Sampling and Matching
This paper examines the possibility of reducing attrition bias in panel data using targeted refreshment sampling and matched imputation. The targeted refreshment sampling approach consists of collecting new data from the original sampling population, targeting individuals who would not usually respond to surveys. Using propensity score matching and imputation in conjunction with refreshment sampling, it is suggested that the dropouts from a panel can effectively be 'replaced'. The procedure allows us to identify underlying joint distributions in the data. The method is illustrated using data from the Youth Cohort Surveys in the UK, which suffer 45% attrition in the second wave. A comparison of the results of this method with other techniques for attrition modeling suggests that the technique could be an effective way to overcome a substantial part of the bias associated with attrition.
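A minimal sketch of the matched-imputation idea described above, assuming a wave-1 covariate set shared by the panel and the refreshment sample; the column names and the single-nearest-neighbour matching rule are illustrative stand-ins, not the paper's exact procedure.

```python
# Hypothetical sketch: fit a propensity model for dropout, then impute each
# dropout's missing wave-2 outcome from the closest refreshment respondent.
# Column names (dropped_out, y_wave2) and covariates are illustrative only.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def matched_imputation(panel: pd.DataFrame, refreshment: pd.DataFrame,
                       covariates: list[str]) -> pd.DataFrame:
    # Propensity of dropping out, estimated from wave-1 covariates.
    model = LogisticRegression(max_iter=1000)
    model.fit(panel[covariates], panel["dropped_out"])

    p_panel = model.predict_proba(panel[covariates])[:, 1]
    p_refresh = model.predict_proba(refreshment[covariates])[:, 1]

    # Match each dropout to the refreshment respondent with the closest
    # propensity score (1-nearest-neighbour matching on the score).
    nn = NearestNeighbors(n_neighbors=1).fit(p_refresh.reshape(-1, 1))
    dropouts = panel["dropped_out"] == 1
    _, idx = nn.kneighbors(p_panel[dropouts.to_numpy()].reshape(-1, 1))

    imputed = panel.copy()
    imputed.loc[dropouts, "y_wave2"] = refreshment["y_wave2"].to_numpy()[idx.ravel()]
    return imputed
```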
Quality Aware Network for Set to Set Recognition
This paper targets the problem of set-to-set recognition, which learns the
metric between two image sets. Images in each set belong to the same identity.
Since images in a set can be complementary, they are expected to lead to higher
accuracy in practical applications. However, the quality of each sample cannot
be guaranteed, and samples with poor quality will hurt the metric. In this
paper, the quality aware network (QAN) is proposed to confront this problem,
where the quality of each sample can be automatically learned although such
information is not explicitly provided in the training stage. The network has
two branches: the first branch extracts an appearance feature embedding for
each sample, and the other branch predicts a quality score for each sample.
Features and quality scores of all samples in a set are then aggregated to
generate the final feature embedding. We show that the two branches can be
trained in an end-to-end manner given only the set-level identity annotation.
Analysis on gradient spread of this mechanism indicates that the quality
learned by the network is beneficial to set-to-set recognition and simplifies
the distribution that the network needs to fit. Experiments on both face
verification and person re-identification show advantages of the proposed QAN.
The source code and network structure can be downloaded at
https://github.com/sciencefans/Quality-Aware-Network.
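An illustrative sketch of the set-level aggregation idea, in the spirit of the quality-weighted pooling described above: per-sample features are combined with weights produced by a quality branch. The function below is a stand-in, not the paper's actual architecture or code.

```python
# Quality-weighted pooling of per-sample features into one set embedding.
import numpy as np

def set_embedding(features: np.ndarray, quality_logits: np.ndarray) -> np.ndarray:
    """features: (n_samples, d) appearance embeddings from the first branch.
    quality_logits: (n_samples,) raw scores from the quality branch."""
    # Softmax turns raw quality scores into aggregation weights, so
    # low-quality samples contribute little to the set embedding.
    w = np.exp(quality_logits - quality_logits.max())
    w /= w.sum()
    pooled = (w[:, None] * features).sum(axis=0)
    # L2-normalise so set embeddings are compared on a common scale.
    return pooled / np.linalg.norm(pooled)
```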
How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition
Data competitions rely on real-time leaderboards to rank competitor entries
and stimulate algorithm improvement. While such competitions have become quite
popular and prevalent, particularly in supervised learning formats, their
implementations by the host are highly variable. Without careful planning, a
supervised learning competition is vulnerable to overfitting, where the winning
solutions are so closely tuned to the particular set of provided data that they
cannot generalize to the underlying problem of interest to the host. Based on
our experience, this paper outlines some important considerations for
strategically designing relevant and informative data sets to maximize the
learning outcome from hosting a competition. It also describes a post-competition
analysis that enables robust and efficient assessment of the strengths and
weaknesses of solutions from different competitors, as well as greater
understanding of the regions of the input space that are well-solved. The
post-competition analysis, which complements the leaderboard, uses exploratory
data analysis and generalized linear models (GLMs). The GLMs not only expand
the range of results we can explore, but also provide a more detailed analysis
of individual sub-questions, including similarities and differences between
algorithms across different types of scenarios, universally easy or hard
regions of the input space, and different learning objectives. When coupled
with a strategically planned data generation approach, the methods provide
richer and more informative summaries to enhance the interpretation of results
beyond just the rankings on the leaderboard. The methods are illustrated with a
recently completed competition to evaluate algorithms capable of detecting,
identifying, and locating radioactive materials in an urban environment.
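A hedged sketch of the kind of post-competition GLM analysis described above: model whether each competitor answered each test item correctly as a function of the algorithm and properties of the item. The file name and column names (correct, algorithm, scenario, source_strength) are hypothetical placeholders, not the paper's data.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# One row per (algorithm, test item); "correct" is a 0/1 outcome.
results = pd.read_csv("per_item_results.csv")

# Binomial GLM: separates algorithm effects from universally easy or hard
# regions of the input space, captured here by scenario-level covariates.
glm = smf.glm(
    "correct ~ C(algorithm) + C(scenario) + source_strength",
    data=results,
    family=sm.families.Binomial(),
).fit()
print(glm.summary())
```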
Plant image retrieval using color, shape and texture features
We present a content-based image retrieval system for plant image retrieval, intended especially for the house plant identification problem. A plant image consists of a collection of overlapping leaves and possibly flowers, which makes the problem challenging. We studied the suitability of various well-known color, shape and texture features for this problem, and also introduced some new texture matching techniques and shape features. Feature extraction is applied after segmenting the plant region from the background using the max-flow min-cut technique. Results on a database of 380 plant images belonging to 78 different types of plants show the promise of the proposed new techniques and the overall system: in 55% of the queries, the correct plant image is retrieved among the top-15 results. Furthermore, the accuracy goes up to 73% when a 132-image subset of well-segmented plant images is considered.
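A small sketch of content-based retrieval using colour features only, assuming the plant region has already been segmented out (the paper uses max-flow/min-cut segmentation; the mask is taken as given here). The histogram configuration and L1 distance are illustrative choices, not the paper's exact features.

```python
import cv2
import numpy as np

def colour_feature(image_bgr: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # HSV hue/saturation histogram computed over the segmented plant region.
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], mask, [16, 16], [0, 180, 0, 256])
    cv2.normalize(hist, hist)
    return hist.flatten()

def top_k(query: np.ndarray, database: list[np.ndarray], k: int = 15) -> list[int]:
    # Rank database images by histogram distance; the paper reports accuracy
    # for the correct plant appearing among the top-15 results.
    dists = [np.linalg.norm(query - f, ord=1) for f in database]
    return list(np.argsort(dists)[:k])
```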
Looking Beyond Appearances: Synthetic Training Data for Deep CNNs in Re-identification
Re-identification is generally carried out by encoding the appearance of a
subject in terms of outfit, which assumes scenarios where people do not change
their attire. In this paper we overcome this restriction by proposing a
framework based on a deep convolutional neural network, SOMAnet, that
additionally models other discriminative aspects, namely, structural attributes
of the human figure (e.g. height, obesity, gender). Our method is unique in
many respects. First, SOMAnet is based on the Inception architecture, departing
from the usual siamese framework. This spares expensive data preparation
(pairing images across cameras) and allows the understanding of what the
network learned. Second, and most notably, the training data consists of a
synthetic 100K instance dataset, SOMAset, created by photorealistic human body
generation software. Synthetic data represents a good compromise between
realistic imagery, usually not required in re-identification since surveillance
cameras capture low-resolution silhouettes, and complete control of the
samples, which is useful for customizing the data to the surveillance
scenario at hand, e.g. ethnicity. SOMAnet, trained on SOMAset and fine-tuned on
recent re-identification benchmarks, outperforms all competitors, matching
subjects even with different apparel. The combination of synthetic data with
Inception architectures opens up new research avenues in re-identification.
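A hypothetical sketch of the training recipe described above: pre-train an Inception backbone as an identity classifier on a large synthetic set, then fine-tune it on a target re-identification benchmark. This is not the authors' SOMAnet code; it assumes a recent torchvision and uses a plain classification head for illustration.

```python
import torch.nn as nn
from torchvision import models

def build_identity_classifier(num_identities: int) -> nn.Module:
    # ImageNet-pretrained Inception backbone with a fresh identity head.
    net = models.inception_v3(weights="IMAGENET1K_V1")
    net.fc = nn.Linear(net.fc.in_features, num_identities)
    return net

# Stage 1: train on the synthetic identities (a SOMAset-like corpus).
# Stage 2: swap the head for the benchmark's identities and fine-tune;
# the penultimate features then serve as the re-identification descriptor.
```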
Energy Disaggregation Using Elastic Matching Algorithms
In this article an energy disaggregation architecture using elastic matching algorithms is presented. The architecture uses a database of reference energy consumption signatures and compares them with incoming energy consumption frames using template matching. In contrast to machine learning-based approaches, which require a significant amount of data to train a model, elastic matching-based approaches have no model training process but perform recognition using template matching. Five different elastic matching algorithms were evaluated across different datasets, and the experimental results showed that the minimum variance matching algorithm outperforms all other evaluated matching algorithms. The best performing minimum variance matching algorithm improved the energy disaggregation accuracy by 2.7% when compared to the baseline dynamic time warping algorithm.
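A compact sketch of dynamic time warping (DTW), the baseline elastic matching algorithm named above: an incoming consumption frame is compared against a reference appliance signature by the cost of the best warped alignment. The function names are illustrative.

```python
import numpy as np

def dtw_distance(frame: np.ndarray, signature: np.ndarray) -> float:
    # Classic O(n*m) DTW with absolute-difference local cost.
    n, m = len(frame), len(signature)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(frame[i - 1] - signature[j - 1])
            # Each cell extends the cheapest of the three allowed moves.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# Disaggregation then assigns the frame to the reference signature with the
# smallest elastic-matching distance in the database.
```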
Changes and classification in myocardial contractile function in the left ventricle following acute myocardial infarction
In this research, we hypothesized that novel biomechanical parameters are discriminative in patients following acute ST-segment elevation myocardial infarction (STEMI). To identify these biomechanical biomarkers and bring computational biomechanics ‘closer to the clinic’, we applied state-of-the-art multiphysics cardiac modelling combined with advanced machine learning and multivariate statistical inference to a clinical database of myocardial infarction. We obtained data from 11 STEMI patients (ClinicalTrials.gov NCT01717573) and 27 healthy volunteers, and developed personalized mathematical models for the left ventricle (LV) using an immersed boundary method. Subject-specific constitutive parameters were obtained by matching to clinical measurements. We have shown, for the first time, that compared with healthy controls, patients with STEMI exhibited increased LV wall active tension when normalized by systolic blood pressure, which suggests an increased demand on the contractile reserve of remote functional myocardium. The statistical analysis reveals that the required patient-specific contractility, the normalized active tension and the systolic myofilament kinematics have the strongest explanatory power for identifying myocardial function changes post-MI. We further observed a strong correlation between two biomarkers and the changes in LV ejection fraction at six months from baseline (the required contractility: r = −0.79, p < 0.01; the systolic myofilament kinematics: r = 0.70, p = 0.02). The clinical and prognostic significance of these biomechanical parameters merits further scrutiny.
The early socioeconomic effects of teenage childbearing
A large body of literature has documented a negative correlation between teenage childbearing and teen mothers’ socioeconomic outcomes, yet researchers continue to disagree as to whether the association represents a true causal effect. This article extends the extant literature by employing propensity score matching with a sensitivity analysis using Rosenbaum bounds. The analysis of recent cohort data from the National Longitudinal Study of Adolescent Health shows that (1) teenage childbearing has modest but significant negative effects on early socioeconomic outcomes and (2) unobserved covariates would have to be more powerful than known covariates to nullify the propensity score matching estimates. The author concludes by suggesting that more research should be done to address unobserved heterogeneity and the long-term effects of teenage childbearing for this young cohort.
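A rough sketch of the Rosenbaum-bounds idea for matched pairs, using a simplified sign-test approximation: under hidden bias of magnitude gamma, the chance that the treated member of a pair has the worse outcome is bounded by gamma / (1 + gamma), which yields a worst-case p-value for the matched estimate. This is an illustration of the general technique, not the paper's exact procedure.

```python
import numpy as np
from scipy import stats

def rosenbaum_upper_pvalue(treated: np.ndarray, control: np.ndarray, gamma: float) -> float:
    # Matched-pair outcome differences; ties carry no sign information.
    diffs = treated - control
    diffs = diffs[diffs != 0]
    n = len(diffs)
    t = int((diffs < 0).sum())       # pairs where the treated outcome is worse
    p_plus = gamma / (1.0 + gamma)   # worst-case per-pair probability under bias gamma
    # Upper bound on the one-sided p-value for a negative treatment effect.
    return float(stats.binom.sf(t - 1, n, p_plus))

# gamma = 1 reproduces the ordinary sign test; increasing gamma shows how
# strong an unobserved confounder would need to be to explain away the effect.
```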