Reducing Attrition Bias using Targeted Refreshment Sampling and Matching
This paper examines the possibility of reducing attrition bias in panel data using targeted refreshment sampling and matched imputation. The targeted refreshment sampling approach consists of collecting new data from the original sampling population, targeting individuals who would not usually respond to surveys. Using propensity score matching and imputation in conjunction with refreshment sampling, it is suggested that the dropouts from a panel can effectively be 'replaced'. The procedure allows us to identify underlying joint distributions in the data. The method is illustrated using data from the Youth Cohort Surveys in the UK, which suffer 45% attrition in the second wave. A comparison of the results of this method with other techniques for attrition modeling suggests that the technique could be an effective way to overcome a substantial part of the bias associated with attrition.
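A minimal sketch of the matched-imputation idea described above, assuming a wave-1 covariate set shared by the panel and the refreshment sample; the column names and the single-nearest-neighbour matching rule are illustrative stand-ins, not the paper's exact procedure.

```python
# Hypothetical sketch: fit a propensity model for dropout, then impute each
# dropout's missing wave-2 outcome from the closest refreshment respondent.
# Column names (dropped_out, y_wave2) and covariates are illustrative only.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def matched_imputation(panel: pd.DataFrame, refreshment: pd.DataFrame,
                       covariates: list[str]) -> pd.DataFrame:
    # Propensity of dropping out, estimated from wave-1 covariates.
    model = LogisticRegression(max_iter=1000)
    model.fit(panel[covariates], panel["dropped_out"])

    p_panel = model.predict_proba(panel[covariates])[:, 1]
    p_refresh = model.predict_proba(refreshment[covariates])[:, 1]

    # Match each dropout to the refreshment respondent with the closest
    # propensity score (1-nearest-neighbour matching on the score).
    nn = NearestNeighbors(n_neighbors=1).fit(p_refresh.reshape(-1, 1))
    dropouts = panel["dropped_out"] == 1
    _, idx = nn.kneighbors(p_panel[dropouts.to_numpy()].reshape(-1, 1))

    imputed = panel.copy()
    imputed.loc[dropouts, "y_wave2"] = refreshment["y_wave2"].to_numpy()[idx.ravel()]
    return imputed
```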
Quality Aware Network for Set to Set Recognition
This paper targets the problem of set-to-set recognition, which learns the
metric between two image sets. Images in each set belong to the same identity.
Since images in a set can be complementary, they are expected to lead to higher
accuracy in practical applications. However, the quality of each sample cannot
be guaranteed, and samples with poor quality will hurt the metric. In this
paper, the quality aware network (QAN) is proposed to confront this problem,
where the quality of each sample can be automatically learned although such
information is not explicitly provided in the training stage. The network has
two branches: the first branch extracts an appearance feature embedding for
each sample, and the other branch predicts a quality score for each sample.
Features and quality scores of all samples in a set are then aggregated to
generate the final feature embedding. We show that the two branches can be
trained in an end-to-end manner given only the set-level identity annotation.
Analysis on gradient spread of this mechanism indicates that the quality
learned by the network is beneficial to set-to-set recognition and simplifies
the distribution that the network needs to fit. Experiments on both face
verification and person re-identification show advantages of the proposed QAN.
The source code and network structure can be downloaded at
https://github.com/sciencefans/Quality-Aware-Network.
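An illustrative sketch of the set-level aggregation idea, in the spirit of the quality-weighted pooling described above: per-sample features are combined with weights produced by a quality branch. The function below is a stand-in, not the paper's actual architecture or code.

```python
# Quality-weighted pooling of per-sample features into one set embedding.
import numpy as np

def set_embedding(features: np.ndarray, quality_logits: np.ndarray) -> np.ndarray:
    """features: (n_samples, d) appearance embeddings from the first branch.
    quality_logits: (n_samples,) raw scores from the quality branch."""
    # Softmax turns raw quality scores into aggregation weights, so
    # low-quality samples contribute little to the set embedding.
    w = np.exp(quality_logits - quality_logits.max())
    w /= w.sum()
    pooled = (w[:, None] * features).sum(axis=0)
    # L2-normalise so set embeddings are compared on a common scale.
    return pooled / np.linalg.norm(pooled)
```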
How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition
Data competitions rely on real-time leaderboards to rank competitor entries
and stimulate algorithm improvement. While such competitions have become quite
popular and prevalent, particularly in supervised learning formats, their
implementations by the host are highly variable. Without careful planning, a
supervised learning competition is vulnerable to overfitting, where the winning
solutions are so closely tuned to the particular set of provided data that they
cannot generalize to the underlying problem of interest to the host. Based on
our experience, this paper outlines some important considerations for
strategically designing relevant and informative data sets to maximize the
learning outcome from hosting a competition. It also describes a post-competition
analysis that enables robust and efficient assessment of the strengths and
weaknesses of solutions from different competitors, as well as greater
understanding of the regions of the input space that are well-solved. The
post-competition analysis, which complements the leaderboard, uses exploratory
data analysis and generalized linear models (GLMs). The GLMs not only expand
the range of results we can explore, but also provide a more detailed analysis
of individual sub-questions, including similarities and differences between
algorithms across different types of scenarios, universally easy or hard
regions of the input space, and different learning objectives. When coupled
with a strategically planned data generation approach, the methods provide
richer and more informative summaries to enhance the interpretation of results
beyond just the rankings on the leaderboard. The methods are illustrated with a
recently completed competition to evaluate algorithms capable of detecting,
identifying, and locating radioactive materials in an urban environment.
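A hedged sketch of the kind of post-competition GLM analysis described above: model whether each competitor answered each test item correctly as a function of the algorithm and properties of the item. The file name and column names (correct, algorithm, scenario, source_strength) are hypothetical placeholders, not the paper's data.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# One row per (algorithm, test item); "correct" is a 0/1 outcome.
results = pd.read_csv("per_item_results.csv")

# Binomial GLM: separates algorithm effects from universally easy or hard
# regions of the input space, captured here by scenario-level covariates.
glm = smf.glm(
    "correct ~ C(algorithm) + C(scenario) + source_strength",
    data=results,
    family=sm.families.Binomial(),
).fit()
print(glm.summary())
```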
Plant image retrieval using color, shape and texture features
We present a content-based image retrieval system for plant image retrieval, intended especially for the house plant identification problem. A plant image consists of a collection of overlapping leaves and possibly flowers, which makes the problem challenging. We studied the suitability of various well-known color, shape and texture features for this problem, and also introduced some new texture matching techniques and shape features. Feature extraction is applied after segmenting the plant region from the background using the max-flow min-cut technique. Results on a database of 380 plant images belonging to 78 different types of plants show the promise of the proposed new techniques and the overall system: in 55% of the queries, the correct plant image is retrieved among the top-15 results. Furthermore, the accuracy goes up to 73% when a 132-image subset of well-segmented plant images is considered.
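A small sketch of content-based retrieval using colour features only, assuming the plant region has already been segmented out (the paper uses max-flow/min-cut segmentation; the mask is taken as given here). The histogram configuration and L1 distance are illustrative choices, not the paper's exact features.

```python
import cv2
import numpy as np

def colour_feature(image_bgr: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # HSV hue/saturation histogram computed over the segmented plant region.
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], mask, [16, 16], [0, 180, 0, 256])
    cv2.normalize(hist, hist)
    return hist.flatten()

def top_k(query: np.ndarray, database: list[np.ndarray], k: int = 15) -> list[int]:
    # Rank database images by histogram distance; the paper reports accuracy
    # for the correct plant appearing among the top-15 results.
    dists = [np.linalg.norm(query - f, ord=1) for f in database]
    return list(np.argsort(dists)[:k])
```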
Looking Beyond Appearances: Synthetic Training Data for Deep CNNs in Re-identification
Re-identification is generally carried out by encoding the appearance of a
subject in terms of outfit, which assumes scenarios where people do not change
their attire. In this paper we overcome this restriction by proposing a
framework based on a deep convolutional neural network, SOMAnet, that
additionally models other discriminative aspects, namely, structural attributes
of the human figure (e.g. height, obesity, gender). Our method is unique in
many respects. First, SOMAnet is based on the Inception architecture, departing
from the usual siamese framework. This spares expensive data preparation
(pairing images across cameras) and allows the understanding of what the
network learned. Second, and most notably, the training data consists of a
synthetic 100K instance dataset, SOMAset, created by photorealistic human body
generation software. Synthetic data represents a good compromise between
realistic imagery, usually not required in re-identification since surveillance
cameras capture low-resolution silhouettes, and complete control of the
samples, which is useful for customizing the data to the surveillance
scenario at hand, e.g. ethnicity. SOMAnet, trained on SOMAset and fine-tuned on
recent re-identification benchmarks, outperforms all competitors, matching
subjects even with different apparel. The combination of synthetic data with
Inception architectures opens up new research avenues in re-identification.
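A hypothetical sketch of the training recipe described above: pre-train an Inception backbone as an identity classifier on a large synthetic set, then fine-tune it on a target re-identification benchmark. This is not the authors' SOMAnet code; it assumes a recent torchvision and uses a plain classification head for illustration.

```python
import torch.nn as nn
from torchvision import models

def build_identity_classifier(num_identities: int) -> nn.Module:
    # ImageNet-pretrained Inception backbone with a fresh identity head.
    net = models.inception_v3(weights="IMAGENET1K_V1")
    net.fc = nn.Linear(net.fc.in_features, num_identities)
    return net

# Stage 1: train on the synthetic identities (a SOMAset-like corpus).
# Stage 2: swap the head for the benchmark's identities and fine-tune;
# the penultimate features then serve as the re-identification descriptor.
```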
Energy Disaggregation Using Elastic Matching Algorithms
In this article an energy disaggregation architecture using elastic matching algorithms is presented. The architecture uses a database of reference energy consumption signatures and compares them with incoming energy consumption frames using template matching. In contrast to machine learning-based approaches, which require a significant amount of data to train a model, elastic matching-based approaches have no model training process but perform recognition using template matching. Five different elastic matching algorithms were evaluated across different datasets, and the experimental results showed that the minimum variance matching algorithm outperforms all other evaluated matching algorithms. The best performing minimum variance matching algorithm improved the energy disaggregation accuracy by 2.7% when compared to the baseline dynamic time warping algorithm.
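A compact sketch of dynamic time warping (DTW), the baseline elastic matching algorithm named above: an incoming consumption frame is compared against a reference appliance signature by the cost of the best warped alignment. The function names are illustrative.

```python
import numpy as np

def dtw_distance(frame: np.ndarray, signature: np.ndarray) -> float:
    # Classic O(n*m) DTW with absolute-difference local cost.
    n, m = len(frame), len(signature)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(frame[i - 1] - signature[j - 1])
            # Each cell extends the cheapest of the three allowed moves.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# Disaggregation then assigns the frame to the reference signature with the
# smallest elastic-matching distance in the database.
```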
Changes and classification in myocardial contractile function in the left ventricle following acute myocardial infarction
In this research, we hypothesized that novel biomechanical parameters are discriminative in patients following acute ST-segment elevation myocardial infarction (STEMI). To identify these biomechanical biomarkers and bring computational biomechanics ‘closer to the clinic’, we applied state-of-the-art multiphysics cardiac modelling combined with advanced machine learning and multivariate statistical inference to a clinical database of myocardial infarction. We obtained data from 11 STEMI patients (ClinicalTrials.gov NCT01717573) and 27 healthy volunteers, and developed personalized mathematical models for the left ventricle (LV) using an immersed boundary method. Subject-specific constitutive parameters were obtained by matching to clinical measurements. We have shown, for the first time, that compared with healthy controls, patients with STEMI exhibited increased LV wall active tension when normalized by systolic blood pressure, which suggests an increased demand on the contractile reserve of remote functional myocardium. The statistical analysis reveals that the required patient-specific contractility, the normalized active tension and the systolic myofilament kinematics have the strongest explanatory power for identifying myocardial function changes post-MI. We further observed a strong correlation between two biomarkers and the changes in LV ejection fraction at six months from baseline (the required contractility: r = −0.79, p < 0.01; the systolic myofilament kinematics: r = 0.70, p = 0.02). The clinical and prognostic significance of these biomechanical parameters merits further scrutiny.
The early socioeconomic effects of teenage childbearing
A large body of literature has documented a negative correlation between teenage childbearing and teen mothers’ socioeconomic outcomes, yet researchers continue to disagree as to whether the association represents a true causal effect. This article extends the extant literature by employing propensity score matching with a sensitivity analysis using Rosenbaum bounds. The analysis of recent cohort data from the National Longitudinal Study of Adolescent Health shows that (1) teenage childbearing has modest but significant negative effects on early socioeconomic outcomes and (2) unobserved covariates would have to be more powerful than known covariates to nullify the propensity score matching estimates. The author concludes by suggesting that more research should be done to address unobserved heterogeneity and the long-term effects of teenage childbearing for this young cohort.
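A rough sketch of the Rosenbaum-bounds idea for matched pairs, using a simplified sign-test approximation: under hidden bias of magnitude gamma, the chance that the treated member of a pair has the worse outcome is bounded by gamma / (1 + gamma), which yields a worst-case p-value for the matched estimate. This is an illustration of the general technique, not the paper's exact procedure.

```python
import numpy as np
from scipy import stats

def rosenbaum_upper_pvalue(treated: np.ndarray, control: np.ndarray, gamma: float) -> float:
    # Matched-pair outcome differences; ties carry no sign information.
    diffs = treated - control
    diffs = diffs[diffs != 0]
    n = len(diffs)
    t = int((diffs < 0).sum())       # pairs where the treated outcome is worse
    p_plus = gamma / (1.0 + gamma)   # worst-case per-pair probability under bias gamma
    # Upper bound on the one-sided p-value for a negative treatment effect.
    return float(stats.binom.sf(t - 1, n, p_plus))

# gamma = 1 reproduces the ordinary sign test; increasing gamma shows how
# strong an unobserved confounder would need to be to explain away the effect.
```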