Health Disparities in Colorectal Cancer Screening in United States: Race/ethnicity or Shifting Paradigms?
Background: Colorectal cancer (CRC) remains the third leading cause of cancer death in the United States. Incidence, mortality, and screening vary by race/ethnicity, with African Americans and Hispanics bearing a disproportionate burden. Early detection through screening prolongs survival and decreases mortality. CRC screening (CRCS) varies by race/ethnicity, with lower prevalence rates observed among minorities, but the factors associated with such disparities remain to be fully understood. The current study aimed to examine the ethnic/racial disparities in the prevalence of CRCS, and the explanatory factors therein, in a large sample of U.S. residents, using the National Health Interview Survey, 2003.
Materials and Methods: A cross-sectional epidemiologic design was used, with a chi-square test to assess the prevalence of CRCS and a survey logistic regression model to assess the odds of being screened.
Results: There was significant variability in CRCS, with minorities demonstrating lower prevalence relative to Caucasians, χ²(3) = 264.4, p < 0.0001. After controlling for the covariates, racial/ethnic disparities in CRCS persisted. Compared to Caucasians, African Americans/Blacks were 28% less likely (adjusted prevalence odds ratio [APOR] = 0.72, 99% CI 0.60-0.80), Hispanics 33% less likely (APOR = 0.67, 99% CI 0.53-0.84), and Asians 37% less likely (APOR = 0.63, 99% CI 0.43-0.95) to be screened for CRC.
Conclusion: Among older Americans, racial/ethnic disparities in CRCS exist and were not explained by racial/ethnic variation in the covariates associated with CRCS. These findings call for further studies to improve understanding of the confounders and mediators of disparities in CRCS, and for applying these factors, including the health belief model, to improve CRCS among ethnic/racial minorities.
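A minimal sketch of this style of analysis in Python is below. The file and column names (nhis_2003_crcs.csv, race, screened, svy_weight) are hypothetical stand-ins for the NHIS 2003 variables, and passing sampling weights as frequency weights is only an approximation: a full design-based analysis would also account for the survey's strata and clusters.

```python
# Sketch: chi-square test of CRCS prevalence by race/ethnicity, plus a weighted
# logistic regression for adjusted odds. Column and file names are hypothetical
# stand-ins for NHIS 2003 variables; strata/cluster adjustments are omitted.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2_contingency

df = pd.read_csv("nhis_2003_crcs.csv")  # hypothetical extract; screened coded 0/1

# Chi-square test: does CRCS prevalence vary by race/ethnicity?
table = pd.crosstab(df["race"], df["screened"])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.1f}, p = {p:.4g}")

# Weighted logistic regression with Caucasians as the reference category.
race = pd.Categorical(
    df["race"],
    categories=["Caucasian", "African American/Black", "Hispanic", "Asian"],
)
X = sm.add_constant(pd.get_dummies(race, drop_first=True).astype(float))
model = sm.GLM(
    df["screened"],
    X,
    family=sm.families.Binomial(),
    freq_weights=df["svy_weight"],  # sampling weights; an approximation only
)
result = model.fit()

# Adjusted prevalence odds ratios with 99% confidence intervals.
ci = result.conf_int(alpha=0.01)  # bounds on the log-odds scale
summary = pd.DataFrame({
    "APOR": np.exp(result.params),
    "99% CI lower": np.exp(ci[0]),
    "99% CI upper": np.exp(ci[1]),
})
print(summary)
```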
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. In addition to being one of the largest corpora available for the task of NLI, at 433k examples, this corpus improves upon available resources in its coverage: it offers data from ten distinct genres of written and spoken English--making it possible to evaluate systems on nearly the full complexity of the language--and it offers an explicit setting for the evaluation of cross-genre domain adaptation.
Comment: 10 pages, 1 figure, 5 tables. v2 corrects a misreported accuracy number for the CBOW model in the 'matched' setting. v3 adds a discussion of the difficulty of the corpus to the analysis section. v4 is the version that was accepted to NAACL 2018.
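For orientation, a sketch of loading the corpus is below. It assumes the copy mirrored on the Hugging Face hub under the id multi_nli, with premise, hypothesis, genre, and label fields; field names differ in the original JSONL release, which uses sentence1/sentence2/gold_label.

```python
# Minimal sketch: load MultiNLI and inspect its genre coverage, assuming the
# Hugging Face hub mirror "multi_nli" with premise/hypothesis/genre/label fields.
from collections import Counter

from datasets import load_dataset  # pip install datasets

mnli = load_dataset("multi_nli")
print(mnli)  # expected splits: train, validation_matched, validation_mismatched

# Count examples per genre in the training set. Five genres appear in training;
# the remaining five appear only in the "mismatched" evaluation sets.
genre_counts = Counter(mnli["train"]["genre"])
for genre, n in genre_counts.most_common():
    print(f"{genre:>12}: {n}")

# Inspect one matched validation pair (labels 0/1/2 typically map to
# entailment/neutral/contradiction in this mirror).
ex = mnli["validation_matched"][0]
print(ex["premise"], "=>", ex["hypothesis"], "| label:", ex["label"])
```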
The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks
NLP models have progressed drastically in recent years, according to numerous datasets proposed to evaluate performance. Questions remain, however, about how particular dataset design choices may impact the conclusions we draw about model capabilities. In this work, we investigate this question in the domain of compositional generalization. We examine the performance of six modeling approaches across four datasets, split according to eight compositional splitting strategies, ranking models by 18 compositional generalization splits in total. Our results show that: i) the datasets, although all designed to evaluate compositional generalization, rank modeling approaches differently; ii) datasets generated by humans align better with each other than they do with synthetic datasets, or than synthetic datasets do among themselves; iii) generally, whether datasets are sampled from the same source is more predictive of the resulting model ranking than whether they maintain the same interpretation of compositionality; and iv) which lexical items are used in the data can strongly impact conclusions. Overall, our results demonstrate that much work remains to be done when it comes to assessing whether popular evaluation datasets measure what they intend to measure, and suggest that elucidating more rigorous standards for establishing the validity of evaluation sets could benefit the field.
Comment: CoNLL 2023.
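The concurrence question above can be made concrete with a rank-correlation check. The sketch below uses Kendall's tau over hypothetical model scores; all model names and accuracy numbers are invented for illustration, not results from the paper.

```python
# Illustrative sketch of a concurrence check: do two compositional-generalization
# benchmarks rank the same modeling approaches the same way? All model names and
# accuracy numbers below are invented placeholders, not results from the paper.
from scipy.stats import kendalltau

models = ["transformer", "lstm", "tree_model", "pretrained_lm"]

# Hypothetical accuracies of each model on two different benchmark splits.
scores_benchmark_a = [0.61, 0.42, 0.55, 0.73]
scores_benchmark_b = [0.38, 0.47, 0.33, 0.52]

for name, a, b in zip(models, scores_benchmark_a, scores_benchmark_b):
    print(f"{name:>14}: benchmark A {a:.2f} | benchmark B {b:.2f}")

tau, p_value = kendalltau(scores_benchmark_a, scores_benchmark_b)
print(f"Kendall's tau = {tau:.2f} (p = {p_value:.3f})")
# tau near 1: the benchmarks rank the models similarly; tau near 0 or negative:
# they tell conflicting stories about which approach generalizes best.
```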
ANLIzing the Adversarial Natural Language Inference Dataset
We perform an in-depth error analysis of the Adversarial NLI (ANLI) dataset, a recently introduced large-scale human-and-model-in-the-loop natural language inference dataset collected dynamically over multiple rounds. We propose a fine-grained annotation scheme for the different aspects of inference responsible for the gold classification labels, and use it to hand-code the ANLI development sets in their entirety. We use these annotations to answer a variety of important questions: which models have the highest performance on each inference type, which inference types are most common, and which types are the most challenging for state-of-the-art models? We hope our annotations will enable more fine-grained evaluation of NLI models, and provide a deeper understanding of where models fail (and succeed). Both insights can guide us in training stronger models going forward.
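A hedged sketch of the fine-grained evaluation such annotations enable is below: given per-example inference-type tags and model predictions, it tallies accuracy per type. The tag names and examples are hypothetical placeholders, not the paper's annotation scheme or ANLI data.

```python
# Sketch: per-inference-type accuracy from hand-coded annotations. The tag names
# ("numerical", "negation", "coreference") and the examples are hypothetical
# placeholders, not the paper's actual annotation scheme or ANLI data.
from collections import defaultdict

annotated_dev = [
    # (gold_label, model_prediction, inference_type_tags)
    ("contradiction", "contradiction", {"numerical"}),
    ("entailment",    "neutral",       {"negation", "coreference"}),
    ("neutral",       "neutral",       {"coreference"}),
    ("contradiction", "entailment",    {"numerical", "negation"}),
]

correct = defaultdict(int)
total = defaultdict(int)
for gold, pred, tags in annotated_dev:
    for tag in tags:  # an example may carry several inference-type tags
        total[tag] += 1
        correct[tag] += int(gold == pred)

for tag in sorted(total):
    print(f"{tag:>12}: {correct[tag] / total[tag]:.0%} of {total[tag]} examples correct")
```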
Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition
Natural language inference (NLI) is an increasingly important task for natural language understanding, which requires one to infer whether a sentence entails another. However, the ability of NLI models to make pragmatic inferences remains understudied. We create an IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of >25k semi-automatically generated sentence pairs illustrating well-studied pragmatic inference types. We use IMPPRES to evaluate whether BERT, InferSent, and BOW NLI models trained on MultiNLI (Williams et al., 2018) learn to make pragmatic inferences. Although MultiNLI appears to contain very few pairs illustrating these inference types, we find that BERT learns to draw pragmatic inferences. It reliably treats scalar implicatures triggered by "some" as entailments. For some presupposition triggers like "only", BERT reliably recognizes the presupposition as an entailment, even when the trigger is embedded under an entailment-canceling operator like negation. BOW and InferSent show weaker evidence of pragmatic reasoning. We conclude that NLI training encourages models to learn some, but not all, pragmatic inferences.
Comment: to appear in Proceedings of ACL 2020.
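To make the probing setup concrete, the sketch below runs an off-the-shelf MNLI-finetuned checkpoint on an IMPPRES-style scalar-implicature pair and checks whether "some" is treated as entailing "not all". The checkpoint (roberta-large-mnli) and the sentence pair are assumptions for illustration, not the authors' exact BERT model or the released IMPPRES data.

```python
# Sketch: probe an MNLI-finetuned model on an IMPPRES-style pragmatic inference,
# the scalar implicature from "some" to "not all". The checkpoint and the
# sentence pair are illustrative assumptions, not the paper's exact setup.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"  # assumed off-the-shelf MNLI checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

premise = "Some of the students passed the exam."
hypothesis = "Not all of the students passed the exam."  # pragmatic (implicature) reading

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred_id = int(logits.argmax(dim=-1))
print("Predicted relation:", model.config.id2label[pred_id])
# An "entailment" prediction indicates the model draws the pragmatic inference;
# "neutral" corresponds to the purely logical reading of "some".
```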