105 research outputs found
General Purpose Computer-Assisted Clustering and Conceptualization
We develop a computer-assisted method for the discovery of insightful conceptualizations, in the form of clusterings (i.e., partitions) of input objects. Each of the numerous fully automated methods of cluster analysis proposed in statistics, computer science, and biology optimizes a different objective function. Almost all are well defined, but how to determine before the fact which one, if any, will partition a given set of objects in an “insightful” or “useful” way for a given user is unknown and difficult, if not logically impossible. We develop a metric space of partitions from all existing cluster analysis methods applied to a given dataset (along with millions of other solutions we add based on combinations of existing clusterings) and enable a user to explore and interact with it and quickly reveal or prompt useful or insightful conceptualizations. In addition, although it is uncommon to do so in unsupervised learning problems, we offer and implement evaluation designs that make our computer-assisted approach vulnerable to being proven suboptimal in specific data types. We demonstrate that our approach facilitates more efficient and insightful discovery of useful information than expert human coders or many existing fully automated methods.
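A "metric space of partitions" needs a distance between two clusterings of the same objects. The abstract does not say which distance the authors use, so the sketch below assumes one standard choice, the variation of information, purely for illustration. The function name and the toy label lists are hypothetical.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Variation of information between two partitions of the same objects.

    Assumed distance for illustration only; the abstract does not name
    the metric actually used.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same objects")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # VI = H(A|B) + H(B|A) = -sum p_ab * [log(p_ab/p_a) + log(p_ab/p_b)]
        vi -= p_ab * (log(p_ab / p_a) + log(p_ab / p_b))
    return vi


# Toy clusterings of six objects (hypothetical data).
clustering_1 = [0, 0, 1, 1, 2, 2]
clustering_2 = ["x", "x", "y", "y", "z", "z"]  # same partition, renamed labels
clustering_3 = [0, 0, 0, 1, 1, 1]              # a genuinely different partition

vi_same = variation_of_information(clustering_1, clustering_2)
vi_diff = variation_of_information(clustering_1, clustering_3)
```

Two clusterings that differ only in label names are at distance zero, and the distance is symmetric. These are the properties that let a collection of clusterings be laid out and explored as points in a space.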
No evidence for systematic voter fraud: A guide to statistical claims about the 2020 election
After the 2020 US presidential election, Donald Trump refused to concede, alleging widespread and unparalleled voter fraud. Trump's supporters deployed several statistical arguments in an attempt to cast doubt on the result. Reviewing the most prominent of these statistical claims, we conclude that none of them is even remotely convincing. The common logic behind these claims is that, if the election were fairly conducted, some feature of the observed 2020 election result would be unlikely or impossible. In each case, we find that the purportedly anomalous fact is either not a fact or not anomalous. © 2021 National Academy of Sciences. All rights reserved.
Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond
A fundamental goal of scientific research is to learn about causal
relationships. However, despite its critical role in the life and social
sciences, causality has not had the same importance in Natural Language
Processing (NLP), which has traditionally placed more emphasis on predictive
tasks. This distinction is beginning to fade, with an emerging area of
interdisciplinary research at the convergence of causal inference and language
processing. Still, research on causality in NLP remains scattered across
domains without unified definitions, benchmark datasets and clear articulations
of the challenges and opportunities in the application of causal inference to
the textual domain, with its unique properties. In this survey, we consolidate
research across academic areas and situate it in the broader NLP landscape. We
introduce the statistical challenge of estimating causal effects with text,
encompassing settings where text is used as an outcome, treatment, or to
address confounding. In addition, we explore potential uses of causal inference
to improve the robustness, fairness, and interpretability of NLP models. We
thus provide a unified overview of causal inference for the NLP community.
Comment: Accepted to Transactions of the Association for Computational
Linguistics (TACL)
Effects of fluoxetine on functional outcomes after acute stroke (FOCUS): a pragmatic, double-blind, randomised, controlled trial
Background
Results of small trials indicate that fluoxetine might improve functional outcomes after stroke. The FOCUS trial aimed to provide a precise estimate of these effects.
Methods
FOCUS was a pragmatic, multicentre, parallel group, double-blind, randomised, placebo-controlled trial done at 103 hospitals in the UK. Patients were eligible if they were aged 18 years or older, had a clinical stroke diagnosis, were enrolled and randomly assigned between 2 days and 15 days after onset, and had focal neurological deficits. Patients were randomly allocated fluoxetine 20 mg or matching placebo orally once daily for 6 months via a web-based system by use of a minimisation algorithm. The primary outcome was functional status, measured with the modified Rankin Scale (mRS), at 6 months. Patients, carers, health-care staff, and the trial team were masked to treatment allocation. Functional status was assessed at 6 months and 12 months after randomisation. Patients were analysed according to their treatment allocation. This trial is registered with the ISRCTN registry, number ISRCTN83290762.
Findings
Between Sept 10, 2012, and March 31, 2017, 3127 patients were recruited. 1564 patients were allocated fluoxetine and 1563 allocated placebo. mRS data at 6 months were available for 1553 (99·3%) patients in each treatment group. The distribution across mRS categories at 6 months was similar in the fluoxetine and placebo groups (common odds ratio adjusted for minimisation variables 0·951 [95% CI 0·839–1·079]; p=0·439). Patients allocated fluoxetine were less likely than those allocated placebo to develop new depression by 6 months (210 [13·43%] patients vs 269 [17·21%]; difference 3·78% [95% CI 1·26–6·30]; p=0·0033), but they had more bone fractures (45 [2·88%] vs 23 [1·47%]; difference 1·41% [95% CI 0·38–2·43]; p=0·0070). There were no significant differences in any other event at 6 or 12 months.
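The risk differences and intervals quoted above can be checked with a few lines of arithmetic. The abstract does not state how the confidence intervals were computed, so the sketch below assumes a simple Wald (normal-approximation) interval for a difference of two proportions. It uses the allocated group sizes (1564 and 1563) as denominators, which is consistent with the percentages reported. The function name is hypothetical.

```python
from math import sqrt


def risk_difference_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk difference (group a minus group b) with a Wald interval.

    The Wald method is an assumption here; the abstract does not say
    which interval the trial used.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se


# New depression by 6 months: placebo (269/1563) minus fluoxetine (210/1564).
dep_diff, dep_lo, dep_hi = risk_difference_ci(269, 1563, 210, 1564)

# Bone fractures: fluoxetine (45/1564) minus placebo (23/1563).
frac_diff, frac_lo, frac_hi = risk_difference_ci(45, 1564, 23, 1563)

for name, d, lo, hi in [("depression", dep_diff, dep_lo, dep_hi),
                        ("fractures", frac_diff, frac_lo, frac_hi)]:
    print(f"{name}: difference {100 * d:.2f}% "
          f"(95% CI {100 * lo:.2f} to {100 * hi:.2f})")
```

Under these assumptions, the computed differences and intervals agree with the reported figures to within rounding.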
Interpretation
Fluoxetine 20 mg given daily for 6 months after acute stroke does not seem to improve functional outcomes. Although the treatment reduced the occurrence of depression, it increased the frequency of bone fractures. These results do not support the routine use of fluoxetine either for the prevention of post-stroke depression or to promote recovery of function.
Funding
UK Stroke Association and NIHR Health Technology Assessment Programme.
Effect of angiotensin-converting enzyme inhibitor and angiotensin receptor blocker initiation on organ support-free days in patients hospitalized with COVID-19
IMPORTANCE Overactivation of the renin-angiotensin system (RAS) may contribute to poor clinical outcomes in patients with COVID-19.
OBJECTIVE To determine whether angiotensin-converting enzyme (ACE) inhibitor or angiotensin receptor blocker (ARB) initiation improves outcomes in patients hospitalized for COVID-19.
DESIGN, SETTING, AND PARTICIPANTS In an ongoing, adaptive platform randomized clinical trial, 721 critically ill and 58 non–critically ill hospitalized adults were randomized to receive an RAS inhibitor or control between March 16, 2021, and February 25, 2022, at 69 sites in 7 countries (final follow-up on June 1, 2022).
INTERVENTIONS Patients were randomized to receive open-label initiation of an ACE inhibitor (n = 257), ARB (n = 248), ARB in combination with DMX-200 (a chemokine receptor-2 inhibitor; n = 10), or no RAS inhibitor (control; n = 264) for up to 10 days.
MAIN OUTCOMES AND MEASURES The primary outcome was organ support–free days, a composite of hospital survival and days alive without cardiovascular or respiratory organ support through 21 days. The primary analysis was a bayesian cumulative logistic model. Odds ratios (ORs) greater than 1 represent improved outcomes.
RESULTS On February 25, 2022, enrollment was discontinued due to safety concerns. Among 679 critically ill patients with available primary outcome data, the median age was 56 years and 239 participants (35.2%) were women. Median (IQR) organ support–free days among critically ill patients was 10 (–1 to 16) in the ACE inhibitor group (n = 231), 8 (–1 to 17) in the ARB group (n = 217), and 12 (0 to 17) in the control group (n = 231) (median adjusted odds ratios of 0.77 [95% bayesian credible interval, 0.58-1.06] for improvement for ACE inhibitor and 0.76 [95% credible interval, 0.56-1.05] for ARB compared with control). The posterior probabilities that ACE inhibitors and ARBs worsened organ support–free days compared with control were 94.9% and 95.4%, respectively. Hospital survival occurred in 166 of 231 critically ill participants (71.9%) in the ACE inhibitor group, 152 of 217 (70.0%) in the ARB group, and 182 of 231 (78.8%) in the control group (posterior probabilities that ACE inhibitor and ARB worsened hospital survival compared with control were 95.3% and 98.1%, respectively).
CONCLUSIONS AND RELEVANCE In this trial, among critically ill adults with COVID-19, initiation of an ACE inhibitor or ARB did not improve, and likely worsened, clinical outcomes.
TRIAL REGISTRATION ClinicalTrials.gov Identifier: NCT0273570
Replication data for: Representational Style: The Central Role of Communication in Representation
Quantitative studies define representation through roll call votes, but roll call votes alone are insufficient to describe a legislator's representational style: how legislators respond to, anticipate, and shape their constituents' preferences. Representational style is composed of three elements: how legislators invest their time and resources in Washington, their Washington styles; the way they connect to their constituents in their district, their home styles; and how they vote on legislation, their voting. In this book, I use new comprehensive, systematic, and verifiable measures of home style to demonstrate its central role in representational style. Home styles are important on their own as the primary way legislators define the representation provided to constituents. Legislators define diverse, stable, and nonpartisan home styles that reflect senators' multiple motivations in the institution. Home styles are also important because they are much more than cheap talk. I demonstrate that home styles are systematically related to what legislators do in Washington and how senators vote on controversial legislation, therefore providing a credible indicator of legislators' representational styles. Legislators also value opportunities to maintain their home styles, which bureaucrats exploit to cultivate support for their agencies. Because analyzing home styles across all legislators using standard methods and data is infeasible, I introduce a new Bayesian statistical model for political texts (estimated using a variational approximation, a deterministic alternative to MCMC) and an original collection of over 64,000 Senate press releases to measure home style.
“An Introduction to Bayesian Inference via Variational Approximations.” Political Analysis 19(1):32–47.
The goal of a variational approximation is to approximate a posterior, p(β|Y), by making an approximating distribution, q(β), as close as possible to the true posterior. If we make no assumptions about the factorized distribution, then Equation 1.1 is minimized when q(β) = p(β|Y) (because log 1 = 0). Of course, this is not particularly helpful, because the posterior is generally intractable. To make manipulation of the approximating distribution possible, we introduce additional assumptions into the approximating distribution, with the hope that this will make inference tractable while still providing a close approximation to the true posterior. This presents the inherent tension when approximating a posterior with a variational approximation, while also illustrating one of the major advantages of variational approximations. On the one hand, using a more general approximating distribution will lead to a better approximation of the posterior distribution: we make the approximation arbitrarily close by reducing the number of assumptions made in the approximating distribution. But fewer assumptions result in the approximating distribution being less tractable, limiting the usefulness of the approximation. On the other hand, introducing more restrictive assumptions to the approximating distribution results in a poorer approximation (or a greater KL-divergence between the approximating distribution and true posterior). But if the additional assumptions imposed upon the approximation are useful, then the approximating distribution will be easier to manipulate for inference. Therefore, the goal (and indeed, the art) of applying variational approximations is to identify a set of approximating distributions that lead to tractable inferences, while also providing a close fit to the true posterior distribution.
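The tradeoff can be made concrete with a toy case. Equation 1.1 is not reproduced in this excerpt. The sketch below assumes it is the KL divergence from q to the posterior, KL(q || p) = -∫ q(β) log[p(β|Y) / q(β)] dβ, which is zero when q equals the posterior. As a purely illustrative stand-in for a real posterior, it takes a zero-mean bivariate Gaussian with correlation rho, for which the KL divergence has a closed form. The function name and the value of rho are hypothetical.

```python
from math import log


def kl_gaussian_2d(cov_q, cov_p):
    """KL(q || p) for two zero-mean bivariate Gaussians (2x2 covariances)."""
    (a, b), (c, d) = cov_p
    det_p = a * d - b * c
    inv_p = [[d / det_p, -b / det_p], [-c / det_p, a / det_p]]
    (qa, qb), (qc, qd) = cov_q
    det_q = qa * qd - qb * qc
    # trace(inv_p @ cov_q)
    trace_term = (inv_p[0][0] * qa + inv_p[0][1] * qc
                  + inv_p[1][0] * qb + inv_p[1][1] * qd)
    return 0.5 * (trace_term - 2 + log(det_p / det_q))


rho = 0.9  # assumed posterior correlation, for illustration only
posterior = [[1.0, rho], [rho, 1.0]]

# No restrictions: q can equal the posterior, so the divergence is zero.
kl_full = kl_gaussian_2d(posterior, posterior)

# Factorized q with the variance that minimizes KL(q || p) in this Gaussian case.
s2 = 1.0 - rho ** 2
kl_meanfield = kl_gaussian_2d([[s2, 0.0], [0.0, s2]], posterior)

# Factorized q that simply copies the marginal variances (a poorer choice).
kl_naive = kl_gaussian_2d([[1.0, 0.0], [0.0, 1.0]], posterior)

print(f"unrestricted q:        KL = {kl_full:.4f}")
print(f"best factorized q:     KL = {kl_meanfield:.4f}")
print(f"marginal-matched q:    KL = {kl_naive:.4f}")
```

In this toy setting, forcing q to factorize raises the divergence from zero to -0.5 log(1 - rho²). How the restricted family is fitted also matters: the factorized q that merely copies the marginal variances is farther from the posterior than the optimized one.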
This tension is actually an advantage of the variational approximation over other approximation methods, because we can control the quality of our approximation by controlling the assumptions made in the approximating distributions. Following a large literature in computer science and machine learning, we use a very general form of