Execution-based Code Generation using Deep Reinforcement Learning
The utilization of programming language (PL) models, pretrained on
large-scale code corpora, as a means of automating software engineering
processes has demonstrated considerable potential in streamlining various code
generation tasks such as code completion, code translation, and program
synthesis. However, current approaches mainly rely on supervised fine-tuning
objectives borrowed from text generation, neglecting specific sequence-level
features of code, including but not limited to compilability as well as
syntactic and functional correctness. To address this limitation, we propose
PPOCoder, a new framework for code generation that combines pretrained PL
models with Proximal Policy Optimization (PPO) deep reinforcement learning and
incorporates execution feedback into model optimization as an external source
of knowledge. PPOCoder is transferable across different code generation tasks
and PLs. Extensive experiments on three code generation tasks demonstrate the
effectiveness of our proposed approach compared to SOTA methods, improving the
success rate of compilation and functional correctness over different PLs. Our
code can be found at https://github.com/reddy-lab-code-research/PPOCoder
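As a minimal sketch of what execution feedback can look like for Python, the first function below checks whether a generated snippet compiles, runs it, and scores it by the fraction of unit tests it passes. The second function shows the standard PPO clipped surrogate that a reward of this kind ultimately feeds into. The reward values (-1.0 for a compile error, -0.5 for a runtime error), the test format, and the function names are illustrative assumptions. They are not PPOCoder's actual reward design, which the abstract does not spell out.

```python
import math


def execution_reward(code: str, tests: list[tuple[str, object]]) -> float:
    """Hypothetical execution-based reward for a generated Python snippet.

    `tests` is a non-empty list of (expression, expected_value) pairs.
    The penalty values below are illustrative assumptions.
    """
    try:
        compiled = compile(code, "<generated>", "exec")  # compilability check
    except SyntaxError:
        return -1.0  # does not compile
    namespace: dict = {}
    try:
        exec(compiled, namespace)  # define the generated function(s)
    except Exception:
        return -0.5  # compiles but fails at runtime
    passed = 0
    for expr, expected in tests:
        try:
            if eval(expr, namespace) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test counts as a failure
    return passed / len(tests)  # functional-correctness signal in [0, 1]


def ppo_clipped_objective(logp_new: float, logp_old: float,
                          advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for a single action (e.g. one token)."""
    ratio = math.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped * advantage)
```

For example, `execution_reward("def add(a, b):\n    return a + b\n", [("add(1, 2)", 3)])` returns `1.0`, and the same call on `"def add(a, b) return a + b"` returns `-1.0`. Running generated code with `exec` is unsafe outside a sandbox. A real training loop would isolate execution and enforce time limits.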
Identifying TBI Physiological States by Clustering Multivariate Clinical Time-Series Data
Determining clinically relevant physiological states from multivariate time
series data with missing values is essential for providing appropriate
treatment for acute conditions such as Traumatic Brain Injury (TBI),
respiratory failure, and heart failure. Utilizing non-temporal clustering or
data imputation and aggregation techniques may lead to loss of valuable
information and biased analyses. In our study, we apply the SLAC-Time
algorithm, an innovative self-supervision-based approach that maintains data
integrity by avoiding imputation or aggregation, offering a more useful
representation of acute patient states. By using SLAC-Time to cluster data in a
large research dataset, we identified three distinct TBI physiological states
and their specific feature profiles. We employed various clustering evaluation
metrics and incorporated input from a clinical domain expert to validate and
interpret the identified physiological states. Further, we discovered how
specific clinical events and interventions can influence patient states and
state transitions.
Comment: 10 pages, 7 figures, 2 tables
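The state-transition analysis mentioned above can be illustrated with a small counting sketch. Given each patient's sequence of cluster-assigned states (for example, one label per time window), it tallies consecutive transitions and normalizes them into empirical transition probabilities. The patient IDs, label values, and per-window labelling are hypothetical. The abstract does not specify how the study quantified transitions or linked them to clinical events.

```python
from collections import Counter


def transition_counts(state_sequences: dict[str, list[int]]) -> Counter:
    """Count (current_state, next_state) pairs across all patients."""
    counts: Counter = Counter()
    for seq in state_sequences.values():
        for prev, nxt in zip(seq, seq[1:]):  # consecutive time windows
            counts[(prev, nxt)] += 1
    return counts


def transition_probabilities(counts: Counter) -> dict[tuple[int, int], float]:
    """Row-normalize counts into empirical P(next state | current state)."""
    totals: Counter = Counter()
    for (prev, _), n in counts.items():
        totals[prev] += n
    return {(prev, nxt): n / totals[prev] for (prev, nxt), n in counts.items()}
```

With hypothetical sequences `{"p1": [0, 0, 1, 2], "p2": [1, 1, 0]}`, state 0 moves to 0 or 1 with probability 0.5 each. State 1 moves to 0, 1, or 2 with probability 1/3 each. Computing these tables separately for windows before and after an intervention is one simple way to compare how the intervention shifts transitions.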
A Self-Supervised Learning-based Approach to Clustering Multivariate Time-Series Data with Missing Values (SLAC-Time): An Application to TBI Phenotyping
Self-supervised learning approaches provide a promising direction for
clustering multivariate time-series data. However, real-world time-series data
often include missing values, and existing approaches require imputing
missing values before clustering, which can add extensive computation and
noise and lead to invalid interpretations. To address these challenges, we
present a Self-supervised Learning-based Approach to Clustering multivariate
Time-series data with missing values (SLAC-Time). SLAC-Time is a
Transformer-based clustering method that uses time-series forecasting as a
proxy task for leveraging unlabeled data and learning more robust time-series
representations. This method jointly learns the neural network parameters and
the cluster assignments of the learned representations. It iteratively clusters
the learned representations with the K-means method and then uses the
resulting cluster assignments as pseudo-labels to update the model parameters.
To evaluate our proposed approach, we applied it to clustering and phenotyping
Traumatic Brain Injury (TBI) patients in the Transforming Research and Clinical
Knowledge in Traumatic Brain Injury (TRACK-TBI) study. Our experiments
demonstrate that SLAC-Time outperforms the baseline K-means clustering
algorithm in terms of the silhouette coefficient, Calinski-Harabasz index, Dunn
index, and Davies-Bouldin index. We identified three TBI phenotypes that are
distinct from one another in terms of clinically significant variables as well
as clinical outcomes, including the Extended Glasgow Outcome Scale (GOSE)
score, Intensive Care Unit (ICU) length of stay, and mortality rate. The
experiments show that the TBI phenotypes identified by SLAC-Time could
potentially be used for developing targeted clinical trials and therapeutic
strategies.
Comment: Submitted to the Journal of Biomedical Informatics
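The alternating scheme described in the SLAC-Time abstract can be sketched as follows. First, the current representations are clustered with K-means. The assignments are then treated as pseudo-labels, and the model takes gradient steps on a cross-entropy loss. Several simplifications apply, and they are assumptions rather than the authors' method:

* a one-layer tanh encoder stands in for SLAC-Time's Transformer;
* the forecasting proxy task and missing-value handling are omitted;
* the data are synthetic;
* the helper names are hypothetical.

The Dunn index uses one common definition: the smallest distance between points in different clusters divided by the largest cluster diameter.

```python
import numpy as np


def kmeans(z, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means on the rows of z; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = z[rng.choice(len(z), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centroids[j] = z[labels == j].mean(axis=0)
    return labels, centroids


def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def pseudo_label_round(x, w, k, lr=0.5, steps=100, seed=0):
    """One round: K-means on current representations, then train on pseudo-labels."""
    w = w.copy()
    labels, _ = kmeans(np.tanh(x @ w), k, seed=seed)
    onehot = np.eye(k)[labels]
    # Fresh classifier head each round, because cluster IDs are arbitrary
    # and may be permuted between K-means runs.
    v = np.random.default_rng(seed).normal(scale=0.1, size=(w.shape[1], k))
    for _ in range(steps):
        h = np.tanh(x @ w)                    # representations (encoder output)
        p = softmax(h @ v)                    # predicted cluster probabilities
        g_logits = (p - onehot) / len(x)      # grad of mean cross-entropy
        g_v = h.T @ g_logits
        g_w = x.T @ ((g_logits @ v.T) * (1.0 - h ** 2))  # backprop through tanh
        v -= lr * g_v
        w -= lr * g_w
    return w, labels


def dunn_index(z, labels):
    """Smallest between-cluster point distance divided by largest cluster diameter."""
    d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))
    ids = np.unique(labels)
    diameter = max(d[np.ix_(labels == c, labels == c)].max() for c in ids)
    separation = min(d[np.ix_(labels == a, labels == b)].min()
                     for i, a in enumerate(ids) for b in ids[i + 1:])
    return separation / diameter


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data: three groups of 8-dimensional feature vectors.
    x = np.concatenate([rng.normal(c, 0.3, size=(50, 8)) for c in (-2.0, 0.0, 2.0)])
    w = rng.normal(scale=0.3, size=(8, 4))
    for _ in range(3):
        w, _ = pseudo_label_round(x, w, k=3)
    z = np.tanh(x @ w)
    labels, _ = kmeans(z, k=3)
    print("Dunn index:", dunn_index(z, labels))
```

Re-initializing the classifier head each round mirrors a common choice in pseudo-label clustering. Because K-means labels are arbitrary, a head trained on one round's label IDs would be mismatched to the next round's.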
Psychopathology predicts the outcome of medial branch blocks with corticosteroid for chronic axial low back or cervical pain: a prospective cohort study
Background: Comorbid psychopathology is an important predictor of poor outcome for many types of treatments for back or neck pain. However, it is unknown whether this applies to the results of medial branch blocks (MBBs) for chronic low back or neck pain, which involve injecting the medial branch of the dorsal ramus nerves that innervate the facet joints. The objective of this study was to determine whether high levels of psychopathology are predictive of pain relief after MBB injections in the lumbar or cervical spine. Methods: This was a prospective cohort study. Consecutive patients in a pain medicine practice undergoing MBBs of the lumbar or cervical facets with corticosteroids were recruited to participate. Subjects were selected for an MBB based on operationalized selection criteria, and the procedure was performed in a standardized manner. Subjects completed the Brief Pain Inventory (BPI) and the Hospital Anxiety and Depression Scale (HADS) just prior to the procedure and at one-month follow-up. HADS scores classified the subjects into three groups based on psychiatric symptoms, which formed the primary predictor variable: Low, Moderate, or High levels of psychopathology. The primary outcome measure was the percent improvement in average daily pain rating one month following an injection. Analysis of variance and chi-square tests were used to analyze differences in analgesia and functional ratings between groups and to perform a responder analysis. Results: Eighty-six (86) subjects completed the study. The Low psychopathology group (n = 37) reported a mean 23% improvement in pain at one month, while the High psychopathology group (n = 29) reported a mean 5.8% worsening in pain (p < .001). Forty-five percent (45%) of the Low group had at least 30% improvement in pain, versus 10% of the High group (p < .001). In an analysis of covariance, no baseline demographic, social, or medical variables were significant predictors of pain improvement, nor did they mitigate the effect of psychopathology on the outcome. Conclusion: Psychiatric comorbidity is associated with diminished pain relief one month after an MBB injection performed with steroid. These findings illustrate the importance of assessing comorbid psychopathology as part of a spine care evaluation.
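The primary outcome and the responder analysis reduce to simple arithmetic. This minimal sketch assumes that percent improvement is computed as (baseline - follow-up) / baseline x 100 on the average daily pain rating. Under that assumption, a negative value (like the High group's mean) indicates worsening. The 30% responder threshold comes from the Results. The function names are hypothetical, and the study's exact computation is not stated in the abstract.

```python
def percent_improvement(baseline: float, follow_up: float) -> float:
    """Percent improvement in average daily pain; negative values mean worsening."""
    if baseline == 0:
        raise ValueError("baseline pain rating must be non-zero")
    return 100.0 * (baseline - follow_up) / baseline


def is_responder(baseline: float, follow_up: float, threshold: float = 30.0) -> bool:
    """Responder = at least `threshold` percent improvement (30% in the study)."""
    return percent_improvement(baseline, follow_up) >= threshold
```

For example, a drop from 10 to 7 is a 30% improvement and counts as a response. A rise from 5.0 to 5.5 gives -10%, meaning the pain worsened.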
Analyzing a fake news authorship network
This project synthesizes a set of 246 fake news websites identified in three earlier research projects. From this dataset, we extract the set of all authors who wrote for these sites in 2016. This author-centric dataset is itself a contribution that will allow future analysis of the fake news ecosystem. Based on the data we collected, we construct a network of fake news sites, linking two sites if they share a common author. Our analysis shows a tight cluster of author-sharing sites, with a small core set of sites sharing dozens of authors.
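The site network described above is a one-mode projection of a bipartite author-site graph. In this minimal sketch, each pair of sites an author wrote for is linked, and the number of shared authors becomes the edge weight. Weighted degree then offers one simple way to surface a core of heavily connected sites. The input format and the example author and site names are hypothetical. The abstract does not say which measure the authors used to identify the core.

```python
from collections import defaultdict
from itertools import combinations


def build_site_network(author_sites: dict[str, list[str]]) -> dict[tuple[str, str], int]:
    """Link two sites if an author wrote for both; weight = number of shared authors."""
    edges: defaultdict = defaultdict(int)
    for sites in author_sites.values():
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for a, b in combinations(sorted(set(sites)), 2):
            edges[(a, b)] += 1
    return dict(edges)


def weighted_degree(edges: dict[tuple[str, str], int]) -> dict[str, int]:
    """Sum of shared-author weights on all edges touching each site."""
    deg: defaultdict = defaultdict(int)
    for (a, b), w in edges.items():
        deg[a] += w
        deg[b] += w
    return dict(deg)
```

With hypothetical input `{"alice": ["siteA", "siteB"], "bob": ["siteA", "siteB", "siteC"], "carol": ["siteC"]}`, siteA and siteB share two authors, and each of them shares one author with siteC.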
International Consensus Statement on Rhinology and Allergy: Rhinosinusitis
Background: The 5 years since the publication of the first International Consensus Statement on Allergy and Rhinology: Rhinosinusitis (ICAR-RS) have witnessed foundational progress in our understanding and treatment of rhinologic disease. These advances are reflected in the more than 40 new topics covered in ICAR-RS-2021, as well as in updates to the original 140 topics. This executive summary consolidates the evidence-based findings of the document. Methods: ICAR-RS presents over 180 topics in the form of evidence-based reviews with recommendations (EBRRs), evidence-based reviews, and literature reviews. The highest-grade structured recommendations of the EBRR sections are summarized in this executive summary. Results: ICAR-RS-2021 covers 22 topics on the medical management of rhinosinusitis (RS) that are grade A/B and are presented in the executive summary. Additionally, 4 topics on the surgical management of RS are grade A/B and are presented in the executive summary. Finally, a comprehensive evidence-based management algorithm is provided. Conclusion: This ICAR-RS-2021 executive summary compiles the evidence-based recommendations for medical and surgical treatment of the most common forms of RS.
StructCoder: Structure-Aware Transformer for Code Generation
There has been a recent surge of interest in automating software engineering
tasks using deep learning. This work addresses the problem of code generation
where the goal is to generate target code given source code in a different
language or a natural language description. Most of the state-of-the-art deep
learning models for code generation use training strategies that are primarily
designed for natural language. However, understanding and generating code
requires a more rigorous comprehension of the code syntax and semantics. With
this motivation, we develop an encoder-decoder Transformer model where both the
encoder and decoder are trained to recognize the syntax and data flow in the
source and target code, respectively. We not only make the encoder
structure-aware by leveraging the source code's syntax tree and data flow
graph, but we also ensure that our decoder preserves the syntax and data flow
of the target code by introducing two auxiliary tasks: AST (Abstract Syntax
Tree) path prediction and data flow prediction. To the best of our knowledge,
this is the first work to introduce a structure-aware Transformer decoder to
enhance the quality of generated code by modeling target syntax and data flow.
The proposed StructCoder model achieves state-of-the-art performance on code
translation and text-to-code generation tasks in the CodeXGLUE benchmark.
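To make the two auxiliary targets more concrete, this sketch uses Python's standard `ast` module for two tasks. The first function lists root-to-leaf paths of AST node types. The second extracts simple def-use (data-flow) edges from straight-line code. The abstract does not describe StructCoder's parser, path encoding, or data-flow graph construction. The helper names and the simplified data-flow rule (plain assignments only, no branches, loops, or augmented assignment) are assumptions for illustration.

```python
import ast


def root_to_leaf_paths(source: str) -> list[list[str]]:
    """Return every root-to-leaf path of node-type names in a Python AST."""
    paths: list[list[str]] = []

    def visit(node: ast.AST, prefix: list[str]) -> None:
        path = prefix + [type(node).__name__]
        children = list(ast.iter_child_nodes(node))
        if not children:
            paths.append(path)  # leaf reached: record the full path
        for child in children:
            visit(child, path)

    visit(ast.parse(source), [])
    return paths


def simple_data_flow(source: str) -> list[tuple[str, int, int]]:
    """(variable, use_line, def_line) edges for straight-line code only."""
    last_def: dict[str, int] = {}
    edges: list[tuple[str, int, int]] = []
    for stmt in ast.parse(source).body:
        # Reads in a statement see definitions from earlier statements.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load) and node.id in last_def:
                edges.append((node.id, node.lineno, last_def[node.id]))
        # Then record this statement's own writes.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                last_def[node.id] = node.lineno
    return edges
```

For `x = a + 1`, one of the paths is `Module > Assign > BinOp > Add`. For `a = 1; b = a + 2; a = b * a` written on three lines, the edges link each read of `a` or `b` back to the line of its most recent assignment.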
WindowSHAP: An Efficient Framework for Explaining Time-series Classifiers based on Shapley Values
Unpacking and comprehending how black-box machine learning algorithms make
decisions has been a persistent challenge for researchers and end-users.
In high-stakes clinical applications, explaining time-series predictive models
helps users understand how these models behave. However, existing approaches
for explaining such models are frequently designed for data whose features
lack a time-varying component. In this paper, we introduce
WindowSHAP, a model-agnostic framework for explaining time-series classifiers
using Shapley values. We intend for WindowSHAP to mitigate the computational
complexity of calculating Shapley values for long time-series data as well as
improve the quality of explanations. WindowSHAP is based on partitioning a
sequence into time windows. Under this framework, we present three distinct
algorithms, Stationary, Sliding, and Dynamic WindowSHAP, each evaluated
against two baseline approaches, KernelSHAP and TimeSHAP, using perturbation and
sequence analysis metrics. We applied our framework to clinical time-series
data from both a specialized clinical domain (Traumatic Brain Injury - TBI) as
well as a broad clinical domain (critical care medicine). The experimental
results demonstrate that, based on the two quantitative metrics, our framework
is superior at explaining clinical time-series classifiers, while also reducing
the complexity of computations. We show that for time-series data with 120 time
steps (hours), merging 10 adjacent time points can reduce the CPU time of
WindowSHAP by 80% compared to KernelSHAP. We also show that our Dynamic
WindowSHAP algorithm focuses more on the most important time steps and provides
more understandable explanations. As a result, WindowSHAP not only accelerates
the calculation of Shapley values for time-series data but also delivers
higher-quality, more understandable explanations.
Comment: Submitted to the Journal of Biomedical Informatics
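The window idea can be illustrated with a small exact computation, similar in spirit to the Stationary variant. Each player is a block of `window` adjacent time steps of a univariate series. A coalition keeps the original values inside its windows and replaces every other time step with a baseline. Shapley values are then obtained by enumerating all coalitions. With 120 time steps and windows of 10, there are 12 players and 2^12 = 4,096 coalitions, compared with 2^120 if every time step were its own player. This is not the authors' implementation. The univariate input, baseline-replacement masking, and exact enumeration (instead of KernelSHAP-style sampling) are simplifying assumptions.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def stationary_window_shap(
    f: Callable[[list[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
    window: int,
) -> list[float]:
    """Exact Shapley values where each player is a block of `window` adjacent time steps."""
    t = len(x)
    windows = [range(s, min(s + window, t)) for s in range(0, t, window)]
    n = len(windows)
    cache: dict[frozenset[int], float] = {}

    def v(coalition: frozenset[int]) -> float:
        # Time steps in windows outside the coalition are replaced by the baseline.
        if coalition not in cache:
            masked = list(baseline)
            for j in coalition:
                for idx in windows[j]:
                    masked[idx] = x[idx]
            cache[coalition] = f(masked)
        return cache[coalition]

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (v(s | {i}) - v(s))  # marginal contribution of window i
        phis.append(phi)
    return phis
```

For a linear model such as `sum`, each window's value equals the sum of its deviations from the baseline. So `stationary_window_shap(sum, [1, 2, 3, 4, 5, 6], [0] * 6, window=2)` gives approximately `[3.0, 7.0, 11.0]`. For any model, the values add up to f(x) - f(baseline) (the efficiency property), which makes a convenient sanity check.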