Depression and Self-Harm Risk Assessment in Online Forums
Users suffering from mental health conditions often turn to online resources
for support, including specialized online support communities or general
communities such as Twitter and Reddit. In this work, we present a neural
framework for supporting and studying users in both types of communities. We
propose methods for identifying posts in support communities that may indicate
a risk of self-harm, and demonstrate that our approach outperforms strong
previously proposed methods for identifying such posts. Self-harm is closely
related to depression, which makes identifying depressed users on general
forums a crucial related task. We introduce a large-scale general forum dataset
("RSDD") consisting of users with self-reported depression diagnoses matched
with control users. We show how our method can be applied to effectively
identify depressed users from their use of language alone. We demonstrate that
our method outperforms strong baselines on this general forum dataset.
Comment: Expanded version of EMNLP17 paper. Added sections 6.1, 6.2, 6.4,
FastText baseline, and CNN-
Risk Regulation and the Faces of Uncertainty
Dr. Walker addresses the difficulty of regulators' working with potentially inaccurate information and clarifies related aspects of decision making by presenting a taxonomy for the kinds of uncertainty inherent in necessarily incomplete data.
Promoter Sequences Prediction Using Relational Association Rule Mining
In this paper we approach, from a computational perspective, the problem of promoter sequence prediction, an important problem within the field of bioinformatics. As the conditions for a DNA sequence to function as a promoter are not known, machine-learning-based classification models are still being developed to approach the problem of promoter identification in DNA. We propose a classification model based on relational association rule mining. Relational association rules are a particular type of association rule and describe numerical orderings between attributes that commonly occur over a data set. Our classifier is based on the discovery of relational association rules for predicting whether or not a DNA sequence contains a promoter region. An experimental evaluation of the proposed model and a comparison with similar existing approaches are provided. The obtained results show that our classifier outperforms the existing techniques for identifying promoter sequences, confirming the potential of our proposal.
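As a rough illustration of the kind of rule described in this abstract, the sketch below measures how often a numerical ordering between two attributes holds across a data set. The function names, the toy records, and the use of plain support (the fraction of records that satisfy the ordering) are assumptions made for illustration; this is not the authors' mining algorithm or classifier.

```python
from itertools import permutations

def ordering_support(records, i, j):
    """Fraction of records in which attribute i is strictly less than attribute j."""
    if not records:
        return 0.0
    return sum(1 for r in records if r[i] < r[j]) / len(records)

def frequent_orderings(records, min_support):
    """Return {(i, j): support} for every ordered attribute pair whose
    'attribute i < attribute j' relation holds in at least min_support of the records."""
    n_attrs = len(records[0]) if records else 0
    rules = {}
    for i, j in permutations(range(n_attrs), 2):
        s = ordering_support(records, i, j)
        if s >= min_support:
            rules[(i, j)] = s
    return rules

# Hypothetical numeric encoding of four sequences over three attributes.
toy = [(1, 2, 3), (1, 3, 2), (2, 3, 1), (0, 5, 4)]
rules = frequent_orderings(toy, min_support=0.75)
```

A classifier in this spirit could mine such orderings separately for each class and compare how well a new record satisfies each rule set, but that step is not shown here.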
Treatment Effects on Ordinal Outcomes: Causal Estimands and Sharp Bounds
Assessing the causal effects of interventions on ordinal outcomes is an
important objective of many educational and behavioral studies. Under the
potential outcomes framework, we can define causal effects as comparisons
between the potential outcomes under treatment and control. Unfortunately,
the average causal effect, often the parameter of interest, is
difficult to interpret for ordinal outcomes. To address this challenge, we
propose to use two causal parameters, which are defined as the probabilities
that the treatment is beneficial and strictly beneficial for the experimental
units. Although well-defined for any outcome and of particular
interest for ordinal outcomes, the two aforementioned parameters depend on the
association between the potential outcomes, and are therefore not identifiable
from the observed data without additional assumptions. Echoing recent advances
in the econometrics and biostatistics literature, we present the sharp bounds
of the aforementioned causal parameters for ordinal outcomes, under fixed
marginal distributions of the potential outcomes. Because the causal estimands
and their corresponding sharp bounds are based on the potential outcomes
themselves, the proposed framework can be flexibly incorporated into any chosen
models of the potential outcomes, and is directly applicable to randomized
experiments, unconfounded observational studies, and randomized experiments
with noncompliance. We illustrate our methodology via numerical examples and
three real-life applications related to educational and behavioral research.
Comment: Accepted by the Journal of Education and Behavioral Statistic
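To make the idea of bounding the probability that the treatment is beneficial more concrete, the sketch below computes Fréchet-type bounds on P(Y(1) >= Y(0)) from the two marginal distributions alone. The abstract does not state the bound formulas, so the expressions here are an assumption derived from elementary probability inequalities and may not match the authors' notation; the function name and inputs are hypothetical.

```python
def beneficial_probability_bounds(p_treat, p_ctrl):
    """Bounds on P(Y(1) >= Y(0)) from the two marginal distributions.

    p_treat[k] = P(Y(1) = k), p_ctrl[k] = P(Y(0) = k), levels k = 0..J-1.
    """
    if len(p_treat) != len(p_ctrl):
        raise ValueError("both marginals must cover the same ordinal levels")
    n = len(p_treat)
    lower_terms, upper_terms = [], []
    for j in range(n):
        treat_ge_j = sum(p_treat[j:])      # P(Y(1) >= j)
        ctrl_ge_j = sum(p_ctrl[j:])        # P(Y(0) >= j)
        ctrl_le_j = sum(p_ctrl[: j + 1])   # P(Y(0) <= j)
        # The event {Y(1) >= j and Y(0) <= j} implies Y(1) >= Y(0).
        lower_terms.append(treat_ge_j + ctrl_le_j - 1.0)
        # The event {Y(1) < j and Y(0) >= j} implies Y(1) < Y(0).
        upper_terms.append(1.0 + treat_ge_j - ctrl_ge_j)
    return max(lower_terms), min(upper_terms)

# Hypothetical marginals over three ordinal levels.
lower, upper = beneficial_probability_bounds([0.2, 0.3, 0.5], [0.5, 0.3, 0.2])
```

The width of the interval reflects the fact noted in the abstract: the parameter depends on the association between the potential outcomes, which the marginals do not determine.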
Using ordinal logistic regression to evaluate the performance of laser-Doppler predictions of burn-healing time
Background
Laser-Doppler imaging (LDI) of cutaneous blood flow is beginning to be used by burn surgeons to predict the healing time of burn wounds; predicted healing time is used to determine wound treatment as either dressings or surgery. In this paper, we present a statistical analysis of the performance of the technique.
Methods
We used data from a study carried out by five burn centers: LDI was done once between days 2 and 5 post burn, and healing was assessed at both 14 days and 21 days post burn. Random-effects ordinal logistic regression and other models such as the continuation ratio model were used to model healing time as a function of the LDI data, and of demographic and wound history variables. Statistical methods were also used to study the false-color palette, which enables the laser-Doppler imager to be used by clinicians as a decision-support tool.
Results
Overall, over 90% of diagnoses are correct. Related questions addressed were which blood flow summary statistic was best and whether, given the blood flow measurements, demographic and observational variables (age, sex, race, % total body surface area burned (%TBSA), site and cause of burn, day of LDI scan, burn center) had any additional predictive power. It was found that mean laser-Doppler flux over a wound area was the best statistic, and that, given the same mean flux, women recover slightly more slowly than men. Further, the likely degradation in predictive performance on moving to a patient group with larger %TBSA than those in the data sample was studied, and shown to be small.
Conclusion
Modeling healing time is a complex statistical problem, with random effects due to multiple burn areas per individual, and censoring caused by patients missing hospital visits and undergoing surgery. This analysis applies state-of-the-art statistical methods such as the bootstrap and permutation tests to a medical problem of topical interest. New medical findings are that age and %TBSA are not important predictors of healing time when the LDI results are known, whereas gender does influence recovery time, even when blood flow is controlled for.
The conclusion regarding the palette is that an optimum three-color palette can be chosen 'automatically', but the optimum choice of a 5-color palette cannot be made solely by optimizing the percentage of correct diagnoses.
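The ordinal logistic regression named in the Methods can be sketched, in its simplest fixed-effects form, as a proportional-odds model that maps a linear predictor to probabilities over ordered healing categories. The function name, cutpoints, and predictor values below are hypothetical, and the random effects and censoring discussed in the abstract are not modeled.

```python
import math

def cumulative_logit_probs(cutpoints, linear_predictor):
    """Category probabilities under a proportional-odds model:
    P(Y <= j) = 1 / (1 + exp(-(cutpoints[j] - linear_predictor)))."""
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    cumulative = [1.0 / (1.0 + math.exp(-(c - linear_predictor))) for c in cutpoints]
    cumulative.append(1.0)  # P(Y <= last category) = 1
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs

# Hypothetical example with three ordered categories, e.g. healed by day 14,
# healed by day 21, not healed by day 21; the numbers are illustrative only.
baseline = cumulative_logit_probs([-1.0, 1.0], linear_predictor=0.0)
shifted = cumulative_logit_probs([-1.0, 1.0], linear_predictor=2.0)
```

With this sign convention, a larger linear predictor moves probability mass toward the later categories.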
netgwas: An R Package for Network-Based Genome-Wide Association Studies
Graphical models are powerful tools for modeling and making statistical
inferences regarding complex associations among variables in multivariate data.
In this paper we introduce the R package netgwas, which is designed based on
undirected graphical models to accomplish three important and interrelated
goals in genetics: constructing linkage maps, reconstructing linkage
disequilibrium (LD) networks from multi-loci genotype data, and detecting
high-dimensional genotype-phenotype networks. The netgwas package deals with
species with any chromosome copy number in a unified way, unlike other
software. It implements recent improvements in both linkage map construction
(Behrouzi and Wit, 2018) and the reconstruction of conditional independence networks
for non-Gaussian continuous data, discrete data, and mixed
discrete-and-continuous data (Behrouzi and Wit, 2017). Such datasets routinely
occur in genetics and genomics, for example as genotype data and genotype-phenotype
data. We demonstrate the value of our package functionality by applying it to
various multivariate example datasets taken from the literature. We show, in
particular, that our package allows a more realistic analysis of data, as it
adjusts for the effect of all other variables while performing pairwise
associations. This feature controls for spurious associations between variables
that can arise from the classical multiple testing approach. This paper includes a
brief overview of the statistical methods which have been implemented in the
package. The main body of the paper explains how to use the package. The
package uses a parallelization strategy on multi-core processors to speed up
computations for large datasets. In addition, it contains several functions for
simulation and visualization. The netgwas package is freely available at
https://cran.r-project.org/web/packages/netgwas
Comment: 32 pages, 9 figures; due to the limitation "The abstract field cannot
be longer than 1,920 characters", the abstract appearing here is slightly
shorter than that in the PDF fil
From a Domain Analysis to the Specification and Detection of Code and Design Smells
Code and design smells are recurring design problems in software systems that must be identified to avoid their possible negative consequences
on development and maintenance. Consequently, several smell detection
approaches and tools have been proposed in the literature. However,
so far, they allow the detection of predefined smells, but the detection
of new smells or smells adapted to the context of the analysed systems
is possible only by implementing new detection algorithms manually.
Moreover, previous approaches do not explain the transition from
specifications of smells to their detection. Finally, the validation
of the existing approaches and tools has been limited to a few proprietary
systems and to a reduced number of smells. In this paper, we introduce
an approach to automate the generation of detection algorithms from
specifications written using a domain-specific language. This language
is defined from a thorough domain analysis. It allows the specification
of smells using high-level domain-related abstractions. It allows
the adaptation of the specifications of smells to the context of
the analysed systems. We specify 10 smells, automatically generate
their detection algorithms using templates, and validate the algorithms
in terms of precision and recall on Xerces v2.7.0 and GanttProject
v1.10.2, two open-source object-oriented systems. We also compare
the detection results with those of a previous approach, iPlasma.
A Regression Discontinuity Design for Ordinal Running Variables: Evaluating Central Bank Purchases of Corporate Bonds
Regression discontinuity (RD) is a widely used quasi-experimental design for
causal inference. In the standard RD, the assignment to treatment is determined
by a continuous pretreatment variable (i.e., running variable) falling above or
below a pre-fixed threshold. In the case of the corporate sector purchase
programme (CSPP) of the European Central Bank, which involves large-scale
purchases of securities issued by corporations in the euro area, such a
threshold can be defined in terms of an ordinal running variable. This feature
poses challenges to RD estimation due to the lack of a meaningful measure of
distance. To evaluate such a program, this paper proposes an RD approach for
ordinal running variables under the local randomization framework. The proposal
first estimates an ordered probit model for the ordinal running variable. The
estimated probability of being assigned to treatment is then adopted as a
latent continuous running variable and used to identify a covariate-balanced
subsample around the threshold. Assuming local unconfoundedness of the
treatment in the subsample, an estimate of the effect of the program is
obtained by employing a weighted estimator of the average treatment effect. Two
weighting estimators---overlap weights and ATT weights---as well as their
augmented versions are considered. We apply the method to evaluate the causal
effect of the CSPP and find a statistically significant and negative effect on
corporate bond spreads at issuance.
Comment: Also available as Temi di discussione (Economic working papers) 1213,
Bank of Italy, Economic Research and International Relations Are
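The two weighting schemes named in this abstract, overlap weights and ATT weights, can be sketched as a difference of weighted outcome means. The sketch assumes the standard definitions of these weights and takes the estimated treatment probabilities as given inputs, whereas the abstract obtains them from an ordered probit model. The function name and data are hypothetical, and the augmented estimators are not shown.

```python
def weighted_effect(outcomes, treated, propensity, scheme="overlap"):
    """Difference of weighted outcome means between treated and control units.

    scheme="overlap": treated weight 1 - e, control weight e.
    scheme="att":     treated weight 1,     control weight e / (1 - e).
    Here e is the estimated probability of treatment for each unit.
    """
    num = {True: 0.0, False: 0.0}
    den = {True: 0.0, False: 0.0}
    for y, z, e in zip(outcomes, treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        if scheme == "overlap":
            w = 1.0 - e if z else e
        elif scheme == "att":
            w = 1.0 if z else e / (1.0 - e)
        else:
            raise ValueError("scheme must be 'overlap' or 'att'")
        num[bool(z)] += w * y
        den[bool(z)] += w
    if den[True] == 0.0 or den[False] == 0.0:
        raise ValueError("need at least one treated and one control unit")
    return num[True] / den[True] - num[False] / den[False]

# Hypothetical subsample near the threshold: outcome, treatment flag, propensity.
y = [3.0, 1.0, 4.0, 2.0]
z = [1, 0, 1, 0]
e = [0.5, 0.5, 0.5, 0.5]
effect_overlap = weighted_effect(y, z, e, scheme="overlap")
effect_att = weighted_effect(y, z, e, scheme="att")
```

Overlap weights emphasize units whose treatment probability is close to one half, while ATT weights reweight the controls to resemble the treated group.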