28,233 research outputs found

    Depression and Self-Harm Risk Assessment in Online Forums

    Full text link
    Users suffering from mental health conditions often turn to online resources for support, including specialized online support communities or general communities such as Twitter and Reddit. In this work, we present a neural framework for supporting and studying users in both types of communities. We propose methods for identifying posts in support communities that may indicate a risk of self-harm, and demonstrate that our approach outperforms strong previously proposed methods for identifying such posts. Self-harm is closely related to depression, which makes identifying depressed users on general forums a crucial related task. We introduce a large-scale general forum dataset ("RSDD") consisting of users with self-reported depression diagnoses matched with control users. We show how our method can be applied to effectively identify depressed users from their use of language alone. We demonstrate that our method outperforms strong baselines on this general forum dataset.Comment: Expanded version of EMNLP17 paper. Added sections 6.1, 6.2, 6.4, FastText baseline, and CNN-

    Risk Regulation and the Faces of Uncertainty

    Get PDF
    Dr. Walker addresses the difficulty of regulators\u27 working with potentially inaccurate information and clarifies related aspects of decision making by presenting a taxonomy for the kinds of uncertainty inherent in necessarily incomplete data

    Promoter Sequences Prediction Using Relational Association Rule Mining

    Get PDF
    In this paper we are approaching, from a computational perspective, the problem of promoter sequences prediction, an important problem within the field of bioinformatics. As the conditions for a DNA sequence to function as a promoter are not known, machine learning based classification models are still developed to approach the problem of promoter identification in the DNA. We are proposing a classification model based on relational association rules mining. Relational association rules are a particular type of association rules and describe numerical orderings between attributes that commonly occur over a data set. Our classifier is based on the discovery of relational association rules for predicting if a DNA sequence contains or not a promoter region. An experimental evaluation of the proposed model and comparison with similar existing approaches is provided. The obtained results show that our classifier overperforms the existing techniques for identifying promoter sequences, confirming the potential of our proposal

    Treatment Effects on Ordinal Outcomes: Causal Estimands and Sharp Bounds

    Full text link
    Assessing the causal effects of interventions on ordinal outcomes is an important objective of many educational and behavioral studies. Under the potential outcomes framework, we can define causal effects as comparisons between the potential outcomes under treatment and control. However, unfortunately, the average causal effect, often the parameter of interest, is difficult to interpret for ordinal outcomes. To address this challenge, we propose to use two causal parameters, which are defined as the probabilities that the treatment is beneficial and strictly beneficial for the experimental units. However, although well-defined for any outcomes and of particular interest for ordinal outcomes, the two aforementioned parameters depend on the association between the potential outcomes, and are therefore not identifiable from the observed data without additional assumptions. Echoing recent advances in the econometrics and biostatistics literature, we present the sharp bounds of the aforementioned causal parameters for ordinal outcomes, under fixed marginal distributions of the potential outcomes. Because the causal estimands and their corresponding sharp bounds are based on the potential outcomes themselves, the proposed framework can be flexibly incorporated into any chosen models of the potential outcomes, and are directly applicable to randomized experiments, unconfounded observational studies, and randomized experiments with noncompliance. We illustrate our methodology via numerical examples and three real-life applications related to educational and behavioral research.Comment: Accepted by the Journal of Education and Behavioral Statistic

    Using ordinal logistic regression to evaluate the performance of laser-Doppler predictions of burn-healing time

    Get PDF
    Background Laser-Doppler imaging (LDI) of cutaneous blood flow is beginning to be used by burn surgeons to predict the healing time of burn wounds; predicted healing time is used to determine wound treatment as either dressings or surgery. In this paper, we do a statistical analysis of the performance of the technique. Methods We used data from a study carried out by five burn centers: LDI was done once between days 2 to 5 post burn, and healing was assessed at both 14 days and 21 days post burn. Random-effects ordinal logistic regression and other models such as the continuation ratio model were used to model healing-time as a function of the LDI data, and of demographic and wound history variables. Statistical methods were also used to study the false-color palette, which enables the laser-Doppler imager to be used by clinicians as a decision-support tool. Results Overall performance is that diagnoses are over 90% correct. Related questions addressed were what was the best blood flow summary statistic and whether, given the blood flow measurements, demographic and observational variables had any additional predictive power (age, sex, race, % total body surface area burned (%TBSA), site and cause of burn, day of LDI scan, burn center). It was found that mean laser-Doppler flux over a wound area was the best statistic, and that, given the same mean flux, women recover slightly more slowly than men. Further, the likely degradation in predictive performance on moving to a patient group with larger %TBSA than those in the data sample was studied, and shown to be small. Conclusion Modeling healing time is a complex statistical problem, with random effects due to multiple burn areas per individual, and censoring caused by patients missing hospital visits and undergoing surgery. This analysis applies state-of-the art statistical methods such as the bootstrap and permutation tests to a medical problem of topical interest. New medical findings are that age and %TBSA are not important predictors of healing time when the LDI results are known, whereas gender does influence recovery time, even when blood flow is controlled for. The conclusion regarding the palette is that an optimum three-color palette can be chosen 'automatically', but the optimum choice of a 5-color palette cannot be made solely by optimizing the percentage of correct diagnoses

    netgwas: An R Package for Network-Based Genome-Wide Association Studies

    Full text link
    Graphical models are powerful tools for modeling and making statistical inferences regarding complex associations among variables in multivariate data. In this paper we introduce the R package netgwas, which is designed based on undirected graphical models to accomplish three important and interrelated goals in genetics: constructing linkage map, reconstructing linkage disequilibrium (LD) networks from multi-loci genotype data, and detecting high-dimensional genotype-phenotype networks. The netgwas package deals with species with any chromosome copy number in a unified way, unlike other software. It implements recent improvements in both linkage map construction (Behrouzi and Wit, 2018), and reconstructing conditional independence network for non-Gaussian continuous data, discrete data, and mixed discrete-and-continuous data (Behrouzi and Wit, 2017). Such datasets routinely occur in genetics and genomics such as genotype data, and genotype-phenotype data. We demonstrate the value of our package functionality by applying it to various multivariate example datasets taken from the literature. We show, in particular, that our package allows a more realistic analysis of data, as it adjusts for the effect of all other variables while performing pairwise associations. This feature controls for spurious associations between variables that can arise from classical multiple testing approach. This paper includes a brief overview of the statistical methods which have been implemented in the package. The main body of the paper explains how to use the package. The package uses a parallelization strategy on multi-core processors to speed-up computations for large datasets. In addition, it contains several functions for simulation and visualization. The netgwas package is freely available at https://cran.r-project.org/web/packages/netgwasComment: 32 pages, 9 figures; due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract appearing here is slightly shorter than that in the PDF fil

    From a Domain Analysis to the Specification and Detection of Code and Design Smells

    Get PDF
    Code and design smells are recurring design problems in software systems that must be identified to avoid their possible negative consequences\ud on development and maintenance. Consequently, several smell detection\ud approaches and tools have been proposed in the literature. However,\ud so far, they allow the detection of predefined smells but the detection\ud of new smells or smells adapted to the context of the analysed systems\ud is possible only by implementing new detection algorithms manually.\ud Moreover, previous approaches do not explain the transition from\ud specifications of smells to their detection. Finally, the validation\ud of the existing approaches and tools has been limited on few proprietary\ud systems and on a reduced number of smells. In this paper, we introduce\ud an approach to automate the generation of detection algorithms from\ud specifications written using a domain-specific language. This language\ud is defined from a thorough domain analysis. It allows the specification\ud of smells using high-level domain-related abstractions. It allows\ud the adaptation of the specifications of smells to the context of\ud the analysed systems.We specify 10 smells, generate automatically\ud their detection algorithms using templates, and validate the algorithms\ud in terms of precision and recall on Xerces v2.7.0 and GanttProject\ud v1.10.2, two open-source object-oriented systems.We also compare\ud the detection results with those of a previous approach, iPlasma

    A Regression Discontinuity Design for Ordinal Running Variables: Evaluating Central Bank Purchases of Corporate Bonds

    Full text link
    Regression discontinuity (RD) is a widely used quasi-experimental design for causal inference. In the standard RD, the assignment to treatment is determined by a continuous pretreatment variable (i.e., running variable) falling above or below a pre-fixed threshold. In the case of the corporate sector purchase programme (CSPP) of the European Central Bank, which involves large-scale purchases of securities issued by corporations in the euro area, such a threshold can be defined in terms of an ordinal running variable. This feature poses challenges to RD estimation due to the lack of a meaningful measure of distance. To evaluate such program, this paper proposes an RD approach for ordinal running variables under the local randomization framework. The proposal first estimates an ordered probit model for the ordinal running variable. The estimated probability of being assigned to treatment is then adopted as a latent continuous running variable and used to identify a covariate-balanced subsample around the threshold. Assuming local unconfoundedness of the treatment in the subsample, an estimate of the effect of the program is obtained by employing a weighted estimator of the average treatment effect. Two weighting estimators---overlap weights and ATT weights---as well as their augmented versions are considered. We apply the method to evaluate the causal effect of the CSPP and find a statistically significant and negative effect on corporate bond spreads at issuance.Comment: Also available as Temi di discussione (Economic working papers) 1213, Bank of Italy, Economic Research and International Relations Are
    • …
    corecore