Estimation of Stochastic Attribute-Value Grammars using an Informative Sample
We argue that some of the computational complexity associated with estimation
of stochastic attribute-value grammars can be reduced by training upon an
informative subset of the full training set. Results using the parsed Wall
Street Journal corpus show that in some circumstances, it is possible to obtain
better estimation results using an informative sample than when training upon
all the available material. Further experimentation demonstrates that with
unlexicalised models, a Gaussian prior can reduce overfitting. However, when
models are lexicalised and contain overlapping features, overfitting does not
seem to be a problem, and a Gaussian prior makes minimal difference to
performance. Our approach is applicable in situations where there is an
infeasibly large number of parses in the training set, or where recovery of
these parses from a packed representation is itself computationally expensive.
Comment: 6 pages, 2 figures. Coling 2000, Saarbrücken, Germany. pp 586--59
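
To make the regularisation claim concrete (a standard formulation; the
notation below is an assumption, not quoted from the paper): a zero-mean
Gaussian prior with variance \sigma^2 on the feature weights \lambda_j of a
log-linear model corresponds to maximising the penalised log-likelihood

    \mathcal{L}(\lambda) = \sum_i \log p_\lambda(y_i \mid x_i)
                           - \sum_j \frac{\lambda_j^2}{2\sigma^2}

where the quadratic penalty shrinks large weights towards zero; the variance
\sigma^2 controls how aggressively the model is regularised.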
Depression and Self-Harm Risk Assessment in Online Forums
Users suffering from mental health conditions often turn to online resources
for support, including specialized online support communities or general
communities such as Twitter and Reddit. In this work, we present a neural
framework for supporting and studying users in both types of communities. We
propose methods for identifying posts in support communities that may indicate
a risk of self-harm, and demonstrate that our approach outperforms strong
previously proposed methods for identifying such posts. Self-harm is closely
related to depression, which makes identifying depressed users on general
forums a crucial related task. We introduce a large-scale general forum dataset
("RSDD") consisting of users with self-reported depression diagnoses matched
with control users. We show how our method can be applied to effectively
identify depressed users from their use of language alone. We demonstrate that
our method outperforms strong baselines on this general forum dataset.
Comment: Expanded version of EMNLP17 paper. Added sections 6.1, 6.2, 6.4,
FastText baseline, and CNN-
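
As a rough illustration of the task setup (not the paper's neural framework;
the pipeline choice and the toy posts below are invented for the sketch), a
simple bag-of-words baseline for flagging at-risk posts from language alone
might look like:

    # Minimal sketch of a linear bag-of-words baseline for flagging posts
    # that may warrant follow-up. This is NOT the paper's neural framework;
    # the pipeline and the toy posts below are invented for illustration.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical labelled training posts: 1 = flagged, 0 = not flagged.
    posts = [
        "I can't see a way forward anymore",
        "Had a great day hiking with friends",
        "Nothing helps and I keep hurting myself",
        "Looking for book recommendations",
    ]
    labels = [1, 0, 1, 0]

    # TF-IDF features over unigrams and bigrams, then a linear classifier.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression())
    model.fit(posts, labels)

    # Rank unseen posts by predicted risk so moderators can triage them.
    new_posts = ["I feel so alone and hopeless", "Any good pizza recipes?"]
    print(model.predict_proba(new_posts)[:, 1])

The paper's baselines (e.g. FastText) play a similar role; the neural
framework shares the same input/output contract of post text in, risk
score out.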