153 research outputs found
Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences
Given the lack of word delimiters in written Japanese, word segmentation is
generally considered a crucial first step in processing Japanese texts. Typical
Japanese segmentation algorithms rely either on a lexicon and syntactic
analysis or on pre-segmented data; but these are labor-intensive, and the
lexico-syntactic techniques are vulnerable to the unknown word problem. In
contrast, we introduce a novel, more robust statistical method utilizing
unsegmented training data. Despite its simplicity, the algorithm yields
performance on long kanji sequences comparable to and sometimes surpassing that
of state-of-the-art morphological analyzers over a variety of error metrics.
The algorithm also outperforms another mostly-unsupervised statistical
algorithm previously proposed for Chinese.
Additionally, we present a two-level annotation scheme for Japanese to
incorporate multiple segmentation granularities, and introduce two novel
evaluation metrics, both based on the notion of a compatible bracket, that can
account for multiple granularities simultaneously. Comment: 22 pages. To appear in Natural Language Engineering
Iterative Residual Rescaling: An Analysis and Generalization of LSI
We consider the problem of creating document representations in which
inter-document similarity measurements correspond to semantic similarity. We
first present a novel subspace-based framework for formalizing this task. Using
this framework, we derive a new analysis of Latent Semantic Indexing (LSI),
showing a precise relationship between its performance and the uniformity of
the underlying distribution of documents over topics. This analysis helps
explain the improvements gained by Ando's (2000) Iterative Residual Rescaling
(IRR) algorithm: IRR can compensate for distributional non-uniformity. A
further benefit of our framework is that it provides a well-motivated,
effective method for automatically determining the rescaling factor IRR depends
on, leading to further improvements. A series of experiments over various
settings and with several evaluation metrics validates our claims. Comment: To appear in the proceedings of SIGIR 2001. 11 pages
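As a rough illustration of the baseline this abstract analyzes, the following is a minimal sketch of plain LSI (a truncated SVD of a term-by-document matrix), not of IRR or of the authors' subspace framework. The function names, the toy matrix, and the choice k=2 are assumptions made for illustration only.

```python
import numpy as np


def lsi_document_vectors(term_doc, k):
    """Return one k-dimensional LSI vector per document (rows of the result).

    term_doc is a terms-by-documents matrix; k is the number of latent
    dimensions to keep. Both names are illustrative, not from the paper.
    """
    # Thin SVD: term_doc = u @ diag(s) @ vt, singular values in descending order.
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the top-k singular directions; column j of diag(s_k) @ vt_k
    # is the latent representation of document j.
    return (s[:k, None] * vt[:k, :]).T


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy counts (made up): documents 0 and 1 share terms 0-1,
# documents 2 and 3 share terms 2-3.
term_doc = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])

docs = lsi_document_vectors(term_doc, k=2)
same_topic = cosine(docs[0], docs[1])
cross_topic = cosine(docs[0], docs[2])
```

In this toy case the two documents that share vocabulary end up closer in the latent space than in the raw term space, while documents with disjoint vocabulary stay orthogonal.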
Trauma related rumination mediates the effect of naturally occurring depressive symptoms, but not momentary low mood on trauma intrusions
Comorbid depression is known to contribute to the maintenance of posttraumatic stress disorder (PTSD), including distressing intrusive trauma memories. It is theorised that depression is a risk factor for persistent PTSD because it prevents optimal habituation of the distress provoked by trauma memories and reminders, but the underlying cognitive mechanisms are uncertain. The present study investigated trauma‐related rumination as a possible mediator of the effect of depression on trauma intrusions. Participants received a low mood induction or a control procedure. After they viewed an analogue trauma film, the frequency of film‐related intrusions and associated distress levels were measured, and again at 1‐week follow‐up. Between the two occasions, participants rated their levels of rumination about the film. Existing depression symptoms, but not induced momentary sad mood, predicted the frequency of film intrusions and associated distress at 1‐week follow‐up. Some evidence was found that ruminative trauma processing mediated the relationship between baseline depressive symptoms and later intrusion frequency and associated distress. Future research is warranted to better understand the role of rumination in the depression-intrusion relationship, which may shed light on the clinical applicability of rumination‐targeted intervention for PTSD and comorbid depression.
TITAN: A Spatiotemporal Feature Learning Framework for Traffic Incident Duration Prediction
Identifying critical incident stages and reasonably predicting traffic
incident duration are essential in traffic incident management. In this paper,
we propose a traffic incident duration prediction model that simultaneously
predicts the impact of the traffic incidents and identifies the critical groups
of temporal features via a multi-task learning framework. First, we formulate a
sparsity optimization problem that extracts low-level temporal features based
on traffic speed readings and then generalizes higher level features as phases
of traffic incidents. Second, we propose novel constraints on feature
similarity exploiting prior knowledge about the spatial connectivity of the
road network to predict the incident duration. The proposed problem is
challenging to solve due to the orthogonality constraints, non-convex
objective, and non-smooth penalties. We develop an algorithm based on the
alternating direction method of multipliers (ADMM) framework to solve the
proposed formulation. Extensive experiments and comparisons to other models on
real-world traffic data and traffic incident records justify the efficacy of
our model.