153 research outputs found

    Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences

    Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and syntactic analysis or on pre-segmented data; but these are labor-intensive, and the lexico-syntactic techniques are vulnerable to the unknown word problem. In contrast, we introduce a novel, more robust statistical method utilizing unsegmented training data. Despite its simplicity, the algorithm yields performance on long kanji sequences comparable to and sometimes surpassing that of state-of-the-art morphological analyzers over a variety of error metrics. The algorithm also outperforms another mostly-unsupervised statistical algorithm previously proposed for Chinese. Additionally, we present a two-level annotation scheme for Japanese to incorporate multiple segmentation granularities, and introduce two novel evaluation metrics, both based on the notion of a compatible bracket, that can account for multiple granularities simultaneously. Comment: 22 pages. To appear in Natural Language Engineering.

    Iterative Residual Rescaling: An Analysis and Generalization of LSI

    We consider the problem of creating document representations in which inter-document similarity measurements correspond to semantic similarity. We first present a novel subspace-based framework for formalizing this task. Using this framework, we derive a new analysis of Latent Semantic Indexing (LSI), showing a precise relationship between its performance and the uniformity of the underlying distribution of documents over topics. This analysis helps explain the improvements gained by Ando's (2000) Iterative Residual Rescaling (IRR) algorithm: IRR can compensate for distributional non-uniformity. A further benefit of our framework is that it provides a well-motivated, effective method for automatically determining the rescaling factor IRR depends on, leading to further improvements. A series of experiments over various settings and with several evaluation metrics validates our claims. Comment: To appear in the proceedings of SIGIR 2001. 11 pages.

    Trauma related rumination mediates the effect of naturally occurring depressive symptoms, but not momentary low mood on trauma intrusions

    Comorbid depression is known to contribute to the maintenance of posttraumatic stress disorder (PTSD), including distressing intrusive trauma memories. It is theorised that depression is a risk factor for persistent PTSD because it prevents optimal habituation of the distress provoked by trauma memories and reminders, but the underlying cognitive mechanisms are uncertain. The present study investigated trauma‐related rumination as a possible mediator of the effect of depression on trauma intrusions. Participants received a low mood induction or a control procedure. After participants viewed an analogue trauma film, the frequency of film‐related intrusions and associated distress levels were measured at that point and again at 1‐week follow‐up. Between the two occasions, participants rated their levels of rumination about the film. Existing depression symptoms, but not induced momentary sad mood, predicted frequency of film intrusions and associated distress at 1‐week follow‐up. Some evidence was found that ruminative trauma processing mediated the relationship between baseline depressive symptoms and later intrusion frequency and associated distress. Future research is warranted to better understand the role of rumination in the depression-intrusion relationship, which may shed light on the clinical applicability of rumination‐targeted intervention for PTSD and comorbid depression.

    TITAN: A Spatiotemporal Feature Learning Framework for Traffic Incident Duration Prediction

    Identifying critical incident stages and reasonably predicting traffic incident duration are essential in traffic incident management. In this paper, we propose a traffic incident duration prediction model that simultaneously predicts the impact of traffic incidents and identifies the critical groups of temporal features via a multi-task learning framework. First, we formulate a sparsity optimization problem that extracts low-level temporal features from traffic speed readings and then generalizes them into higher-level features representing phases of traffic incidents. Second, we propose novel constraints on feature similarity that exploit prior knowledge about the spatial connectivity of the road network to predict the incident duration. The proposed problem is challenging to solve due to its orthogonality constraints, non-convex objective, and non-smooth penalties. We develop an algorithm based on the alternating direction method of multipliers (ADMM) framework to solve the proposed formulation. Extensive experiments and comparisons to other models on real-world traffic data and traffic incident records justify the efficacy of our model.