
    Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences

    Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and syntactic analysis or on pre-segmented data; but these are labor-intensive, and the lexico-syntactic techniques are vulnerable to the unknown word problem. In contrast, we introduce a novel, more robust statistical method utilizing unsegmented training data. Despite its simplicity, the algorithm yields performance on long kanji sequences comparable to and sometimes surpassing that of state-of-the-art morphological analyzers over a variety of error metrics. The algorithm also outperforms another mostly-unsupervised statistical algorithm previously proposed for Chinese. Additionally, we present a two-level annotation scheme for Japanese to incorporate multiple segmentation granularities, and introduce two novel evaluation metrics, both based on the notion of a compatible bracket, that can account for multiple granularities simultaneously. Comment: 22 pages. To appear in Natural Language Engineering.

    Iterative Residual Rescaling: An Analysis and Generalization of LSI

    We consider the problem of creating document representations in which inter-document similarity measurements correspond to semantic similarity. We first present a novel subspace-based framework for formalizing this task. Using this framework, we derive a new analysis of Latent Semantic Indexing (LSI), showing a precise relationship between its performance and the uniformity of the underlying distribution of documents over topics. This analysis helps explain the improvements gained by Ando's (2000) Iterative Residual Rescaling (IRR) algorithm: IRR can compensate for distributional non-uniformity. A further benefit of our framework is that it provides a well-motivated, effective method for automatically determining the rescaling factor IRR depends on, leading to further improvements. A series of experiments over various settings and with several evaluation metrics validates our claims. Comment: To appear in the proceedings of SIGIR 2001. 11 pages.

    Unsupervised Statistical Segmentation of Japanese Kanji Strings

    Word segmentation is an important issue in Japanese language processing because Japanese is written without space delimiters between words. We propose a simple dictionary-less method to segment Japanese kanji sequences into words based solely on character n-gram counts from an unannotated corpus. The performance was often better than that of rule-based morphological analyzers over a variety of both standard and novel error metrics.
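The idea of segmenting from character n-gram counts alone can be illustrated with a small sketch. This is a simplified illustration, not the authors' algorithm: the function names (`ngram_counts`, `boundary_scores`, `segment`), the single n-gram order, the fixed threshold, and the toy corpus of Latin letters standing in for kanji are all assumptions made for the example. The heuristic is that a gap is a likely word boundary when the n-grams lying entirely on one side of it are more frequent in the unsegmented corpus than the n-grams that straddle it.

```python
from collections import Counter


def ngram_counts(corpus, n):
    """Count every character n-gram in a list of unsegmented strings."""
    counts = Counter()
    for text in corpus:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def boundary_scores(seq, counts, n):
    """Score each gap k (between seq[k-1] and seq[k]) in [0, 1].

    The score is the fraction of (adjacent, straddling) n-gram pairs in
    which the n-gram adjacent to the gap is strictly more frequent than
    the n-gram that crosses it.
    """
    scores = {}
    for k in range(1, len(seq)):
        adjacent = []
        if k - n >= 0:
            adjacent.append(seq[k - n:k])      # n-gram ending at the gap
        if k + n <= len(seq):
            adjacent.append(seq[k:k + n])      # n-gram starting at the gap
        straddling = [
            seq[j:j + n]
            for j in range(k - n + 1, k)
            if j >= 0 and j + n <= len(seq)
        ]
        pairs = [(a, s) for a in adjacent for s in straddling]
        wins = sum(counts[a] > counts[s] for a, s in pairs)
        scores[k] = wins / len(pairs) if pairs else 0.0
    return scores


def segment(seq, counts, n=2, threshold=0.5):
    """Cut seq at every gap whose score exceeds the (assumed) threshold."""
    scores = boundary_scores(seq, counts, n)
    words, start = [], 0
    for k in range(1, len(seq)):
        if scores[k] > threshold:
            words.append(seq[start:k])
            start = k
    words.append(seq[start:])
    return words


# Toy unannotated corpus: the hidden "words" are ab, cd and xy.
corpus = ["abcd", "abxy", "cdab", "xycd"]
counts = ngram_counts(corpus, 2)
print(segment("abcd", counts))
```

A fuller treatment would combine evidence from several n-gram orders and tune the threshold; the sketch keeps one order to show how the count comparison works.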

    Ultrastructural and Immunohistochemical Studies on Uptake and Distribution of FITC-Conjugated PLGA Nanoparticles Administered Intratracheally in Rats

    Polylactide-glycolide (PLGA) nanoparticles have been developed as pulmonary drug delivery carriers. To investigate their behavior, small- (d50 = 74 nm) and large-sized (d50 = 250 nm) FITC-conjugated PLGA nanoparticles were intratracheally administered to rats and were traced for 5, 30 and 60 minutes and 24 hours after administration (HAT). Immunohistochemically, a FITC-positive reaction was observed in type-I alveolar epithelial cells (type-I AEC), endothelial cells and alveolar macrophages in the lungs from 5 minutes after treatment (MAT) to 24 HAT in both nanoparticle groups. In the kidneys, a positive reaction was observed in proximal tubular epithelial cells at 30 MAT; the reaction peaked at 60 MAT and was reduced at 24 HAT, while no positive reaction was seen in other sites. Ultrastructurally, the number of membrane-bound vesicles, which were approximately 70 nm in size and hard to distinguish from pinocytic vesicles, apparently increased in type-I AEC and endothelial cells at 5 MAT in the small-sized group, in comparison with the control group receiving physiological saline. The number of vesicles in the large-sized group was almost the same as that in the control group. On the other hand, in both nanoparticle groups, lysosomes filled with nanoparticles appeared in alveolar macrophages from 30 MAT to 24 HAT. These results indicate that PLGA nanoparticles might be quickly transferred from the alveolar space to the blood vessel via type-I alveolar epithelial cells and excreted into urine, and that there is a threshold for particle size, less than approximately 70 nm in diameter, with regard to absorption through the alveolar wall.

    TITAN: A Spatiotemporal Feature Learning Framework for Traffic Incident Duration Prediction

    Identifying critical incident stages and reasonably predicting traffic incident duration are essential in traffic incident management. In this paper, we propose a traffic incident duration prediction model that simultaneously predicts the impact of the traffic incidents and identifies the critical groups of temporal features via a multi-task learning framework. First, we formulate a sparsity optimization problem that extracts low-level temporal features based on traffic speed readings and then generalizes higher level features as phases of traffic incidents. Second, we propose novel constraints on feature similarity exploiting prior knowledge about the spatial connectivity of the road network to predict the incident duration. The proposed problem is challenging to solve due to the orthogonality constraints, non-convex objective, and non-smooth penalties. We develop an algorithm based on the alternating direction method of multipliers (ADMM) framework to solve the proposed formulation. Extensive experiments and comparisons to other models on real-world traffic data and traffic incident records justify the efficacy of our model.

    Distribution and Sequence of Pyknotic Cells in Rat Fetuses Exposed to Busulfan

    Busulfan, an antineoplastic bifunctional-alkylating agent, is known to induce developmental anomalies. In the present study, we examined the distribution and sequence of pyknotic cells in rat fetal tissues exposed to busulfan. Pregnant rats on gestation day 13 were intraperitoneally administered 30 mg/kg of busulfan, and fetal tissues were examined at 6, 12, 24, 36, 48, 72 and 96 hours after treatment (HAT). Pyknosis of component cells was observed markedly in the brain, moderately in the eyes and spinal cord and mildly in the craniofacial tissue, mandible, limb buds, tail bud, ganglia, alimentary tract, lungs, kidneys, pancreas and liver. In the brain, mitotic inhibition was also detected. Most of the pyknotic cells were considered to be apoptotic cells judging from the results of TUNEL staining and electron microscopic examination. Commonly in the above-mentioned tissues, pyknotic cells began to increase at 24 HAT, peaked at 36 or 48 HAT and disappeared at 96 HAT, which is when the histological picture returned to normal in most tissues except for the brain, spinal cord and eyes. The present study clarified the outline of busulfan-induced apoptosis in rat fetuses.

    Risk factors for CAR-T cell manufacturing failure among DLBCL patients: A nationwide survey in Japan

    A recipe for successful CAR-T cell manufacturing: careful preparation before apheresis. Kyoto University press release. 2023-04-27. For successful chimeric antigen receptor T (CAR-T) cell therapy, CAR-T cells must be manufactured without failure caused by suboptimal expansion. In order to determine risk factors for CAR-T cell manufacturing failure, we performed a nationwide cohort study in Japan and analysed patients with diffuse large B-cell lymphoma (DLBCL) who underwent tisagenlecleucel production. We compared clinical factors between the 30 cases that failed (7.4%) and those that succeeded (n = 378). Among the failures, the proportion of patients previously treated with bendamustine (43.3% vs. 14.8%; p < 0.001) was significantly higher, and their platelet counts (12.0 vs. 17.0 × 10⁴/μL; p = 0.01) and CD4/CD8 T-cell ratio (0.30 vs. 0.56; p < 0.01) in peripheral blood at apheresis were significantly lower than in the successful group. Multivariate analysis revealed that repeated bendamustine use with short washout periods prior to apheresis (odds ratio [OR], 5.52; p = 0.013 for ≥6 cycles with washout period of 3–24 months; OR, 57.09; p = 0.005 for ≥3 cycles with washout period of <3 months), low platelet counts (OR, 0.495 per 10⁵/μL; p = 0.022) or low CD4/CD8 ratios (<one third) (OR, 3.249; p = 0.011) in peripheral blood at apheresis increased the risk of manufacturing failure. Manufacturing failure remains an obstacle to CAR-T cell therapy for DLBCL patients. Avoiding risk factors, such as repeated bendamustine administration without sufficient washout, and risk-adapted strategies may help to optimize CAR-T cell therapy for DLBCL patients.

    Overview of BioCreative II gene mention recognition.

    Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used and the results varied with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use of the lowest scoring submissions.
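For readers unfamiliar with the metric, the following minimal sketch shows how an F1 score can be computed for a mention-recognition task when mentions are represented as (start, end) character spans. It assumes strict exact-match scoring and uses hypothetical spans; the actual BioCreative II scoring rules are not described in the abstract and are not reproduced here.

```python
def f1_score(gold, predicted):
    """F1 for span sets under exact-match scoring (an assumption).

    gold and predicted are iterables of (start, end) tuples. A predicted
    span counts as a true positive only if it matches a gold span exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans for one sentence: 2 of 4 predictions are correct,
# and 2 of 3 gold mentions are found.
gold_spans = [(0, 5), (10, 14), (20, 25)]
predicted_spans = [(0, 5), (10, 14), (30, 33), (40, 42)]
print(round(f1_score(gold_spans, predicted_spans), 4))
```

Here precision is 2/4 and recall is 2/3, so the harmonic mean is 4/7.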