16 research outputs found

    Reasoning like human: Hierarchical reinforcement learning for knowledge graph reasoning

    Knowledge Graphs typically suffer from incompleteness. A popular approach to knowledge graph completion is to infer missing knowledge by multi-hop reasoning over the information found along other paths connecting a pair of entities. However, multi-hop reasoning is still challenging because the reasoning process usually encounters the multiple-semantics issue, in which a relation or an entity has multiple meanings. To deal with this situation, we propose a novel Hierarchical Reinforcement Learning framework to learn chains of reasoning from a Knowledge Graph automatically. Our framework is inspired by the hierarchical structure through which a human being handles cognitively ambiguous cases. The whole reasoning process is decomposed into a hierarchy of two-level Reinforcement Learning policies for encoding historical information and learning a structured action space. As a consequence, it is more feasible and natural for dealing with the multiple-semantics issue. Experimental results show that our proposed model achieves substantial improvements in ambiguous relation tasks.

    Leveraging Discourse Rewards for Document-Level Neural Machine Translation

    Document-level machine translation focuses on the translation of entire documents from a source to a target language. It is widely regarded as a challenging task since the translation of the individual sentences in the document needs to retain aspects of the discourse at document level. However, document-level translation models are usually not trained to explicitly ensure discourse quality. Therefore, in this paper we propose a training approach that explicitly optimizes two established discourse metrics, lexical cohesion (LC) and coherence (COH), by using a reinforcement learning objective. Experiments over four different language pairs and three translation domains have shown that our training approach has been able to achieve more cohesive and coherent document translations than other competitive approaches, yet without compromising the faithfulness to the reference translation. In the case of the Zh-En language pair, our method has achieved an improvement of 2.46 percentage points (pp) in LC and 1.17 pp in COH over the runner-up, while at the same time improving 0.63 pp in BLEU score and 0.47 pp in F_BERT. Comment: Accepted at COLING 202

    The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups.

    The elucidation of breast cancer subgroups and their molecular drivers requires integrated views of the genome and transcriptome from representative numbers of patients. We present an integrated analysis of copy number and gene expression in a discovery and validation set of 997 and 995 primary breast tumours, respectively, with long-term clinical follow-up. Inherited variants (copy number variants and single nucleotide polymorphisms) and acquired somatic copy number aberrations (CNAs) were associated with expression in ~40% of genes, with the landscape dominated by cis- and trans-acting CNAs. By delineating expression outlier genes driven in cis by CNAs, we identified putative cancer genes, including deletions in PPP2R2A, MTAP and MAP2K4. Unsupervised analysis of paired DNA–RNA profiles revealed novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort. These include a high-risk, oestrogen-receptor-positive 11q13/14 cis-acting subgroup and a favourable prognosis subgroup devoid of CNAs. Trans-acting aberration hotspots were found to modulate subgroup-specific gene networks, including a TCR deletion-mediated adaptive immune response in the ‘CNA-devoid’ subgroup and a basal-specific chromosome 5 deletion-associated mitotic network. Our results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome

    Transductive learning for statistical machine translation

    Statistical machine translation systems are usually trained on large amounts of bilingual text and monolingual text in the target language. In this paper we explore the use of transductive semi-supervised methods for the effective use of monolingual data from the source language in order to improve translation quality. We propose several algorithms with this aim, and present the strengths and weaknesses of each one. We present detailed experimental evaluations on the French-English EuroParl data set and on data from the NIST Chinese-English large data track. We show a significant improvement in translation quality on both tasks. NRC publication: Ye

    Multi-objective semi-supervised clustering to identify health service patterns for injured patients

    Purpose: This study develops a pattern recognition method that identifies patterns based on their similarity and their association with the outcome of interest. The practical purpose of developing this pattern recognition method is to group patients who are injured in transport accidents in the early stages post-injury. This grouping is based on distinctive patterns in health service use within the first week post-injury. The groups also provide predictive information about the total cost of the medication process. As a result, the groups of patients who have undesirable outcomes are identified as early as possible based on health service use patterns. Methods: We propose a multi-objective optimization model to group patients. One objective function is the cost function of k-medians clustering, which recognizes similar patterns. The other objective function is the cross-validated root-mean-square error, which examines the association with the total cost. The best grouping is obtained by minimizing both objective functions. As a result, the multi-objective optimization model is a semi-supervised clustering that learns health service use patterns in both unsupervised and supervised ways. We also introduce an evolutionary computation approach that includes stochastic gradient descent and Pareto optimal solutions to find the optimal solution. In addition, we use the decision tree method to reproduce the optimal groups using an interpretable classification model. Results: The results show that the proposed multi-objective semi-supervised clustering identifies distinct groups of health service uses and contributes to predicting the total cost. The performance of the multi-objective model has been examined using two metrics: the average silhouette width and the cross-validation error. The examination shows that the multi-objective model outperforms the single-objective ones. In addition, the interpretable classification model shows that imaging and therapeutic services are critical services in the first week post-injury for grouping injured patients. Conclusion: The proposed multi-objective semi-supervised clustering finds the optimal clusters, which are not only well separated from each other but also provide informative insights regarding the outcome of interest. It also overcomes two drawbacks of clustering methods: sensitivity to the initial cluster centers and the need to specify the number of clusters.
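The two objectives named in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, and the use of a leave-one-out group-mean predictor as the cross-validated regression are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median


def k_medians_cost(points, labels):
    """Sum of L1 distances from each point to its cluster's coordinate-wise median."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    cost = 0.0
    for members in clusters.values():
        center = [median(column) for column in zip(*members)]
        cost += sum(abs(x - c) for p in members for x, c in zip(p, center))
    return cost


def loo_group_mean_rmse(outcomes, labels):
    """Leave-one-out RMSE when each outcome is predicted by the mean of its group peers.

    Assumption: a group-mean predictor stands in for whatever regression
    model the paper actually uses.
    """
    n = len(outcomes)
    squared_errors = []
    for i in range(n):
        peers = [outcomes[j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not peers:  # singleton group: fall back to all other observations
            peers = [outcomes[j] for j in range(n) if j != i]
        squared_errors.append((outcomes[i] - mean(peers)) ** 2)
    return sqrt(mean(squared_errors))


# Hypothetical toy data: two service-use features per patient and a total cost.
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
costs = [1, 3, 10, 14]
good_grouping = [0, 0, 1, 1]
poor_grouping = [0, 1, 0, 1]

for name, labels in (("good", good_grouping), ("poor", poor_grouping)):
    print(name, k_medians_cost(points, labels), round(loo_group_mean_rmse(costs, labels), 3))
```

A grouping that scores low on both functions is compact in feature space and also informative about the outcome, which is the trade-off the multi-objective model is described as minimizing.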

    Early Identification of Undesirable Outcomes for Transport Accident Injured Patients Using Semi-Supervised Clustering.

    Identifying those patient groups who have unwanted outcomes in the early stages is crucial to providing the most appropriate level of care. In this study, we intend to find distinctive patterns in health service use (HSU) of transport accident injured patients within the first week post-injury, targeting those patterns that are associated with the outcome of interest. To recognize these patterns, we propose a multi-objective optimization model that minimizes the k-medians cost function and regression error simultaneously. Thus, we use a semi-supervised clustering approach to identify patient groups based on HSU patterns and their association with total cost. To solve the optimization problem, we introduce an evolutionary algorithm using stochastic gradient descent and Pareto optimal solutions. As a result, we find the optimal clusters by minimizing both objective functions. The results show that the proposed semi-supervised approach identifies distinct groups of HSUs and contributes to predicting total cost. Also, the experiments demonstrate the performance of the multi-objective approach in comparison with single-objective approaches.
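The notion of Pareto optimal solutions mentioned above can be sketched as a simple non-dominated filter over candidate groupings, each scored by two objectives to be minimized. The candidate values and function names below are hypothetical, and the sketch does not reproduce the evolutionary search or the stochastic gradient descent step.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates (minimization)."""
    return [c for c in candidates if not any(dominates(other, c) for other in candidates)]


# Hypothetical (k-medians cost, regression error) pairs for four candidate groupings.
candidates = [(4.0, 3.2), (6.0, 2.5), (5.0, 4.0), (40.0, 10.0)]
print(pareto_front(candidates))
```

The first two candidates survive because each is better than the other on one objective; the last two are dropped because the first candidate beats them on both.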

    Text mining electronic hospital records to automatically classify admissions against disease: Measuring the impact of linking data sources

    Objective: Text and data mining play an important role in obtaining insights from Health and Hospital Information Systems. This paper presents a text mining system for detecting admissions marked as positive for several diseases: Lung Cancer, Breast Cancer, Colon Cancer, Secondary Malignant Neoplasm of Respiratory and Digestive Organs, Multiple Myeloma and Malignant Plasma Cell Neoplasms, Pneumonia, and Pulmonary Embolism. We specifically examine the effect of linking multiple data sources on text classification performance. Methods: Support Vector Machine classifiers are built for eight data source combinations, and evaluated using the metrics of Precision, Recall and F-Score. Sub-sampling techniques are used to address unbalanced datasets of medical records. We use radiology reports as an initial data source and add other sources, such as pathology reports and patient and hospital admission data, in order to assess the research question regarding the impact of the value of multiple data sources. Statistical significance is measured using the Wilcoxon signed-rank test. A second set of experiments explores aspects of the system in greater depth, focusing on Lung Cancer. We explore the impact of feature selection; analyse the learning curve; examine the effect of restricting admissions to only those containing reports from all data sources; and examine the impact of reducing the sub-sampling. These experiments provide better understanding of how to best apply text classification in the context of imbalanced data of variable completeness. Results: Radiology questions plus patient and hospital admission data contribute valuable information for detecting most of the diseases, significantly improving performance when added to radiology reports alone or to the combination of radiology and pathology reports. Conclusion: Overall, linking data sources significantly improved classification performance for all the diseases examined
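For reference, the three evaluation metrics named in the Methods can be computed from binary labels as in the sketch below. The function name and the toy labels are assumptions, and the F-Score shown is the balanced F1 variant; the abstract does not say which variant the study reports.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class, with 0.0 on empty denominators."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical imbalanced example: 3 positive admissions out of 8.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(precision_recall_f1(y_true, y_pred))
```

On imbalanced data such as this, plain accuracy would look high even for a classifier that misses most positives, which is one reason these metrics are preferred in that setting.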

    PROSPERous: High-throughput prediction of substrate cleavage sites for 90 proteases with improved accuracy

    Summary: Proteases are enzymes that specifically cleave the peptide backbone of their target proteins. As an important type of irreversible post-translational modification, protein cleavage underlies many key physiological processes. When dysregulated, proteases’ actions are associated with numerous diseases. Many proteases are highly specific, cleaving only those target substrates that present certain particular amino acid sequence patterns. Therefore, tools that successfully identify potential target substrates for proteases may also identify previously unknown, physiologically relevant cleavage sites, thus providing insights into biological processes and guiding hypothesis-driven experiments aimed at verifying protease–substrate interaction. In this work, we present PROSPERous, a tool for rapid in silico prediction of protease-specific cleavage sites in substrate sequences. Our tool is based on logistic regression models and uses different scoring functions and their pairwise combinations to subsequently predict potential cleavage sites. PROSPERous represents a state-of-the-art tool that enables fast, accurate and high-throughput prediction of substrate cleavage sites for 90 proteases. Availability and implementation: http://prosperous.erc.monash.edu/ Authors: Jiangning Song, Fuyi Li, André Leier, Tatiana T. Marquez-Lago, Tatsuya Akutsu, Gholamreza Haffari, Kuo-Chen Chou, Geoffrey I. Webb, and Robert N. Pik

    DriverNet: uncovering the impact of somatic driver mutations on transcriptional networks in cancer

    Simultaneous interrogation of tumor genomes and transcriptomes is underway in unprecedented global efforts. Yet, despite the essential need to separate driver mutations modulating gene expression networks from transcriptionally inert passenger mutations, robust computational methods to ascertain the impact of individual mutations on transcriptional networks are underdeveloped. We introduce a novel computational framework, DriverNet, to identify likely driver mutations by virtue of their effect on mRNA expression networks. Application to four cancer datasets reveals the prevalence of rare candidate driver mutations associated with disrupted transcriptional networks and a simultaneous modulation of oncogenic and metabolic networks, induced by copy number co-modification of adjacent oncogenic and metabolic drivers. DriverNet is available on Bioconductor or at http://compbio.bccrc.ca/software/drivernet/. Computer Science, Department of; Medicine, Faculty of; Pathology and Laboratory Medicine, Department of; Science, Faculty of; Other UBC; Reviewed; Facult