78 research outputs found

    Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks

    Full text link
    How can we estimate the importance of nodes in a knowledge graph (KG)? A KG is a multi-relational graph that has proven valuable for many tasks, including question answering and semantic search. In this paper, we present GENI, a method for estimating node importance in KGs, which enables several downstream applications such as item recommendation and resource allocation. While a number of approaches have been developed to address this problem for general graphs, they do not fully utilize the information available in KGs, or lack the flexibility needed to model the complex relationship between entities and their importance. To address these limitations, we explore supervised machine learning algorithms. In particular, building upon recent advances in graph neural networks (GNNs), we develop GENI, a GNN-based method designed to deal with the distinctive challenges involved in predicting node importance in KGs. Our method aggregates importance scores, rather than node embeddings, via a predicate-aware attention mechanism and flexible centrality adjustment. In our evaluation of GENI and existing methods on predicting node importance in real-world KGs with different characteristics, GENI achieves 5-17% higher NDCG@100 than the state of the art.
    Comment: KDD 2019 Research Track. 11 pages. Changelog: Type 3 font removed, and minor updates made in the Appendix (v2)
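The NDCG@100 metric reported above can be sketched with a generic implementation of the standard formula; this is illustrative, not code from the paper:

```python
import math

def ndcg_at_k(predicted_scores, true_scores, k=100):
    """NDCG@k: compare a predicted ranking of nodes against ground-truth importance."""
    # Rank node indices by predicted score (descending) and keep the top k.
    order = sorted(range(len(predicted_scores)), key=lambda i: -predicted_scores[i])[:k]
    dcg = sum(true_scores[i] / math.log2(rank + 2) for rank, i in enumerate(order))
    # Ideal DCG: the ground-truth scores themselves, sorted descending.
    ideal = sorted(true_scores, reverse=True)[:k]
    idcg = sum(s / math.log2(rank + 2) for rank, s in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

A perfect ranking yields 1.0; any misordering of the top-k items lowers the score.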

    Unsupervised Model Selection for Time-series Anomaly Detection

    Full text link
    Anomaly detection in time-series has a wide range of practical applications. While numerous anomaly detection methods have been proposed in the literature, a recent survey concluded that no single method is the most accurate across various datasets. To make matters worse, anomaly labels are scarce and rarely available in practice. The practical problem of selecting the most accurate model for a given dataset without labels has received little attention in the literature. This paper addresses this question: given an unlabeled dataset and a set of candidate anomaly detectors, how can we select the most accurate model? To this end, we identify three classes of surrogate (unsupervised) metrics, namely, prediction error, model centrality, and performance on injected synthetic anomalies, and show that some metrics are highly correlated with standard supervised anomaly detection performance metrics such as the F1 score, but to varying degrees. We formulate metric combination with multiple imperfect surrogate metrics as a robust rank aggregation problem. We then provide theoretical justification behind the proposed approach. Large-scale experiments on multiple real-world datasets demonstrate that our proposed unsupervised approach is as effective as selecting the most accurate model based on partially labeled data.
    Comment: Accepted at International Conference on Learning Representations (ICLR) 2023 with a notable-top-25% recommendation. Reviewer, AC and author discussion available at https://openreview.net/forum?id=gOZ_pKANaP
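The rank-aggregation idea can be illustrated with a minimal Borda-count sketch; the paper's robust rank aggregation is more elaborate, and this function and its inputs are hypothetical:

```python
def aggregate_ranks(metric_scores):
    """Pick a model by averaging its rank across several surrogate metrics.

    metric_scores: list of score lists, one per surrogate metric,
    where metric_scores[j][m] is metric j's score for candidate model m
    (higher is assumed better).
    """
    n_models = len(metric_scores[0])
    total_rank = [0.0] * n_models
    for scores in metric_scores:
        # Rank models by this metric (0 = best) and accumulate ranks.
        order = sorted(range(n_models), key=lambda m: -scores[m])
        for rank, m in enumerate(order):
            total_rank[m] += rank
    # The model with the lowest summed rank wins (Borda-style aggregation).
    return min(range(n_models), key=lambda m: total_rank[m])
```

Aggregating ranks rather than raw scores sidesteps the problem that different surrogate metrics live on incomparable scales.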

    MultiImport: Inferring Node Importance in a Knowledge Graph from Multiple Input Signals

    Full text link
    Given multiple input signals, how can we infer node importance in a knowledge graph (KG)? Node importance estimation is a crucial and challenging task that can benefit many applications, including recommendation, search, and query disambiguation. A key challenge towards this goal is how to effectively use input from different sources. On the one hand, a KG is a rich source of information, with multiple types of nodes and edges. On the other hand, there are external input signals, such as the number of votes or pageviews, which can directly tell us about the importance of entities in a KG. While several methods have been developed to tackle this problem, their use of these external signals has been limited, as they are not designed to consider multiple signals simultaneously. In this paper, we develop an end-to-end model, MultiImport, which infers latent node importance from multiple, potentially overlapping, input signals. MultiImport is a latent variable model that captures the relation between node importance and input signals, and effectively learns from multiple signals with potential conflicts. Also, MultiImport provides an effective estimator based on attentive graph neural networks. We ran experiments on real-world KGs to show that MultiImport handles several challenges involved with inferring node importance from multiple input signals, and consistently outperforms existing methods, achieving up to 23.7% higher NDCG@100 than the state-of-the-art method.
    Comment: KDD 2020 Research Track. 10 pages
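The idea of attentively fusing multiple, partially observed signals can be sketched as follows; the signal names and the fixed attention weights are illustrative assumptions, not MultiImport's learned model:

```python
import math

def fuse_signals(signals, weights):
    """Softmax-attention fusion of multiple (possibly missing) importance signals for one node.

    signals: dict signal_name -> observed value, or None if the signal is
             missing for this node (signals may only partially overlap).
    weights: dict signal_name -> attention logit (hypothetical, fixed here;
             a real model would learn these per node and signal).
    """
    observed = {k: v for k, v in signals.items() if v is not None}
    # Softmax over the logits of the *observed* signals only.
    z = max(weights[k] for k in observed)
    exp = {k: math.exp(weights[k] - z) for k in observed}
    total = sum(exp.values())
    return sum(exp[k] / total * observed[k] for k in observed)
```

A node with only one observed signal simply inherits that signal's value; where signals conflict, the attention weights arbitrate.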

    How to Securely Release Unverified Plaintext in Authenticated Encryption

    Get PDF
    Scenarios in which authenticated encryption schemes output decrypted plaintext before successful verification raise many security issues. These situations are sometimes unavoidable in practice, such as when devices have insufficient memory to store an entire plaintext, or when a decrypted plaintext needs early processing due to real-time requirements. We introduce the first formalization of the releasing unverified plaintext (RUP) setting. To achieve privacy, we propose using plaintext awareness (PA) along with IND-CPA. An authenticated encryption scheme is PA if it has a plaintext extractor, which tries to fool adversaries by mimicking the decryption oracle without the secret key. Releasing unverified plaintext then becomes harmless, as it is infeasible to distinguish the decryption oracle from the plaintext extractor. We introduce two notions of plaintext awareness in the symmetric-key setting, PA1 and PA2, and show that they expose a new layer of security between IND-CPA and IND-CCA. To achieve integrity of ciphertexts, INT-CTXT in the RUP setting is required, which we refer to as INT-RUP. These new security notions are used to make a classification of symmetric-key schemes in the RUP setting. Furthermore, we re-analyze existing authenticated encryption schemes, and provide solutions to fix insecure schemes.
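The RUP hazard can be illustrated with a toy encrypt-then-MAC scheme whose streaming decryptor produces plaintext before the tag is checked; the keystream construction here is a teaching device, not a vetted cipher, and none of this is from the paper:

```python
import hmac
import hashlib

def _keystream(key, nonce, n):
    # Toy keystream: SHA-256 in counter mode -- for illustration only.
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def encrypt(enc_key, mac_key, nonce, plaintext):
    ct = bytes(p ^ k for p, k in zip(plaintext, _keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return ct, tag

def decrypt_streaming(enc_key, mac_key, nonce, ct, tag):
    # RUP setting: the plaintext is produced (and could leak to the caller)
    # *before* the authentication tag is verified.
    pt = bytes(c ^ k for c, k in zip(ct, _keystream(enc_key, nonce, len(ct))))
    ok = hmac.compare_digest(tag, hmac.new(mac_key, nonce + ct, hashlib.sha256).digest())
    return pt, ok
```

A tampered ciphertext still yields a (garbled) plaintext alongside `ok=False`; the paper's PA and INT-RUP notions formalize when releasing such unverified output is safe.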

    Regulatory T Cells Suppress Effector T Cell Proliferation by Limiting Division Destiny

    Get PDF
    Understanding how the strength of an effector T cell response is regulated is a fundamental problem in immunology, with implications for immunity to pathogens, autoimmunity, and immunotherapy. The initial magnitude of the T cell response is determined by the sum of independent signals from antigen, co-stimulation and cytokines. By applying quantitative methods, the contribution of each signal to the number of divisions T cells undergo (division destiny) can be measured, and the resultant exponential increase in response magnitude accurately calculated. CD4+CD25+Foxp3+ regulatory T cells (Tregs) suppress self-reactive T cell responses and limit pathogen-directed immune responses before bystander damage occurs. Using a quantitative modeling framework to measure T cell signal integration and response, we show that Tregs modulate division destiny, rather than directly increasing the rate of death or delaying interdivision times. The quantitative effect of Tregs could be mimicked by modulating the availability of stimulatory co-stimuli and cytokines, or through the addition of inhibitory signals. Thus, our analysis illustrates that the primary effect of Tregs on the magnitude of effector T cell responses is mediated by modifying the division destiny of responding cell populations.
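The link between division destiny and exponential response magnitude reduces to simple arithmetic; a sketch with hypothetical numbers (not data from the study):

```python
def response_magnitude(n_start, mean_destiny):
    """Expected peak cell count if each of n_start cells divides
    mean_destiny times before ceasing to divide (division destiny)."""
    return n_start * 2 ** mean_destiny

# Reducing mean division destiny by a single division halves the peak response:
# response_magnitude(100, 5) -> 3200, response_magnitude(100, 4) -> 1600.
```

This is why a modest Treg-induced shift in division destiny translates into a large change in response magnitude.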

    Cyton2: A Model of Immune Cell Population Dynamics That Includes Familial Instructional Inheritance

    Get PDF
    Lymphocytes are the central actors in adaptive immune responses. When challenged with antigen, a small number of B and T cells have a cognate receptor capable of recognising and responding to the insult. These cells proliferate, building an exponentially growing, differentiating clone army to fight off the threat, before ceasing to divide and dying over a period of weeks, leaving in their wake memory cells that are primed to rapidly respond to any repeated infection. Due to the non-linearity of lymphocyte population dynamics, mathematical models are needed to interrogate data from experimental studies. For lack of evidence to the contrary, and appealing to arguments based on Occam's Razor, these models typically assume that newly born progeny behave independently of their predecessors. Recent experimental studies, however, challenge that assumption, making clear that there is substantial inheritance of timed fate changes from each cell by its offspring, calling for a revision to the existing mathematical modelling paradigms used for information extraction. By assessing long-term live-cell imaging of stimulated murine B and T cells in vitro, we distilled the key phenomena of these within-family inheritances and used them to develop a new mathematical model, Cyton2, that encapsulates them. We establish the model's consistency with these newly observed fine-grained features. Two natural concerns for any model that includes familial correlations are that it is overparameterised or computationally inefficient in data fitting, but neither is the case for Cyton2. We demonstrate Cyton2's utility by challenging it with high-throughput flow cytometry data, which confirms the robustness of its parameter estimation as well as its ability to extract biological meaning from complex mixed stimulation experiments. Cyton2, therefore, offers an alternate mathematical model, one that is more closely aligned with experimental observation, for drawing inferences on lymphocyte population dynamics.
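A toy family-level calculation in the spirit of Cyton2, where the times of division cessation (destiny) and death are inherited clone-wide, can be sketched as follows; the deterministic timers are illustrative assumptions, not the model's actual fitted distributions:

```python
def simulate_family(t_end, div_time, destiny_time, death_time):
    """Toy Cyton2-style founder family: every descendant inherits the
    founder's destiny and death times (familial instructional inheritance).

    Cells divide every div_time units until destiny_time, then persist
    without dividing until death_time. Returns live cells at t_end.
    """
    if t_end >= death_time:
        return 0  # the whole family has died
    # Divisions accumulate only until the inherited destiny time.
    n_div = min(t_end, destiny_time) // div_time
    return 2 ** int(n_div)
```

Because fate times are shared within a family rather than redrawn per cell, whole clones stop dividing and die together, which is the correlation structure the independent-progeny models miss.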

    E670G PCSK9 polymorphism in HeFH & CAD with diabetes: is the bridge to personalized therapy within reach?

    Get PDF
    Objective: To assess the distribution of the PCSK9 E670G genetic polymorphism and PCSK9 levels in patients with Coronary Artery Disease (CAD) and Heterozygous Familial Hypercholesterolemia (HeFH), based on the presence of type 2 Diabetes Mellitus (T2DM).
    Methods: The study included 201 patients with chronic CAD, including those with HeFH (n=57, group I) and without it (n=144, group II). The DLCN criteria were used to diagnose HeFH. The PCSK9 E670G (rs505151) polymorphism was genotyped using the PCR-RFLP procedure. In both the patient and control groups, the genotype frequencies matched the Hardy-Weinberg equilibrium distribution (P>0.05).
    Results: There were twice as many G alleles in group I (13, 11.4%) as in group II (17, 6.0%), and three times as many as in the healthy control group (1, 3.0%); nevertheless, these differences were not statistically significant. At the same time, PCSK9 levels were higher in HeFH patients (P<0.05) than in non-HeFH patients not taking statins (n=63). T2DM was equally represented in groups I and II (31.6% vs. 33.3%). However, carriers of the AG+GG genotypes in group I had a higher likelihood of a history of T2DM (RR 4.18; 95% CI 2.19-8.0; P<0.001), myocardial infarction (RR 1.79; 95% CI 1.18-2.73; P<0.05), and revascularization (RR 12.6; 95% CI 4.06-38.8; P<0.01) than AA carriers. T2DM was also more common among G allele carriers (RR 1.85; 95% CI 1.11-3.06; P<0.05) in patients with non-HeFH.
    Conclusion: T2DM in patients with CAD, both with HeFH and non-HeFH, in the Uzbek population was significantly more often associated with the presence of the "gain-of-function" G allele of the PCSK9 E670G genetic polymorphism.
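The Hardy-Weinberg equilibrium check mentioned in the Methods is a standard one-degree-of-freedom chi-square goodness-of-fit test, which can be sketched generically (this is textbook statistics, not the study's code, and the example counts are hypothetical):

```python
import math

def hwe_chi2_p(n_AA, n_AG, n_GG):
    """Chi-square test of observed genotype counts against
    Hardy-Weinberg expected proportions (1 degree of freedom)."""
    n = n_AA + n_AG + n_GG
    p = (2 * n_AA + n_AG) / (2 * n)  # frequency of the A allele
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_AA, n_AG, n_GG), expected))
    # Upper-tail p-value for chi-square with 1 df: P(X > x) = erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(chi2 / 2))
```

Counts in perfect HWE proportions give a p-value of 1.0; P>0.05, as reported in the Methods, means no significant departure from equilibrium.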