Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks
How can we estimate the importance of nodes in a knowledge graph (KG)? A KG
is a multi-relational graph that has proven valuable for many tasks including
question answering and semantic search. In this paper, we present GENI, a
method for tackling the problem of estimating node importance in KGs, which
enables several downstream applications such as item recommendation and
resource allocation. While a number of approaches have been developed to
address this problem for general graphs, they do not fully utilize information
available in KGs, or lack the flexibility needed to model complex relationships
between entities and their importance. To address these limitations, we explore
supervised machine learning algorithms. In particular, building upon recent
advancement of graph neural networks (GNNs), we develop GENI, a GNN-based
method designed to deal with distinctive challenges involved with predicting
node importance in KGs. Our method aggregates importance scores, rather than
node embeddings, via a predicate-aware attention mechanism and flexible
centrality adjustment. In our evaluation of GENI and existing
methods on predicting node importance in real-world KGs with different
characteristics, GENI achieves 5-17% higher NDCG@100 than the state of the art.
Comment: KDD 2019 Research Track. 11 pages.
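The two ingredients named in the abstract, aggregating importance scores (not embeddings) with predicate-aware attention and applying a centrality adjustment, can be sketched as follows. This is a minimal toy illustration, not GENI's learned model: the graph, the scores, and the fixed per-predicate weights standing in for learned attention parameters are all assumptions.

```python
import math

# Hypothetical toy KG: node -> list of (neighbor, predicate) edges.
edges = {
    "a": [("b", "directed"), ("c", "acted_in")],
    "b": [("a", "directed")],
    "c": [("a", "acted_in")],
}
initial_score = {"a": 1.0, "b": 0.5, "c": 0.2}
# Stand-in for learned predicate-aware attention parameters.
predicate_weight = {"directed": 2.0, "acted_in": 1.0}

def aggregate_scores(node):
    """One round of predicate-aware score aggregation: attention over
    neighbors is a softmax of predicate weights, and importance *scores*
    (not embeddings) are combined."""
    nbrs = edges[node]
    logits = [predicate_weight[p] for _, p in nbrs]
    z = sum(math.exp(l) for l in logits)
    attn = [math.exp(l) / z for l in logits]
    return sum(a * initial_score[n] for a, (n, _) in zip(attn, nbrs))

def centrality_adjust(score, node):
    """Illustrative centrality adjustment: scale by log(in-degree + 1)."""
    indeg = sum(1 for u in edges for v, _ in edges[u] if v == node)
    return score * math.log(indeg + 1.0)
```

Aggregating scores directly keeps the output in the same space as the supervision signal, which is the design point the abstract contrasts with embedding aggregation.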
Unsupervised Model Selection for Time-series Anomaly Detection
Anomaly detection in time-series has a wide range of practical applications.
While numerous anomaly detection methods have been proposed in the literature,
a recent survey concluded that no single method is the most accurate across
various datasets. To make matters worse, anomaly labels are scarce and rarely
available in practice. The practical problem of selecting the most accurate
model for a given dataset without labels has received little attention in the
literature. This paper answers exactly this question: given an unlabeled dataset
and a set of candidate anomaly detectors, how can we select the most accurate
model? To this end, we identify three classes of surrogate (unsupervised)
metrics, namely, prediction error, model centrality, and performance on
injected synthetic anomalies, and show that some metrics are highly correlated
with standard supervised anomaly detection performance metrics, though to
varying degrees. We formulate metric combination with
multiple imperfect surrogate metrics as a robust rank aggregation problem. We
then provide theoretical justification for the proposed approach.
Large-scale experiments on multiple real-world datasets demonstrate that our
proposed unsupervised approach is as effective as selecting the most accurate
model based on partially labeled data.
Comment: Accepted at International Conference on Learning Representations (ICLR) 2023 with a notable-top-25% recommendation. Reviewer, AC, and author discussion available at https://openreview.net/forum?id=gOZ_pKANaP
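The rank-aggregation idea in the abstract, combining several imperfect surrogate metrics into one model choice, can be sketched with a simple Borda count; the paper's robust aggregation is more sophisticated. The three metric tables below are illustrative assumptions, oriented so that higher values mean better.

```python
# Illustrative surrogate-metric scores per candidate detector
# (higher = better, by assumption).
surrogate_scores = {
    "prediction_error":  {"m1": 0.90, "m2": 0.40, "m3": 0.60},
    "model_centrality":  {"m1": 0.70, "m2": 0.80, "m3": 0.20},
    "synthetic_anomaly": {"m1": 0.95, "m2": 0.50, "m3": 0.70},
}

def borda_select(metric_tables):
    """Rank models under each surrogate metric, award Borda points
    (best gets n-1 points, worst gets 0), return the overall winner."""
    points = {}
    for table in metric_tables.values():
        ranked = sorted(table, key=table.get, reverse=True)
        for pos, model in enumerate(ranked):
            points[model] = points.get(model, 0) + len(ranked) - 1 - pos
    return max(points, key=points.get)
```

Aggregating ranks rather than raw values sidesteps the fact that the surrogate metrics live on incomparable scales.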
MultiImport: Inferring Node Importance in a Knowledge Graph from Multiple Input Signals
Given multiple input signals, how can we infer node importance in a knowledge
graph (KG)? Node importance estimation is a crucial and challenging task that
can benefit many applications, including recommendation, search, and query
disambiguation. A key challenge towards this goal is how to effectively use
input from different sources. On the one hand, a KG is a rich source of
information, with multiple types of nodes and edges. On the other hand, there
are external input signals, such as the number of votes or pageviews, which can
directly tell us about the importance of entities in a KG. While several
methods have been developed to tackle this problem, their use of these external
signals has been limited as they are not designed to consider multiple signals
simultaneously. In this paper, we develop an end-to-end model MultiImport,
which infers latent node importance from multiple, potentially overlapping,
input signals. MultiImport is a latent variable model that captures the
relation between node importance and input signals, and effectively learns from
multiple signals with potential conflicts. Also, MultiImport provides an
effective estimator based on attentive graph neural networks. We ran
experiments on real-world KGs to show that MultiImport handles several
challenges involved with inferring node importance from multiple input signals,
and consistently outperforms existing methods, achieving up to 23.7% higher
NDCG@100 than the state-of-the-art method.
Comment: KDD 2020 Research Track. 10 pages.
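The core difficulty the abstract describes, fusing multiple, partially overlapping input signals into one latent importance estimate, can be sketched as below. MultiImport learns this jointly with an attentive GNN; here we only z-normalize each signal and average whatever signals cover a node. Signal names and values are illustrative assumptions.

```python
import math

# Two hypothetical input signals that overlap only on node "a".
signals = {
    "pageviews": {"a": 1000.0, "b": 10.0},
    "votes":     {"a": 50.0,   "c": 5.0},
}

def znorm(table):
    """Z-normalize one signal so differently scaled signals are comparable."""
    mu = sum(table.values()) / len(table)
    sd = math.sqrt(sum((v - mu) ** 2 for v in table.values()) / len(table)) or 1.0
    return {k: (v - mu) / sd for k, v in table.items()}

def latent_importance(signals):
    """Average the normalized signals available for each node."""
    normed = [znorm(t) for t in signals.values()]
    nodes = {n for t in signals.values() for n in t}
    return {n: sum(t[n] for t in normed if n in t) /
               sum(1 for t in normed if n in t)
            for n in nodes}
```

Averaging only the signals that actually cover a node is one naive way to handle the partial overlap the abstract highlights; the latent variable model resolves conflicts between signals in a principled way instead.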
How to Securely Release Unverified Plaintext in Authenticated Encryption
Scenarios in which authenticated encryption schemes output decrypted plaintext before successful verification raise many security issues. These situations are sometimes unavoidable in practice, such as when devices have insufficient memory to store an entire plaintext, or when a decrypted plaintext needs early processing due to real-time requirements. We introduce the first formalization of the releasing unverified plaintext (RUP) setting. To achieve privacy, we propose using plaintext awareness (PA) along with IND-CPA. An authenticated encryption scheme is PA if it has a plaintext extractor, which tries to fool adversaries by mimicking the decryption oracle without the secret key. Releasing unverified plaintext then becomes harmless as it is infeasible to distinguish the decryption oracle from the plaintext extractor. We introduce two notions of plaintext awareness in the symmetric-key setting, PA1 and PA2, and show that they expose a new layer of security between IND-CPA and IND-CCA. To achieve integrity of ciphertexts, INT-CTXT in the RUP setting is required, which we refer to as INT-RUP. These new security notions are used to make a classification of symmetric-key schemes in the RUP setting. Furthermore, we re-analyze existing authenticated encryption schemes, and provide solutions to fix insecure schemes
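The hazard in the RUP setting can be made concrete with a toy encrypt-then-MAC scheme whose decryptor streams out plaintext before checking the tag. This is an illustrative construction, not one of the paper's schemes: the hash-counter keystream and all names are assumptions. Because the stream cipher is malleable, an attacker who flips a ciphertext bit gets a predictable bit flip in the released (unverified) plaintext, even though verification later fails.

```python
import hashlib
import hmac

def keystream(key, nonce, n):
    """Toy counter-mode keystream from SHA-256 (illustrative only)."""
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def encrypt(enc_key, mac_key, nonce, pt):
    ct = bytes(a ^ b for a, b in zip(pt, keystream(enc_key, nonce, len(pt))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return ct, tag

def decrypt_releasing_unverified(enc_key, mac_key, nonce, ct, tag):
    # In the RUP setting, plaintext is computed and released first...
    pt = bytes(a ^ b for a, b in zip(ct, keystream(enc_key, nonce, len(ct))))
    # ...and the tag is only checked afterwards.
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return pt, hmac.compare_digest(tag, expected)
```

A PA scheme would make such released plaintext useless to the adversary; this sketch shows why a scheme that is only IND-CPA does not.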
Regulatory T Cells Suppress Effector T Cell Proliferation by Limiting Division Destiny
Understanding how the strength of an effector T cell response is regulated is a fundamental problem in immunology with implications for immunity to pathogens, autoimmunity, and immunotherapy. The initial magnitude of the T cell response is determined by the sum of independent signals from antigen, co-stimulation and cytokines. By applying quantitative methods, the contribution of each signal to the number of divisions T cells undergo (division destiny) can be measured, and the resultant exponential increase in response magnitude accurately calculated. CD4+CD25+Foxp3+ regulatory T cells suppress self-reactive T cell responses and limit pathogen-directed immune responses before bystander damage occurs. Using a quantitative modeling framework to measure T cell signal integration and response, we show that Tregs modulate division destiny, rather than directly increasing the rate of death or delaying interdivision times. The quantitative effect of Tregs could be mimicked by modulating the availability of stimulatory co-stimuli and cytokines or through the addition of inhibitory signals. Thus, our analysis illustrates that the primary effect of Tregs on the magnitude of effector T cell responses is mediated by modifying the division destiny of responding cell populations
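The exponential relationship between division destiny and response magnitude described above is simple arithmetic: a founder cell that divides D times yields 2**D progeny, so a small shift in destiny changes the response size exponentially. The destiny values below are illustrative assumptions, not data from the study.

```python
# Response magnitude as the sum of 2**D over founder cells,
# where D is each founder's division destiny.
def response_magnitude(destinies):
    return sum(2 ** d for d in destinies)

untreated  = [5, 6, 5, 7]  # divisions per founder without Tregs (assumed)
with_tregs = [3, 4, 3, 5]  # Tregs lowering destiny by ~2 divisions (assumed)
```

Cutting destiny by two divisions per founder shrinks the response fourfold, which is why modulating destiny, rather than death rate or division speed, is such an effective control point.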
Cyton2: A Model of Immune Cell Population Dynamics That Includes Familial Instructional Inheritance
Lymphocytes are the central actors in adaptive immune responses. When challenged with antigen, a small number of B and T cells have a cognate receptor capable of recognising and responding to the insult. These cells proliferate, building an exponentially growing, differentiating clone army to fight off the threat, before ceasing to divide and dying over a period of weeks, leaving in their wake memory cells that are primed to rapidly respond to any repeated infection. Due to the non-linearity of lymphocyte population dynamics, mathematical models are needed to interrogate data from experimental studies. Due to lack of evidence to the contrary and appealing to arguments based on Occam’s Razor, in these models newly born progeny are typically assumed to behave independently of their predecessors. Recent experimental studies, however, challenge that assumption, making clear that there is substantial inheritance of timed fate changes from each cell by its offspring, calling for a revision to the existing mathematical modelling paradigms used for information extraction. By assessing long-term live-cell imaging of stimulated murine B and T cells in vitro, we distilled the key phenomena of these within-family inheritances and used them to develop a new mathematical model, Cyton2, that encapsulates them. We establish the model’s consistency with these newly observed fine-grained features. Two natural concerns for any model that includes familial correlations would be that it is overparameterised or computationally inefficient in data fitting, but neither is the case for Cyton2. We demonstrate Cyton2’s utility by challenging it with high-throughput flow cytometry data, which confirms the robustness of its parameter estimation as well as its ability to extract biological meaning from complex mixed stimulation experiments. 
Cyton2, therefore, offers an alternate mathematical model, one more closely aligned with experimental observation, for drawing inferences on lymphocyte population dynamics
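The within-family inheritance at the heart of the model can be sketched as follows: each founder draws one division-destiny time and one death time, and all of its descendants inherit those family-level fates (rather than drawing independently). This is a minimal caricature of the Cyton2 idea; the distributions, parameters, and fixed interdivision time are illustrative assumptions.

```python
import random

def simulate_family(rng, interdivision=10.0, t_end=50.0):
    """Cells in a family share one destiny time (when division stops)
    and one death time, both drawn once per founder."""
    destiny = rng.lognormvariate(3.5, 0.3)
    death = rng.lognormvariate(4.0, 0.3)
    if death <= t_end:
        return 0  # the whole clone has died by the observation time
    n_div = int(min(destiny, t_end) // interdivision)
    return 2 ** n_div  # surviving family size at t_end

def cohort_size(n_founders, seed=0):
    rng = random.Random(seed)
    return sum(simulate_family(rng) for _ in range(n_founders))
```

Because fate times are drawn per family rather than per cell, cousins are correlated, which is the experimentally observed feature that per-cell-independent models cannot reproduce.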
E670G PCSK9 polymorphism in HeFH & CAD with diabetes: is the bridge to personalized therapy within reach?
Objective: To assess the distribution of the PCSK9 E670G genetic polymorphism and PCSK9 levels in patients with Coronary Artery Disease (CAD) and Heterozygous Familial Hypercholesterolemia (HeFH), based on the presence of type 2 Diabetes Mellitus (T2DM).
Methods: The study included 201 patients with chronic CAD, including those with HeFH (n=57, group I) and without it (n=144, group II). The DLCN criteria were used to diagnose HeFH. The PCSK9 E670G (rs505151) polymorphism was genotyped using the PCR-RFLP procedure. In both the patient and control groups, the genotype frequency matched the Hardy-Weinberg equilibrium distribution (P>0.05).
Results: There were twice as many G alleles in group I (13, 11.4%) as in group II (17, 6.0%), and three times as many as in the healthy control group (1, 3.0%); nevertheless, these differences were not statistically significant. At the same time, PCSK9 levels were higher in HeFH patients (P<0.05) than in non-HeFH patients not taking statins (n=63). T2DM was equally represented in groups I and II (31.6% vs. 33.3%). However, carriers of the AG+GG genotypes in group I had a higher likelihood of a history of T2DM (RR 4.18; 95% CI 2.19-8.0; P<0.001), myocardial infarction (RR 1.79; 95% CI 1.18-2.73; P<0.05), and revascularization (RR 12.6; 95% CI 4.06-38.8; P<0.01) than AA carriers. T2DM was also more common among G allele carriers (RR 1.85; 95% CI 1.11-3.06; P<0.05) in patients with non-HeFH.
Conclusion: T2DM in patients with CAD, both with HeFH and non-HeFH, in the Uzbek population was significantly more often associated with the presence of the "gain-of-function" G allele of the PCSK9 E670G genetic polymorphism
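The relative risks with 95% confidence intervals reported in the results follow the standard 2x2-table arithmetic, sketched below. The counts in the test are illustrative, not the study's raw data.

```python
import math

def relative_risk(a, b, c, d):
    """RR from a 2x2 table with a 95% CI on the log scale.
    a/b: events/non-events among exposed (e.g. AG+GG carriers);
    c/d: events/non-events among unexposed (e.g. AA carriers)."""
    rr = (a / (a + b)) / (c / (c + d))
    # Standard error of log(RR).
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lo = math.exp(math.log(rr) - 1.96 * se)
    hi = math.exp(math.log(rr) + 1.96 * se)
    return rr, (lo, hi)
```

An interval whose lower bound stays above 1.0, as in the T2DM associations reported above, is what marks the risk increase as statistically significant at the 5% level.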