Distribution-based aggregation for relational learning with identifier attributes
Identifier attributes—very high-dimensional categorical attributes such as particular
product ids or people’s names—rarely are incorporated in statistical modeling. However,
they can play an important role in relational modeling: it may be informative to have communicated
with a particular set of people or to have purchased a particular set of products. A
key limitation of existing relational modeling techniques is how they aggregate bags (multisets)
of values from related entities. The aggregations used by existing methods are simple
summaries of the distributions of features of related entities: e.g., MEAN, MODE, SUM,
or COUNT. This paper’s main contribution is the introduction of aggregation operators that
capture more information about the value distributions, by storing meta-data about value
distributions and referencing this meta-data when aggregating—for example by computing
class-conditional distributional distances. Such aggregations are particularly important for
aggregating values from high-dimensional categorical attributes, for which the simple aggregates
provide little information. In the first half of the paper we provide general guidelines
for designing aggregation operators, introduce the new aggregators in the context of the
relational learning system ACORA (Automated Construction of Relational Attributes), and
provide theoretical justification. We also conjecture special properties of identifier attributes,
e.g., they proxy for unobserved attributes and for information deeper in the relationship
network. In the second half of the paper we provide extensive empirical evidence that the
distribution-based aggregators indeed do facilitate modeling with high-dimensional categorical
attributes, and in support of the aforementioned conjectures.
NYU, Stern School of Business, IOMS Department, Center for Digital Economy Researc
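To make the idea in the abstract above concrete, here is a minimal sketch of one distribution-based aggregator: each bag of identifiers is summarized by its distance to a per-class reference distribution estimated from training bags. The function names (`class_reference_vectors`, `cosine_distance`, `aggregate`), the choice of cosine distance, and the toy product ids are illustrative assumptions, not ACORA's actual implementation.

```python
from collections import Counter
import math

def class_reference_vectors(bags, labels):
    """Pool identifier counts over the training bags of each class and
    normalize them into one reference distribution per class."""
    totals = {}
    for bag, label in zip(bags, labels):
        totals.setdefault(label, Counter()).update(bag)
    return {
        label: {k: v / sum(counts.values()) for k, v in counts.items()}
        for label, counts in totals.items()
    }

def cosine_distance(bag, reference):
    """Cosine distance between a bag's count vector and a reference distribution."""
    counts = Counter(bag)
    dot = sum(c * reference.get(k, 0.0) for k, c in counts.items())
    norm_bag = math.sqrt(sum(c * c for c in counts.values()))
    norm_ref = math.sqrt(sum(v * v for v in reference.values()))
    if norm_bag == 0.0 or norm_ref == 0.0:
        return 1.0  # assumption: an empty bag is maximally distant
    return 1.0 - dot / (norm_bag * norm_ref)

def aggregate(bag, references):
    """Turn one bag of identifiers into one numeric feature per class."""
    return {label: cosine_distance(bag, ref) for label, ref in references.items()}

# Hypothetical toy data: bags of purchased product ids with a binary target.
train_bags = [["p1", "p2", "p2"], ["p1", "p2"], ["p3", "p4"], ["p4", "p4", "p5"]]
train_labels = [1, 1, 0, 0]

references = class_reference_vectors(train_bags, train_labels)
features = aggregate(["p2", "p1"], references)
```

Unlike MEAN, MODE, SUM, or COUNT, the resulting features depend on which identifiers occur in the bag, so a bag that shares products with class-1 training bags ends up close to the class-1 reference and far from the class-0 reference.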
ACORA: Distribution-Based Aggregation for Relational Learning from Identifier Attributes
Feature construction through aggregation plays an essential role in modeling relational
domains with one-to-many relationships between tables. One-to-many relationships
lead to bags (multisets) of related entities, from which predictive information
must be captured. This paper focuses on aggregation from categorical attributes
that can take many values (e.g., object identifiers). We present a novel aggregation
method, part of the relational learning system ACORA, that combines the use of
vector distance and meta-data about the class-conditional distributions of attribute
values. We provide a theoretical foundation for this approach by deriving a "relational
fixed-effect" model within a Bayesian framework, and discuss the implications of
identifier aggregation on the expressive power of the induced model. One advantage
of using identifier attributes is the circumvention of limitations caused either by
missing/unobserved object properties or by independence assumptions. Finally, we
show empirically that the novel aggregators can generalize in the presence of identifier
(and other high-dimensional) attributes, and also explore the limitations of the
applicability of the methods.
Information Systems Working Papers Serie
Graph Few-shot Learning via Knowledge Transfer
Semi-supervised node classification is a challenging problem that has been
studied extensively. Graph Neural Networks (GNNs), which update the
representation of each node by aggregating information from its neighbors, have
recently attracted great interest. However, most GNNs have shallow
layers with a limited receptive field and may not achieve satisfactory
performance especially when the number of labeled nodes is quite small. To
address this challenge, we innovatively propose a graph few-shot learning (GFL)
algorithm that incorporates prior knowledge learned from auxiliary graphs to
improve classification accuracy on the target graph. Specifically, a
transferable metric space characterized by a node embedding and a
graph-specific prototype embedding function is shared between auxiliary graphs
and the target, facilitating the transfer of structural knowledge. Extensive
experiments and ablation studies on four real-world graph datasets demonstrate
the effectiveness of our proposed model.
Comment: Full paper (with Appendix) of AAAI 202
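The prototype idea mentioned in the abstract above can be illustrated with a generic nearest-prototype classifier: each class is represented by the mean of its few labeled node embeddings, and a query node takes the label of the closest prototype. This is a simplified stand-in, not the GFL algorithm itself; the embeddings are assumed to be precomputed (for example by a GNN), and the names `prototypes` and `nearest_prototype` are hypothetical.

```python
import math

def prototypes(embeddings, labels):
    """Average the support embeddings of each class into one prototype vector."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}

def nearest_prototype(query, protos):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))

# Hypothetical 2-d node embeddings for a two-class, two-shot support set.
support = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]]
support_labels = ["a", "a", "b", "b"]

protos = prototypes(support, support_labels)
prediction = nearest_prototype([0.15, 0.2], protos)
```

With only a handful of labeled nodes per class, the quality of the prediction rests almost entirely on the embedding space, which is why the abstract emphasizes transferring that metric space from auxiliary graphs.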
Aggregation-Based Feature Invention and Relational
Due to interest in social and economic networks, relational modeling is
attracting increasing attention. The field of relational data
mining/learning, which traditionally was dominated by logic-based
approaches, has recently been extended by adapting learning methods such
as naive Bayes, Bayesian networks and decision trees to relational tasks.
One aspect inherent to all methods of model induction from relational
data is the construction of features through the aggregation of sets.
The theoretical part of this work (1) presents an ontology of relational
concepts of increasing complexity, (2) derives classes of aggregation
operators that are needed to learn these concepts, and (3) classifies
relational domains based on relational schema characteristics such as
cardinality. We then present a new class of aggregation functions, ones
that are particularly well suited for relational classification and
class probability estimation. The empirical part of this paper
demonstrates, on a real domain, how different aggregation methods affect
system performance for different relational concepts. The
results suggest that more complex aggregation methods can significantly
increase generalization performance and that, in particular,
task-specific aggregation can simplify relational prediction tasks into
well-understood propositional learning problems.
Information Systems Working Papers Serie