Generating Rules to Filter Candidate Triples for their Correctness Checking by Knowledge Graph Completion Techniques
Knowledge Graphs (KGs) contain large amounts of structured information.
Due to their inherent incompleteness, a process known
as KG completion is often carried out to find the missing triples in a
KG, usually by training a fact checking model that is able to discern
between correct and incorrect knowledge. After the fact checking
model has been trained and evaluated, it has to be applied to a set
of candidate triples, and those that are considered correct are added
to the KG as new knowledge. However, this process needs a reasonably
sized set of candidate triples that represents possible new knowledge,
so that each candidate can be evaluated by the fact checking task and,
if considered correct, added to the KG to enrich it. Current
approaches for selecting candidate triples for correctness checking
either use the full set of possible missing triples (and thus provide
no filtering) or apply very basic rules that filter out only a few
unlikely candidates, which may harm completion performance. In this
paper we present CHAI, a method for producing more
complex rules that are able to filter candidate triples by combining
a set of criteria to optimize a fitness function. Our experiments
show that CHAI is able to generate rules that, when applied, yield
smaller candidate sets than similar proposals while still including
promising candidate triples.
Ministerio de Economía y Competitividad TIN2016-75394-
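The filtering idea above can be sketched concretely. The following is a minimal, hypothetical illustration (not CHAI's actual algorithm or API): a rule is a conjunction of simple criteria over (head, relation, tail) triples, and a fitness function rewards rules that shrink the candidate set while retaining known-correct triples. All names and the toy KG are invented for illustration.

```python
from itertools import combinations

def make_rule(criteria):
    """A rule accepts a triple only if every criterion holds (conjunction)."""
    return lambda triple: all(c(triple) for c in criteria)

def fitness(rule, candidates, known_correct):
    """Reward filtering out candidates, scaled by recall of known-correct triples."""
    kept = [t for t in candidates if rule(t)]
    filtered_frac = 1 - len(kept) / len(candidates)
    recall = sum(t in kept for t in known_correct) / len(known_correct)
    return filtered_frac * recall  # both goals must hold for high fitness

# Toy candidate triples: (head, relation, tail)
candidates = [
    ("alice", "bornIn", "paris"),
    ("alice", "bornIn", "alice"),   # self-loop, likely wrong
    ("paris", "bornIn", "alice"),   # a city born in a person, likely wrong
    ("bob", "bornIn", "london"),
]
known_correct = [("alice", "bornIn", "paris"), ("bob", "bornIn", "london")]

people, places = {"alice", "bob"}, {"paris", "london"}
criteria_pool = [
    lambda t: t[0] != t[2],         # no self-loops
    lambda t: t[0] in people,       # head must be a person
    lambda t: t[2] in places,       # tail must be a place
]

# Exhaustively search small conjunctions for the fittest rule; CHAI instead
# optimizes such combinations against its fitness function.
best = max(
    (make_rule(list(c)) for r in range(1, 4)
     for c in combinations(criteria_pool, r)),
    key=lambda rule: fitness(rule, candidates, known_correct),
)
print([t for t in candidates if best(t)])
```

The fitness multiplies the filtered fraction by recall so that a rule discarding everything (high filtering, zero recall) scores zero.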
Multi-task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs
Many popular knowledge graphs such as Freebase, YAGO or DBPedia maintain a
list of non-discrete attributes for each entity. Intuitively, these attributes
such as height, price or population count are able to richly characterize
entities in knowledge graphs. This additional source of information may help to
alleviate the inherent sparsity and incompleteness problems that are prevalent
in knowledge graphs. Unfortunately, many state-of-the-art relational learning
models ignore this information due to the challenging nature of dealing with
non-discrete data types in the inherently binary-natured knowledge graphs. In
this paper, we propose a novel multi-task neural network approach for both
encoding and prediction of non-discrete attribute information in a relational
setting. Specifically, we train a neural network for triplet prediction along
with a separate network for attribute value regression. Via multi-task
learning, we are able to learn representations of entities, relations and
attributes that encode information about both tasks. Moreover, such attributes
are not only central to many predictive tasks as an information source but also
as a prediction target. Therefore, models that are able to encode, incorporate
and predict such information in a relational learning context are highly
attractive as well. We show that our approach outperforms many state-of-the-art
methods for the tasks of relational triplet classification and attribute value
prediction.
Comment: Accepted at CIKM 201
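The multi-task setup described above can be illustrated with a forward-pass sketch: two heads, one for triplet plausibility and one for attribute regression, read the same entity embedding table, so joint training would shape that table with both signals. This is a hypothetical simplification, not the paper's architecture; all names, the scoring function, and the toy vocabulary are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension
entities = {"berlin": 0, "germany": 1}
relations = {"capitalOf": 0}
attributes = {"population": 0}

E = rng.normal(size=(len(entities), d))         # shared entity embeddings
R = rng.normal(size=(len(relations), d))        # relation embeddings
W_attr = rng.normal(size=(len(attributes), d))  # per-attribute regression weights

def triplet_score(h, r, t):
    """Task 1: plausibility of (h, r, t); here a simple DistMult-style score."""
    return float(E[entities[h]] @ (R[relations[r]] * E[entities[t]]))

def attr_predict(e, a):
    """Task 2: regress a non-discrete attribute value from the shared embedding."""
    return float(W_attr[attributes[a]] @ E[entities[e]])

# Both heads read the same table E; multi-task training would backpropagate
# a joint loss (triplet classification + attribute regression) so that E
# encodes both relational structure and non-discrete attribute information.
s = triplet_score("berlin", "capitalOf", "germany")
v = attr_predict("berlin", "population")
```

The design point is the shared table: updating `E` from the regression loss injects attribute information that the triplet head can then exploit, and vice versa.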
Meta-Learning with Dynamic-Memory-Based Prototypical Network for Few-Shot Event Detection
Event detection (ED), a sub-task of event extraction, involves identifying
triggers and categorizing event mentions. Existing methods primarily rely upon
supervised learning and require large-scale labeled event datasets which are
unfortunately not readily available in many real-life applications. In this
paper, we consider and reformulate the ED task with limited labeled data as a
Few-Shot Learning problem. We propose a Dynamic-Memory-Based Prototypical
Network (DMB-PN), which exploits Dynamic Memory Network (DMN) to not only learn
better prototypes for event types, but also produce more robust sentence
encodings for event mentions. Unlike vanilla prototypical networks, which
simply compute event prototypes by averaging and thus consume each event
mention only once, our model is more robust and can distill contextual
information from event mentions multiple times thanks to the multi-hop
mechanism of DMNs. The experiments show that DMB-PN not only deals with sample
scarcity better than a series of baseline models but also performs more
robustly when the variety of event types is relatively large and the instance
quantity is extremely small.
Comment: Accepted by WSDM 202
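The contrast drawn above, between a single averaging pass and multi-hop refinement, can be sketched numerically. This is a loose, hypothetical simplification of a DMN-style memory update (attention-weighted re-reading of the support set), not DMB-PN itself; the hop count and attention form are assumptions.

```python
import numpy as np

def vanilla_prototype(support):
    """Vanilla prototypical network: consume each mention once, then average."""
    return support.mean(axis=0)

def multihop_prototype(support, hops=3):
    """Simplified DMN-style refinement: re-attend over the support mentions at
    each hop, so each mention can contribute more than once to the prototype."""
    m = support.mean(axis=0)        # initialize memory with the plain average
    for _ in range(hops):
        att = np.exp(support @ m)   # attention over mentions given current memory
        att /= att.sum()
        m = att @ support           # re-read the support set, attention-weighted
    return m

rng = np.random.default_rng(1)
support = rng.normal(size=(5, 4))   # 5 event-mention encodings, 4-dim each
p_avg = vanilla_prototype(support)
p_hop = multihop_prototype(support)
```

The averaging baseline is a fixed function of the support set, while the multi-hop version lets informative mentions dominate the prototype across iterations.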
DMLR: Data-centric Machine Learning Research -- Past, Present and Future
Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and
meetings prior, in this report we outline the relevance of community engagement
and infrastructure development for the creation of next-generation public
datasets that will advance machine learning science. We chart a path forward as
a collective effort to sustain the creation and maintenance of these datasets
and methods towards positive scientific, societal and business impact.Comment: This editorial report accompanies the inaugural Data-centric Machine
Learning Research (DMLR) Workshop that took place at ICML 2023
https://dmlr.ai
Introducing v0.5 of the AI Safety Benchmark from MLCommons
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark
Linear Feature Extractors Based on Mutual Information
This paper presents and evaluates two linear feature extractors based on mutual information. These feature extractors consider general dependencies between features and class labels, as opposed to well-known linear methods such as PCA, which does not consider class labels, and LDA, which uses only simple low-order dependencies. As evidenced by several simulations on high-dimensional data sets, the proposed techniques provide superior feature extraction and better dimensionality reduction while having similar computational requirements.
1. Introduction
The capabilities of a classifier are ultimately limited by the quality of the features in each input vector. In particular, when the measurement space is high-dimensional but the number of samples is limited, one is faced with the "curse of dimensionality" problem during training [3]. Feature extraction is often used to alleviate this problem. Although linear feature extractors are ultimately less flexible than the more general non-linear ..
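The contrast with PCA can be made concrete: below is a minimal sketch (not the paper's method) that scores one-dimensional linear projections by a histogram estimate of the mutual information between the projected feature and the class label. The dataset is constructed so that the class-informative axis has small variance, the case where variance-driven PCA picks the wrong direction; the bin count and direction grid are arbitrary choices for illustration.

```python
import numpy as np

def mutual_information(z, y, bins=8):
    """Histogram estimate of I(Z; Y) for a 1-D feature z and discrete labels y."""
    edges = np.histogram_bin_edges(z, bins=bins)
    zb = np.digitize(z, edges[1:-1])          # bin index in 0..bins-1
    mi = 0.0
    for b in range(bins):
        for c in np.unique(y):
            p_bc = np.mean((zb == b) & (y == c))
            if p_bc > 0:
                p_b, p_c = np.mean(zb == b), np.mean(y == c)
                mi += p_bc * np.log(p_bc / (p_b * p_c))
    return mi

rng = np.random.default_rng(2)
n = 400
y = np.repeat([0, 1], n // 2)
# Classes separated along axis 0 (small variance); axis 1 is pure noise with
# larger variance, so PCA's top component would be the uninformative one.
X = np.column_stack([
    np.where(y == 0, -1.0, 1.0) + 0.3 * rng.normal(size=n),
    3.0 * rng.normal(size=n),
])

# Crude search over candidate projection directions on the unit half-circle;
# the MI criterion picks the class-informative axis regardless of variance.
angles = np.linspace(0, np.pi, 64, endpoint=False)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
best_dir = max(dirs, key=lambda w: mutual_information(X @ w, y))
```

Unlike PCA's variance criterion, the MI score depends on the labels, which is exactly the "general dependencies between features and class labels" the abstract refers to.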