A Survey on Interpretable Cross-modal Reasoning
In recent years, cross-modal reasoning (CMR), the process of understanding
and reasoning across different modalities, has emerged as a pivotal area with
applications spanning from multimedia analysis to healthcare diagnostics. As
the deployment of AI systems becomes more ubiquitous, the demand for
transparency and comprehensibility in these systems' decision-making processes
has intensified. This survey delves into the realm of interpretable cross-modal
reasoning (I-CMR), where the objective is not only to achieve high predictive
performance but also to provide human-understandable explanations for the
results. This survey presents a comprehensive overview of the typical methods
with a three-level taxonomy for I-CMR. Furthermore, this survey reviews the
existing CMR datasets with annotations for explanations. Finally, this survey
summarizes the challenges for I-CMR and discusses potential future directions.
In conclusion, this survey aims to catalyze the progress of this emerging
research area by providing researchers with a panoramic and comprehensive
perspective, illuminating the state of the art and discerning the
opportunities
A survey on knowledge-enhanced multimodal learning
Multimodal learning has been a field of increasing interest, aiming to
combine various modalities in a single joint representation. Especially in the
area of visiolinguistic (VL) learning, multiple models and techniques have been
developed, targeting a variety of tasks that involve images and text. VL models
have reached unprecedented performances by extending the idea of Transformers,
so that both modalities can learn from each other. Massive pre-training
procedures enable VL models to acquire a certain level of real-world
understanding, although many gaps can be identified: the limited comprehension
of commonsense, factual, temporal and other everyday knowledge aspects
questions the extendability of VL tasks. Knowledge graphs and other knowledge
sources can fill those gaps by explicitly providing missing information,
unlocking novel capabilities of VL models. At the same time, knowledge graphs
enhance explainability, fairness and validity of decision making, issues of
utmost importance for such complex implementations. The current survey aims
to unify the fields of VL representation learning and knowledge graphs, and
provides a taxonomy and analysis of knowledge-enhanced VL models.
Relation Classification with Limited Supervision
Large reams of unstructured data, for instance in the form of textual document collections containing entities and relations, exist in many domains. The process of deriving valuable domain insights and intelligence from such document collections usually involves the extraction of information such as the relations between the entities in such collections. Relation classification is the task of detecting relations between entities. Supervised machine learning models, which have become the tool of choice for relation classification, require substantial quantities of annotated data for each relation in order to perform optimally. For many domains, such quantities of annotated data for relations may not be readily available, and manually curating such annotations may not be practical due to time and cost constraints.
In this work, we develop both model-specific and model-agnostic approaches for relation classification with limited supervision. We start by proposing an approach for learning embeddings for contextual surface patterns, which are the set of surface patterns associated with entity pairs across a text corpus, to provide additional supervision signals for relation classification with limited supervision. We find that this approach improves classification performance on relations with limited supervision instances. However, this initial approach assumes the availability of at least one annotated instance per relation during training. In order to address this limitation, we propose an approach which formulates the task of relation classification as that of textual entailment. This reformulation allows us to use the textual descriptions of relations to classify their instances. It also allows us to utilize existing textual entailment datasets and models to classify relations with zero supervision instances.
The two methods proposed previously rely on the use of specific model architectures for relation classification. Since a wide variety of models have been proposed for relation classification in the literature, a more general approach is thus desirable. We subsequently propose our first model-agnostic meta-learning algorithm for relation classification with limited supervision. This algorithm is applicable to any gradient-optimized relation classification model. We show that the proposed approach improves the predictive performance of two existing relation classification models when supervision for relations is limited. Next, because all the approaches we have proposed so far assume the availability of all supervision needed for classifying relations prior to model training, they are unable to handle the case when new supervision for relations becomes available after training. Such new supervision may need to be incorporated into the model to enable it to classify new relations or to improve its performance on existing relations. Our last approach addresses this shortcoming. We propose a model-agnostic algorithm which enables relation classification models to learn continually from new supervision as it becomes available, while doing so in a data-efficient manner and without forgetting knowledge of previous relations.
Semantic knowledge integration for learning from semantically imprecise data
Low availability of labeled training data often poses a fundamental limit to the accuracy of computer vision applications using machine learning methods. While these methods are improved continuously, e.g., through better neural network architectures, there cannot be a single methodical change that increases the accuracy on all possible tasks. This statement, known as the no free lunch theorem, suggests that we should consider aspects of machine learning other than learning algorithms for opportunities to escape the limits set by the available training data. In this thesis, we focus on two main aspects, namely the nature of the training data, where we introduce structure into the label set using concept hierarchies, and the learning paradigm, which we change in accordance with requirements of real-world applications as opposed to more academic setups. Concept hierarchies represent semantic relations, which are sets of statements such as "a bird is an animal." We propose a hierarchical classifier to integrate this domain knowledge in a pre-existing task, thereby increasing the information the classifier has access to. While the hierarchy's leaf nodes correspond to the original set of classes, the inner nodes are "new" concepts that do not exist in the original training data. However, we posit that such "imprecise" labels are valuable and should occur naturally, e.g., as an annotator's way of expressing their uncertainty. Furthermore, the increased number of concepts leads to more possible search terms when assembling a web-crawled dataset or using an image search. We propose CHILLAX, a method that learns from semantically imprecise training data, while still offering precise predictions to integrate seamlessly into a pre-existing application.
Combining Representation Learning with Logic for Language Processing
The current state-of-the-art in many natural language processing and
automated knowledge base completion tasks is held by representation learning
methods which learn distributed vector representations of symbols via
gradient-based optimization. They require little or no hand-crafted features,
thus avoiding the need for most preprocessing steps and task-specific
assumptions. However, in many cases representation learning requires a large
amount of annotated training data to generalize well to unseen data. Such
labeled training data is provided by human annotators who often use formal
logic as the language for specifying annotations. This thesis investigates
different combinations of representation learning methods with logic for
reducing the need for annotated training data, and for improving
generalization. Comment: PhD Thesis, University College London, Submitted and accepted in 201
A study of spatial data models and their application to selecting information from pictorial databases
People have always used visual techniques to locate information in the space
surrounding them. However, with the advent of powerful computer systems and
user-friendly interfaces it has become possible to extend such techniques to stored
pictorial information. Pictorial database systems have in the past primarily used
mathematical or textual search techniques to locate specific pictures contained
within such databases. However, these techniques have largely relied upon complex
combinations of numeric and textual queries in order to find the required
pictures. Such techniques restrict users of pictorial databases to expressing what is
in essence a visual query in a numeric or character based form. What is required
is the ability to express such queries in a form that more closely matches the user's
visual memory or perception of the picture required. It is suggested in this thesis
that spatial techniques of search are important and that two of the most important
attributes of a picture are the spatial positions and the spatial relationships of
objects contained within such pictures. It is further suggested that a database
management system which allows users to indicate the nature of their query by
visually placing iconic representations of objects on an interface in spatially
appropriate positions, is a feasible method by which pictures might be found from
a pictorial database. This thesis undertakes a detailed study of spatial techniques
using a combination of historical evidence, psychological conclusions and practical
examples to demonstrate that the spatial metaphor is an important concept and that
pictures can be readily found by visually specifying the spatial positions and
relationships between objects contained within them.
Grounded Semantic Reasoning for Robotic Interaction with Real-World Objects
Robots are increasingly transitioning from specialized, single-task machines to general-purpose systems that operate in unstructured environments, such as homes, offices, and warehouses. In these real-world domains, robots need to manipulate novel objects while adapting to changes in environments and goals. Semantic knowledge, which concisely describes target domains with symbols, can potentially reveal the meaningful patterns shared between problems and environments. However, existing robots are yet to effectively reason about semantic data encoding complex relational knowledge or jointly reason about symbolic semantic data and multimodal data pertinent to robotic manipulation (e.g., object point clouds, 6-DoF poses, and attributes detected with multimodal sensing).
This dissertation develops semantic reasoning frameworks capable of modeling complex semantic knowledge grounded in robot perception and action. We show that grounded semantic reasoning enables robots to more effectively perceive, model, and interact with objects in real-world environments. Specifically, this dissertation makes the following contributions: (1) a survey providing a unified view for the diversity of works in the field by formulating semantic reasoning as the integration of knowledge sources, computational frameworks, and world representations; (2) a method for predicting missing relations in large-scale knowledge graphs by leveraging type hierarchies of entities, effectively avoiding ambiguity while maintaining generalization of multi-hop reasoning patterns; (3) a method for predicting unknown properties of objects in various environmental contexts, outperforming prior knowledge graph and statistical relational learning methods due to the use of n-ary relations for modeling object properties; (4) a method for purposeful robotic grasping that accounts for a broad range of contexts (including object visual affordance, material, state, and task constraint), outperforming existing approaches in novel contexts and for unknown objects; (5) a systematic investigation into the generalization of task-oriented grasping that includes a benchmark dataset of 250k grasps, and a novel graph neural network that incorporates semantic relations into end-to-end learning of 6-DoF grasps; (6) a method for rearranging novel objects into semantically meaningful spatial structures based on high-level language instructions, more effectively capturing multi-object spatial constraints than existing pairwise spatial representations; (7) a novel planning-inspired approach that iteratively optimizes placements of partially observed objects subject to both physical constraints and semantic constraints inferred from language instructions. Ph.D
Is Neuro-Symbolic AI Meeting its Promise in Natural Language Processing? A Structured Review
Advocates for Neuro-Symbolic Artificial Intelligence (NeSy) assert that
combining deep learning with symbolic reasoning will lead to stronger AI than
either paradigm on its own. As successful as deep learning has been, it is
generally accepted that even our best deep learning systems are not very good
at abstract reasoning. And since reasoning is inextricably linked to language,
it makes intuitive sense that Natural Language Processing (NLP) would be a
particularly well-suited candidate for NeSy. We conduct a structured review of
studies implementing NeSy for NLP, with the aim of answering the question of
whether NeSy is indeed meeting its promises: reasoning, out-of-distribution
generalization, interpretability, learning and reasoning from small data, and
transferability to new domains. We examine the impact of knowledge
representation, such as rules and semantic networks, language structure and
relational structure, and whether implicit or explicit reasoning contributes to
higher promise scores. We find that systems where logic is compiled into the
neural network lead to the most NeSy goals being satisfied, while other factors
such as knowledge representation, or type of neural architecture do not exhibit
a clear correlation with goals being met. We find many discrepancies in how
reasoning is defined, specifically in relation to human level reasoning, which
impact decisions about model architectures and drive conclusions which are not
always consistent across studies. Hence we advocate for a more methodical
approach to the application of theories of human reasoning as well as the
development of appropriate benchmarks, which we hope can lead to a better
understanding of progress in the field. We make our data and code available on
GitHub for further analysis. Comment: Surve
Metalearning
This book is open access. As one of the fastest-growing areas of research in machine learning, metalearning studies principled methods to obtain efficient models and solutions by adapting machine learning and data mining processes. This adaptation usually exploits information from past experience on other tasks and the adaptive processes can involve machine learning approaches. As a related area to metalearning and a hot topic currently, automated machine learning (AutoML) is concerned with automating the machine learning processes. Metalearning and AutoML can help AI learn to control the application of different learning methods and acquire new solutions faster without unnecessary interventions from the user. This book offers a comprehensive and thorough introduction to almost all aspects of metalearning and AutoML, covering the basic concepts and architecture, evaluation, datasets, hyperparameter optimization, ensembles and workflows, and also how this knowledge can be used to select, combine, compose, adapt and configure both algorithms and models to yield faster and better solutions to data mining and data science problems. It can thus help developers to develop systems that can improve themselves through experience. This book is a substantial update of the first edition published in 2009. It includes 18 chapters, more than twice as many as the previous version. This enabled the authors to cover the most relevant topics in more depth and incorporate the overview of recent research in the respective area. The book will be of interest to researchers and graduate students in the areas of machine learning, data mining, data science and artificial intelligence.
Metalearning is the study of principled methods that exploit metaknowledge to obtain efficient models and solutions by adapting machine learning and data mining processes.
While the variety of machine learning and data mining techniques now available can, in principle, provide good model solutions, a methodology is still needed to guide the search for the most appropriate model in an efficient way. Metalearning provides one such methodology that allows systems to become more effective through experience. This book discusses several approaches to obtaining knowledge concerning the performance of machine learning and data mining algorithms. It shows how this knowledge can be reused to select, combine, compose and adapt both algorithms and models to yield faster, more effective solutions to data mining problems. It can thus help developers improve their algorithms and also develop learning systems that can improve themselves. The book will be of interest to researchers and graduate students in the areas of machine learning, data mining and artificial intelligence.