A survey on knowledge-enhanced multimodal learning
Multimodal learning has been a field of increasing interest, aiming to
combine various modalities in a single joint representation. Especially in the
area of visiolinguistic (VL) learning, multiple models and techniques have been
developed, targeting a variety of tasks that involve images and text. VL models
have reached unprecedented performance by extending the idea of Transformers,
so that both modalities can learn from each other. Massive pre-training
procedures enable VL models to acquire a certain level of real-world
understanding, although many gaps can be identified: the limited comprehension
of commonsense, factual, temporal and other everyday knowledge aspects
questions the extendability of VL tasks. Knowledge graphs and other knowledge
sources can fill those gaps by explicitly providing missing information,
unlocking novel capabilities of VL models. At the same time, knowledge graphs
enhance explainability, fairness and the validity of decision making, issues of
utmost importance for such complex implementations. The current survey aims
to unify the fields of VL representation learning and knowledge graphs, and
provides a taxonomy and analysis of knowledge-enhanced VL models.
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World
To facilitate the research on intelligent and human-like chatbots with
multi-modal context, we introduce a new video-based multi-modal dialogue
dataset, called TikTalk. We collect 38K videos from a popular video-sharing
platform, along with 367K conversations posted by users beneath them. Users
engage in spontaneous conversations based on their multi-modal experiences from
watching videos, which helps recreate real-world chitchat context. Compared to
previous multi-modal dialogue datasets, the richer context types in TikTalk
lead to more diverse conversations, but also increase the difficulty in
capturing human interests from intricate multi-modal information to generate
personalized responses. Moreover, external knowledge is more frequently evoked
in our dataset. These facts reveal new challenges for multi-modal dialogue
models. We quantitatively demonstrate the characteristics of TikTalk, propose a
video-based multi-modal chitchat task, and evaluate several dialogue baselines.
Experimental results indicate that models incorporating large language
models (LLMs) can generate more diverse responses, while the model utilizing
knowledge graphs to introduce external knowledge performs best overall.
Furthermore, no existing model can solve all the above challenges well, leaving
considerable room for future improvement, even for LLMs with visual
extensions. Our dataset is available at
\url{https://ruc-aimind.github.io/projects/TikTalk/}.
Comment: Accepted to ACM Multimedia 202
Development of a Corpus for User-based Scientific Question Answering
Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2021.
In recent years, Question Answering (QA) tasks became particularly relevant in the research field of natural language understanding. However, the lack of good-quality datasets has been an important limiting factor in the quest for better models. Particularly in the biomedical domain, the scarcity of gold-standard labelled datasets has been a recognized obstacle, since its idiosyncrasies and complexities often require the participation of skilled domain-specific experts to produce such datasets. To address this issue, a method for automatically gathering Question-Answer pairs from online biomedical QA forums has been suggested, yielding a corpus named BiQA. The authors describe several strategies to validate this new dataset, but a manual human verification has not been conducted. With this in mind, this dissertation set out with the objectives of performing a manual verification of a sample of 1200 questions of BiQA and of expanding these questions, by adding features, into a new text corpus, BiQA2, with the goal of contributing a new corpus for biomedical QA research. Regarding the manual verification of BiQA, a methodology for its characterization was laid out and allowed the identification of an array of potential problems related to the aptness of its questions and answers, for which possible improvement solutions were presented. Concomitantly, the proposed new BiQA2 corpus, created upon the validated questions and answers from the perused samples of BiQA, builds new features similar to those observed in other biomedical corpora such as the BioASQ dataset. Both BiQA and BiQA2 were applied to deep learning strategies previously submitted to the BioASQ competition to assess their performance as a source of training data.
Although the results achieved with the models created using BiQA2 exhibit limited capability pertaining to the BioASQ challenge, they also show some potential to contribute positively to model training in tasks such as document re-ranking and answering 'yes/no' questions.
A Survey of Natural Language Generation
This paper offers a comprehensive review of the research on Natural Language
Generation (NLG) over the past two decades, especially in relation to
data-to-text generation and text-to-text generation deep learning methods, as
well as new applications of NLG technology. This survey aims to (a) give the
latest synthesis of deep learning research on the NLG core tasks, as well as
the architectures adopted in the field; (b) detail meticulously and
comprehensively various NLG tasks and datasets, and draw attention to the
challenges in NLG evaluation, focusing on different evaluation methods and
their relationships; (c) highlight some future emphasis and relatively recent
research issues that arise due to the increasing synergy between NLG and other
artificial intelligence areas, such as computer vision, text and computational
creativity.
Comment: Accepted by ACM Computing Surveys (CSUR) 202
Towards an astronomical foundation model for stars with a Transformer-based model
Rapid strides are currently being made in the field of artificial
intelligence using Transformer-based models like Large Language Models (LLMs).
The potential of these methods for creating a single, large, versatile model in
astronomy has not yet been explored. In this work, we propose a framework for
data-driven astronomy that uses the same core techniques and architecture as
used by LLMs. Using a variety of observations and labels of stars as an
example, we build a Transformer-based model and train it in a self-supervised
manner with cross-survey data sets to perform a variety of inference tasks. In
particular, we demonstrate that a model can perform both
discriminative and generative tasks even if the model was not trained or
fine-tuned to do any specific task. For example, on the discriminative task of
deriving stellar parameters from Gaia XP spectra, we achieve an accuracy of 47
K in $T_\mathrm{eff}$, 0.11 dex in $\log g$, and 0.07 dex in $\mathrm{[Fe/H]}$,
outperforming an expert model in the same setting. But the
same model can also generate XP spectra from stellar parameters, inpaint
unobserved spectral regions, extract empirical stellar loci, and even determine
the interstellar extinction curve. Our framework demonstrates that building and
training a foundation model without fine-tuning using data
and parameters from multiple surveys to predict unmeasured observations and
parameters is well within reach. Such "Large Astronomy Models" trained on large
quantities of observational data will play a large role in the analysis of
current and future large surveys.
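The abstract above describes treating heterogeneous stellar observations as tokens a Transformer can attend over, so that masked or unobserved quantities are reconstructed from the rest. The following is only a minimal schematic sketch of that idea, not the paper's architecture: all names, shapes, and the random (untrained) weights are illustrative assumptions.

```python
# Schematic sketch: stellar observations as (feature-id, value) tokens, with a
# single self-attention layer predicting a value for every feature, including
# masked ones. Weights are random; this illustrates the mechanism, not results.
import numpy as np

rng = np.random.default_rng(0)
D = 16          # embedding width (illustrative)
N_FEATURES = 4  # e.g. a few stellar labels / spectral coefficients

# Parameters that would be learned in a real model, randomly initialised here.
feat_embed = rng.normal(size=(N_FEATURES, D))   # one embedding per feature id
value_proj = rng.normal(size=(1, D))            # scalar value -> embedding
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
readout = rng.normal(size=(D, 1))               # embedding -> predicted value

MASK_VALUE = 0.0  # masked slots contribute no value information

def attend(tokens):
    """Single-head self-attention over the token sequence."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(D)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def reconstruct(values, mask):
    """Predict a value for every feature, using only the unmasked values."""
    shown = np.where(mask, MASK_VALUE, values)
    tokens = feat_embed + shown[:, None] * value_proj  # (feature, value) tokens
    return (attend(tokens) @ readout).ravel()

values = np.array([5770.0, 4.4, 0.0, 1.0])   # toy "observations" of one star
mask = np.array([False, False, True, True])  # hide the last two features
pred = reconstruct(values, mask)             # one prediction per feature
```

Because every feature is predicted from the same shared representation, the same forward pass serves both "discriminative" use (predict a hidden label from observations) and "generative" use (predict observations from labels), which is the property the abstract highlights.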
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
Whether current or near-term AI systems could be conscious is a topic of
scientific interest and increasing public concern. This report argues for, and
exemplifies, a rigorous and empirically grounded approach to AI consciousness:
assessing existing AI systems in detail, in light of our best-supported
neuroscientific theories of consciousness. We survey several prominent
scientific theories of consciousness, including recurrent processing theory,
global workspace theory, higher-order theories, predictive processing, and
attention schema theory. From these theories we derive "indicator properties"
of consciousness, elucidated in computational terms that allow us to assess AI
systems for these properties. We use these indicator properties to assess
several recent AI systems, and we discuss how future systems might implement
them. Our analysis suggests that no current AI systems are conscious, but also
suggests that there are no obvious technical barriers to building AI systems
that satisfy these indicators.
Explainable and Interpretable Face Presentation Attack Detection Methods
Decision support systems based on machine learning (ML) techniques are excelling in most artificial intelligence (AI) fields, outperforming other AI methods as well as humans. However, challenges still exist that do not favour the dominance of AI in some applications. This proposal focuses on a critical one: the lack of transparency and explainability, which reduces the trust in and accountability of an AI system. The fact that most AI methods still operate as complex black boxes leaves the inner processes that sustain their predictions unattainable. The awareness of these observations fosters the need to regulate many sensitive domains where AI has been applied, in order to interpret, explain and audit the reliability of ML-based systems.
Although modern-day biometric recognition (BR) systems are already benefiting from the performance gains achieved with AI (which can account for and learn subtle changes in the person to be authenticated or statistical mismatches between samples), the field is still in the dark ages of black-box models, without reaping the benefits of the XAI field. This work will focus on studying AI explainability in the field of biometrics, focusing on particular use cases in BR, such as verification/identification of individuals and liveness detection (LD) (aka anti-spoofing).
The main goals of this work are: i) to become acquainted with the state of the art in explainability, biometric recognition and presentation attack detection (PAD) methods; ii) to develop an experimental work xxxxx
Tasks 1st semester
(1) Study of the state of the art: bibliography review on presentation attack detection
(2) Get acquainted with the previous work of the group on the topic
(3) Data preparation and data pre-processing
(4) Define the experimental protocol, including performance metrics
(5) Perform baseline experiments
(6) Write monograph
Tasks 2nd semester
(1) Update on the state of the art
(2) Data preparation and data pre-processing
(3) Propose and implement a methodology for interpretability in biometrics
(4) Evaluation of the performance and comparison with baseline and state of the art approaches
(5) Dissertation writing
Main bibliographic references: (*)
[Doshi17] B. Kim and F. Doshi-Velez, "Interpretable machine learning: The fuss, the concrete and the questions," 2017
[Mol19] C. Molnar, Interpretable Machine Learning, 2019
[Sei18] C. Seibold, W. Samek, A. Hilsmann, and P. Eisert, "Accurate and robust neural networks for security related applications exampled by face morphing attacks," arXiv preprint arXiv:1806.04265, 2018
[Seq20] Sequeira, Ana F., João T. Pinto, Wilson Silva, Tiago Gonçalves and Cardoso, Jaime S., "Interpretable Biometrics: Should We Rethink How Presentation Attack Detection is Evaluated?", 8th IWBF2020
[Wilson18] W. Silva, K. Fernandes, M. J. Cardoso, and J. S. Cardoso, "Towards complementary explanations using deep neural networks," in Understanding and Interpreting Machine Learning in MICA. Springer, 2018
[Wilson19] W. Silva, K. Fernandes, and J. S. Cardoso, "How to produce complementary explanations using an Ensemble Model," in IJCNN. 2019
[Wilson19A] W. Silva, M. J. Cardoso, and J. S. Cardoso, "Image captioning as a proxy for Explainable Decisions," in Understanding and Interpreting Machine Learning in MICA, 2019 (submitted)
Contextualizing Artificial Intelligence: The History, Values, and Epistemology of Technology in the Philosophy of Science
Artificial intelligence (AI) and other advanced technologies pose new questions for philosophers of science regarding epistemology, science and values, and the history of science. I address these issues across three essays in this dissertation. The first essay concerns epistemic problems that emerge with existing accounts of scientific explanation when they are applied to deep neural networks (DNNs). Causal explanations in particular, which appear at first to be well suited to the task of explaining DNNs, fail to provide any such explanation. The second essay explores bias in systems of automated decision-making, and the role of various conceptions of objectivity in either reinforcing or mitigating bias. I focus on conceptions of objectivity common in social epistemology and the feminist philosophy of science. The third essay probes the history of the development of 20th-century telecommunications technology and the relationship between formal and informal systems of scientific knowledge production. Inquiring into the role that early phone and computer hackers played in the scientific development of those technologies, I untangle the messy web of relationships between various groups that had a lasting impact on this history, while engaging in a conceptual analysis of hacking and hackers.