808 research outputs found

    A survey on knowledge-enhanced multimodal learning

    Full text link
    Multimodal learning has been a field of increasing interest, aiming to combine various modalities in a single joint representation. Especially in the area of visiolinguistic (VL) learning multiple models and techniques have been developed, targeting a variety of tasks that involve images and text. VL models have reached unprecedented performances by extending the idea of Transformers, so that both modalities can learn from each other. Massive pre-training procedures enable VL models to acquire a certain level of real-world understanding, although many gaps can be identified: the limited comprehension of commonsense, factual, temporal and other everyday knowledge aspects questions the extendability of VL tasks. Knowledge graphs and other knowledge sources can fill those gaps by explicitly providing missing information, unlocking novel capabilities of VL models. In the same time, knowledge graphs enhance explainability, fairness and validity of decision making, issues of outermost importance for such complex implementations. The current survey aims to unify the fields of VL representation learning and knowledge graphs, and provides a taxonomy and analysis of knowledge-enhanced VL models

    TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World

    Full text link
    To facilitate the research on intelligent and human-like chatbots with multi-modal context, we introduce a new video-based multi-modal dialogue dataset, called TikTalk. We collect 38K videos from a popular video-sharing platform, along with 367K conversations posted by users beneath them. Users engage in spontaneous conversations based on their multi-modal experiences from watching videos, which helps recreate real-world chitchat context. Compared to previous multi-modal dialogue datasets, the richer context types in TikTalk lead to more diverse conversations, but also increase the difficulty in capturing human interests from intricate multi-modal information to generate personalized responses. Moreover, external knowledge is more frequently evoked in our dataset. These facts reveal new challenges for multi-modal dialogue models. We quantitatively demonstrate the characteristics of TikTalk, propose a video-based multi-modal chitchat task, and evaluate several dialogue baselines. Experimental results indicate that the models incorporating large language models (LLM) can generate more diverse responses, while the model utilizing knowledge graphs to introduce external knowledge performs the best overall. Furthermore, no existing model can solve all the above challenges well. There is still a large room for future improvements, even for LLM with visual extensions. Our dataset is available at \url{https://ruc-aimind.github.io/projects/TikTalk/}.Comment: Accepted to ACM Multimedia 202

    Development of a Corpus for User­based Scientific Question Answering

    Get PDF
    Tese de mestrado, Bioinformática e Biologia Computacional, Universidade de Lisboa, Faculdade de Ciências, 2021In recent years Question & Answering (QA) tasks became particularly relevant in the research field of natural language understanding. However, the lack of good quality datasets has been an important limiting factor in the quest for better models. Particularly in the biomedical domain, the scarcity of gold standard labelled datasets has been a recognized obstacle given its idiosyncrasies and complexities often require the participation of skilled domain¬specific experts in producing such datasets. To address this issue, a method for automatically gather Question¬Answer pairs from online QA biomedical forums has been suggested yielding a corpus named BiQA. The authors describe several strategies to validate this new dataset but a human manual verification has not been conducted. With this in mind, this dissertation was set out with the objectives of performing a manual verification of a sample of 1200 questions of BiQA and also to expanding these questions, by adding features, into a new corpus of text ¬ BiQA2 ¬ with the goal of contributing with a new corpusfor biomedical QA research. Regarding the manual verification of BiQA, a methodology for its characterization was laid out and allowed the identification of an array of potential problems related to the nature of its questions and answers aptness for which possible improvement solutions were presented. Concomitantly, the proposed new BiQA2 corpus ¬ created upon the validated questions and answers from the perused samples from BiQA ¬ builds new features similar to those observed in other biomedical corpus such as the BioASQ dataset. Both BiQA and BiQA2 were applied to deep learning strategies previously submitted to the BioASQ competition to assess their performance as a source of training data. Although the results achieved with the models created using BiQA2 exhibit limited capability pertaining to the BioASQ challenge, they also show some potential to contribute positively to model training in tasks such as Document re-ranking and answering to ‘yes/no’ questions

    A Survey of Natural Language Generation

    Full text link
    This paper offers a comprehensive review of the research on Natural Language Generation (NLG) over the past two decades, especially in relation to data-to-text generation and text-to-text generation deep learning methods, as well as new applications of NLG technology. This survey aims to (a) give the latest synthesis of deep learning research on the NLG core tasks, as well as the architectures adopted in the field; (b) detail meticulously and comprehensively various NLG tasks and datasets, and draw attention to the challenges in NLG evaluation, focusing on different evaluation methods and their relationships; (c) highlight some future emphasis and relatively recent research issues that arise due to the increasing synergy between NLG and other artificial intelligence areas, such as computer vision, text and computational creativity.Comment: Accepted by ACM Computing Survey (CSUR) 202

    Towards an astronomical foundation model for stars with a Transformer-based model

    Full text link
    Rapid strides are currently being made in the field of artificial intelligence using Transformer-based models like Large Language Models (LLMs). The potential of these methods for creating a single, large, versatile model in astronomy has not yet been explored. In this work, we propose a framework for data-driven astronomy that uses the same core techniques and architecture as used by LLMs. Using a variety of observations and labels of stars as an example, we build a Transformer-based model and train it in a self-supervised manner with cross-survey data sets to perform a variety of inference tasks. In particular, we demonstrate that a single\textit{single} model can perform both discriminative and generative tasks even if the model was not trained or fine-tuned to do any specific task. For example, on the discriminative task of deriving stellar parameters from Gaia XP spectra, we achieve an accuracy of 47 K in TeffT_\mathrm{eff}, 0.11 dex in logg\log{g}, and 0.07 dex in [M/H][\mathrm{M/H}], outperforming an expert XGBoost\texttt{XGBoost} model in the same setting. But the same model can also generate XP spectra from stellar parameters, inpaint unobserved spectral regions, extract empirical stellar loci, and even determine the interstellar extinction curve. Our framework demonstrates that building and training a single\textit{single} foundation model without fine-tuning using data and parameters from multiple surveys to predict unmeasured observations and parameters is well within reach. Such "Large Astronomy Models" trained on large quantities of observational data will play a large role in the analysis of current and future large surveys

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)

    Get PDF

    Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

    Full text link
    Whether current or near-term AI systems could be conscious is a topic of scientific interest and increasing public concern. This report argues for, and exemplifies, a rigorous and empirically grounded approach to AI consciousness: assessing existing AI systems in detail, in light of our best-supported neuroscientific theories of consciousness. We survey several prominent scientific theories of consciousness, including recurrent processing theory, global workspace theory, higher-order theories, predictive processing, and attention schema theory. From these theories we derive "indicator properties" of consciousness, elucidated in computational terms that allow us to assess AI systems for these properties. We use these indicator properties to assess several recent AI systems, and we discuss how future systems might implement them. Our analysis suggests that no current AI systems are conscious, but also suggests that there are no obvious technical barriers to building AI systems which satisfy these indicators

    Biases in Large Language Models: Origins, Inventory and Discussion

    Get PDF

    Explainable and Interpretable Face Presentation Attack Detection Methods

    Get PDF
    Decision support systems based on machine learning (ML) techniques are excelling in most artificial intelligence (AI) fields, over-performing other AI methods, as well as humans. However, challenges still exist that do not favour the dominance of AI in some applications. This proposal focuses on a critical one: lack of transparency and explainability, reducing trust and accountability of an AI system. The fact that most AI methods still operate as complex black boxes, makes the inner processes which sustain their predictions still unattainable. The awareness around these observations foster the need to regulate many sensitive domains where AI has been applied in order to interpret, explain and audit the reliability of the ML based systems. Although modern-day biometric recognition (BR) systems are already benefiting from the performance gains achieved with AI (which can account for and learn subtle changes in the person to be authenticated or statistical mismatches between samples), it is still in the dark ages of black box models, without reaping the benefits of the mismatches between samples), it is still in the dark ages of black box models, without reaping the benefits of the XAI field. This work will focus on studying AI explainability in the field of biometrics focusing in particular use cases in BR, such as verification/ identification of individuals and liveness detection (LD) (aka, antispoofing). The main goals of this work are: i) to become acquainted with the state-of-the-art in explainability and biometric recognition and PAD methods; ii) to develop an experimental work xxxxx Tasks 1st semester (1) Study of the state of the art- bibliography review on state of the art for presentation attack detection (2) Get acquainted with the previous work of the group in the topic (3) Data preparation and data pre-processing (3) Define the experimental protocol, including performance metrics (4) Perform baseline experiments (5) Write monography Tasks 2nd semester (1) Update on the state of the art (2) Data preparation and data pre-processing (3) Propose and implement a methodology for interpretability in biometrics (4) Evaluation of the performance and comparison with baseline and state of the art approaches (5) Dissertation writing Referências bibliográficas principais: (*) [Doshi17] B. Kim and F. Doshi-Velez, "Interpretable machine learning: The fuss, the concrete and the questions," 2017 [Mol19] Christoph Molnar. Interpretable Machine Learning. 2019 [Sei18] C. Seibold, W. Samek, A. Hilsmann, and P. Eisert, "Accurate and robust neural networks for security related applications exampled by face morphing attacks," arXiv preprint arXiv:1806.04265, 2018 [Seq20] Sequeira, Ana F., João T. Pinto, Wilson Silva, Tiago Gonçalves and Cardoso, Jaime S., "Interpretable Biometrics: Should We Rethink How Presentation Attack Detection is Evaluated?", 8th IWBF2020 [Wilson18] W. Silva, K. Fernandes, M. J. Cardoso, and J. S. Cardoso, "Towards complementary explanations using deep neural networks," in Understanding and Interpreting Machine Learning in MICA. Springer, 2018 [Wilson19] W. Silva, K. Fernandes, and J. S. Cardoso, "How to produce complementary explanations using an Ensemble Model," in IJCNN. 2019 [Wilson19A] W. Silva, M. J. Cardoso, and J. S. Cardoso, "Image captioning as a proxy for Explainable Decisions" in Understanding and Interpreting Machine Learning in MICA, 2019 (Submitted

    Contextualizing Artificial Intelligence: The History, Values, and Epistemology of Technology in the Philosophy of Science

    Get PDF
    Artificial intelligence (AI) and other advanced technologies pose new questions for philosophers of science regarding epistemology, science and values, and the history of science. I will address these issues across three essays in this dissertation. The first essay concerns epistemic problems that emerge with existing accounts of scientific explanation when they are applied to deep neural networks (DNNs). Causal explanations in particular, which appear at first to be well suited to the task of explaining DNNs, fail to provide any such explanation. The second essay will explore bias in systems of automated decision-making, and the role of various conceptions of objectivity in either reinforcing or mitigating bias. I focus on conceptions of objectivity common in social epistemology and the feminist philosophy of science. The third essay probes the history of the development of 20th century telecommunications technology and the relationship between formal and informal systems of scientific knowledge production. Inquiring into the role that early phone and computer hackers played in the scientific developments of those technologies, I untangle the messy web of relationships between various groups that had a lasting impact on this history while engaging in a conceptual analysis of hacking and hackers
    corecore