206 research outputs found

    QNRs: toward language for intelligent machines

    Impoverished syntax and nondifferentiable vocabularies make natural language a poor medium for neural representation learning and applications. Learned, quasilinguistic neural representations (QNRs) can upgrade words to embeddings and syntax to graphs to provide a more expressive and computationally tractable medium. Graph-structured, embedding-based quasilinguistic representations can support formal and informal reasoning, human and inter-agent communication, and the development of scalable quasilinguistic corpora with characteristics of both literatures and associative memory. To achieve human-like intellectual competence, machines must be fully literate, able not only to read and learn, but to write things worth retaining as contributions to collective knowledge. In support of this goal, QNR-based systems could translate and process natural language corpora to support the aggregation, refinement, integration, extension, and application of knowledge at scale. Incremental development of QNR-based models can build on current methods in neural machine learning and, as systems mature, could potentially complement or replace today’s opaque, error-prone “foundation models” with systems that are more capable, interpretable, and epistemically reliable. Potential applications and implications are broad.
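
    As a rough illustration of the kind of structure described above (not the authors' implementation), a QNR-like expression can be sketched as a directed graph whose nodes carry embedding vectors and whose edges carry typed relations; the class names, relation labels, and dimensions below are hypothetical.

```python
# Illustrative sketch only: a graph whose nodes hold embeddings and whose
# edges hold typed relations, standing in for "words -> embeddings,
# syntax -> graphs". Labels, relations, and dimensions are hypothetical.
from dataclasses import dataclass, field
import numpy as np

rng = np.random.default_rng(0)

@dataclass
class QNRNode:
    label: str             # optional human-readable gloss
    embedding: np.ndarray  # learned vector standing in for a word or phrase

@dataclass
class QNRGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (head_idx, relation, tail_idx)

    def add_node(self, label, dim=8):
        self.nodes.append(QNRNode(label, rng.normal(size=dim)))
        return len(self.nodes) - 1

    def add_edge(self, head, relation, tail):
        self.edges.append((head, relation, tail))

g = QNRGraph()
agent = g.add_node("machine")
action = g.add_node("read")
theme = g.add_node("corpus")
g.add_edge(action, "agent", agent)
g.add_edge(action, "theme", theme)
print(len(g.nodes), "nodes,", len(g.edges), "typed edges")
```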

    Grounding semantic cognition using computational modelling and network analysis

    The overarching objective of this thesis is to further the field of grounded semantics using a range of computational and empirical studies. Over the past thirty years, there have been many algorithmic advances in the modelling of semantic cognition. A commonality across these cognitive models is a reliance on hand-engineered “toy models”. Despite incorporating newer techniques (e.g., long short-term memory), the model inputs remain unchanged. We argue that the inputs to these traditional semantic models bear little resemblance to real human experience. In this dissertation, we ground our neural network models by training them with real-world visual scenes using naturalistic photographs. Our approach is an alternative to both hand-coded features and embodied raw sensorimotor signals. We conceptually replicate the mutually reinforcing nature of hybrid (feature-based and grounded) representations using silhouettes of concrete concepts as model inputs. We next gradually develop a novel grounded cognitive semantic representation, which we call scene2vec, starting with object co-occurrences and then adding emotions and language-based tags. Limitations of our scene-based representation are identified for more abstract concepts (e.g., freedom). We further present a large-scale human semantics study, which reveals that small-world semantic network topologies are context-dependent and that scenes are the most dominant cognitive dimension. This finding leads us to conclude that there is no meaning without context. Lastly, scene2vec shows promising human-like context-sensitive stereotypes (e.g., gender role bias), and we explore how such stereotypes are reduced by targeted debiasing. In conclusion, this thesis provides support for a novel computational viewpoint on investigating meaning: scene-based grounded semantics. Future research scaling scene-based semantic models to human levels through virtual grounding has the potential to unearth new insights into the human mind and concurrently lead to advancements in artificial general intelligence by enabling robots, embodied or otherwise, to acquire and represent meaning directly from the environment.
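
    The co-occurrence starting point described above can be illustrated with a minimal sketch: each concept is represented by the objects it appears alongside across scenes, and concepts are compared by cosine similarity. The toy scenes below are invented for the example and are not the thesis's scene2vec implementation.

```python
# Sketch of a co-occurrence-based scene representation (illustrative only):
# each concept is described by how often it appears alongside other objects
# across annotated scenes. The toy scenes below are invented for the example.
from collections import Counter
import numpy as np

scenes = [
    {"dog", "leash", "park", "person"},
    {"dog", "ball", "park"},
    {"cat", "sofa", "person"},
    {"cat", "window"},
]

vocab = sorted(set().union(*scenes))
index = {obj: i for i, obj in enumerate(vocab)}

def scene_vector(concept):
    """Count the objects that co-occur with `concept` across scenes."""
    counts = Counter()
    for scene in scenes:
        if concept in scene:
            counts.update(scene - {concept})
    vec = np.zeros(len(vocab))
    for obj, c in counts.items():
        vec[index[obj]] = c
    return vec

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

print(cosine(scene_vector("dog"), scene_vector("cat")))   # share "person"
print(cosine(scene_vector("dog"), scene_vector("ball")))  # share "park"
```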

    Humanistic interpretation and machine learning

    This paper investigates how unsupervised machine learning methods might make hermeneutic interpretive text analysis more objective in the social sciences. Through a close examination of the uses of topic modeling, a popular unsupervised approach in the social sciences, it argues that the primary way in which unsupervised learning supports interpretation is by allowing interpreters to discover unanticipated information in larger and more diverse corpora and by improving the transparency of the interpretive process. This view highlights that unsupervised modeling does not eliminate the researchers’ judgments from the process of producing evidence for social scientific theories. The paper shows this by distinguishing between two prevalent attitudes toward topic modeling, i.e., topic realism and topic instrumentalism. Under neither can modeling provide social scientific evidence without the researchers’ interpretive engagement with the original text materials. Thus, unsupervised text analysis cannot improve the objectivity of interpretation by alleviating the problem of underdetermination in interpretive debate. The paper argues that the sense in which unsupervised methods can improve objectivity is by providing researchers with the resources to justify to others that their interpretations are correct. This kind of objectivity seeks to reduce suspicions in collective debate that interpretations are the products of arbitrary processes influenced by the researchers’ idiosyncratic decisions or starting points. The paper discusses this view in relation to alternative approaches to formalizing interpretation and identifies several limitations on what unsupervised learning can be expected to achieve in terms of supporting interpretive work.
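
    For readers unfamiliar with the technique under discussion, a minimal topic-modeling run looks roughly like the sketch below (scikit-learn's LDA on an invented toy corpus). The preprocessing choices, the number of topics, and the reading of the top-word lists are exactly the researcher judgments the paper argues cannot be eliminated.

```python
# Minimal topic-modeling sketch (illustrative, not the paper's setup):
# fit LDA on a toy corpus and print the top words per topic. The
# researcher still has to read and interpret these word lists.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "parliament passed the budget after a long debate",
    "the court ruled on the new election law",
    "the team won the match in extra time",
    "fans celebrated the championship victory downtown",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

words = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [words[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```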

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. (Published in the Journal of AI Research (JAIR), volume 61, pp. 75-170; 118 pages, 8 figures, 1 table.)
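
    As a point of reference for the core tasks the survey covers, the classic organisation of an NLG system into document planning, microplanning, and surface realisation can be caricatured as below; the data and rules are invented and stand in for no particular system from the survey.

```python
# Toy sketch of the classic NLG pipeline organisation the survey covers
# (document planning -> microplanning -> surface realisation); the data
# and rules are invented for illustration.
record = {"city": "Valletta", "temp_c": 29, "sky": "clear"}

def select_content(rec):
    # Document planning: decide which facts to mention.
    return [("temperature", rec["temp_c"]), ("sky", rec["sky"])]

def microplan(facts):
    # Microplanning: lexicalise and aggregate the chosen facts.
    phrases = []
    for name, value in facts:
        if name == "temperature":
            phrases.append(f"{value} degrees")
        elif name == "sky":
            phrases.append(f"{value} skies")
    return phrases

def realise(city, phrases):
    # Surface realisation: produce a grammatical sentence.
    return f"In {city}, expect {' and '.join(phrases)}."

print(realise(record["city"], microplan(select_content(record))))
```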

    Linguistic Competence and New Empiricism in Philosophy and Science

    The topic of this dissertation is the nature of linguistic competence, the capacity to understand and produce sentences of natural language. I defend the empiricist account of linguistic competence embedded in connectionist cognitive science. This strand of cognitive science has been opposed to the traditional symbolic cognitive science, coupled with transformational-generative grammar, which was committed to nativism due to the view that human cognition, including the language capacity, should be construed in terms of symbolic representations and hardwired rules. Similarly, linguistic competence in this framework was regarded as innate, rule-governed, domain-specific, and fundamentally different from performance, i.e., the idiosyncrasies and factors governing linguistic behavior. I analyze state-of-the-art connectionist, deep learning models of natural language processing, most notably large language models, to see what they can tell us about linguistic competence. Deep learning is a statistical technique for the classification of patterns through which artificial intelligence researchers train artificial neural networks containing multiple layers on gargantuan amounts of textual and/or visual data. I argue that these models suggest that linguistic competence should be construed as stochastic, pattern-based, and stemming from domain-general mechanisms. Moreover, I distinguish syntactic from semantic competence, and for each I show the ramifications of endorsing a connectionist research program as opposed to the traditional symbolic cognitive science and transformational-generative grammar. I provide a unifying front, consisting of usage-based theories, a construction grammar approach, and an embodied approach to cognition, to show that the more multimodal and diverse models are in terms of architectural features and training data, the stronger the case is for connectionist linguistic competence. I also propose discarding the competence vs. performance distinction as theoretically inferior, clearing the way for the novel, integrative account of linguistic competence, originating in connectionism and empiricism, that I propose and defend in this dissertation and put forward for the scientific and philosophical literature.
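
    A deliberately tiny illustration of what "stochastic, pattern-based" competence means in practice is sketched below: a bigram model estimates probabilities from usage rather than applying hardwired rules. It is orders of magnitude simpler than the large language models discussed in the dissertation, and is offered only to make the contrast with rule-governed accounts concrete.

```python
# Tiny illustration of "stochastic, pattern-based" language modelling:
# probabilities estimated from observed usage rather than hardwired rules.
from collections import Counter, defaultdict

corpus = "the dog chased the cat . the cat chased the mouse .".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def p_next(prev, word):
    counts = bigrams[prev]
    return counts[word] / sum(counts.values()) if counts else 0.0

print(p_next("the", "cat"))  # 0.5: 2 of 4 occurrences of "the" precede "cat"
print(p_next("the", "dog"))  # 0.25
```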

    Investigating the role of linguistic knowledge in vision and language tasks

    Artificial Intelligence (AI) has transformed the way we interact with technology, e.g., chatbots, voice-based assistants, smart devices, and so on. One particular area that has gained tremendous attention and importance is learning from multimodal data sources within AI systems. By incorporating multimodal learning into AI systems, we can bridge the gap between human and machine communication, enabling more intuitive and natural interactions. Multimodal learning is the integration of multiple sensory modalities, such as text, images, speech, and gestures, to enable machines to understand and interpret humans and the world around us more comprehensively. In this thesis we develop strategies to exploit multimodal data (specifically text and images) along with linguistic knowledge, making multimodal systems more reliable and accurate for various vision and language tasks. In the first part of the thesis, we focus on developing AI systems that can understand the visual world around us and respond in a more natural and human-like manner. This task is popularly known as image captioning. Despite significant progress on this task, the image captions generated by existing models are extremely generic and template-like for visually similar images. We address this limitation and generate detailed and image-specific captions by exploiting prior and implicit linguistic knowledge, without the need for more labeled data or computational overhead. Unlike previous work, our proposed method generates captions that reflect the image in detail. To further allow AI models to better understand and interpret context, in the second part of the thesis we leverage information from multiple modalities to gather a more comprehensive understanding of the visual data by generating scene graphs. Unlike image captioning, which provides a high-level interpretation of the scene, in this setting a key question is how the different objects and entities in the scene interact with each other. Collecting large amounts of labeled data that can capture every possible interaction is very expensive and infeasible. Hence, we propose an efficient training strategy that generates complete and informative scene graphs from incomplete and missing labels using the knowledge of label informativeness from linguistics. In the third part of the thesis, we study the narrative descriptions of images generated from human speech, i.e., natural language, to enable natural interaction between humans and machines. One fundamental and challenging problem when dealing with natural language is the task of coreference resolution. For example, in the sentence “John saw a dog. He petted it,” coreference resolution determines that “he” refers to “John” and “it” refers to the “dog.” While coreference resolution may seem straightforward to humans, it poses several significant challenges for AI systems. Without proper coreference resolution, models will struggle to derive the correct meaning and produce coherent outputs. To address this important and complex problem, we propose a novel benchmark dataset for multimodal coreference resolution to evaluate coreference resolution in text and narrative grounding in images. We also propose a weakly supervised method with rule-based linguistic knowledge to address multimodal coreference resolution without a large supervised training dataset. Finally, we address the limitations of the weakly supervised learning setup in multimodal coreference resolution by proposing a semi-supervised learning strategy.
By using a small labeled and a large unlabeled dataset with robust self-supervised and pseudo-labeled loss functions, we achieve strong performance gains for coreference resolution and narrative grounding in a data-efficient way. Our work addresses important aspects of vision and language and paves the way for interesting future avenues. In the last part of the thesis, we discuss in more detail directions for future work that are important for advancing the field and unlocking its full potential, since continued research is needed to push the boundaries of multimodal learning.
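
    The "John saw a dog. He petted it." example above lends itself to a tiny rule-based sketch of coreference, shown below: pronouns are resolved to the most recent mention satisfying simple linguistic constraints (animacy and humanness here). This only illustrates the flavour of rule-based linguistic knowledge; it is not the weakly or semi-supervised multimodal method proposed in the thesis.

```python
# Tiny rule-based coreference sketch for the example in the abstract
# ("John saw a dog. He petted it."). Pronouns are resolved to the most
# recent mention compatible with simple animacy/humanness constraints.
MENTIONS = [
    {"text": "John", "animate": True, "human": True},
    {"text": "a dog", "animate": True, "human": False},
]

PRONOUN_CONSTRAINTS = {
    "he": {"animate": True, "human": True},
    "it": {"human": False},
}

def resolve(pronoun, mentions):
    """Return the most recent mention compatible with the pronoun."""
    constraints = PRONOUN_CONSTRAINTS[pronoun.lower()]
    for mention in reversed(mentions):
        if all(mention.get(k) == v for k, v in constraints.items()):
            return mention["text"]
    return None

print(resolve("He", MENTIONS))  # John
print(resolve("it", MENTIONS))  # a dog
```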

    Scalable Methodologies and Analyses for Modality Bias and Feature Exploitation in Language-Vision Multimodal Deep Learning

    Multimodal machine learning benchmarks have grown exponentially in both capability and popularity over the last decade. Language-vision question-answering tasks such as Visual Question Answering (VQA) and Video Question Answering (video-QA) have, thanks to their high difficulty, become a particularly popular means through which to develop and test new modelling designs and methodology for multimodal deep learning. The challenging nature of VQA and video-QA tasks leaves plenty of room for innovation at every component of the deep learning pipeline, from dataset to modelling methodology. Such circumstances are ideal for innovating in the space of language-vision multimodality. Furthermore, the wider field is currently undergoing an incredible period of growth and increasing interest. I therefore aim to contribute to multiple key components of the VQA and video-QA pipeline, but specifically in a manner such that my contributions remain relevant, ‘scaling’ with the revolutionary new benchmark models and datasets of the near future instead of being rendered obsolete by them. The work in this thesis: highlights and explores the disruptive and problematic presence of language bias in the popular TVQA video-QA dataset, and proposes a dataset-invariant method to identify subsets that respond to different modalities; thoroughly explores the suitability of bilinear pooling as a language-vision fusion technique in video-QA, offering experimental and theoretical insight, and highlighting the parallels in multimodal processing with neurological theories; explores the nascent visual equivalent of language modelling (‘visual modelling’) in order to boost the power of visual features; and proposes a dataset-invariant, neurolinguistically inspired labelling scheme for use in multimodal question-answering. I explore the positive and negative results that my experiments across this thesis yield, and I conclude by discussing the limitations of my contributions and proposing future directions of study in the areas I contribute to.
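
    Since bilinear pooling is central to the fusion analysis described above, a compact sketch may help: the full bilinear form computes z_k = q^T W_k v for a language vector q and a visual vector v, and low-rank factorised variants (in the spirit of MLB/MFB) approximate it with two projections and an elementwise product. The dimensions and rank below are arbitrary, not the thesis's configuration.

```python
# Sketch of bilinear pooling as a language-vision fusion step (illustrative):
# full bilinear form z_k = q^T W_k v, plus a common low-rank factorised
# variant; dimensions and rank are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
dq, dv, dz, rank = 6, 5, 4, 3

q = rng.normal(size=dq)  # question/language feature
v = rng.normal(size=dv)  # visual feature

# Full bilinear pooling: one dq x dv matrix per output dimension.
W = rng.normal(size=(dz, dq, dv))
z_full = np.einsum("i,kij,j->k", q, W, v)

# Low-rank factorisation (MLB/MFB-style): project both modalities to a
# shared rank-r space, multiply elementwise, then sum-pool over the rank.
U = rng.normal(size=(dq, dz * rank))
V = rng.normal(size=(dv, dz * rank))
z_lowrank = ((q @ U) * (v @ V)).reshape(dz, rank).sum(axis=1)

print(z_full.shape, z_lowrank.shape)  # (4,) (4,)
```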

    From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought

    How does language inform our downstream thinking? In particular, how do humans make meaning from language, and how can we leverage a theory of linguistic meaning to build machines that think in more human-like ways? In this paper, we propose rational meaning construction, a computational framework for language-informed thinking that combines neural models of language with probabilistic models for rational inference. We frame linguistic meaning as a context-sensitive mapping from natural language into a probabilistic language of thought (PLoT), a general-purpose symbolic substrate for probabilistic, generative world modeling. Our architecture integrates two powerful computational tools that have not previously come together: we model thinking with probabilistic programs, an expressive representation for flexible commonsense reasoning; and we model meaning construction with large language models (LLMs), which support broad-coverage translation from natural language utterances to code expressions in a probabilistic programming language. We illustrate our framework in action through examples covering four core domains from cognitive science: probabilistic reasoning, logical and relational reasoning, visual and physical reasoning, and social reasoning about agents and their plans. In each, we show that LLMs can generate context-sensitive translations that capture pragmatically appropriate linguistic meanings, while Bayesian inference with the generated programs supports coherent and robust commonsense reasoning. We extend our framework to integrate cognitively motivated symbolic modules to provide a unified commonsense thinking interface from language. Finally, we explore how language can drive the construction of world models themselves.
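
    A minimal stand-in for the "utterance to probabilistic program to inference" loop is sketched below, using plain Python rejection sampling rather than a real probabilistic programming language, and with a made-up utterance and a hand-written translation in place of the LLM-generated code the paper describes.

```python
# Minimal stand-in for the "utterance -> probabilistic program -> inference"
# loop (illustrative only; the paper targets a real probabilistic programming
# language, and the translation would come from an LLM, not be hand-written).
# Utterance: "The bag has mostly red marbles; I drew two and one was blue."
# World model: unknown proportion of red marbles; condition on the draw.
import random

def prior():
    # "mostly red" ~ proportion of red skewed toward high values.
    return random.betavariate(4, 2)

def simulate_draws(p_red, n=2):
    return ["red" if random.random() < p_red else "blue" for _ in range(n)]

def posterior_samples(observation, trials=20000):
    # Rejection sampling: keep proportions whose simulated draws match.
    kept = []
    for _ in range(trials):
        p = prior()
        if sorted(simulate_draws(p)) == sorted(observation):
            kept.append(p)
    return kept

samples = posterior_samples(["red", "blue"])
print(f"posterior mean P(red) ~= {sum(samples) / len(samples):.2f}")
```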