    What conceptual graph workbenches need for natural language processing

    An important capability of the conceptual graph knowledge engineering tools now under development will be the transformation of natural language texts into graphs (conceptual parsing) and its reverse, the production of text from graphs (conceptual generation). Are the existing basic designs adequate for these tasks? Experience developing the BEELINE system's natural language capabilities suggests that good entry/editing tools, a generous but not unlimited storage capacity, and efficient, bidirectional lexical access techniques are needed to supply data structures at both the linguistic and conceptual knowledge levels. An active formalism capable of supporting declarative and procedural programs containing both linguistic and knowledge-level terms is also important. If these requirements are satisfied, future text-readers can be included as part of a conceptual knowledge workbench without unexpected problems.
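
    A minimal sketch of one requirement named above, bidirectional lexical access: the same lexicon must be traversable from word forms to concept types (conceptual parsing) and from concept types back to word forms (conceptual generation). The Python class and names below are purely illustrative and are not taken from the BEELINE system.

        # Illustrative only: a bidirectional lexical index supporting both
        # parsing (word -> concepts) and generation (concept -> words).
        from collections import defaultdict

        class BidirectionalLexicon:
            def __init__(self):
                self.word_to_concepts = defaultdict(set)   # parsing direction
                self.concept_to_words = defaultdict(set)   # generation direction

            def add(self, word, concept):
                self.word_to_concepts[word].add(concept)
                self.concept_to_words[concept].add(word)

            def concepts_for(self, word):     # conceptual parsing lookup
                return self.word_to_concepts.get(word, set())

            def words_for(self, concept):     # conceptual generation lookup
                return self.concept_to_words.get(concept, set())

        lex = BidirectionalLexicon()
        lex.add("cat", "Animal:Cat")
        lex.add("feline", "Animal:Cat")
        print(lex.words_for("Animal:Cat"))    # {'cat', 'feline'}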

    Social Web Communities

    Blogs, Wikis, and Social Bookmark Tools have rapidly emerged on the Web. The reasons for their immediate success are that people are happy to share information, and that these tools provide an infrastructure for doing so without requiring any specific skills. At the moment, there exists no foundational research for these systems, and they provide only very simple structures for organising knowledge. Individual users create their own structures, but these currently cannot be exploited for knowledge sharing. The objective of the seminar was to provide theoretical foundations for upcoming Web 2.0 applications and to investigate further applications that go beyond bookmark- and file-sharing. The main research question can be summarized as follows: how will current and emerging resource sharing systems support users in leveraging more knowledge and power from the information they share on Web 2.0 applications? Research areas such as the Semantic Web, Machine Learning, Information Retrieval, Information Extraction, Social Network Analysis, Natural Language Processing, Library and Information Sciences, and Hypermedia Systems have been working on these questions for a while. In the workshop, researchers from these areas came together to assess the state of the art and to set up a road map describing the next steps towards the next generation of social software.

    Natural language generation as neural sequence learning and beyond

    Natural Language Generation (NLG) is the task of generating natural language (e.g., English sentences) from machine-readable input. In the past few years, deep neural networks have received great attention from the natural language processing community due to impressive performance across different tasks. This thesis addresses NLG problems with deep neural networks from two different modelling views. Under the first view, natural language sentences are modelled as sequences of words, which greatly simplifies their representation and allows us to apply classic sequence modelling neural networks (i.e., recurrent neural networks) to various NLG tasks. Under the second view, natural language sentences are modelled as dependency trees, which are more expressive and allow us to capture linguistic generalisations, leading to neural models that operate on tree structures. Specifically, this thesis develops several novel neural models for natural language generation. Contrary to many existing models which aim to generate a single sentence, we propose a novel hierarchical recurrent neural network architecture to represent and generate multiple sentences. Beyond the hierarchical recurrent structure, we also propose a means to model context dynamically during generation. We apply this model to the task of Chinese poetry generation and show that it outperforms competitive poetry generation systems. Neural natural language generation models usually work well when there is a lot of training data. When the training data is not sufficient, prior knowledge for the task at hand becomes very important. To this end, we propose a deep reinforcement learning framework to inject prior knowledge into neural NLG models and apply it to sentence simplification. Experimental results show promising performance using our reinforcement learning framework. Both poetry generation and sentence simplification are tackled with models following the sequence learning view, where sentences are treated as word sequences. In this thesis, we also explore how to generate natural language sentences as tree structures. We propose a neural model which combines the advantages of syntactic structure and recurrent neural networks. More concretely, our model defines the probability of a sentence by estimating the generation probability of its dependency tree. At each time step, a node is generated based on the representation of the generated subtree. We show experimentally that this model achieves good performance in language modelling and can also generate dependency trees.
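
    The dependency-tree view described above can be illustrated with a toy factorisation: the probability of a sentence is the product of per-node probabilities, each conditioned on the subtree generated so far. The sketch below is not the thesis model; node_logprob is a placeholder for a learned neural scorer (e.g., a recurrent network over the generated nodes).

        # Illustrative sketch: score a sentence via its dependency tree,
        # generating each node conditioned on the subtree built so far.
        import math
        from dataclasses import dataclass, field

        @dataclass
        class DepNode:
            word: str
            children: list = field(default_factory=list)

        def node_logprob(word, generated_so_far):
            # Placeholder: a real model would encode generated_so_far with a
            # recurrent network and return log P(word | subtree representation).
            return math.log(1.0 / (1 + len(generated_so_far)))

        def tree_logprob(node, generated=None):
            generated = [] if generated is None else generated
            lp = node_logprob(node.word, generated)
            generated.append(node.word)
            for child in node.children:   # children are generated given the growing subtree
                lp += tree_logprob(child, generated)
            return lp

        root = DepNode("likes", [DepNode("She"), DepNode("tea", [DepNode("green")])])
        print(tree_logprob(root))   # log-probability of the whole dependency tree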

    DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text

    Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying solely on their internal knowledge, especially when answering questions that require less commonly known information. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge. Nonetheless, recent approaches have primarily emphasized retrieval from unstructured text corpora, owing to its seamless integration into prompts. When using structured data such as knowledge graphs, most methods simplify it into natural text, neglecting the underlying structures. Moreover, a significant gap in the current landscape is the absence of a realistic benchmark for evaluating the effectiveness of grounding LLMs on heterogeneous knowledge sources (e.g., knowledge base and text). To fill this gap, we have curated a comprehensive dataset that poses two unique challenges: (1) two-hop multi-source questions that require retrieving information from both open-domain structured and unstructured knowledge sources, where retrieving information from structured knowledge sources is a critical component in correctly answering the questions; (2) the generation of symbolic queries (e.g., SPARQL for Wikidata), which adds another layer of challenge. Our dataset is created using a combination of automatic generation through predefined reasoning chains and human annotation. We also introduce a novel approach that leverages multiple retrieval tools, including text passage retrieval and symbolic language-assisted retrieval. Our model outperforms previous approaches by a significant margin, demonstrating its effectiveness in addressing the above-mentioned reasoning challenges.
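
    As an illustration of symbolic language-assisted retrieval over Wikidata (not the DIVKNOWQA system itself), the sketch below issues a SPARQL query for the first, structured hop and would hand the intermediate answer to a text retriever for the second hop; text_retrieve is a hypothetical stand-in, and the use of the SPARQLWrapper package is an assumption.

        # Illustrative sketch: SPARQL-based retrieval for the structured hop of a
        # two-hop, multi-source question.
        from SPARQLWrapper import SPARQLWrapper, JSON

        def wikidata_lookup(sparql_query):
            endpoint = SPARQLWrapper("https://query.wikidata.org/sparql")
            endpoint.setQuery(sparql_query)
            endpoint.setReturnFormat(JSON)
            return endpoint.query().convert()["results"]["bindings"]

        # Hop 1 (structured): country of citizenship (P27) of Douglas Adams (Q42).
        query = """
        SELECT ?countryLabel WHERE {
          wd:Q42 wdt:P27 ?country .
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        }
        """
        bindings = wikidata_lookup(query)
        country = bindings[0]["countryLabel"]["value"] if bindings else None
        print(country)

        # Hop 2 (unstructured): pass the intermediate answer to a passage retriever.
        # passages = text_retrieve(f"current population of {country}")   # hypothetical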

    A Study Towards Spanish Abstract Meaning Representation

    Taking into account the increasing attention that researchers in Natural Language Understanding (NLU) and Natural Language Generation (NLG) are paying to Computational Semantics, we analyze the feasibility of annotating Spanish Abstract Meaning Representations. The Abstract Meaning Representation (AMR) project aims to create a large-scale sembank of simple structures that represent the unified, complete semantic information contained in English sentences. Although AMR is not intended to be an interlingua, one of its key features is the ability to focus on events rather than on word forms, for instance by abstracting away from morpho-syntactic idiosyncrasies. In this thesis, we investigate the requirements for annotating Spanish AMRs and put forward a proposal for doing so, based on the premise that many of these idiosyncrasies mark differences between languages. To our knowledge, this is the first work towards the development of Abstract Meaning Representation for Spanish.
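
    A brief illustration of the abstraction at issue: the classic AMR for "The boy wants to eat" would, under the same event-centred analysis, also serve the Spanish "El niño quiere comer", since morpho-syntactic details such as the infinitive marking are abstracted away. The snippet below uses the penman Python package to read the graph; both the package and the example are illustrative assumptions, not part of the thesis.

        # Illustrative only: one AMR shared across an English sentence and its
        # Spanish counterpart, decoded from PENMAN notation into triples.
        import penman

        amr = """
        (w / want-01
           :ARG0 (b / boy)
           :ARG1 (e / eat-01
                    :ARG0 b))
        """
        graph = penman.decode(amr)
        for source, role, target in graph.triples:
            print(source, role, target)
        # ('w', ':instance', 'want-01'), ('w', ':ARG0', 'b'), ...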

    Semantic Structure based Query Graph Prediction for Question Answering over Knowledge Graph

    Building query graphs from questions is an important step in complex question answering over knowledge graphs (Complex KGQA). In general, a question can be correctly answered if its query graph is built correctly and the right answer is then retrieved by issuing the query graph against the KG. Therefore, this paper focuses on query graph generation from natural language questions. Existing approaches for query graph generation ignore the semantic structure of a question, resulting in a large number of noisy query graph candidates that undermine prediction accuracy. In this paper, we define six semantic structures from common questions in KGQA, develop a novel Structure-BERT to predict the semantic structure of a question, and then rank the remaining candidates with a BERT-based ranking model. Extensive experiments on two popular benchmarks, MetaQA and WebQuestionsSP, demonstrate the effectiveness of our method compared to the state of the art.
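
    The two-stage idea can be sketched as follows: predict a coarse semantic structure for the question, discard candidates whose structure does not match, and rank the remainder. In this sketch, predict_structure and score_candidate are placeholders for the paper's Structure-BERT classifier and BERT-based ranker, and the structure labels are illustrative, not the six defined in the paper.

        # Minimal sketch of structure-based pruning followed by candidate ranking.
        def predict_structure(question):
            # Placeholder for Structure-BERT: returns a semantic-structure label.
            return "chain-2"

        def score_candidate(question, query_graph):
            # Placeholder for a BERT-based ranker: higher score = better match.
            return -len(query_graph["edges"])

        def select_query_graph(question, candidates):
            structure = predict_structure(question)
            filtered = [c for c in candidates if c["structure"] == structure]  # prune noisy candidates
            return max(filtered, key=lambda c: score_candidate(question, c)) if filtered else None

        candidates = [
            {"structure": "chain-2", "edges": ["directed_by", "place_of_birth"]},
            {"structure": "branch", "edges": ["directed_by", "starred_in", "place_of_birth"]},
        ]
        print(select_query_graph("Where was the director of Inception born?", candidates))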

    A uniform computational model for natural language parsing and generation

    In the area of natural language processing, there has been a strong tendency in recent years towards reversible natural language grammars, i.e., the use of one and the same grammar for grammatical analysis (parsing) and grammatical synthesis (generation) in a natural language system. The idea of representing grammatical knowledge only once and of using it for performing both tasks seems quite plausible, and there are many arguments based on practical and psychological considerations for adopting such a view (in section 2.1 we discuss the most important arguments in more detail). Nevertheless, in almost all large natural language systems in which parsing and generation are considered in similar depth, different algorithms are used, even when the same grammar is used. At present, the first attempts are being made at uniform architectures based on the paradigm of natural language processing as deduction (they are described and discussed in detail in section 2.3). Here, grammatical processing is performed by means of the same underlying deduction mechanism, which can be parameterized for the specific task at hand. Natural language processing based on a uniform deduction process has a formal elegance and results in more compact systems. There is one further advantage that is of both theoretical and practical relevance: a uniform architecture offers the possibility of viewing parsing and generation as strongly interleaved tasks. Interleaving parsing and generation is important if we assume that natural language understanding and production are not performed in isolation but rather can work together to obtain a flexible use of language. In particular, this means (a) the use of one mode of operation for monitoring the other, and (b) the direct use of structures resulting from one direction in the other. For example, during generation, integrated parsing can be used to monitor the generation process and to trigger some kind of revision, e.g., to reduce the risk of misunderstandings. Research on monitoring and revision strategies is a very active area in cognitive science; however, there currently exists no algorithmic model of such behaviour. A uniform architecture can be an important step in that direction. Unfortunately, the currently proposed uniform architectures are very inefficient, and it is as yet unclear how an efficiency-oriented uniform model could be achieved. An obvious problem is that different input structures are involved in each direction (a string for parsing and a semantic expression for generation), which causes a different traversal of the search space defined by the grammar. Even if this problem were solved, it is not obvious how a uniform model could efficiently re-use partial results computed in one direction in the other direction, so as to obtain a practical interleaved approach to parsing and generation.
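
    The "one grammar, one deduction mechanism" idea can be made concrete with a toy sketch: each rule pairs a surface form with a semantic term, and a single search procedure is parameterized by which side of the pairing is given (the string for parsing, the semantic expression for generation). This is purely illustrative and is not the dissertation's architecture.

        # Toy reversible grammar: the same rules drive parsing and generation.
        RULES = [
            ("mary", "mary"),
            ("sleeps", "sleep"),
            ("laughs", "laugh"),
        ]

        def derive(known, known_index):
            # known_index 0 = parsing (word given, return semantics),
            # known_index 1 = generation (semantics given, return word).
            return [rule[1 - known_index] for rule in RULES if rule[known_index] == known]

        def parse(words):
            # "mary sleeps" -> ('sleep', 'mary'): predicate applied to its argument.
            subject, verb = derive(words[0], 0), derive(words[1], 0)
            return [(pred, arg) for arg in subject for pred in verb]

        def generate(sem):
            pred, arg = sem
            subjects, verbs = derive(arg, 1), derive(pred, 1)
            return [f"{s} {v}" for s in subjects for v in verbs]

        print(parse(["mary", "sleeps"]))     # [('sleep', 'mary')]
        print(generate(("sleep", "mary")))   # ['mary sleeps']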
