4,028 research outputs found

    Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs

    Direct answering of questions that involve multiple entities and relations is a challenge for text-based QA. This problem is most pronounced when answers can be found only by joining evidence from multiple documents. Curated knowledge graphs (KGs) may yield good answers, but are limited by their inherent incompleteness and potential staleness. This paper presents QUEST, a method that can answer complex questions directly from textual sources on the fly, by computing similarity joins over partial results from different documents. Our method is completely unsupervised, avoiding training-data bottlenecks and coping with rapidly evolving ad hoc topics and formulation styles in user questions. QUEST builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments, and computes the best answers by an algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex questions, and show that it substantially outperforms state-of-the-art baselines.
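A minimal sketch of the core idea, assuming a toy quasi KG: nodes are entity names and relational phrases, edge weights encode evidence strength, and answers are found by connecting question terminals with a cheap Steiner tree. The entities, phrases, and weights below are invented for illustration, and networkx's approximate `steiner_tree` connects individual terminal nodes, whereas QUEST's Group Steiner Trees generalize this to groups of candidate nodes per question keyword.

```python
# Toy sketch (not the authors' code): a weighted quasi KG, with answers
# read off an approximate Steiner tree connecting question terminals.
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

G = nx.Graph()
# edges mix entity nodes and relational-phrase nodes; lower weight
# means stronger retrieved evidence in this toy encoding
edges = [
    ("Einstein", "won", 0.2), ("won", "Nobel Prize", 0.3),
    ("Einstein", "born in", 0.4), ("born in", "Ulm", 0.5),
    ("Nobel Prize", "awarded in", 0.6), ("awarded in", "1921", 0.4),
]
G.add_weighted_edges_from(edges)

# terminals: question keywords matched to graph nodes
terminals = ["Einstein", "Nobel Prize", "1921"]
tree = steiner_tree(G, terminals, weight="weight")

# candidate answer nodes are the non-terminals of the cheapest tree
answers = sorted(n for n in tree.nodes if n not in terminals)
print(answers)
```

In this toy graph the tree must route through the relational phrases "won" and "awarded in", which is how joining evidence across documents surfaces connecting nodes.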

    Reading in the Disciplines: The Challenges of Adolescent Literacy

    A companion report to Carnegie's Time to Act, this study focuses on the specific skills and literacy support needed for reading in academic subject areas in the higher grades. It outlines strategies for teaching content knowledge and reading strategies together.

    On the Effect of Semantically Enriched Context Models on Software Modularization

    Many of the existing approaches for program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for modularization of a system that relies on the informal semantics of the program, encoded in the vocabulary used in the source code. Treating the source code as a collection of tokens loses the semantic information embedded within the identifiers. We try to overcome this problem by introducing context models for source code identifiers to obtain a semantic kernel, which can be used both for deriving the topics that run through the system and for clustering. In the first model, we abstract an identifier to its type representation and build on this notion of context to construct a contextual vector representation of the source code. The second notion of context is defined based on the flow of data between identifiers: a module is represented as a dependency graph whose nodes correspond to identifiers and whose edges represent the data dependencies between pairs of identifiers. We have applied our approach to 10 medium-sized open source Java projects, and show that by introducing contexts for identifiers, the quality of the modularization of the software systems is improved. Both of the context models give results that are superior to the plain vector representation of documents. In some cases, the authoritativeness of decompositions is improved by 67%. Furthermore, a more detailed evaluation of our approach on jEdit, an open source editor, demonstrates that the topics inferred by performing topic analysis on the contextual representations are more meaningful than those from the plain representation of the documents.
The proposed approach of introducing context models for source code identifiers paves the way for building tools that support developers in program comprehension tasks such as application and domain concept location, software modularization, and topic analysis.
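The second context model can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the data-flow facts and identifier names are invented, and the "context" here is simply the neighborhood of each identifier in the dependency graph, which would feed the semantic kernel used for clustering.

```python
# Sketch of the data-flow context model: a module as a directed graph
# whose nodes are identifiers and whose edges follow the flow of data.
import networkx as nx

# hypothetical data-flow facts, e.g. extracted from a method body like
#   total = price * quantity; tax = total * rate;
data_flows = [
    ("price", "total"), ("quantity", "total"),
    ("total", "tax"), ("rate", "tax"),
]

dep = nx.DiGraph()
dep.add_edges_from(data_flows)

# the context of an identifier: every identifier it exchanges data with
context = {
    n: sorted(set(dep.predecessors(n)) | set(dep.successors(n)))
    for n in dep.nodes
}
print(context["total"])  # identifiers flowing into or out of `total`
```

Identifiers with overlapping contexts would end up close under the semantic kernel, so clustering groups them into the same module candidate.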

    Sequence locally, think globally: The Darwin Tree of Life project

    The goals of the Earth Biogenome Project—to sequence the genomes of all eukaryotic life on Earth—are as daunting as they are ambitious. The Darwin Tree of Life Project was founded to demonstrate the credibility of these goals and to deliver at-scale genome sequences of unprecedented quality for a biogeographic region: the archipelago of islands that constitute Britain and Ireland. The Darwin Tree of Life Project is a collaboration between biodiversity organizations (museums, botanical gardens, and biodiversity institutes) and genomics institutes. Together, we have built a workflow that collects specimens from the field, robustly identifies them, performs sequencing, generates high-quality, curated assemblies, and releases these openly for the global community to use to build future science and conservation efforts.

    Sea of Genes: Combining Animation and Narrative Strategies to Visualize Metagenomic Data for Museums

    We examine the application of narrative strategies to present a complex and unfamiliar metagenomics dataset to the public in a science museum. Our dataset contains information about microbial gene expressions that scientists use to infer the behavior of microbes. This exhibit had three goals: to inform the public about microbes' behavior, cycles, and patterns; to link their behavior to the concept of gene expression; and to highlight scientists' use of gene expression data to understand the role of microbes. To address these three goals, we created a visualization with three narrative layers, each layer corresponding to a goal. This study presented us with an opportunity to assess existing frameworks for narrative visualization in a naturalistic setting. We present three successive rounds of design and evaluation of our attempts to engage visitors with complex data through narrative visualization. We highlight our design choices and their underlying rationale based on extant theories. We conclude that a central animation based on a curated dataset could successfully achieve our first goal, i.e., to communicate the aggregate behavior and interactions of microbes. We failed to achieve our second goal and had limited success with the third goal. Overall, this study highlights the challenges of telling multi-layered stories and the need for new frameworks for communicating layered stories in public settings.
    Comment: This manuscript has been accepted to VIS 2020 and TVCG; 9 pages

    COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs

    Recent years have brought about a renewed interest in commonsense representation and reasoning in the field of natural language understanding. The development of new commonsense knowledge graphs (CSKG) has been central to these advances, as their diverse facts can be used and referenced by machine learning models for tackling new and challenging tasks. At the same time, there remain questions about the quality and coverage of these resources due to the massive scale required to comprehensively encompass general commonsense knowledge. In this work, we posit that manually constructed CSKGs will never achieve the coverage necessary to be applicable in all situations encountered by NLP agents. Therefore, we propose a new evaluation framework for testing the utility of KGs based on how effectively implicit knowledge representations can be learned from them. With this new goal, we propose ATOMIC 2020, a new CSKG of general-purpose commonsense knowledge containing knowledge that is not readily available in pretrained language models. We evaluate its properties in comparison with other leading CSKGs, performing the first large-scale pairwise study of commonsense knowledge resources. Next, we show that ATOMIC 2020 is better suited for training knowledge models that can generate accurate, representative knowledge for new, unseen entities and events. Finally, through human evaluation, we show that the few-shot performance of GPT-3 (175B parameters), while impressive, remains ~12 absolute points lower than that of a BART-based knowledge model trained on ATOMIC 2020, despite using over 430x fewer parameters.
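Training a knowledge model on a CSKG amounts to linearizing (head, relation, tail) triples into text pairs for a seq2seq model. The sketch below shows one plausible format; the bracketed relation token and the example triples are illustrative assumptions, not the authors' released preprocessing code, though `xIntent` and `xEffect` are real ATOMIC relation types.

```python
# Illustrative sketch: turning commonsense triples into (source, target)
# text pairs for training a COMET-style seq2seq knowledge model.
triples = [
    ("PersonX pays the bill", "xIntent", "to be polite"),
    ("PersonX pays the bill", "xEffect", "has less money"),
]

def linearize(head, relation, tail):
    # source: head event plus a relation token; target: tail to generate
    return (f"{head} [{relation}]", tail)

pairs = [linearize(*t) for t in triples]
print(pairs[0])
```

At inference time the trained model receives an unseen head event plus a relation token and generates the tail, which is how such models produce knowledge for entities and events absent from the graph.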