
    Citation recommendation via proximity full-text citation analysis and supervised topical prior

    Many publications are now available electronically and online. This has had a significant impact, but it has also brought several challenges. With the objective of enhancing citation recommendation through innovative text and graph mining algorithms combined with full-text citation analysis, we used proximity-based citation contexts extracted from a large number of full-text publications. We then used a publication/citation topic distribution to generate a novel citation graph from which to calculate each publication's topical importance. This importance score can serve as a new means of improving recommendation performance. Experiments with full-text citation data showed that the novel method significantly (p < 0.001) improved citation recommendation performance.
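
    The abstract does not spell out how topical importance is computed on the citation graph. One common way to combine a citation graph with a topic prior is a personalized PageRank whose teleportation vector is the topic weight of each publication. The sketch below is an illustration of that generic technique, not the authors' method; the function name `topical_pagerank`, the `topic_weight` input, and the damping and iteration defaults are all assumptions.

```python
from typing import Dict, List


def topical_pagerank(
    citations: Dict[str, List[str]],
    topic_weight: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 100,
) -> Dict[str, float]:
    """Hypothetical sketch: personalized PageRank with a topic prior.

    citations maps each citing paper to the papers it cites (edges point
    from citing to cited). topic_weight is an assumed per-paper topical
    relevance score used as the teleportation (prior) distribution.
    """
    nodes = sorted(set(citations) | {t for ts in citations.values() for t in ts})
    total = sum(topic_weight.get(n, 0.0) for n in nodes)
    # Normalize topic weights into a prior; fall back to uniform if all zero.
    prior = {
        n: topic_weight.get(n, 0.0) / total if total > 0 else 1.0 / len(nodes)
        for n in nodes
    }
    score = dict(prior)
    for _ in range(iterations):
        # Teleportation mass is distributed according to the topic prior.
        nxt = {n: (1.0 - damping) * prior[n] for n in nodes}
        for n in nodes:
            cited = citations.get(n, [])
            if cited:
                # Split this paper's score evenly among the papers it cites.
                share = damping * score[n] / len(cited)
                for t in cited:
                    nxt[t] += share
            else:
                # Dangling paper (cites nothing): redistribute via the prior.
                for m in nodes:
                    nxt[m] += damping * score[n] * prior[m]
        score = nxt
    return score
```

    For example, `topical_pagerank({"A": ["B", "C"], "B": ["C"], "C": []}, {"A": 0.2, "B": 0.3, "C": 0.5})` ranks the most-cited, most on-topic paper `C` highest. Using the prior for both teleportation and dangling nodes keeps the scores summing to 1.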

    Interactive visualization systems and data integration methods for supporting discovery in collections of scientific information

    Technological developments have enabled greater sharing and reuse of scientific information. Current indexing methods support query-based search and filtering, but they do not support overviews and exploration. Because of these limitations, it is challenging to discover records and connections that relate information in new and potentially insightful ways. We developed prototype systems and computational methods for integrating collections from multiple sources within a domain into a single, unified graph data structure. Graph-theoretic measures and visualizations were then applied to identify relations and records that support discovery tasks. Three collections of molecular information were studied: (1) influenza protein sequences from the National Center for Biotechnology Information, (2) Open Notebook Science notebooks and databases from Drexel University and other academic chemical research laboratories, and (3) project data from drug discovery projects at Pfizer R&D. We designed methods for data integration within these collections. We then analyzed the integrated collections to design interactive visual tools and computational methods that could systematically identify relations and records with a high potential to lead to novel discoveries in these areas. We conducted interviews with domain experts to evaluate the effectiveness of these designs. These studies demonstrate the feasibility of the new indexing methods for improving the discoverability of novel connections across multiple collections within a domain. Ph.D., Information Science -- Drexel University, 201
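
    As a rough illustration of integrating several collections into one graph and then applying a graph-theoretic measure, the sketch below links records from different sources whenever they share an identifier, then counts each record's cross-source neighbours (a simple degree-style measure of how strongly it bridges collections). This is a generic, hypothetical example: the record tuple layout, the `integrate` and `cross_source_degree` names, and the source labels are assumptions, not the dissertation's actual integration method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (source name, record id, shareable identifiers).
Record = Tuple[str, str, Set[str]]


def integrate(records: Iterable[Record]) -> Dict[str, Set[str]]:
    """Build one undirected graph linking records that share any identifier."""
    graph: Dict[str, Set[str]] = {}
    by_key: Dict[str, List[str]] = defaultdict(list)
    for source, rid, keys in records:
        node = f"{source}:{rid}"
        graph.setdefault(node, set())  # keep records with no shared keys
        for key in keys:
            by_key[key].append(node)
    # Every pair of records that mention the same identifier gets an edge.
    for nodes in by_key.values():
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                if a != b:
                    graph[a].add(b)
                    graph[b].add(a)
    return graph


def cross_source_degree(graph: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count each node's neighbours that come from a different source."""
    def source(node: str) -> str:
        return node.split(":", 1)[0]

    return {
        node: sum(1 for nbr in nbrs if source(nbr) != source(node))
        for node, nbrs in graph.items()
    }
```

    With hypothetical inputs `[("ncbi", "P1", {"k1"}), ("notebook", "N7", {"k1", "k2"}), ("pfizer", "D3", {"k2"}), ("ncbi", "P2", set())]`, the notebook record `N7` bridges the other two collections and gets the highest cross-source degree (2). A real system would likely use richer entity matching and centrality measures.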

    Deep Understanding of Technical Documents : Automated Generation of Pseudocode from Digital Diagrams & Analysis/Synthesis of Mathematical Formulas

    The technical document is an entity that consists of several essential and interconnected parts, often referred to as modalities. Although certain parts, such as the textual information, have already received extensive attention, several aspects remain severely under-researched. Two such modalities are diagram images and the deep automated understanding of mathematical formulas. Inspired by existing holistic approaches to the deep understanding of technical documents, we develop a novel formal scheme for modelling digital diagram images. This extends to a generative framework that allows for the creation of artificial images and their annotation. We contribute to the field a novel synthetic dataset and its generation mechanism. We propose converting the pseudocode generation problem into an image captioning task and provide a family of techniques based on adaptive image partitioning. We address the semantic understanding of mathematical formulas by conducting an evaluative survey of the field, published in May 2021. We then propose a formal synthesis framework that uses formula graphs as metadata, aiming to produce novel, valuable formulas. The synthesis framework is validated by a deep geometric learning mechanism that draws on external formula data to simulate the missing a priori knowledge. We close with a proof of concept, a description of the overall pipeline, and our future aims.
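
    The abstract mentions formula graphs without defining their construction. A minimal way to turn a formula into a graph is to parse it into an expression tree whose operators and operands become nodes. The sketch below does this with Python's standard `ast` module; the `formula_graph` name, the label scheme (operator class names such as `Add` and `Mult`), and the parent-to-child edge convention are illustrative assumptions, not the framework described above.

```python
import ast
from typing import List, Tuple


def formula_graph(expr: str) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Hypothetical sketch: parse an arithmetic formula into a tree graph.

    Returns node labels (indexed by position) and parent -> child edges.
    """
    labels: List[str] = []
    edges: List[Tuple[int, int]] = []

    def visit(node: ast.AST) -> int:
        idx = len(labels)
        if isinstance(node, ast.BinOp):
            labels.append(type(node.op).__name__)  # e.g. "Add", "Mult", "Pow"
            for child in (node.left, node.right):
                edges.append((idx, visit(child)))
        elif isinstance(node, ast.UnaryOp):
            labels.append(type(node.op).__name__)  # e.g. "USub"
            edges.append((idx, visit(node.operand)))
        elif isinstance(node, ast.Name):
            labels.append(node.id)  # variable symbol
        elif isinstance(node, ast.Constant):
            labels.append(repr(node.value))  # numeric literal
        else:
            raise ValueError(f"unsupported syntax: {type(node).__name__}")
        return idx

    visit(ast.parse(expr, mode="eval").body)
    return labels, edges
```

    For instance, `formula_graph("a*x + b")` yields the labels `["Add", "Mult", "a", "x", "b"]`, with `Add` as the root connected to `Mult` and `b`. A graph like this could then be fed to a graph-based learning model, though the actual representation used in the work may differ.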

    Winter 2011
