2,385 research outputs found

    Etiology of sarcoidosis

    Get PDF

    Maximizing Equitable Reach and Accessibility of ETDs

    Full text link
    This poster addresses accessibility issues of electronic theses and dissertations (ETDs) in digital libraries (DLs). ETDs are available primarily as PDF files, which present barriers to equitable access, especially for users with visual impairments, cognitive or learning disabilities, or for anyone needing more efficient and effective ways of finding relevant information within these long documents. We propose using AI techniques, including natural language processing (NLP), computer vision, and text analysis, to convert PDFs into machine-readable HTML documents with semantic tags and structure, extracting figures and tables, and generating summaries and keywords. Our goal is to increase the accessibility of ETDs and to make this important scholarship available to a wider audience

    Opening Books and the National Corpus of Graduate Research

    Get PDF
    Virginia Tech University Libraries, in collaboration with Virginia Tech Department of Computer Science and Old Dominion University Department of Computer Science, request $505,214 in grant funding for a 3-year project, the goal of which is to bring computational access to book-length documents, demonstrating that with Electronic Theses and Dissertations (ETDs). The project is motivated by the following library and community needs. (1) Despite huge volumes of book-length documents in digital libraries, there is a lack of models offering effective and efficient computational access to these long documents. (2) Nationwide open access services for ETDs generally function at the metadata level. Much important knowledge and scientific data lie hidden in ETDs, and we need better tools to mine the content and facilitate the identification, discovery, and reuse of these important components. (3) A wide range of audiences can potentially benefit from this research, including but not limited to Librarians, Students, Authors, Educators, Researchers, and other interested readers. We will answer the following key research questions: (1) How can we effectively identify and extract key parts (chapters, sections, tables, figures, citations), in both born digital and page image formats? (2) How can we develop effective automatic classication as well as chapter summarization techniques? (3) How can our ETD digital library most effectively serve stakeholders? In response to these questions, we plan to first compile an ETD corpus consisting of at least 50,000 documents from multiple institutional repositories. We will make the corpus inclusive and diverse, covering a range of degrees (master’s and doctoral), years, graduate programs (STEM and non-STEM), and authors (from HBCUs and non-HBCUs). Testing first with this sample, we will investigate three major research areas (RAs), outlined below. RA 1: Document analysis and extraction, in which we experiment with machine/deep learning models for effective ETD segmentation and subsequent information extraction. Anticipated results of this research include new software tools that can be used and adapted by libraries for automatic extraction of structural metadata and document components (chapters, sections, figures, tables, citations, bibliographies) from ETDs - applied to both page image and born digital documents. RA 2: Adding value, in which we investigate techniques and build machine/deep learning models to automatically summarize and classify ETD chapters. Anticipated results of this research include software implementations of a chapter-level text summarizer that generates paragraph-length summaries of ETD chapters, and a multi-label classifier that assigns subject categories to ETD chapters. Our aim is to develop software that can be adapted or replicated by libraries to add value to their existing ETD services. RA 3: User services, in which we study users to identify and understand their information needs and information seeking behaviors, so that we may establish corresponding requirements for user interface and service components most useful for interacting with ETD content. Basing our design decisions on empirical evidence obtained from user analysis, we will construct a prototype system to demonstrate how these components can improve the user experience with ETD collections, and ultimately increase the capacity of libraries to provide access to ETDs and other long-form document content. Our project brings to bear cutting-edge computer science and machine/deep learning technologies to advance discovery, use, and potential for reuse of the knowledge hidden in the text of books and book-length documents. In addition, by focusing on libraries\u27 ETD collections (where legal restrictions from book publishers generally are not applicable), our research will open this rich corpus of graduate research and scholarship, leverage ETDs to advance further research and education, and allow libraries to achieve greater impact

    Automatic Metadata Extraction Incorporating Visual Features from Scanned Electronic Theses and Dissertations

    Get PDF
    Electronic Theses and Dissertations (ETDs) contain domain knowledge that can be used for many digital library tasks, such as analyzing citation networks and predicting research trends. Automatic metadata extraction is important to build scalable digital library search engines. Most existing methods are designed for born-digital documents, so they often fail to extract metadata from scanned documents such as for ETDs. Traditional sequence tagging methods mainly rely on text-based features. In this paper, we propose a conditional random field (CRF) model that combines text-based and visual features. To verify the robustness of our model, we extended an existing corpus and created a new ground truth corpus consisting of 500 ETD cover pages with human validated metadata. Our experiments show that CRF with visual features outperformed both a heuristic and a CRF model with only text-based features. The proposed model achieved 81.3%-96% F1 measure on seven metadata fields. The data and source code are publicly available on Google Drive (https://tinyurl.com/y8kxzwrp) and a GitHub repository (https://github.com/lamps-lab/ETDMiner/tree/master/etd_crf), respectively.Comment: 7 pages, 4 figures, 1 table. Accepted by JCDL '21 as a short pape

    MetaEnhance: Metadata Quality Improvement for Electronic Theses and Dissertations of University Libraries

    Full text link
    Metadata quality is crucial for digital objects to be discovered through digital library interfaces. However, due to various reasons, the metadata of digital objects often exhibits incomplete, inconsistent, and incorrect values. We investigate methods to automatically detect, correct, and canonicalize scholarly metadata, using seven key fields of electronic theses and dissertations (ETDs) as a case study. We propose MetaEnhance, a framework that utilizes state-of-the-art artificial intelligence methods to improve the quality of these fields. To evaluate MetaEnhance, we compiled a metadata quality evaluation benchmark containing 500 ETDs, by combining subsets sampled using multiple criteria. We tested MetaEnhance on this benchmark and found that the proposed methods achieved nearly perfect F1-scores in detecting errors and F1-scores in correcting errors ranging from 0.85 to 1.00 for five of seven fields.Comment: 7 pages, 3 tables, and 1 figure. Accepted by 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL '23) as a short pape

    Development of a Pressure Sensitive Paint System for Measuring Global Surface Pressures on Rotorcraft Blades

    Get PDF
    This paper will describe the results from a proof of concept test to examine the feasibility of using Pressure Sensitive Paint (PSP) to measure global surface pressures on rotorcraft blades in hover. The test was performed using the U.S. Army 2-meter Rotor Test Stand (2MRTS) and 15% scale swept rotor blades. Data were collected from five blades using both the intensity- and lifetime-based approaches. This paper will also outline several modifications and improvements that are underway to develop a system capable of measuring pressure distributions on up to four blades simultaneously at hover and forward flight conditions

    The Resilient Organization: A Meta-Analysis of the Effect of Communication on Team Diversity and Team Performance

    Get PDF
    The Input-Process-Output framework is adopted to examine the impact of diversity attributes (the input) on communication (the process) and their influence on performance (the output), to understand the internal group/team working mechanisms of organizational resilience. A meta-analysis of 174 correlations from 35 empirical studies undertaken over 35 years (1982-2017) showed that members of a team who have different experiences are more likely to share information and communicate openly when they deal with a task that requires collaboration outside the team. This supports the view that organizations are more resilient by being more closely connected with the external environment. Differences in social categories tend to favor openness of communication, especially in the case of age diversity and race/ethnicity diversity. An increase in openness of communication is likely to enhance team performance, particularly for small and medium sized teams operating in manufacturing industries, while frequency of communication can be beneficial for both large and medium sized teams working in the high technology industry. The positive workings of these associations form the resilient organization
    corecore