Maximizing Equitable Reach and Accessibility of ETDs
This poster addresses accessibility issues of electronic theses and
dissertations (ETDs) in digital libraries (DLs). ETDs are available primarily
as PDF files, which present barriers to equitable access, especially for users
with visual impairments, cognitive or learning disabilities, or for anyone
needing more efficient and effective ways of finding relevant information
within these long documents. We propose using AI techniques, including natural
language processing (NLP), computer vision, and text analysis, to convert PDFs
into machine-readable HTML documents with semantic tags and structure, to
extract figures and tables, and to generate summaries and keywords. Our goal
is to increase the accessibility of ETDs and to make this important scholarship
available to a wider audience.
Opening Books and the National Corpus of Graduate Research
Virginia Tech University Libraries, in collaboration with Virginia Tech Department of Computer Science and Old Dominion University Department of Computer Science, request $505,214 in grant funding for a 3-year project whose goal is to bring computational access to book-length documents, demonstrated with Electronic Theses and Dissertations (ETDs). The project is motivated by the following library and community needs. (1) Despite huge volumes of book-length documents in digital libraries, there is a lack of models offering effective and efficient computational access to these long documents. (2) Nationwide open access services for ETDs generally function at the metadata level. Much important knowledge and scientific data lie hidden in ETDs, and we need better tools to mine the content and facilitate the identification, discovery, and reuse of these important components. (3) A wide range of audiences can potentially benefit from this research, including but not limited to librarians, students, authors, educators, researchers, and other interested readers.
We will answer the following key research questions: (1) How can we effectively identify and extract key parts (chapters, sections, tables, figures, citations), in both born digital and page image formats? (2) How can we develop effective automatic classification as well as chapter summarization techniques? (3) How can our ETD digital library most effectively serve stakeholders? In response to these questions, we plan to first compile an ETD corpus consisting of at least 50,000 documents from multiple institutional repositories. We will make the corpus inclusive and diverse, covering a range of degrees (master’s and doctoral), years, graduate programs (STEM and non-STEM), and authors (from HBCUs and non-HBCUs). Testing first with this sample, we will investigate three major research areas (RAs), outlined below.
RA 1: Document analysis and extraction, in which we experiment with machine/deep learning models for effective ETD segmentation and subsequent information extraction. Anticipated results of this research include new software tools that can be used and adapted by libraries for automatic extraction of structural metadata and document components (chapters, sections, figures, tables, citations, bibliographies) from ETDs, applied to both page image and born digital documents.
RA 2: Adding value, in which we investigate techniques and build machine/deep learning models to automatically summarize and classify ETD chapters. Anticipated results of this research include software implementations of a chapter-level text summarizer that generates paragraph-length summaries of ETD chapters, and a multi-label classifier that assigns subject categories to ETD chapters. Our aim is to develop software that can be adapted or replicated by libraries to add value to their existing ETD services.
RA 3: User services, in which we study users to identify and understand their information needs and information seeking behaviors, so that we may establish corresponding requirements for user interface and service components most useful for interacting with ETD content. Basing our design decisions on empirical evidence obtained from user analysis, we will construct a prototype system to demonstrate how these components can improve the user experience with ETD collections, and ultimately increase the capacity of libraries to provide access to ETDs and other long-form document content.
Our project brings to bear cutting-edge computer science and machine/deep learning technologies to advance discovery, use, and potential for reuse of the knowledge hidden in the text of books and book-length documents. In addition, by focusing on libraries' ETD collections (where legal restrictions from book publishers generally are not applicable), our research will open this rich corpus of graduate research and scholarship, leverage ETDs to advance further research and education, and allow libraries to achieve greater impact.
Automatic Metadata Extraction Incorporating Visual Features from Scanned Electronic Theses and Dissertations
Electronic Theses and Dissertations (ETDs) contain domain knowledge that can
be used for many digital library tasks, such as analyzing citation networks and
predicting research trends. Automatic metadata extraction is important to build
scalable digital library search engines. Most existing methods are designed for
born-digital documents, so they often fail to extract metadata from scanned
documents such as ETDs. Traditional sequence tagging methods mainly rely on
text-based features. In this paper, we propose a conditional random field (CRF)
model that combines text-based and visual features. To verify the robustness of
our model, we extended an existing corpus and created a new ground truth corpus
consisting of 500 ETD cover pages with human validated metadata. Our
experiments show that CRF with visual features outperformed both a heuristic
and a CRF model with only text-based features. The proposed model achieved
81.3%-96% F1 measure on seven metadata fields. The data and source code are
publicly available on Google Drive (https://tinyurl.com/y8kxzwrp) and a GitHub
repository (https://github.com/lamps-lab/ETDMiner/tree/master/etd_crf),
respectively.
Comment: 7 pages, 4 figures, 1 table. Accepted by JCDL '21 as a short paper.
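As a rough illustration of what combining text-based and visual features for sequence tagging can look like, the sketch below builds one feature dictionary per token, the input shape that common CRF toolkits accept. It is not the authors' implementation: the `Token` fields, the `token_features` function, and the specific features (relative position, relative box height as a font-size proxy) are assumptions chosen for illustration only.

```python
from typing import Dict, List, TypedDict


class Token(TypedDict):
    """Hypothetical OCR token from a scanned cover page."""
    text: str      # token string
    x: float       # left edge of the bounding box, in page units
    y: float       # top edge of the bounding box, in page units
    height: float  # bounding-box height, a rough proxy for font size


def token_features(tokens: List[Token], i: int,
                   page_width: float, page_height: float) -> Dict[str, object]:
    """Return text-based and visual features for the token at index i."""
    tok = tokens[i]
    text = tok["text"]
    max_height = max(t["height"] for t in tokens)

    feats: Dict[str, object] = {
        # Text-based features
        "lower": text.lower(),
        "is_title_case": text.istitle(),
        "is_upper": text.isupper(),
        "has_digit": any(c.isdigit() for c in text),
        # Visual features (assumed here; normalized so pages are comparable)
        "rel_x": round(tok["x"] / page_width, 2),
        "rel_y": round(tok["y"] / page_height, 2),
        "rel_height": round(tok["height"] / max_height, 2),
    }

    # Minimal left context, marking the beginning of the sequence
    if i > 0:
        feats["prev_lower"] = tokens[i - 1]["text"].lower()
    else:
        feats["BOS"] = True
    return feats


page = [
    Token(text="Deep", x=100.0, y=80.0, height=24.0),
    Token(text="2019", x=300.0, y=400.0, height=12.0),
]
features = [token_features(page, i, page_width=600.0, page_height=800.0)
            for i in range(len(page))]
```

A list of such dictionaries, paired with one label per token (for example title, author, or year), is the usual training input for a linear-chain CRF; dropping the three `rel_*` entries would give a text-only baseline for comparison.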
MetaEnhance: Metadata Quality Improvement for Electronic Theses and Dissertations of University Libraries
Metadata quality is crucial for digital objects to be discovered through
digital library interfaces. However, due to various reasons, the metadata of
digital objects often exhibits incomplete, inconsistent, and incorrect values.
We investigate methods to automatically detect, correct, and canonicalize
scholarly metadata, using seven key fields of electronic theses and
dissertations (ETDs) as a case study. We propose MetaEnhance, a framework that
utilizes state-of-the-art artificial intelligence methods to improve the
quality of these fields. To evaluate MetaEnhance, we compiled a metadata
quality evaluation benchmark containing 500 ETDs, by combining subsets sampled
using multiple criteria. We tested MetaEnhance on this benchmark and found that
the proposed methods achieved nearly perfect F1-scores in detecting errors and
F1-scores in correcting errors ranging from 0.85 to 1.00 for five of seven
fields.
Comment: 7 pages, 3 tables, and 1 figure. Accepted by 2023 ACM/IEEE Joint
Conference on Digital Libraries (JCDL '23) as a short paper.
Development of a Pressure Sensitive Paint System for Measuring Global Surface Pressures on Rotorcraft Blades
This paper will describe the results from a proof of concept test to examine the feasibility of using Pressure Sensitive Paint (PSP) to measure global surface pressures on rotorcraft blades in hover. The test was performed using the U.S. Army 2-meter Rotor Test Stand (2MRTS) and 15% scale swept rotor blades. Data were collected from five blades using both the intensity- and lifetime-based approaches. This paper will also outline several modifications and improvements that are underway to develop a system capable of measuring pressure distributions on up to four blades simultaneously at hover and forward flight conditions.
The Resilient Organization: A Meta-Analysis of the Effect of Communication on Team Diversity and Team Performance
The Input-Process-Output framework is adopted to examine the impact of diversity attributes (the input) on communication (the process) and their influence on performance (the output), to understand the internal group/team working mechanisms of organizational resilience. A meta-analysis of 174 correlations from 35 empirical studies undertaken over 35 years (1982-2017) showed that members of a team who have different experiences are more likely to share information and communicate openly when they deal with a task that requires collaboration outside the team. This supports the view that organizations are more resilient by being more closely connected with the external environment. Differences in social categories tend to favor openness of communication, especially in the case of age diversity and race/ethnicity diversity. An increase in openness of communication is likely to enhance team performance, particularly for small and medium sized teams operating in manufacturing industries, while frequency of communication can be beneficial for both large and medium sized teams working in the high technology industry. The positive workings of these associations form the resilient organization.