
    What Others Say About This Work? Scalable Extraction of Citation Contexts from Research Papers

    This work presents a new, scalable solution to the problem of extracting citation contexts: the textual fragments surrounding citation references. These citation contexts can be used to navigate digital libraries of research papers and help users decide what to read. We have developed a prototype system which can retrieve, on demand, citation contexts for a given reference research paper from the full text of over 15 million research articles in the Mendeley catalog. The evaluation results show that our citation extraction system provides functionality that existing tools lack, runs two orders of magnitude faster, and improves F-measure by 9% over the current state of the art.
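    A minimal sketch of what a citation context is, not the system described above. It assumes citations appear as bracketed numbers such as [3] or [2, 3], and it takes the enclosing sentence as the context using a naive sentence splitter. The function name `citation_contexts` and both patterns are illustrative assumptions.

```python
import re

# Assumption: citations are bracketed numbers, e.g. [3] or [2, 3].
CITATION = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")
# Naive sentence boundary: '.', '!' or '?' followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def citation_contexts(text, ref_number):
    """Return the sentences in `text` that cite reference `ref_number`."""
    contexts = []
    for sentence in SENTENCE_END.split(text.strip()):
        for match in CITATION.finditer(sentence):
            # A single marker may list several references: [2, 3].
            numbers = {int(n) for n in match.group(1).split(",")}
            if ref_number in numbers:
                contexts.append(sentence)
                break  # one hit is enough to keep this sentence
    return contexts


sample = (
    "Prior work used rules [1]. "
    "Later systems learn features [2, 3]. "
    "We follow [3] here."
)
for context in citation_contexts(sample, 3):
    print(context)
```

    A production extractor would also need to handle author-year citation styles and to resolve each marker against the paper's reference list, neither of which this sketch attempts.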

    Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents

    Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers. Layout-infused LMs are often evaluated on documents with familiar layout features (e.g., papers from the same publisher), but in practice models encounter documents with unfamiliar distributions of layout features, such as new combinations of text sizes and styles, or new spatial configurations of textual elements. In this work we test whether layout-infused LMs are robust to layout distribution shifts. As a case study we use the task of scientific document structure recovery, segmenting a scientific paper into its structural categories (e.g., "title", "caption", "reference"). To emulate distribution shifts that occur in practice we re-partition the GROTOAP2 dataset. We find that under layout distribution shifts model performance degrades by up to 20 F1. Simple training strategies, such as increasing training diversity, can reduce this degradation by over 35% relative F1; however, models fail to reach in-distribution performance in any tested out-of-distribution conditions. This work highlights the need to consider layout distribution shifts during model evaluation, and presents a methodology for conducting such evaluations. Comment: To appear in ACL Findings 202

    An International Prospectus for Library & Information Professionals: Development, Leadership and Resources for Evolving Patron Needs

    The roles of library and information professionals must change and evolve to: 1. accommodate the needs of tech-savvy patrons; 2. thrive in the Commons & Library 2.0; 3. provide integrated, just-in-time services; 4. constantly update and enhance technology; 5. design appropriate library spaces for research and productivity; 6. adapt to new models of scholarly communication and publication, especially the Open Archives Initiative and digital repositories; 7. remain abreast of national and international academic and legislative initiatives affecting the provision of information services and resources. Professionals will need to collaborate in: 1. formal and informal networks: regional, national, and international; and 2. library staff development initiatives: regional, national, and international. Professionals will need to use libraries as laboratories for ongoing, lifelong training and education of patrons and of all library staff (internal patrons): the library is the framework in which Information Research Literacy is the curriculum. Professionals will need to remain aware of trends and challenges in their regions, the EU, the US and North America, and of models which might provide inspiration and support: 1. Top Technology Trends; 2. new paradigms of professionalism; 3. knowledge creation and knowledge consumption; 4. the shifting balance of the physical library with the virtual-digital library

    Changing Trains at Wigan: Digital Preservation and the Future of Scholarship

    This paper examines the impact of the emerging digital landscape on long term access to material created in digital form and its use for research; it examines challenges, risks and expectations.

    The Arabic Language

    The chapter looks at the historical background of the Arabic language and its place within the religious tradition

    Legal Classics: After Deconstructing the Legal Canon

    The debate over the canon has gripped the University in recent years. Defenders of the canon argue that canonical texts embody timeless and universal themes, but critics argue that the process of canonization subordinates certain people and viewpoints within society in order to assert the existence of a univocal tradition. Originating primarily in the field of literary criticism, the canon debate recently has emerged in legal theory. Professor Francis J. Mootz argues that the issues raised by the canon debate are relevant to legal scholarship, teaching and practice. After reviewing the extensive commentary on the literary canon, Professor Mootz criticizes the polemical structure of the debate and asserts that an appreciation of classical, as opposed to canonical, texts opens the way for a productive inquiry. He defines a classical text as one that both shapes contemporary concerns and also serves as a point of reference for revising these concerns. Classical texts enable critical perspectives rather than submitting to them, he continues, because they provide the arena for debates about issues of public concern. Using Hadley v. Baxendale as an example of a legal classic, Professor Mootz contends that the power of such a classical text is its ability to shape hotly contested legal debates. Our time . . . seems unpropitious for thinking about the question of the classic, for . . . it seems to be a simple either/or that requires merely a choosing of sides: for or against? back to the classics or away from them? Our time calls not for thinking but a vote. And it may well be too late for thinking about the classic in any case, for the vote is already in, and the nays have it

    Doctor of Philosophy

    Medical knowledge learned in medical school can become quickly outdated given the tremendous growth of the biomedical literature. It is the responsibility of medical practitioners to continuously update their knowledge with recent, best available clinical evidence to make informed decisions about patient care. However, clinicians often have little time to spend on reading the primary literature even within their narrow specialty. As a result, they often rely on systematic evidence reviews developed by medical experts to fulfill their information needs. At present, systematic reviews of clinical research are manually created and updated, which is expensive, slow, and unable to keep up with the rapidly growing pace of medical literature. This dissertation research aims to enhance the traditional systematic review development process using computer-aided solutions. The first study investigates query expansion and scientific quality ranking approaches to enhance literature search on clinical guideline topics. The study showed that unsupervised methods can improve retrieval performance of a popular biomedical search engine (PubMed). The proposed methods improve the comprehensiveness of literature search and increase the ratio of finding relevant studies with reduced screening effort. The second and third studies aim to enhance the traditional manual data extraction process. The second study developed a framework to extract and classify texts from PDF reports. This study demonstrated that a rule-based multipass sieve approach is more effective than a machine-learning approach in categorizing document-level structures and that classifying and filtering publication metadata and semistructured texts enhances the performance of an information extraction system. The proposed method could serve as a document processing step in any text mining research on PDF documents. The third study proposed a solution for computer-aided data extraction by recommending relevant sentences and key phrases extracted from publication reports. This study demonstrated that using a machine-learning classifier to prioritize sentences for specific data elements performs as well as or better than an abstract screening approach, and might save time and reduce errors in the full-text screening process. In summary, this dissertation showed that there are promising opportunities for technology enhancement to assist in the development of systematic reviews. In this modern age when computing resources are getting cheaper and more powerful, the failure to apply computer technologies to assist and optimize the manual processes is a lost opportunity to improve the timeliness of systematic reviews. This research provides methodologies and tests hypotheses, which can serve as the basis for further large-scale software engineering projects aimed at fully realizing the prospect of computer-aided systematic reviews.
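    The general idea behind a rule-based multipass sieve can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's framework: the labels, the four rules, and the function name `sieve` are all hypothetical. Rules run in order from most to least precise, and each pass labels only the lines that earlier passes left unlabeled.

```python
import re

# Hypothetical rules, ordered from most to least precise.
# These are illustrative and not taken from the dissertation.
PASSES = [
    ("reference", lambda line: re.match(r"^\[\d+\]\s", line) is not None),
    ("caption", lambda line: re.match(r"^(Figure|Table)\s+\d+", line) is not None),
    ("heading", lambda line: len(line.split()) <= 6 and line.istitle()),
    ("body", lambda line: True),  # catch-all final pass
]


def sieve(lines):
    """Label each line with the first (most precise) rule that matches it."""
    labels = [None] * len(lines)
    for label, rule in PASSES:
        for i, line in enumerate(lines):
            # Later, less precise passes never overwrite earlier decisions.
            if labels[i] is None and rule(line):
                labels[i] = label
    return labels


page = [
    "Related Work",
    "[1] Smith, J. A paper.",
    "Figure 2: Results by year",
    "We study extraction from PDF reports.",
]
for line, label in zip(page, sieve(page)):
    print(f"{label:>9}  {line}")
```

    Ordering the passes by precision means a confident rule claims a line before a looser rule can mislabel it, which is the property that distinguishes a sieve from applying all rules at once.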