
    Embedding visualization.

    A screenshot of the embedding projector visualizing tokens similar to “spike protein”, using FastText [44] embeddings trained on the COVIDScholar corpus.
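    An embedding projector ranks tokens by their closeness to a query token in vector space. The sketch below illustrates that kind of lookup with cosine similarity. It makes several assumptions: the 3-dimensional vectors are hand-made toy values rather than real FastText output, multi-word phrases are joined with underscores, and cosine similarity is used as the closeness measure. It is not the COVIDScholar implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(query, embeddings, k=3):
    """Return the k (token, score) pairs whose vectors are closest to the query's."""
    q = embeddings[query]
    scored = [(tok, cosine(q, vec)) for tok, vec in embeddings.items() if tok != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy vectors for illustration only; learned FastText vectors
# have many more dimensions. Underscore-joined phrases are an assumption.
toy_embeddings = {
    "spike_protein": [0.9, 0.1, 0.0],
    "receptor_binding_domain": [0.8, 0.2, 0.1],
    "ace2": [0.7, 0.3, 0.0],
    "ventilator": [0.0, 0.1, 0.9],
}

print([tok for tok, _ in most_similar("spike_protein", toy_embeddings, k=2)])
# ['receptor_binding_domain', 'ace2']
```

    With a real trained model, the fastText library provides its own nearest-neighbour query, so a hand-written loop like this one is only useful for seeing how the ranking works.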

    Publication counts by source.

    The sources of papers, patents, and clinical trials in the COVIDScholar collection, with the count of COVID-19-related publications from each source. Papers are sourced from [6, 7, 9–19]. Note: many papers in our database are available from multiple sources, so the per-source counts overlap; the total number of unique documents is approximately 260,000.
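    Because one paper can arrive from several sources, per-source counts sum to more than the number of unique documents. The sketch below illustrates the difference. It assumes a simple identity rule that is purely illustrative and is not taken from COVIDScholar: match on a lower-cased DOI when one is present, otherwise on a whitespace- and case-normalized title. The records and the `normalize_key` helper are hypothetical.

```python
def normalize_key(record):
    """Hypothetical identity key: lower-cased DOI if present, else a normalized title."""
    doi = record.get("doi")
    if doi:
        return "doi:" + doi.strip().lower()
    # Collapse runs of whitespace and ignore case when falling back to titles.
    return "title:" + " ".join(record["title"].lower().split())

def count_by_source_and_unique(records):
    """Count records per source, plus the number of distinct documents overall."""
    per_source = {}
    unique_keys = set()
    for rec in records:
        per_source[rec["source"]] = per_source.get(rec["source"], 0) + 1
        unique_keys.add(normalize_key(rec))
    return per_source, len(unique_keys)

# Hypothetical records: the same two papers, each reported by two sources.
records = [
    {"source": "A", "doi": "10.1000/XYZ1", "title": "Spike protein structure"},
    {"source": "B", "doi": "10.1000/xyz1", "title": "Spike Protein Structure"},
    {"source": "B", "doi": None, "title": "Ventilator  outcomes"},
    {"source": "C", "doi": None, "title": "ventilator outcomes"},
]

per_source, n_unique = count_by_source_and_unique(records)
print(per_source, n_unique)  # {'A': 1, 'B': 2, 'C': 1} 2
```

    Real deduplication usually needs fuzzier matching, for example to link a preprint with its later journal version under a different DOI. This exact-key rule is only a starting point.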

    COVIDScholar: An automated COVID-19 research aggregation and analysis platform

    The ongoing COVID-19 pandemic produced far-reaching effects throughout society, and science is no exception. The scale, speed, and breadth of the scientific community's COVID-19 response led to the emergence of new research at the remarkable rate of more than 250 papers published per day. This posed a challenge for the scientific community, as traditional methods of engaging with the literature were strained by the volume of new research being produced. Meanwhile, the urgency of the response led to an increasingly prominent role for preprint servers and to the diffusion of relevant research through many channels simultaneously. These factors created a need for new tools that change the way researchers organize and find scientific literature. With this challenge in mind, we present an overview of COVIDScholar (https://covidscholar.org), an automated knowledge portal built to meet these urgent needs using natural language processing (NLP). The search interface for this corpus of more than 260,000 research articles, patents, and clinical trials served more than 33,000 users, with an average of 2,000 monthly active users and a peak of more than 8,600 weekly active users in the summer of 2020. Additionally, we include an analysis of trends in COVID-19 research over the course of the pandemic, with a particular focus on the first 10 months, which represent a unique period of rapid worldwide shift in scientific attention.

    Classifier performance.

    ROC curves for discipline classification models of paper abstracts, using a fine-tuned SciBERT [28] model adapted for classification. Training is performed on a set of roughly 3,300 human-annotated abstracts, and the results shown are generated with 10-fold cross-validation.
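    The evaluation described in the caption has two parts: a k-fold split of the labelled abstracts, and an ROC summary of held-out scores. The sketch below shows both in their simplest form. It makes several assumptions. Each discipline is scored one-vs-rest, so labels are binary (1 means "belongs to this discipline"). The data is shuffled before the contiguous folds are taken. AUC is computed as the probability that a random positive outscores a random negative. The helper names are hypothetical, and the SciBERT training step is only indicated in a comment.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one.

    Assumes the examples were shuffled beforehand; contiguous folds on
    sorted data would bias the estimate.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def roc_auc(labels, scores):
    """Area under the ROC curve: P(score of a positive > score of a negative), ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# With roughly 3,300 abstracts and k = 10, each held-out fold has 330 examples.
# In each round a model would be fine-tuned on the other 9 folds and used to
# score the held-out fold; the pooled held-out scores then feed roc_auc.
folds = k_fold_indices(3300, 10)
print(len(folds), len(folds[0]))  # 10 330

# Toy held-out scores for one discipline (hypothetical values).
print(roc_auc([1, 0, 1, 0], [0.9, 0.7, 0.3, 0.1]))  # 0.75
```

    For real experiments, scikit-learn's `StratifiedKFold`, `roc_curve`, and `roc_auc_score` cover the same steps. They also preserve class balance across folds and return the full curve rather than only its area.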