44 research outputs found
Geoparsing biodiversity heritage library collections: A preliminary exploration
A short pilot study was conducted to provide recommendations on methods and workflows for extracting geographic references from the text of Biodiversity Heritage Library collections and disambiguating these references. An initial survey of the literature was conducted, and a variety of possible techniques and software were subsequently explored for natural language processing, machine learning, document annotation, and map visualization. A test corpus was evaluated, and preliminary findings identify challenges for a full-scale effort towards automated geoparsing, including: varying OCR quality, diversity of the corpus, historical context, and ambiguity of geographic references. The project background, approaches, and preliminary assessment are described here
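The ambiguity challenge named in this abstract can be illustrated with a minimal sketch of gazetteer-based place-name matching. This is not the workflow the pilot study used; the gazetteer entries, the function name `find_place_candidates`, and the sample sentence are all hypothetical, and exact string matching like this would also miss names damaged by poor OCR.

```python
import re

# Hypothetical toy gazetteer: surface form -> possible real-world referents.
GAZETTEER = {
    "Paris": ["Paris, France", "Paris, Texas, USA"],
    "Amazon River": ["Amazon River, South America"],
}

def find_place_candidates(text, gazetteer=GAZETTEER):
    """Return (offset, name, candidates) for each gazetteer name found in text."""
    hits = []
    for name, candidates in gazetteer.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            hits.append((match.start(), name, candidates))
    # Report mentions in reading order.
    return sorted(hits, key=lambda hit: hit[0])

sample = "Specimens were collected near Paris and along the Amazon River."
mentions = find_place_candidates(sample)
# A mention with more than one candidate still needs disambiguation.
ambiguous = [name for _, name, cands in mentions if len(cands) > 1]
```

A mention with several candidates, such as "Paris" here, is where disambiguation using historical and document context would have to take over.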
Selecting and Categorizing Textual Descriptions of Images in the Context of an Image Indexer's Toolkit
We describe a series of studies aimed at identifying specifications for a text extraction module of an image indexer's toolkit. The materials used in the studies consist of images paired with paragraph sequences that describe the images. We administered a pilot survey to visual resource center professionals at three universities to determine what types of paragraphs would be preferred for metadata selection. Respondents generally showed a strong preference for one of two paragraphs they were presented with, indicating that not all paragraphs that describe images are seen as good sources of metadata. We developed a set of semantic category labels to assign to spans of text in order to distinguish between different types of information about the images, thus to classify metadata contexts. Human agreement on metadata is notoriously variable. In order to maximize agreement, we conducted four human labeling experiments using the seven semantic category labels we developed. A subset of our labelers had much higher inter-annotator reliability, and highest reliability occurs when labelers can pick two labels per text unit
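The abstract does not say which reliability statistic was used. As one common way to quantify agreement between two labelers, the sketch below computes Cohen's kappa; the category names and label sequences are hypothetical and are not the seven labels from the study.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical labels for six text units.
annotator_1 = ["description", "history", "description",
               "biography", "history", "description"]
annotator_2 = ["description", "history", "biography",
               "biography", "history", "description"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when some categories are used far more often than others. Allowing two labels per text unit, as in the highest-reliability condition, would need a different agreement measure than this one.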
Testing the Waters: Blogging for User Needs Analysis, Information Access, and Building a Community of Practitioners
ABSTRACT This panel session will focus on three strategies for using blogs to improve access to collections, understand information needs of those searching the collections, and build communities of practice with information professionals serving similar user groups. Three presenters will share their experiences, goals, methods, and results. A facilitated discussion with the audience will follow the presentations and allow attendees to brainstorm on possible uses of blogging outside the box to reach the goals of their current projects or initiatives that they are hoping to undertake in the near future
Computational linguistics for metadata building: Aggregating text processing technologies for enhanced image access
We present a system which applies text mining using computational linguistic techniques to automatically extract, categorize, disambiguate and filter metadata for image access. Candidate subject terms are identified through standard approaches; novel semantic categorization using machine learning and disambiguation using both WordNet and a domain specific thesaurus are applied. The resulting metadata can be manually edited by image catalogers or filtered by semi-automatic rules. We describe the implementation of this workbench created for, and evaluated by, image catalogers. We discuss the system's current functionality, developed under the Computational Linguistics for Metadata Building (CLiMB) research project. The CLiMB Toolkit has been tested with several collections, including: Art Images for College Teaching (AICT), ARTStor, the National Gallery of Art (NGA), the Senate Museum, and from collaborative projects such as the Landscape Architecture Image Resource (LAIR) and the field guides of the Vernacular Architecture Group (VAG)
Agricultural data management and sharing: Best practices and case study
Agricultural data are crucial to many aspects of production, commerce, and research involved in feeding the global community. However, in most agricultural research disciplines standard best practices for data management and publication do not exist. Here we propose a set of best practices in the areas of peer review, minimal dataset development, data repositories, citizen science initiatives, and support for best data management. We illustrate some of these best practices with a case study in dairy agroecosystems research. While many common, and increasingly disparate data management and publication practices are entrenched in agricultural disciplines, opportunities are readily available for promoting and adopting best practices that better enable and enhance data-intensive agricultural research and production
Individual common variants exert weak effects on the risk for autism spectrum disorders.
While it is apparent that rare variation can play an important role in the genetic architecture of autism spectrum disorders (ASDs), the contribution of common variation to the risk of developing ASD is less clear. To produce a more comprehensive picture, we report Stage 2 of the Autism Genome Project genome-wide association study, adding 1301 ASD families and bringing the total to 2705 families analysed (Stages 1 and 2). In addition to evaluating the association of individual single nucleotide polymorphisms (SNPs), we also sought evidence that common variants, en masse, might affect the risk. Despite genotyping over a million SNPs covering the genome, no single SNP shows significant association with ASD or selected phenotypes at a genome-wide level. The SNP that achieves the smallest P-value from secondary analyses is rs1718101. It falls in CNTNAP2, a gene previously implicated in susceptibility for ASD. This SNP also shows modest association with age of word/phrase acquisition in ASD subjects, of interest because features of language development are also associated with other variation in CNTNAP2. In contrast, allele scores derived from the transmission of common alleles to Stage 1 cases significantly predict case status in the independent Stage 2 sample. Despite being significant, the variance explained by these allele scores was small (Vm< 1%). Based on results from individual SNPs and their en masse effect on risk, as inferred from the allele score results, it is reasonable to conclude that common variants affect the risk for ASD but their individual effects are modest
Effect of Hydrocortisone on Mortality and Organ Support in Patients With Severe COVID-19: The REMAP-CAP COVID-19 Corticosteroid Domain Randomized Clinical Trial.
Importance: Evidence regarding corticosteroid use for severe coronavirus disease 2019 (COVID-19) is limited. Objective: To determine whether hydrocortisone improves outcome for patients with severe COVID-19. Design, Setting, and Participants: An ongoing adaptive platform trial testing multiple interventions within multiple therapeutic domains, for example, antiviral agents, corticosteroids, or immunoglobulin. Between March 9 and June 17, 2020, 614 adult patients with suspected or confirmed COVID-19 were enrolled and randomized within at least 1 domain following admission to an intensive care unit (ICU) for respiratory or cardiovascular organ support at 121 sites in 8 countries. Of these, 403 were randomized to open-label interventions within the corticosteroid domain. The domain was halted after results from another trial were released. Follow-up ended August 12, 2020. Interventions: The corticosteroid domain randomized participants to a fixed 7-day course of intravenous hydrocortisone (50 mg or 100 mg every 6 hours) (n = 143), a shock-dependent course (50 mg every 6 hours when shock was clinically evident) (n = 152), or no hydrocortisone (n = 108). Main Outcomes and Measures: The primary end point was organ support-free days (days alive and free of ICU-based respiratory or cardiovascular support) within 21 days, where patients who died were assigned -1 day. The primary analysis was a bayesian cumulative logistic model that included all patients enrolled with severe COVID-19, adjusting for age, sex, site, region, time, assignment to interventions within other domains, and domain and intervention eligibility. Superiority was defined as the posterior probability of an odds ratio greater than 1 (threshold for trial conclusion of superiority >99%).
Results: After excluding 19 participants who withdrew consent, there were 384 patients (mean age, 60 years; 29% female) randomized to the fixed-dose (n = 137), shock-dependent (n = 146), and no (n = 101) hydrocortisone groups; 379 (99%) completed the study and were included in the analysis. The mean age for the 3 groups ranged between 59.5 and 60.4 years; most patients were male (range, 70.6%-71.5%); mean body mass index ranged between 29.7 and 30.9; and patients receiving mechanical ventilation ranged between 50.0% and 63.5%. For the fixed-dose, shock-dependent, and no hydrocortisone groups, respectively, the median organ support-free days were 0 (IQR, -1 to 15), 0 (IQR, -1 to 13), and 0 (-1 to 11) days (composed of 30%, 26%, and 33% mortality rates and 11.5, 9.5, and 6 median organ support-free days among survivors). The median adjusted odds ratio and bayesian probability of superiority were 1.43 (95% credible interval, 0.91-2.27) and 93% for fixed-dose hydrocortisone, respectively, and were 1.22 (95% credible interval, 0.76-1.94) and 80% for shock-dependent hydrocortisone compared with no hydrocortisone. Serious adverse events were reported in 4 (3%), 5 (3%), and 1 (1%) patients in the fixed-dose, shock-dependent, and no hydrocortisone groups, respectively. Conclusions and Relevance: Among patients with severe COVID-19, treatment with a 7-day fixed-dose course of hydrocortisone or shock-dependent dosing of hydrocortisone, compared with no hydrocortisone, resulted in 93% and 80% probabilities of superiority with regard to the odds of improvement in organ support-free days within 21 days. However, the trial was stopped early and no treatment strategy met prespecified criteria for statistical superiority, precluding definitive conclusions. Trial Registration: ClinicalTrials.gov Identifier: NCT02735707
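The primary end point defined in this abstract can be sketched as a small scoring function. This is a simplified reading, not the trial's analysis code: it assumes a survivor's score is the 21-day window minus the days spent on organ support, and the function name, parameters, and example patients are hypothetical.

```python
from statistics import median

def organ_support_free_days(died, days_on_support, horizon=21):
    """Score one patient on the organ support-free days scale.

    Patients who died are assigned -1, as stated in the abstract.
    Survivors are scored as days within the horizon without organ
    support (a simplifying assumption for this sketch).
    """
    if died:
        return -1
    if not 0 <= days_on_support <= horizon:
        raise ValueError("days_on_support must lie within the horizon")
    return horizon - days_on_support

# Hypothetical patients: (died, days on organ support).
patients = [(True, 5), (False, 10), (False, 21), (False, 0), (True, 12)]
scores = [organ_support_free_days(d, s) for d, s in patients]
median_score = median(scores)
```

Because deaths sit at -1, below every survivor, the scale is ordinal with death as the worst category, which is why a cumulative logistic model is a natural fit for it.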
Effect of angiotensin-converting enzyme inhibitor and angiotensin receptor blocker initiation on organ support-free days in patients hospitalized with COVID-19
IMPORTANCE Overactivation of the renin-angiotensin system (RAS) may contribute to poor clinical outcomes in patients with COVID-19.
OBJECTIVE To determine whether angiotensin-converting enzyme (ACE) inhibitor or angiotensin receptor blocker (ARB) initiation improves outcomes in patients hospitalized for COVID-19.
DESIGN, SETTING, AND PARTICIPANTS In an ongoing, adaptive platform randomized clinical trial, 721 critically ill and 58 non-critically ill hospitalized adults were randomized to receive an RAS inhibitor or control between March 16, 2021, and February 25, 2022, at 69 sites in 7 countries (final follow-up on June 1, 2022).
INTERVENTIONS Patients were randomized to receive open-label initiation of an ACE inhibitor (n = 257), ARB (n = 248), ARB in combination with DMX-200 (a chemokine receptor-2 inhibitor; n = 10), or no RAS inhibitor (control; n = 264) for up to 10 days.
MAIN OUTCOMES AND MEASURES The primary outcome was organ support-free days, a composite of hospital survival and days alive without cardiovascular or respiratory organ support through 21 days. The primary analysis was a bayesian cumulative logistic model. Odds ratios (ORs) greater than 1 represent improved outcomes.
RESULTS On February 25, 2022, enrollment was discontinued due to safety concerns. Among 679 critically ill patients with available primary outcome data, the median age was 56 years and 239 participants (35.2%) were women. Median (IQR) organ support-free days among critically ill patients was 10 (-1 to 16) in the ACE inhibitor group (n = 231), 8 (-1 to 17) in the ARB group (n = 217), and 12 (0 to 17) in the control group (n = 231) (median adjusted odds ratios of 0.77 [95% bayesian credible interval, 0.58-1.06] for improvement for ACE inhibitor and 0.76 [95% credible interval, 0.56-1.05] for ARB compared with control). The posterior probabilities that ACE inhibitors and ARBs worsened organ support-free days compared with control were 94.9% and 95.4%, respectively. Hospital survival occurred in 166 of 231 critically ill participants (71.9%) in the ACE inhibitor group, 152 of 217 (70.0%) in the ARB group, and 182 of 231 (78.8%) in the control group (posterior probabilities that ACE inhibitor and ARB worsened hospital survival compared with control were 95.3% and 98.1%, respectively).
CONCLUSIONS AND RELEVANCE In this trial, among critically ill adults with COVID-19, initiation of an ACE inhibitor or ARB did not improve, and likely worsened, clinical outcomes.
TRIAL REGISTRATION ClinicalTrials.gov Identifier: NCT0273570
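The raw proportions reported in this abstract can be re-derived from the stated counts. The sketch below only checks that arithmetic; the variable names are illustrative, and the unadjusted percentages are not the adjusted bayesian estimates from the trial's primary analysis.

```python
# Counts of critically ill participants as reported in the abstract:
# group -> (survived to hospital discharge, group size).
groups = {
    "ACE inhibitor": (166, 231),
    "ARB": (152, 217),
    "control": (182, 231),
}

# Unadjusted hospital survival, in percent, rounded to one decimal place.
survival_pct = {
    name: round(100 * survived / total, 1)
    for name, (survived, total) in groups.items()
}

# The three group sizes should add up to the 679 patients with outcome data.
total_patients = sum(total for _, total in groups.values())

# Share of women among those 679 patients (239 reported).
women_pct = round(100 * 239 / total_patients, 1)
```

The computed values agree with the 71.9%, 70.0%, 78.8%, and 35.2% figures quoted in the results.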
Improving Search Efficiency in the Biodiversity Heritage Library Corpus
Biodiversity literature and archival collections are not only indispensable in taxonomic research, they provide crucial information for understanding of museums' natural history collections. Literature and archives document collecting events resulting in specimen collections, contain original descriptions based on those specimens, and provide a wealth of other contextual information for the study of life on earth. The Biodiversity Heritage Library is committed to improving research efficiency by providing open access to a growing body of biodiversity literature and archives. While descriptive metadata is widely available for both specimen collections (i.e., DarwinCore) and literature (i.e., MARCXML), connections between the two collection types cannot generally be found at these descriptive levels thus hindering efficient discovery of relevant materials. The integration of name finding services, powered by Global Names Architecture, provides a significant value-add through page-level access to mentions of a given taxon name. Yet how might one search based on a museum code, a common name, or a place name? This presentation will share how BHL's top technical priorities for 2018 will help facilitate more efficient searching and discovery of information in the pages of the BHL corpus. Specifically, updates on BHL's top two priorities (implementation of full text search and incorporation of available crowdsourced transcriptions) will be covered