
40 research outputs found

Erratum to: Evolving hard problems: generating human genetics datasets with a complex etiology

Author: Casey S. Greene
Daniel S. Himmelstein
DS Himmelstein
Jason H. Moore
Publication venue: 'Springer Science and Business Media LLC'

Sci-Hub provides access to nearly all scholarly literature

Author: Greene Casey S
Greshake Tzovaras Bastian
Himmelstein Daniel S
Levernier Jacob G
McLaughlin Stephen Reid
Munro Thomas Anthony
Romero Ariel Rodriguez
Publication venue: 'eLife Sciences Publications, Ltd'
Publication date: 01/01/2018

The website Sci-Hub enables users to download PDF versions of scholarly articles, including many articles that are paywalled at their journal's site. Sci-Hub has grown rapidly since its creation in 2011, but the extent of its coverage was unclear. Here we report that, as of March 2017, Sci-Hub's database contains 68.9% of the 81.6 million scholarly articles registered with Crossref and 85.1% of articles published in toll access journals. We find that coverage varies by discipline and publisher, and that Sci-Hub preferentially covers popular, paywalled content. For toll access articles, we find that Sci-Hub provides greater coverage than the University of Pennsylvania, a major research university in the United States. Green open access to toll access articles via licit services, on the other hand, remains quite limited. Our interactive browser at https://greenelab.github.io/scihub allows users to explore these findings in more detail. For the first time, nearly all scholarly literature is available gratis to anyone with an Internet connection, suggesting the toll access business model may become unsustainable.

Deakin Research Online

Is authorship sufficient for today’s collaborative research? A call for contributor roles

Author: Colomb Julien
Edmunds Scott C
Gutzman Karen
Haendel Melissa
Himmelstein Daniel S
Holmes Kristi L
Hosseini Mohammad
Ilik Violeta
Kern Barbara
Mohammadi Ehsan
O'Keefe Lisa
Schneider Juliane
Smith Britton D.
Teplitzky Samantha
Vasilevsky Nicole A
White Marijane
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2020

Assigning authorship and recognizing contributions to scholarly works is challenging on many levels. Here we discuss ethical, social, and technical challenges to the concept of authorship that may impede the recognition of contributions to a scholarly work. Recent work in the field of authorship shows that shifting to a more inclusive contributorship approach may address these challenges. Recent efforts to enable better recognition of contributions to scholarship include the development of the Contributor Role Ontology (CRO), which extends the CRediT taxonomy and can be used in information systems for structuring contributions. We also introduce the Contributor Attribution Model (CAM), which provides a simple data model that relates the contributor to research objects via the role that they played, as well as the provenance of the information. Finally, requirements for the adoption of a contributorship-based approach are discussed

Irish Universities

PubMed Central

Scholar Commons - Institutional Repository of the University of South Carolina

Edinburgh Research Explorer

eScholarship - University of California

DCU Online Research Access Service

Is Authorship Sufficient for Today’s Collaborative Research? A Call for Contributor Roles

Author: Colomb Julien
Edmunds Scott C.
Gutzman Karen
Haendel Melissa
Himmelstein Daniel S.
Holmes Kristi L.
Hosseini Mohammad
Ilik Violeta
Kern Barbara
Mohammadi Ehsan
O'Keefe Lisa
Schneider Juliane
Smith Britton
Teplitzky Samantha
Vasilevsky Nicole A.
White Marijane
Publication venue: Scholar Commons
Publication date: 01/01/2020

Scholar Commons - Institutional Repository of the University of South Carolina

Recommended from our members

The hetnet awakens: understanding complex diseases through data integration and open science

Author: Himmelstein Daniel S.
Publication venue: University of California, San Francisco
Publication date: 01/01/2016

Human disease is complex. However, the explosion of biomedical data is providing new opportunities to improve our understanding. My dissertation focused on how to harness the biodata revolution. Broadly, I addressed three questions: how to integrate data, how to extract insights from data, and how to make science more open. To integrate data, we pioneered the hetnet, a network with multiple node and relationship types. After several preludes, we released Hetionet v1.0, which contains 2,250,197 relationships of 24 types. Hetionet encodes the collective knowledge produced by millions of studies over the last half century. To extract insights from data, we developed a machine learning approach for hetnets. In order to predict the probability that an unknown relationship exists, our algorithm identifies influential network patterns. We used the approach to prioritize disease-gene associations and drug repurposing opportunities. By evaluating our predictions on withheld knowledge, we demonstrated the systematic success of our method. After encountering friction that interfered with data integration and rapid communication, I began looking at how to make science more open. The quest led me to explore realtime open notebook science and expose publishing delays at journals as well as the problematic licensing of publicly-funded research data.

eScholarship - University of California

ProQuest OAI Repository

Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes.

Author: Daniel S Himmelstein
Sergio E Baranzini
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2015

The first decade of Genome Wide Association Studies (GWAS) has uncovered a wealth of disease-associated variants. Two important derivations will be the translation of this information into a multiscale understanding of pathogenic variants and leveraging existing data to increase the power of existing and future studies through prioritization. We explore edge prediction on heterogeneous networks--graphs with multiple node and edge types--for accomplishing both tasks. First we constructed a network with 18 node types--genes, diseases, tissues, pathophysiologies, and 14 MSigDB (molecular signatures database) collections--and 19 edge types from high-throughput publicly-available resources. From this network composed of 40,343 nodes and 1,608,168 edges, we extracted features that describe the topology between specific genes and diseases. Next, we trained a model from GWAS associations and predicted the probability of association between each protein-coding gene and each of 29 well-studied complex diseases. The model, which achieved 132-fold enrichment in precision at 10% recall, outperformed any individual domain, highlighting the benefit of integrative approaches. We identified pleiotropy, transcriptional signatures of perturbations, pathways, and protein interactions as influential mechanisms explaining pathogenesis. Our method successfully predicted the results (with AUROC = 0.79) from a withheld multiple sclerosis (MS) GWAS despite starting with only 13 previously associated genes. Finally, we combined our network predictions with statistical evidence of association to propose four novel MS genes, three of which (JAK2, REL, RUNX3) validated on the masked GWAS. Furthermore, our predictions provide biological support highlighting REL as the causal gene within its gene-rich locus. Users can browse all predictions online (http://het.io). 
Heterogeneous network edge prediction effectively prioritized genetic associations and provides a powerful new approach for data integration across multiple domains

Public Library of Science (PLOS)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

FigShare

Archivo Digital UPM

Lung cancer incidence decreases with elevation: evidence for oxygen as an inhaled carcinogen

Author: Daniel S. Himmelstein
Kamen P. Simeonov
Publication venue: PeerJ Inc.
Publication date: 01/01/2015

The level of atmospheric oxygen, a driver of free radical damage and tumorigenesis, decreases sharply with rising elevation. To understand whether ambient oxygen plays a role in human carcinogenesis, we characterized age-adjusted cancer incidence (compiled by the National Cancer Institute from 2005 to 2009) across counties of the elevation-varying Western United States and compared trends displayed by respiratory cancer (lung) and non-respiratory cancers (breast, colorectal, and prostate). To adjust for important demographic and cancer-risk factors, 8–12 covariates were considered for each cancer. We produced regression models that captured known risks. Models demonstrated that elevation is strongly, negatively associated with lung cancer incidence (p < 10^-16), but not with the incidence of non-respiratory cancers. For every 1,000 m rise in elevation, lung cancer incidence decreased by 7.23 (99% CI [5.18–9.29]) cases per 100,000 individuals, equivalent to 12.7% of the mean incidence, 56.8. As a predictor of lung cancer incidence, elevation was second only to smoking prevalence in terms of significance and effect size. Furthermore, no evidence of ecological fallacy or of confounding arising from evaluated factors was detected: the lung cancer association was robust to varying regression models, county stratification, and population subgrouping; additionally seven environmental correlates of elevation, such as exposure to sunlight and fine particulate matter, could not capture the association. Overall, our findings suggest the presence of an inhaled carcinogen inherently and inversely tied to elevation, offering epidemiological support for oxygen-driven tumorigenesis. Finally, highlighting the need to consider elevation in studies of lung cancer, we demonstrated that previously reported inverse lung cancer associations with radon and UVB became insignificant after accounting for elevation.

Directory of Open Access Journals

PubMed Central

Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes - Table 3

Author: Daniel S. Himmelstein (767582)
Sergio E. Baranzini (261823)
Publication date: 01/01/2015

Diseases. Associations were predicted for 29 diseases with at least 10 positives. For these diseases, the number of high-confidence primary (HC-P), high-confidence secondary (HC-S), low-confidence primary (LC-P), and low-confidence secondary associations (LC-S) that were extracted from the GWAS Catalog is indicated.

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

FigShare