57 research outputs found

    Into the Single Cell Multiverse: an End-to-End Dataset for Procedural Knowledge Extraction in Biomedical Texts

    Many of the most commonly explored natural language processing (NLP) information extraction tasks can be thought of as evaluations of declarative knowledge, or fact-based information extraction. Procedural knowledge extraction, i.e., breaking down a described process into a series of steps, has received much less attention, perhaps in part due to the lack of structured datasets that capture the knowledge extraction process from end-to-end. To address this unmet need, we present FlaMBé (Flow annotations for Multiverse Biological entities), a collection of expert-curated datasets across a series of complementary tasks that capture procedural knowledge in biomedical texts. This dataset is inspired by the observation that one ubiquitous source of procedural knowledge that is described as unstructured text is within academic papers describing their methodology. The workflows annotated in FlaMBé are from texts in the burgeoning field of single cell research, a research area that has become notorious for the number of software tools and complexity of workflows used. Additionally, FlaMBé provides, to our knowledge, the largest manually curated named entity recognition (NER) and disambiguation (NED) datasets for tissue/cell type, a fundamental biological entity that is critical for knowledge extraction in the biomedical research domain. Beyond providing a valuable dataset to enable further development of NLP models for procedural knowledge extraction, automating the process of workflow mining also has important implications for advancing reproducibility in biomedical research. Comment: Submitted to NeurIPS 2023 Datasets and Benchmarks Track

    Exploiting novel valve interstitial cell lines to study calcific aortic valve disease

    Calcific aortic valve disease (CAVD) involves progressive valve leaflet thickening and severe calcification, impairing leaflet motion. The in vitro calcification of primary rat, human, porcine and bovine aortic valve interstitial cells (VICs) is commonly employed to investigate CAVD mechanisms. However, to date, no published studies have utilised cell lines to investigate this process. The present study has therefore generated and evaluated the calcification potential of immortalized cell lines derived from sheep and rat VICs. Immortalised sheep (SAVIC) and rat (RAVIC) cell lines were produced by transduction with a recombinant lentivirus encoding the Simian virus (SV40) large and small T antigens (sheep), or large T antigen only (rat), which expressed markers of VICs (vimentin and α-smooth muscle actin). Calcification was induced in the presence of calcium (Ca; 2.7 mM) in SAVICs (1.9 fold;

    Pathway-based subnetworks enable cross-disease biomarker discovery.

    Biomarkers lie at the heart of precision medicine. Surprisingly, while rapid genomic profiling is becoming ubiquitous, the development of biomarkers usually involves the application of bespoke techniques that cannot be directly applied to other datasets. There is an urgent need for a systematic methodology to create biologically-interpretable molecular models that robustly predict key phenotypes. Here we present SIMMS (Subnetwork Integration for Multi-Modal Signatures): an algorithm that fragments pathways into functional modules and uses these to predict phenotypes. We apply SIMMS to multiple data types across five diseases, and in each it reproducibly identifies known and novel subtypes, and makes superior predictions to the best bespoke approaches. To demonstrate its ability on a new dataset, we profile 33 genes/nodes of the PI3K pathway in 1734 FFPE breast tumors and create a four-subnetwork prediction model. This model out-performs a clinically-validated molecular test in an independent cohort of 1742 patients. SIMMS is generic and enables systematic data integration for robust biomarker discovery

    Targeted gene sanger sequencing should remain the first-tier genetic test for children suspected to have the five common X-linked inborn errors of immunity

    DATA AVAILABILITY STATEMENT: The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author. To address inborn errors of immunity (IEI) which were underdiagnosed in resource-limited regions, our centre developed and offered free genetic testing for the most common IEI by Sanger sequencing (SS) since 2001. With the establishment of The Asian Primary Immunodeficiency (APID) Network in 2009, the awareness and definitive diagnosis of IEI were further improved with collaboration among centres caring for IEI patients from East and Southeast Asia. We also started to use whole exome sequencing (WES) for undiagnosed cases and further extended our collaboration with centres from South Asia and Africa. With the increased use of Next Generation Sequencing (NGS), we have shifted our diagnostic practice from SS to WES. However, SS was still one of the key diagnostic tools for IEI for the past two decades. Our centre has performed 2,024 IEI SS genetic tests, with in-house protocol designed specifically for 84 genes, in 1,376 patients with 744 identified to have disease-causing mutations (54.1%). The high diagnostic rate after just one round of targeted gene SS for each of the 5 common IEI (X-linked agammaglobulinemia (XLA) 77.4%, Wiskott–Aldrich syndrome (WAS) 69.2%, X-linked chronic granulomatous disease (XCGD) 59.5%, X-linked severe combined immunodeficiency (XSCID) 51.1%, and X-linked hyper-IgM syndrome (HIGM1) 58.1%) demonstrated targeted gene SS should remain the first-tier genetic test for the 5 common X-linked IEI. The Hong Kong Society for Relief of Disabled Children and Jeffrey Modell Foundation. http://www.frontiersin.org/Immunology am2023 Paediatrics and Child Healt

    Safety and efficacy of the ChAdOx1 nCoV-19 vaccine (AZD1222) against SARS-CoV-2: an interim analysis of four randomised controlled trials in Brazil, South Africa, and the UK

    Background: A safe and efficacious vaccine against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), if deployed with high coverage, could contribute to the control of the COVID-19 pandemic. We evaluated the safety and efficacy of the ChAdOx1 nCoV-19 vaccine in a pooled interim analysis of four trials. Methods: This analysis includes data from four ongoing blinded, randomised, controlled trials done across the UK, Brazil, and South Africa. Participants aged 18 years and older were randomly assigned (1:1) to ChAdOx1 nCoV-19 vaccine or control (meningococcal group A, C, W, and Y conjugate vaccine or saline). Participants in the ChAdOx1 nCoV-19 group received two doses containing 5 × 10^10 viral particles (standard dose; SD/SD cohort); a subset in the UK trial received a half dose as their first dose (low dose) and a standard dose as their second dose (LD/SD cohort). The primary efficacy analysis included symptomatic COVID-19 in seronegative participants with a nucleic acid amplification test-positive swab more than 14 days after a second dose of vaccine. Participants were analysed according to treatment received, with data cutoff on Nov 4, 2020. Vaccine efficacy was calculated as 1 - relative risk derived from a robust Poisson regression model adjusted for age. Studies are registered at ISRCTN89951424 and ClinicalTrials.gov, NCT04324606, NCT04400838, and NCT04444674. Findings: Between April 23 and Nov 4, 2020, 23 848 participants were enrolled and 11 636 participants (7548 in the UK, 4088 in Brazil) were included in the interim primary efficacy analysis. In participants who received two standard doses, vaccine efficacy was 62·1% (95% CI 41·0–75·7; 27 [0·6%] of 4440 in the ChAdOx1 nCoV-19 group vs 71 [1·6%] of 4455 in the control group) and in participants who received a low dose followed by a standard dose, efficacy was 90·0% (67·4–97·0; three [0·2%] of 1367 vs 30 [2·2%] of 1374; p_interaction=0·010). Overall vaccine efficacy across both groups was 70·4% (95·8% CI 54·8–80·6; 30 [0·5%] of 5807 vs 101 [1·7%] of 5829). From 21 days after the first dose, there were ten cases hospitalised for COVID-19, all in the control arm; two were classified as severe COVID-19, including one death. There were 74 341 person-months of safety follow-up (median 3·4 months, IQR 1·3–4·8): 175 severe adverse events occurred in 168 participants, 84 events in the ChAdOx1 nCoV-19 group and 91 in the control group. Three events were classified as possibly related to a vaccine: one in the ChAdOx1 nCoV-19 group, one in the control group, and one in a participant who remains masked to group allocation. Interpretation: ChAdOx1 nCoV-19 has an acceptable safety profile and has been found to be efficacious against symptomatic COVID-19 in this interim analysis of ongoing clinical trials. Funding: UK Research and Innovation, National Institutes for Health Research (NIHR), Coalition for Epidemic Preparedness Innovations, Bill & Melinda Gates Foundation, Lemann Foundation, Rede D’Or, Brava and Telles Foundation, NIHR Oxford Biomedical Research Centre, Thames Valley and South Midland's NIHR Clinical Research Network, and AstraZeneca
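The abstract defines vaccine efficacy as 1 - relative risk. As a rough illustration only, the sketch below computes a crude (unadjusted) efficacy from the case counts quoted in the abstract. It is not the trial's method: the published figures come from a robust Poisson regression adjusted for age, so the crude values land close to, but not exactly on, the reported percentages. The function name and the `cohorts` dictionary are hypothetical names introduced for this example.

```python
def crude_vaccine_efficacy(cases_vaccine, n_vaccine, cases_control, n_control):
    """Return 1 - relative risk from raw attack rates (no covariate adjustment)."""
    risk_vaccine = cases_vaccine / n_vaccine
    risk_control = cases_control / n_control
    return 1.0 - risk_vaccine / risk_control


# Case counts and group sizes as quoted in the abstract:
# (cases in vaccine group, vaccine group size, cases in control group, control group size)
cohorts = {
    "SD/SD": (27, 4440, 71, 4455),
    "LD/SD": (3, 1367, 30, 1374),
    "Overall": (30, 5807, 101, 5829),
}

crude = {name: crude_vaccine_efficacy(*counts) for name, counts in cohorts.items()}

for name, efficacy in crude.items():
    print(f"{name}: crude efficacy = {efficacy:.1%}")
```

The small gap between these crude values and the reported 62·1%, 90·0%, and 70·4% is expected, since the crude ratio ignores the age adjustment and the model-based estimation used in the analysis.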

    Splitpea: quantifying protein interaction network rewiring changes due to alternative splicing in cancer

    <p>This serves as the supplementary data repo for the paper, <em><a href="https://doi.org/10.1101/2023.09.04.556262">Splitpea: quantifying protein interaction network rewiring changes due to alternative splicing in cancer</a>. </em></p> <p>The file structure is as follows:</p> <ul> <li>Splitpea rewired PPI networks for individual patient samples (files ending in `patient-rewired-networks.zip`). The networks are stored as dat files, which are tab delimited with one row per edge. Gene ids are reported as entrez id, the edge weights determine the directionality of the interaction (positive weights are potential gains, negative weights are edges likely lost). Chaos edges can have a positive or negative weight but will be indicated by a boolean value in the `chaos` column.</li> <li>The consensus networks (files ending in `consensus_networks`) are separated into two networks one for positive edges and one for negative edges present in at least 80% of tumor samples. These files are tab delimited files with one row per edge in the corresponding undirected weighted network. The following columns are included: <ul> <li>node1, node2: genes incident on the edge as entrez gene IDs</li> <li>mean_weight: average weight of edge across networks (where edge is present) used to build the consensus</li> <li>total_weight: sum of weights across networks (where edge is present) used to build the consensus</li> <li>num_graphs: # of networks where edge was present</li> <li>prop_graphs: # of networks where edge was present / total # of networks used to build the consensus</li> </ul> </li> <li>IRIS.zip: Splicing matrix spliced exon files from IRIS (Pan et al. <a href="https://doi.org/10.1073/pnas.2221116120">https://doi.org/10.1073/pnas.2221116120</a>). 
These are the initial inputs for the Splitpea run described in the paper.</li> <li>BRCA-psi.zip: precalculated PSI values for breast cancer samples as described in the paper</li> <li>PAAD-psi.zip: precalculated PSI values for pancreatic cancer samples as described in the paper</li> <li>*-centralities.zip: precalculated centralities for each patient network</li> <li>*-pickle.zip: network representations as pickle files</li> </ul>
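The consensus-network columns described above (mean_weight, total_weight, num_graphs, prop_graphs) can be illustrated with a minimal sketch. This is an assumption-laden reconstruction from the column descriptions, not the Splitpea implementation: the function `build_consensus`, the in-memory dictionary representation of a patient network, and the toy Entrez-style ids are all hypothetical, and chaos edges are not treated specially here.

```python
from collections import defaultdict


def build_consensus(networks, sign, min_prop=0.8):
    """Summarise edges of one sign that recur across patient networks.

    networks: list of dicts mapping (node1, node2) -> edge weight
              (hypothetical in-memory form of the per-patient edge lists).
    sign:     "positive" keeps weights > 0, "negative" keeps weights < 0.
    min_prop: minimum fraction of networks in which the edge must appear.
    """
    weights = defaultdict(list)
    for network in networks:
        for (a, b), weight in network.items():
            keep = weight > 0 if sign == "positive" else weight < 0
            if keep:
                # Sort the endpoints so (a, b) and (b, a) count as one undirected edge.
                weights[tuple(sorted((a, b)))].append(weight)

    rows = []
    for (node1, node2), edge_weights in sorted(weights.items()):
        prop = len(edge_weights) / len(networks)
        if prop >= min_prop:
            rows.append({
                "node1": node1,
                "node2": node2,
                "mean_weight": sum(edge_weights) / len(edge_weights),
                "total_weight": sum(edge_weights),
                "num_graphs": len(edge_weights),
                "prop_graphs": prop,
            })
    return rows


# Toy example with five patient networks and made-up node ids.
patient_networks = [
    {(1, 2): 1.0, (2, 3): -1.0, (1, 3): 0.5},
    {(1, 2): 2.0, (2, 3): -2.0},
    {(2, 1): 1.0, (2, 3): -1.0, (1, 3): 0.5},
    {(1, 2): 2.0, (2, 3): -2.0},
    {(2, 3): -1.0},
]

positive_consensus = build_consensus(patient_networks, "positive")
negative_consensus = build_consensus(patient_networks, "negative")
```

In the toy data, the edge between nodes 1 and 2 is positive in four of five networks and so clears the 80% threshold, while the edge between nodes 1 and 3 appears in only two networks and is dropped.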

    Into the Single Cell Multiverse: an End-to-End Dataset for Procedural Knowledge Extraction in Biomedical Texts

    <p>This data repository accompanies the GitHub repository (https://github.com/ylaboratory/flambe) for the paper, <a href="https://arxiv.org/abs/2309.01812"><i>Into the Single Cell Multiverse: an End-to-End Dataset for Procedural Knowledge Extraction in Biomedical Texts.</i></a></p><p>The high level file structure is as follows:</p><ul><li><strong>data</strong>: contains processed datasets for BioNLP tasks</li><li><strong>models</strong>: fine-tuned PubmedBERT PyTorch models for tissue and cell type tagging</li></ul><p>The data section is further divided into sections depending on downstream use cases:</p><ul><li><strong>corpus</strong>: the text for 55 full papers from PubMed and PMC</li><li><strong>disambiguation</strong>: all files used for downstream disambiguation of tissue, cell type, and software terms</li><li><strong>sentiment</strong>: files for tool context prediction (similar to sentiment classification)</li><li><strong>tags</strong>: contains IOB and CoNLL tag files for fine-tuning BERT-based models for tissue and cell type tagging, as well as software tagging.</li><li><strong>workflow</strong>: 3 files of curated tuples for various tool and workflow extraction tasks</li></ul><h3>Annotation file formats</h3><p>In this section we describe in detail the various file formats of the accessory files and main annotation files: IOB, CoNLL, disambiguation, and workflow files.</p><h4>IOB files</h4><p>Files ending in .iob follow the <a href="https://en.wikipedia.org/wiki/Inside–outside–beginning_(tagging)">Inside-outside-beginning</a> tagging format. These files are tab-delimited text files made with the SpaCy English tokenizer having one token per line followed by a tag signifying a named entity. Unlike traditional IOB files, we include additional lines that mark the start and end of papers or abstracts. 
These lines contain the PMID or PMC identifier in the token column and the words <i>begin</i> or <i>end</i> in the tag column.</p><h4>CoNLL files</h4><p>CoNLL files, like the IOB files have tokenized text for both full text and abstracts, but are augmented with additional information such as disambiguated terms and identifiers. Unlike the IOB files, which cover the entire abstract and full text corpus, we release one CoNLL per paper.</p><h4>Licensing files</h4><p>Each paper has its own license and usage agreements. We keep track of these licenses for our collection of full text and abstract papers. Each file is indexed either by PubMed Central (pmc) identifiers (in the case of full text), or PubMed ids (pmid). These files can be found in the `data` directory ending in `_licenses.txt`.</p><h4>Disambiguation files</h4><p>Tissues and cell types are disambiguated to the <a href="https://www.ebi.ac.uk/ols/ontologies/ncit">NCI Thesaurus</a>. In the `tissue_ned_table.txt` file we take tokens that were present in the full text and abstract files and map them to NCIT identifiers. An additional file `NCI_thesaurus_info.txt` contains the relevant identifiers, names, aliases, and descriptions for the `tissue`, `organ`, `body part`, `fluid`, and `cell type` branches of the ontology.</p><p>Tools are manually disambiguated to a standardized name or acronym taken from their initial paper. In `tool_ned_table.txt` we map tokens present in the full text and abstract files to these standardized names. The file `tools_info.txt` maps these standardized names to project websites (personal or GitHub links) and to the original publication when available. The `uns_method_ned.txt` is a tab delimited file that maps generic (unspecified) method tokens present in the full text and abstract files to standardized method names. 
Where applicable we link the method to a Wikipedia or library page (e.g., scikit-learn).</p><h4>Workflow files</h4><p>Workflow files are presented as three tab delimited files of tuples.</p><ul><li>`sample` file links any experimental assay (e.g., RNA-seq, single cell RNA-seq, ChIP-seq) with tissue and cell type annotations</li><li>`tools_applied` file joins samples and tools with a standardized description of how the tool is applied (context / mode)</li><li>`sequence` file captures the pairwise ordering of applied tools and their contexts</li></ul><p>Each of the three files starts each new line with a PMC identifier linking the defined annotations to the relevant paper. Furthermore, the `sample` and `tools_applied` files have sequential id numbers within each PMC for the extraction of unambiguous sample workflows. When one sample in the `sample` file can be described with multiple tissue and cell type annotations we tie it back to the same sequential sample identifier.</p><p>We constrain the set of tool contexts / modes to the following list of 23 actions:</p><blockquote><p>Alignment, Alternative Splicing, Batch Correction, Classification, CNV calling, Clustering, Deconvolution, Differential Expression, Dimensionality Reduction, Gene Enrichment / Gene set analysis, Integration, Imputation, Marker Genes / Feature Selection, Networks, Normalization, Quality Control, Quantification, Rare Cell Identification, Simulation, TCR, Tree Inference, Visualization, Variable Genes</p></blockquote><p> </p>
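The IOB layout described above (one tab-delimited token and tag per line, plus extra lines that carry a PMID or PMC identifier with `begin` or `end` in the tag column) can be read with a short parser. The sketch below is only an illustration under stated assumptions: the function names, the placeholder identifier `PMC0000000`, and the `B-TISSUE` / `I-TISSUE` tag labels are hypothetical and may not match the labels used in the released files.

```python
def parse_iob(lines):
    """Group (token, tag) pairs by document, using the begin/end marker lines."""
    documents = {}
    current_id = None
    for raw_line in lines:
        parts = raw_line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        token, tag = parts[0], parts[1]
        if tag == "begin":
            current_id = token  # marker line: the token column holds the paper id
            documents[current_id] = []
        elif tag == "end":
            current_id = None
        elif current_id is not None:
            documents[current_id].append((token, tag))
    return documents


def extract_entities(tagged_tokens):
    """Collect contiguous B-/I- runs into (entity text, label) pairs."""
    entities = []
    span, label = [], None
    for token, tag in tagged_tokens:
        if tag.startswith("B-"):
            if span:
                entities.append((" ".join(span), label))
            span, label = [token], tag[2:]
        elif tag.startswith("I-") and span and tag[2:] == label:
            span.append(token)
        else:
            if span:
                entities.append((" ".join(span), label))
            span, label = [], None
    if span:
        entities.append((" ".join(span), label))
    return entities


# Made-up sample in the described layout; the id and tag labels are placeholders.
sample_lines = [
    "PMC0000000\tbegin",
    "Human\tO",
    "bone\tB-TISSUE",
    "marrow\tI-TISSUE",
    "cells\tO",
    "PMC0000000\tend",
]

documents = parse_iob(sample_lines)
entities = extract_entities(documents["PMC0000000"])
```

Keeping the marker handling in `parse_iob` separate from span assembly in `extract_entities` means the same entity logic could be reused on per-paper inputs, provided they expose tokens and tags in the same two-column form.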

    A Germ Line Mutation in the Death Domain of DAPK-1 Inactivates ERK-induced Apoptosis

    p53 is activated genetically by a set of kinases that are components of the calcium calmodulin kinase superfamily, including CHK2, AMP kinase, and DAPK-1. In dissecting the mechanism of DAPK-1 control, a novel mutation (N1347S) was identified in the death domain of DAPK-1. The N1347S mutation prevented the death domain module binding stably to ERK in vitro and in vivo. Gel filtration demonstrated that the N1347S mutation disrupted the higher order oligomeric nature of the purified recombinant death domain miniprotein. Accordingly, the N1347S death domain module is defective in vivo in the formation of high molecular weight oligomeric intermediates after cross-linking with ethylene glycol bis(succinimidylsuccinate). Full-length DAPK-1 protein harboring a N1347S mutation in the death domain was also defective in binding to ERK in cells and was defective in formation of an ethylene glycol bis(succinimidylsuccinate)-cross-linked intermediate in vivo. Full-length DAPK-1 encoding the N1347S mutation was attenuated in tumor necrosis factor receptor-induced apoptosis. However, the N1347S mutation strikingly prevented ERK:DAPK-1-dependent apoptosis as defined by poly(ADP-ribose) polymerase cleavage, Annexin V staining, and terminal deoxynucleotidyl transferase-mediated dUTP nick end labeling imaging. Significant penetrance of the N1347S allele was identified in normal genomic DNA indicating the mutation is germ line, not tumor derived. The frequency observed in genomic DNA was from 37 to 45% for homozygous wild-type, 41 to 47% for heterozygotes, and 12 to 15% for homozygous mutant. These data highlight a naturally occurring DAPK-1 mutation that alters the oligomeric structure of the death domain, de-stabilizes DAPK-1 binding to ERK, and prevents ERK:DAPK-1-dependent apoptosis