12 research outputs found

    MIMIC-Extract: A Data Extraction, Preprocessing, and Representation Pipeline for MIMIC-III

    Robust machine learning relies on access to data that can be used with standardized frameworks in important tasks and the ability to develop models whose performance can be reasonably reproduced. In machine learning for healthcare, the community faces reproducibility challenges due to a lack of publicly accessible data and a lack of standardized data processing frameworks. We present MIMIC-Extract, an open-source pipeline for transforming raw electronic health record (EHR) data for critical care patients contained in the publicly available MIMIC-III database into dataframes that are directly usable in common machine learning pipelines. MIMIC-Extract addresses three primary challenges in making complex health records data accessible to the broader machine learning community. First, it provides standardized data processing functions, including unit conversion, outlier detection, and aggregation of semantically equivalent features, thus accounting for duplication and reducing missingness. Second, it preserves the time series nature of clinical data and can be easily integrated into clinically actionable prediction tasks in machine learning for health. Finally, it is highly extensible so that other researchers with related questions can easily use the same pipeline. We demonstrate the utility of this pipeline by showcasing several benchmark tasks and baseline results.

    Inhibitor-Sensitive FGFR1 Amplification in Human Non-Small Cell Lung Cancer

    Background: Squamous cell lung carcinomas account for approximately 25% of new lung carcinoma cases and 40,000 deaths per year in the United States. Although there are multiple genomically targeted therapies for lung adenocarcinoma, none has yet been reported in squamous cell lung carcinoma. Methodology/Principal Findings: Using SNP array analysis, we found that a region of chromosome segment 8p11-12 containing three genes (WHSC1L1, LETM2, and FGFR1) is amplified in 3% of lung adenocarcinomas and 21% of squamous cell lung carcinomas. Furthermore, we demonstrated that a non-small cell lung carcinoma cell line harboring focal amplification of FGFR1 is dependent on FGFR1 activity for cell growth, as treatment of this cell line either with FGFR1-specific shRNAs or with FGFR small molecule enzymatic inhibitors leads to cell growth inhibition. Conclusions/Significance: These studies show that FGFR1 amplification is common in squamous cell lung cancer, and that FGFR1 may represent a promising therapeutic target in non-small cell lung cancer. Funding: Novartis Pharmaceuticals Corporation; American Lung Association; Uniting Against Lung Cancer; Sara Thomas Monopoli Fund; Seaman Foundation; India. Dept. of Biotechnology; National Lung Cancer Partnership

    Promoting fit bodies, healthy eating and physical activity among Indigenous Australian men: a study protocol

    Background: Overall, the physical health of Indigenous men is among the worst in Australia. Research has indicated that modifiable lifestyle factors, such as poor nutrition and physical inactivity, appear to contribute strongly to these poor health conditions. To effectively develop and implement strategies to improve the health of Australia's Indigenous peoples, a greater understanding is needed of how Indigenous men perceive health, and how they view and care for their bodies. Further, a more systematic understanding of how sociocultural factors affect their health attitudes and behaviours is needed. This article presents the study protocol of a community-based investigation into the factors surrounding the health and body image of Indigenous Australian men. Methods and design: The study will be conducted in a collaborative manner with Indigenous Australian men using a participatory action research framework. Men will be recruited from three locations around Australia (metropolitan, regional, and rural) and interviewed to understand their experiences and perspectives on a number of issues related to health and health behaviour. The information that is collected will be analysed using modified grounded theory and thematic analysis. The results will then be used to develop and implement community events in each location to provide feedback on the findings to the community, promote health-enhancing strategies, and determine future action and collaboration. Discussion: This study will explore both risk and protective factors that affect the health of Indigenous Australian men. This knowledge will be disseminated to the wider Indigenous community and can be used to inform future health promotion strategies. The expected outcome of this study is therefore an increased understanding of health and health change in Indigenous Australian men, the development of strategies that promote healthy eating and positive patterns of physical activity and, in the longer term, more effective and culturally appropriate interventions to improve health.

    Widespread Over-Expression of the X Chromosome in Sterile F1 Hybrid Mice

    The X chromosome often plays a central role in hybrid male sterility between species, but it is unclear if this reflects underlying regulatory incompatibilities. Here we combine phenotypic data with genome-wide expression data to directly associate aberrant expression patterns with hybrid male sterility between two species of mice. We used a reciprocal cross in which F1 males are sterile in one direction and fertile in the other direction, allowing us to associate expression differences with sterility rather than with other hybrid phenotypes. We found evidence of extensive over-expression of the X chromosome during spermatogenesis in sterile but not in fertile F1 hybrid males. Over-expression was most pronounced in genes that are normally expressed after meiosis, consistent with an X chromosome-wide disruption of expression during the later stages of spermatogenesis. This pattern was not a simple consequence of faster evolutionary divergence on the X chromosome, because X-linked expression was highly conserved between the two species. Thus, transcriptional regulation of the X chromosome during spermatogenesis appears particularly sensitive to evolutionary divergence between species. Overall, these data provide evidence for an underlying regulatory basis to reproductive isolation in house mice and underscore the importance of transcriptional regulation of the X chromosome to the evolution of hybrid male sterility.

    Comprehensive Rare Variant Analysis via Whole-Genome Sequencing to Determine the Molecular Pathology of Inherited Retinal Disease

    Inherited retinal disease is a common cause of visual impairment and represents a highly heterogeneous group of conditions. Here, we present findings from a cohort of 722 individuals with inherited retinal disease, who have had whole-genome sequencing (n = 605), whole-exome sequencing (n = 72), or both (n = 45) performed, as part of the NIHR-BioResource Rare Diseases research study. We identified pathogenic variants (single-nucleotide variants, indels, or structural variants) for 404/722 (56%) individuals. Whole-genome sequencing gives unprecedented power to detect three categories of pathogenic variants in particular: structural variants, variants in GC-rich regions, which have significantly improved coverage compared to whole-exome sequencing, and variants in non-coding regulatory regions. In addition to previously reported pathogenic regulatory variants, we have identified a previously unreported pathogenic intronic variant in CHM in two males with choroideremia. We have also identified 19 genes not previously known to be associated with inherited retinal disease, which harbor biallelic predicted protein-truncating variants in unsolved cases. Whole-genome sequencing is an increasingly important comprehensive method with which to investigate the genetic causes of inherited retinal disease. This work was supported by The National Institute for Health Research England (NIHR) for the NIHR BioResource – Rare Diseases project (grant number RG65966). The Moorfields Eye Hospital cohort of patients and clinical and imaging data were ascertained and collected with the support of grants from the National Institute for Health Research Biomedical Research Centre at Moorfields Eye Hospital, National Health Service Foundation Trust, and UCL Institute of Ophthalmology, Moorfields Eye Hospital Special Trustees, Moorfields Eye Charity, the Foundation Fighting Blindness (USA), and Retinitis Pigmentosa Fighting Blindness. M.M. is a recipient of an FFB Career Development Award. E.M. is supported by UCLH/UCL NIHR Biomedical Research Centre. F.L.R. and D.G. are supported by Cambridge NIHR Biomedical Research Centre.

    Pan-cancer analysis of whole genomes

    Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale (1-3). Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter (4); identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation (5,6); analyses timings and patterns of tumour evolution (7); describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity (8,9); and evaluates a range of more-specialized features of cancer genomes (8,10-18). Peer reviewed.