
    ParMap, an Algorithm for the Identification of Complex Genomic Variations in Nextgen Sequencing Data

    Next-generation sequencing produces high-throughput data, albeit with higher error rates and shorter reads than traditional Sanger sequencing. This complicates the detection of genomic variations, especially small insertions and deletions. Here we describe ParMap, a statistical algorithm for the identification of complex genetic variants using partially mapped reads in next-generation sequencing data. We also report ParMap's successful application to the mutation analysis of chromosome X exome-captured leukemia DNA samples.

    Data-driven discovery of seasonally linked diseases from an Electronic Health Records system

    Background: Patterns of disease incidence can reveal new risk factors for a disease or provide insight into its etiology. For example, allergies and infectious diseases have been shown to follow periodic temporal patterns due to seasonal changes in environmental or infectious agents. Previous searches for seasonal or other temporal patterns in disease diagnosis rates have been limited both in the scope of the diseases examined and in the ability to distinguish unexpected seasonal patterns. Electronic Health Records (EHR) compile extensive longitudinal clinical information, constituting a unique source for discovering trends in disease occurrence. However, the data suffer from inherent biases that hinder the identification of temporal trends. Methods: Motivated by the biases observed in this data source, we developed a method (Lomb-Scargle periodograms in detrended data, LSP-detrend) that finds periodic patterns after adjusting the temporal information for broad trends in incidence and for seasonal changes in total hospitalizations. LSP-detrend can sensitively uncover periodic temporal patterns in the corrected data and assess the significance of each detected trend. We apply LSP-detrend to records from 1.5 million patients encoded by ICD-9-CM (International Classification of Diseases, Ninth Revision, Clinical Modification), covering 2,805 disorders with more than 500 occurrences each over a 12-year period. Results and conclusions: Although EHR data, and ICD-9-coded records in particular, were not created for aggregated research use, they can in fact be mined for periodic patterns in disease incidence if confounders are properly removed. Around 10% of all diagnoses are identified as seasonal by LSP-detrend, including many known phenomena. We robustly reproduce previous findings, even for relatively rare diseases. For instance, Kawasaki disease, a rare childhood disease that has been associated with weather patterns, is detected as strongly linked to winter months. Among the novel results, we find a biannual increase in exacerbations of myasthenia gravis, a potentially life-threatening complication of an autoimmune disease. We dissect the causes of this seasonal incidence and propose that the factors predisposing patients to this event vary through the year.
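The LSP-detrend abstract names two ingredients: a correction for broad incidence trends and for seasonal swings in total hospitalizations, followed by a Lomb-Scargle periodogram. The following is a minimal Python sketch of that idea, not the authors' published procedure. It assumes monthly diagnosis counts and monthly total admissions. It also assumes that "detrending" means dividing by total admissions and then subtracting a least-squares linear fit. The helper names detrend_rate and lomb_scargle and the synthetic data are hypothetical. The periodogram itself is the classical normalized Lomb-Scargle formula. The significance assessment mentioned in the abstract is not shown.

```python
import numpy as np


def detrend_rate(diag_counts, total_counts, t):
    """Hypothetical detrending (an assumption, not the paper's exact method):
    divide by total hospitalizations, then subtract a least-squares linear trend."""
    rate = diag_counts / total_counts          # removes seasonality of overall admissions
    slope, intercept = np.polyfit(t, rate, 1)  # broad long-term trend in incidence
    return rate - (slope * t + intercept)


def lomb_scargle(t, y, omegas):
    """Classical normalized Lomb-Scargle power at each angular frequency."""
    y = y - y.mean()
    power = np.empty(len(omegas))
    for i, w in enumerate(omegas):
        # The time offset tau decouples the sine and cosine terms.
        tau = np.arctan2(np.sum(np.sin(2 * w * t)), np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[i] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power / y.var()


# Synthetic example (made-up data): 12 years of monthly counts with an annual cycle.
rng = np.random.default_rng(0)
t = np.arange(144, dtype=float)                             # months
total = 10000 + 2000 * np.sin(2 * np.pi * t / 12) + 20 * t  # seasonal, growing admissions
true_rate = 0.01 + 0.00002 * t + 0.003 * np.cos(2 * np.pi * t / 12)
diag = rng.poisson(total * true_rate)

residual = detrend_rate(diag, total, t)
periods = np.linspace(3, 48, 500)                           # candidate periods in months
power = lomb_scargle(t, residual, 2 * np.pi / periods)
best_period = periods[np.argmax(power)]
print(f"strongest period: {best_period:.1f} months")
```

With regular monthly sampling, the Lomb-Scargle power closely tracks an ordinary periodogram. Its practical advantage is that the same code accepts unevenly spaced or gappy time series, which are common in clinical records. On this synthetic series the strongest period should fall near the injected 12 months.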

    Understanding the Origins of a Pandemic Virus

    Understanding the pre-pandemic conditions and the origin of infectious diseases provides scientifically based rationales for implementing public health measures that help to avoid, or at least mitigate, future epidemics. The recent ancestors of a pandemic virus provide invaluable information about the minimal set of genomic alterations that transformed a zoonotic agent into a full human pandemic. Since the first confirmed cases of the H1N1pdm virus in the spring of 2009, several hypotheses about the strain's origins have been proposed. However, how, where, and when it first infected humans is still far from clear. The only way to piece together such an epidemiological puzzle relies on the collective effort of the international scientific community to increase genomic sequencing of influenza isolates, especially those collected in the months prior to the origin of the pandemic.

    Differences in Patient Age Distribution between Influenza A Subtypes

    Since the spring of 1977, two subtypes of influenza A virus (H3N2 and H1N1) have been seasonally infecting the human population. In this work we study the age distribution of patients with symptomatic disease caused by each of these subtypes. When extensive publicly available data are pooled across multiple geographical locations and seasons, striking differences emerge between the subtypes. We report that symptomatic flu due to H3N2 is distributed across all age groups, whereas H1N1 causes symptomatic disease mainly in a younger population. These characteristic age spectra, possibly carried over from previous pandemics, are consistent with previous findings on the evolutionary dynamics of each subtype. Moreover, they are relevant to age-related risk assessments, modeling of epidemiological networks for specific age groups, and age-specific vaccine design. Recently, a novel H1N1 virus has spread around the world. Preliminary reports suggest that this new strain, like the seasonal H1N1 strains, causes symptomatic disease mainly in the younger population.

    Discovering Disease Associations by Integrating Electronic Clinical Data and Medical Literature

    Electronic health record (EHR) systems offer an exceptional opportunity for studying many diseases and their associated medical conditions within a population. The growing number of clinical records available electronically provides access to large, rich sets of longitudinal patient information. By integrating and comparing relations found in the EHRs with those already reported in the literature, we are able to verify existing associations and to identify rare or novel ones. Of particular interest is the identification of comorbidities of rare diseases, where the small numbers of diagnosed patients make robust statistical analysis difficult. Here, we introduce ADAMS, an Application for Discovering Disease Associations using Multiple Sources, which contains various statistical and language processing operations. We apply ADAMS to the New York-Presbyterian Hospital's EHR, combining the information from the relational diagnosis tables and textual discharge summaries with that from PubMed and Wikipedia, in order to investigate the comorbidities of the rare diseases Kaposi sarcoma, toxoplasmosis, and Kawasaki disease. In addition to finding well-known characteristics of diseases, ADAMS can identify rare or previously unreported associations. In particular, we report a statistically significant association between Kawasaki disease and a diagnosis of autistic disorder.
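The ADAMS abstract does not say which statistical test it applies to the relational diagnosis tables. As an illustrative assumption, this Python sketch scores the co-occurrence of two diagnoses across patient records with a one-sided Fisher exact test. That test is a common choice when one condition is rare and expected cell counts are small. The function names contingency and fisher_greater, the diagnosis labels dx_A and dx_B, and the toy counts are all hypothetical. The sketch also omits the discharge-summary text processing and the PubMed/Wikipedia integration.

```python
from math import comb


def contingency(patients, dx_a, dx_b):
    """2x2 counts: patients with both diagnoses, only A, only B, neither."""
    both = only_a = only_b = neither = 0
    for dxs in patients:
        has_a, has_b = dx_a in dxs, dx_b in dxs
        if has_a and has_b:
            both += 1
        elif has_a:
            only_a += 1
        elif has_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def fisher_greater(both, only_a, only_b, neither):
    """One-sided Fisher exact p-value for over-representation of co-occurrence,
    i.e. P(X >= both) under a hypergeometric null with fixed margins."""
    n_total = both + only_a + only_b + neither
    n_a = both + only_a  # patients carrying diagnosis A
    n_b = both + only_b  # patients carrying diagnosis B
    tail = sum(
        comb(n_a, x) * comb(n_total - n_a, n_b - x)
        for x in range(both, min(n_a, n_b) + 1)
    )
    return tail / comb(n_total, n_b)


# Toy records (hypothetical labels and counts, not NYPH data).
patients = [{"dx_A", "dx_B"}] * 8 + [{"dx_A"}] * 12 + [{"dx_B"}] * 30 + [set()] * 950
table = contingency(patients, "dx_A", "dx_B")
p_value = fisher_greater(*table)
print(table, f"p = {p_value:.2e}")
```

Exact integer arithmetic avoids the chi-squared approximation, which is unreliable for rare diseases with few diagnosed patients. When many diagnosis pairs are screened at once, the resulting p-values would still need a multiple-testing correction.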

    Signs of the 2009 Influenza Pandemic in the New York-Presbyterian Hospital Electronic Health Records

    Background: In June 2009, the World Health Organization declared the first influenza pandemic of the 21st century, and by July, New York City's New York-Presbyterian Hospital (NYPH) was experiencing a heavy burden of cases attributable to a novel strain of the virus (H1N1pdm). Methods and Results: Using various statistical analyses, we present the signs in the NYPH electronic health records (EHR) that distinguished the 2009 pandemic from previous seasonal influenza outbreaks. These signs include (1) an increase in the number of patients diagnosed with influenza, (2) a preponderance of influenza diagnoses outside the normal flu season, and (3) marked vaccine failure. The NYPH EHR also reveals distinct age distributions among patients affected by seasonal influenza and by the pandemic strain. Through the available longitudinal data, it further suggests that the two may be associated with distinct sets of comorbid conditions. In particular, we find significantly more pandemic flu patients with diagnoses associated with asthma and underlying lung disease. We further observe that the NYPH EHR can track diseases at a resolution as fine as individual ZIP codes in New York City. Conclusion: The NYPH EHR permits early detection of pandemic influenza and hypothesis generation through identification of significantly associated illnesses. As data standards develop and databases expand, EHRs will contribute increasingly to disease detection and the discovery of novel disease associations.