35 research outputs found

    A genome-wide association analysis of Framingham Heart Study longitudinal data using multivariate adaptive splines

    The Framingham Heart Study is a well-known longitudinal cohort study. In recent years, the community-based Framingham Heart Study has embarked on genome-wide association studies. In this paper, we present a Framingham Heart Study genome-wide analysis of the fasting triglycerides trait in the Genetic Analysis Workshop 16 Problem 2 using multivariate adaptive splines for the analysis of longitudinal data (MASAL). With MASAL, we are able to analyze genome-wide data with longitudinal phenotypes and covariates, making it possible to identify genes, gene-gene interactions, and gene-environment (including time) interactions associated with the trait of interest. We conducted a permutation test to assess the associations between MASAL-selected markers and the triglycerides trait, and we report significant gene-gene and gene-environment interaction effects on the trait of interest.
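
As a rough illustration of the permutation test mentioned in the abstract above, the sketch below shuffles trait values to build a null distribution for a single marker-trait association. The test statistic (absolute difference in mean trait between carriers and non-carriers), the function name `permutation_p_value`, and the toy data are assumptions made for illustration; the abstract does not state which statistic was used with MASAL.

```python
import random

def permutation_p_value(genotypes, trait, n_perm=2000, seed=0):
    """Hypothetical helper: permutation p-value for one binary marker.

    genotypes: list of 0/1 carrier indicators (both groups must be present).
    trait:     list of trait values, same length as genotypes.
    """
    def stat(values):
        # Absolute difference in mean trait between carriers and non-carriers.
        carriers = [t for g, t in zip(genotypes, values) if g == 1]
        others = [t for g, t in zip(genotypes, values) if g == 0]
        return abs(sum(carriers) / len(carriers) - sum(others) / len(others))

    rng = random.Random(seed)
    observed = stat(trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the trait breaks any real marker-trait link.
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Toy data (assumed): carriers have clearly higher trait values.
toy_genotypes = [1] * 10 + [0] * 10
toy_trait = [2.0 + 0.1 * i for i in range(10)] + [1.0 + 0.1 * i for i in range(10)]
toy_p = permutation_p_value(toy_genotypes, toy_trait)
```

A small `toy_p` here only reflects the deliberately separated toy groups; a genome-wide analysis would also need to account for the number of markers tested.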

    Memory management in genome-wide association studies

    Genome-wide association is a powerful tool for identifying genes that underlie common diseases. Genome-wide association studies generate billions of genotypes and pose significant computational challenges for most users, including limited computer memory. We applied a recently developed memory management tool to two analyses of North American Rheumatoid Arthritis Consortium studies and measured its performance in terms of central processing unit and memory usage. We conclude that our memory management approach is simple, efficient, and effective for genome-wide association studies.

    Ultraspecific probes for high throughput HLA typing

    BACKGROUND: The variations within an individual's HLA (Human Leukocyte Antigen) genes have been linked to many immunological events, e.g. susceptibility to disease, response to vaccines, and the success of blood, tissue, and organ transplants. Although the microarray format has the potential to achieve high-resolution typing, this has yet to be attained due to inefficiencies of current probe design strategies. RESULTS: We present a novel three-step approach for the design of high-throughput microarray assays for HLA typing. This approach first selects sequences containing the SNPs present in all alleles of the locus of interest. Next, it calculates the number of base changes necessary to convert a candidate probe sequence to the closest subsequence within the set of sequences likely to be present in the sample, including the remainder of the human genome, in order to identify those candidate probes that are "ultraspecific" for the allele of interest. Due to the high specificity of these sequences, it is possible that preliminary steps such as PCR amplification are no longer necessary. Lastly, the minimum number of these ultraspecific probes is selected such that the highest resolution typing can be achieved for the minimal cost of production. As an example, an array was designed and in silico results were obtained for typing of the HLA-B locus. CONCLUSION: The assay presented here provides a higher resolution than has previously been developed and includes more alleles than previously considered. Based upon the in silico and preliminary experimental results, we believe that the proposed approach can be readily applied to any highly polymorphic gene system. This item is part of the UA Faculty Publications collection.
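
The second step described above, counting the base changes needed to convert a candidate probe into its closest background subsequence, can be sketched as a sliding-window comparison. This is a minimal sketch that assumes "base changes" means substitutions only (Hamming distance, no insertions or deletions); the function name `min_base_changes` and the toy sequences are hypothetical, and a real design would need an indexed search to scale to the human genome.

```python
def min_base_changes(probe, background):
    """Hypothetical helper: fewest substitutions needed to turn `probe`
    into any same-length window of the background sequences.

    Returns len(probe) if no background sequence is long enough to
    contain a window, since that is the largest possible distance.
    """
    k = len(probe)
    best = k
    for seq in background:
        # Slide a window of the probe's length along each sequence.
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            mismatches = sum(a != b for a, b in zip(probe, window))
            best = min(best, mismatches)
    return best

# Toy sequences (assumed): the closest window, "ACGA", differs at one base.
toy_distance = min_base_changes("ACGT", ["TTACGATT"])
```

Under this reading, a probe whose minimum distance to every non-target sequence is large would be a candidate "ultraspecific" probe; the threshold used by the authors is not given in the abstract.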

    Intra-Tumoral Heterogeneity of HER2, FGFR2, cMET and ATM in Gastric Cancer: Optimizing Personalized Healthcare through Innovative Pathological and Statistical Analysis.

    Current drug development efforts on gastric cancer are directed against several molecular targets driving the growth of this neoplasm. However, intra-tumoral biomarker heterogeneity, commonly observed in gastric cancer, could lead to biased selection of patients. MET, ATM, FGFR2, and HER2 were profiled on gastric cancer biopsy samples. An innovative pathological assessment was performed by scoring individual biopsies against whole biopsies from a single patient to enable heterogeneity evaluation. Following this, false negative risks for each biomarker were estimated in silico. A total of 166 gastric cancer cases with multiple biopsies from single patients were collected from Shanghai Renji Hospital. Following pre-set criteria, 56-78% of cases showed low, 15-35% showed medium, and 0-11% showed high heterogeneity within the biomarkers profiled. If 3 biopsies were collected from a single patient, the false negative risk for detection of the biomarkers was close to 5% (except for FGFR2: 12.2%). When 6 biopsies were collected, the false negative risk approached 0%. Our study demonstrates the benefit of multiple biopsy sampling when considering a personalized healthcare biomarker strategy, and provides an example of addressing the challenge of intra-tumoral biomarker heterogeneity using alternative pathological assessment and statistical methods.
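
To make the link between biopsy count and false negative risk concrete, the sketch below computes the chance that every sampled biopsy is negative for a single biomarker-positive patient. It assumes biopsies are drawn at random without replacement and that the patient counts as positive if any biopsy is positive; the function name and the toy counts are hypothetical, and this is not the authors' in silico procedure, which the abstract does not describe.

```python
import math

def false_negative_risk(total_biopsies, positive_biopsies, sampled):
    """Hypothetical helper: probability that all `sampled` biopsies,
    drawn without replacement from `total_biopsies`, are negative
    (hypergeometric probability of zero positives)."""
    negative = total_biopsies - positive_biopsies
    # math.comb returns 0 when more biopsies are sampled than negatives exist.
    return math.comb(negative, sampled) / math.comb(total_biopsies, sampled)

# Toy patient (assumed): 8 biopsies, 3 of them biomarker-positive.
risk_3 = false_negative_risk(8, 3, 3)   # C(5,3) / C(8,3) = 10 / 56
risk_6 = false_negative_risk(8, 3, 6)   # only 5 negatives exist, so the risk is 0
```

In this toy case the risk falls from about 18% with 3 biopsies to 0 with 6, which matches the direction of the reported trend but not its numbers.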

    Data-Driven Information Extraction from Chinese Electronic Medical Records.

    This study proposes a data-driven framework that takes unstructured free-text narratives in Chinese Electronic Medical Records (EMRs) as input and converts them into structured time-event-description triples, where the description is either an elaboration or an outcome of the medical event. Our framework uses a hybrid approach. It consists of constructing cross-domain core medical lexica, an unsupervised, iterative algorithm to accrue more accurate terms into the lexica, rules to address Chinese writing conventions and temporal descriptors, and a Support Vector Machine (SVM) algorithm that innovatively utilizes Normalized Google Distance (NGD) to estimate the correlation between medical events and their descriptions. The effectiveness of the framework was demonstrated with a dataset of 24,817 de-identified Chinese EMRs. The cross-domain medical lexica were capable of recognizing terms with an F1-score of 0.896. 98.5% of recorded medical events were linked to temporal descriptors. The NGD SVM description-event matching achieved an F1-score of 0.874. The end-to-end time-event-description extraction of our framework achieved an F1-score of 0.846. In terms of named entity recognition, the proposed framework outperforms state-of-the-art supervised learning algorithms (F1-score: 0.896 vs. 0.886). In event-description association, the NGD SVM is superior to an SVM using only local context and semantic features (F1-score: 0.874 vs. 0.838). The framework is data-driven, weakly supervised, and robust against the variations and noise that tend to occur in a large corpus. It addresses Chinese medical writing conventions and variations in writing styles through patterns used for discovering new terms and rules for updating the lexica.
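
For readers unfamiliar with Normalized Google Distance, the sketch below implements the standard definition, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts occurrences and N is the total number of indexed documents. The function name `ngd`, the choice to return infinity when the terms never co-occur, and the toy counts are assumptions; the abstract does not say which counts (search-engine hits or corpus frequencies) the authors fed into the SVM.

```python
import math

def ngd(fx, fy, fxy, n):
    """Hypothetical helper: Normalized Google Distance from raw counts.

    fx, fy: number of documents containing term x, term y (both > 0).
    fxy:    number of documents containing both terms.
    n:      total number of documents (larger than fx and fy).
    """
    if fxy == 0:
        # Terms that never co-occur are treated as maximally distant.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))

# Toy counts (assumed): log(1000/10) / log(100000/100) = 2/3.
toy_ngd = ngd(1000, 100, 10, 100000)
```

Values near 0 indicate terms that almost always appear together, so a smaller NGD between an event and a candidate description suggests a stronger association.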