1,064 research outputs found
Dissecting Trait Heterogeneity: a Comparison of Three Clustering Methods Applied to Genotypic Data
Background: Trait heterogeneity, which exists when a trait has been defined with insufficient specificity such that it is actually two or more distinct traits, has been implicated as a confounding factor in traditional statistical genetics of complex human disease. In the absence of detailed phenotypic data collected consistently in combination with genetic data, unsupervised computational methodologies offer the potential for discovering underlying trait heterogeneity. The performance of three such methods appropriate for categorical data (Bayesian Classification, Hypergraph-Based Clustering, and Fuzzy k-Modes Clustering) was compared. Also tested was the ability of these methods to detect trait heterogeneity in the presence of locus heterogeneity and/or gene-gene interaction, which are two other complicating factors in discovering genetic models of complex human disease. To determine the efficacy of applying the Bayesian Classification method to real data, the reliability of its internal clustering metrics at finding good clusterings was evaluated using permutation testing. Results: Bayesian Classification outperformed the other two methods, with the exception that the Fuzzy k-Modes Clustering performed best on the most complex genetic model. Bayesian Classification achieved excellent recovery for 75% of the datasets simulated under the simplest genetic model, while it achieved moderate recovery for 56% of datasets with a sample size of 500 or more (across all simulated models) and for 86% of datasets with 10 or fewer nonfunctional loci (across all simulated models). Neither Hypergraph Clustering nor Fuzzy k-Modes Clustering achieved good or excellent cluster recovery for a majority of datasets even under a restricted set of conditions. When using the average log of class strength as the internal clustering metric, the false positive rate was controlled very well, at three percent or less for all three significance levels (0.01, 0.05, 0.10), and the false negative rate was acceptably low (18 percent) for the least stringent significance level of 0.10. Conclusion: Bayesian Classification shows promise as an unsupervised computational method for dissecting trait heterogeneity in genotypic data. Its control of false positive and false negative rates lends confidence to the validity of its results. Further investigation of how different parameter settings may improve the performance of Bayesian Classification, especially under more complex genetic models, is ongoing.
An ECOOP web portal for visualising and comparing distributed coastal oceanography model and in situ data
As part of a large European coastal operational oceanography project (ECOOP), we have developed a web portal for the display and comparison of model and in situ marine data. The distributed model and in situ datasets are accessed via an Open Geospatial Consortium Web Map Service (WMS) and Web Feature Service (WFS) respectively. These services were developed independently and readily integrated for the purposes of the ECOOP project, illustrating the ease of interoperability resulting from adherence to international standards. The key feature of the portal is the ability to display co-plotted timeseries of the in situ and model data and the quantification of misfits between the two. By using standards-based web technology we allow the user to quickly and easily explore over twenty model data feeds and compare these with dozens of in situ data feeds without being concerned with the low level details of differing file formats or the physical location of the data. Scientific and operational benefits to this work include model validation, quality control of observations, data assimilation and decision support in near real time. In these areas it is essential to be able to bring different data streams together from often disparate locations
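The abstract does not say which misfit measures the portal reports. As a minimal sketch, assuming mean bias and root-mean-square error are the quantities of interest, the comparison of a co-located model and in situ timeseries could look like the following. The function name `misfit_stats` and the sample values are hypothetical and are not taken from the ECOOP portal.

```python
import math


def misfit_stats(model, observed):
    """Return (bias, rmse) over the time steps where both series have a value.

    Missing samples are represented as None and are skipped, since in situ
    feeds often have gaps that the model feed does not.
    """
    pairs = [(m, o) for m, o in zip(model, observed)
             if m is not None and o is not None]
    if not pairs:
        raise ValueError("no overlapping samples to compare")
    diffs = [m - o for m, o in pairs]
    bias = sum(diffs) / len(diffs)                       # mean of (model - observed)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse


# Made-up sea surface temperatures (deg C) on a shared time axis.
model_sst = [14.0, 14.5, 15.0, 15.5]
buoy_sst = [14.0, None, 16.0, 15.5]   # one missing buoy reading

bias, rmse = misfit_stats(model_sst, buoy_sst)
print(f"bias={bias:.3f} rmse={rmse:.3f}")
```

A negative bias here means the model runs colder than the observations on average; the RMSE summarises the typical size of the disagreement regardless of sign.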
Towards Improved Forecasts of Atmospheric and Oceanic Circulations over the Complex Terrain of the Eastern Mediterranean
Forecasting atmospheric and oceanic circulations accurately over the Eastern Mediterranean has proved to be an exceptional challenge. The existence of fine-scale topographic variability (land/sea coverage) and seasonal dynamics variations can create strong spatial gradients in temperature, wind and other state variables, which numerical models may have difficulty capturing. The Hellenic Center for Marine Research (HCMR) is one of the main operational centers for wave forecasting in the eastern Mediterranean. Currently, HCMR's operational numerical weather/ocean prediction model is based on the coupled Eta/Princeton Ocean Model (POM). Since 1999, HCMR has also operated the POSEIDON floating buoys as a means of state-of-the-art, real-time observations of several oceanic and surface atmospheric variables. This study attempts a first assessment at improving both atmospheric and oceanic prediction by initializing a regional Numerical Weather Prediction (NWP) model with high-resolution sea surface temperatures (SST) from remotely sensed platforms in order to capture the small-scale characteristics
A Multi-Season Study of the Effects of MODIS Sea-Surface Temperatures on Operational WRF Forecasts at NWS Miami, FL
Studies at the Short-term Prediction Research and Transition (SPORT) Center have suggested that the use of Moderate Resolution Imaging Spectroradiometer (MODIS) sea-surface temperature (SST) composites in regional weather forecast models can have a significant positive impact on short-term numerical weather prediction in coastal regions. Recent work by LaCasse et al. (2007, Monthly Weather Review) highlights lower atmospheric differences in regional numerical simulations over the Florida offshore waters using 2-km SST composites derived from the MODIS instrument aboard the polar-orbiting Aqua and Terra Earth Observing System satellites. To help quantify the value of this impact on NWS Weather Forecast Offices (WFOs), the SPORT Center and the NWS WFO at Miami, FL (MIA) are collaborating on a project to investigate the impact of using the high-resolution MODIS SST fields within the Weather Research and Forecasting (WRF) prediction system. The project's goal is to determine whether more accurate specification of the lower-boundary forcing within WRF will result in improved land/sea fluxes and hence, more accurate evolution of coastal mesoscale circulations and the associated sensible weather elements. The NWS MIA is currently running WRF in real-time to support daily forecast operations, using the National Centers for Environmental Prediction Nonhydrostatic Mesoscale Model dynamical core within the NWS Science and Training Resource Center's Environmental Modeling System (EMS) software. Twenty-seven hour forecasts are run daily, initialized at 0300, 0900, 1500, and 2100 UTC on a domain with 4-km grid spacing covering the southern half of Florida and adjacent waters of the Gulf of Mexico and Atlantic Ocean. Each model run is initialized using the Local Analysis and Prediction System (LAPS) analyses available in AWIPS. 
The SSTs are initialized with the NCEP Real-Time Global (RTG) analyses at 1/12° resolution (approx. 9 km); however, the RTG product does not exhibit fine-scale details consistent with its grid resolution. SPORT is conducting parallel WRF EMS runs identical to the operational runs at NWS MIA except for the use of MODIS SST composites in place of the RTG product as the initial and boundary conditions over water. The MODIS SST composites for initializing the SPORT WRF runs are generated on a 2-km grid four times daily at 0400, 0700, 1600, and 1900 UTC, based on the times of the overhead passes of the Aqua and Terra satellites. The incorporation of the MODIS SST data into the SPORT WRF runs is staggered such that SSTs are updated with a new composite every six hours in each of the WRF runs. From mid-February to July 2007, over 500 parallel WRF simulations have been collected for analysis and verification. This paper will present verification results comparing the NWS MIA operational WRF runs to the SPORT experimental runs, and highlight any substantial differences noted in the predicted mesoscale phenomena for specific cases
Lack of Association of Polymorphisms in Homocysteine Metabolism Genes with Pseudoexfoliation Syndrome and Glaucoma
Purpose: To evaluate genes involved in homocysteine metabolism as secondary risk factors for pseudoexfoliation syndrome (PXFS) and the associated glaucoma (PXFG). Methods: One hundred eighty-six unrelated patients with PXFS, including 140 patients with PXFG, and 127 unrelated control subjects were recruited from the Massachusetts Eye and Ear Infirmary. All the patients and controls were Caucasian of European ancestry. Seventeen tag SNPs from 5 genes (methylenetetrahydrofolate reductase [MTHFR], methionine synthase [MTR], methionine synthase reductase [MTRR], methylenetetrahydrofolate dehydrogenase [MTHFD1], and cystathionine β-synthase [CBS]) were genotyped. Single-SNP association was analyzed using Fisher’s exact test (unconditional) or logistic regression after conditioning on the effects of age and three LOXL1 SNPs (rs1048661, rs3825942, and rs2165241). Interaction analysis was performed between the homocysteine and LOXL1 SNPs using logistic regression. Haplotype analysis and the set-based test were used to test for association of individual genes. Multiple comparisons were corrected using the Bonferroni method. Results: One SNP (rs8006686) in MTHFD1 showed a nominally significant association with PXFG (p=0.015, OR=2.23). None of the seventeen SNPs tested were significantly associated with PXFS or PXFG after correcting for multiple comparisons (Bonferroni corrected p>0.25). After controlling for the effects of age and three associated LOXL1 SNPs, none of the seventeen tested SNPs were associated with PXFS (p>0.12). No significant interaction effects on PXFS were identified between the homocysteine and LOXL1 SNPs (p>0.06). Haplotype analysis and the set-based test did not find significant association of individual genes with PXFS (p>0.23 and 0.20, respectively). Conclusions: Five genes that are critical components of the homocysteine metabolism pathway were evaluated as secondary factors for PXFS and PXFG. 
Our results suggest that these genes are not significant risk factors for the development of these conditions
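To illustrate how a nominally significant result can disappear under Bonferroni correction, here is a minimal sketch. It assumes the correction was applied across the seventeen SNPs; the abstract does not state the exact number of comparisons used. Only the value 0.015 comes from the abstract, and the other sixteen p-values are made-up placeholders.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values.

    Each p-value is multiplied by the number of tests and capped at 1.0.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# 0.015 is the nominal p-value reported for rs8006686; the remaining
# sixteen entries are hypothetical stand-ins for the other SNPs.
nominal = [0.015] + [0.5] * 16
adjusted = bonferroni(nominal)

print(round(adjusted[0], 3))
```

Under this assumption the adjusted value for rs8006686 is 0.015 × 17, roughly 0.255, which is in line with the reported "Bonferroni corrected p>0.25".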
SNPs in Multi-Species Conserved Sequences (MCS) as useful markers in association studies: a practical approach
Background: Although genes play a key role in many complex diseases, the specific genes involved in most complex diseases remain largely unidentified. Their discovery will hinge on the identification of key sequence variants that are conclusively associated with disease. While much attention has been focused on variants in protein-coding DNA, variants in noncoding regions may also play many important roles in complex disease by altering gene regulation. Since the vast majority of noncoding genomic sequence is of unknown function, this increases the challenge of identifying "functional" variants that cause disease. However, evolutionary conservation can be used as a guide to indicate regions of noncoding or coding DNA that are likely to have biological function, and thus may be more likely to harbor SNP variants with functional consequences. To help bias marker selection in favor of such variants, we devised a process that prioritizes annotated SNPs for genotyping studies based on their location within Multi-species Conserved Sequences (MCSs) and used this process to select SNPs in a region of linkage to a complex disease. This allowed us to evaluate the utility of the chosen SNPs for further association studies. Previously, a region of chromosome 1q43 was linked to Multiple Sclerosis (MS) in a genome-wide screen. We chose annotated SNPs in the region based on location within MCSs (termed MCS-SNPs). We then obtained genotypes for 478 MCS-SNPs in 989 individuals from MS families. Results: Analysis of our MCS-SNP genotypes from the 1q43 region and comparison to HapMap data confirmed that annotated SNPs in MCS regions are frequently polymorphic and show subtle signatures of selective pressure, consistent with previous reports of genome-wide variation in conserved regions. We also present an online tool that allows MCS data to be directly exported to the UCSC genome browser so that MCS-SNPs can be easily identified within genomic regions of interest. Conclusion: Our results showed that MCS can easily be used to prioritize markers for follow-up and candidate gene association studies. We believe that this novel approach demonstrates a paradigm for expediting the search for genes contributing to complex diseases.
Author Correction: A population-specific reference panel empowers genetic studies of Anabaptist populations.
A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has not been fixed in the paper
Enabling genomic-phenomic association discovery without sacrificing anonymity
Health information technologies facilitate the collection of massive quantities of patient-level data. A growing body of research demonstrates that such information can support novel, large-scale biomedical investigations at a fraction of the cost of traditional prospective studies. While healthcare organizations are being encouraged to share these data in a de-identified form, there is hesitation over concerns that it will allow corresponding patients to be re-identified. Currently proposed technologies to anonymize clinical data may make unrealistic assumptions with respect to the capabilities of a recipient to ascertain a patient's identity. We show that more pragmatic assumptions enable the design of anonymization algorithms that permit the dissemination of detailed clinical profiles with provable guarantees of protection. We demonstrate this strategy with a dataset of over one million medical records and show that 192 genotype-phenotype associations can be discovered with fidelity equivalent to non-anonymized clinical data