SUP: an extension to SLINK to allow a larger number of marker loci to be simulated in pedigrees conditional on trait values
BACKGROUND: With the recent advances in high-throughput genotyping technologies that allow for large-scale association mapping of human complex traits, promising statistical designs and methods have been emerging. Efficient simulation software is a key element for evaluating the properties of new statistical tests. SLINK is a flexible simulation tool that has been widely used to generate the segregation and recombination processes of markers linked to, and possibly associated with, a trait locus, conditional on trait values in arbitrary pedigrees. In practice, its most serious limitation is the small number of loci that can be simulated, since the complexity of the algorithm scales exponentially with this number. RESULTS: I describe the implementation of a two-step algorithm to be used in conjunction with SLINK to enable the simulation of a large number of marker loci linked to a trait locus and conditional on trait values in families, with the possibility for the loci to be in linkage disequilibrium. SLINK is used in the first step to simulate genotypes at the trait locus conditional on the observed trait values, and also to generate an indicator of the descent path of the simulated alleles. In the second step, marker alleles or haplotypes are generated in the founders, conditional on the trait locus genotypes simulated in the first step. Then the recombination process between the marker loci takes place conditionally on the descent path and on the trait locus genotypes. This two-step implementation is often computationally faster than other software designed to generate marker data linked to, and possibly associated with, a trait locus.
CONCLUSION: Because the proposed method uses SLINK to simulate the segregation process, it benefits from SLINK's flexibility: the trait may be qualitative, with the possibility of defining different liability classes (which allows for the simulation of gene-environment interactions or even of multi-locus effects between unlinked susceptibility regions), or it may be quantitative and normally distributed. In particular, this implementation is the only one available that can generate a large number of marker loci conditional on the set of observed quantitative trait values in pedigrees.
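The second step described in this abstract can be illustrated with a minimal sketch for a single meiosis. This is not SUP or SLINK code: the function name, the argument layout, and the assumption of independent crossovers between adjacent intervals (no interference) are all hypothetical choices made for illustration. Given which parental copy was transmitted at the trait locus (the descent indicator from the first step), the parental origin at each marker is propagated outward from the trait locus, switching with probability equal to the recombination fraction of each interval.

```python
import random

def transmit_haplotype(paternal, maternal, trait_index, trait_origin, thetas, rng):
    """Sketch: build the haplotype a parent transmits, conditional on the
    parental origin at the trait locus (hypothetical helper, not SUP's API).

    paternal, maternal : the parent's two haplotypes (equal-length allele lists)
    trait_index        : position of the trait locus within those lists
    trait_origin       : 0 if the paternal copy is transmitted at the trait
                         locus, 1 if the maternal copy (from the first step)
    thetas             : recombination fractions between adjacent loci,
                         so len(thetas) == len(paternal) - 1
    """
    n = len(paternal)
    origin = [0] * n
    origin[trait_index] = trait_origin
    # Walk to the right of the trait locus; switch origin on each recombination.
    for k in range(trait_index + 1, n):
        origin[k] = origin[k - 1] ^ int(rng.random() < thetas[k - 1])
    # Walk to the left of the trait locus in the same way.
    for k in range(trait_index - 1, -1, -1):
        origin[k] = origin[k + 1] ^ int(rng.random() < thetas[k])
    copies = (paternal, maternal)
    return [copies[origin[k]][k] for k in range(n)]
```

Because the trait locus is simulated first and the markers are filled in afterwards, the cost of this step grows linearly with the number of markers, which is the motivation the abstract gives for the two-step design.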
A novel approach to simulate gene-environment interactions in complex diseases
Background: Complex diseases are multifactorial traits caused by both genetic and environmental factors. They represent the major part of human diseases and include those with the largest prevalence and mortality (cancer, heart disease, obesity, etc.). Despite the large amount of information that has been collected about both genetic and environmental risk factors, there are few examples of studies on their interactions in the epidemiological literature. One reason may be the incomplete knowledge of the power of statistical methods designed to search for risk factors and their interactions in these data sets. An improvement in this direction would lead to a better understanding and description of gene-environment interactions. To this end, a possible strategy is to challenge the different statistical methods against data sets where the underlying phenomenon is completely known and fully controllable, for example simulated ones.
Results: We present a mathematical approach that models gene-environment interactions. With this method it is possible to generate simulated populations having gene-environment interactions of any form, involving any number of genetic and environmental factors and also allowing non-linear interactions such as epistasis. In particular, we implemented a simple version of this model in a Gene-Environment iNteraction Simulator (GENS), a tool designed to simulate case-control data sets where a one gene-one environment interaction influences the disease risk. The main aim has been to allow the input of population characteristics by using standard epidemiological measures and to implement constraints that make the simulator's behaviour biologically meaningful.
Conclusions: With the multi-logistic model implemented in GENS it is possible to simulate case-control samples of complex diseases where gene-environment interactions influence the disease risk. The user has full control of the main characteristics of the simulated population, and a Monte Carlo process allows random variability. A knowledge-based approach reduces the complexity of the mathematical model by using reasonable biological constraints and makes the simulation more understandable in biological terms. Simulated data sets can be used for the assessment of novel statistical methods or for the evaluation of statistical power when designing a study.
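As a rough illustration of this kind of simulation, the sketch below draws disease status by Monte Carlo from a plain logistic risk model with one biallelic locus, one binary exposure, and an interaction term. It is not the GENS implementation: the function name, the coefficient names, Hardy-Weinberg genotype sampling, and the single-logistic form are all assumptions made for the example.

```python
import math
import random

def simulate_case_control(n, maf, exposure_freq,
                          beta0, beta_g, beta_e, beta_ge, seed=0):
    """Sketch: simulate n individuals as (genotype, exposure, disease) tuples.

    All parameters are hypothetical: maf is the risk-allele frequency,
    exposure_freq the probability of exposure, and the betas are the
    coefficients of an assumed logistic model on the log-odds scale.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        # Genotype = number of risk alleles, assuming Hardy-Weinberg proportions.
        g = int(rng.random() < maf) + int(rng.random() < maf)
        e = int(rng.random() < exposure_freq)
        # Log-odds of disease, including the gene-environment interaction term.
        logit = beta0 + beta_g * g + beta_e * e + beta_ge * g * e
        risk = 1.0 / (1.0 + math.exp(-logit))
        # Monte Carlo draw of disease status from the individual risk.
        d = int(rng.random() < risk)
        people.append((g, e, d))
    return people
```

A case-control sample could then be obtained by drawing equal numbers of tuples with `d == 1` and `d == 0` from a large simulated population.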
Generating samples for association studies based on HapMap data
Background: With the completion of the HapMap project, a variety of computational algorithms and tools have been proposed for haplotype inference, tag SNP selection and genome-wide association studies. Simulated data are commonly used in evaluating these newly developed approaches. In addition to simulations based on population models, empirical data generated by perturbing real data have also been used because they may inherit specific properties from real data. However, there is no publicly available tool that generates large-scale simulated variation data by taking into account knowledge from the HapMap project. Results: A computer program (gs) was developed to quickly generate a large number of samples based on real data that are useful for a variety of purposes, including evaluating methods for haplotype inference, tag SNP selection and association studies. Two approaches have been implemented to generate dense SNP haplotype/genotype data that share local linkage disequilibrium (LD) patterns similar to those in human populations. The first approach takes haplotype pairs from samples as inputs, and the second approach takes patterns of haplotype block structures as inputs. Both quantitative and qualitative traits have been incorporated in the program. Phenotypes are generated based on a disease model, or based on the effect of a quantitative trait nucleotide, both of which can be specified by users. In addition to single-locus disease models, two-locus disease models have also been implemented that can incorporate any degree of epistasis. Users are allowed to specify all nine parameters in a 3 × 3 penetrance table.
For several commonly used two-locus disease models, the program can automatically calculate penetrances based on the population prevalence and marginal effects of a disease that users can conveniently specify. Conclusion: The program gs can effectively generate large-scale genetic and phenotypic variation data that can be used for evaluating newly developed approaches. It is freely available from the authors' web site at http://www.eecs.case.edu/~jxl175/gs.html.
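To make the 3 × 3 penetrance table concrete, the sketch below shows how such a table could be used to compute the implied population prevalence and to draw a binary phenotype. It is not code from gs: the function names are hypothetical, and the prevalence calculation assumes two unlinked loci in Hardy-Weinberg equilibrium, which the abstract does not state.

```python
import random

def genotype_freqs(q):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of an allele of frequency q."""
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]

def prevalence(penetrance, q_a, q_b):
    """Prevalence implied by a 3x3 penetrance table, assuming independent loci.

    penetrance[i][j] is the disease probability for i risk alleles at locus A
    and j risk alleles at locus B (hypothetical layout).
    """
    fa, fb = genotype_freqs(q_a), genotype_freqs(q_b)
    return sum(fa[i] * fb[j] * penetrance[i][j]
               for i in range(3) for j in range(3))

def draw_phenotype(penetrance, g_a, g_b, rng):
    """Draw affected (1) or unaffected (0) status for a two-locus genotype."""
    return int(rng.random() < penetrance[g_a][g_b])
```

Going the other way, from a target prevalence and marginal effects to the nine penetrances, requires fixing a specific two-locus model first, which is why the abstract restricts that feature to several commonly used models.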
The optimization of in vitro high-throughput chemical lysis of Escherichia coli. Application to ACP domain of the polyketide synthase ppsC from Mycobacterium tuberculosis
Protein production in Escherichia coli involves high-level expression in a culture, followed by harvesting of the cells and finally their disruption, or lysis, to release the expressed proteins. We compare three high-throughput chemical lysis methods to sonication, using a robotic platform and methodologies developed in our laboratory [1]. Under the same expression conditions, the lysis methods varied in the amount of soluble protein released. With a set of 96 test proteins, we used our split GFP to quantify the soluble and insoluble protein fractions after lysis. Both the amount of soluble protein and the percentage recovered in the soluble fraction using SoluLyse® were well correlated with sonication. Two other methods, Bugbuster® and lysozyme, did not correlate well with sonication. Considering the effects of lysis methods on protein solubility is especially important when accurate protein solubility measurements are needed, for example, when testing adjuvants, growth media, temperature, or when establishing the effects of truncation or sequence variation on protein stability.
New Molecular Reporters for Rapid Protein Folding Assays
The GFP folding reporter assay [1] uses a C-terminal GFP fusion to report on the folding success of upstream fused polypeptides. The GFP folding assay is widely used for screening protein variants with improved folding and solubility [2]–[8], but truncation artifacts may arise during evolution, i.e. from de novo internal ribosome entry sites [9]. One way to reduce such artifacts would be to insert target genes within the scaffolding of GFP circular permuted variants. Circular permutants of fluorescent proteins often misfold and are non-fluorescent, and do not readily tolerate fused polypeptides within the fluorescent protein scaffolding [10]–[12]. To overcome these limitations, and to increase the dynamic range for reporting on protein misfolding, we have created eight GFP insertion reporters with different sensitivities to protein misfolding using chimeras of two previously described GFP variants, the GFP folding reporter [1] and the robustly folding “superfolder” GFP [13]. We applied this technology to engineer soluble variants of Rv0113, a protein from Mycobacterium tuberculosis initially expressed as inclusion bodies in Escherichia coli. Using GFP insertion reporters with increasing stringency for each cycle of mutagenesis and selection led to a variant that produced large amounts of soluble protein at 37°C in Escherichia coli. The new reporter constructs discriminate against truncation artifacts previously isolated during directed evolution of Rv0113 using the original C-terminal GFP folding reporter. Using GFP insertion reporters with variable stringency should prove useful for engineering protein variants with improved folding and solubility, while reducing the number of artifacts arising from internal cryptic ribosome initiation sites.
Variation at the Calpain 3 gene is associated with meat tenderness in zebu and composite breeds of cattle
Background: Quantitative Trait Loci (QTL) affecting meat tenderness have been reported on Bovine chromosome 10. Here we examine variation at the Calpain 3 (CAPN3) gene in cattle, a gene located within the confidence interval of the QTL, and which is a positional candidate gene based on the biochemical activity of the protein. Results: We identified single nucleotide polymorphisms (SNP) in the genomic sequence of the CAPN3 gene and tested three of these in a sample of 2189 cattle. Of the three SNP genotyped, the CAPN3:c.1538+225G>T had the largest significant additive effect, with an allele substitution effect in the Brahman of α = -0.144 kg, SE = 0.060, P = 0.016, and the polymorphism explained 1.7% of the residual phenotypic variance in that sample of the breed. Significant haplotype substitution effects were found for all three breeds, the Brahman, the Belmont Red, and the Santa Gertrudis. For the common haplotype, the haplotype substitution effect in the Brahman was α = 0.169 kg, SE = 0.056, P = 0.003. The effect of this gene was compared to Calpastatin in the same sample. The SNP show negligible frequencies in taurine breeds and low to moderate minor allele frequencies in zebu or composite animals. Conclusion: These associations confirm the location of a QTL for meat tenderness in this region of bovine chromosome 10. SNP in or near this gene may be responsible for part of the overall difference between taurine and zebu breeds in meat tenderness, and the greater variability in meat tenderness found in zebu and composite breeds. The evidence provided so far suggests that none of these tested SNP are causative mutations.
Extended Haplotypes in the Growth Hormone Releasing Hormone Receptor Gene (GHRHR) Are Associated with Normal Variation in Height
Mutations in the gene for growth hormone releasing hormone receptor (GHRHR) cause isolated growth hormone deficiency (IGHD), but this gene has not been found to affect normal variation in height. We performed a whole genome linkage analysis for height in a population from northern Sweden and identified a region on chromosome 7 with a lod-score of 4.7. The GHRHR gene is located in this region, and typing of tagSNPs identified a haplotype that is associated with height (p = 0.00077) in the original study population. Analysis of a sample from an independent population from the most northern part of Sweden also showed an association with height (p = 0.0039) but with another haplotype in the GHRHR gene. Both haplotypes span the 3′ part of the GHRHR gene, including the region in which most of the mutations in IGHD have been located. The effect sizes of these haplotypes are larger than that of any gene previously associated with height, which indicates that GHRHR might be one of the most important genes so far identified affecting normal variation in human height.
Gene Flow between the Korean Peninsula and Its Neighboring Countries
SNP markers provide the primary data for population structure analysis. In this study, we employed whole-genome autosomal SNPs as a marker set (54,836 SNP markers) and tested their possible effects on genetic ancestry using 320 subjects covering 24 regional groups including Northern (n = 16) and Southern (n = 3) Asians, Amerindians (n = 1), and four HapMap populations (YRI, CEU, JPT, and CHB). Additionally, we evaluated the effectiveness and robustness of 50K autosomal SNPs with various clustering methods, along with their dependencies on recombination hotspots (RH), linkage disequilibrium (LD), missing calls and regional specific markers. The RH- and LD-free multi-dimensional scaling (MDS) method showed a broad picture of human migration from Africa to North-East Asia on our genome map, supporting results from previous haploid DNA studies. Of the Asian groups, the East Asian group showed greater differentiation than the Northern and Southern Asian groups with respect to Fst statistics. By extension, the analysis of monomorphic markers implied that nine out of ten historical regions in South Korea, and Tokyo in Japan, showed signs of genetic drift caused by the later settlement of East Asia (South Korea, Japan and China), while Gyeongju in South East Korea showed signs of the earliest settlement in East Asia. In the genome map, the gene flow to the Korean Peninsula from its neighboring countries indicated that some genetic signals from Northern populations such as the Siberians and Mongolians still remain in the South East and West regions, while few signals remain from the early Southern lineages.
The Cysteine-Rich Interdomain Region from the Highly Variable Plasmodium falciparum Erythrocyte Membrane Protein-1 Exhibits a Conserved Structure
Plasmodium falciparum malaria parasites, living in red blood cells, express proteins of the erythrocyte membrane protein-1 (PfEMP1) family on the red blood cell surface. The binding of PfEMP1 molecules to human cell surface receptors mediates the adherence of infected red blood cells to human tissues. The sequences of the 60 PfEMP1 genes in each parasite genome vary greatly from parasite to parasite, yet the variant PfEMP1 proteins maintain receptor binding. Almost all parasites isolated directly from patients bind the human CD36 receptor. Of the several kinds of highly polymorphic cysteine-rich interdomain region (CIDR) domains classified by sequence, only the CIDR1α domains bind CD36. Here we describe the CD36-binding portion of a CIDR1α domain, MC179, as a bundle of three α-helices that are connected by a loop and three additional helices. The MC179 structure, containing seven conserved cysteines and 10 conserved hydrophobic residues, predicts similar structures for the hundreds of CIDR sequences from the many genome sequences now known. Comparison of MC179 with the CIDR domains in the genome of the P. falciparum 3D7 strain provides insights into CIDR domain structure. The CIDR1α three-helix bundle exhibits less than 20% sequence identity with the three-helix bundles of Duffy-binding like (DBL) domains, but the two kinds of bundles are almost identical. Despite the enormous diversity of PfEMP1 sequences, the CIDR1α and DBL protein structures, taken together, predict that a PfEMP1 molecule is a polymer of three-helix bundles elaborated by a variety of connecting helices and loops. From the structures also comes the insight that DBL1α domains are approximately 100 residues larger and that CIDR1α domains are approximately 100 residues smaller than sequence alignments predict. 
This new understanding of PfEMP1 structure will allow the use of better-defined PfEMP1 domains for functional studies, for the design of candidate vaccines, and for understanding the molecular basis of cytoadherence.