
    Caipirini: using gene sets to rank literature

    BACKGROUND: Keeping up to date with the bioscience literature is becoming increasingly challenging. Several recent methods help meet this challenge by allowing a literature search to be launched from lists of abstracts that the user judges to be 'interesting'. Some methods go further by allowing the user to provide a second input set of 'uninteresting' abstracts; these two input sets are then used to search and rank literature by relevance. In this work we present the service 'Caipirini' (http://caipirini.org), which also allows two input sets but takes the novel approach of ranking literature based on one or more sets of genes. RESULTS: To evaluate the usefulness of Caipirini, we used two test cases, one related to the human cell cycle and a second related to disease defense mechanisms in Arabidopsis thaliana. In both cases, the new method achieved high precision in finding literature related to the biological mechanisms underlying the input data sets. CONCLUSIONS: To our knowledge, Caipirini is the first service enabling literature search directly based on biological relevance to gene sets; thus, Caipirini gives the research community a new way to unlock hidden knowledge from gene sets derived via high-throughput experiments.
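
    The two-input-set ranking idea mentioned above (an 'interesting' set and an 'uninteresting' set) can be illustrated with a centroid-difference score over bag-of-words vectors. This is a hypothetical sketch: the function names, the tokenization and the cosine-similarity scoring are assumptions chosen for illustration and are not Caipirini's method, which ranks literature from gene sets rather than from example abstracts.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list:
    # Lowercase alphanumeric tokens; a real system would also drop stop words.
    return re.findall(r"[a-z0-9]+", text.lower())

def centroid(docs: list) -> Counter:
    # Mean term-frequency vector over a set of abstracts.
    total = Counter()
    for doc in docs:
        total.update(tokenize(doc))
    return Counter({term: count / len(docs) for term, count in total.items()})

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_abstracts(candidates, interesting, uninteresting):
    """Order candidates by similarity to 'interesting' minus similarity to 'uninteresting'."""
    pos, neg = centroid(interesting), centroid(uninteresting)

    def score(doc):
        vec = Counter(tokenize(doc))
        return cosine(vec, pos) - cosine(vec, neg)

    return sorted(candidates, key=score, reverse=True)

# Hypothetical toy inputs.
ranked = rank_abstracts(
    candidates=["plant immune defense against pathogen", "cell cycle checkpoint kinase"],
    interesting=["cyclin dependent kinase regulates cell cycle progression"],
    uninteresting=["pathogen defense response in plant leaves"],
)
print(ranked[0])  # cell cycle checkpoint kinase
```

    Subtracting the 'uninteresting' similarity is what lets the second input set push down documents that merely share vocabulary with the topic of no interest.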

    Synergistic Association of PTGS2 and CYP2E1 Genetic Polymorphisms with Lung Cancer Risk in Northeastern Chinese

    BACKGROUND: Lung cancer is the most common cause of cancer-related deaths worldwide. The aim of this study was to investigate the association of five extensively studied polymorphisms in the PTGS2 (rs689466, rs5275, rs20417) and CYP2E1 (rs2031920, rs6413432) genes with lung cancer risk in a large northeastern Chinese population. METHODOLOGY/PRINCIPAL FINDINGS: This is a hospital-based case-control study involving 684 patients with lung cancer and 604 cancer-free controls. Genotyping was performed using the PCR-LDR method. Data were analyzed using the Haplo.stats and MDR programs. There were significant differences between patients and controls in the allele/genotype distributions of rs5275 (P = 0.002/0.003) and rs6413432 (P = 0.037/0.044), as well as in the genotype distribution of rs689466 (P = 0.02). Under the recessive model, the risk of lung cancer associated with the rs5275-C mutant allele was decreased by 60% (95% confidence interval [CI]: 0.21-0.74; P = 0.004). Carriers of the rs689466-G mutant allele had a 28% reduced risk of developing lung cancer (95% CI: 0.57-0.92; P = 0.008) relative to AA genotype carriers. In haplotype analysis, haplotype G-C-C-T (in the order rs689466, rs5275, rs2031920, rs6413432) decreased the odds of lung cancer by 28% (95% CI: 0.51-0.93; P = 0.019) after adjusting for confounding factors, whereas haplotype A-T-T-T was associated with a 1.49-fold increased risk of lung cancer (95% CI: 1.21-1.79; P = 0.012). Using the MDR method, the best overall model, comprising the rs5275, rs689466 and rs6413432 polymorphisms, was identified with a maximal testing accuracy of 66.1% and a maximal cross-validation consistency of 10 out of 10 (P = 0.003). CONCLUSIONS/SIGNIFICANCE: Our findings demonstrate a potentially synergistic association of PTGS2 and CYP2E1 polymorphisms with the underlying cause of lung cancer in the northeastern Chinese population.
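
    The percentages above are changes in odds: an odds ratio (OR) of 0.72 is reported as a 28% reduction, and the confidence intervals are intervals for the OR itself. The sketch below computes a crude (unadjusted) OR with a Woolf log-scale 95% CI from a 2x2 table. The counts are hypothetical, not the study's data, and the study's adjusted estimates come from regression models that this sketch does not reproduce.

```python
import math

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    a = cases with the allele/genotype, b = cases without,
    c = controls with the allele/genotype, d = controls without.
    All four counts must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

def percent_change_in_odds(odds_ratio: float) -> float:
    # OR 0.72 -> -28 (a 28% reduction); OR 1.49 -> +49 (a 1.49-fold increase).
    return (odds_ratio - 1) * 100

# Hypothetical counts; odds ratio = (20 * 60) / (80 * 40) = 0.375.
or_, lo, hi = odds_ratio_ci(a=20, b=80, c=40, d=60)
print(round(or_, 3), round(lo, 3), round(hi, 3))
```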

    A General Framework for Formal Tests of Interaction after Exhaustive Search Methods with Applications to MDR and MDR-PDT

    The initial presentation of multifactor dimensionality reduction (MDR) featured cross-validation to mitigate over-fitting, computationally efficient searches of the epistatic model space, and variable construction with constructive induction to alleviate the curse of dimensionality. However, the method was unable to differentiate association signals arising from true interactions from those due to independent main effects at individual loci. This issue leads to problems of inference and interpretability for results from MDR and from its family-based complement, the MDR-pedigree disequilibrium test (MDR-PDT). Previous work suggested fitting regression models post hoc to specifically evaluate the null hypothesis of no interaction for MDR or MDR-PDT models. We demonstrate with simulation that fitting a regression model on the same data as that analyzed by MDR or MDR-PDT is not a valid test of interaction. This is likely to be true for any other procedure that searches for models and then performs an uncorrected test for interaction. We also show with simulation that, when strong main effects are present and the null hypothesis of no interaction is true, MDR and MDR-PDT reject at a rate far greater than the nominal rate. We also provide a valid regression-based permutation test procedure that specifically tests the null hypothesis of no interaction and does not reject the null when only main effects are present. The regression-based permutation test implemented here conducts a valid test of interaction after a search for multilocus models, and it can be applied to any method that conducts a search to find a multilocus model representing an interaction.
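
    The constructive-induction step of MDR pools multilocus genotype cells into a single high-risk/low-risk variable by comparing each cell's case:control ratio with a threshold. The sketch below shows only that labeling step on hypothetical two-SNP data; cross-validation and the exhaustive model search are omitted, and the function names are illustrative. As the abstract stresses, a high accuracy from this step does not by itself show that the signal comes from an interaction rather than from main effects.

```python
from collections import Counter

def mdr_high_risk_cells(genotypes, status, threshold=None):
    """Label each multilocus genotype cell high- or low-risk, MDR-style.

    genotypes: list of tuples, one genotype code (0/1/2) per SNP.
    status: list of 1 (case) / 0 (control), aligned with genotypes.
    """
    cases = Counter(g for g, s in zip(genotypes, status) if s == 1)
    controls = Counter(g for g, s in zip(genotypes, status) if s == 0)
    if threshold is None:
        # Conventional choice: the overall case:control ratio of the sample.
        threshold = sum(cases.values()) / sum(controls.values())
    high_risk = set()
    for cell in cases.keys() | controls.keys():
        n_case, n_control = cases[cell], controls[cell]
        # A cell with cases but no controls has an unbounded ratio -> high risk.
        if n_control == 0 or n_case / n_control >= threshold:
            high_risk.add(cell)
    return high_risk

def mdr_accuracy(genotypes, status, high_risk):
    """Fraction classified correctly when high-risk cells are predicted to be cases."""
    predictions = [1 if g in high_risk else 0 for g in genotypes]
    return sum(p == s for p, s in zip(predictions, status)) / len(status)

# Hypothetical two-SNP data with a purely interactive (XOR-like) pattern.
genotypes = [(0, 0), (1, 1), (0, 1), (1, 0)] * 5
status = [1, 1, 0, 0] * 5
cells = mdr_high_risk_cells(genotypes, status)
print(sorted(cells), mdr_accuracy(genotypes, status, cells))  # [(0, 0), (1, 1)] 1.0
```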

    A New Methodology to Associate SNPs with Human Diseases According to Their Pathway Related Context

    Genome-wide association studies (GWAS) with hundreds of thousands of single nucleotide polymorphisms (SNPs) are a popular strategy for revealing the genetic basis of complex human diseases. Despite many successes of GWAS, it is well recognized that new analytical approaches must be integrated to achieve their full potential. Starting with a list of SNPs found to be associated with disease in GWAS, we propose a novel methodology to identify functionally important KEGG pathways through the genes within these pathways that are implicated by SNP analysis. Our methodology is based on the functionalization of important SNPs to identify affected genes and disease-related pathways. We tested our methodology on the WTCCC Rheumatoid Arthritis (RA) dataset and identified (i) previously known RA-related KEGG pathways (e.g., Toll-like receptor signaling, Jak-STAT signaling, Antigen processing, Leukocyte transendothelial migration and MAPK signaling pathways) and (ii) additional KEGG pathways associated with RA (e.g., Pathways in cancer, Neurotrophin signaling, Chemokine signaling pathways). Furthermore, these newly found pathways included genes that are targets of RA-specific drugs. Whereas GWAS analysis alone identifies 14 of the 83 drug target genes, the newly found functionally important KEGG pathways led to the discovery of 25 of the 83 genes known to be used as drug targets for the treatment of RA. Within previously known pathways (e.g., Antigen processing and presentation, Tight junction), we also identified additional genes associated with RA. Importantly, the associations between RA and some of these additionally found genes, such as HLA-C, HLA-G, PRKCQ, PRKCZ, TAP1 and TAP2, were verified either in the OMIM database or in literature retrieved from the NCBI PubMed module.
With whole-genome sequencing on the horizon, we show that the full potential of GWAS can be achieved by integrating pathway- and network-oriented analysis with prior knowledge of the functional properties of SNPs.
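
    The SNP-to-gene-to-pathway idea can be sketched as two steps: map associated SNPs to the genes whose coordinates contain them, then test each pathway for over-representation of those genes. This is a minimal sketch under stated assumptions: positional mapping and a one-sided hypergeometric test are common choices but are not necessarily the authors' functionalization procedure, and all SNP, gene and pathway names below are hypothetical toy inputs rather than WTCCC or KEGG data.

```python
from math import comb

def map_snps_to_genes(snp_positions, gene_spans, window=0):
    """Assign each SNP to every gene whose span (plus a flanking window) contains it.

    snp_positions: {snp_id: (chromosome, position)}
    gene_spans:    {gene: (chromosome, start, end)}
    """
    hits = set()
    for chrom, pos in snp_positions.values():
        for gene, (g_chrom, start, end) in gene_spans.items():
            if chrom == g_chrom and start - window <= pos <= end + window:
                hits.add(gene)
    return hits

def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n pathway genes, N hit genes)."""
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / comb(M, N)

def pathway_enrichment(hit_genes, pathways, background_size):
    """One-sided over-representation p-value for each pathway."""
    return {
        name: hypergeom_sf(len(hit_genes & members), background_size, len(members), len(hit_genes))
        for name, members in pathways.items()
    }

# Hypothetical toy inputs.
snps = {"rs_a": ("6", 150), "rs_b": ("6", 900)}
genes = {"GENE_A": ("6", 100, 200), "GENE_B": ("6", 800, 1000), "GENE_C": ("1", 100, 200)}
hits = map_snps_to_genes(snps, genes)
results = pathway_enrichment(
    hits, {"pathway_X": {"GENE_A", "GENE_B"}, "pathway_Y": {"GENE_C"}}, background_size=100
)
print(results)
```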

    Three-Dimensional Geometric Analysis of Felid Limb Bone Allometry

    Studies of bone allometry typically use simple measurements taken in a small number of locations per bone; often the midshaft diameter or joint surface area is compared to body mass or bone length. However, bones must fulfil multiple roles simultaneously with minimum cost to the animal while meeting the structural requirements imposed by behaviour and locomotion, and not exceeding its capacity for adaptation and repair. We use entire bone volumes from the forelimbs and hindlimbs of Felidae (cats) to investigate regional complexities in bone allometry. Computed tomographic (CT) images (16435 slices in 116 stacks) were made of 9 limb bones from each of 13 individuals of 9 feline species ranging in size from domestic cat (Felis catus) to tiger (Panthera tigris). Eleven geometric parameters were calculated for every CT slice and scaling exponents calculated at 5% increments along the entire length of each bone. Three-dimensional moments of inertia were calculated for each bone volume, and spherical radii were measured in the glenoid cavity, humeral head and femoral head. Allometry of the midshaft, moments of inertia and joint radii were determined. Allometry was highly variable and related to local bone function, with joint surfaces and muscle attachment sites generally showing stronger positive allometry than the midshaft. Examining whole bones revealed that bone allometry is strongly affected by regional variations in bone function, presumably through mechanical effects on bone modelling. Bone's phenotypic plasticity may be an advantage during rapid evolutionary divergence by allowing exploitation of the full size range that a morphotype can occupy. Felids show bone allometry rather than postural change across their size range, unlike similar-sized animals.
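
    A scaling exponent is the slope b in log(y) = log(a) + b * log(x). Against body mass, isometry predicts b = 1/3 for a length, 2/3 for an area and 4/3 for a second moment of area, so exponents above these values indicate positive allometry. The sketch below fits b by ordinary least squares on log-transformed data; that choice is an assumption made for simplicity (allometry studies often use reduced major axis regression instead), and the masses and diameters are hypothetical. Repeating the fit for the measurement at each 5% position along a bone would give a profile of exponents like the one described above.

```python
import math

def scaling_exponent(x, y):
    """Slope b of log(y) = log(a) + b * log(x), fitted by ordinary least squares."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((u - mean_x) * (w - mean_y) for u, w in zip(lx, ly))
    sxx = sum((u - mean_x) ** 2 for u in lx)
    return sxy / sxx

# Hypothetical body masses (kg) and a length-like measurement scaling exactly as mass**(1/3).
masses = [4.0, 40.0, 100.0, 200.0]
diameters = [2.0 * m ** (1 / 3) for m in masses]
b = scaling_exponent(masses, diameters)
print(round(b, 4))  # 0.3333, i.e. isometric for a length against mass
```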

    The Impact of Phenocopy on the Genetic Analysis of Complex Traits

    An ongoing debate surrounds genome-wide association studies (GWAs). A key point is their capability to identify low-penetrance variants across the human genome. Among the phenomena that reduce the power of these analyses, the phenocopy level (PE) seriously hampers the investigation of complex diseases, as is well known for neurological disorders and cancer, and is likely of primary importance in human ageing. Phenocopy seems to be the norm rather than the exception, especially when considering the role of epigenetics and environmental factors in shaping the phenotype. Despite some attempts, no recognized solution has been proposed, particularly for estimating the effects of phenocopies on study planning or analysis design. We present a simulation in which we attempt to define more precisely how phenocopy affects different analytical methods under different scenarios. Our approach highlights the critical role of phenocopy: the higher the PE level, the more the initial difficulty in detecting gene-gene interactions is amplified. In particular, our results show that strong main effects remain detectable despite an increasing amount of phenocopy in the study sample, although their significance progressively decreases, provided the study is sufficiently powered. Conversely, when purely epistatic effects are simulated, the ability to identify the association depends on several parameters, such as the strength of the interaction between the polymorphic variants, the penetrance of the polymorphism, the alleles (minor or major) that produce the combined effect, and their frequency in the population. We conclude that neglecting the possible presence of phenocopies in complex traits heavily affects the analysis of their genetic data.
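
    The dilution mechanism behind these results can be shown analytically for a single-locus main effect. If a fraction PE of cases are phenocopies that carry the risk variant at the population frequency, the case carrier frequency becomes a mixture, and the carrier odds ratio shrinks toward 1 as PE grows. This is a simplified sketch, not the authors' simulation: it assumes controls also carry the variant at the population frequency, it does not model the epistatic scenarios discussed above, and the frequencies are hypothetical.

```python
def diluted_odds_ratio(carrier_freq_pop, carrier_freq_genetic_cases, phenocopy_rate):
    """Carrier odds ratio (cases vs. controls) when a fraction of cases are phenocopies.

    Simplifying assumptions: phenocopies and controls both carry the risk variant
    at the population frequency; the remaining cases carry it at
    carrier_freq_genetic_cases.
    """
    f_cases = (1 - phenocopy_rate) * carrier_freq_genetic_cases + phenocopy_rate * carrier_freq_pop
    f_controls = carrier_freq_pop
    return (f_cases / (1 - f_cases)) / (f_controls / (1 - f_controls))

# Hypothetical frequencies: 20% carriers in the population, 50% among genetic cases.
for pe in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(pe, round(diluted_odds_ratio(0.2, 0.5, pe), 2))
```

    With these inputs the odds ratio falls from 4.0 with no phenocopies to 1.0 when every case is a phenocopy; a larger sample can still detect the weakened effect, which matches the abstract's point about sufficiently powered studies.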

    High-Order SNP Combinations Associated with Complex Diseases: Efficient Discovery, Statistical Power and Functional Interactions

    There has been increased interest in discovering combinations of single-nucleotide polymorphisms (SNPs) that are strongly associated with a phenotype even if each SNP has little individual effect. Efficient approaches have been proposed for searching two-locus combinations from genome-wide datasets. However, for high-order combinations, existing methods either adopt a brute-force search, which handles only a small number of SNPs (up to a few hundred), or use heuristic search that may miss informative combinations. In addition, existing approaches lack statistical power because they use statistics with high degrees of freedom and test a huge number of hypotheses during combinatorial search. Because of these challenges, functional interactions in high-order combinations have not been systematically explored. We leverage discriminative-pattern-mining algorithms from the data-mining community to search for high-order combinations in case-control datasets. The substantially improved efficiency and scalability, demonstrated on synthetic and real datasets with several thousand SNPs, allow the study of several important mathematical and statistical properties of SNP combinations of order as high as eleven. We further explore functional interactions in high-order combinations and reveal a general connection, supported by multiple datasets, between the increase in discriminative power of a combination over its subsets and the functional coherence among the genes comprising the combination. Finally, we study in detail several significant high-order combinations discovered from a lung-cancer dataset and a kidney-transplant-rejection dataset to provide novel insights into these complex diseases. Interestingly, many of these associations involve combinations of common variations that occur in small fractions of the population.
Thus, our approach is an alternative methodology for exploring the genetics of rare diseases, for which the current focus is on individually rare variations.
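
    The notion of a combination's discriminative power exceeding that of its subsets can be made concrete with support-based patterns. In the sketch below, a pattern is a set of (SNP, genotype) items, its discriminative power is the absolute difference in support between cases and controls, and the gain compares the full pattern with its best proper subset. This is an assumption-laden illustration: the support-difference measure is one common choice and may differ from the authors' statistic, the brute-force subset enumeration is not their efficient mining algorithm, and the data are hypothetical.

```python
from itertools import combinations

def support(rows, items):
    """Fraction of individuals (dicts SNP -> genotype) matching every (snp, genotype) item."""
    return sum(all(row[snp] == g for snp, g in items) for row in rows) / len(rows)

def discriminative_power(cases, controls, items):
    # Absolute difference in support between cases and controls.
    return abs(support(cases, items) - support(controls, items))

def gain_over_subsets(cases, controls, items):
    """Power of a combination minus the best power among its proper non-empty subsets.

    Requires at least two items.
    """
    best_subset = max(
        discriminative_power(cases, controls, subset)
        for r in range(1, len(items))
        for subset in combinations(items, r)
    )
    return discriminative_power(cases, controls, items) - best_subset

# Hypothetical data: neither SNP alone separates cases from controls, but the pair does.
cases = [{"snp1": 1, "snp2": 1}, {"snp1": 0, "snp2": 0}]
controls = [{"snp1": 1, "snp2": 0}, {"snp1": 0, "snp2": 1}]
pattern = (("snp1", 1), ("snp2", 1))
print(gain_over_subsets(cases, controls, pattern))  # 0.5
```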

    A Network-Based Approach to Prioritize Results from Genome-Wide Association Studies

    Genome-wide association studies (GWAS) are a valuable approach to understanding the genetic basis of complex traits. One of the challenges of GWAS is translating genetic association results into biological hypotheses suitable for further investigation in the laboratory. To address this challenge, we introduce Network Interface Miner for Multigenic Interactions (NIMMI), a network-based method that combines GWAS data with human protein-protein interaction (PPI) data. NIMMI builds biological networks weighted by connectivity, which is estimated using a modification of the Google PageRank algorithm. These weights are then combined with genetic association p-values derived from GWAS, producing what we call ‘trait-prioritized sub-networks.’ As a proof of principle, NIMMI was tested on three GWAS datasets previously analyzed for height, a classical polygenic trait. Despite differences in sample size and ancestry, NIMMI captured 95% of the known height-associated genes within the top 20% of ranked sub-networks, far better than could be achieved by a single-locus approach. The top 2% of NIMMI height-prioritized sub-networks were significantly enriched for genes involved in transcription, signal transduction, transport, and gene expression, as well as in nucleic acid, phosphate, protein, and zinc metabolism. All of these sub-networks were ranked near the top across all three height GWAS datasets we tested. We also tested NIMMI on a categorical phenotype, Crohn’s disease. NIMMI prioritized sub-networks involved in B- and T-cell receptor, chemokine, interleukin, and other pathways consistent with the known autoimmune nature of Crohn’s disease. NIMMI is a simple, user-friendly, open-source software tool that efficiently combines genetic association data with biological networks, translating GWAS findings into biological hypotheses.
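
    The general idea of weighting genes by network connectivity and combining those weights with GWAS p-values can be sketched as follows. The abstract does not specify NIMMI's modification of PageRank or its combination rule, so this sketch uses standard PageRank by power iteration and a hypothetical score (PageRank weight times -log10 p), and it ranks single genes rather than sub-networks. The graph and p-values are hypothetical.

```python
import math

def pagerank(adjacency, damping=0.85, iterations=100):
    """Standard PageRank by power iteration on an undirected graph {node: set(neighbors)}."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            neighbors = adjacency[v]
            if neighbors:
                share = damping * rank[v] / len(neighbors)
                for u in neighbors:
                    new[u] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

def prioritize(adjacency, p_values):
    """Hypothetical score: PageRank weight times -log10 of the gene's GWAS p-value."""
    pr = pagerank(adjacency)
    scores = {g: pr[g] * -math.log10(p_values.get(g, 1.0)) for g in adjacency}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical PPI star network and p-values.
graph = {"HUB": {"A", "B", "C"}, "A": {"HUB"}, "B": {"HUB"}, "C": {"HUB"}}
p_values = {"HUB": 1e-4, "A": 1e-6, "B": 0.5, "C": 0.5}
ranking = prioritize(graph, p_values)
print(ranking[:2])  # ['HUB', 'A']
```

    Here the well-connected hub outranks gene A despite A's smaller p-value, which is the kind of trade-off between connectivity and association strength that a network-based prioritization introduces.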