
    Missing value imputation for epistatic MAPs

    Background: Epistatic miniarray profiling (E-MAPs) is a high-throughput approach capable of quantifying aggravating or alleviating genetic interactions between gene pairs. The datasets resulting from E-MAP experiments typically take the form of a symmetric pairwise matrix of interaction scores. These datasets have a significant number of missing values - up to 35% - that can reduce the effectiveness of some data analysis techniques and prevent the use of others. An effective method for imputing interactions would therefore increase the types of possible analysis, as well as increase the potential to identify novel functional interactions between gene pairs. Several methods have been developed to handle missing values in microarray data, but it is unclear how applicable these methods are to E-MAP data because of their pairwise nature and the significantly larger number of missing values. Here we evaluate four alternative imputation strategies, three local (Nearest neighbor-based) and one global (PCA-based), that have been modified to work with symmetric pairwise data.
    Results: We identify different categories for the missing data based on their underlying cause, and show that values from the largest category can be imputed effectively. We compare local and global imputation approaches across a variety of distinct E-MAP datasets, showing that both are competitive and preferable to filling in with zeros. In addition we show that these methods are effective in an E-MAP from a different species, suggesting that pairwise imputation techniques will be increasingly useful as analogous epistasis mapping techniques are developed in different species. We show that strongly alleviating interactions are significantly more difficult to predict than strongly aggravating interactions. Finally we show that imputed interactions, generated using nearest neighbor methods, are enriched for annotations in the same manner as measured interactions. Therefore our method potentially expands the number of mapped epistatic interactions. In addition we make implementations of our algorithms available for use by other researchers.
    Conclusions: We address the problem of missing value imputation for E-MAPs, and suggest the use of symmetric nearest neighbor based approaches as they offer consistently accurate imputations across multiple datasets in a tractable manner.
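    The idea of nearest-neighbor imputation adapted to a symmetric pairwise matrix can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the root-mean-square profile distance, the inverse-distance weighting, and the zero fallback are all assumptions made for illustration. A missing score for the pair (i, j) is estimated once from genes similar to i and once from genes similar to j, and the two estimates are averaged so the filled matrix stays symmetric.

```python
import math

def profile_distance(a, b):
    """Root-mean-square difference over positions measured in both profiles."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def one_sided_estimate(scores, i, j, k):
    """Estimate scores[i][j] from the k genes most similar to gene i
    that have a measured score with gene j (hypothetical helper)."""
    candidates = []
    for g in range(len(scores)):
        if g in (i, j) or scores[g][j] is None:
            continue
        d = profile_distance(scores[i], scores[g])
        if math.isfinite(d):
            candidates.append((d, scores[g][j]))
    if not candidates:
        return None
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

def impute_symmetric_knn(scores, k=2):
    """Return a copy of a symmetric score matrix (None = missing) with
    off-diagonal gaps filled; the diagonal is left untouched."""
    n = len(scores)
    filled = [row[:] for row in scores]
    for i in range(n):
        for j in range(i + 1, n):
            if scores[i][j] is not None:
                continue
            sides = (one_sided_estimate(scores, i, j, k),
                     one_sided_estimate(scores, j, i, k))
            estimates = [e for e in sides if e is not None]
            # Assumed fallback: a neutral score of zero when no neighbor helps.
            value = sum(estimates) / len(estimates) if estimates else 0.0
            filled[i][j] = filled[j][i] = value
    return filled
```

    Distances are computed from the original matrix only, so the result does not depend on the order in which gaps are filled.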

    Improved functional overview of protein complexes using inferred epistatic relationships

    Background: Epistatic Miniarray Profiling (E-MAP) quantifies the net effect on growth rate of disrupting pairs of genes, often producing phenotypes that may be more (negative epistasis) or less (positive epistasis) severe than the phenotype predicted based on single gene disruptions. Epistatic interactions are important for understanding cell biology because they define relationships between individual genes, and between sets of genes involved in biochemical pathways and protein complexes. Each E-MAP screen quantifies the interactions between a logically selected subset of genes (e.g. genes whose products share a common function). Interactions that occur between genes involved in different cellular processes are not as frequently measured, yet these interactions are important for providing an overview of cellular organization.
    Results: We introduce a method for combining overlapping E-MAP screens and inferring new interactions between them. We use this method to infer with high confidence 2,240 new strongly epistatic interactions and 34,469 weakly epistatic or neutral interactions. We show that accuracy of the predicted interactions approaches that of replicate experiments and that, like measured interactions, they are enriched for features such as shared biochemical pathways and knockout phenotypes. We constructed an expanded epistasis map for yeast cell protein complexes and show that our new interactions increase the evidence for previously proposed inter-complex connections, and predict many new links. We validated a number of these in the laboratory, including new interactions linking the SWR-C chromatin modifying complex and the nuclear transport apparatus.
    Conclusion: Overall, our data support a modular model of yeast cell protein network organization and show how prediction methods can considerably extend the information that can be extracted from overlapping E-MAP screens.

    Integration of genetic and genomics resources in einkorn wheat enables precision mapping of important traits

    Einkorn wheat (Triticum monococcum) is an ancient grain crop and a close relative of the diploid progenitor (T. urartu) of polyploid wheat. It is the only diploid wheat species having both domesticated and wild forms and therefore provides an excellent system to identify domestication genes and genes for traits of interest to utilize in wheat improvement. Here, we leverage genomic advancements for einkorn wheat using an einkorn reference genome assembly combined with skim-sequencing of a large genetic population of 812 recombinant inbred lines (RILs) developed from a cross between a wild and a domesticated T. monococcum accession. We identify 15,919 crossover breakpoints delimited to a median and average interval of 114 Kbp and 219 Kbp, respectively. This high-resolution mapping resource enables us to perform fine-scale mapping of one qualitative (red coleoptile) and one quantitative (spikelet number per spike) trait, resulting in the identification of small physical intervals (400 Kb to 700 Kb) with a limited number of candidate genes. Furthermore, an important domestication locus for brittle rachis is also identified on chromosome 7A. This resource presents an exciting route to perform trait discovery in diploid wheat for agronomically important traits and their further deployment in einkorn as well as tetraploid pasta wheat and hexaploid bread wheat cultivars.

    Towards accurate imputation of quantitative genetic interactions

    Recent technological breakthroughs have enabled high-throughput quantitative measurements of hundreds of thousands of genetic interactions among hundreds of genes in Saccharomyces cerevisiae. However, these assays often fail to measure the genetic interactions among up to 40% of the studied gene pairs. Here we present a novel method, which combines genetic interaction data together with diverse genomic data, to quantitatively impute these missing interactions. We also present data on almost 190,000 novel interactions.
    Tel Aviv University, Edmond J. Safra Bioinformatics Center; Israel Science Foundation (grant no. 802/08); Raymond and Beverley Sackler Foundation

    Exhaustive search for epistatic effects on the human methylome

    Studies assessing the existence and magnitude of epistatic effects on complex human traits provide inconclusive results. The study of such effects is complicated by considerable increase in computational burden, model complexity, and model uncertainty, which in concert decrease model stability. An additional source introducing significant uncertainty with regard to the detection of robust epistasis is the biological distance between the genetic variation and the trait under study. Here we studied CpG methylation, a genetically complex molecular trait that is particularly close to genomic variation, and performed an exhaustive search for two-locus epistatic effects on the CpG-methylation signal in two cohorts of healthy young subjects. We detected robust epistatic effects for a small number of CpGs (N = 404). Our results indicate that epistatic effects explain only a minor part of variation in DNA-CpG methylation. Interestingly, these CpGs were more likely to be associated with gene-expression of nearby genes, as also shown by their overrepresentation in DNase I hypersensitivity sites and underrepresentation in CpG islands. Finally, gene ontology analysis showed a significant enrichment of these CpGs in pathways related to HPV-infection and cancer.

    Genetic control and geo-climate adaptation of pod dehiscence provide novel insights into the soybean domestication and expansion

    Loss of pod dehiscence is a key step during soybean [Glycine max (L.) Merr.] domestication. Genome-wide association analysis for soybean shattering identified loci harboring Pdh1, NST1A and SHAT1-5. Pairwise epistatic interactions were observed, and the dehiscent Pdh1 overcomes the resistance conferred by the NST1A or SHAT1-5 locus, indicating that Pdh1 predominates pod dehiscence expression. Further candidate gene association analysis identified a nonsense mutation in NST1A associated with pod dehiscence. Allele composition and population differential analyses revealed that Pdh1 and NST1A, but not SHAT1-5, underwent domestication and modern breeding selections. Geographic analysis showed that in Northeast China (NEC), indehiscence at both Pdh1 and NST1A was required in cultivated soybean, while indehiscent Pdh1 alone is capable of coping with shattering in the Huang-Huai-Hai (HHH) valleys where it originated, and no specific indehiscence was required in Southern China (SC). Geo-climatic investigation revealed a strong correlation between relative humidity and the frequency of indehiscent Pdh1 across China. This study demonstrates that the epistatic interaction between Pdh1 and NST1A fulfills a pivotal role in determining the level of resistance against pod dehiscence. Humidity shapes the distribution of indehiscent alleles. Our results also suggest that the HHH valleys, not NEC, were at least one of the origin centers of cultivated soybean.

    Minimum Epistasis Interpolation for Sequence-Function Relationships

    Massively parallel phenotyping assays have provided unprecedented insight into how multiple mutations combine to determine biological function. While such assays can measure phenotypes for thousands to millions of genotypes in a single experiment, in practice these measurements are not exhaustive, so that there is a need for techniques to impute values for genotypes whose phenotypes have not been directly assayed. Here, we present an imputation method based on inferring the least epistatic possible sequence-function relationship compatible with the data. In particular, we infer the reconstruction where mutational effects change as little as possible across adjacent genetic backgrounds. The resulting models can capture complex higher-order genetic interactions near the data, but approach additivity where data is sparse or absent. We apply the method to high-throughput transcription factor binding assays and use it to explore a fitness landscape for protein G.
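    The principle of choosing the least epistatic reconstruction can be sketched on a toy problem. The code below is an illustration under simplifying assumptions, not the published algorithm: sequences are restricted to two alleles per site and encoded as bitmasks, the function name is hypothetical, and the minimization uses plain coordinate descent rather than whatever solver the authors employ. The quantity minimized is the sum, over every pair of sites i, j and every background g, of the squared local epistasis f(g) - f(g+i) - f(g+j) + f(g+i+j), which measures how much the effect of mutation i changes when the adjacent background differs at site j. Observed phenotypes are held fixed.

```python
from itertools import combinations

def minimum_epistasis_interpolation(observed, length, max_sweeps=10000, tol=1e-12):
    """Fill in phenotypes for all 2**length binary genotypes.

    observed: dict mapping genotype bitmask -> measured phenotype.
    Returns a dict covering every genotype; measured values are unchanged.
    """
    pairs = list(combinations(range(length), 2))
    if not pairs:
        raise ValueError("need at least two sites to define pairwise epistasis")
    values = {g: observed.get(g, 0.0) for g in range(2 ** length)}
    unknown = [g for g in values if g not in observed]
    for _ in range(max_sweeps):
        largest_change = 0.0
        for g in unknown:
            total = 0.0
            for i, j in pairs:
                gi = g ^ (1 << i)    # neighbor differing at site i
                gj = g ^ (1 << j)    # neighbor differing at site j
                gij = gi ^ (1 << j)  # opposite corner of the same face
                # Value of f(g) that gives this face zero epistasis.
                total += values[gi] + values[gj] - values[gij]
            # Averaging over faces minimizes the sum of squares with respect to f(g).
            new_value = total / len(pairs)
            largest_change = max(largest_change, abs(new_value - values[g]))
            values[g] = new_value
        if largest_change < tol:
            break
    return values
```

    With only the wild type and the single mutants measured, the sketch recovers the additive prediction for every multiple mutant, consistent with the abstract's remark that the reconstruction approaches additivity where data are sparse.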

    Additive and non-additive genetic variance in juvenile Sitka spruce (Picea sitchensis Bong. Carr)

    Many quantitative genetic models assume that all genetic variation is additive because of a lack of data with sufficient structure and quality to determine the relative contribution of additive and non-additive variation. Here the fractions of additive (fa) and non-additive (fd) genetic variation were estimated in Sitka spruce for height, bud burst and pilodyn penetration depth. Approximately 1500 offspring were produced in each of three sib families and clonally replicated across three geographically diverse sites. Genotypes from 1525 offspring from all three families were obtained by RADseq, followed by imputation using 1630 loci segregating in all families and mapped using the newly developed linkage map of Sitka spruce. The analyses employed a new approach for estimating fa and fd, which combined all available genotypic and phenotypic data with spatial modelling for each trait and site. The consensus estimate for fa increased with age for height from 0.58 at 2 years to 0.75 at 11 years, with only small overlap in 95% support intervals (I95). The estimated fa for bud burst was 0.83 (I95=[0.78, 0.90]) and 0.84 (I95=[0.77, 0.92]) for pilodyn depth. Overall, there was no evidence of family heterogeneity for height or bud burst, or site heterogeneity for pilodyn depth, and no evidence of inbreeding depression associated with genomic homozygosity, expected if dominance variance was the major component of non-additive variance. The results offer no support for the development of sublines for crossing within the species. The models give new opportunities to assess more accurately the scale of non-additive variation.