
Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human

By James R. Wagner, Bing Ge, Dmitry Pokholok, Kevin L. Gunderson, Tomi Pastinen and Mathieu Blanchette

Abstract

Allelic imbalance (AI) is a phenomenon where the two alleles of a given gene are expressed at different levels in a given cell, either because of epigenetic inactivation of one of the two alleles, or because of genetic variation in regulatory regions. Recently, Ge et al. described the use of genotyping arrays to assay AI at high resolution (∼750,000 SNPs across the autosomes). In this paper, we investigate computational approaches to analyze these data and identify genomic regions with AI in an unbiased and statistically robust manner. We propose two families of approaches: (i) a statistical approach based on z-score computations, and (ii) a family of machine learning approaches based on Hidden Markov Models. Each method is evaluated against previously published experimental data sets as well as by permutation testing. When applied to whole-genome data from 53 HapMap samples, our approaches reveal that allelic imbalance is widespread (most expressed genes show evidence of AI in at least one of our 53 samples) and that most AI regions in a given individual are also found in at least a few other individuals. While many AI regions identified in the genome correspond to known protein-coding transcripts, others overlap with recently discovered long non-coding RNAs. We also observe that genomic regions with AI include not only complete transcripts with consistent differential expression levels, but also more complex patterns of allelic expression, such as alternative promoters and alternative 3′ ends. The approaches developed not only shed light on the incidence and mechanisms of allelic expression, but will also help map the genetic causes of allelic expression and identify cases where this variation may be linked to disease.
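The abstract does not spell out the statistic behind the z-score approach. Purely as an illustration of what a z-score-based scan for AI regions could look like, the sketch below slides a fixed-size window along per-SNP allelic log-ratios and compares each window mean to a null distribution. The function names, the window size, the threshold, and the assumption that the null mean and standard deviation are already known are all hypothetical choices made for this example, not details taken from the paper.

```python
import math
from statistics import mean


def window_z_scores(log_ratios, null_mean, null_sd, window=4):
    """Return one z-score per sliding window of consecutive SNPs.

    log_ratios: per-SNP allelic log-ratios, ordered by genomic position
                (hypothetical input format).
    null_mean, null_sd: mean and standard deviation of a single SNP's
                log-ratio when there is no imbalance (assumed known here).
    """
    if window < 1 or window > len(log_ratios):
        raise ValueError("window must be between 1 and len(log_ratios)")
    if null_sd <= 0:
        raise ValueError("null_sd must be positive")

    # Standard error of a mean of `window` independent null observations.
    std_err = null_sd / math.sqrt(window)

    scores = []
    for start in range(len(log_ratios) - window + 1):
        chunk = log_ratios[start:start + window]
        scores.append((mean(chunk) - null_mean) / std_err)
    return scores


def flag_windows(scores, threshold=3.0):
    """Return start indices of windows whose |z| meets the threshold."""
    return [i for i, z in enumerate(scores) if abs(z) >= threshold]


if __name__ == "__main__":
    # Toy data: four balanced SNPs, four imbalanced SNPs, two balanced SNPs.
    ratios = [0.0, 0.1, -0.1, 0.0, 0.9, 1.1, 1.0, 0.95, 0.05, -0.05]
    z = window_z_scores(ratios, null_mean=0.0, null_sd=0.3, window=4)
    print(flag_windows(z))
```

Treating SNPs within a window as independent is a simplification. A permutation test of the kind mentioned in the abstract, for example shuffling SNP positions and recomputing the window scores, would be one way to calibrate the threshold empirically rather than fixing it in advance.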

Topics: Research Article
Publisher: Public Library of Science
OAI identifier: oai:pubmedcentral.nih.gov:2900287
Provided by: PubMed Central

