2,516 research outputs found
Algorithms for comparing large pedigree graphs
Geneticists use pedigrees as a tool for diagnosing genetic diseases. Errors introduced during data collection and missing information about individuals are obstacles to deducing pedigrees, especially large ones. A reconstructed pedigree graph therefore needs to be evaluated before it can support a reliable diagnosis, which requires comparing the reconstructed graph with the original data. The present study discusses the isomorphism of huge pedigrees with labeled and unlabeled leaves, where a pedigree has hundreds of families that are monogamous and generational. The algorithms presented in this paper are based on a set of bipartite graphs covering the pedigree, and the problem addressed is fixed-parameter tractable. The Bipartite graphs Covering the Pedigree (BCP) problem admits a running time of the form f(k)·poly(n), where f is a computing function that grows exponentially in the parameter k. The study presents an algorithm for the BCP problem that can be categorized as a polynomial-time-tractable evaluation of the reconstructed pedigree. The paper considers pedigree graphs that contain both labeled and unlabeled leaves and uses parameterized and kernelization algorithms to solve the problem. The kernelization algorithm runs in polynomial time for the BCP graphs.
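For orientation only, the sketch below frames the baseline task (checking whether a reconstructed pedigree matches the original) as a label-aware graph-isomorphism test. It uses networkx and hypothetical toy pedigrees and helper names (`build_pedigree`, `pedigrees_match`); it is a generic illustration, not the paper's BCP bipartite-cover algorithm, and a naive isomorphism check like this does not scale to huge pedigrees, which is precisely what motivates the parameterized approach.

```python
# Minimal sketch: comparing an original and a reconstructed pedigree as
# directed graphs with parent -> child edges. Generic baseline illustration
# using networkx, not the BCP algorithm described in the paper.
import networkx as nx
from networkx.algorithms.isomorphism import DiGraphMatcher

def build_pedigree(edges, leaf_labels=None):
    """Build a directed pedigree graph; leaf_labels maps labeled leaves to
    individual IDs, while unlabeled leaves carry no 'label' attribute."""
    g = nx.DiGraph()
    g.add_edges_from(edges)
    for node, label in (leaf_labels or {}).items():
        g.nodes[node]["label"] = label
    return g

def pedigrees_match(original, reconstructed):
    """Isomorphism check that requires labeled leaves to keep the same label."""
    def node_match(a, b):
        return a.get("label") == b.get("label")
    return DiGraphMatcher(original, reconstructed, node_match=node_match).is_isomorphic()

# Hypothetical toy example: one monogamous couple with two children,
# one labeled leaf ("A") and one unlabeled leaf.
orig = build_pedigree([("p1", "c1"), ("p2", "c1"), ("p1", "c2"), ("p2", "c2")],
                      leaf_labels={"c1": "A"})
recon = build_pedigree([("m1", "k1"), ("m2", "k1"), ("m1", "k2"), ("m2", "k2")],
                       leaf_labels={"k1": "A"})
print(pedigrees_match(orig, recon))  # True for this toy pair
```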
Conflation of short identity-by-descent segments bias their inferred length distribution
Identity-by-descent (IBD) is a fundamental concept in genetics with many applications. In a common definition, two haplotypes are said to contain an IBD segment if they share a segment that is inherited from a recent shared common ancestor without intervening recombination. Long IBD segments (> 1cM) can be efficiently detected by a number of algorithms using high-density SNP array data from a population sample. However, these approaches detect IBD based on contiguous segments of identity-by-state, and such segments may exist due to the conflation of smaller, nearby IBD segments. We quantified this effect using coalescent simulations, finding that nearly 40% of inferred segments 1-2cM long are results of conflations of two or more shorter segments, under demographic scenarios typical for modern humans. This biases the inferred IBD segment length distribution, and so can affect downstream inferences. We observed this conflation effect universally across different IBD detection programs and human demographic histories, and found inference of segments longer than 2cM to be much more reliable (less than 5% conflation rate). As an example of how this can negatively affect downstream analyses, we present and analyze a novel estimator of the de novo mutation rate using IBD segments, and demonstrate that the biased length distribution of the IBD segments due to conflation can lead to inflated estimates if the conflation is not modeled. Understanding the conflation effect in detail will make its correction in future methods more tractable.
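To make the conflation mechanism concrete, here is a minimal, self-contained sketch, not the authors' coalescent simulation: two true IBD segments on the same haplotype pair separated by a gap shorter than an assumed detection resolution get reported as one longer segment, which shifts the inferred length distribution upward. The segment and gap distributions and the 0.2 cM merge threshold are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the conflation effect: if two true IBD segments are
# separated by a short gap that the detector cannot resolve, they are
# reported as a single longer inferred segment.
import numpy as np

rng = np.random.default_rng(0)

def simulate_true_segments(n_pairs=10000, mean_len_cm=1.0, mean_gap_cm=0.5):
    """For each haplotype pair, draw two adjacent true IBD segments and the gap between them."""
    seg1 = rng.exponential(mean_len_cm, n_pairs)
    seg2 = rng.exponential(mean_len_cm, n_pairs)
    gap = rng.exponential(mean_gap_cm, n_pairs)
    return seg1, seg2, gap

def inferred_lengths(seg1, seg2, gap, merge_below_cm=0.2):
    """Detector model: gaps shorter than the threshold are invisible, so the
    two true segments are conflated into one inferred segment."""
    merged = gap < merge_below_cm
    conflated = seg1[merged] + gap[merged] + seg2[merged]
    separate = np.concatenate([seg1[~merged], seg2[~merged]])
    return np.concatenate([conflated, separate]), merged.mean()

seg1, seg2, gap = simulate_true_segments()
inferred, frac_merged = inferred_lengths(seg1, seg2, gap)
true_lengths = np.concatenate([seg1, seg2])
print(f"fraction of pairs conflated: {frac_merged:.2f}")
print(f"mean true length: {true_lengths.mean():.2f} cM, "
      f"mean inferred length: {inferred.mean():.2f} cM")
```

Because conflated segments are systematically longer than the true segments they replace, any downstream estimator that treats inferred lengths as true lengths, such as the mutation-rate estimator discussed above, inherits this upward bias unless the merging process is modeled.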
Fast Genome-Wide QTL Association Mapping on Pedigree and Population Data
Since most analysis software for genome-wide association studies (GWAS) currently exploits only unrelated individuals, there is a need for efficient applications that can handle general pedigree data or mixtures of both population and pedigree data. Even data sets thought to consist of only unrelated individuals may include cryptic relationships that can lead to false positives if not discovered and controlled for. In addition, family designs possess compelling advantages. They are better equipped to detect rare variants, control for population stratification, and facilitate the study of parent-of-origin effects. Pedigrees selected for extreme trait values often segregate a single gene with strong effect. Finally, many pedigrees are available as an important legacy from the era of linkage analysis. Unfortunately, pedigree likelihoods are notoriously hard to compute. In this paper we re-examine the computational bottlenecks and implement ultra-fast pedigree-based GWAS analysis. Kinship coefficients can either be based on explicitly provided pedigrees or automatically estimated from dense markers. Our strategy (a) works for random sample data, pedigree data, or a mix of both; (b) entails no loss of power; (c) allows for any number of covariate adjustments, including correction for population stratification; (d) allows for testing SNPs under additive, dominant, and recessive models; and (e) accommodates both univariate and multivariate quantitative traits. On a typical personal computer (6 CPU cores at 2.67 GHz), analyzing a univariate HDL (high-density lipoprotein) trait from the San Antonio Family Heart Study (935,392 SNPs on 1357 individuals in 124 pedigrees) takes less than 2 minutes and 1.5 GB of memory. Complete multivariate QTL analysis of the three time-points of the longitudinal HDL multivariate trait takes less than 5 minutes and 1.5 GB of memory.
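As a rough illustration of the "kinship estimated from dense markers" step, the sketch below computes a standard genetic relationship matrix (GRM) from standardized genotypes and shows how such a matrix typically enters the trait covariance of a mixed-model QTL analysis. The toy genotype matrix and variance components are assumptions; this is not the authors' implementation.

```python
# Minimal sketch: marker-based relatedness estimation via a standard GRM
# A = Z Z^T / m, where Z is the column-standardized 0/1/2 genotype matrix.
# Toy random data; illustration only, not the paper's software.
import numpy as np

rng = np.random.default_rng(1)

def grm_from_genotypes(G):
    """G: n_individuals x n_snps matrix of 0/1/2 allele counts.
    Returns the n x n genetic relationship matrix."""
    p = G.mean(axis=0) / 2.0                 # estimated allele frequencies
    keep = (p > 0) & (p < 1)                 # drop monomorphic SNPs
    Gk, pk = G[:, keep], p[keep]
    Z = (Gk - 2.0 * pk) / np.sqrt(2.0 * pk * (1.0 - pk))
    return Z @ Z.T / Z.shape[1]

# Toy data: 50 individuals, 1000 SNPs with random allele frequencies.
freqs = rng.uniform(0.05, 0.95, size=1000)
G = rng.binomial(2, freqs, size=(50, 1000)).astype(float)
A = grm_from_genotypes(G)

# In a mixed-model analysis the trait covariance is modeled roughly as
#   Var(y) = sigma_g^2 * A + sigma_e^2 * I,
# with each SNP tested as a fixed effect; the variance components here are
# hypothetical placeholder values.
sigma_g2, sigma_e2 = 0.4, 0.6
V = sigma_g2 * A + sigma_e2 * np.eye(A.shape[0])
print(A.shape, np.allclose(V, V.T))
```

When an explicit pedigree is available, the marker-based matrix A can be swapped for (twice) the pedigree kinship matrix without changing the structure of the covariance model, which is what allows the same machinery to handle population samples, pedigrees, or a mix of both.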
- …