Statistical Modeling of Epistasis and Linkage Decay using Logic Regression
Logic regression has been recognized as a tool that can identify and model non-additive genetic interactions using Boolean logic groups. Logic regression, TASSEL-GLM and SAS-GLM were compared for analytical precision using a previously characterized model system to identify the best genetic model explaining the epistatic interaction for vernalization sensitivity in barley. Logic regression selected a genetic model containing two molecular markers identified in the vernalization response in barley, whereas both TASSEL-GLM and SAS-GLM included spurious associations in their models. The results also suggest that logic regression can be used to identify dominant/recessive relationships between epistatic alleles through its use of conjugate operators.
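To make the idea of Boolean marker combinations concrete, the following is a minimal sketch, not the authors' implementation. It scores every AND/OR combination of two (optionally negated) binary markers against a binary phenotype and keeps the expression with the fewest misclassifications. Logic regression proper searches over larger logic trees, typically with simulated annealing; the brute-force search here is swapped in for brevity. The marker names `A`, `B`, `C`, the toy data, and the function `best_logic_pair` are hypothetical.

```python
from itertools import combinations, product


def best_logic_pair(markers, phenotype):
    """Brute-force search over two-marker Boolean expressions.

    markers: dict mapping marker name -> list of 0/1 genotype codes.
    phenotype: list of 0/1 trait values, one per individual.
    Returns (misclassification count, expression label) for the best fit.
    """
    ops = {"and": lambda a, b: a and b, "or": lambda a, b: a or b}
    best = None
    for m1, m2 in combinations(sorted(markers), 2):
        for neg1, neg2, op in product((False, True), (False, True), ops):
            # `bool(x) != neg` flips the marker when it is negated.
            pred = [
                int(ops[op](bool(a) != neg1, bool(b) != neg2))
                for a, b in zip(markers[m1], markers[m2])
            ]
            errors = sum(p != y for p, y in zip(pred, phenotype))
            label = (
                f"{'not ' if neg1 else ''}{m1} {op} "
                f"{'not ' if neg2 else ''}{m2}"
            )
            if best is None or errors < best[0]:
                best = (errors, label)
    return best


# Toy data (hypothetical): the trait appears only when A is present and B is absent.
markers = {
    "A": [1, 1, 1, 1, 0, 0, 0, 0],
    "B": [1, 1, 0, 0, 1, 1, 0, 0],
    "C": [1, 0, 1, 0, 1, 0, 1, 0],
}
phenotype = [0, 0, 1, 1, 0, 0, 0, 0]

result = best_logic_pair(markers, phenotype)
print(result)  # (0, 'A and not B')
```

The recovered expression ignores the uninformative marker `C`. Its negation pattern is one way a dominant/recessive relationship between two loci can show up in a Boolean model.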
A New Advanced Backcross Tomato Population Enables High Resolution Leaf QTL Mapping and Gene Identification.
Quantitative Trait Loci (QTL) mapping is a powerful technique for dissecting the genetic basis of traits and species differences. Established tomato mapping populations between domesticated tomato (Solanum lycopersicum) and its more distant interfertile relatives typically follow a near isogenic line (NIL) design, such as the S. pennellii Introgression Line (IL) population, with a single wild introgression per line in an otherwise domesticated genetic background. Here, we report on a new advanced backcross QTL mapping resource for tomato, derived from a cross between the M82 tomato cultivar and S. pennellii. This so-called Backcrossed Inbred Line (BIL) population comprises a mix of BC2 and BC3 lines, with domesticated tomato as the recurrent parent. The BIL population is complementary to the existing S. pennellii IL population, with which it shares parents. Using the BILs, we mapped traits for leaf complexity, leaflet shape, and flowering time. We demonstrate the utility of the BILs for fine-mapping QTL, particularly QTL initially mapped in the ILs, by fine-mapping several QTL to single or few candidate genes. Moreover, we confirm the value of a backcrossed population with multiple introgressions per line, such as the BILs, for epistatic QTL mapping. Our work was further enabled by the development of our own statistical inference and visualization tools, namely a heterogeneous hidden Markov model for genotyping the lines, and by the use of state-of-the-art sparse regression techniques for QTL mapping.
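As an illustration of how a hidden Markov model can genotype a line from noisy marker calls, here is a minimal Viterbi sketch under stated assumptions; it is not the authors' model. It assumes two hidden states (`M82` for the recurrent parent, `PEN` for an S. pennellii introgression) and a fixed per-marker call error rate. It also assumes that "heterogeneous" means transition probabilities vary with inter-marker distance, modelled here with Haldane's map function. The function name, parameters, and toy data are hypothetical.

```python
import math


def viterbi_genotypes(calls, distances, error=0.05, prior_introgressed=0.1):
    """Most likely state path for one line.

    calls: list of 0/1 allele calls per marker (0 = M82, 1 = S. pennellii).
    distances: genetic distance in Morgans between consecutive markers
               (len(calls) - 1 entries, each > 0).
    """
    states = ("M82", "PEN")
    expected = {"M82": 0, "PEN": 1}

    def emit(state, call):
        # A call matches the true state with probability 1 - error.
        return math.log(1 - error if call == expected[state] else error)

    prior = {"M82": 1 - prior_introgressed, "PEN": prior_introgressed}
    score = {s: math.log(prior[s]) + emit(s, calls[0]) for s in states}
    back = []
    for call, d in zip(calls[1:], distances):
        # Haldane map function: recombination fraction for distance d.
        r = 0.5 * (1 - math.exp(-2 * d))
        new_score, pointers = {}, {}
        for s in states:
            cands = {
                p: score[p] + math.log(1 - r if p == s else r) for p in states
            }
            best_prev = max(cands, key=cands.get)
            new_score[s] = cands[best_prev] + emit(s, call)
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(score, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy data: one isolated S. pennellii call, then a run of five.
calls = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
distances = [0.01] * (len(calls) - 1)  # 1 cM between markers (assumed)
path = viterbi_genotypes(calls, distances)
print(path)
```

With these settings the isolated call at the third marker is treated as a genotyping error, while the run of five consecutive calls is decoded as an introgression. Explaining a lone call by two recombination events costs more than one miscall, whereas five miscalls cost more than a single breakpoint.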
Machine learning and structural analysis of Mycobacterium tuberculosis pan-genome identifies genetic signatures of antibiotic resistance.
Mycobacterium tuberculosis is a serious human pathogen threat exhibiting complex evolution of antimicrobial resistance (AMR). Accordingly, the many publicly available datasets describing its AMR characteristics demand disparate data-type analyses. Here, we develop a reference strain-agnostic computational platform that uses machine learning approaches, complemented by both genetic interaction analysis and 3D structural mutation-mapping, to identify signatures of AMR evolution to 13 antibiotics. This platform is applied to 1595 sequenced strains to yield four key results. First, a pan-genome analysis shows that M. tuberculosis is highly conserved, with sequenced variation concentrated in PE/PPE/PGRS genes. Second, the platform corroborates 33 genes known to confer resistance and identifies 24 new genetic signatures of AMR. Third, 97 epistatic interactions across 10 resistance classes are revealed. Fourth, detailed structural analysis of these genes yields mechanistic bases for their selection. The platform can be used to study other human pathogens.
Sparse Volterra and Polynomial Regression Models: Recoverability and Estimation
Volterra and polynomial regression models play a major role in nonlinear
system identification and inference tasks. Exciting applications ranging from
neuroscience to genome-wide association analysis build on these models with the
additional requirement of parsimony. This requirement has high interpretative
value, but unfortunately cannot be met by least-squares based or kernel
regression methods. To this end, compressed sampling (CS) approaches, already
successful in linear regression settings, can offer a viable alternative. The
viability of CS for sparse Volterra and polynomial models is the core theme of
this work. A common sparse regression task is initially posed for the two
models. Building on (weighted) Lasso-based schemes, an adaptive RLS-type
algorithm is developed for sparse polynomial regressions. The identifiability
of polynomial models is critically challenged by dimensionality. However,
following the CS principle, when these models are sparse, they could be
recovered by far fewer measurements. To quantify the sufficient number of
measurements for a given level of sparsity, restricted isometry properties
(RIP) are investigated in commonly met polynomial regression settings,
generalizing known results for their linear counterparts. The merits of the
novel (weighted) adaptive CS algorithms to sparse polynomial modeling are
verified through synthetic as well as real data tests for genotype-phenotype
analysis.
Comment: 20 pages, to appear in IEEE Trans. on Signal Processing
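The sparse polynomial regression task described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it uses a plain batch Lasso solved by cyclic coordinate descent, not the weighted or adaptive RLS-type schemes the abstract mentions. Inputs are expanded into linear terms plus pairwise products (a second-order polynomial model), and the Lasso penalty zeroes out the terms that do not contribute. The helper names, the penalty `lam=0.25`, and the toy data are assumptions.

```python
from itertools import combinations, product


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def expand(x):
    """Second-order polynomial expansion: linear terms plus pairwise products."""
    names = [f"x{i + 1}" for i in range(len(x))]
    feats = dict(zip(names, x))
    for i, j in combinations(range(len(x)), 2):
        feats[f"{names[i]}*{names[j]}"] = x[i] * x[j]
    return feats


def lasso_cd(rows, y, lam, sweeps=50):
    """Minimise (1/2n)||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n = len(y)
    names = list(rows[0])
    w = {name: 0.0 for name in names}
    for _ in range(sweeps):
        for name in names:
            rho = 0.0   # correlation of this feature with the partial residual
            norm = 0.0  # squared norm of this feature column
            for row, target in zip(rows, y):
                others = sum(w[k] * row[k] for k in names if k != name)
                rho += row[name] * (target - others)
                norm += row[name] ** 2
            if norm == 0.0:
                continue  # an all-zero column carries no information
            w[name] = soft_threshold(rho / n, lam) / (norm / n)
    return w


# Toy data: full factorial design over three +/-1 inputs; the response depends
# on one main effect and one pairwise interaction only.
inputs = list(product((-1.0, 1.0), repeat=3))
rows = [expand(x) for x in inputs]
y = [2.0 * x[0] + 1.5 * x[0] * x[1] for x in inputs]

weights = lasso_cd(rows, y, lam=0.25)
support = [name for name, value in weights.items() if value != 0.0]
print(support)  # ['x1', 'x1*x2']
```

The recovered support matches the two terms that generated the data. Their estimated coefficients are shrunk by `lam` relative to the true values (1.75 and 1.25 instead of 2.0 and 1.5), which is the usual Lasso bias. The weighted variants mentioned in the abstract are one way to reduce that bias.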